A wrapper around the stdlib `tokenize` which roundtrips.
Meta
Author: Anthony Sottile
Requires Python: >=3.9
Classifiers
Programming Language
- Python :: 3
- Python :: 3 :: Only
- Python :: Implementation :: CPython
- Python :: Implementation :: PyPy
tokenize-rt
The stdlib tokenize module does not properly roundtrip. This wrapper
around the stdlib provides two additional tokens ESCAPED_NL and
UNIMPORTANT_WS, and a Token data type. Use src_to_tokens and
tokens_to_src to roundtrip.
This library is useful if you're writing a refactoring tool based on Python tokenization.
Installation
pip install tokenize-rt
Usage
datastructures
tokenize_rt.Offset(line=None, utf8_byte_offset=None)
A token offset, useful as a key when cross referencing the ast and the
tokenized source.
tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)
Construct a token
- name: one of the token names listed in token.tok_name or ESCAPED_NL or UNIMPORTANT_WS
- src: token's source as text
- line: the line number that this token appears on.
- utf8_byte_offset: the utf8 byte offset that this token appears on in the line.
tokenize_rt.Token.offset
Retrieves an Offset for this token.
converting to and from Token representations
tokenize_rt.src_to_tokens(text: str) -> List[Token]
tokenize_rt.tokens_to_src(Iterable[Token]) -> str
additional tokens added by tokenize-rt
tokenize_rt.ESCAPED_NL
tokenize_rt.UNIMPORTANT_WS
helpers
tokenize_rt.NON_CODING_TOKENS
A frozenset containing tokens which may appear between others while not
affecting control flow or code:
- COMMENT
- ESCAPED_NL
- NL
- UNIMPORTANT_WS
tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]
parse a string literal into its prefix and string content
>>> parse_string_literal('f"foo"')
('f', '"foo"')
tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]
yields (index, token) pairs in reverse order. Useful for rewriting source, since edits made at a later index do not shift the indices still to be visited.
tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]
find the indices of the string parts of a (joined) string literal
- i should start at the end of the string literal
- returns () (an empty tuple) for things which are not string literals
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)
Differences from tokenize
- tokenize-rt adds ESCAPED_NL for a backslash-escaped newline "token"
- tokenize-rt adds UNIMPORTANT_WS for whitespace (discarded in tokenize)
- tokenize-rt normalizes string prefixes, even if they are not parsed. For instance, this means you'll see Token('STRING', "f'foo'", ...) even in python 2.
- tokenize-rt normalizes python 2 long literals (4l / 4L) and octal literals (0755) in python 3 (for easier rewriting of python 2 code while running python 3).
Sample usage
Release history
- 6.2.0: May 23, 2025
- 6.1.0: Oct 22, 2024
- 6.0.0: Aug 04, 2024
- 5.2.0: Jul 30, 2023
- 5.1.0: Jun 10, 2023
- 5.0.0: Oct 03, 2022
- 4.2.1: Oct 21, 2021
- 4.2.0: Oct 21, 2021
- 4.1.0: Jan 26, 2021
- 4.0.0: Feb 28, 2020
- 3.2.0: Jul 07, 2019
- 3.1.0: Jul 05, 2019
- 3.0.1: Jun 16, 2019
- 3.0.0: Jun 16, 2019
- 2.2.0: Feb 28, 2019
- 2.1.0: Oct 07, 2018
- 2.0.1: Jul 26, 2017
- 2.0.0: Jul 14, 2017
- 1.0.0: Jun 02, 2017
No dependencies