:mod:`tokenize` --- Tokenizer for Python source
===============================================

.. module:: tokenize
   :synopsis: Lexical scanner for Python source code.
.. moduleauthor:: Ka Ping Yee
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/tokenize.py`

--------------

The :mod:`tokenize` module provides a lexical scanner for Python source code,
implemented in Python.  The scanner in this module returns comments as tokens
as well, making it useful for implementing "pretty-printers," including
colorizers for on-screen displays.

To simplify token stream handling, all :ref:`operators` and :ref:`delimiters`
tokens are returned using the generic :data:`token.OP` token type.  The exact
type can be determined by checking the second field (containing the actual
token string matched) of the tuple returned from
:func:`tokenize.generate_tokens` for the character sequence that identifies a
specific operator token.

The primary entry point is a :term:`generator`:

.. function:: generate_tokens(readline)

   The :func:`generate_tokens` generator requires one argument, *readline*,
   which must be a callable object that provides the same interface as the
   :meth:`~file.readline` method of built-in file objects (see section
   :ref:`bltin-file-objects`).  Each call to the function should return one
   line of input as a string.  Alternatively, *readline* may be a callable
   object that signals completion by raising :exc:`StopIteration`.

   The generator produces 5-tuples with these members: the token type; the
   token string; a 2-tuple ``(srow, scol)`` of ints specifying the row and
   column where the token begins in the source; a 2-tuple ``(erow, ecol)`` of
   ints specifying the row and column where the token ends in the source; and
   the line on which the token was found.  The line passed (the last tuple
   item) is the *logical* line; continuation lines are included.

   .. versionadded:: 2.2

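
For instance, a minimal sketch along these lines (the sample string and the
variable names are illustrative, not part of the module) prints the type name,
string and position of every token in a small fragment; the
:data:`token.tok_name` dictionary translates the numeric token types into
readable names::

    from StringIO import StringIO
    from token import tok_name
    from tokenize import generate_tokens

    source = "x = 3.14  # pi, roughly\n"
    tokens = generate_tokens(StringIO(source).readline)
    for toknum, tokval, start, end, line in tokens:
        # Each 5-tuple carries the token type, the token string, the start
        # and end positions, and the source line; note that the comment is
        # returned as a COMMENT token.
        print tok_name[toknum], repr(tokval), start, end
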
An older entry point is retained for backward compatibility:


.. function:: tokenize(readline[, tokeneater])

   The :func:`.tokenize` function accepts two parameters: one representing the
   input stream, and one providing an output mechanism for :func:`.tokenize`.

   The first parameter, *readline*, must be a callable object that provides
   the same interface as the :meth:`~file.readline` method of built-in file
   objects (see section :ref:`bltin-file-objects`).  Each call to the function
   should return one line of input as a string.  Alternatively, *readline* may
   be a callable object that signals completion by raising
   :exc:`StopIteration`.

   .. versionchanged:: 2.5
      Added :exc:`StopIteration` support.

   The second parameter, *tokeneater*, must also be a callable object.  It is
   called once for each token, with five arguments, corresponding to the
   tuples generated by :func:`generate_tokens`.

All constants from the :mod:`token` module are also exported from
:mod:`tokenize`, as are two additional token type values that might be passed
to the *tokeneater* function by :func:`.tokenize`:


.. data:: COMMENT

   Token value used to indicate a comment.


.. data:: NL

   Token value used to indicate a non-terminating newline.  The NEWLINE token
   indicates the end of a logical line of Python code; NL tokens are generated
   when a logical line of code is continued over multiple physical lines.

Another function is provided to reverse the tokenization process.  This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.


.. function:: untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must return
   sequences with at least two elements, the token type and the token string.
   Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The result is
   guaranteed to tokenize back to match the input so that the conversion is
   lossless and round-trips are assured.  The guarantee applies only to the
   token type and token string, as the spacing between tokens (column
   positions) may change.

   .. versionadded:: 2.5

.. exception:: TokenError

   Raised when either a docstring or an expression that may be split over
   several lines is not completed anywhere in the file, for example::

        """Beginning of
        docstring

   or::

        [1,
         2,
         3

Note that unclosed single-quoted strings do not cause an error to be raised.
They are tokenized as ``ERRORTOKEN``, followed by the tokenization of their
contents.

Example of a script re-writer that transforms float literals into Decimal
objects::

    from StringIO import StringIO
    from tokenize import generate_tokens, untokenize, NUMBER, STRING, NAME, OP

    def decistmt(s):
        """Substitute Decimals for floats in a string of statements.

        >>> from decimal import Decimal
        >>> s = 'print +21.3e-5*-.1234/81.7'
        >>> decistmt(s)
        "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

        >>> exec(s)
        -3.21716034272e-007
        >>> exec(decistmt(s))
        -3.217160342717258261933904529E-7

        """
        result = []
        g = generate_tokens(StringIO(s).readline)   # tokenize the string
        for toknum, tokval, _, _, _ in g:
            if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
                result.extend([
                    (NAME, 'Decimal'),
                    (OP, '('),
                    (STRING, repr(tokval)),
                    (OP, ')')
                ])
            else:
                result.append((toknum, tokval))
        return untokenize(result)

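
As a rough illustration of the round-trip behaviour described for
:func:`untokenize` above (the ``token_pairs`` helper and the sample source are
illustrative only, not part of the module), tokenizing the rebuilt source
should reproduce the same token types and strings, even though the original
spacing is not preserved::

    from StringIO import StringIO
    from tokenize import generate_tokens, untokenize

    def token_pairs(source):
        # Keep only the token type and token string -- the two elements
        # that untokenize() guarantees to preserve.
        g = generate_tokens(StringIO(source).readline)
        return [(toknum, tokval) for toknum, tokval, _, _, _ in g]

    source = "1 +     2\n"          # irregular spacing on purpose
    rebuilt = untokenize(token_pairs(source))
    assert token_pairs(rebuilt) == token_pairs(source)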