.. _lexical:

****************
Lexical analysis
****************

.. index:: lexical analysis, parser, token

A Python program is read by a *parser*.  Input to the parser is a stream of
*tokens*, generated by the *lexical analyzer*.  This chapter describes how the
lexical analyzer breaks a file into tokens.

Python reads program text as Unicode code points; the encoding of a source file
can be given by an encoding declaration and defaults to UTF-8 (see :pep:`3120`
for details).  If the source file cannot be decoded, a :exc:`SyntaxError` is
raised.


.. _line-structure:

Line structure
==============

.. index:: line structure

A Python program is divided into a number of *logical lines*.


.. _logical-lines:

Logical lines
-------------

.. index:: logical line, physical line, line joining, NEWLINE token

The end of a logical line is represented by the token NEWLINE.  Statements
cannot cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements). A logical line is
constructed from one or more *physical lines* by following the explicit or
implicit *line joining* rules.


.. _physical-lines:

Physical lines
--------------

A physical line is a sequence of characters terminated by an end-of-line
sequence.  In source files and strings, any of the standard platform line
termination sequences can be used: the Unix form using ASCII LF (linefeed),
the Windows form using the ASCII sequence CR LF (return followed by linefeed),
or the old Macintosh form using the ASCII CR (return) character.  All of these
forms can be used equally, regardless of platform. The end of input also serves
as an implicit terminator for the final physical line.

When embedding Python, source code strings should be passed to Python APIs using
the standard C conventions for newline characters (the ``\n`` character,
representing ASCII LF, is the line terminator).
 
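As an illustration of the line terminators described above, the following
sketch compiles the same two-statement program written with each of the three
end-of-line forms.  The source strings and the ``results`` dictionary are
made up for this example; it relies only on the built-in :func:`compile` and
:func:`exec` functions accepting all three forms in a source string.

.. code-block:: python

   # The same program, once per line terminator.
   sources = {
       "LF": "x = 1\ny = x + 1\n",
       "CRLF": "x = 1\r\ny = x + 1\r\n",
       "CR": "x = 1\ry = x + 1\r",
   }

   results = {}
   for name, text in sources.items():
       namespace = {}
       exec(compile(text, f"<{name}>", "exec"), namespace)
       results[name] = namespace["y"]

   print(results)  # {'LF': 2, 'CRLF': 2, 'CR': 2}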
 
.. _comments:

Comments
--------

.. index:: comment, hash character
   single: # (hash); comment

A comment starts with a hash character (``#``) that is not part of a string
literal, and ends at the end of the physical line.  A comment signifies the end
of the logical line unless the implicit line joining rules are invoked. Comments
are ignored by the syntax.


.. _encodings:

Encoding declarations
---------------------

.. index:: source character set, encoding declarations (source file)
   single: # (hash); source encoding declaration

If a comment in the first or second line of the Python script matches the
regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
encoding declaration; the first group of this expression names the encoding of
the source code file. The encoding declaration must appear on a line of its
own. If it is the second line, the first line must also be a comment-only line.
The recommended forms of an encoding expression are ::

   # -*- coding: <encoding-name> -*-

which is also recognized by GNU Emacs, and ::

   # vim:fileencoding=<encoding-name>

which is recognized by Bram Moolenaar's VIM.

If no encoding declaration is found, the default encoding is UTF-8.  If the
implicit or explicit encoding of a file is UTF-8, an initial UTF-8 byte-order
mark (``b'\xef\xbb\xbf'``) is ignored rather than being a syntax error.

If an encoding is declared, the encoding name must be recognized by Python
(see :ref:`standard-encodings`). The encoding is used for all lexical analysis,
including string literals, comments and identifiers.
 
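The regular expression quoted above can be tried directly with the :mod:`re`
module.  The following is only a sketch of the matching step: the helper name
``declared_encoding`` is hypothetical, and the interpreter additionally applies
the position and comment-only rules described in this section.

.. code-block:: python

   import re

   # The pattern quoted in the text above.
   CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

   def declared_encoding(comment_line):
       """Return the encoding named by a comment line, or None."""
       match = CODING_RE.search(comment_line)
       return match.group(1) if match else None

   emacs_style = declared_encoding("# -*- coding: latin-1 -*-")
   vim_style = declared_encoding("# vim:fileencoding=utf-8")
   plain_comment = declared_encoding("# just a comment")
   print(emacs_style, vim_style, plain_comment)  # latin-1 utf-8 None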
 
.. _explicit-joining:

Explicit line joining
---------------------

.. index:: physical line, line joining, line continuation, backslash character

Two or more physical lines may be joined into logical lines using backslash
characters (``\``), as follows: when a physical line ends in a backslash that is
not part of a string literal or comment, it is joined with the following line
to form a single logical line, deleting the backslash and the following
end-of-line character.  For example::

   if 1900 < year < 2100 and 1 <= month <= 12 \
      and 1 <= day <= 31 and 0 <= hour < 24 \
      and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
           return 1

A line ending in a backslash cannot carry a comment.  A backslash does not
continue a comment.  A backslash does not continue a token except for string
literals (i.e., tokens other than string literals cannot be split across
physical lines using a backslash).  A backslash is illegal elsewhere on a line
outside a string literal.
 
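The two rules above can be checked with the :mod:`ast` module.  This is a
minimal sketch with made-up source strings: the first is parsed as a single
statement, and the second is rejected because a comment follows the backslash.

.. code-block:: python

   import ast

   # A backslash at the end of the first physical line joins it with the second.
   joined = "total = 1 + \\\n    2\n"
   statement_count = len(ast.parse(joined).body)

   # With a comment after it, the backslash no longer ends the physical line.
   with_comment = "total = 1 + \\ # not allowed\n    2\n"
   try:
       ast.parse(with_comment)
       comment_rejected = False
   except SyntaxError:
       comment_rejected = True

   print(statement_count, comment_rejected)  # 1 True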
 
.. _implicit-joining:

Implicit line joining
---------------------

Expressions in parentheses, square brackets or curly braces can be split over
more than one physical line without using backslashes. For example::

   month_names = ['Januari', 'Februari', 'Maart',      # These are the
                  'April',   'Mei',      'Juni',       # Dutch names
                  'Juli',    'Augustus', 'September',  # for the months
                  'Oktober', 'November', 'December']   # of the year

Implicitly continued lines can carry comments.  The indentation of the
continuation lines is not important.  Blank continuation lines are allowed.
There is no NEWLINE token between implicit continuation lines.  Implicitly
continued lines can also occur within triple-quoted strings (see below); in that
case they cannot carry comments.
 
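The absence of NEWLINE tokens inside brackets can be observed with the
:mod:`tokenize` module, which reports line breaks that do not end a logical
line as separate ``NL`` tokens.  The source string below is made up for this
sketch; it contains a comment and a blank line inside the brackets.

.. code-block:: python

   import io
   import tokenize

   source = (
       "values = [1,\n"
       "          2,  # a comment on a continuation line\n"
       "\n"
       "          3]\n"
   )

   tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

   # NEWLINE ends a logical line; NL marks a line break that does not.
   newline_count = sum(tok.type == tokenize.NEWLINE for tok in tokens)
   nl_count = sum(tok.type == tokenize.NL for tok in tokens)
   print(newline_count, nl_count)

Only one NEWLINE token is produced, after the closing bracket; the line breaks
inside the brackets are counted in ``nl_count``.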
 
.. _blank-lines:

Blank lines
-----------

.. index:: single: blank line

A logical line that contains only spaces, tabs, formfeeds and possibly a
comment, is ignored (i.e., no NEWLINE token is generated).  During interactive
input of statements, handling of a blank line may differ depending on the
implementation of the read-eval-print loop.  In the standard interactive
interpreter, an entirely blank logical line (i.e., one containing not even
whitespace or a comment) terminates a multi-line statement.


.. _indentation:

Indentation
-----------

.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping

Leading whitespace (spaces and tabs) at the beginning of a logical line is used
to compute the indentation level of the line, which in turn is used to determine
the grouping of statements.

Tabs are replaced (from left to right) by one to eight spaces such that the
total number of characters up to and including the replacement is a multiple of
eight (this is intended to be the same rule as used by Unix).  The total number
of spaces preceding the first non-blank character then determines the line's
indentation.  Indentation cannot be split over multiple physical lines using
backslashes; the whitespace up to the first backslash determines the
indentation.

Indentation is rejected as inconsistent if a source file mixes tabs and spaces
in a way that makes the meaning dependent on the worth of a tab in spaces; a
:exc:`TabError` is raised in that case.

**Cross-platform compatibility note:** because of the nature of text editors on
non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the
indentation in a single source file.  It should also be noted that different
platforms may explicitly limit the maximum indentation level.

A formfeed character may be present at the start of the line; it will be ignored
for the indentation calculations above.  Formfeed characters occurring elsewhere
in the leading whitespace have an undefined effect (for instance, they may reset
the space count to zero).

.. index:: INDENT token, DEDENT token

The indentation levels of consecutive lines are used to generate INDENT and
DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack;
this will never be popped off again.  The numbers pushed on the stack will
always be strictly increasing from bottom to top.  At the beginning of each
logical line, the line's indentation level is compared to the top of the stack.
If it is equal, nothing happens. If it is larger, it is pushed on the stack, and
one INDENT token is generated.  If it is smaller, it *must* be one of the
numbers occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is generated.  At the
end of the file, a DEDENT token is generated for each number remaining on the
stack that is larger than zero.

Here is an example of a correctly (though confusingly) indented piece of Python
code::

   def perm(l):
           # Compute the list of all permutations of l
       if len(l) <= 1:
                     return [l]
       r = []
       for i in range(len(l)):
                s = l[:i] + l[i+1:]
                p = perm(s)
                for x in p:
                 r.append(l[i:i+1] + x)
       return r

The following example shows various indentation errors::

    def perm(l):                       # error: first line indented
   for i in range(len(l)):             # error: not indented
       s = l[:i] + l[i+1:]
           p = perm(l[:i] + l[i+1:])   # error: unexpected indent
           for x in p:
                   r.append(l[i:i+1] + x)
               return r                # error: inconsistent dedent

(Actually, the first three errors are detected by the parser; only the last
error is found by the lexical analyzer --- the indentation of ``return r`` does
not match a level popped off the stack.)
 
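The tab rule and the stack procedure described in this section can be sketched
in a few lines of Python.  This is an illustration only, not the tokenizer's
actual code: the function names ``indentation_width`` and
``indentation_tokens`` are hypothetical, formfeeds are not handled, and logical
lines are represented simply by their indentation levels.

.. code-block:: python

   def indentation_width(line):
       """Column reached by the leading spaces and tabs of a line.

       A tab advances the column to the next multiple of eight.
       """
       column = 0
       for char in line:
           if char == " ":
               column += 1
           elif char == "\t":
               column = (column // 8 + 1) * 8
           else:
               break
       return column


   def indentation_tokens(levels):
       """Yield INDENT/DEDENT markers for one indentation level per line."""
       stack = [0]  # the initial zero is never popped
       for level in levels:
           if level > stack[-1]:
               stack.append(level)
               yield "INDENT"
           elif level < stack[-1]:
               if level not in stack:
                   raise IndentationError("dedent does not match any outer level")
               while stack[-1] > level:
                   stack.pop()
                   yield "DEDENT"
           yield f"line@{level}"
       # At the end of input, close every level that is still open.
       for _ in stack[1:]:
           yield "DEDENT"


   widths = [indentation_width(s) for s in ("    x", "\tx", "  \tx", "\t  x")]
   trace = list(indentation_tokens([0, 4, 8, 4, 0, 4]))
   print(widths)  # [4, 8, 8, 10]
   print(trace)

A sequence such as ``[0, 4, 2]`` makes the sketch raise
:exc:`IndentationError`, because ``2`` is smaller than the top of the stack but
does not occur on it.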
 
.. _whitespace:

Whitespace between tokens
-------------------------

Except at the beginning of a logical line or in string literals, the whitespace
characters space, tab and formfeed can be used interchangeably to separate
tokens.  Whitespace is needed between two tokens only if their concatenation
could otherwise be interpreted as a different token (e.g., ``ab`` is one token,
but ``a b`` is two tokens).


.. _other-tokens:

Other tokens
============

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist:
*identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace
characters (other than line terminators, discussed earlier) are not tokens, but
serve to delimit tokens. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.
 
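The longest-match rule and the role of whitespace as a delimiter can be
observed with the :mod:`tokenize` module.  The helper ``token_strings`` below
is hypothetical and keeps only name, number and operator tokens.

.. code-block:: python

   import io
   import tokenize

   def token_strings(source):
       """Return the text of each NAME, NUMBER and OP token in source."""
       wanted = {tokenize.NAME, tokenize.NUMBER, tokenize.OP}
       readline = io.StringIO(source).readline
       return [tok.string for tok in tokenize.generate_tokens(readline)
               if tok.type in wanted]

   print(token_strings("ab\n"))      # ['ab']
   print(token_strings("a b\n"))     # ['a', 'b']
   print(token_strings("x<<=1\n"))   # ['x', '<<=', '1']
   print(token_strings("x< <=1\n"))  # ['x', '<', '<=', '1']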
 
.. _identifiers:

Identifiers and keywords
========================

.. index:: identifier, name

Identifiers (also referred to as *names*) are described by the following lexical
definitions.

The syntax of identifiers in Python is based on the Unicode standard annex
UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
further details.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
include the uppercase and lowercase letters ``A`` through ``Z``, the underscore
``_`` and, except for the first character, the digits ``0`` through ``9``.
Python 3.0 introduced additional characters from outside the ASCII range (see
:pep:`3131`).  For these characters, the classification uses the version of the
Unicode Character Database as included in the :mod:`unicodedata` module.

Identifiers are unlimited in length.  Case is significant.

.. productionlist:: python-grammar
   identifier: `xid_start` `xid_continue`*
   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
   id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
   xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
   xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">

The Unicode category codes mentioned above stand for:

* *Lu* - uppercase letters
* *Ll* - lowercase letters
* *Lt* - titlecase letters
* *Lm* - modifier letters
* *Lo* - other letters
* *Nl* - letter numbers
* *Mn* - nonspacing marks
* *Mc* - spacing combining marks
* *Nd* - decimal numbers
* *Pc* - connector punctuations
* *Other_ID_Start* - explicit list of characters in `PropList.txt
  <https://www.unicode.org/Public/15.1.0/ucd/PropList.txt>`_ to support backwards
  compatibility
* *Other_ID_Continue* - likewise

All identifiers are converted into the normal form NFKC while parsing; comparison
of identifiers is based on NFKC.

A non-normative file listing all valid identifier characters for Unicode
15.1.0 can be found at
https://www.unicode.org/Public/15.1.0/ucd/DerivedCoreProperties.txt
 
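The identifier rules and the NFKC conversion can be explored from Python
itself.  The sketch below uses :meth:`str.isidentifier` and
:func:`unicodedata.normalize`; the sample names are made up, and U+FB01 (the
single-character ligature for "fi") is chosen only because its NFKC form
differs from the character itself.

.. code-block:: python

   import unicodedata

   # str.isidentifier() tests a string against the identifier definition.
   checks = {name: name.isidentifier() for name in ("spam", "_x1", "1abc", "a-b")}

   # NFKC turns the ligature U+FB01 into the two letters "f" and "i".
   ligature_name = "\ufb01nd"
   normalized = unicodedata.normalize("NFKC", ligature_name)

   # The name is stored in its normalized form when the code is compiled.
   namespace = {}
   exec(f"{ligature_name} = 42", namespace)
   stored_names = [key for key in namespace if not key.startswith("__")]

   print(checks)
   print(normalized, stored_names)  # find ['find']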
 
.. _keywords:

Keywords
--------

.. index::
   single: keyword
   single: reserved word

The following identifiers are used as reserved words, or *keywords* of the
language, and cannot be used as ordinary identifiers.  They must be spelled
exactly as written here:

.. sourcecode:: text

   False      await      else       import     pass
   None       break      except     in         raise
   True       class      finally    is         return
   and        continue   for        lambda     try
   as         def        from       nonlocal   while
   assert     del        global     not        with
   async      elif       if         or         yield
 
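The list of keywords is also available at run time through the standard
:mod:`keyword` module.  The following sketch checks the exact-spelling rule and
shows that using a keyword as an ordinary identifier is rejected; the variable
names are made up for the example.

.. code-block:: python

   import keyword

   reserved = keyword.iskeyword("lambda")
   case_matters = keyword.iskeyword("Lambda")   # spelling must match exactly

   # Using a keyword as an ordinary identifier is rejected at compile time.
   try:
       compile("lambda = 1", "<example>", "exec")
       assignment_accepted = True
   except SyntaxError:
       assignment_accepted = False

   print(reserved, case_matters, assignment_accepted)  # True False False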
 
.. _soft-keywords:

Soft Keywords
-------------

.. index:: soft keyword, keyword

.. versionadded:: 3.10

Some identifiers are only reserved under specific contexts. These are known as
*soft keywords*.  The identifiers ``match``, ``case``, ``type`` and ``_`` can
syntactically act as keywords in certain contexts,
but this distinction is made at the parser level, not when tokenizing.

As soft keywords, their use in the grammar is possible while still
preserving compatibility with existing code that uses these names as
identifier names.

``match``, ``case``, and ``_`` are used in the :keyword:`match` statement.
``type`` is used in the :keyword:`type` statement.

.. versionchanged:: 3.12
   ``type`` is now a soft keyword.

.. index::
   single: _, identifiers
   single: __, identifiers
.. _id-classes:

Reserved classes of identifiers
-------------------------------

Certain classes of identifiers (besides keywords) have special meanings.  These
classes are identified by the patterns of leading and trailing underscore
characters:

``_*``
   Not imported by ``from module import *``.

``_``
   In a ``case`` pattern within a :keyword:`match` statement, ``_`` is a
   :ref:`soft keyword <soft-keywords>` that denotes a
   :ref:`wildcard <wildcard-patterns>`.

   Separately, the interactive interpreter makes the result of the last evaluation
   available in the variable ``_``.
   (It is stored in the :mod:`builtins` module, alongside built-in
   functions like ``print``.)

   Elsewhere, ``_`` is a regular identifier. It is often used to name
   "special" items, but it is not special to Python itself.

   .. note::

      The name ``_`` is often used in conjunction with internationalization;
      refer to the documentation for the :mod:`gettext` module for more
      information on this convention.

      It is also commonly used for unused variables.

``__*__``
   System-defined names, informally known as "dunder" names. These names are
   defined by the interpreter and its implementation (including the standard library).
   Current system names are discussed in the :ref:`specialnames` section and elsewhere.
   More will likely be defined in future versions of Python.  *Any* use of ``__*__`` names,
   in any context, that does not follow explicitly documented use, is subject to
   breakage without warning.

``__*``
   Class-private names.  Names in this category, when used within the context of a
   class definition, are re-written to use a mangled form to help avoid name
   clashes between "private" attributes of base and derived classes. See section
   :ref:`atom-identifiers`.
 
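The rewriting of class-private names can be seen by inspecting an instance.
In this sketch the classes ``Base`` and ``Derived`` are hypothetical; each
assigns to ``self.__token``, and the two assignments end up under different
attribute names.

.. code-block:: python

   class Base:
       def __init__(self):
           self.__token = "base"       # stored as _Base__token


   class Derived(Base):
       def __init__(self):
           super().__init__()
           self.__token = "derived"    # stored as _Derived__token


   obj = Derived()
   private_names = sorted(name for name in vars(obj) if name.endswith("__token"))
   print(private_names)  # ['_Base__token', '_Derived__token']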
 
.. _literals:

Literals
========

.. index:: literal, constant

Literals are notations for constant values of some built-in types.


.. index:: string literal, bytes literal, ASCII
   single: ' (single quote); string literal
   single: " (double quote); string literal
   single: u'; string literal
   single: u"; string literal
.. _strings:

String and Bytes literals
-------------------------

String literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   stringliteral: [`stringprefix`](`shortstring` | `longstring`)
   stringprefix: "r" | "u" | "R" | "U" | "f" | "F"
               : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
   shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
   shortstringitem: `shortstringchar` | `stringescapeseq`
   longstringitem: `longstringchar` | `stringescapeseq`
   shortstringchar: <any source character except "\" or newline or the quote>
   longstringchar: <any source character except "\">
   stringescapeseq: "\" <any source character>

.. productionlist:: python-grammar
   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
   bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
   longbytesitem: `longbyteschar` | `bytesescapeseq`
   shortbyteschar: <any ASCII character except "\" or newline or the quote>
   longbyteschar: <any ASCII character except "\">
   bytesescapeseq: "\" <any ASCII character>

One syntactic restriction not indicated by these productions is that whitespace
is not allowed between the :token:`~python-grammar:stringprefix` or
:token:`~python-grammar:bytesprefix` and the rest of the literal. The source
character set is defined by the encoding declaration; it is UTF-8 if no encoding
declaration is given in the source file; see section :ref:`encodings`.

.. index:: triple-quoted string, Unicode Consortium, raw string
   single: """; string literal
   single: '''; string literal

In plain English: Both types of literals can be enclosed in matching single quotes
(``'``) or double quotes (``"``).  They can also be enclosed in matching groups
of three single or double quotes (these are generally referred to as
*triple-quoted strings*). The backslash (``\``) character is used to give special
meaning to otherwise ordinary characters like ``n``, which means 'newline' when
escaped (``\n``). It can also be used to escape characters that otherwise have a
special meaning, such as newline, backslash itself, or the quote character.
See :ref:`escape sequences <escape-sequences>` below for examples.

.. index::
   single: b'; bytes literal
   single: b"; bytes literal

Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
instance of the :class:`bytes` type instead of the :class:`str` type.  They
may only contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.

.. index::
   single: r'; raw string literal
   single: r"; raw string literal

Both string and bytes literals may optionally be prefixed with a letter ``'r'``
or ``'R'``; such constructs are called :dfn:`raw string literals`
and :dfn:`raw bytes literals` respectively and treat backslashes as
literal characters.  As a result, in raw string literals, ``'\U'`` and ``'\u'``
escapes are not treated specially.

.. versionadded:: 3.3
   The ``'rb'`` prefix of raw bytes literals has been added as a synonym
   of ``'br'``.

   Support for the unicode legacy literal (``u'value'``) was reintroduced
   to simplify the maintenance of dual Python 2.x and 3.x codebases.
   See :pep:`414` for more information.

.. index::
   single: f'; formatted string literal
   single: f"; formatted string literal

A string literal with ``'f'`` or ``'F'`` in its prefix is a
:dfn:`formatted string literal`; see :ref:`f-strings`.  The ``'f'`` may be
combined with ``'r'``, but not with ``'b'`` or ``'u'``, therefore raw
formatted strings are possible, but formatted bytes literals are not.

In triple-quoted literals, unescaped newlines and quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the literal.  (A
"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.)

.. index:: physical line, escape sequence, Standard C, C
   single: \ (backslash); escape sequence
   single: \\; escape sequence
   single: \a; escape sequence
   single: \b; escape sequence
   single: \f; escape sequence
   single: \n; escape sequence
   single: \r; escape sequence
   single: \t; escape sequence
   single: \v; escape sequence
   single: \x; escape sequence
   single: \N; escape sequence
   single: \u; escape sequence
   single: \U; escape sequence
 
.. _escape-sequences:


Escape sequences
^^^^^^^^^^^^^^^^

Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and
bytes literals are interpreted according to rules similar to those used by
Standard C.  The recognized escape sequences are:

+-------------------------+---------------------------------+-------+
| Escape Sequence         | Meaning                         | Notes |
+=========================+=================================+=======+
| ``\``\ <newline>        | Backslash and newline ignored   | \(1)  |
+-------------------------+---------------------------------+-------+
| ``\\``                  | Backslash (``\``)               |       |
+-------------------------+---------------------------------+-------+
| ``\'``                  | Single quote (``'``)            |       |
+-------------------------+---------------------------------+-------+
| ``\"``                  | Double quote (``"``)            |       |
+-------------------------+---------------------------------+-------+
| ``\a``                  | ASCII Bell (BEL)                |       |
+-------------------------+---------------------------------+-------+
| ``\b``                  | ASCII Backspace (BS)            |       |
+-------------------------+---------------------------------+-------+
| ``\f``                  | ASCII Formfeed (FF)             |       |
+-------------------------+---------------------------------+-------+
| ``\n``                  | ASCII Linefeed (LF)             |       |
+-------------------------+---------------------------------+-------+
| ``\r``                  | ASCII Carriage Return (CR)      |       |
+-------------------------+---------------------------------+-------+
| ``\t``                  | ASCII Horizontal Tab (TAB)      |       |
+-------------------------+---------------------------------+-------+
| ``\v``                  | ASCII Vertical Tab (VT)         |       |
+-------------------------+---------------------------------+-------+
| :samp:`\\\\{ooo}`       | Character with octal value      | (2,4) |
|                         | *ooo*                           |       |
+-------------------------+---------------------------------+-------+
| :samp:`\\x{hh}`         | Character with hex value *hh*   | (3,4) |
+-------------------------+---------------------------------+-------+

Escape sequences only recognized in string literals are:

+-------------------------+---------------------------------+-------+
| Escape Sequence         | Meaning                         | Notes |
+=========================+=================================+=======+
| :samp:`\\N\\{{name}\\}` | Character named *name* in the   | \(5)  |
|                         | Unicode database                |       |
+-------------------------+---------------------------------+-------+
| :samp:`\\u{xxxx}`       | Character with 16-bit hex value | \(6)  |
|                         | *xxxx*                          |       |
+-------------------------+---------------------------------+-------+
| :samp:`\\U{xxxxxxxx}`   | Character with 32-bit hex value | \(7)  |
|                         | *xxxxxxxx*                      |       |
+-------------------------+---------------------------------+-------+

Notes:

(1)
   A backslash can be added at the end of a line to ignore the newline::

      >>> 'This string will not include \
      ... backslashes or newline characters.'
      'This string will not include backslashes or newline characters.'

   The same result can be achieved using :ref:`triple-quoted strings <strings>`,
   or parentheses and :ref:`string literal concatenation <string-concatenation>`.


(2)
   As in Standard C, up to three octal digits are accepted.

   .. versionchanged:: 3.11
      Octal escapes with value larger than ``0o377`` produce a
      :exc:`DeprecationWarning`.

   .. versionchanged:: 3.12
      Octal escapes with value larger than ``0o377`` produce a
      :exc:`SyntaxWarning`. In a future Python version they will eventually
      be a :exc:`SyntaxError`.

(3)
   Unlike in Standard C, exactly two hex digits are required.

(4)
   In a bytes literal, hexadecimal and octal escapes denote the byte with the
   given value. In a string literal, these escapes denote a Unicode character
   with the given value.

(5)
   .. versionchanged:: 3.3
      Support for name aliases [#]_ has been added.

(6)
   Exactly four hex digits are required.

(7)
   Any Unicode character can be encoded this way.  Exactly eight hex digits
   are required.


.. index:: unrecognized escape sequence

Unlike Standard C, all unrecognized escape sequences are left in the string
unchanged, i.e., *the backslash is left in the result*.  (This behavior is
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.)  It is also important to note that the
escape sequences only recognized in string literals fall into the category of
unrecognized escapes for bytes literals.

.. versionchanged:: 3.6
   Unrecognized escape sequences produce a :exc:`DeprecationWarning`.

.. versionchanged:: 3.12
   Unrecognized escape sequences produce a :exc:`SyntaxWarning`. In a future
   Python version they will eventually be a :exc:`SyntaxError`.

Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, ``r"\""`` is a valid string
literal consisting of two characters: a backslash and a double quote; ``r"\"``
is not a valid string literal (even a raw string cannot end in an odd number of
backslashes).  Specifically, *a raw literal cannot end in a single backslash*
(since the backslash would escape the following quote character).  Note also
that a single backslash followed by a newline is interpreted as those two
characters as part of the literal, *not* as a line continuation.
 
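A few of the rules above, shown as a small sketch.  The variable names are
made up, and only recognized escape sequences are used so that no warning is
issued.

.. code-block:: python

   # Escape sequences are translated in ordinary literals, not in raw ones.
   cooked_length = len("\n")       # one character: LF
   raw_length = len(r"\n")         # two characters: backslash and "n"

   # In a raw literal the backslash still protects the quote, and stays in.
   raw_quote = r"\""

   # \x denotes a code point in a str and a byte value in a bytes object.
   text_value = ord("\xe9")
   byte_value = b"\xe9"[0]

   # \N{...} and \u are recognized only in string literals.
   same_character = "\N{LATIN SMALL LETTER E WITH ACUTE}" == "\u00e9"

   print(cooked_length, raw_length, list(raw_quote))
   print(text_value, byte_value, same_character)  # 233 233 True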
 
.. _string-concatenation:

String literal concatenation
----------------------------

Multiple adjacent string or bytes literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is the same
as their concatenation.  Thus, ``"hello" 'world'`` is equivalent to
``"helloworld"``.  This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings, for example::

   re.compile("[A-Za-z_]"       # letter or underscore
              "[A-Za-z0-9_]*"   # letter, digit or underscore
             )

Note that this feature is defined at the syntactical level, but implemented at
compile time.  The ``+`` operator must be used to concatenate string expressions
at run time.  Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple quoted strings),
and formatted string literals may be concatenated with plain string literals.
 
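That the concatenation happens before run time can be checked by parsing the
example from the text with the :mod:`ast` module.  In this sketch the two
adjacent literals appear in the syntax tree as one constant; the variable
names are made up.

.. code-block:: python

   import ast

   expression = ast.parse("\"hello\" 'world'", mode="eval").body

   # Adjacent literals arrive in the syntax tree as a single constant.
   is_single_constant = isinstance(expression, ast.Constant)
   value = expression.value
   print(is_single_constant, value)  # True helloworld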
695
696.. index::
697   single: formatted string literal
698   single: interpolated string literal
699   single: string; formatted literal
700   single: string; interpolated literal
701   single: f-string
702   single: fstring
703   single: {} (curly brackets); in formatted string literal
704   single: ! (exclamation); in formatted string literal
705   single: : (colon); in formatted string literal
706   single: = (equals); for help in debugging using string literals
707
708.. _f-strings:
709.. _formatted-string-literals:
710
711f-strings
712---------
713
714.. versionadded:: 3.6
715
716A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal
717that is prefixed with ``'f'`` or ``'F'``.  These strings may contain
718replacement fields, which are expressions delimited by curly braces ``{}``.
719While other string literals always have a constant value, formatted strings
720are really expressions evaluated at run time.
721
722Escape sequences are decoded like in ordinary string literals (except when
723a literal is also marked as a raw string).  After decoding, the grammar
724for the contents of the string is:
725
726.. productionlist:: python-grammar
727   f_string: (`literal_char` | "{{" | "}}" | `replacement_field`)*
728   replacement_field: "{" `f_expression` ["="] ["!" `conversion`] [":" `format_spec`] "}"
729   f_expression: (`conditional_expression` | "*" `or_expr`)
730               :   ("," `conditional_expression` | "," "*" `or_expr`)* [","]
731               : | `yield_expression`
732   conversion: "s" | "r" | "a"
733   format_spec: (`literal_char` | `replacement_field`)*
734   literal_char: <any code point except "{", "}" or NULL>
735
736The parts of the string outside curly braces are treated literally,
737except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced
738with the corresponding single curly brace.  A single opening curly
739bracket ``'{'`` marks a replacement field, which starts with a
740Python expression. To display both the expression text and its value after
741evaluation, (useful in debugging), an equal sign ``'='`` may be added after the
742expression. A conversion field, introduced by an exclamation point ``'!'`` may
743follow.  A format specifier may also be appended, introduced by a colon ``':'``.
744A replacement field ends with a closing curly bracket ``'}'``.

Expressions in formatted string literals are treated like regular
Python expressions surrounded by parentheses, with a few exceptions.
An empty expression is not allowed, and both :keyword:`lambda` and
assignment expressions ``:=`` must be surrounded by explicit parentheses.
Each expression is evaluated in the context where the formatted string literal
appears, in order from left to right.  Replacement expressions can contain
newlines in both single-quoted and triple-quoted f-strings, and they can contain
comments.  Everything that comes after a ``#`` inside a replacement field
is a comment (even closing braces and quotes).  In that case, the replacement
field must be closed on a different line.

.. code-block:: text

   >>> a = 2
   >>> f"abc{a # This is a comment }"
   ... + 3}"
   'abc5'

.. versionchanged:: 3.7
   Prior to Python 3.7, an :keyword:`await` expression and comprehensions
   containing an :keyword:`async for` clause were illegal in the expressions
   in formatted string literals due to a problem with the implementation.

.. versionchanged:: 3.12
   Prior to Python 3.12, comments were not allowed inside f-string replacement
   fields.

When the equal sign ``'='`` is provided, the output will have the expression
text, the ``'='`` and the evaluated value.  Spaces after the opening brace
``'{'``, within the expression and after the ``'='`` are all retained in the
output.  By default, the ``'='`` causes the :func:`repr` of the expression to be
provided, unless a format is specified.  When a format is specified, the output
defaults to the :func:`str` of the expression unless a conversion ``'!r'`` is
declared.

.. versionadded:: 3.8
   The equal sign ``'='``.

If a conversion is specified, the result of evaluating the expression
is converted before formatting.  Conversion ``'!s'`` calls :func:`str` on
the result, ``'!r'`` calls :func:`repr`, and ``'!a'`` calls :func:`ascii`.
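
A minimal sketch of the three conversions follows.  The ``Point`` class is
hypothetical and exists only to give :func:`str` and :func:`repr` visibly
different results:

.. code-block:: python

   class Point:
       """Hypothetical class, defined only for this illustration."""

       def __init__(self, x, y):
           self.x = x
           self.y = y

       def __str__(self):
           return f"({self.x}, {self.y})"

       def __repr__(self):
           return f"Point(x={self.x}, y={self.y})"


   p = Point(1, 2)
   as_str = f"{p!s}"         # same text as str(p)
   as_repr = f"{p!r}"        # same text as repr(p)
   as_ascii = f"{'café'!a}"  # same text as ascii('café'); non-ASCII is escaped

   print(as_str)    # (1, 2)
   print(as_repr)   # Point(x=1, y=2)
   print(as_ascii)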

The result is then formatted using the :func:`format` protocol.  The
format specifier is passed to the :meth:`~object.__format__` method of the
expression or conversion result.  An empty string is passed when the
format specifier is omitted.  The formatted result is then included in
the final value of the whole string.
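
To see which specifier reaches :meth:`~object.__format__`, one can define a
class that echoes its argument.  ``Tracer`` below is a hypothetical helper
written for this sketch, not part of any library:

.. code-block:: python

   class Tracer:
       """Hypothetical helper that reports the format specifier it receives."""

       def __format__(self, spec):
           return f"<spec={spec!r}>"

       def __str__(self):
           return "T"


   t = Tracer()
   no_spec = f"{t}"          # __format__ is called with ""
   with_spec = f"{t:>10}"    # __format__ is called with ">10"
   after_conv = f"{t!s:>3}"  # str(t) is formatted instead of t itself

   print(no_spec)     # <spec=''>
   print(with_spec)   # <spec='>10'>
   print(after_conv)  # T, right-aligned in 3 columns

In the last case the conversion runs first, so the specifier is handed to
the resulting string rather than to the ``Tracer`` instance.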

Top-level format specifiers may include nested replacement fields. These nested
fields may include their own conversion fields and :ref:`format specifiers
<formatspec>`, but may not include more deeply nested replacement fields. The
:ref:`format specifier mini-language <formatspec>` is the same as that used by
the :meth:`str.format` method.

Formatted string literals may be concatenated, but replacement fields
cannot be split across literals.
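
The following sketch illustrates both halves of that rule.  The use of
:func:`compile` to probe for a :exc:`SyntaxError` is just one convenient way
to show the rejection, and the file name ``"<example>"`` is arbitrary:

.. code-block:: python

   x = 5

   # Adjacent literals are joined; only the f-prefixed pieces have fields.
   joined = f"x is {x}, " "{x} stays literal, " f"and twice x is {x * 2}"
   print(joined)  # x is 5, {x} stays literal, and twice x is 10

   # A field opened in one literal cannot be closed in the next one.
   try:
       compile('f"{x" "}"', "<example>", "eval")
   except SyntaxError:
       split_field_rejected = True
   else:
       split_field_rejected = False

   print(split_field_rejected)  # True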

Some examples of formatted string literals::

   >>> import decimal
   >>> from datetime import datetime
   >>> name = "Fred"
   >>> f"He said his name is {name!r}."
   "He said his name is 'Fred'."
   >>> f"He said his name is {repr(name)}."  # repr() is equivalent to !r
   "He said his name is 'Fred'."
   >>> width = 10
   >>> precision = 4
   >>> value = decimal.Decimal("12.34567")
   >>> f"result: {value:{width}.{precision}}"  # nested fields
   'result:      12.35'
   >>> today = datetime(year=2017, month=1, day=27)
   >>> f"{today:%B %d, %Y}"  # using date format specifier
   'January 27, 2017'
   >>> f"{today=:%B %d, %Y}"  # using date format specifier and debugging
   'today=January 27, 2017'
   >>> number = 1024
   >>> f"{number:#0x}"  # using integer format specifier
   '0x400'
   >>> foo = "bar"
   >>> f"{ foo = }"  # preserves whitespace
   " foo = 'bar'"
   >>> line = "The mill's closed"
   >>> f"{line = }"
   'line = "The mill\'s closed"'
   >>> f"{line = :20}"
   "line = The mill's closed   "
   >>> f"{line = !r:20}"
   'line = "The mill\'s closed" '


Reusing the outer f-string quoting type inside a replacement field is
permitted::

   >>> a = dict(x=2)
   >>> f"abc {a["x"]} def"
   'abc 2 def'

.. versionchanged:: 3.12
   Prior to Python 3.12, reuse of the same quoting type of the outer f-string
   inside a replacement field was not possible.

Backslashes are also allowed in replacement fields and are evaluated the same
way as in any other context::

   >>> a = ["a", "b", "c"]
   >>> print(f"List a contains:\n{"\n".join(a)}")
   List a contains:
   a
   b
   c

.. versionchanged:: 3.12
   Prior to Python 3.12, backslashes were not permitted inside an f-string
   replacement field.

Formatted string literals cannot be used as docstrings, even if they do not
include expressions.

::

   >>> def foo():
   ...     f"Not a docstring"
   ...
   >>> foo.__doc__ is None
   True

See also :pep:`498` for the proposal that added formatted string literals,
and :meth:`str.format`, which uses a related format string mechanism.


.. _numbers:

Numeric literals
----------------

.. index:: number, numeric literal, integer literal
   floating-point literal, hexadecimal literal
   octal literal, binary literal, decimal literal, imaginary literal, complex literal

There are three types of numeric literals: integers, floating-point numbers, and
imaginary numbers.  There are no complex literals (complex numbers can be formed
by adding a real number and an imaginary number).

Note that numeric literals do not include a sign; a phrase like ``-1`` is
actually an expression composed of the unary operator '``-``' and the literal
``1``.
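
One way to observe this is to inspect the syntax tree with the :mod:`ast`
module.  This is only a sketch; it assumes Python 3.8 or later, where
constants appear as ``ast.Constant`` nodes with a ``value`` attribute:

.. code-block:: python

   import ast

   node = ast.parse("-1", mode="eval").body

   # The parser sees a unary minus applied to the literal 1.
   is_unary_minus = isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)
   operand = node.operand.value

   # A visible consequence: ** binds more tightly than the unary minus.
   power = -2 ** 2  # evaluated as -(2 ** 2)

   print(is_unary_minus, operand, power)  # True 1 -4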


.. index::
   single: 0b; integer literal
   single: 0o; integer literal
   single: 0x; integer literal
   single: _ (underscore); in numeric literal

.. _integers:

Integer literals
----------------

Integer literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   integer: `decinteger` | `bininteger` | `octinteger` | `hexinteger`
   decinteger: `nonzerodigit` (["_"] `digit`)* | "0"+ (["_"] "0")*
   bininteger: "0" ("b" | "B") (["_"] `bindigit`)+
   octinteger: "0" ("o" | "O") (["_"] `octdigit`)+
   hexinteger: "0" ("x" | "X") (["_"] `hexdigit`)+
   nonzerodigit: "1"..."9"
   digit: "0"..."9"
   bindigit: "0" | "1"
   octdigit: "0"..."7"
   hexdigit: `digit` | "a"..."f" | "A"..."F"

There is no limit on the length of integer literals apart from what can be
stored in available memory.

Underscores are ignored for determining the numeric value of the literal.  They
can be used to group digits for enhanced readability.  One underscore can occur
between digits, and after base specifiers like ``0x``.
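
The placement rule can be checked with a short sketch.  The helper
``is_valid_expression`` is invented for this example and simply reports
whether :func:`compile` accepts the text:

.. code-block:: python

   def is_valid_expression(text):
       """Helper for this sketch: does *text* compile as an expression?"""
       try:
           compile(text, "<literal>", "eval")
       except SyntaxError:
           return False
       return True


   million = 1_000_000  # same value as 1000000
   mask = 0x_FF_FF      # underscore directly after the base specifier

   accepted = is_valid_expression("1_000")
   doubled = is_valid_expression("1__000")   # two underscores in a row
   trailing = is_valid_expression("1_000_")  # underscore not between digits

   print(million, mask)                # 1000000 65535
   print(accepted, doubled, trailing)  # True False False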

Note that leading zeros in a non-zero decimal number are not allowed. This is
for disambiguation with C-style octal literals, which Python used before version
3.0.
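
A brief sketch of the difference, again using a made-up helper (``compiles``)
around :func:`compile`:

.. code-block:: python

   def compiles(text):
       """Helper for this sketch: does *text* compile as an expression?"""
       try:
           compile(text, "<literal>", "eval")
       except SyntaxError:
           return False
       return True


   legacy_octal = compiles("0123")  # C-style octal spelling is rejected
   all_zeros = compiles("000")      # zero itself may be written with extra zeros
   modern_octal = 0o123             # the explicit octal prefix is required

   print(legacy_octal, all_zeros, modern_octal)  # False True 83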

Some examples of integer literals::

   7     2147483647                        0o177    0b100110111
   3     79228162514264337593543950336     0o377    0xdeadbeef
         100_000_000_000                   0b_1110_0101

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: . (dot); in numeric literal
   single: e; in numeric literal
   single: _ (underscore); in numeric literal
.. _floating:

Floating-point literals
-----------------------

Floating-point literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   floatnumber: `pointfloat` | `exponentfloat`
   pointfloat: [`digitpart`] `fraction` | `digitpart` "."
   exponentfloat: (`digitpart` | `pointfloat`) `exponent`
   digitpart: `digit` (["_"] `digit`)*
   fraction: "." `digitpart`
   exponent: ("e" | "E") ["+" | "-"] `digitpart`

Note that the integer and exponent parts are always interpreted using radix 10.
For example, ``077e010`` is legal, and denotes the same number as ``77e10``. The
allowed range of floating-point literals is implementation-dependent.  As in
integer literals, underscores are supported for digit grouping.
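
These points can be checked directly.  In the sketch below,
``sys.float_info.max`` is used only as one way to look at the range offered
by the running implementation:

.. code-block:: python

   import sys

   same_value = 077e010 == 77e10     # leading zeros do not switch to octal
   grouped = 3.14_15_93 == 3.141593  # underscores only group digits

   # The representable range depends on the implementation's float type.
   largest = sys.float_info.max

   print(same_value, grouped)  # True True
   print(largest)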

Some examples of floating-point literals::

   3.14    10.    .001    1e100    3.14e-10    0e0    3.14_15_93

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: j; in numeric literal
.. _imaginary:

Imaginary literals
------------------

Imaginary literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   imagnumber: (`floatnumber` | `digitpart`) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0.  Complex
numbers are represented as a pair of floating-point numbers and have the same
restrictions on their range.  To create a complex number with a nonzero real
part, add a floating-point number to it, e.g., ``(3+4j)``.  Some examples of
imaginary literals::

   3.14j   10.j    10j     .001j   1e100j   3.14e-10j   3.14_15_93j
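
A short sketch of how such literals behave at run time (the names ``z`` and
``w`` are arbitrary):

.. code-block:: python

   z = 10j     # imaginary literal
   w = 3 + 4j  # an expression: integer literal plus imaginary literal

   print(type(z).__name__)  # complex
   print(z.real, z.imag)    # 0.0 10.0
   print(w.real, w.imag)    # 3.0 4.0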


.. _operators:

Operators
=========

.. index:: single: operators

The following tokens are operators:

.. code-block:: none

   +       -       *       **      /       //      %      @
   <<      >>      &       |       ^       ~       :=
   <       >       <=      >=      ==      !=
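
The standard :mod:`tokenize` module can be used to look at these tokens.
The sketch below is illustrative only; note that :mod:`tokenize` reports
operators and delimiters alike with the generic ``OP`` type, and that the
sample source line is made up:

.. code-block:: python

   import io
   import tokenize

   source = "a ** 2 // b @ c\n"

   tokens = tokenize.generate_tokens(io.StringIO(source).readline)
   operators = [
       (tok.string, tokenize.tok_name[tok.exact_type])
       for tok in tokens
       if tok.type == tokenize.OP
   ]

   print(operators)
   # [('**', 'DOUBLESTAR'), ('//', 'DOUBLESLASH'), ('@', 'AT')]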


.. _delimiters:

Delimiters
==========

.. index:: single: delimiters

The following tokens serve as delimiters in the grammar:

.. code-block:: none

   (       )       [       ]       {       }
   ,       :       !       .       ;       @       =
   ->      +=      -=      *=      /=      //=     %=
   @=      &=      |=      ^=      >>=     <<=     **=

The period can also occur in floating-point and imaginary literals.  A sequence
of three periods has a special meaning as an ellipsis literal. The second half
of the list, the augmented assignment operators, serve lexically as delimiters,
but also perform an operation.
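
A few lines are enough to show these three roles (variable names are
arbitrary):

.. code-block:: python

   x = 5
   x += 2        # "+=" is a delimiter lexically, but it also adds and rebinds

   marker = ...  # three periods form the ellipsis literal
   half = .5     # a period that is part of a floating-point literal

   print(x)                   # 7
   print(marker is Ellipsis)  # True
   print(half)                # 0.5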

The following printing ASCII characters have special meaning as part of other
tokens or are otherwise significant to the lexical analyzer:

.. code-block:: none

   '       "       #       \

The following printing ASCII characters are not used in Python.  Their
occurrence outside string literals and comments is an unconditional error:

.. code-block:: none

   $       ?       `
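
The effect can be observed with :func:`compile`.  The helper ``compiles``
and the sample source strings below are invented for this sketch:

.. code-block:: python

   def compiles(source):
       """Helper for this sketch: does *source* compile as a module?"""
       try:
           compile(source, "<example>", "exec")
       except SyntaxError:
           return False
       return True


   # Each unused character is rejected when it appears in ordinary code.
   rejected = {char: not compiles(f"a {char} b") for char in "$?`"}

   # Inside a string literal or a comment the same characters are accepted.
   in_string = compiles('text = "$ ? `"')
   in_comment = compiles("x = 1  # $ ? `")

   print(rejected)               # {'$': True, '?': True, '`': True}
   print(in_string, in_comment)  # True True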


.. rubric:: Footnotes

.. [#] https://www.unicode.org/Public/15.1.0/ucd/NameAliases.txt