
.. _lexical:

****************
Lexical analysis
****************

.. index:: lexical analysis, parser, token

A Python program is read by a *parser*. Input to the parser is a stream of
*tokens*, generated by the *lexical analyzer*. This chapter describes how the
lexical analyzer breaks a file into tokens.

Python reads program text as Unicode code points; the encoding of a source file
can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
for details. If the source file cannot be decoded, a :exc:`SyntaxError` is
raised.


.. _line-structure:

Line structure
==============

.. index:: line structure

A Python program is divided into a number of *logical lines*.


.. _logical-lines:

Logical lines
-------------

.. index:: logical line, physical line, line joining, NEWLINE token

The end of a logical line is represented by the token NEWLINE. Statements
cannot cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements). A logical line is
constructed from one or more *physical lines* by following the explicit or
implicit *line joining* rules.


.. _physical-lines:

Physical lines
--------------

A physical line is a sequence of characters terminated by an end-of-line
sequence. In source files and strings, any of the standard platform line
termination sequences can be used - the Unix form using ASCII LF (linefeed),
the Windows form using the ASCII sequence CR LF (return followed by linefeed),
or the old Macintosh form using the ASCII CR (return) character. All of these
forms can be used equally, regardless of platform. The end of input also serves
as an implicit terminator for the final physical line.

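As an informal illustration (a sketch, not part of the specification), the
three line-termination forms can be mixed in one source string handed to the
built-in :func:`compile`. The variable names are arbitrary::

   # One source string that mixes CR LF, CR and LF line endings.
   source = "a = 1\r\nb = 2\rc = 3\n"

   namespace = {}
   exec(compile(source, "<example>", "exec"), namespace)

   # Each physical line was recognized as a separate statement.
   print(namespace["a"], namespace["b"], namespace["c"])
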

When embedding Python, source code strings should be passed to Python APIs using
the standard C conventions for newline characters (the ``\n`` character,
representing ASCII LF, is the line terminator).


.. _comments:

Comments
--------

.. index:: comment, hash character
   single: # (hash); comment

A comment starts with a hash character (``#``) that is not part of a string
literal, and ends at the end of the physical line. A comment signifies the end
of the logical line unless the implicit line joining rules are invoked. Comments
are ignored by the syntax.


.. _encodings:

Encoding declarations
---------------------

.. index:: source character set, encoding declarations (source file)
   single: # (hash); source encoding declaration

If a comment in the first or second line of the Python script matches the
regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
encoding declaration; the first group of this expression names the encoding of
the source code file. The encoding declaration must appear on a line of its
own. If it is the second line, the first line must also be a comment-only line.
The recommended forms of an encoding expression are ::

   # -*- coding: <encoding-name> -*-

which is recognized also by GNU Emacs, and ::

   # vim:fileencoding=<encoding-name>

which is recognized by Bram Moolenaar's VIM.

If no encoding declaration is found, the default encoding is UTF-8. In
addition, if the first bytes of the file are the UTF-8 byte-order mark
(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
among others, by Microsoft's :program:`notepad`).

If an encoding is declared, the encoding name must be recognized by Python. The
encoding is used for all lexical analysis, including string literals, comments
and identifiers.

.. XXX there should be a list of supported encodings.


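The regular expression quoted above can be tried out with the :mod:`re` module.
The following is a simplified sketch only: ``declared_encoding`` is a
hypothetical helper, and it does not reproduce every check the interpreter
performs (for example, the restriction to the first two lines)::

   import re

   # The pattern given in the text above.
   CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

   def declared_encoding(line):
       """Return the encoding named in a comment-only line, or None."""
       if not line.lstrip().startswith("#"):
           return None          # not a comment-only line
       match = CODING_RE.search(line)
       return match.group(1) if match else None

   print(declared_encoding("# -*- coding: latin-1 -*-"))
   print(declared_encoding("# vim:fileencoding=utf-8"))
   print(declared_encoding("import os"))
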
.. _explicit-joining:

Explicit line joining
---------------------

.. index:: physical line, line joining, line continuation, backslash character

Two or more physical lines may be joined into logical lines using backslash
characters (``\``), as follows: when a physical line ends in a backslash that is
not part of a string literal or comment, it is joined with the following forming
a single logical line, deleting the backslash and the following end-of-line
character. For example::

   if 1900 < year < 2100 and 1 <= month <= 12 \
      and 1 <= day <= 31 and 0 <= hour < 24 \
      and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
           return 1

A line ending in a backslash cannot carry a comment. A backslash does not
continue a comment. A backslash does not continue a token except for string
literals (i.e., tokens other than string literals cannot be split across
physical lines using a backslash). A backslash is illegal elsewhere on a line
outside a string literal.


.. _implicit-joining:

Implicit line joining
---------------------

Expressions in parentheses, square brackets or curly braces can be split over
more than one physical line without using backslashes. For example::

   month_names = ['Januari', 'Februari', 'Maart',      # These are the
                  'April',   'Mei',      'Juni',       # Dutch names
                  'Juli',    'Augustus', 'September',  # for the months
                  'Oktober', 'November', 'December']   # of the year

Implicitly continued lines can carry comments. The indentation of the
continuation lines is not important. Blank continuation lines are allowed.
There is no NEWLINE token between implicit continuation lines. Implicitly
continued lines can also occur within triple-quoted strings (see below); in that
case they cannot carry comments.


.. _blank-lines:

Blank lines
-----------

.. index:: single: blank line

A logical line that contains only spaces, tabs, formfeeds and possibly a
comment, is ignored (i.e., no NEWLINE token is generated). During interactive
input of statements, handling of a blank line may differ depending on the
implementation of the read-eval-print loop. In the standard interactive
interpreter, an entirely blank logical line (i.e. one containing not even
whitespace or a comment) terminates a multi-line statement.


.. _indentation:

Indentation
-----------

.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping

Leading whitespace (spaces and tabs) at the beginning of a logical line is used
to compute the indentation level of the line, which in turn is used to determine
the grouping of statements.

Tabs are replaced (from left to right) by one to eight spaces such that the
total number of characters up to and including the replacement is a multiple of
eight (this is intended to be the same rule as used by Unix). The total number
of spaces preceding the first non-blank character then determines the line's
indentation. Indentation cannot be split over multiple physical lines using
backslashes; the whitespace up to the first backslash determines the
indentation.

Indentation is rejected as inconsistent if a source file mixes tabs and spaces
in a way that makes the meaning dependent on the worth of a tab in spaces; a
:exc:`TabError` is raised in that case.

**Cross-platform compatibility note:** because of the nature of text editors on
non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the
indentation in a single source file. It should also be noted that different
platforms may explicitly limit the maximum indentation level.

A formfeed character may be present at the start of the line; it will be ignored
for the indentation calculations above.
Formfeed characters occurring elsewhere
in the leading whitespace have an undefined effect (for instance, they may reset
the space count to zero).

.. index:: INDENT token, DEDENT token

The indentation levels of consecutive lines are used to generate INDENT and
DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack;
this will never be popped off again. The numbers pushed on the stack will
always be strictly increasing from bottom to top. At the beginning of each
logical line, the line's indentation level is compared to the top of the stack.
If it is equal, nothing happens. If it is larger, it is pushed on the stack, and
one INDENT token is generated. If it is smaller, it *must* be one of the
numbers occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is generated. At the
end of the file, a DEDENT token is generated for each number remaining on the
stack that is larger than zero.

Here is an example of a correctly (though confusingly) indented piece of Python
code::

   def perm(l):
           # Compute the list of all permutations of l
       if len(l) <= 1:
                     return [l]
       r = []
       for i in range(len(l)):
                s = l[:i] + l[i+1:]
                p = perm(s)
                for x in p:
                 r.append(l[i:i+1] + x)
       return r

The following example shows various indentation errors::

    def perm(l):                       # error: first line indented
   for i in range(len(l)):             # error: not indented
       s = l[:i] + l[i+1:]
           p = perm(l[:i] + l[i+1:])   # error: unexpected indent
           for x in p:
                   r.append(l[i:i+1] + x)
               return r                # error: inconsistent dedent

(Actually, the first three errors are detected by the parser; only the last
error is found by the lexical analyzer --- the indentation of ``return r`` does
not match a level popped off the stack.)


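The stack procedure described in this section can be sketched in a few lines of
Python. This is an illustration under simplifying assumptions, not the
tokenizer's actual code: ``indent_tokens`` is a hypothetical helper that
receives the already-computed indentation level of each logical line and
yields only the names of the INDENT and DEDENT tokens::

   def indent_tokens(levels):
       """Yield "INDENT"/"DEDENT" for a sequence of indentation levels."""
       stack = [0]                      # the initial zero is never popped
       for level in levels:
           if level > stack[-1]:
               stack.append(level)
               yield "INDENT"
           else:
               while level < stack[-1]:
                   stack.pop()
                   yield "DEDENT"
               if level != stack[-1]:
                   # The level is not one of the numbers on the stack.
                   raise IndentationError("inconsistent dedent")
       while stack[-1] > 0:             # end of file
           stack.pop()
           yield "DEDENT"

   print(list(indent_tokens([0, 4, 8, 4, 0])))

A sequence such as ``[0, 4, 2]`` raises :exc:`IndentationError` in this sketch,
because ``2`` does not match any level remaining on the stack.
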
.. _whitespace:

Whitespace between tokens
-------------------------

Except at the beginning of a logical line or in string literals, the whitespace
characters space, tab and formfeed can be used interchangeably to separate
tokens. Whitespace is needed between two tokens only if their concatenation
could otherwise be interpreted as a different token (e.g., ab is one token, but
a b is two tokens).


.. _other-tokens:

Other tokens
============

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist:
*identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace
characters (other than line terminators, discussed earlier) are not tokens, but
serve to delimit tokens. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.


.. _identifiers:

Identifiers and keywords
========================

.. index:: identifier, name

Identifiers (also referred to as *names*) are described by the following lexical
definitions.

The syntax of identifiers in Python is based on the Unicode standard annex
UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
further details.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
are the same as in Python 2.x: the uppercase and lowercase letters ``A`` through
``Z``, the underscore ``_`` and, except for the first character, the digits
``0`` through ``9``.

Python 3.0 introduces additional characters from outside the ASCII range (see
:pep:`3131`). For these characters, the classification uses the version of the
Unicode Character Database as included in the :mod:`unicodedata` module.

Identifiers are unlimited in length. Case is significant.

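As a quick, informal check of these rules, :meth:`str.isidentifier` reports
whether a string is a valid identifier. The candidate names below are arbitrary
examples::

   candidates = ["spam", "_private", "x1", "1x", "naïve", "a-b"]

   # Keep only the strings that are valid identifiers.
   valid = [name for name in candidates if name.isidentifier()]
   print(valid)

   # Case is significant: these are two distinct names.
   Spam, spam = 1, 2
   print(Spam != spam)
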
.. productionlist:: python-grammar
   identifier: `xid_start` `xid_continue`*
   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
   id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
   xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
   xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">

The Unicode category codes mentioned above stand for:

* *Lu* - uppercase letters
* *Ll* - lowercase letters
* *Lt* - titlecase letters
* *Lm* - modifier letters
* *Lo* - other letters
* *Nl* - letter numbers
* *Mn* - nonspacing marks
* *Mc* - spacing combining marks
* *Nd* - decimal numbers
* *Pc* - connector punctuations
* *Other_ID_Start* - explicit list of characters in `PropList.txt
  <https://www.unicode.org/Public/13.0.0/ucd/PropList.txt>`_ to support backwards
  compatibility
* *Other_ID_Continue* - likewise

All identifiers are converted into the normal form NFKC while parsing; comparison
of identifiers is based on NFKC.

A non-normative file listing all valid identifier characters for Unicode
13.0.0 can be found at
https://www.unicode.org/Public/13.0.0/ucd/DerivedCoreProperties.txt


.. _keywords:

Keywords
--------

.. index::
   single: keyword
   single: reserved word

The following identifiers are used as reserved words, or *keywords* of the
language, and cannot be used as ordinary identifiers. They must be spelled
exactly as written here:

.. sourcecode:: text

   False      await      else       import     pass
   None       break      except     in         raise
   True       class      finally    is         return
   and        continue   for        lambda     try
   as         def        from       nonlocal   while
   assert     del        global     not        with
   async      elif       if         or         yield


.. _soft-keywords:

Soft Keywords
-------------

.. index:: soft keyword, keyword

.. versionadded:: 3.10

Some identifiers are only reserved under specific contexts. These are known as
*soft keywords*. The identifiers ``match``, ``case`` and ``_`` can
syntactically act as keywords in contexts related to the pattern matching
statement, but this distinction is done at the parser level, not when
tokenizing.

As soft keywords, their use with pattern matching is possible while still
preserving compatibility with existing code that uses ``match``, ``case`` and ``_`` as
identifier names.


.. index::
   single: _, identifiers
   single: __, identifiers
.. _id-classes:

Reserved classes of identifiers
-------------------------------

Certain classes of identifiers (besides keywords) have special meanings. These
classes are identified by the patterns of leading and trailing underscore
characters:

``_*``
   Not imported by ``from module import *``.

``_``
   In a ``case`` pattern within a :keyword:`match` statement, ``_`` is a
   :ref:`soft keyword <soft-keywords>` that denotes a
   :ref:`wildcard <wildcard-patterns>`.

   Separately, the interactive interpreter makes the result of the last evaluation
   available in the variable ``_``.
   (It is stored in the :mod:`builtins` module, alongside built-in
   functions like ``print``.)

   Elsewhere, ``_`` is a regular identifier. It is often used to name
   "special" items, but it is not special to Python itself.

   .. note::

      The name ``_`` is often used in conjunction with internationalization;
      refer to the documentation for the :mod:`gettext` module for more
      information on this convention.

      It is also commonly used for unused variables.

``__*__``
   System-defined names, informally known as "dunder" names. These names are
   defined by the interpreter and its implementation (including the standard library).
   Current system names are discussed in the :ref:`specialnames` section and elsewhere.
   More will likely be defined in future versions of Python. *Any* use of ``__*__`` names,
   in any context, that does not follow explicitly documented use, is subject to
   breakage without warning.

``__*``
   Class-private names. Names in this category, when used within the context of a
   class definition, are re-written to use a mangled form to help avoid name
   clashes between "private" attributes of base and derived classes. See section
   :ref:`atom-identifiers`.


.. _literals:

Literals
========

.. index:: literal, constant

Literals are notations for constant values of some built-in types.


.. index:: string literal, bytes literal, ASCII
   single: ' (single quote); string literal
   single: " (double quote); string literal
   single: u'; string literal
   single: u"; string literal
.. _strings:

String and Bytes literals
-------------------------

String literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   stringliteral: [`stringprefix`](`shortstring` | `longstring`)
   stringprefix: "r" | "u" | "R" | "U" | "f" | "F"
               : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
   shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
   shortstringitem: `shortstringchar` | `stringescapeseq`
   longstringitem: `longstringchar` | `stringescapeseq`
   shortstringchar: <any source character except "\" or newline or the quote>
   longstringchar: <any source character except "\">
   stringescapeseq: "\" <any source character>

.. productionlist:: python-grammar
   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
   bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
   longbytesitem: `longbyteschar` | `bytesescapeseq`
   shortbyteschar: <any ASCII character except "\" or newline or the quote>
   longbyteschar: <any ASCII character except "\">
   bytesescapeseq: "\" <any ASCII character>

One syntactic restriction not indicated by these productions is that whitespace
is not allowed between the :token:`~python-grammar:stringprefix` or
:token:`~python-grammar:bytesprefix` and the rest of the literal. The source
character set is defined by the encoding declaration; it is UTF-8 if no encoding
declaration is given in the source file; see section :ref:`encodings`.

.. index:: triple-quoted string, Unicode Consortium, raw string
   single: """; string literal
   single: '''; string literal

In plain English: Both types of literals can be enclosed in matching single quotes
(``'``) or double quotes (``"``).
They can also be enclosed in matching groups
of three single or double quotes (these are generally referred to as
*triple-quoted strings*). The backslash (``\``) character is used to escape
characters that otherwise have a special meaning, such as newline, backslash
itself, or the quote character.

.. index::
   single: b'; bytes literal
   single: b"; bytes literal

Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
instance of the :class:`bytes` type instead of the :class:`str` type. They
may only contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.

.. index::
   single: r'; raw string literal
   single: r"; raw string literal

Both string and bytes literals may optionally be prefixed with a letter ``'r'``
or ``'R'``; such strings are called :dfn:`raw strings` and treat backslashes as
literal characters. As a result, in string literals, ``'\U'`` and ``'\u'``
escapes in raw strings are not treated specially. Given that Python 2.x's raw
unicode literals behave differently than Python 3.x's, the ``'ur'`` syntax
is not supported.

.. versionadded:: 3.3
   The ``'rb'`` prefix of raw bytes literals has been added as a synonym
   of ``'br'``.

.. versionadded:: 3.3
   Support for the unicode legacy literal (``u'value'``) was reintroduced
   to simplify the maintenance of dual Python 2.x and 3.x codebases.
   See :pep:`414` for more information.

.. index::
   single: f'; formatted string literal
   single: f"; formatted string literal

A string literal with ``'f'`` or ``'F'`` in its prefix is a
:dfn:`formatted string literal`; see :ref:`f-strings`. The ``'f'`` may be
combined with ``'r'``, but not with ``'b'`` or ``'u'``, therefore raw
formatted strings are possible, but formatted bytes literals are not.


In triple-quoted literals, unescaped newlines and quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the literal. (A
"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.)

.. index:: physical line, escape sequence, Standard C, C
   single: \ (backslash); escape sequence
   single: \\; escape sequence
   single: \a; escape sequence
   single: \b; escape sequence
   single: \f; escape sequence
   single: \n; escape sequence
   single: \r; escape sequence
   single: \t; escape sequence
   single: \v; escape sequence
   single: \x; escape sequence
   single: \N; escape sequence
   single: \u; escape sequence
   single: \U; escape sequence

Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and
bytes literals are interpreted according to rules similar to those used by
Standard C. The recognized escape sequences are:

+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
| ``\newline``    | Backslash and newline ignored   |       |
+-----------------+---------------------------------+-------+
| ``\\``          | Backslash (``\``)               |       |
+-----------------+---------------------------------+-------+
| ``\'``          | Single quote (``'``)            |       |
+-----------------+---------------------------------+-------+
| ``\"``          | Double quote (``"``)            |       |
+-----------------+---------------------------------+-------+
| ``\a``          | ASCII Bell (BEL)                |       |
+-----------------+---------------------------------+-------+
| ``\b``          | ASCII Backspace (BS)            |       |
+-----------------+---------------------------------+-------+
| ``\f``          | ASCII Formfeed (FF)             |       |
+-----------------+---------------------------------+-------+
| ``\n``          | ASCII Linefeed (LF)             |       |
+-----------------+---------------------------------+-------+
| ``\r``          | ASCII Carriage Return (CR)      |       |
+-----------------+---------------------------------+-------+
| ``\t``          | ASCII Horizontal Tab (TAB)      |       |
+-----------------+---------------------------------+-------+
| ``\v``          | ASCII Vertical Tab (VT)         |       |
+-----------------+---------------------------------+-------+
| ``\ooo``        | Character with octal value      | (1,3) |
|                 | *ooo*                           |       |
+-----------------+---------------------------------+-------+
| ``\xhh``        | Character with hex value *hh*   | (2,3) |
+-----------------+---------------------------------+-------+

Escape sequences only recognized in string literals are:

+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
| ``\N{name}``    | Character named *name* in the   | \(4)  |
|                 | Unicode database                |       |
+-----------------+---------------------------------+-------+
| ``\uxxxx``      | Character with 16-bit hex value | \(5)  |
|                 | *xxxx*                          |       |
+-----------------+---------------------------------+-------+
| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(6)  |
|                 | *xxxxxxxx*                      |       |
+-----------------+---------------------------------+-------+

Notes:

(1)
   As in Standard C, up to three octal digits are accepted.

(2)
   Unlike in Standard C, exactly two hex digits are required.

(3)
   In a bytes literal, hexadecimal and octal escapes denote the byte with the
   given value. In a string literal, these escapes denote a Unicode character
   with the given value.

(4)
   .. versionchanged:: 3.3
      Support for name aliases [#]_ has been added.

(5)
   Exactly four hex digits are required.

(6)
   Any Unicode character can be encoded this way. Exactly eight hex digits
   are required.


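The escape sequences in the tables above can be compared directly. The
following short sketch uses arbitrary variable names and only illustrates the
rules already stated, in particular note (3)::

   # Octal, hex, 16-bit, 32-bit and named escapes for the same character.
   same_char = ["\101", "\x41", "\u0041", "\U00000041",
                "\N{LATIN CAPITAL LETTER A}"]
   print(all(ch == "A" for ch in same_char))

   # In a bytes literal, \xhh denotes the single byte with that value ...
   high_byte = b"\xff"
   print(len(high_byte), high_byte[0])

   # ... while in a string literal it denotes the character with that value.
   high_char = "\xff"
   print(len(high_char), ord(high_char))
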
.. index:: unrecognized escape sequence

Unlike Standard C, all unrecognized escape sequences are left in the string
unchanged, i.e., *the backslash is left in the result*. (This behavior is
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.) It is also important to note that the
escape sequences only recognized in string literals fall into the category of
unrecognized escapes for bytes literals.

   .. versionchanged:: 3.6
      Unrecognized escape sequences produce a :exc:`DeprecationWarning`. In
      a future Python version they will be a :exc:`SyntaxWarning` and
      eventually a :exc:`SyntaxError`.

Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, ``r"\""`` is a valid string
literal consisting of two characters: a backslash and a double quote; ``r"\"``
is not a valid string literal (even a raw string cannot end in an odd number of
backslashes). Specifically, *a raw literal cannot end in a single backslash*
(since the backslash would escape the following quote character). Note also
that a single backslash followed by a newline is interpreted as those two
characters as part of the literal, *not* as a line continuation.


.. _string-concatenation:

String literal concatenation
----------------------------

Multiple adjacent string or bytes literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is the same
as their concatenation. Thus, ``"hello" 'world'`` is equivalent to
``"helloworld"``.
This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings, for example::

   re.compile("[A-Za-z_]"       # letter or underscore
              "[A-Za-z0-9_]*"   # letter, digit or underscore
             )

Note that this feature is defined at the syntactical level, but implemented at
compile time. The '+' operator must be used to concatenate string expressions
at run time. Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple quoted strings),
and formatted string literals may be concatenated with plain string literals.


.. index::
   single: formatted string literal
   single: interpolated string literal
   single: string; formatted literal
   single: string; interpolated literal
   single: f-string
   single: fstring
   single: {} (curly brackets); in formatted string literal
   single: ! (exclamation); in formatted string literal
   single: : (colon); in formatted string literal
   single: = (equals); for help in debugging using string literals
.. _f-strings:

Formatted string literals
-------------------------

.. versionadded:: 3.6

A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal
that is prefixed with ``'f'`` or ``'F'``. These strings may contain
replacement fields, which are expressions delimited by curly braces ``{}``.
While other string literals always have a constant value, formatted strings
are really expressions evaluated at run time.

Escape sequences are decoded like in ordinary string literals (except when
a literal is also marked as a raw string). After decoding, the grammar
for the contents of the string is:

.. productionlist:: python-grammar
   f_string: (`literal_char` | "{{" | "}}" | `replacement_field`)*
   replacement_field: "{" `f_expression` ["="] ["!" `conversion`] [":" `format_spec`] "}"
   f_expression: (`conditional_expression` | "*" `or_expr`)
               :   ("," `conditional_expression` | "," "*" `or_expr`)* [","]
               : | `yield_expression`
   conversion: "s" | "r" | "a"
   format_spec: (`literal_char` | NULL | `replacement_field`)*
   literal_char: <any code point except "{", "}" or NULL>

The parts of the string outside curly braces are treated literally,
except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced
with the corresponding single curly brace. A single opening curly
bracket ``'{'`` marks a replacement field, which starts with a
Python expression. To display both the expression text and its value after
evaluation (useful in debugging), an equal sign ``'='`` may be added after the
expression. A conversion field, introduced by an exclamation point ``'!'``, may
follow. A format specifier may also be appended, introduced by a colon ``':'``.
A replacement field ends with a closing curly bracket ``'}'``.

Expressions in formatted string literals are treated like regular
Python expressions surrounded by parentheses, with a few exceptions.
An empty expression is not allowed, and both :keyword:`lambda` and
assignment expressions ``:=`` must be surrounded by explicit parentheses.
Replacement expressions can contain line breaks (e.g. in triple-quoted
strings), but they cannot contain comments. Each expression is evaluated
in the context where the formatted string literal appears, in order from
left to right.

.. versionchanged:: 3.7
   Prior to Python 3.7, an :keyword:`await` expression and comprehensions
   containing an :keyword:`async for` clause were illegal in the expressions
   in formatted string literals due to a problem with the implementation.

When the equal sign ``'='`` is provided, the output will have the expression
text, the ``'='`` and the evaluated value.
Spaces after the opening brace
``'{'``, within the expression and after the ``'='`` are all retained in the
output. By default, the ``'='`` causes the :func:`repr` of the expression to be
provided, unless there is a format specified. When a format is specified it
defaults to the :func:`str` of the expression unless a conversion ``'!r'`` is
declared.

.. versionadded:: 3.8
   The equal sign ``'='``.

If a conversion is specified, the result of evaluating the expression
is converted before formatting. Conversion ``'!s'`` calls :func:`str` on
the result, ``'!r'`` calls :func:`repr`, and ``'!a'`` calls :func:`ascii`.

The result is then formatted using the :func:`format` protocol. The
format specifier is passed to the :meth:`__format__` method of the
expression or conversion result. An empty string is passed when the
format specifier is omitted. The formatted result is then included in
the final value of the whole string.

Top-level format specifiers may include nested replacement fields. These nested
fields may include their own conversion fields and :ref:`format specifiers
<formatspec>`, but may not include more deeply-nested replacement fields. The
:ref:`format specifier mini-language <formatspec>` is the same as that used by
the :meth:`str.format` method.

Formatted string literals may be concatenated, but replacement fields
cannot be split across literals.

Some examples of formatted string literals::

   >>> name = "Fred"
   >>> f"He said his name is {name!r}."
   "He said his name is 'Fred'."
   >>> f"He said his name is {repr(name)}."  # repr() is equivalent to !r
   "He said his name is 'Fred'."
   >>> width = 10
   >>> precision = 4
   >>> value = decimal.Decimal("12.34567")
   >>> f"result: {value:{width}.{precision}}"  # nested fields
   'result:      12.35'
   >>> today = datetime(year=2017, month=1, day=27)
   >>> f"{today:%B %d, %Y}"  # using date format specifier
   'January 27, 2017'
   >>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
   'today=January 27, 2017'
   >>> number = 1024
   >>> f"{number:#0x}"  # using integer format specifier
   '0x400'
   >>> foo = "bar"
   >>> f"{ foo = }" # preserves whitespace
   " foo = 'bar'"
   >>> line = "The mill's closed"
   >>> f"{line = }"
   'line = "The mill\'s closed"'
   >>> f"{line = :20}"
   "line = The mill's closed   "
   >>> f"{line = !r:20}"
   'line = "The mill\'s closed" '


A consequence of sharing the same syntax as regular string literals is
that characters in the replacement fields must not conflict with the
quoting used in the outer formatted string literal::

   f"abc {a["x"]} def"    # error: outer string literal ended prematurely
   f"abc {a['x']} def"    # workaround: use different quoting

Backslashes are not allowed in format expressions and will raise
an error::

   f"newline: {ord('\n')}"  # raises SyntaxError

To include a value in which a backslash escape is required, create
a temporary variable.

   >>> newline = ord('\n')
   >>> f"newline: {newline}"
   'newline: 10'

Formatted string literals cannot be used as docstrings, even if they do not
include expressions.

::

   >>> def foo():
   ...     f"Not a docstring"
   ...
   >>> foo.__doc__ is None
   True

See also :pep:`498` for the proposal that added formatted string literals,
and :meth:`str.format`, which uses a related format string mechanism.


.. _numbers:

Numeric literals
----------------

index:: number, numeric literal, integer literal 831 floating point literal, hexadecimal literal 832 octal literal, binary literal, decimal literal, imaginary literal, complex literal 833 834There are three types of numeric literals: integers, floating point numbers, and 835imaginary numbers. There are no complex literals (complex numbers can be formed 836by adding a real number and an imaginary number). 837 838Note that numeric literals do not include a sign; a phrase like ``-1`` is 839actually an expression composed of the unary operator '``-``' and the literal 840``1``. 841 842 843.. index:: 844 single: 0b; integer literal 845 single: 0o; integer literal 846 single: 0x; integer literal 847 single: _ (underscore); in numeric literal 848 849.. _integers: 850 851Integer literals 852---------------- 853 854Integer literals are described by the following lexical definitions: 855 856.. productionlist:: python-grammar 857 integer: `decinteger` | `bininteger` | `octinteger` | `hexinteger` 858 decinteger: `nonzerodigit` (["_"] `digit`)* | "0"+ (["_"] "0")* 859 bininteger: "0" ("b" | "B") (["_"] `bindigit`)+ 860 octinteger: "0" ("o" | "O") (["_"] `octdigit`)+ 861 hexinteger: "0" ("x" | "X") (["_"] `hexdigit`)+ 862 nonzerodigit: "1"..."9" 863 digit: "0"..."9" 864 bindigit: "0" | "1" 865 octdigit: "0"..."7" 866 hexdigit: `digit` | "a"..."f" | "A"..."F" 867 868There is no limit for the length of integer literals apart from what can be 869stored in available memory. 870 871Underscores are ignored for determining the numeric value of the literal. They 872can be used to group digits for enhanced readability. One underscore can occur 873between digits, and after base specifiers like ``0x``. 874 875Note that leading zeros in a non-zero decimal number are not allowed. This is 876for disambiguation with C-style octal literals, which Python used before version 8773.0. 
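
The rules above can be checked with a short sketch. The helper
``is_valid_literal`` is a hypothetical name introduced here for illustration
and is not part of Python; it assumes that parsing the text with
:func:`ast.parse` is an acceptable way to ask whether a literal is accepted:

```python
import ast

# Underscores do not change the value of an integer literal.
assert 100_000_000_000 == 100000000000
assert 0b_1110_0101 == 0b11100101 == 229
assert 0x_dead_beef == 0xdeadbeef

# A decimal literal made up only of zeros is allowed.
assert 00 == 0 and 0_0 == 0


def is_valid_literal(text):
    """Return True if *text* parses as an expression (hypothetical helper)."""
    try:
        ast.parse(text, mode="eval")
    except SyntaxError:
        return False
    return True


# Leading zeros in a non-zero decimal literal are rejected, as are
# doubled or trailing underscores; "0o17" is the octal spelling.
candidates = ["0o17", "017", "1__0", "10_", "0x_1f"]
results = {text: is_valid_literal(text) for text in candidates}
print(results)
```

Of the candidates, only ``0o17`` and ``0x_1f`` are accepted; the other three
violate the ``decinteger`` production.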

Some examples of integer literals::

   7     2147483647                        0o177    0b100110111
   3     79228162514264337593543950336     0o377    0xdeadbeef
         100_000_000_000                   0b_1110_0101

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: . (dot); in numeric literal
   single: e; in numeric literal
   single: _ (underscore); in numeric literal
.. _floating:

Floating point literals
-----------------------

Floating point literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   floatnumber: `pointfloat` | `exponentfloat`
   pointfloat: [`digitpart`] `fraction` | `digitpart` "."
   exponentfloat: (`digitpart` | `pointfloat`) `exponent`
   digitpart: `digit` (["_"] `digit`)*
   fraction: "." `digitpart`
   exponent: ("e" | "E") ["+" | "-"] `digitpart`

Note that the integer and exponent parts are always interpreted using radix 10.
For example, ``077e010`` is legal, and denotes the same number as ``77e10``. The
allowed range of floating point literals is implementation-dependent. As in
integer literals, underscores are supported for digit grouping.

Some examples of floating point literals::

   3.14    10.    .001    1e100    3.14e-10    0e0    3.14_15_93

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: j; in numeric literal
.. _imaginary:

Imaginary literals
------------------

Imaginary literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   imagnumber: (`floatnumber` | `digitpart`) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0. Complex
numbers are represented as a pair of floating point numbers and have the same
restrictions on their range. To create a complex number with a nonzero real
part, add a floating point number to it, e.g., ``(3+4j)``. Some examples of
imaginary literals::

   3.14j   10.j    10j     .001j   1e100j   3.14e-10j   3.14_15_93j


.. _operators:

Operators
=========

.. index:: single: operators

The following tokens are operators:

.. code-block:: none


   +       -       *       **      /       //      %      @
   <<      >>      &       |       ^       ~       :=
   <       >       <=      >=      ==      !=


.. _delimiters:

Delimiters
==========

.. index:: single: delimiters

The following tokens serve as delimiters in the grammar:

.. code-block:: none

   (       )       [       ]       {       }
   ,       :       .       ;       @       =       ->
   +=      -=      *=      /=      //=     %=      @=
   &=      |=      ^=      >>=     <<=     **=

The period can also occur in floating-point and imaginary literals. A sequence
of three periods has a special meaning as an ellipsis literal. The augmented
assignment operators in the second half of the list serve lexically as
delimiters, but also perform an operation.

The following printing ASCII characters have special meaning as part of other
tokens or are otherwise significant to the lexical analyzer:

.. code-block:: none

   '       "       #       \

The following printing ASCII characters are not used in Python. Their
occurrence outside string literals and comments is an unconditional error:

.. code-block:: none

   $       ?       `


.. rubric:: Footnotes

.. [#] https://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt