re.rst - OpenGrok cross reference for /external/python/cpython3/Doc/library/re.rst

1 :mod:`re` --- Regular expression operations
12 --------------
18 as well as 8-bit strings (:class:`bytes`).
19 However, Unicode strings and 8-bit strings cannot be mixed:
20 that is, you cannot match a Unicode string with a byte pattern or
21 vice-versa; similarly, when asking for a substitution, the replacement
27 character for the same purpose in string literals; for example, to match
38 prefixed with ``'r'``.  So ``r"\n"`` is a two-character string containing
39 ``'\'`` and ``'n'``, while ``"\n"`` is a one-character string containing a
44 module-level functions and methods on
45 :ref:`compiled regular expressions <re-objects>`.  The functions are shortcuts
47 fine-tuning parameters.
51    The third-party `regex <https://pypi.org/project/regex/>`_ module,
56 .. _re-syntax:
59 -------------------------
69 string *pq* will match AB.  This holds unless *A* or *B* contain low precedence
77 information and a gentler presentation, consult the :ref:`regex-howto`.
81 expressions; they simply match themselves.  You can concatenate ordinary
91 directly nested. This avoids ambiguity with the non-greedy modifier suffix
126    Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
127    many repetitions as are possible.  ``ab*`` will match 'a', 'ab', or 'a' followed
133    Causes the resulting RE to match 1 or more repetitions of the preceding RE.
134    ``ab+`` will match 'a' followed by any non-zero number of 'b's; it will not
135    match just 'a'.
140    Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
141    ``ab?`` will match either 'a' or 'ab'.
149    The ``'*'``, ``'+'``, and ``'?'`` qualifiers are all :dfn:`greedy`; they match
151    ``<.*>`` is matched against ``'<a> b <c>'``, it will match the entire
153    perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion; as *few*
154    characters as possible will be matched.  Using the RE ``<.*?>`` will match
162    matches cause the entire RE not to match.  For example, ``a{6}`` will match
166    Causes the resulting RE to match from *m* to *n* repetitions of the preceding
167    RE, attempting to match as many repetitions as possible.  For example,
168    ``a{3,5}`` will match from 3 to 5 ``'a'`` characters.  Omitting *m* specifies a
170    example, ``a{4,}b`` will match ``'aaaab'`` or a thousand ``'a'`` characters
175    Causes the resulting RE to match from *m* to *n* repetitions of the preceding
176    RE, attempting to match as *few* repetitions as possible.  This is the
177    non-greedy version of the previous qualifier.  For example, on the
178    6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters,
179    while ``a{3,5}?`` will only match 3 characters.
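The greedy and non-greedy qualifiers described above can be compared side by side. The following is a minimal sketch that reuses the sample strings from the text; the variable names are illustrative only.

```python
import re

text = "<a> b <c>"

# Greedy: '.*' consumes as much as possible, so the match spans the whole string.
greedy = re.search(r"<.*>", text).group()

# Non-greedy: '.*?' stops at the first '>' that lets the pattern succeed.
lazy = re.search(r"<.*?>", text).group()

# The same distinction applies to bounded repetition.
bounded_greedy = re.match(r"a{3,5}", "aaaaaa").group()
bounded_lazy = re.match(r"a{3,5}?", "aaaaaa").group()

print(greedy)          # <a> b <c>
print(lazy)            # <a>
print(bounded_greedy)  # aaaaa
print(bounded_lazy)    # aaa
```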
184    Either escapes special characters (permitting you to match characters like
202    * Characters can be listed individually, e.g. ``[amk]`` will match ``'a'``,
205    .. index:: single: - (minus); in regular expressions
208      them by a ``'-'``, for example ``[a-z]`` will match any lowercase ASCII letter,
209      ``[0-5][0-9]`` will match all the two-digit numbers from ``00`` to ``59``, and
210      ``[0-9A-Fa-f]`` will match any hexadecimal digit.  If ``-`` is escaped (e.g.
211      ``[a\-z]``) or if it's placed as the first or last character
212      (e.g. ``[-a]`` or ``[a-]``), it will match a literal ``'-'``.
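As a short illustration of the range and literal-hyphen rules for sets, the sketch below checks a few single-character and two-character inputs. The pattern variable names are hypothetical and chosen only for this example.

```python
import re

# Inside a set, a hyphen between two characters denotes a range.
hex_digit = re.compile(r"[0-9A-Fa-f]")
minute = re.compile(r"[0-5][0-9]")

# An escaped hyphen, or one placed first or last, is a literal '-'.
escaped = re.compile(r"[a\-z]")
leading = re.compile(r"[-a]")
trailing = re.compile(r"[a-]")

hex_ok = hex_digit.fullmatch("c") is not None      # True
hex_bad = hex_digit.fullmatch("g") is not None     # False
minute_ok = minute.fullmatch("59") is not None     # True
minute_bad = minute.fullmatch("60") is not None    # False

# All three spellings accept a literal hyphen.
literal_hits = [p.fullmatch("-") is not None for p in (escaped, leading, trailing)]

# With the hyphen escaped there is no range, so 'm' is rejected.
range_miss = escaped.fullmatch("m") is not None    # False
```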
215      ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
221      inside a set, although the characters they match depend on whether
228      that are *not* in the set will be matched.  For example, ``[^5]`` will match
229      any character except ``'5'``, and ``[^^]`` will match any character except
233    * To match a literal ``']'`` inside a set, precede it with a backslash, or
235      ``[]()[{}]`` will both match a parenthesis.
237    .. .. index:: single: --; in regular expressions
247      character sequences ``'--'``, ``'&&'``, ``'~~'``, and ``'||'``.  To
260    will match either *A* or *B*.  An arbitrary number of REs can be separated by the
265    produce a longer overall match.  In other words, the ``'|'`` operator is never
266    greedy.  To match a literal ``'|'``, use ``\|``, or enclose it inside a
274    start and end of a group; the contents of a group can be retrieved after a match
276    special sequence, described below.  To match the literals ``'('`` or ``')'``,
291    letters set the corresponding flags: :const:`re.A` (ASCII-only matching),
293    :const:`re.M` (multi-line), :const:`re.S` (dot matches all),
296    (The flags are described in :ref:`contents-of-module-re`.)
305    A non-capturing version of regular parentheses.  Matches whatever regular
307    *cannot* be retrieved after performing a match or referenced later in the
310 ``(?aiLmsux-imsx:...)``
312    ``'s'``, ``'u'``, ``'x'``, optionally followed by ``'-'`` followed by
315    :const:`re.A` (ASCII-only matching), :const:`re.I` (ignore case),
316    :const:`re.L` (locale dependent), :const:`re.M` (multi-line),
319    (The flags are described in :ref:`contents-of-module-re`.)
322    as inline flags, so they can't be combined or follow ``'-'``.  Instead,
325    ASCII-only matching, and ``(?u:...)`` switches to Unicode matching
327    matching, and ``(?a:...)`` switches to ASCII-only matching (default).
349    +---------------------------------------+----------------------------------+
354    +---------------------------------------+----------------------------------+
355    | when processing match object *m*      | * ``m.group('quote')``           |
357    +---------------------------------------+----------------------------------+
361    +---------------------------------------+----------------------------------+
378    called a :dfn:`lookahead assertion`.  For example, ``Isaac (?=Asimov)`` will match
384    Matches if ``...`` doesn't match next.  This is a :dfn:`negative lookahead assertion`.
385    For example, ``Isaac (?!Asimov)`` will match ``'Isaac '`` only if it's *not*
391    Matches if the current position in the string is preceded by a match for ``...``
393    assertion`. ``(?<=abc)def`` will find a match in ``'abcdef'``, since the
395    The contained pattern must only match strings of some fixed length, meaning that
397    patterns which start with positive lookbehind assertions will not match at the
399    :func:`search` function rather than the :func:`match` function:
408       >>> m = re.search(r'(?<=-)\w+', 'spam-egg')
418    Matches if the current position in the string is not preceded by a match for
420    positive lookbehind assertions, the contained pattern must only match strings of
422    match at the beginning of the string being searched.
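The behaviour of lookbehind assertions at the start of a string can be sketched as follows. The inputs are small made-up strings, apart from ``'spam-egg'`` and ``'abcdef'``, which appear in the text.

```python
import re

# Positive lookbehind: 'def' is found because it is preceded by 'abc'.
positive = re.search(r"(?<=abc)def", "abcdef").group()    # 'def'

# Word characters that directly follow a hyphen.
after_hyphen = re.search(r"(?<=-)\w+", "spam-egg").group()  # 'egg'

# re.match() only tries position 0, where nothing precedes the current
# position, so a pattern starting with a positive lookbehind cannot match.
at_start = re.match(r"(?<=-)\w+", "spam-egg")             # None

# Negative lookbehind: succeeds only when 'abc' does NOT precede 'def'.
neg_miss = re.search(r"(?<!abc)def", "abcdef")            # None
neg_hit = re.search(r"(?<!abc)def", "xyzdef").group()     # 'def'

# Unlike the positive form, it may match at the very beginning.
neg_start = re.match(r"(?<!abc)def", "def").group()       # 'def'
```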
424 ``(?(id/name)yes-pattern|no-pattern)``
425    Will try to match with ``yes-pattern`` if the group with given *id* or
426    *name* exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is
429    will match with ``'<user@host.com>'`` as well as ``'user@host.com'``, but
435 resulting RE will match the second character.  For example, ``\$`` matches the
444    can only be used to match one of the first 99 groups.  If the first digit of
446    a group match, but as the character with octal value *number*. Inside the
487       Unicode character category [Nd]).  This includes ``[0-9]``, and
489       used only ``[0-9]`` is matched.
491    For 8-bit (bytes) patterns:
492       Matches any decimal digit; this is equivalent to ``[0-9]``.
499    becomes the equivalent of ``[^0-9]``.
507       non-breaking spaces mandated by typography rules in many
511    For 8-bit (bytes) patterns:
529       ``[a-zA-Z0-9_]`` is matched.
531    For 8-bit (bytes) patterns:
533       this is equivalent to ``[a-zA-Z0-9_]``.  If the :const:`LOCALE` flag is
542    becomes the equivalent of ``[^a-zA-Z0-9_]``.  If the :const:`LOCALE` flag is
595 .. _contents-of-module-re:
598 ---------------
602 regular expressions.  Most non-trivial applications always use the compiled
612    <re-objects>`, which can be used for matching using its
613    :func:`~Pattern.match`, :func:`~Pattern.search` and other methods, described
623       result = prog.match(string)
627       result = re.match(pattern, string)
636       :func:`re.compile` and the module-level matching functions are cached, so
645    perform ASCII-only matching instead of full Unicode matching.  This is only
665    Perform case-insensitive matching; expressions like ``[A-Z]`` will also
666    match lowercase letters.  Full Unicode matching (such as ``Ü`` matching
668    non-ASCII matches.  The current locale does not change the effect of this
672    Note that when the Unicode patterns ``[a-z]`` or ``[A-Z]`` are used in
673    combination with the :const:`IGNORECASE` flag, they will match the 52 ASCII
674    letters and 4 additional non-ASCII letters: 'İ' (U+0130, Latin capital
683    Make ``\w``, ``\W``, ``\b``, ``\B`` and case-insensitive matching
687    works with 8-bit locales.  Unicode matching is already enabled by default
717    Make the ``'.'`` special character match any character at all, including a
718    newline; without this flag, ``'.'`` will match anything *except* a newline.
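A minimal sketch of the effect of the dot-matches-all flag, using a made-up two-line string. It shows the flag passed as an argument and, equivalently, as the inline ``(?s)`` flag.

```python
import re

text = "first\nsecond"

# Without the flag, '.' stops at the newline.
default = re.match(r".*", text).group()

# With re.DOTALL (alias re.S), '.' also matches the newline.
dotall = re.match(r".*", text, re.DOTALL).group()

# The inline flag (?s) has the same effect as passing re.DOTALL.
inline = re.match(r"(?s).*", text).group()

print(repr(default))  # 'first'
print(repr(dotall))   # 'first\nsecond'
```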
736    This means that the two following regular expression objects that match a
750    *pattern* produces a match, and return a corresponding :ref:`match object
751    <match-objects>`.  Return ``None`` if no position in the string matches the
752    pattern; note that this is different from finding a zero-length match at some
756 .. function:: match(pattern, string, flags=0)
758    If zero or more characters at the beginning of *string* match the regular
759    expression *pattern*, return a corresponding :ref:`match object
760    <match-objects>`.  Return ``None`` if the string does not match the pattern;
761    note that this is different from a zero-length match.
763    Note that even in :const:`MULTILINE` mode, :func:`re.match` will only match
766    If you want to locate a match anywhere in *string*, use :func:`search`
767    instead (see also :ref:`search-vs-match`).
773    corresponding :ref:`match object <match-objects>`.  Return ``None`` if the
774    string does not match the pattern; note that this is different from a
775    zero-length match.
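The difference between matching at the beginning, matching the whole string, and searching anywhere can be illustrated with one pattern. This is a sketch with made-up inputs; it also shows that a zero-length match is a match object, not ``None``.

```python
import re

pattern = r"\d+"

# match(): the pattern must match starting at position 0.
prefix = re.match(pattern, "123abc")          # matches '123'
not_prefix = re.match(pattern, "abc123")      # None

# fullmatch(): the pattern must account for the entire string.
whole = re.fullmatch(pattern, "123abc")       # None
whole_ok = re.fullmatch(pattern, "123")       # matches '123'

# search(): the pattern may match at any position.
anywhere = re.search(pattern, "abc123")       # matches '123' at span (3, 6)

# A zero-length match is still a match object, which is different from None.
empty = re.match(r"\d*", "abc")

print(prefix.group(), anywhere.span(), repr(empty.group()))
# 123 (3, 6) ''
```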
794       >>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
808    to a previous empty match.
821       Added support of splitting on a pattern that could match an empty string.
826    Return all non-overlapping matches of *pattern* in *string*, as a list of
827    strings or tuples.  The *string* is scanned left-to-right, and matches
834    of tuples of strings matching the groups.  Non-capturing groups do not
837       >>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
843       Non-empty matches can now start just after a previous empty match.
848    Return an :term:`iterator` yielding :ref:`match objects <match-objects>` over
849    all non-overlapping matches for the RE *pattern* in *string*.  The *string*
850    is scanned left-to-right, and matches are returned in the order found.  Empty
854       Non-empty matches can now start just after a previous empty match.
859    Return the string obtained by replacing the leftmost non-overlapping occurrences
870       >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
875    If *repl* is a function, it is called for every non-overlapping occurrence of
876    *pattern*.  The function takes a single :ref:`match object <match-objects>`
880       ...     if matchobj.group(0) == '-': return ' '
881       ...     else: return '-'
882       >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
883       'pro--gram files'
887    The pattern may be a string or a :ref:`pattern object <re-objects>`.
890    replaced; *count* must be a non-negative integer.  If omitted or zero, all
892    when not adjacent to a previous empty match, so ``sub('x*', '-', 'abxd')`` returns
893    ``'-a-b--d-'``.
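The following sketch exercises the substitution rules described above: the empty-match behaviour quoted in the text, a *count* limit, and a callable *repl*. The sample strings other than ``'abxd'`` are made up for illustration.

```python
import re

# Empty matches are replaced when not adjacent to a previous empty match.
empty_matches = re.sub("x*", "-", "abxd")

# count limits the number of replacements (leftmost occurrences first).
limited = re.sub(r"-", "+", "a-b-c-d", count=2)

# A callable repl receives each match object and returns the replacement.
capitalized = re.sub(r"\b\w", lambda m: m.group(0).upper(), "hello world")

print(empty_matches)  # -a-b--d-
print(limited)        # a+b+c-d
print(capitalized)    # Hello World
```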
897    In string-type *repl* arguments, in addition to the character escapes and
923       non-empty match.
941    This is useful if you want to match an arbitrary literal string that may
947       >>> legal_chars = string.ascii_lowercase + string.digits + "!#$%&'*+-.^_`|~:"
949       [abcdefghijklmnopqrstuvwxyz0123456789!\#\$%\&'\*\+\-\.\^_`\|\~:]+
951       >>> operators = ['+', '-', '*', '/', '**']
953       /|\-|\+|\*\*|\*
959       >>> sample = '/usr/sbin/sendmail - 0 errors, 12 warnings'
961       /usr/sbin/sendmail - \d+ errors, \d+ warnings
983    error if a string contains no match for a pattern.  The error instance has
1009 .. _re-objects:
1012 --------------------------
1020    expression produces a match, and return a corresponding :ref:`match object
1021    <match-objects>`.  Return ``None`` if no position in the string matches the
1022    pattern; note that this is different from finding a zero-length match at some
1033    from *pos* to ``endpos - 1`` will be searched for a match.  If *endpos* is less
1034    than *pos*, no match will be found; otherwise, if *rx* is a compiled regular
1039       >>> pattern.search("dog")     # Match at index 0
1040       <re.Match object; span=(0, 1), match='d'>
1041       >>> pattern.search("dog", 1)  # No match; search doesn't include the "d"
1044 .. method:: Pattern.match(string[, pos[, endpos]])
1046    If zero or more characters at the *beginning* of *string* match this regular
1047    expression, return a corresponding :ref:`match object <match-objects>`.
1048    Return ``None`` if the string does not match the pattern; note that this is
1049    different from a zero-length match.
1055       >>> pattern.match("dog")      # No match as "o" is not at the start of "dog".
1056       >>> pattern.match("dog", 1)   # Match as "o" is the 2nd character of "dog".
1057       <re.Match object; span=(1, 2), match='o'>
1059    If you want to locate a match anywhere in *string*, use
1060    :meth:`~Pattern.search` instead (see also :ref:`search-vs-match`).
1066    :ref:`match object <match-objects>`.  Return ``None`` if the string does not
1067    match the pattern; note that this is different from a zero-length match.
1073       >>> pattern.fullmatch("dog")      # No match as "o" is not at the start of "dog".
1074       >>> pattern.fullmatch("ogre")     # No match as not the full string matches.
1076       <re.Match object; span=(1, 3), match='og'>
1139 .. _match-objects:
1141 Match Objects
1142 -------------
1144 Match objects always have a boolean value of ``True``.
1145 Since :meth:`~Pattern.match` and :meth:`~Pattern.search` return ``None``
1146 when there is no match, you can test whether there was a match with a simple
1149    match = re.search(pattern, string)
1150    if match:
1151        process(match)
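Because a failed search returns ``None`` and a match object is always true, the test can be folded into the assignment. The sketch below assumes Python 3.8 or later (for the ``:=`` operator); ``first_number`` is a hypothetical helper, not part of the module.

```python
import re

def first_number(text):
    """Return the first run of digits in *text* as an int, or None."""
    # search() returns None on failure, so the branch is skipped.
    if (m := re.search(r"\d+", text)) is not None:
        return int(m.group())
    return None

# Even a zero-length match is a true value.
zero_length_is_true = bool(re.match(r"", "abc"))

print(first_number("order 42 shipped"))  # 42
print(first_number("no digits here"))    # None
print(zero_length_is_true)               # True
```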
1153 Match objects support the following methods and attributes:
1156 .. method:: Match.expand(template)
1168 .. method:: Match.group([group1, ...])
1170    Returns one or more subgroups of the match.  If there is a single argument, the
1173    (the whole match is returned). If a *groupN* argument is zero, the corresponding
1178    part of the pattern that did not match, the corresponding result is ``None``.
1180    the last match is returned. ::
1182       >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
1183       >>> m.group(0)       # The entire match
1199       >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
1212    If a group matches multiple times, only the last match is accessible::
1214       >>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
1215       >>> m.group(1)                        # Returns only the last match.
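To make the "only the last match" rule concrete, the sketch below repeats the example from the text and contrasts it with one way of collecting every repetition. Using ``findall`` here is a suggestion for illustration, not something the text prescribes.

```python
import re

m = re.match(r"(..)+", "a1b2c3")

# The group matched three times; only the final repetition is kept.
last = m.group(1)

# The whole match still covers all three repetitions.
entire = m.group(0)

# One way to recover each repetition is to scan for the inner pattern.
all_pairs = re.findall(r"..", "a1b2c3")

print(last)       # c3
print(entire)     # a1b2c3
print(all_pairs)  # ['a1', 'b2', 'c3']
```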
1219 .. method:: Match.__getitem__(g)
1222    an individual group from a match::
1224       >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
1225       >>> m[0]       # The entire match
1235 .. method:: Match.groups(default=None)
1237    Return a tuple containing all the subgroups of the match, from 1 up to however
1239    did not participate in the match; it defaults to ``None``.
1243       >>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
1248    might participate in the match.  These groups will default to ``None`` unless
1251       >>> m = re.match(r"(\d+)\.?(\d+)?", "24")
1258 .. method:: Match.groupdict(default=None)
1260    Return a dictionary containing all the *named* subgroups of the match, keyed by
1262    participate in the match; it defaults to ``None``.  For example::
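The *default* argument of ``groups()`` and ``groupdict()`` can be sketched as follows. The first pattern is the one used in the text; the named pattern and its group names ``whole`` and ``frac`` are made up for this example.

```python
import re

# The second group is optional and does not participate for "24".
m = re.match(r"(\d+)\.?(\d+)?", "24")
plain = m.groups()            # non-participating groups are None
filled = m.groups("0")        # ... or the supplied default

# groupdict() applies the same rule to named groups.
named = re.match(r"(?P<whole>\d+)(?:\.(?P<frac>\d+))?", "24")
as_dict = named.groupdict()
as_dict_filled = named.groupdict(default="0")

print(plain)           # ('24', None)
print(filled)          # ('24', '0')
print(as_dict_filled)  # {'whole': '24', 'frac': '0'}
```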
1264       >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
1269 .. method:: Match.start([group])
1270             Match.end([group])
1273    *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if
1274    *group* exists but did not contribute to the match.  For a match object *m*, and
1275    a group *g* that did contribute to the match, the substring matched by group *g*
1293 .. method:: Match.span([group])
1295    For a match *m*, return the 2-tuple ``(m.start(group), m.end(group))``. Note
1296    that if *group* did not contribute to the match, this is ``(-1, -1)``.
1297    *group* defaults to zero, the entire match.
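A brief sketch of ``start()``, ``end()`` and ``span()`` for groups that did and did not contribute to the match. The alternation pattern is a made-up example chosen so that exactly one group participates.

```python
import re

# Only the second alternative can match "b", so group 1 does not contribute.
m = re.match(r"(a)|(b)", "b")

unused_span = m.span(1)       # (-1, -1)
unused_start = m.start(1)     # -1
used_span = m.span(2)         # (0, 1)
whole_span = m.span()         # group defaults to 0, the entire match

# For a contributing group, slicing the original string by start/end
# gives the same text as group().
sliced = m.string[m.start(2):m.end(2)]

print(unused_span, used_span, whole_span, sliced)
# (-1, -1) (0, 1) (0, 1) b
```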
1300 .. attribute:: Match.pos
1303    :meth:`~Pattern.match` method of a :ref:`regex object <re-objects>`.  This is
1304    the index into the string at which the RE engine started looking for a match.
1307 .. attribute:: Match.endpos
1310    :meth:`~Pattern.match` method of a :ref:`regex object <re-objects>`.  This is
1314 .. attribute:: Match.lastindex
1323 .. attribute:: Match.lastgroup
1329 .. attribute:: Match.re
1331    The :ref:`regular expression object <re-objects>` whose :meth:`~Pattern.match` or
1332    :meth:`~Pattern.search` method produced this match instance.
1335 .. attribute:: Match.string
1337    The string passed to :meth:`~Pattern.match` or :meth:`~Pattern.search`.
1341    Added support of :func:`copy.copy` and :func:`copy.deepcopy`.  Match objects
1345 .. _re-examples:
1348 ---------------------------
1354 In this example, we'll use the following helper function to display match
1357    def displaymatch(match):
1358        if match is None:
1360        return '<Match: %r, groups=%r>' % (match.group(), match.groups())
1363 a 5-character string with each character representing a card, "a" for ace, "k"
1369    >>> valid = re.compile(r"^[a2-9tjqk]{5}$")
1370    >>> displaymatch(valid.match("akt5q"))  # Valid.
1371    "<Match: 'akt5q', groups=()>"
1372    >>> displaymatch(valid.match("akt5e"))  # Invalid.
1373    >>> displaymatch(valid.match("akt"))    # Invalid.
1374    >>> displaymatch(valid.match("727ak"))  # Valid.
1375    "<Match: '727ak', groups=()>"
1378 To match this with a regular expression, one could use backreferences as such::
1381    >>> displaymatch(pair.match("717ak"))     # Pair of 7s.
1382    "<Match: '717', groups=('7',)>"
1383    >>> displaymatch(pair.match("718ak"))     # No pairs.
1384    >>> displaymatch(pair.match("354aa"))     # Pair of aces.
1385    "<Match: '354aa', groups=('a',)>"
1388 :meth:`~Match.group` method of the match object in the following manner::
1391    >>> pair.match("717ak").group(1)
1394    # Error because re.match() returns None, which doesn't have a group() method:
1395    >>> pair.match("718ak").group(1)
1398        re.match(r".*(.).*\1", "718ak").group(1)
1401    >>> pair.match("354aa").group(1)
1412 :c:func:`scanf` format strings.  The table below offers some more-or-less
1416 +--------------------------------+---------------------------------------------+
1420 +--------------------------------+---------------------------------------------+
1422 +--------------------------------+---------------------------------------------+
1423 | ``%d``                         | ``[-+]?\d+``                                |
1424 +--------------------------------+---------------------------------------------+
1425 | ``%e``, ``%E``, ``%f``, ``%g`` | ``[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?`` |
1426 +--------------------------------+---------------------------------------------+
1427 | ``%i``                         | ``[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)``     |
1428 +--------------------------------+---------------------------------------------+
1429 | ``%o``                         | ``[-+]?[0-7]+``                             |
1430 +--------------------------------+---------------------------------------------+
1432 +--------------------------------+---------------------------------------------+
1434 +--------------------------------+---------------------------------------------+
1435 | ``%x``, ``%X``                 | ``[-+]?(0[xX])?[\dA-Fa-f]+``                |
1436 +--------------------------------+---------------------------------------------+
1440    /usr/sbin/sendmail - 0 errors, 4 warnings
1444    %s - %d errors, %d warnings
1448    (\S+) - (\d+) errors, (\d+) warnings
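Applied to the sample line above, the equivalent regular expression can be used roughly like this. The variable names are illustrative, and the conversion to ``int`` is done by hand because groups are always returned as strings.

```python
import re

line = "/usr/sbin/sendmail - 0 errors, 4 warnings"

# Regular expression counterpart of the scanf format "%s - %d errors, %d warnings".
m = re.match(r"(\S+) - (\d+) errors, (\d+) warnings", line)

program = m.group(1)
errors = int(m.group(2))
warnings = int(m.group(3))

print(program, errors, warnings)  # /usr/sbin/sendmail 0 4
```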
1451 .. _search-vs-match:
1453 search() vs. match()
1459 :func:`re.match` checks for a match only at the beginning of the string, while
1460 :func:`re.search` checks for a match anywhere in the string (this is what Perl
1465    >>> re.match("c", "abcdef")    # No match
1466    >>> re.search("c", "abcdef")   # Match
1467    <re.Match object; span=(2, 3), match='c'>
1470 restrict the match to the beginning of the string::
1472    >>> re.match("c", "abcdef")    # No match
1473    >>> re.search("^c", "abcdef")  # No match
1474    >>> re.search("^a", "abcdef")  # Match
1475    <re.Match object; span=(0, 1), match='a'>
1477 Note however that in :const:`MULTILINE` mode :func:`match` only matches at the
1479 beginning with ``'^'`` will match at the beginning of each line. ::
1481    >>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
1482    >>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
1483    <re.Match object; span=(4, 5), match='X'>
1495 triple-quoted string syntax
1583 text, :func:`finditer` is useful as it provides :ref:`match objects
1584 <match-objects>` instead of strings.  Continuing with the previous example, if
1590    ...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
1591    07-16: carefully
1592    40-47: quickly
1603    >>> re.match(r"\W(.)\1\W", " ff ")
1604    <re.Match object; span=(0, 4), match=' ff '>
1605    >>> re.match("\\W(.)\\1\\W", " ff ")
1606    <re.Match object; span=(0, 4), match=' ff '>
1608 When one wants to match a literal backslash, it must be escaped in the regular
1613    >>> re.match(r"\\", r"\\")
1614    <re.Match object; span=(0, 1), match='\\'>
1615    >>> re.match("\\\\", r"\\")
1616    <re.Match object; span=(0, 1), match='\\'>
1645             ('ID',       r'[A-Za-z]+'),    # Identifiers
1646             ('OP',       r'[+\-*/]'),      # Arithmetic operators
1657             column = mo.start() - line_start