Lines Matching +full:- +full:match
1 :mod:`re` --- Regular expression operations
12 --------------
18 as well as 8-bit strings (:class:`bytes`).
19 However, Unicode strings and 8-bit strings cannot be mixed:
20 that is, you cannot match a Unicode string with a byte pattern or
21 vice-versa; similarly, when asking for a substitution, the replacement
27 character for the same purpose in string literals; for example, to match
38 prefixed with ``'r'``. So ``r"\n"`` is a two-character string containing
39 ``'\'`` and ``'n'``, while ``"\n"`` is a one-character string containing a
44 module-level functions and methods on
45 :ref:`compiled regular expressions <re-objects>`. The functions are shortcuts
47 fine-tuning parameters.
51 The third-party `regex <https://pypi.org/project/regex/>`_ module,
56 .. _re-syntax:
59 -------------------------
69 string *pq* will match AB. This holds unless *A* or *B* contain low precedence
77 information and a gentler presentation, consult the :ref:`regex-howto`.
81 expressions; they simply match themselves. You can concatenate ordinary
91 directly nested. This avoids ambiguity with the non-greedy modifier suffix
126 Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
127 many repetitions as are possible. ``ab*`` will match 'a', 'ab', or 'a' followed
133 Causes the resulting RE to match 1 or more repetitions of the preceding RE.
134 ``ab+`` will match 'a' followed by any non-zero number of 'b's; it will not
135 match just 'a'.
140 Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
141 ``ab?`` will match either 'a' or 'ab'.
149 The ``'*'``, ``'+'``, and ``'?'`` qualifiers are all :dfn:`greedy`; they match
151 ``<.*>`` is matched against ``'<a> b <c>'``, it will match the entire
153 perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion; as *few*
154 characters as possible will be matched. Using the RE ``<.*?>`` will match
162 matches cause the entire RE not to match. For example, ``a{6}`` will match
166 Causes the resulting RE to match from *m* to *n* repetitions of the preceding
167 RE, attempting to match as many repetitions as possible. For example,
168 ``a{3,5}`` will match from 3 to 5 ``'a'`` characters. Omitting *m* specifies a
170 example, ``a{4,}b`` will match ``'aaaab'`` or a thousand ``'a'`` characters
175 Causes the resulting RE to match from *m* to *n* repetitions of the preceding
176 RE, attempting to match as *few* repetitions as possible. This is the
177 non-greedy version of the previous qualifier. For example, on the
178 6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters,
179 while ``a{3,5}?`` will only match 3 characters.
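The greedy and non-greedy forms of the bounded qualifier can be compared directly. The following is a minimal sketch that reuses the pattern and the 6-character string from the text above; the variable names are illustrative only.

```python
import re

text = 'aaaaaa'

# Greedy: consume as many repetitions as the upper bound allows (5).
greedy = re.match(r'a{3,5}', text).group()

# Non-greedy: stop as soon as the lower bound (3) is satisfied.
lazy = re.match(r'a{3,5}?', text).group()

print(greedy)  # aaaaa
print(lazy)    # aaa
```

Both patterns succeed on the same input; only the length of the matched text differs.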
184 Either escapes special characters (permitting you to match characters like
202 * Characters can be listed individually, e.g. ``[amk]`` will match ``'a'``,
205 .. index:: single: - (minus); in regular expressions
208 them by a ``'-'``, for example ``[a-z]`` will match any lowercase ASCII letter,
209 ``[0-5][0-9]`` will match all the two-digit numbers from ``00`` to ``59``, and
210 ``[0-9A-Fa-f]`` will match any hexadecimal digit. If ``-`` is escaped (e.g.
211 ``[a\-z]``) or if it's placed as the first or last character
212 (e.g. ``[-a]`` or ``[a-]``), it will match a literal ``'-'``.
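The treatment of ``'-'`` inside a set is easy to check interactively. This is a small sketch, with sample strings chosen purely for illustration, showing a range, the three ways of getting a literal hyphen, and the ``[0-5][0-9]`` pattern mentioned above.

```python
import re

# A range between two characters: any lowercase ASCII letter.
lower = re.findall(r'[a-z]', 'aZ-m')

# An escaped, leading, or trailing '-' is a literal hyphen.
escaped = re.findall(r'[a\-z]', 'a-mz')
leading = re.findall(r'[-a]', 'a-b')
trailing = re.findall(r'[a-]', 'a-b')

# Two-digit numbers from 00 to 59.
in_range = re.fullmatch(r'[0-5][0-9]', '59') is not None
out_of_range = re.fullmatch(r'[0-5][0-9]', '60') is not None

print(lower)                   # ['a', 'm']
print(escaped)                 # ['a', '-', 'z']
print(leading, trailing)       # ['a', '-'] ['a', '-']
print(in_range, out_of_range)  # True False
```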
215 ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
221 inside a set, although the characters they match depend on whether
228 that are *not* in the set will be matched. For example, ``[^5]`` will match
229 any character except ``'5'``, and ``[^^]`` will match any character except
233 * To match a literal ``']'`` inside a set, precede it with a backslash, or
235 ``[]()[{}]`` will both match a parenthesis.
237 .. .. index:: single: --; in regular expressions
247 character sequences ``'--'``, ``'&&'``, ``'~~'``, and ``'||'``. To
260 will match either *A* or *B*. An arbitrary number of REs can be separated by the
265 produce a longer overall match. In other words, the ``'|'`` operator is never
266 greedy. To match a literal ``'|'``, use ``\|``, or enclose it inside a
274 start and end of a group; the contents of a group can be retrieved after a match
276 special sequence, described below. To match the literals ``'('`` or ``')'``,
291 letters set the corresponding flags: :const:`re.A` (ASCII-only matching),
293 :const:`re.M` (multi-line), :const:`re.S` (dot matches all),
296 (The flags are described in :ref:`contents-of-module-re`.)
305 A non-capturing version of regular parentheses. Matches whatever regular
307 *cannot* be retrieved after performing a match or referenced later in the
310 ``(?aiLmsux-imsx:...)``
312 ``'s'``, ``'u'``, ``'x'``, optionally followed by ``'-'`` followed by
315 :const:`re.A` (ASCII-only matching), :const:`re.I` (ignore case),
316 :const:`re.L` (locale dependent), :const:`re.M` (multi-line),
319 (The flags are described in :ref:`contents-of-module-re`.)
322 as inline flags, so they can't be combined or follow ``'-'``. Instead,
325 ASCII-only matching, and ``(?u:...)`` switches to Unicode matching
327 matching, and ``(?a:...)`` switches to ASCII-only matching (default).
349 +---------------------------------------+----------------------------------+
354 +---------------------------------------+----------------------------------+
355 | when processing match object *m* | * ``m.group('quote')`` |
357 +---------------------------------------+----------------------------------+
361 +---------------------------------------+----------------------------------+
378 called a :dfn:`lookahead assertion`. For example, ``Isaac (?=Asimov)`` will match
384 Matches if ``...`` doesn't match next. This is a :dfn:`negative lookahead assertion`.
385 For example, ``Isaac (?!Asimov)`` will match ``'Isaac '`` only if it's *not*
391 Matches if the current position in the string is preceded by a match for ``...``
393 assertion`. ``(?<=abc)def`` will find a match in ``'abcdef'``, since the
395 The contained pattern must only match strings of some fixed length, meaning that
397 patterns which start with positive lookbehind assertions will not match at the
399 :func:`search` function rather than the :func:`match` function:
408 >>> m = re.search(r'(?<=-)\w+', 'spam-egg')
418 Matches if the current position in the string is not preceded by a match for
420 positive lookbehind assertions, the contained pattern must only match strings of
422 match at the beginning of the string being searched.
424 ``(?(id/name)yes-pattern|no-pattern)``
425 Will try to match with ``yes-pattern`` if the group with given *id* or
426 *name* exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is
429 will match with ``'<user@host.com>'`` as well as ``'user@host.com'``, but
435 resulting RE will match the second character. For example, ``\$`` matches the
444 can only be used to match one of the first 99 groups. If the first digit of
446 a group match, but as the character with octal value *number*. Inside the
487 Unicode character category [Nd]). This includes ``[0-9]``, and
489 used only ``[0-9]`` is matched.
491 For 8-bit (bytes) patterns:
492 Matches any decimal digit; this is equivalent to ``[0-9]``.
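The difference between Unicode and ASCII-only digit matching can be seen with a non-ASCII decimal digit. The sketch below assumes U+0663 (ARABIC-INDIC DIGIT THREE, category Nd) as the sample character; any other Nd character should behave the same way.

```python
import re

arabic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE, Unicode category Nd

# str pattern, default flags: \d matches any Unicode decimal digit.
unicode_hit = re.fullmatch(r'\d', arabic_three) is not None

# str pattern with re.ASCII: \d only matches [0-9].
ascii_hit = re.fullmatch(r'\d', arabic_three, re.ASCII) is not None

# bytes pattern: \d is always equivalent to [0-9].
bytes_hits = re.findall(rb'\d', b'a1b22')

print(unicode_hit, ascii_hit)  # True False
print(bytes_hits)              # [b'1', b'2', b'2']
```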
499 becomes the equivalent of ``[^0-9]``.
507 non-breaking spaces mandated by typography rules in many
511 For 8-bit (bytes) patterns:
529 ``[a-zA-Z0-9_]`` is matched.
531 For 8-bit (bytes) patterns:
533 this is equivalent to ``[a-zA-Z0-9_]``. If the :const:`LOCALE` flag is
542 becomes the equivalent of ``[^a-zA-Z0-9_]``. If the :const:`LOCALE` flag is
595 .. _contents-of-module-re:
598 ---------------
602 regular expressions. Most non-trivial applications use the compiled
612 <re-objects>`, which can be used for matching using its
613 :func:`~Pattern.match`, :func:`~Pattern.search` and other methods, described
623 result = prog.match(string)
627 result = re.match(pattern, string)
636 :func:`re.compile` and the module-level matching functions are cached, so
645 perform ASCII-only matching instead of full Unicode matching. This is only
665 Perform case-insensitive matching; expressions like ``[A-Z]`` will also
666 match lowercase letters. Full Unicode matching (such as ``Ü`` matching
668 non-ASCII matches. The current locale does not change the effect of this
672 Note that when the Unicode patterns ``[a-z]`` or ``[A-Z]`` are used in
673 combination with the :const:`IGNORECASE` flag, they will match the 52 ASCII
674 letters and 4 additional non-ASCII letters: 'İ' (U+0130, Latin capital
683 Make ``\w``, ``\W``, ``\b``, ``\B`` and case-insensitive matching
687 works with 8-bit locales. Unicode matching is already enabled by default
717 Make the ``'.'`` special character match any character at all, including a
718 newline; without this flag, ``'.'`` will match anything *except* a newline.
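A short sketch of the effect of this flag on ``'.'``, using :const:`re.DOTALL` (the long name of :const:`re.S`) and a two-line sample string chosen for illustration:

```python
import re

text = 'a\nb'

# Without the flag, '.' refuses to match the newline.
default_dot = re.match(r'a.b', text)

# With re.DOTALL, '.' matches any character, newline included.
dotall_dot = re.match(r'a.b', text, re.DOTALL)

print(default_dot)                # None
print(repr(dotall_dot.group()))   # 'a\nb'
```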
736 This means that the two following regular expression objects that match a
750 *pattern* produces a match, and return a corresponding :ref:`match object
751 <match-objects>`. Return ``None`` if no position in the string matches the
752 pattern; note that this is different from finding a zero-length match at some
756 .. function:: match(pattern, string, flags=0)
758 If zero or more characters at the beginning of *string* match the regular
759 expression *pattern*, return a corresponding :ref:`match object
760 <match-objects>`. Return ``None`` if the string does not match the pattern;
761 note that this is different from a zero-length match.
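The distinction between "no match" and "a zero-length match" matters when testing the return value. In this minimal sketch (patterns and input chosen for illustration), ``x*`` succeeds by matching the empty string, while ``x+`` fails outright:

```python
import re

# 'x*' can match zero characters, so it succeeds at position 0.
empty = re.match(r'x*', 'abc')

# 'x+' needs at least one 'x' at the start, so there is no match.
missing = re.match(r'x+', 'abc')

print(empty.span(), repr(empty.group()))  # (0, 0) ''
print(bool(empty))                        # True
print(missing)                            # None
```

Because a match object is always truthy, ``if empty:`` takes the true branch even though the matched text is empty.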
763 Note that even in :const:`MULTILINE` mode, :func:`re.match` will only match
766 If you want to locate a match anywhere in *string*, use :func:`search`
767 instead (see also :ref:`search-vs-match`).
773 corresponding :ref:`match object <match-objects>`. Return ``None`` if the
774 string does not match the pattern; note that this is different from a
775 zero-length match.
794 >>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
808 to a previous empty match.
821 Added support for splitting on a pattern that could match an empty string.
826 Return all non-overlapping matches of *pattern* in *string*, as a list of
827 strings or tuples. The *string* is scanned left-to-right, and matches
834 of tuples of strings matching the groups. Non-capturing groups do not
837 >>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
843 Non-empty matches can now start just after a previous empty match.
848 Return an :term:`iterator` yielding :ref:`match objects <match-objects>` over
849 all non-overlapping matches for the RE *pattern* in *string*. The *string*
850 is scanned left-to-right, and matches are returned in the order found. Empty
854 Non-empty matches can now start just after a previous empty match.
859 Return the string obtained by replacing the leftmost non-overlapping occurrences
870 >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
875 If *repl* is a function, it is called for every non-overlapping occurrence of
876 *pattern*. The function takes a single :ref:`match object <match-objects>`
880 ... if matchobj.group(0) == '-': return ' '
881 ... else: return '-'
882 >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
883 'pro--gram files'
887 The pattern may be a string or a :ref:`pattern object <re-objects>`.
890 replaced; *count* must be a non-negative integer. If omitted or zero, all
892 when not adjacent to a previous empty match, so ``sub('x*', '-', 'abxd')`` returns
893 ``'-a-b--d-'``.
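To see where the replacements in this example come from, it can help to list the spans of the individual matches. The sketch below assumes Python 3.7 or later, where empty matches adjacent to a previous non-empty match are also replaced; earlier versions give a different result.

```python
import re

# Every match of 'x*' in 'abxd', including the empty ones.
spans = [m.span() for m in re.finditer('x*', 'abxd')]

# Each of those matches is replaced by '-'.
result = re.sub('x*', '-', 'abxd')

print(spans)   # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
print(result)  # -a-b--d-
```

The two consecutive hyphens come from the non-empty match ``'x'`` at ``(2, 3)`` followed immediately by the empty match at ``(3, 3)``.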
897 In string-type *repl* arguments, in addition to the character escapes and
923 non-empty match.
941 This is useful if you want to match an arbitrary literal string that may
947 >>> legal_chars = string.ascii_lowercase + string.digits + "!#$%&'*+-.^_`|~:"
949 [abcdefghijklmnopqrstuvwxyz0123456789!\#\$%\&'\*\+\-\.\^_`\|\~:]+
951 >>> operators = ['+', '-', '*', '/', '**']
953 /|\-|\+|\*\*|\*
959 >>> sample = '/usr/sbin/sendmail - 0 errors, 12 warnings'
961 /usr/sbin/sendmail - \d+ errors, \d+ warnings
983 error if a string contains no match for a pattern. The error instance has
1009 .. _re-objects:
1012 --------------------------
1020 expression produces a match, and return a corresponding :ref:`match object
1021 <match-objects>`. Return ``None`` if no position in the string matches the
1022 pattern; note that this is different from finding a zero-length match at some
1033 from *pos* to ``endpos - 1`` will be searched for a match. If *endpos* is less
1034 than *pos*, no match will be found; otherwise, if *rx* is a compiled regular
1039 >>> pattern.search("dog") # Match at index 0
1040 <re.Match object; span=(0, 1), match='d'>
1041 >>> pattern.search("dog", 1) # No match; search doesn't include the "d"
1044 .. method:: Pattern.match(string[, pos[, endpos]])
1046 If zero or more characters at the *beginning* of *string* match this regular
1047 expression, return a corresponding :ref:`match object <match-objects>`.
1048 Return ``None`` if the string does not match the pattern; note that this is
1049 different from a zero-length match.
1055 >>> pattern.match("dog") # No match as "o" is not at the start of "dog".
1056 >>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog".
1057 <re.Match object; span=(1, 2), match='o'>
1059 If you want to locate a match anywhere in *string*, use
1060 :meth:`~Pattern.search` instead (see also :ref:`search-vs-match`).
1066 :ref:`match object <match-objects>`. Return ``None`` if the string does not
1067 match the pattern; note that this is different from a zero-length match.
1073 >>> pattern.fullmatch("dog") # No match as "o" is not at the start of "dog".
1074 >>> pattern.fullmatch("ogre") # No match as the full string does not match.
1076 <re.Match object; span=(1, 3), match='og'>
1139 .. _match-objects:
1141 Match Objects
1142 -------------
1144 Match objects always have a boolean value of ``True``.
1145 Since :meth:`~Pattern.match` and :meth:`~Pattern.search` return ``None``
1146 when there is no match, you can test whether there was a match with a simple
1149 match = re.search(pattern, string)
1150 if match:
1151 process(match)
1153 Match objects support the following methods and attributes:
1156 .. method:: Match.expand(template)
1168 .. method:: Match.group([group1, ...])
1170 Returns one or more subgroups of the match. If there is a single argument, the
1173 (the whole match is returned). If a *groupN* argument is zero, the corresponding
1178 part of the pattern that did not match, the corresponding result is ``None``.
1180 the last match is returned. ::
1182 >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
1183 >>> m.group(0) # The entire match
1199 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
1212 If a group matches multiple times, only the last match is accessible::
1214 >>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times.
1215 >>> m.group(1) # Returns only the last match.
1219 .. method:: Match.__getitem__(g)
1222 an individual group from a match::
1224 >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
1225 >>> m[0] # The entire match
1235 .. method:: Match.groups(default=None)
1237 Return a tuple containing all the subgroups of the match, from 1 up to however
1239 did not participate in the match; it defaults to ``None``.
1243 >>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
1248 might participate in the match. These groups will default to ``None`` unless
1251 >>> m = re.match(r"(\d+)\.?(\d+)?", "24")
1258 .. method:: Match.groupdict(default=None)
1260 Return a dictionary containing all the *named* subgroups of the match, keyed by
1262 participate in the match; it defaults to ``None``. For example::
1264 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
1269 .. method:: Match.start([group])
1270 Match.end([group])
1273 *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if
1274 *group* exists but did not contribute to the match. For a match object *m*, and
1275 a group *g* that did contribute to the match, the substring matched by group *g*
1293 .. method:: Match.span([group])
1295 For a match *m*, return the 2-tuple ``(m.start(group), m.end(group))``. Note
1296 that if *group* did not contribute to the match, this is ``(-1, -1)``.
1297 *group* defaults to zero, the entire match.
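The ``-1`` convention for groups that did not contribute to the match can be illustrated with an alternation in which only one branch participates. This is a small sketch; the pattern is chosen for illustration only.

```python
import re

# Only the second alternative (group 2) takes part in the match.
m = re.match(r'(a)|(b)', 'b')

whole = m.span()      # group 0, the entire match
unused = m.span(1)    # group 1 exists but did not contribute
used = m.span(2)      # group 2 matched 'b'

print(whole)                  # (0, 1)
print(unused)                 # (-1, -1)
print(used)                   # (0, 1)
print(m.start(1), m.end(1))   # -1 -1
```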
1300 .. attribute:: Match.pos
1303 :meth:`~Pattern.match` method of a :ref:`regex object <re-objects>`. This is
1304 the index into the string at which the RE engine started looking for a match.
1307 .. attribute:: Match.endpos
1310 :meth:`~Pattern.match` method of a :ref:`regex object <re-objects>`. This is
1314 .. attribute:: Match.lastindex
1323 .. attribute:: Match.lastgroup
1329 .. attribute:: Match.re
1331 The :ref:`regular expression object <re-objects>` whose :meth:`~Pattern.match` or
1332 :meth:`~Pattern.search` method produced this match instance.
1335 .. attribute:: Match.string
1337 The string passed to :meth:`~Pattern.match` or :meth:`~Pattern.search`.
1341 Added support for :func:`copy.copy` and :func:`copy.deepcopy`. Match objects
1345 .. _re-examples:
1348 ---------------------------
1354 In this example, we'll use the following helper function to display match
1357 def displaymatch(match):
1358 if match is None:
1360 return '<Match: %r, groups=%r>' % (match.group(), match.groups())
1363 a 5-character string with each character representing a card, "a" for ace, "k"
1369 >>> valid = re.compile(r"^[a2-9tjqk]{5}$")
1370 >>> displaymatch(valid.match("akt5q")) # Valid.
1371 "<Match: 'akt5q', groups=()>"
1372 >>> displaymatch(valid.match("akt5e")) # Invalid.
1373 >>> displaymatch(valid.match("akt")) # Invalid.
1374 >>> displaymatch(valid.match("727ak")) # Valid.
1375 "<Match: '727ak', groups=()>"
1378 To match this with a regular expression, one could use backreferences as follows::
1381 >>> displaymatch(pair.match("717ak")) # Pair of 7s.
1382 "<Match: '717', groups=('7',)>"
1383 >>> displaymatch(pair.match("718ak")) # No pairs.
1384 >>> displaymatch(pair.match("354aa")) # Pair of aces.
1385 "<Match: '354aa', groups=('a',)>"
1388 :meth:`~Match.group` method of the match object in the following manner::
1391 >>> pair.match("717ak").group(1)
1394 # Error because re.match() returns None, which doesn't have a group() method:
1395 >>> pair.match("718ak").group(1)
1398 re.match(r".*(.).*\1", "718ak").group(1)
1401 >>> pair.match("354aa").group(1)
1412 :c:func:`scanf` format strings. The table below offers some more-or-less
1416 +--------------------------------+---------------------------------------------+
1420 +--------------------------------+---------------------------------------------+
1422 +--------------------------------+---------------------------------------------+
1423 | ``%d`` | ``[-+]?\d+`` |
1424 +--------------------------------+---------------------------------------------+
1425 | ``%e``, ``%E``, ``%f``, ``%g`` | ``[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?`` |
1426 +--------------------------------+---------------------------------------------+
1427 | ``%i`` | ``[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)`` |
1428 +--------------------------------+---------------------------------------------+
1429 | ``%o`` | ``[-+]?[0-7]+`` |
1430 +--------------------------------+---------------------------------------------+
1432 +--------------------------------+---------------------------------------------+
1434 +--------------------------------+---------------------------------------------+
1435 | ``%x``, ``%X`` | ``[-+]?(0[xX])?[\dA-Fa-f]+`` |
1436 +--------------------------------+---------------------------------------------+
1440 /usr/sbin/sendmail - 0 errors, 4 warnings
1444 %s - %d errors, %d warnings
1448 (\S+) - (\d+) errors, (\d+) warnings
1451 .. _search-vs-match:
1453 search() vs. match()
1459 :func:`re.match` checks for a match only at the beginning of the string, while
1460 :func:`re.search` checks for a match anywhere in the string (this is what Perl
1465 >>> re.match("c", "abcdef") # No match
1466 >>> re.search("c", "abcdef") # Match
1467 <re.Match object; span=(2, 3), match='c'>
1470 restrict the match to the beginning of the string::
1472 >>> re.match("c", "abcdef") # No match
1473 >>> re.search("^c", "abcdef") # No match
1474 >>> re.search("^a", "abcdef") # Match
1475 <re.Match object; span=(0, 1), match='a'>
1477 Note however that in :const:`MULTILINE` mode :func:`match` only matches at the
1479 beginning with ``'^'`` will match at the beginning of each line. ::
1481 >>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match
1482 >>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match
1483 <re.Match object; span=(4, 5), match='X'>
1495 triple-quoted string syntax
1583 text, :func:`finditer` is useful as it provides :ref:`match objects
1584 <match-objects>` instead of strings. Continuing with the previous example, if
1590 ... print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
1591 07-16: carefully
1592 40-47: quickly
1603 >>> re.match(r"\W(.)\1\W", " ff ")
1604 <re.Match object; span=(0, 4), match=' ff '>
1605 >>> re.match("\\W(.)\\1\\W", " ff ")
1606 <re.Match object; span=(0, 4), match=' ff '>
1608 When one wants to match a literal backslash, it must be escaped in the regular
1613 >>> re.match(r"\\", r"\\")
1614 <re.Match object; span=(0, 1), match='\\'>
1615 >>> re.match("\\\\", r"\\")
1616 <re.Match object; span=(0, 1), match='\\'>
1645 ('ID', r'[A-Za-z]+'), # Identifiers
1646 ('OP', r'[+\-*/]'), # Arithmetic operators
1657 column = mo.start() - line_start