lib.rs - OpenGrok cross reference for /third_party/rust/crates/regex/src/lib.rs

3 expressions. Its syntax is similar to Perl-style regular expressions, but lacks
33 let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
34 assert!(re.is_match("2014-01-01"));
37 Notice the use of the `^` and `$` anchors. In this crate, every expression
39 it to match anywhere in the text. Anchors can be used to ensure that the
43 [raw strings](https://doc.rust-lang.org/stable/reference/tokens.html#raw-string-literals)
51 It is an anti-pattern to compile the same regular expression in a loop
68 fn some_helper_function(text: &str) -> bool {
84 repeatedly against a search string to find successive non-overlapping
91 let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
92 let text = "2012-03-14, 2013-01-01 and 2014-07-05";
116 let re = Regex::new(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})").unwrap();
117 let before = "2012-03-14, 2013-01-01 and 2014-07-05";
135   -
137   -
140 let before = "2012-03-14, 2013-01-01 and 2014-07-05";
149 the `x` flag, e.g., `(?-x: )`.
199 This implementation executes regular expressions **only** on valid UTF-8
201 relax this restriction, use the [`bytes`](bytes/index.html) sub-module.)
204 case-insensitively, the characters are first mapped using the "simple" case
205 folding rules defined by Unicode.
223 * `.` will match any valid UTF-8 encoded Unicode scalar value except for `\n`.
230 * `^` and `$` are **not** Unicode aware in multi-line mode. Namely, they only
250 [UNICODE](https://github.com/rust-lang/regex/blob/master/UNICODE.md)
255 The `bytes` sub-module provides a `Regex` type that can be used to match
256 on `&[u8]`. By default, text is interpreted as UTF-8 just like it is with
258 off the `u` flag, even if doing so could result in matching invalid UTF-8.
262 Disabling the `u` flag is also possible with the standard `&str`-based `Regex`
263 type, but it is only allowed where the UTF-8 invariant is maintained. For
264 example, `(?-u:\w)` is an ASCII-only `\w` character class and is legal in an
265 `&str`-based `Regex`, but `(?-u:\xFF)` will attempt to match the raw byte
266 `\xFF`, which is invalid UTF-8 and therefore is illegal in `&str`-based
273 features](#crate-features).
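The UTF-8 invariant behind the `(?-u:\xFF)` restriction can be checked with the standard library alone. The sketch below is not the crate's own validation logic, and the helper names `is_valid_utf8` and `utf8_bytes` are hypothetical. It only illustrates that the lone byte `\xFF` is never valid UTF-8, while the Unicode scalar value U+00FF is encoded as the two bytes `\xC3\xBF`.

```rust
/// Returns true if `bytes` is a well-formed UTF-8 sequence.
/// (Hypothetical helper; uses only `std::str::from_utf8`.)
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Returns the UTF-8 encoding of a single Unicode scalar value.
/// (Hypothetical helper; uses only `char::encode_utf8`.)
fn utf8_bytes(c: char) -> Vec<u8> {
    // Four bytes is the maximum length of one UTF-8 encoded scalar value.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}
```

For instance, `is_valid_utf8(&[0xFF])` is false, which is why a `&str`-based pattern cannot be allowed to match that raw byte, whereas `utf8_bytes('\u{FF}')` yields the valid sequence `[0xC3, 0xBF]`.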
280 a separate crate, [`regex-syntax`](https://docs.rs/regex-syntax).
288 \pN           One-letter name Unicode character class
290 \PN           Negated one-letter name Unicode character class
299 [a-z]         A character class matching any character in range a-z.
300 [[:alpha:]]   ASCII character class ([A-Za-z])
301 [[:^alpha:]]  Negated ASCII character class ([^A-Za-z])
303 [a-y&&xyz]    Intersection (matching x or y)
304 [0-9&&[^4]]   Subtraction using intersection and negation (matching 0-9 except 4)
305 [0-9--4]      Direct subtraction (matching 0-9 except 4)
306 [a-g~~b-h]    Symmetric difference (matching `a` and `h` only)
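The set operations in the character class table can be sanity-checked with ordinary set arithmetic. The following is a minimal sketch using `std::collections::BTreeSet`. It does not use the regex crate and says nothing about how the crate implements classes internally; the function names are hypothetical and each one mirrors one table entry.

```rust
use std::collections::BTreeSet;

/// Collects every char in the inclusive range `lo..=hi` into an ordered set.
fn range(lo: char, hi: char) -> BTreeSet<char> {
    (lo..=hi).collect()
}

/// Mirrors `[a-y&&xyz]`: chars that are in both a-y and {x, y, z}.
fn intersection_example() -> Vec<char> {
    let xyz: BTreeSet<char> = "xyz".chars().collect();
    range('a', 'y').intersection(&xyz).copied().collect()
}

/// Mirrors `[0-9--4]`: chars in 0-9 that are not '4'.
fn difference_example() -> Vec<char> {
    let four: BTreeSet<char> = "4".chars().collect();
    range('0', '9').difference(&four).copied().collect()
}

/// Mirrors `[a-g~~b-h]`: chars in exactly one of a-g and b-h.
fn symmetric_difference_example() -> Vec<char> {
    range('a', 'g')
        .symmetric_difference(&range('b', 'h'))
        .copied()
        .collect()
}
```

Under these assumptions the three functions return `x` and `y`, the digits other than `4`, and `a` and `h`, which agrees with the descriptions in the table.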
316 1. Ranges: `a-cd` == `[a-c]d`
318 3. Intersection: `^a-z&&b` == `^[a-z&&b]`
348 ^     the beginning of text (or start-of-line with multi-line mode)
349 $     the end of text (or end-of-line with multi-line mode)
350 \A    only the beginning of text (even with multi-line mode enabled)
351 \z    only the end of text (even with multi-line mode enabled)
363 (?P<name>exp)  named (also numbered) capture group (allowed chars: [_0-9a-zA-Z.\[\]])
364 (?:exp)        non-capturing group
366 (?flags:exp)   set flags for exp (non-capturing)
370 and `(?-x)` clears the flag `x`. Multiple flags can be set or cleared at
371 the same time: `(?xy)` sets both the `x` and `y` flags and `(?x-y)` sets
377 i     case-insensitive: letters match both upper and lower case
378 m     multi-line mode: ^ and $ match begin/end of line
386 case-insensitively for the first part but case-sensitively for the second part:
391 let re = Regex::new(r"(?i)a+(?-i)b+").unwrap();
400 Multi-line mode means `^` and `$` no longer match just at the beginning/end of
425 let re = Regex::new(r"(?-u:\b).+(?-u:\b)").unwrap();
467 [[:alnum:]]    alphanumeric ([0-9A-Za-z])
468 [[:alpha:]]    alphabetic ([A-Za-z])
469 [[:ascii:]]    ASCII ([\x00-\x7F])
471 [[:cntrl:]]    control ([\x00-\x1F\x7F])
472 [[:digit:]]    digits ([0-9])
473 [[:graph:]]    graphical ([!-~])
474 [[:lower:]]    lower case ([a-z])
475 [[:print:]]    printable ([ -~])
476 [[:punct:]]    punctuation ([!-/:-@\[-`{-~])
478 [[:upper:]]    upper case ([A-Z])
479 [[:word:]]     word characters ([0-9A-Za-z_])
480 [[:xdigit:]]   hex digit ([0-9A-Fa-f])
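Most of the ASCII classes listed above have direct counterparts among the `char::is_ascii_*` predicates in the Rust standard library. The sketch below maps the class names shown in this listing onto those predicates. The function `in_ascii_class` is hypothetical and is not part of the regex crate; it is only meant to make the bracketed ranges concrete.

```rust
/// Reports whether `c` belongs to the named ASCII class, or `None` if the
/// name is not one of the classes handled here. Hypothetical helper built
/// only on standard library predicates.
fn in_ascii_class(name: &str, c: char) -> Option<bool> {
    Some(match name {
        "alnum" => c.is_ascii_alphanumeric(),  // [0-9A-Za-z]
        "alpha" => c.is_ascii_alphabetic(),    // [A-Za-z]
        "ascii" => c.is_ascii(),               // [\x00-\x7F]
        "cntrl" => c.is_ascii_control(),       // [\x00-\x1F\x7F]
        "digit" => c.is_ascii_digit(),         // [0-9]
        "graph" => c.is_ascii_graphic(),       // [!-~]
        "lower" => c.is_ascii_lowercase(),     // [a-z]
        // Printable is graphical plus the space character: [ -~]
        "print" => c == ' ' || c.is_ascii_graphic(),
        "punct" => c.is_ascii_punctuation(),   // [!-/:-@\[-`{-~]
        "upper" => c.is_ascii_uppercase(),     // [A-Z]
        // Word characters are alphanumerics plus underscore: [0-9A-Za-z_]
        "word" => c == '_' || c.is_ascii_alphanumeric(),
        "xdigit" => c.is_ascii_hexdigit(),     // [0-9A-Fa-f]
        _ => return None,
    })
}
```

Note that `_` falls in both `punct` and `word`, and that a space is in `print` but not in `graph`, exactly as the ranges in the table imply.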
499 `unicode-case` feature (described below), then compiling the regex `(?i)a`
501 callers must use `(?i-u)a` instead to disable Unicode case folding. Stated
510 * **std** -
513   intended to add `alloc`-only support to regex in the future.
517 * **perf** -
521 * **perf-dfa** -
523   portions of a regex to a very fast DFA on an as-needed basis. This can
527 * **perf-inline** -
531 * **perf-literal** -
534   magnitude. Disabling this drops the `aho-corasick` and `memchr` dependencies.
535 * **perf-cache** -
543 * **unicode** -
546 * **unicode-age** -
548   [Unicode `Age` property](https://www.unicode.org/reports/tr44/tr44-24.html#Character_Age).
551 * **unicode-bool** -
555 * **unicode-case** -
558 * **unicode-gencat** -
560 …[Unicode general categories](https://www.unicode.org/reports/tr44/tr44-24.html#General_Category_Va…
563 * **unicode-perl** -
564   Provide the data for supporting the Unicode-aware Perl character classes,
566   Unicode-aware word boundary assertions. Note that if this feature is
568   `unicode-bool` and `unicode-gencat` features are enabled, respectively.
569 * **unicode-script** -
574 * **unicode-segment** -
594 text`), which means there's no way to cause exponential blow-up like with
596 features like arbitrary look-ahead and backreferences.)
598 When a DFA is used, pathological cases with exponential state blow-up are
619 // TODO: Re-enable this once the MSRV is 1.43 or greater.
620 // See: https://github.com/rust-lang/regex/issues/684
621 // See: https://github.com/rust-lang/regex/issues/685
644 top-level of this crate. There are two important differences:
649 matching invalid UTF-8 bytes.
653 This shows how to find all null-terminated strings in a slice of bytes:
657 let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
671 This shows how to match an arbitrary byte pattern followed by a UTF-8 encoded
678     r"(?-u)\x7b\xa9(?:[\x80-\xfe]|[\x40-\xff].)(?u:(.*))"
683 // Notice that despite the `.*` at the end, it will only match valid UTF-8
689 // If there was a match, Unicode mode guarantees that `title` is valid UTF-8.
696 UTF-8.
705 match invalid UTF-8. When the `u` flag is disabled, the regex is said to be in
717 matches its UTF-8 encoding of `\xC3\xBF`. Similarly for octal notation when
737 #[cfg(feature = "perf-dfa")]
759 /// testing different matching engines and supporting the `regex-debug` CLI