lib.rs - OpenGrok cross reference for /external/rust/crates/regex/src/lib.rs

Lines Matching +full:regex +full:- +full:not
3 expressions. Its syntax is similar to Perl-style regular expressions, but lacks
13 documentation for the [`Regex`](struct.Regex.html) type.
17 This crate is [on crates.io](https://crates.io/crates/regex) and can be
18 used by adding `regex` to your dependencies in your project's `Cargo.toml`.
22 regex = "1"
32 use regex::Regex;
33 let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
34 assert!(re.is_match("2014-01-01"));
43 [raw strings](https://doc.rust-lang.org/stable/reference/tokens.html#raw-string-literals)
46 not process any escape sequences. For example, `"\\d"` is the same
49 # Example: Avoid compiling the same regex in a loop
51 It is an anti-pattern to compile the same regular expression in a loop
54 regex.) Not only is compilation itself expensive, but this also prevents
66 use regex::Regex;
68 fn some_helper_function(text: &str) -> bool {
70         static ref RE: Regex = Regex::new("...").unwrap();
78 Specifically, in this example, the regex will be compiled when it is used for
84 repeatedly against a search string to find successive non-overlapping
89 # use regex::Regex;
91 let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
92 let text = "2012-03-14, 2013-01-01 and 2014-07-05";
114 # use regex::Regex;
116 let re = Regex::new(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})").unwrap();
117 let before = "2012-03-14, 2013-01-01 and 2014-07-05";
125 `Regex::replace` for more details.)
127 Note that if your regex gets complicated, you can use the `x` flag to
131 # use regex::Regex;
133 let re = Regex::new(r"(?x)
135   -
137   -
140 let before = "2012-03-14, 2013-01-01 and 2014-07-05";
149 the `x` flag, e.g., `(?-x: )`.
157 use regex::RegexSet;
173 // You can also test whether a particular regex matched:
191 not to do it if you don't need to.
199 This implementation executes regular expressions **only** on valid UTF-8
201 relax this restriction, use the [`bytes`](bytes/index.html) sub-module.)
204 case-insensitively, the characters are first mapped using the "simple" case
212 # use regex::Regex;
214 let re = Regex::new(r"(?i)Δ+").unwrap();
223 * `.` will match any valid UTF-8 encoded Unicode scalar value except for `\n`.
230 * `^` and `$` are **not** Unicode aware in multi-line mode. Namely, they only
231   recognize `\n` and not any of the other forms of line terminators defined
239 # use regex::Regex;
241 let re = Regex::new(r"[\pN\p{Greek}\p{Cherokee}]+").unwrap();
250 [UNICODE](https://github.com/rust-lang/regex/blob/master/UNICODE.md)
251 document in the root of the regex repository.
255 The `bytes` sub-module provides a `Regex` type that can be used to match
256 on `&[u8]`. By default, text is interpreted as UTF-8 just like it is with
257 the main `Regex` type. However, this behavior can be disabled by turning
258 off the `u` flag, even if doing so could result in matching invalid UTF-8.
262 Disabling the `u` flag is also possible with the standard `&str`-based `Regex`
263 type, but it is only allowed where the UTF-8 invariant is maintained. For
264 example, `(?-u:\w)` is an ASCII-only `\w` character class and is legal in an
265 `&str`-based `Regex`, but `(?-u:\xFF)` will attempt to match the raw byte
266 `\xFF`, which is invalid UTF-8 and therefore is illegal in `&str`-based
273 features](#crate-features).
280 a separate crate, [`regex-syntax`](https://docs.rs/regex-syntax).
287 \D            not digit
288 \pN           One-letter name Unicode character class
290 \PN           Negated one-letter name Unicode character class
299 [a-z]         A character class matching any character in range a-z.
300 [[:alpha:]]   ASCII character class ([A-Za-z])
301 [[:^alpha:]]  Negated ASCII character class ([^A-Za-z])
303 [a-y&&xyz]    Intersection (matching x or y)
304 [0-9&&[^4]]   Subtraction using intersection and negation (matching 0-9 except 4)
305 [0-9--4]      Direct subtraction (matching 0-9 except 4)
306 [a-g~~b-h]    Symmetric difference (matching `a` and `h` only)
316 1. Ranges: `a-cd` == `[a-c]d`
318 3. Intersection: `^a-z&&b` == `^[a-z&&b]`
348 ^     the beginning of text (or start-of-line with multi-line mode)
349 $     the end of text (or end-of-line with multi-line mode)
350 \A    only the beginning of text (even with multi-line mode enabled)
351 \z    only the end of text (even with multi-line mode enabled)
353 \B    not a Unicode word boundary
356 The empty regex is valid and matches the empty string. For example, the empty
357 regex matches `abc` at positions `0`, `1`, `2` and `3`.
363 (?P<name>exp)  named (also numbered) capture group (allowed chars: [_0-9a-zA-Z.\[\]])
364 (?:exp)        non-capturing group
366 (?flags:exp)   set flags for exp (non-capturing)
370 and `(?-x)` clears the flag `x`. Multiple flags can be set or cleared at
371 the same time: `(?xy)` sets both the `x` and `y` flags and `(?x-y)` sets
377 i     case-insensitive: letters match both upper and lower case
378 m     multi-line mode: ^ and $ match begin/end of line
386 case-insensitively for the first part but case-sensitively for the second part:
389 # use regex::Regex;
391 let re = Regex::new(r"(?i)a+(?-i)b+").unwrap();
400 Multi-line mode means `^` and `$` no longer match just at the beginning/end of
404 # use regex::Regex;
405 let re = Regex::new(r"(?m)^line \d+").unwrap();
413 # use regex::Regex;
414 let re = Regex::new(r"(?m)^").unwrap();
423 # use regex::Regex;
425 let re = Regex::new(r"(?-u:\b).+(?-u:\b)").unwrap();
457 \D     not digit
459 \S     not whitespace
461 \W     not word character
467 [[:alnum:]]    alphanumeric ([0-9A-Za-z])
468 [[:alpha:]]    alphabetic ([A-Za-z])
469 [[:ascii:]]    ASCII ([\x00-\x7F])
471 [[:cntrl:]]    control ([\x00-\x1F\x7F])
472 [[:digit:]]    digits ([0-9])
473 [[:graph:]]    graphical ([!-~])
474 [[:lower:]]    lower case ([a-z])
475 [[:print:]]    printable ([ -~])
476 [[:punct:]]    punctuation ([!-/:-@\[-`{-~])
478 [[:upper:]]    upper case ([A-Z])
479 [[:word:]]     word characters ([0-9A-Za-z_])
480 [[:xdigit:]]   hex digit ([0-9A-Fa-f])
485 By default, this crate tries pretty hard to make regex matching both as fast
489 and longer compile times.  This trade off may not be appropriate in all cases,
491 is still left with a perfectly serviceable regex engine that will work well
499 `unicode-case` feature (described below), then compiling the regex `(?i)a`
501 callers must use `(?i-u)a` instead to disable Unicode case folding. Stated
510 * **std** -
511   When enabled, this will cause `regex` to use the standard library. Currently,
513   intended to add `alloc`-only support to regex in the future.
517 * **perf** -
521 * **perf-dfa** -
523   portions of a regex to a very fast DFA on an as-needed basis. This can
525   haystacks. The lazy DFA does not bring in any new dependencies, but it can
527 * **perf-inline** -
531 * **perf-literal** -
534   magnitude. Disabling this drops the `aho-corasick` and `memchr` dependencies.
535 * **perf-cache** -
543 * **unicode** -
546 * **unicode-age** -
548   [Unicode `Age` property](https://www.unicode.org/reports/tr44/tr44-24.html#Character_Age).
551 * **unicode-bool** -
553   is not included here, but contains properties like `Alphabetic`, `Emoji`,
555 * **unicode-case** -
558 * **unicode-gencat** -
560 …[Unicode general categories](https://www.unicode.org/reports/tr44/tr44-24.html#General_Category_Va…
561   This includes, but is not limited to, `Decimal_Number`, `Letter`,
563 * **unicode-perl** -
564   Provide the data for supporting the Unicode-aware Perl character classes,
566   Unicode-aware word boundary assertions. Note that if this feature is
568   `unicode-bool` and `unicode-gencat` features are enabled, respectively.
569 * **unicode-script** -
572   This includes, but is not limited to, `Arabic`, `Cyrillic`, `Hebrew`,
574 * **unicode-segment** -
593 crate have time complexity `O(mn)` (with `m ~ regex` and `n ~ search
594 text`), which means there's no way to cause exponential blow-up like with
596 features like arbitrary look-ahead and backreferences.)
598 When a DFA is used, pathological cases with exponential state blow-up are
615 #[cfg(not(feature = "std"))]
619 // TODO: Re-enable this once the MSRV is 1.43 or greater.
620 // See: https://github.com/rust-lang/regex/issues/684
621 // See: https://github.com/rust-lang/regex/issues/685
636     Locations, Match, Matches, NoExpand, Regex, Replacer, ReplacerRef, Split,
644 top-level of this crate. There are two important differences:
649 matching invalid UTF-8 bytes.
653 This shows how to find all null-terminated strings in a slice of bytes:
656 # use regex::bytes::Regex;
657 let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
671 This shows how to match an arbitrary byte pattern followed by a UTF-8 encoded
676 # use regex::bytes::Regex;
677 let re = Regex::new(
678     r"(?-u)\x7b\xa9(?:[\x80-\xfe]|[\x40-\xff].)(?u:(.*))"
683 // Notice that despite the `.*` at the end, it will only match valid UTF-8
689 // If there was a match, Unicode mode guarantees that `title` is valid UTF-8.
696 UTF-8.
704 1. The `u` flag can be disabled even when disabling it might cause the regex to
705 match invalid UTF-8. When the `u` flag is disabled, the regex is said to be in
713 determine whether a byte is a word byte or not.
717 matches its UTF-8 encoding of `\xC3\xBF`. Similarly for octal notation when
737 #[cfg(feature = "perf-dfa")]
759 /// testing different matching engines and supporting the `regex-debug` CLI