lib.rs - OpenGrok cross reference for /third_party/rust/crates/regex/src/lib.rs

Lines Matching +full:crate +full:- +full:example
2 This crate provides a library for parsing, compiling, and executing regular
3 expressions. Its syntax is similar to Perl-style regular expressions, but lacks
8 This crate's documentation provides some simple examples, describes
17 This crate is [on crates.io](https://crates.io/crates/regex) and can be
25 # Example: find a date
28 expression and then using it to search, split or replace text. For example,
33 let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
34 assert!(re.is_match("2014-01-01"));
37 Notice the use of the `^` and `$` anchors. In this crate, every expression
42 This example also demonstrates the utility of
43 [raw strings](https://doc.rust-lang.org/stable/reference/tokens.html#raw-string-literals)
46 not process any escape sequences. For example, `"\\d"` is the same
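The raw-string remark above can be checked with the standard library alone. A raw literal `r"\d"` and the escaped literal `"\\d"` both produce the same two-character text (a backslash followed by `d`), and that text is what a regex parser receives. This is a minimal sketch; the helper name `digit_pattern_forms` is hypothetical and is not part of the regex API.

```rust
/// Returns the pattern text for "one decimal digit", written both ways.
/// Hypothetical helper, for illustration only.
pub fn digit_pattern_forms() -> (&'static str, &'static str) {
    // Raw string: the compiler does not process escapes, so `\d` stays as-is.
    let raw = r"\d";
    // Ordinary string: `\\` is the escape for a single backslash.
    let escaped = "\\d";
    (raw, escaped)
}
```

Writing `"\d"` without the raw prefix does not compile, because Rust rejects `\d` as an unknown character escape.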
49 # Example: Avoid compiling the same regex in a loop
51 It is an anti-pattern to compile the same regular expression in a loop
59 [`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
62 For example:
68 fn some_helper_function(text: &str) -> bool {
78 Specifically, in this example, the regex will be compiled when it is used for
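The `lazy_static` advice above amounts to "build once, reuse on every call". The sketch below shows the same idea using only the standard library. It swaps in `std::sync::OnceLock` (stable since Rust 1.70) for `lazy_static`. `expensive_compile` and `Matcher` are hypothetical stand-ins for `Regex::new` and `Regex`, and the stand-in only performs a substring test. A call counter makes the single "compilation" observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the hypothetical compile step has run.
pub static COMPILE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical compiled matcher; it only checks for a fixed substring.
pub struct Matcher {
    needle: String,
}

impl Matcher {
    pub fn is_match(&self, text: &str) -> bool {
        text.contains(&self.needle)
    }
}

/// Hypothetical stand-in for an expensive `Regex::new(...)` call.
fn expensive_compile(needle: &str) -> Matcher {
    COMPILE_CALLS.fetch_add(1, Ordering::SeqCst);
    Matcher { needle: needle.to_owned() }
}

/// Returns the shared matcher, building it on first use only.
fn matcher() -> &'static Matcher {
    static CELL: OnceLock<Matcher> = OnceLock::new();
    CELL.get_or_init(|| expensive_compile("2014"))
}

/// Same shape as `some_helper_function` above: it can sit in a hot loop,
/// but the matcher is compiled at most once per process.
pub fn some_helper_function(text: &str) -> bool {
    matcher().is_match(text)
}
```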
81 # Example: iterating over capture groups
83 This crate provides convenient iterators for matching an expression
84 repeatedly against a search string to find successive non-overlapping
85 matches. For example, to find all dates in a string and be able to access
91 let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
92 let text = "2012-03-14, 2013-01-01 and 2014-07-05";
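To illustrate what "successive non-overlapping matches" means for `(\d{4})-(\d{2})-(\d{2})`, here is a hand-rolled, std-only sketch. It does not use the crate's `captures_iter`. It scans left to right, yields (year, month, day) for each match, and resumes just after the end of each match. One assumption: only ASCII digits count here, whereas the crate's `\d` is Unicode-aware by default. `find_dates` is a hypothetical name.

```rust
/// Returns (year, month, day) for each non-overlapping `dddd-dd-dd` match.
/// ASCII digits only (an assumption; the crate's `\d` is Unicode-aware).
pub fn find_dates(text: &str) -> Vec<(&str, &str, &str)> {
    let bytes = text.as_bytes();
    let mut dates = Vec::new();
    let mut i = 0;
    while i + 10 <= bytes.len() {
        let window = &bytes[i..i + 10];
        // Positions 4 and 7 must be '-', every other position a digit.
        let is_date = window.iter().enumerate().all(|(k, &b)| {
            if k == 4 || k == 7 { b == b'-' } else { b.is_ascii_digit() }
        });
        if is_date {
            // All ten bytes are ASCII, so these offsets are char boundaries.
            dates.push((&text[i..i + 4], &text[i + 5..i + 7], &text[i + 8..i + 10]));
            // Resume after the match so results never overlap.
            i += 10;
        } else {
            i += 1;
        }
    }
    dates
}
```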
106 # Example: replacement with named capture groups
108 Building on the previous example, perhaps we'd like to rearrange the date
116 let re = Regex::new(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})").unwrap();
117 let before = "2012-03-14, 2013-01-01 and 2014-07-05";
135   -
137   -
140 let before = "2012-03-14, 2013-01-01 and 2014-07-05";
149 the `x` flag, e.g., `(?-x: )`.
151 # Example: match multiple regular expressions simultaneously
188 Generally speaking, this crate could provide a function to answer only #3,
193 Therefore, only use what you need. For example, don't use `find` if you
199 This implementation executes regular expressions **only** on valid UTF-8
201 relax this restriction, use the [`bytes`](bytes/index.html) sub-module.)
204 case-insensitively, the characters are first mapped using the "simple" case
220 Most features of the regular expressions in this crate are Unicode aware. Here
223 * `.` will match any valid UTF-8 encoded Unicode scalar value except for `\n`.
225 * `\w`, `\d` and `\s` are Unicode aware. For example, `\s` will match all forms
230 * `^` and `$` are **not** Unicode aware in multi-line mode. Namely, they only
235 of boolean properties are available as character classes. For example, you can
250 [UNICODE](https://github.com/rust-lang/regex/blob/master/UNICODE.md)
255 The `bytes` sub-module provides a `Regex` type that can be used to match
256 on `&[u8]`. By default, text is interpreted as UTF-8 just like it is with
258 off the `u` flag, even if doing so could result in matching invalid UTF-8.
259 For example, when the `u` flag is disabled, `.` will match any byte instead
262 Disabling the `u` flag is also possible with the standard `&str`-based `Regex`
263 type, but it is only allowed where the UTF-8 invariant is maintained. For
264 example, `(?-u:\w)` is an ASCII-only `\w` character class and is legal in an
265 `&str`-based `Regex`, but `(?-u:\xFF)` will attempt to match the raw byte
266 `\xFF`, which is invalid UTF-8 and therefore is illegal in `&str`-based
270 tables, this crate exposes knobs to disable the compilation of those
272 compilation times. For details on how to do that, see the section on [crate
273 features](#crate-features).
277 The syntax supported in this crate is documented below.
280 a separate crate, [`regex-syntax`](https://docs.rs/regex-syntax).
288 \pN           One-letter name Unicode character class
290 \PN           Negated one-letter name Unicode character class
299 [a-z]         A character class matching any character in range a-z.
300 [[:alpha:]]   ASCII character class ([A-Za-z])
301 [[:^alpha:]]  Negated ASCII character class ([^A-Za-z])
303 [a-y&&xyz]    Intersection (matching x or y)
304 [0-9&&[^4]]   Subtraction using intersection and negation (matching 0-9 except 4)
305 [0-9--4]      Direct subtraction (matching 0-9 except 4)
306 [a-g~~b-h]    Symmetric difference (matching `a` and `h` only)
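The class operators in the rows above mean the same thing as ordinary set operations on the characters each class matches. The std-only sketch below models each class as a `BTreeSet<char>` and reproduces the three worked results: `xy`, `0-9` without `4`, and `a`/`h`. It models the semantics only and makes no claim about how the crate represents classes internally. The function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// The set of chars in an inclusive range, like the class `[lo-hi]`.
fn range(lo: char, hi: char) -> BTreeSet<char> {
    (lo..=hi).collect()
}

/// Intersection, as in `[a-y&&xyz]`.
pub fn intersection_example() -> BTreeSet<char> {
    let xyz: BTreeSet<char> = "xyz".chars().collect();
    range('a', 'y').intersection(&xyz).copied().collect()
}

/// Direct subtraction, as in `[0-9--4]`.
pub fn subtraction_example() -> BTreeSet<char> {
    range('0', '9').difference(&range('4', '4')).copied().collect()
}

/// Symmetric difference, as in `[a-g~~b-h]`: chars in exactly one operand.
pub fn symmetric_difference_example() -> BTreeSet<char> {
    range('a', 'g').symmetric_difference(&range('b', 'h')).copied().collect()
}
```

`[0-9&&[^4]]` gives the same set as `subtraction_example`, because intersecting with a negated class removes exactly the negated members.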
311 class. For example, `[\p{Greek}[:digit:]]` matches any Greek or ASCII
316 1. Ranges: `a-cd` == `[a-c]d`
318 3. Intersection: `^a-z&&b` == `^[a-z&&b]`
348 ^     the beginning of text (or start-of-line with multi-line mode)
349 $     the end of text (or end-of-line with multi-line mode)
350 \A    only the beginning of text (even with multi-line mode enabled)
351 \z    only the end of text (even with multi-line mode enabled)
356 The empty regex is valid and matches the empty string. For example, the empty
363 (?P<name>exp)  named (also numbered) capture group (allowed chars: [_0-9a-zA-Z.\[\]])
364 (?:exp)        non-capturing group
366 (?flags:exp)   set flags for exp (non-capturing)
369 Flags are each a single character. For example, `(?x)` sets the flag `x`
370 and `(?-x)` clears the flag `x`. Multiple flags can be set or cleared at
371 the same time: `(?xy)` sets both the `x` and `y` flags and `(?x-y)` sets
377 i     case-insensitive: letters match both upper and lower case
378 m     multi-line mode: ^ and $ match begin/end of line
385 Flags can be toggled within a pattern. Here's an example that matches
386 case-insensitively for the first part but case-sensitively for the second part:
391 let re = Regex::new(r"(?i)a+(?-i)b+").unwrap();
400 Multi-line mode means `^` and `$` no longer match just at the beginning/end of
419 Here is an example that uses an ASCII word boundary instead of a Unicode
425 let re = Regex::new(r"(?-u:\b).+(?-u:\b)").unwrap();
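An ASCII word boundary, as written with `(?-u:\b)`, is a position where exactly one of the two neighbouring bytes is an ASCII word character `[0-9A-Za-z_]`. The edges of the text count as non-word. The std-only sketch below lists those byte offsets. `ascii_word_boundaries` is a hypothetical helper, not part of the regex API.

```rust
/// ASCII word bytes are exactly `[0-9A-Za-z_]`.
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Byte offsets where an ASCII word boundary holds: exactly one neighbouring
/// byte is an ASCII word byte (the text edges count as non-word).
pub fn ascii_word_boundaries(text: &str) -> Vec<usize> {
    let bytes = text.as_bytes();
    (0..=bytes.len())
        .filter(|&i| {
            let before = i > 0 && is_ascii_word_byte(bytes[i - 1]);
            let after = i < bytes.len() && is_ascii_word_byte(bytes[i]);
            before != after
        })
        .collect()
}
```

For `"$$abc$$"` this returns `[2, 5]`, which brackets `abc`. The bytes of a non-ASCII letter such as `δ` never count as word bytes here. The Unicode-aware `\b`, by contrast, treats `δ` as a word character.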
467 [[:alnum:]]    alphanumeric ([0-9A-Za-z])
468 [[:alpha:]]    alphabetic ([A-Za-z])
469 [[:ascii:]]    ASCII ([\x00-\x7F])
471 [[:cntrl:]]    control ([\x00-\x1F\x7F])
472 [[:digit:]]    digits ([0-9])
473 [[:graph:]]    graphical ([!-~])
474 [[:lower:]]    lower case ([a-z])
475 [[:print:]]    printable ([ -~])
476 [[:punct:]]    punctuation ([!-/:-@\[-`{-~])
478 [[:upper:]]    upper case ([A-Z])
479 [[:word:]]     word characters ([0-9A-Za-z_])
480 [[:xdigit:]]   hex digit ([0-9A-Fa-f])
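Each ASCII class in the table above has the same range as one of Rust's `char::is_ascii_*` methods, or a simple combination of them. The std-only sketch below uses this to test a character against a class given by name. The class names come from the table. The function itself is hypothetical, and classes whose table rows are not shown in this listing return `None`.

```rust
/// Tests `c` against the ASCII class written `[[:name:]]`.
/// Returns `None` for names this sketch does not cover.
pub fn posix_class_matches(name: &str, c: char) -> Option<bool> {
    let m = match name {
        "alnum" => c.is_ascii_alphanumeric(),      // [0-9A-Za-z]
        "alpha" => c.is_ascii_alphabetic(),        // [A-Za-z]
        "ascii" => c.is_ascii(),                   // [\x00-\x7F]
        "cntrl" => c.is_ascii_control(),           // [\x00-\x1F\x7F]
        "digit" => c.is_ascii_digit(),             // [0-9]
        "graph" => c.is_ascii_graphic(),           // [!-~]
        "lower" => c.is_ascii_lowercase(),         // [a-z]
        "print" => c.is_ascii_graphic() || c == ' ', // [ -~]
        "punct" => c.is_ascii_punctuation(),       // [!-/:-@\[-`{-~]
        "upper" => c.is_ascii_uppercase(),         // [A-Z]
        "word" => c.is_ascii_alphanumeric() || c == '_', // [0-9A-Za-z_]
        "xdigit" => c.is_ascii_hexdigit(),         // [0-9A-Fa-f]
        _ => return None,
    };
    Some(m)
}
```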
483 # Crate features
485 By default, this crate tries pretty hard to make regex matching both as fast
494 This crate exposes a number of features for controlling that trade-off. Some
498 data, can result in a loss of functionality. For example, if one disables the
499 `unicode-case` feature (described below), then compiling the regex `(?i)a`
501 callers must use `(?i-u)a` instead to disable Unicode case folding. Stated
510 * **std** -
513   intended to add `alloc`-only support to regex in the future.
517 * **perf** -
521 * **perf-dfa** -
523   portions of a regex to a very fast DFA on an as-needed basis. This can
527 * **perf-inline** -
531 * **perf-literal** -
534   magnitude. Disabling this drops the `aho-corasick` and `memchr` dependencies.
535 * **perf-cache** -
543 * **unicode** -
546 * **unicode-age** -
548   [Unicode `Age` property](https://www.unicode.org/reports/tr44/tr44-24.html#Character_Age).
551 * **unicode-bool** -
555 * **unicode-case** -
558 * **unicode-gencat** -
560 …[Unicode general categories](https://www.unicode.org/reports/tr44/tr44-24.html#General_Category_Va…
563 * **unicode-perl** -
564   Provide the data for supporting the Unicode-aware Perl character classes,
566   Unicode-aware word boundary assertions. Note that if this feature is
568   `unicode-bool` and `unicode-gencat` features are enabled, respectively.
569 * **unicode-script** -
574 * **unicode-segment** -
583 This crate can handle both untrusted regular expressions and untrusted
593 crate have time complexity `O(mn)` (with `m ~ regex` and `n ~ search
594 text`), which means there's no way to cause exponential blow-up like with
596 features like arbitrary look-ahead and backreferences.)
598 When a DFA is used, pathological cases with exponential state blow-up are
616 compile_error!("`std` feature is currently required to build this crate");
618 // To check README's example
619 // TODO: Re-enable this once the MSRV is 1.43 or greater.
620 // See: https://github.com/rust-lang/regex/issues/684
621 // See: https://github.com/rust-lang/regex/issues/685
626 pub use crate::error::Error;
628 pub use crate::re_builder::set_unicode::*;
630 pub use crate::re_builder::unicode::*;
632 pub use crate::re_set::unicode::*;
634 pub use crate::re_unicode::{
644 top-level of this crate. There are two important differences:
649 matching invalid UTF-8 bytes.
651 # Example: match null terminated string
653 This shows how to find all null-terminated strings in a slice of bytes:
657 let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
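The byte pattern `(?-u)(?P<cstr>[^\x00]+)\x00` matches each non-empty run of non-NUL bytes that is immediately followed by a NUL. A std-only sketch can get the same results by splitting on NUL. It drops the last piece, which is never followed by a NUL, and then drops empty pieces, because `+` needs at least one byte. This is an equivalent hand-written approach rather than the crate's `bytes::Regex`, and `null_terminated_strings` is a hypothetical name.

```rust
/// Returns each non-empty run of non-NUL bytes that is followed by a NUL,
/// matching what `(?-u)(?P<cstr>[^\x00]+)\x00` would capture, in order.
pub fn null_terminated_strings(bytes: &[u8]) -> Vec<&[u8]> {
    let mut pieces: Vec<&[u8]> = bytes.split(|&b| b == 0).collect();
    // The final piece is never followed by a NUL, so it can never match.
    pieces.pop();
    // `+` requires at least one byte, so runs like "\0\0" yield nothing.
    pieces.into_iter().filter(|p| !p.is_empty()).collect()
}
```

Invalid UTF-8 such as `\xFF` passes through unchanged, which is the reason for working on `&[u8]` here.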
669 # Example: selectively enable Unicode support
671 This shows how to match an arbitrary byte pattern followed by a UTF-8 encoded
678     r"(?-u)\x7b\xa9(?:[\x80-\xfe]|[\x40-\xff].)(?u:(.*))"
683 // Notice that despite the `.*` at the end, it will only match valid UTF-8
689 // If there was a match, Unicode mode guarantees that `title` is valid UTF-8.
696 UTF-8.
705 match invalid UTF-8. When the `u` flag is disabled, the regex is said to be in
715 Unicode codepoints. For example, in ASCII compatible mode, `\xFF` matches the
717 matches its UTF-8 encoding of `\xC3\xBF`. Similarly for octal notation when
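The two readings of `\xFF` can be confirmed with the standard library. In Unicode mode, `\xFF` names the codepoint U+00FF, and its UTF-8 encoding is the two bytes `C3 BF`. In ASCII-compatible mode, `\xFF` is the single byte `FF`, and a lone `FF` byte is never valid UTF-8. That is why `(?-u:\xFF)` is rejected in a `&str`-based `Regex`. The function names below are hypothetical.

```rust
/// UTF-8 bytes of the codepoint U+00FF, i.e. what Unicode-mode `\xFF` matches.
pub fn unicode_mode_bytes() -> Vec<u8> {
    let mut buf = [0u8; 4];
    '\u{FF}'.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Whether the single raw byte 0xFF (ASCII-compatible `\xFF`) is valid UTF-8.
pub fn raw_ff_is_valid_utf8() -> bool {
    std::str::from_utf8(&[0xFF]).is_ok()
}
```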
729     pub use crate::re_builder::bytes::*;
730     pub use crate::re_builder::set_bytes::*;
731     pub use crate::re_bytes::*;
732     pub use crate::re_set::bytes::*;
737 #[cfg(feature = "perf-dfa")]
759 /// testing different matching engines and supporting the `regex-debug` CLI
764     pub use crate::compile::Compiler;
765     pub use crate::exec::{Exec, ExecBuilder};
766     pub use crate::input::{Char, CharInput, Input, InputAt};
767     pub use crate::literal::LiteralSearcher;
768     pub use crate::prog::{EmptyLook, Inst, InstRanges, Program};