regex
=====

A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look-around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the search text: in the worst case, a
search takes `O(m * n)` time, where `m` is proportional to the size of the
regular expression and `n` to the size of the search text. Much of the syntax
and implementation is inspired by [RE2](https://github.com/google/re2).

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

To bring this crate into your repository, either add `regex` to your
`Cargo.toml`, or run `cargo add regex`.
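
Adding the dependency by hand takes a one-line entry under `[dependencies]`. The version requirement below is only an example, chosen to match the feature configuration shown later in this README:

```toml
[dependencies]
regex = "1.3"
```

Cargo treats `"1.3"` as a caret requirement, so any compatible `1.y.z` release with `y >= 3` satisfies it.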

Here's a simple example that matches a date in YYYY-MM-DD format and extracts
the year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust,ignore
use lazy_static::lazy_static;
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.
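
On Rust 1.70 and newer, the standard library's `std::sync::OnceLock` offers the same compile-once behavior without an extra dependency. Note that this is newer than the minimum Rust version stated for this crate. The sketch below swaps in that technique rather than reflecting this crate's recommendation. To stay dependency-free, a hypothetical keyword set built by `build_keywords` stands in for a compiled `Regex`, and a counter confirms the initializer runs only once:

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the expensive initialization actually runs.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `Regex::new(...)`: builds a lookup structure that
// we only want to construct once.
fn build_keywords() -> HashSet<&'static str> {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    ["foo", "bar", "baz"].iter().copied().collect()
}

fn some_helper_function(text: &str) -> bool {
    // Initialized on the first call; every later call reuses the same value.
    static KEYWORDS: OnceLock<HashSet<&'static str>> = OnceLock::new();
    let keywords = KEYWORDS.get_or_init(build_keywords);
    text.split_whitespace().any(|word| keywords.contains(word))
}

fn main() {
    for text in ["a foo b", "nothing here", "bar"] {
        println!("{:?}: {}", text, some_helper_function(text));
    }
    // Three calls, but the initializer ran exactly once.
    assert_eq!(INIT_COUNT.load(Ordering::SeqCst), 1);
}
```

With the `regex` crate available, the static would hold a `Regex` and the initializer would call `Regex::new`, as in the `lazy_static` example.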

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.
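
The UTF-8 requirement is enforced by the standard library itself. This std-only sketch shows which byte slices can become a `&str` at all. The helper `is_valid_str` is hypothetical and not part of `regex`:

```rust
use std::str;

// Hypothetical helper: true if `bytes` can be viewed as a `&str`.
fn is_valid_str(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // "café" encoded as UTF-8 converts cleanly to a `&str`.
    let utf8: &[u8] = b"caf\xC3\xA9";
    assert_eq!(str::from_utf8(utf8), Ok("caf\u{e9}"));

    // 0xFF never appears in valid UTF-8, so these bytes have no `&str` form
    // and cannot be passed to `regex::Regex` directly.
    let raw: &[u8] = b"foo\xFFbar";
    assert!(!is_valid_str(raw));

    // A lossy conversion succeeds, but it replaces 0xFF with U+FFFD, so the
    // text being searched is no longer the original bytes.
    assert_eq!(String::from_utf8_lossy(raw), "foo\u{FFFD}bar");
}
```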

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. Unicode mode is enabled by default in both APIs, so
`.` matches any *UTF-8 encoded Unicode scalar value* except `\n`. The
difference is that `regex::bytes::Regex` also lets you disable Unicode mode in
patterns that could then match invalid UTF-8: `(?-u:.)` matches any *byte*
except `\n` when used with `regex::bytes::Regex`, but the main API rejects it.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

// `(?-u)` disables Unicode mode, so `[^\x00]` matches raw bytes.
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that, with Unicode mode disabled, `[^\x00]+` will match any
*byte* except for `NUL`, including bytes such as `\xFF` that are not valid
UTF-8. When using the main API, `[^\x00]+` would instead match any valid UTF-8
sequence except for `NUL`.
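
It matters whether a pattern consumes one *byte* or one *UTF-8 encoded Unicode scalar value* at a time as soon as the text is not ASCII, because a single scalar value can span several bytes. This standard-library-only sketch counts both units for a short string. The helper `units` is hypothetical, not part of `regex`:

```rust
// Hypothetical helper: returns (number of UTF-8 bytes, number of Unicode
// scalar values) in `s`.
fn units(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // U+00E9 ("é") is one scalar value, encoded as the two bytes C3 A9.
    let text = "\u{e9}";
    assert_eq!(text.as_bytes(), b"\xC3\xA9");

    // A pattern that matches single bytes sees two items here, while a
    // pattern that matches scalar values sees one.
    assert_eq!(units(text), (2, 1));

    // For ASCII text the two counts agree.
    assert_eq!(units("foo"), (3, 3));
}
```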

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```
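
A `RegexSet` reports *which* patterns matched rather than where they matched. To make those semantics concrete, here is a naive, standard-library-only stand-in that handles only the four literal patterns from the example above. The helper `matching_literals` is hypothetical. Unlike `RegexSet`, it rescans the text once per pattern, which is exactly the cost a single-pass set avoids:

```rust
// Hypothetical helper, not part of the `regex` API: returns the indices of
// the literal patterns that occur somewhere in `haystack`.
fn matching_literals(patterns: &[&str], haystack: &str) -> Vec<usize> {
    patterns
        .iter()
        .enumerate()
        // Unlike a `RegexSet`, this scans `haystack` once per pattern.
        .filter(|(_, pat)| haystack.contains(**pat))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // The literal patterns from the `RegexSet` example, in their original
    // order but re-indexed from 0 (so "foo" is 0 here rather than 3).
    let patterns = ["foo", "bar", "barfoo", "foobar"];
    println!("{:?}", matching_literals(&patterns, "foobar")); // prints [0, 1, 3]
}
```

Patterns 0, 1 and 3 here correspond to indices 3, 4 and 6 in the `RegexSet` example, and `"barfoo"` (index 5 there) is absent in both.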

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate, `regex-syntax`, that provides a well-tested
regular expression parser, abstract syntax and a high-level intermediate
representation for convenient analysis. It provides no facilities for
compilation or execution. This may be useful if you're implementing your own
regex engine or otherwise need to analyze the syntax of a regular expression.
It is otherwise not recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Crate features

This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or disable any of a variety of
optimizations performed by this crate.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, so you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).
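
Features can also be re-enabled selectively on top of `default-features = false`. The following sketch assumes you want the performance optimizations but only the Unicode data behind Unicode-aware `\w`, `\d` and `\s`. Check the feature names against the documentation linked above before relying on them:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# `std` is still required; `perf` turns the performance optimizations back on
# and `unicode-perl` keeps only the tables for Unicode-aware \w, \d and \s.
features = ["std", "perf", "unicode-perl"]
```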


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.41.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.
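
Under this policy, the minimum Rust version rises only in minor releases. A project that must keep building on an older compiler can therefore stay within one minor series by using a Cargo tilde requirement. This is a general Cargo technique rather than something this crate prescribes, and the `1.3` series is just an example:

```toml
[dependencies]
# Tilde requirement: accepts regex 1.3.z for any z, but never 1.4 or later.
regex = "~1.3"
```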


### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).