Name | Date | Size | #Lines | LOC
---|---|---|---|---
examples/ | 03-May-2024 | - | 2,067 | 1,982
src/ | 03-May-2024 | - | 13,264 | 7,912
tests/ | 03-May-2024 | - | 8,169 | 5,587
.cargo_vcs_info.json | 03-May-2024 | 94 | 6 | 6
.gitignore | 03-May-2024 | 68 | 9 | 8
Android.bp | 03-May-2024 | 11.5 KiB | 510 | 496
CHANGELOG.md | 03-May-2024 | 42.2 KiB | 1,117 | 853
Cargo.toml | 03-May-2024 | 3 KiB | 150 | 128
Cargo.toml.orig | 03-May-2024 | 6.1 KiB | 195 | 163
HACKING.md | 03-May-2024 | 16.5 KiB | 342 | 272
LICENSE | 03-May-2024 | 10.6 KiB | 202 | 169
LICENSE-APACHE | 03-May-2024 | 10.6 KiB | 202 | 169
LICENSE-MIT | 03-May-2024 | 1 KiB | 26 | 22
METADATA | 03-May-2024 | 707 | 24 | 22
MODULE_LICENSE_APACHE2 | 03-May-2024 | 0 | |
OWNERS | 03-May-2024 | 40 | 2 | 1
PERFORMANCE.md | 03-May-2024 | 13.3 KiB | 278 | 216
README.md | 03-May-2024 | 8.1 KiB | 247 | 184
TEST_MAPPING | 03-May-2024 | 2 KiB | 103 | 102
UNICODE.md | 03-May-2024 | 10.2 KiB | 260 | 203
cargo2android.json | 03-May-2024 | 214 | 13 | 12
rustfmt.toml | 03-May-2024 | 44 | 3 | 2
test | 03-May-2024 | 839 | 31 | 21

README.md
regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[GitHub Actions](https://github.com/rust-lang/regex/actions)
[crates.io](https://crates.io/crates/regex)
[GitHub repository](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

To bring this crate into your repository, either add `regex` to your
`Cargo.toml`, or run `cargo add regex`.

Here's a simple example that matches a date in YYYY-MM-DD format and prints the
year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust,ignore
use lazy_static::lazy_static;
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        // Compiled on first use, then shared by every later call.
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

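
To make the UTF-8 constraint concrete, here is a minimal sketch that uses only
the standard library and makes no `regex` calls. The helper name `as_text` is
hypothetical and is not part of this crate. It illustrates that a byte slice
containing a byte such as `0xFF` cannot be viewed as a `&str`, so data like
this could not be passed to an API that takes `&str`:

```rust
/// Hypothetical helper (not part of the `regex` crate): try to view raw
/// bytes as a `&str`. This only succeeds when the bytes are valid UTF-8.
fn as_text(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // ASCII is valid UTF-8, so these bytes can be treated as a `&str`.
    assert_eq!(as_text(b"foo bar"), Some("foo bar"));

    // The byte 0xFF never occurs in valid UTF-8, so no `&str` view exists.
    assert_eq!(as_text(b"foo\xFFbar"), None);
}
```
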

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. By default, `.` will match any *byte* using
`regex::bytes::Regex`, while `.` will match any *UTF-8 encoded Unicode scalar
value* using the main API.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

let re = Regex::new(r"(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that the `[^\x00]+` will match any *byte* except for `NUL`. When
using the main API, `[^\x00]+` would instead match any valid UTF-8 sequence
except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate that provides a well tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Crate features

This crate comes with several features that permit tweaking the trade off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or choose from a variety of
optimizations performed by this crate to disable.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable are
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.41.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates.
For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).