| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| examples/ | 03-May-2024 | - | 2,079 | 1,988 |
| src/ | 03-May-2024 | - | 13,914 | 8,356 |
| tests/ | 03-May-2024 | - | 8,064 | 5,537 |
| .cargo_vcs_info.json | 03-May-2024 | 74 | 6 | 5 |
| .gitignore | 03-May-2024 | 68 | 9 | 8 |
| Android.bp | 03-May-2024 | 2.4 KiB | 77 | 72 |
| CHANGELOG.md | 03-May-2024 | 36.6 KiB | 958 | 741 |
| Cargo.toml | 03-May-2024 | 2.9 KiB | 116 | 97 |
| Cargo.toml.orig | 03-May-2024 | 6 KiB | 194 | 162 |
| HACKING.md | 03-May-2024 | 16.5 KiB | 342 | 272 |
| LICENSE | 03-May-2024 | 10.6 KiB | 202 | 169 |
| LICENSE-APACHE | 03-May-2024 | 10.6 KiB | 202 | 169 |
| LICENSE-MIT | 03-May-2024 | 1 KiB | 26 | 22 |
| METADATA | 03-May-2024 | 469 | 20 | 19 |
| MODULE_LICENSE_APACHE2 | 03-May-2024 | 0 | | |
| OWNERS | 03-May-2024 | 40 | 2 | 1 |
| PERFORMANCE.md | 03-May-2024 | 13.3 KiB | 280 | 217 |
| README.md | 03-May-2024 | 8.1 KiB | 257 | 191 |
| TEST_MAPPING | 03-May-2024 | 257 | 15 | 14 |
| UNICODE.md | 03-May-2024 | 10.2 KiB | 260 | 203 |
| rustfmt.toml | 03-May-2024 | 44 | 3 | 2 |
| test | 03-May-2024 | 831 | 29 | 20 |

README.md
regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[GitHub Actions](https://github.com/rust-lang/regex/actions)
[crates.io](https://crates.io/crates/regex)
[GitHub repository](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regex = "1"
```

and this to your crate root (if you're using Rust 2015):

```rust
extern crate regex;
```

Here's a simple example that matches a date in YYYY-MM-DD format and prints the
year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust,ignore
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.
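
To make the UTF-8 constraint concrete, here is a minimal sketch that uses only
the standard library and does not call `regex` at all. The helper
`valid_utf8_prefix_len` and the sample bytes are hypothetical, chosen purely
for illustration. It shows that a byte slice containing `0xFF` cannot be
converted to `&str`, so it could not be passed to an API that takes `&str`
without first dropping or replacing bytes:

```rust
use std::str;

/// Hypothetical helper (not part of `regex`): returns how many leading bytes
/// of `bytes` form valid UTF-8.
fn valid_utf8_prefix_len(bytes: &[u8]) -> usize {
    match str::from_utf8(bytes) {
        // The whole slice is valid UTF-8.
        Ok(s) => s.len(),
        // Only the bytes before the first invalid sequence are valid.
        Err(e) => e.valid_up_to(),
    }
}

fn main() {
    // Assumed sample input: ASCII text with a 0xFF byte in the middle.
    // 0xFF never appears in well-formed UTF-8.
    let raw: &[u8] = b"foo\xFFbar";

    // The conversion to `&str` fails for this slice.
    assert!(str::from_utf8(raw).is_err());
    println!(
        "valid UTF-8 prefix: {} of {} bytes",
        valid_utf8_prefix_len(raw),
        raw.len()
    );
}
```

Lossy conversion (for example `String::from_utf8_lossy`) would make the input
searchable as text, but it changes the data being searched, which is usually
not what you want for binary input.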

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. By default, `.` will match any *byte* using
`regex::bytes::Regex`, while `.` will match any *UTF-8 encoded Unicode scalar
value* using the main API.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

let re = Regex::new(r"(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that the `[^\x00]+` will match any *byte* except for `NUL`. When
using the main API, `[^\x00]+` would instead match any valid UTF-8 sequence
except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate that provides a well-tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Crate features

This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or choose which of the
optimizations performed by this crate to disable.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, so you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.28.0`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).