Name | Date | Size | #Lines | LOC
---|---|---|---|---
benches/ | 03-May-2024 | - | 65 | 55
src/ | 03-May-2024 | - | 50,544 | 45,384
.cargo_vcs_info.json | 03-May-2024 | 74 | 6 | 5
Android.bp | 03-May-2024 | 2.6 KiB | 92 | 85
Cargo.toml | 03-May-2024 | 1.1 KiB | 33 | 30
Cargo.toml.orig | 03-May-2024 | 777 | 33 | 30
LICENSE | 03-May-2024 | 10.6 KiB | 202 | 169
LICENSE-APACHE | 03-May-2024 | 10.6 KiB | 202 | 169
LICENSE-MIT | 03-May-2024 | 1 KiB | 26 | 22
METADATA | 03-May-2024 | 385 | 20 | 19
MODULE_LICENSE_APACHE2 | 03-May-2024 | 0 | |
OWNERS | 03-May-2024 | 40 | 2 | 1
README.md | 03-May-2024 | 4 KiB | 100 | 70
TEST_MAPPING | 03-May-2024 | 319 | 18 | 17
test | 03-May-2024 | 408 | 21 | 16

README.md
regex-syntax
============
This crate provides a robust regular expression parser.

[Travis CI](https://travis-ci.com/rust-lang/regex)
[AppVeyor](https://ci.appveyor.com/project/rust-lang-libs/regex)
[crates.io](https://crates.io/crates/regex-syntax)
[GitHub](https://github.com/rust-lang/regex)


### Documentation

https://docs.rs/regex-syntax


### Overview

There are two primary types exported by this crate: `Ast` and `Hir`. The former
is a faithful abstract syntax of a regular expression, and can convert regular
expressions back to their concrete syntax while mostly preserving their original
form. The latter type is a high level intermediate representation of a regular
expression that is amenable to analysis and compilation into byte codes or
automata. An `Hir` achieves this by drastically simplifying the syntactic
structure of the regular expression. While an `Hir` can be converted back to
its equivalent concrete syntax, the result is unlikely to resemble the original
concrete syntax that produced the `Hir`.


### Example

This example shows how to parse a pattern string into its HIR:

```rust
use regex_syntax::Parser;
use regex_syntax::hir::{self, Hir};

// Parse the pattern directly into its high level intermediate representation.
let hir = Parser::new().parse("a|b").unwrap();
// The result is an alternation of the two Unicode literals `a` and `b`.
assert_eq!(hir, Hir::alternation(vec![
    Hir::literal(hir::Literal::Unicode('a')),
    Hir::literal(hir::Literal::Unicode('b')),
]));
```


### Safety

This crate has no `unsafe` code and sets `forbid(unsafe_code)`. While it's
possible this crate could use `unsafe` code in the future, the standard
for doing so is extremely high. In general, most code in this crate is not
performance critical, since it tends to be dwarfed by the time it takes to
compile a regular expression into an automaton. Therefore, there is little need
for extreme optimization, and thus little need for `unsafe`.

The standard for using `unsafe` in this crate is extremely high because this
crate is intended to be reasonably safe to use with user supplied regular
expressions. Therefore, while there may be bugs in the regex parser itself,
they should _never_ result in memory unsafety unless there is a bug in either
the compiler or the standard library, since `regex-syntax` has zero
dependencies.


### Crate features

By default, this crate bundles a fairly large amount of Unicode data tables
(a source size of ~750KB). Because of their large size, one can disable some
or all of these data tables. If a regular expression attempts to use Unicode
data that is not available, then an error will occur when translating the `Ast`
to the `Hir`.

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex-syntax/*/#crate-features).


### Testing

Simply running `cargo test` will give you very good coverage. However, because
of the large number of features exposed by this crate, a `test` script is
included in this directory which will test several feature combinations. This
is the same script that is run in CI.


### Motivation

The primary purpose of this crate is to provide the parser used by `regex`.
Specifically, this crate is treated as an implementation detail of `regex`,
and is primarily developed for the needs of `regex`.

Since this crate is an implementation detail of `regex`, it may experience
breaking change releases at a different cadence from `regex`. This is only
possible because this crate is _not_ a public dependency of `regex`.

Another consequence of this decoupling is that there is no direct way to
compile a `regex::Regex` from a `regex_syntax::hir::Hir`. Instead, one must
first convert the `Hir` to a string (via its `std::fmt::Display`
implementation) and then compile that via `Regex::new`. While this does repeat
some work, compilation typically takes much longer than parsing.

Stated differently, the coupling between `regex` and `regex-syntax` exists only
at the level of the concrete syntax.