regex-syntax
============
This crate provides a robust regular expression parser.

[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions)
[![Crates.io](https://img.shields.io/crates/v/regex-syntax.svg)](https://crates.io/crates/regex-syntax)
[![Rust](https://img.shields.io/badge/rust-1.28.0%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)


### Documentation

https://docs.rs/regex-syntax


### Overview

There are two primary types exported by this crate: `Ast` and `Hir`. The former
is a faithful abstract syntax of a regular expression, and can convert regular
expressions back to their concrete syntax while mostly preserving its original
form. The latter type is a high level intermediate representation of a regular
expression that is amenable to analysis and compilation into byte codes or
automata. An `Hir` achieves this by drastically simplifying the syntactic
structure of the regular expression. While an `Hir` can be converted back to
its equivalent concrete syntax, the result is unlikely to resemble the original
concrete syntax that produced the `Hir`.


### Example

This example shows how to parse a pattern string into its HIR:

```rust
use regex_syntax::Parser;
use regex_syntax::hir::{self, Hir};

let hir = Parser::new().parse("a|b").unwrap();
assert_eq!(hir, Hir::alternation(vec![
    Hir::literal(hir::Literal::Unicode('a')),
    Hir::literal(hir::Literal::Unicode('b')),
]));
```


### Safety

This crate has no `unsafe` code and sets `forbid(unsafe_code)`. While it's
possible this crate could use `unsafe` code in the future, the standard
for doing so is extremely high. In general, most code in this crate is not
performance critical, since it tends to be dwarfed by the time it takes to
compile a regular expression into an automaton.
Therefore, there is little need
for extreme optimization, and so little need for `unsafe`.

The standard for using `unsafe` in this crate is extremely high because this
crate is intended to be reasonably safe to use with user supplied regular
expressions. Therefore, while there may be bugs in the regex parser itself,
they should _never_ result in memory unsafety unless there is either a bug
in the compiler or the standard library, since `regex-syntax` has zero
dependencies.


### Crate features

By default, this crate bundles a fairly large amount of Unicode data tables
(a source size of ~750KB). Because of their large size, one can disable some
or all of these data tables. If a regular expression attempts to use Unicode
data that is not available, then an error will occur when translating the `Ast`
to the `Hir`.

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex-syntax/*/#crate-features).


### Testing

Simply running `cargo test` will give you very good coverage. However, because
of the large number of features exposed by this crate, a `test` script is
included in this directory which will test several feature combinations. This
is the same script that is run in CI.


### Motivation

The primary purpose of this crate is to provide the parser used by `regex`.
Specifically, this crate is treated as an implementation detail of `regex`,
and is primarily developed for the needs of `regex`.

Since this crate is an implementation detail of `regex`, it may experience
breaking change releases at a different cadence from `regex`. This is only
possible because this crate is _not_ a public dependency of `regex`.

Another consequence of this de-coupling is that there is no direct way to
compile a `regex::Regex` from a `regex_syntax::hir::Hir`.
Instead, one must
first convert the `Hir` to a string (via its `std::fmt::Display`) and then
compile that via `Regex::new`. While this does repeat some work, compilation
typically takes much longer than parsing.

Stated differently, the coupling between `regex` and `regex-syntax` exists only
at the level of the concrete syntax.