README.md - OpenGrok cross reference for /external/rust/crates/regex-automata/README.md

Lines Matching +full:to +full:- +full:regex
1 regex-automata
8 …status](https://github.com/BurntSushi/regex-automata/workflows/ci/badge.svg)](https://github.com/B…
9 [![on crates.io](https://meritbadge.herokuapp.com/regex-automata)](https://crates.io/crates/regex-a…
10 ![Minimum Supported Rust Version 1.41](https://img.shields.io/badge/rustc-1.41-green)
12 Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
17 https://docs.rs/regex-automata
22 Add this to your `Cargo.toml`:
26 regex-automata = "0.1"
29 and this to your crate root (if you're using Rust 2015):
36 ### Example: basic regex searching
38 This example shows how to compile a regex using the default configuration
39 and then use it to find matches in a byte string:
42 use regex_automata::Regex;
44 let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}").unwrap();
45 let text = b"2018-12-24 2016-10-08";
51 please see the [docs](https://docs.rs/regex-automata).
67   the DFAs to use a fixed size state identifier instead of the default `usize`.
68   You may also need to serialize both little and big endian versions of each
69   DFA. (So that's 4 DFAs in total for each regex.)
72   as you would any regex.
78 [`ucd-generate`](https://github.com/BurntSushi/ucd-generate)
79 tool will do the first step for you with its `dfa` or `regex` sub-commands.
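
The serialization advice above (a fixed-size state identifier, plus both little and big endian copies of each DFA) can be sketched with the standard library alone. This is a minimal illustration and not this crate's serialization format: the `StateId` alias, the flat `&[StateId]` transition table, and every function name below are hypothetical. Since a regex here involves two DFAs (the listing notes that a `Regex` always requires a second DFA), emitting both byte orders for each is what gives the 4 DFAs mentioned above.

```rust
/// Hypothetical fixed-size state identifier, chosen for illustration
/// (the README says the default is `usize`, whose width varies by target).
type StateId = u16;

/// Encode each state ID in a transition table as two little-endian bytes.
fn to_le_bytes(table: &[StateId]) -> Vec<u8> {
    table.iter().flat_map(|id| id.to_le_bytes()).collect()
}

/// Encode each state ID in a transition table as two big-endian bytes.
fn to_be_bytes(table: &[StateId]) -> Vec<u8> {
    table.iter().flat_map(|id| id.to_be_bytes()).collect()
}

/// Decode a table using the byte order of the machine running this code.
fn from_native_bytes(bytes: &[u8]) -> Vec<StateId> {
    bytes
        .chunks_exact(2)
        .map(|c| StateId::from_ne_bytes([c[0], c[1]]))
        .collect()
}

/// Pick whichever serialized copy matches the compilation target.
fn pick_for_target<'a>(le: &'a [u8], be: &'a [u8]) -> &'a [u8] {
    if cfg!(target_endian = "little") {
        le
    } else {
        be
    }
}
```

Under these assumptions, a build script would write out both encodings, and the program would embed only the copy selected by `pick_for_target`, so the bytes can be reinterpreted without any per-element conversion at load time.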
84 * `std` - **Enabled** by default. This enables the ability to compile finite
85   automata. This requires the `regex-syntax` dependency. Without this feature
88 * `transducer` - **Disabled** by default. This provides implementations of the
90   automata generated by this crate to search finite state transducers. This
94 ### Differences with the regex crate
96 The main goal of the [`regex`](https://docs.rs/regex) crate is to serve as a
97 general purpose regular expression engine. It aims to automatically balance low
106   of the regex pattern. While most patterns do not exhibit worst case
110   option to return an error if the DFA gets too big.)
111 * This crate does not support sub-match extraction, which can be achieved with
112   the regex crate's "captures" API. This may be added in the future, but is
114 * While the regex crate doesn't necessarily sport fast compilation times, the
115   regexes in this crate are almost universally slow to compile, especially when
118   almost 5MB of memory! (Compiling a sparse regex takes about the same time
119   but only uses about 500KB of memory.) Conversely, compiling the same regex
120   without Unicode support, e.g., `(?-u)\w{3}`, takes under 1 millisecond and
123 * This crate does not support regex sets.
124 * This crate does not support zero-width assertions such as `^`, `$`, `\b` or
130 * There is no `&str` API like in the regex crate. In this crate, all APIs
131   operate on `&[u8]`. By default, match indices are guaranteed to fall on
132   UTF-8 boundaries, unless `RegexBuilder::allow_invalid_utf8` is enabled.
136 * Both dense and sparse DFAs can be serialized to raw bytes, and then cheaply
142   deserialize pre-compiled regexes.
143 * Since this crate builds DFAs ahead of time, it will generally out-perform
144   the `regex` crate on equivalent tasks. The performance difference is likely
145   not large. However, because of a complex set of optimizations in the regex
147   difficult to do.
148 * Sparse DFAs provide a way to build a DFA ahead of time that sacrifices search
150   even less than what the regex crate uses.
152   which enables one to do less work in some cases. For example, if you only
154   directly without building a `Regex`, which always requires a second DFA to
158   things like choosing a smaller state identifier representation, to
169   want to improve if we want to make DFA compile times faster. I *believe*
170   it's possible to potentially build minimal or nearly minimal NFAs for the
174   key adaptation I think we need to make is to modify the algorithm to operate
177 * Add support for regex sets. It should be possible to do this by "simply"
179   each match, similar to how Aho-Corasick works. I think the long pole in the
181   introduce extra overhead into the non-regex-set case without duplicating a
191   to make small fixed length look-around work? It would be really nice to
194   src/codegen.rs that is thoroughly bit-rotted. At the time, I was
200   enough. Either way, it's probably a good option to have. For one thing, it
203   you only need to compile one of them for any given arch).
206   implement something for this outside of the crate, but it would be good to
208   suspect we might want to support several variants.
210   in this crate. My original intent was to not let this crate sink down into
211   that very very very deep rabbit hole. But instead, we might want to provide
212   some way for literal optimizations to hook into the match routines. The right
213   path forward here is to probably build something outside of the crate and
217   quite costly to build. Their worst case compilation time is O(2^n), where
219   seems to provide a way to characterize state blow up such that it is detectable.
220   If we could know whether a regex will exhibit state explosion or not, then
221   we could make an intelligent decision about whether to ahead-of-time compile
223 …-Shutu/publication/229032602_Characterization_of_a_global_germplasm_collection_and_its_potential_u…
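
The exponential worst case mentioned above can be observed with a small, self-contained experiment. The sketch below does not use this crate. It hand-codes an NFA for the textbook pattern `(a|b)*a(a|b){n}` (an assumption chosen for illustration, not an example from this README) and counts the states produced by the classic subset construction. All names are hypothetical, and NFA state sets are stored as bitmasks in a `u64`, so `n` must stay well below 63.

```rust
use std::collections::{HashSet, VecDeque};

/// One NFA step for `(a|b)*a(a|b){n}`.
/// Bit 0 is the looping start state, bits 1..=n count symbols seen after
/// the chosen `a`, and bit n+1 is the accepting state.
fn step(set: u64, is_a: bool, n: u32) -> u64 {
    let mut next = 0u64;
    if set & 1 != 0 {
        // The start state loops on both symbols.
        next |= 1;
        // On `a`, it may also guess that this is the marked `a`.
        if is_a {
            next |= 1 << 1;
        }
    }
    // States 1..=n advance on either symbol.
    for i in 1..=n {
        if set & (1 << i) != 0 {
            next |= 1 << (i + 1);
        }
    }
    next
}

/// Count the DFA states reachable via the subset construction.
fn count_dfa_states(n: u32) -> usize {
    let start = 1u64; // only the NFA start state is active
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);
    while let Some(set) = queue.pop_front() {
        for &is_a in &[true, false] {
            let next = step(set, is_a, n);
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen.len()
}
```

For this family of patterns the NFA has `n + 2` states, while the determinized automaton has to remember which of the last `n + 1` symbols were `a`, so the number of reachable DFA states doubles each time `n` grows by one.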