
Lines Matching +full:to +full:- +full:regex

1 regex-automata
8 …status](https://github.com/BurntSushi/regex-automata/workflows/ci/badge.svg)](https://github.com/B…
9 [![on crates.io](https://meritbadge.herokuapp.com/regex-automata)](https://crates.io/crates/regex-a…
10 ![Minimum Supported Rust Version 1.41](https://img.shields.io/badge/rustc-1.41-green)
12 Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
17 https://docs.rs/regex-automata
22 Add this to your `Cargo.toml`:
26 regex-automata = "0.1"
29 and this to your crate root (if you're using Rust 2015):
36 ### Example: basic regex searching
38 This example shows how to compile a regex using the default configuration
39 and then use it to find matches in a byte string:
42 use regex_automata::Regex;
44 let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}").unwrap();
45 let text = b"2018-12-24 2016-10-08";
51 please see the [docs](https://docs.rs/regex-automata).
67 the DFAs to use a fixed size state identifier instead of the default `usize`.
68 You may also need to serialize both little and big endian versions of each
69 DFA. (So that's 4 DFAs in total for each regex.)
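
To make the "4 DFAs" count concrete: a regex here is backed by two DFAs, and each one would need a little endian and a big endian serialization. The sketch below is a standard-library-only illustration of why the byte order matters once state identifiers have a fixed size. It assumes a transition table stored as `u16` state IDs; `encode_table` and `decode_table` are hypothetical helpers, not part of the regex-automata API.

```rust
/// Hypothetical helper (not a regex-automata API): serialize a transition
/// table of fixed-size `u16` state identifiers in the requested byte order.
fn encode_table(table: &[u16], little_endian: bool) -> Vec<u8> {
    let mut out = Vec::with_capacity(table.len() * 2);
    for &id in table {
        let bytes = if little_endian {
            id.to_le_bytes()
        } else {
            id.to_be_bytes()
        };
        out.extend_from_slice(&bytes);
    }
    out
}

/// Hypothetical helper: read the table back, interpreting every two bytes
/// as one state identifier in the given byte order.
fn decode_table(bytes: &[u8], little_endian: bool) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|chunk| {
            let pair = [chunk[0], chunk[1]];
            if little_endian {
                u16::from_le_bytes(pair)
            } else {
                u16::from_be_bytes(pair)
            }
        })
        .collect()
}
```

Decoding with the wrong byte order silently yields different state IDs, which is why a pre-compiled DFA that is read without conversion has to match the endianness of the target.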
72 as you would any regex.
78 [`ucd-generate`](https://github.com/BurntSushi/ucd-generate)
79 tool will do the first step for you with its `dfa` or `regex` sub-commands.
84 * `std` - **Enabled** by default. This enables the ability to compile finite
85 automata. This requires the `regex-syntax` dependency. Without this feature
88 * `transducer` - **Disabled** by default. This provides implementations of the
90 automata generated by this crate to search finite state transducers. This
94 ### Differences with the regex crate
96 The main goal of the [`regex`](https://docs.rs/regex) crate is to serve as a
97 general purpose regular expression engine. It aims to automatically balance low
106 of the regex pattern. While most patterns do not exhibit worst case
110 option to return an error if the DFA gets too big.)
111 * This crate does not support sub-match extraction, which can be achieved with
112 the regex crate's "captures" API. This may be added in the future, but is
114 * While the regex crate doesn't necessarily sport fast compilation times, the
115 regexes in this crate are almost universally slow to compile, especially when
118 almost 5MB of memory! (Compiling a sparse regex takes about the same time
119 but only uses about 500KB of memory.) Conversely, compiling the same regex
120 without Unicode support, e.g., `(?-u)\w{3}`, takes under 1 millisecond and
123 * This crate does not support regex sets.
124 * This crate does not support zero-width assertions such as `^`, `$`, `\b` or
130 * There is no `&str` API like in the regex crate. In this crate, all APIs
131 operate on `&[u8]`. By default, match indices are guaranteed to fall on
132 UTF-8 boundaries, unless `RegexBuilder::allow_invalid_utf8` is enabled.
136 * Both dense and sparse DFAs can be serialized to raw bytes, and then cheaply
142 deserialize pre-compiled regexes.
143 * Since this crate builds DFAs ahead of time, it will generally out-perform
144 the `regex` crate on equivalent tasks. The performance difference is likely
145 not large. However, because of a complex set of optimizations in the regex
147 difficult to do.
148 * Sparse DFAs provide a way to build a DFA ahead of time that sacrifices search
150 even less than what the regex crate uses.
152 which enables one to do less work in some cases. For example, if you only
154 directly without building a `Regex`, which always requires a second DFA to
158 things like choosing a smaller state identifier representation, to
169 want to improve if we want to make DFA compile times faster. I *believe*
170 it's possible to potentially build minimal or nearly minimal NFAs for the
174 key adaptation I think we need to make is to modify the algorithm to operate
177 * Add support for regex sets. It should be possible to do this by "simply"
179 each match, similar to how Aho-Corasick works. I think the long pole in the
181 introduce extra overhead into the non-regex-set case without duplicating a
191 to make small fixed length look-around work? It would be really nice to
194 src/codegen.rs that is thoroughly bit-rotted. At the time, I was
200 enough. Either way, it's probably a good option to have. For one thing, it
203 you only need to compile one of them for any given arch).
206 implement something for this outside of the crate, but it would be good to
208 suspect we might want to support several variants.
210 in this crate. My original intent was to not let this crate sink down into
211 that very very very deep rabbit hole. But instead, we might want to provide
212 some way for literal optimizations to hook into the match routines. The right
213 path forward here is to probably build something outside of the crate and
217 quite costly to build. Their worst case compilation time is O(2^n), where
219 seems to provide a way to characterize state blow up such that it is detectable.
220 If we could know whether a regex will exhibit state explosion or not, then
221 we could make an intelligent decision about whether to ahead-of-time compile
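
The exponential worst case can be seen with a textbook pattern, `(a|b)*a(a|b){k}`. The sketch below is a minimal illustration using only the standard library, not how this crate builds DFAs: it hard-codes an NFA with `k + 2` states for that pattern and counts the DFA states reached by plain subset construction. The function name `dfa_state_count` and the bitmask encoding are assumptions made for the example.

```rust
use std::collections::{HashSet, VecDeque};

/// Count the DFA states produced by subset construction for a hand-written
/// NFA equivalent to `(a|b)*a(a|b){k}`. Illustrative only.
///
/// NFA states are numbered 0..=k+1 and a set of NFA states is a `u64` bitmask:
///   state 0:    loops on 'a' and 'b', and also moves to state 1 on 'a'
///   state 1..k: moves to the next state on 'a' or 'b'
///   state k+1:  accepting, no outgoing transitions
fn dfa_state_count(k: u32) -> usize {
    assert!(k < 63, "the bitmask encoding supports at most 64 NFA states");

    // Compute the set of NFA states reached from `set` on one input symbol.
    let step = |set: u64, is_a: bool| -> u64 {
        let mut next = 0u64;
        for s in 0..=(k + 1) {
            if set & (1u64 << s) == 0 {
                continue;
            }
            if s == 0 {
                next |= 1u64; // self-loop on either symbol
                if is_a {
                    next |= 1u64 << 1;
                }
            } else if s <= k {
                next |= 1u64 << (s + 1);
            }
        }
        next
    };

    // Breadth-first exploration of reachable subsets, starting from {0}.
    let mut seen: HashSet<u64> = HashSet::new();
    let mut queue: VecDeque<u64> = VecDeque::new();
    seen.insert(1u64);
    queue.push_back(1u64);
    while let Some(set) = queue.pop_front() {
        for &is_a in &[true, false] {
            let next = step(set, is_a);
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen.len()
}
```

For this family the count doubles with each increment of `k`: calling `dfa_state_count(k)` gives `2^(k + 1)` DFA states from only `k + 2` NFA states, because the DFA has to remember which of the last `k + 1` symbols were `a`.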