Lines Matching +full:to +full:- +full:regex

11 * A [`Regex`](struct.Regex.html) provides a way to search for matches of a
15 compilation options for a regex.
16 * A [`DenseDFA`](enum.DenseDFA.html) provides low level access to a DFA that
23 [serialization to raw bytes](enum.DenseDFA.html#method.to_bytes_little_endian)
27 # Example: basic regex searching
29 This example shows how to compile a regex using the default configuration
30 and then use it to find matches in a byte string:
33 use regex_automata::Regex;
35 let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}").unwrap();
36 let text = b"2018-12-24 2016-10-08";
43 By default, compiling a regex will use dense DFAs internally. This uses more
45 (somewhere around 3-5x), then sparse DFAs might make more sense since they can
48 Using sparse DFAs is as easy as using `Regex::new_sparse` instead of
49 `Regex::new`:
52 use regex_automata::Regex;
54 # fn example() -> Result<(), regex_automata::Error> {
55 let re = Regex::new_sparse(r"[0-9]{4}-[0-9]{2}-[0-9]{2}").unwrap();
56 let text = b"2018-12-24 2016-10-08";
62 If you already have dense DFAs for some reason, they can be converted to sparse
63 DFAs and used to build a new `Regex`. For example:
66 use regex_automata::Regex;
68 # fn example() -> Result<(), regex_automata::Error> {
69 let dense_re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}").unwrap();
70 let sparse_re = Regex::from_dfas(
74 let text = b"2018-12-24 2016-10-08";
82 This shows how to first serialize a DFA into raw bytes, and then deserialize
84 contrived, this same technique can be used in your program to deserialize a
86 deserialization is guaranteed to be cheap because it will always be a constant
90 use regex_automata::{DenseDFA, Regex};
92 # fn example() -> Result<(), regex_automata::Error> {
93 let re1 = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}").unwrap();
97 // now deserialize both---we need to specify the correct type!
100 // finally, reconstruct our regex
101 let re2 = Regex::from_dfas(fwd, rev);
104 let text = b"2018-12-24 2016-10-08";
112 * We need to extract the raw DFAs used by the regex and serialize those. You
115 `Regex` guarantees that the DFAs are built correctly.
116 * We specifically convert the dense DFA to a representation that uses `u16`
124 * To convert the DFA to raw bytes, we use the `to_bytes_native_endian`
125 method. In practice, you'll want to use either
130 to deserialize on either platform, then you'll need to serialize both and
136 DFA must be able to follow transitions blindly for performance reasons,
137 giving incorrect bytes to the deserialization API can result in memory
143 use regex_automata::{SparseDFA, Regex};
145 # fn example() -> Result<(), regex_automata::Error> {
146 let re1 = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}").unwrap();
150 // now deserialize both---we need to specify the correct type!
153 // finally, reconstruct our regex
154 let re2 = Regex::from_dfas(fwd, rev);
157 let text = b"2018-12-24 2016-10-08";
164 Conversely, dense DFAs must be aligned to the same alignment as their
180 the DFAs to use a fixed size state identifier instead of the default `usize`.
181 You may also need to serialize both little and big endian versions of each
182 DFA. (So that's 4 DFAs in total for each regex.)
185 as you would any regex.
191 [`ucd-generate`](https://github.com/BurntSushi/ucd-generate)
192 tool will do the first step for you with its `dfa` or `regex` sub-commands.
196 This crate supports the same syntax as the `regex` crate, since they share the
198 [documentation for the `regex` crate](https://docs.rs/regex/1.1/regex/#syntax).
201 support zero-width assertions, although they may be added in the future. This
207 It is possible to run a search that is anchored at the beginning of the input.
208 To do that, set the
210 option when building a regex. By default, all searches are unanchored.
212 # Differences with the regex crate
214 The main goal of the [`regex`](https://docs.rs/regex) crate is to serve as a
215 general purpose regular expression engine. It aims to automatically balance low
224 of the regex pattern. While most patterns do not exhibit worst case
228 option to return an error if the DFA gets too big.)
229 * This crate does not support sub-match extraction, which can be achieved with
230 the regex crate's "captures" API. This may be added in the future, but is
232 * While the regex crate doesn't necessarily sport fast compilation times, the
233 regexes in this crate are almost universally slow to compile, especially when
236 almost 5MB of memory! (Compiling a sparse regex takes about the same time
237 but only uses about 500KB of memory.) Conversely, compiling the same regex
238 without Unicode support, e.g., `(?-u)\w{3}`, takes under 1 millisecond and
241 * This crate does not support regex sets.
242 * This crate does not support zero-width assertions such as `^`, `$`, `\b` or
248 * There is no `&str` API like in the regex crate. In this crate, all APIs
249 operate on `&[u8]`. By default, match indices are guaranteed to fall on
250 UTF-8 boundaries, unless
256 * Both dense and sparse DFAs can be serialized to raw bytes, and then cheaply
262 deserialize pre-compiled regexes.
263 * Since this crate builds DFAs ahead of time, it will generally out-perform
264 the `regex` crate on equivalent tasks. The performance difference is likely
265 not large. However, because of a complex set of optimizations in the regex
267 difficult to do.
268 * Sparse DFAs provide a way to build a DFA ahead of time that sacrifices search
270 even less than what the regex crate uses.
273 which enables one to do less work in some cases. For example, if you only
275 directly without building a `Regex`, which always requires a second DFA to
279 things like choosing a smaller state identifier representation, to
302 pub use regex::Regex;
304 pub use regex::RegexBuilder;
322 mod regex;
331 /// Types and routines specific to dense DFAs.
343 /// Types and routines specific to sparse DFAs.
351 /// contain a builder specific for sparse DFAs. Instead, the intended way to
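
The matched lines above describe searching a `&[u8]` haystack with a DFA built ahead of time. As a rough illustration of that idea only, here is a minimal hand-written sketch that uses just the standard library and does not call this crate at all. The names `next_state`, `match_at` and `find_all` are hypothetical, and the restart-at-every-offset search loop is a simplification for clarity, not how the crate itself performs unanchored searches.

```rust
/// Sentinel for the dead state: no match is possible once it is reached.
const DEAD: usize = usize::MAX;
/// The single accepting state, reached after consuming all 10 bytes.
const ACCEPT: usize = 10;

/// Transition function for a hand-written DFA recognizing
/// `[0-9]{4}-[0-9]{2}-[0-9]{2}`. State `n` means "n bytes matched so far".
/// (Hypothetical helper, not part of any crate.)
fn next_state(state: usize, byte: u8) -> usize {
    let ok = match state {
        4 | 7 => byte == b'-',            // positions of the two dashes
        0..=9 => byte.is_ascii_digit(),   // every other position is a digit
        _ => false,
    };
    if ok { state + 1 } else { DEAD }
}

/// Runs the DFA anchored at `start`; returns the end offset on a match.
fn match_at(haystack: &[u8], start: usize) -> Option<usize> {
    let mut state = 0;
    for (i, &b) in haystack[start..].iter().enumerate() {
        state = next_state(state, b);
        if state == DEAD {
            return None;
        }
        if state == ACCEPT {
            return Some(start + i + 1);
        }
    }
    None
}

/// Collects non-overlapping matches as (start, end) byte offsets by
/// retrying the anchored DFA at each position (a simplification).
fn find_all(haystack: &[u8]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut at = 0;
    while at < haystack.len() {
        match match_at(haystack, at) {
            Some(end) => {
                out.push((at, end));
                at = end;
            }
            None => at += 1,
        }
    }
    out
}

fn main() {
    let text = b"2018-12-24 2016-10-08";
    // prints [(0, 10), (11, 21)]
    println!("{:?}", find_all(text));
}
```

Because every state has exactly one successor per input byte, the transition step does no backtracking. That property is what makes a precomputed transition table practical to serialize and reuse.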