PERFORMANCE.md - OpenGrok cross reference for /external/rust/crates/regex/PERFORMANCE.md

Lines Matching +full:regex +full:- +full:not
5 can be found here: https://docs.rs/regex
12 regex implementations, which typically use backtracking which has worst case
28 or places where the current regex engine isn't quite optimal. This guide will
32 ## Thou Shalt Not Compile Regular Expressions In A Loop
34 **Advice**: Use `lazy_static` to amortize the cost of `Regex` compilation.
44 turn it into a proper automaton that decodes a subset of UTF-8 which
48 This means that in order to realize efficient regex matching, one must
50 inside a loop, then make sure your call to `Regex::new` is *outside* that loop.
55 life-before-main, and therefore, one cannot utter this:
57     static MY_REGEX: Regex = Regex::new("...").unwrap();
59 Unfortunately, this would seem to imply that one must pass `Regex` objects
66     use regex::Regex;
68     fn some_helper_function(text: &str) -> bool {
70             static ref MY_REGEX: Regex = Regex::new("...").unwrap();
75 In other words, the `lazy_static!` macro enables us to define a `Regex` *as if*
77 that the code inside the macro (i.e., `Regex::new(...)`) is run on *first use*
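The compile-once idea in the entries above can be sketched with the standard library's `std::sync::OnceLock` (stable since Rust 1.70) instead of `lazy_static`. This is a minimal, dependency-free sketch: `CompiledPattern` is a hypothetical stand-in for `Regex` that only does a substring search, and the counter exists solely to show that the initializer runs once, on first use.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the (hypothetically expensive) construction runs.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a compiled `Regex`.
struct CompiledPattern {
    needle: String,
}

impl CompiledPattern {
    fn new(needle: &str) -> CompiledPattern {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        CompiledPattern { needle: needle.to_string() }
    }

    fn is_match(&self, text: &str) -> bool {
        text.contains(self.needle.as_str())
    }
}

fn some_helper_function(text: &str) -> bool {
    // The closure passed to `get_or_init` runs only on first use; later
    // calls reuse the stored value, similar to what `lazy_static!` provides.
    static MY_PATTERN: OnceLock<CompiledPattern> = OnceLock::new();
    MY_PATTERN.get_or_init(|| CompiledPattern::new("foo")).is_match(text)
}

fn build_count() -> usize {
    BUILD_COUNT.load(Ordering::SeqCst)
}
```

With the real crate, one would presumably store a `Regex` in the `OnceLock` instead of the stand-in type; the calling code stays the same.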
83 ## Using a regex from multiple threads
85 **Advice**: The performance impact from using a `Regex` from multiple threads
86 is likely negligible. If necessary, clone the `Regex` so that each thread gets
87 its own copy. Cloning a regex incurs no more memory overhead
88 than using a single `Regex` from multiple threads
95 One might imagine that this is possible because a `Regex` represents a
97 therefore read-only. Unfortunately, this is not true. Each type of search
104 mutation should not be observable by users of this crate. Therefore, it uses
105 interior mutability. This implies that `Regex` can either be used from only one
110 Synchronization implies *some* amount of overhead. When a `Regex` is used from
111 a single thread, this overhead is negligible. When a `Regex` is used from
129 Then you may not suffer from contention since the cost of synchronization is
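The clone-per-thread advice in the entries above can be sketched with only the standard library. `LiteralMatcher` is a hypothetical stand-in for `Regex` that only does a substring search. Its `Clone` shares the pattern through an `Arc`, so a clone copies a pointer; that cheap clone is a property of this sketch, not a claim about `Regex` internals.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical, cheap-to-clone matcher standing in for `Regex`.
#[derive(Clone)]
struct LiteralMatcher {
    needle: Arc<str>,
}

impl LiteralMatcher {
    fn new(needle: &str) -> LiteralMatcher {
        LiteralMatcher { needle: Arc::from(needle) }
    }

    // Counts the lines that contain the needle.
    fn count_matching(&self, lines: &[&str]) -> usize {
        lines.iter().filter(|line| line.contains(&*self.needle)).count()
    }
}

// Gives each spawned thread its own clone of the matcher, so no matcher
// value is shared between threads; the per-thread counts are summed.
fn count_in_parallel(matcher: &LiteralMatcher, chunks: Vec<Vec<&'static str>>) -> usize {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let local = matcher.clone();
            thread::spawn(move || local.count_matching(&chunk))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```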
137 There are three primary search methods on a `Regex`:
146 end of the leftmost-first match. It can quit immediately after it knows there
147 is a match. For example, given the regex `a+` and the haystack `aaaaa`, the
151 leftmost-first match. It can use the DFA matcher for this, but must run it
154 the leftmost-first match make this more expensive than `is_match`.
163 One other method not mentioned is `shortest_match`. This method has precisely
165 end location of when it discovered a match. For example, given the regex `a+`
167 the latter of which is the correct end location of the leftmost-first match.
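To make the `a+` / `aaaaa` example concrete, here is a sketch with hand-written routines for that single regex. The functions are hypothetical analogues of `is_match`, `shortest_match` and `find`, not the crate's methods. They only show how the three kinds of answers differ for this particular regex.

```rust
/// Offset of the first `a`, where the leftmost match of `a+` must begin.
fn a_plus_start(haystack: &str) -> Option<usize> {
    haystack.find('a')
}

/// Analogue of `is_match`: a yes or no answer, nothing more.
fn a_plus_is_match(haystack: &str) -> bool {
    a_plus_start(haystack).is_some()
}

/// Analogue of `shortest_match`: a scan may stop as soon as one `a` has been
/// seen, so it reports the end of the shortest match.
fn a_plus_shortest_end(haystack: &str) -> Option<usize> {
    a_plus_start(haystack).map(|start| start + 1)
}

/// Analogue of `find`: the greedy, leftmost-first match extends over the
/// whole run of `a`s that starts at the leftmost `a`.
fn a_plus_find(haystack: &str) -> Option<(usize, usize)> {
    let start = a_plus_start(haystack)?;
    let run = haystack[start..].bytes().take_while(|&b| b == b'a').count();
    Some((start, start + run))
}
```

For `aaaaa`, the shortest end is 1 while the leftmost-first match spans `(0, 5)`. This is why `find` has to keep scanning after a match is known.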
169 ## Literals in your regex may make it faster
171 **Advice**: Literals can reduce the work that the regex engine needs to do. Use
174 In particular, if your regex starts with a prefix literal, the prefix is
175 quickly searched before entering the (much slower) regex engine. For example,
176 given the regex `foo\w+`, the literal `foo` will be searched for using
177 Boyer-Moore. If there's no match, then no regex engine is ever used. Only when
178 there's a match is the regex engine invoked at the location of the match, which
179 effectively permits the regex engine to skip large portions of a haystack.
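The prefix-literal strategy for `foo\w+` can be sketched by hand. Two substitutions are made and should be read as assumptions. First, the literal is found with the standard library's `str::find` substring search rather than Boyer-Moore. Second, `\w` is approximated by ASCII word bytes, whereas the crate's `\w` is Unicode-aware by default. The "slow" verification step runs only at positions where the literal was found.

```rust
/// ASCII approximation of `\w`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Leftmost match of `foo\w+` (ASCII-only `\w`), returned as (start, end).
fn find_foo_word(haystack: &str) -> Option<(usize, usize)> {
    let bytes = haystack.as_bytes();
    let mut from = 0;
    // Fast path: the substring search skips every position that cannot
    // start a match, because every match must begin with `foo`.
    while let Some(offset) = haystack[from..].find("foo") {
        let start = from + offset;
        // Slow path: verify the greedy `\w+` right after the prefix.
        let word_len = bytes[start + 3..].iter().take_while(|&&b| is_word_byte(b)).count();
        if word_len > 0 {
            return Some((start, start + 3 + word_len));
        }
        // No `\w` after this `foo`; resume the literal search one byte later.
        from = start + 1;
    }
    None
}
```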
180 If a regex consists entirely of literals (possibly more than one), then
181 it's possible that the regex engine can be avoided entirely even when there's a
184 When one literal is found, Boyer-Moore is used. When multiple literals are
185 found, then an optimized version of Aho-Corasick is used.
197 Literals in anchored regexes can also be used for detecting non-matches very
198 quickly. For example, `^foo\w+` and `\w+foo$` may be able to detect a non-match
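Because `^foo\w+` and `\w+foo$` are anchored, a non-match can often be detected by inspecting only the start or the end of the haystack. The following is a minimal sketch that hand-codes these two regexes. It uses an ASCII approximation of `\w` and is not how the crate implements the optimization. Note that without multi-line mode, `$` matches only at the end of the haystack.

```rust
/// ASCII approximation of `\w`.
fn is_word_ascii(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Yes/no answer for `^foo\w+`: a match must start at offset 0 with `foo`.
fn matches_start_anchored(haystack: &str) -> bool {
    // Literal check first: a haystack not beginning with `foo` is rejected
    // after comparing at most 3 bytes.
    if !haystack.starts_with("foo") {
        return false;
    }
    // Only then check `\w+`; for a yes/no answer one word byte suffices.
    haystack.as_bytes().get(3).map_or(false, |&b| is_word_ascii(b))
}

/// Yes/no answer for `\w+foo$`: a match must end the haystack with `foo`.
fn matches_end_anchored(haystack: &str) -> bool {
    // Literal check first, against the end of the haystack.
    if !haystack.ends_with("foo") {
        return false;
    }
    // `\w+` needs at least one word byte immediately before the suffix.
    let bytes = haystack.as_bytes();
    bytes.len() >= 4 && is_word_ascii(bytes[bytes.len() - 4])
}
```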
203 **Advice**: In most cases, `\b` should work well. If not, use `(?-u:\b)`
208 If the DFA comes across any non-ASCII byte, it will quit and fall back to an
214 expression. Even though the DFA may not be used, specialized routines will
220 instead of using `\b`, use `(?-u:\b)`. Namely, given the regex `\b.+\b`, it
221 can be transformed into a regex that uses the DFA with `(?-u:\b).+(?-u:\b)`. It
223 to a syntax error if the regex could match arbitrary bytes. For example, if one
224 wrote `(?-u)\b.+\b`, then a syntax error would be returned because `.` matches
231 non-ASCII UTF-8 bytes. This results in giving up correctness in exchange for
234 N.B. When using `bytes::Regex`, Unicode support is disabled by default, so one
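The correctness trade-off of `(?-u:\b)` can be seen with a small sketch of an ASCII-only word boundary test. A byte offset is a boundary when the bytes on either side differ in whether they are ASCII word bytes, and the haystack edges count as non-word. This is a hand-written illustration, not the crate's implementation. In `café`, the two UTF-8 bytes of `é` are not ASCII word bytes, so a boundary is reported between `f` and `é`. A Unicode-aware `\b` would not report that boundary, since `é` is a word character in Unicode.

```rust
/// ASCII-only notion of a word byte, as used by `(?-u:\b)`.
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if byte offset `at` is an ASCII word boundary in `haystack`.
fn is_ascii_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word_byte(haystack[at]);
    before != after
}

/// Every byte offset (including the end) that is an ASCII word boundary.
fn ascii_boundaries(haystack: &str) -> Vec<usize> {
    let bytes = haystack.as_bytes();
    (0..=bytes.len()).filter(|&i| is_ascii_word_boundary(bytes, i)).collect()
}
```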
274 An entire book was written on how to optimize Perl-style regular expressions.
275 Most of those techniques are not applicable to this library. For example,
276 there is no problem with using non-greedy matching or having lots of
277 alternations in your regex.