PERFORMANCE.md - OpenGrok cross reference for /third_party/rust/crates/regex/PERFORMANCE.md

Lines Matching +full:find +full:- +full:up
44 turn it into a proper automaton that decodes a subset of UTF-8 which
55 life-before-main, and therefore, one cannot utter this:
68     fn some_helper_function(text: &str) -> bool {
97 therefore read-only. Unfortunately, this is not true. Each type of search
119 * find
135 **Advice**: Prefer in this order: `is_match`, `find`, `captures`.
140 * find
145 `is_match` is fastest because it doesn't actually need to find the start or the
146 end of the leftmost-first match. It can quit immediately after it knows there
150 In contrast, `find` must return both the start and end location of the
151 leftmost-first match. It can use the DFA matcher for this, but must run it
152 forwards once to find the end of the match *and then run it backwards* to find
154 the leftmost-first match make this more expensive than `is_match`.
156 `captures` is the most expensive of them all because it must do what `find`
167 the latter of which being the correct end location of the leftmost-first match.
177 Boyer-Moore. If there's no match, then no regex engine is ever used. Only when
184 When one literal is found, Boyer-Moore is used. When multiple literals are
185 found, then an optimized version of Aho-Corasick is used.
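
The literal optimization described in the hits above can be sketched roughly as follows. This is an illustration only, not the crate's implementation: the standard library's `str::find` stands in for Boyer-Moore, and `find_with_prefilter` and `foo_then_digits` are hypothetical names introduced here. The verifier is a hand-written stand-in for a regex such as `foo[0-9]+`. The point is that the expensive verification step only runs at positions where the required literal was found.

```rust
/// Hypothetical helper: scans `haystack` for `literal` and calls `verify`
/// only at candidate positions. Returns the byte offset of the first
/// candidate that `verify` accepts.
fn find_with_prefilter<F>(haystack: &str, literal: &str, verify: F) -> Option<usize>
where
    F: Fn(&str) -> bool,
{
    let mut from = 0;
    while from <= haystack.len() {
        // Fast literal scan (stand-in for Boyer-Moore). If the literal never
        // occurs, `verify` is never called at all.
        let start = from + haystack[from..].find(literal)?;
        if verify(&haystack[start..]) {
            return Some(start);
        }
        // Advance by one whole character so overlapping candidates are not
        // skipped and `from` stays on a UTF-8 character boundary.
        let step = haystack[start..].chars().next().map_or(1, |c| c.len_utf8());
        from = start + step;
    }
    None
}

/// Hand-written stand-in for matching `foo[0-9]+` at the start of `s`.
fn foo_then_digits(s: &str) -> bool {
    s.strip_prefix("foo")
        .map_or(false, |rest| rest.starts_with(|c: char| c.is_ascii_digit()))
}
```

Calling `find_with_prefilter("a foo b foo42", "foo", foo_then_digits)` rejects the first candidate and accepts the second. A haystack that does not contain `foo` is rejected without running the verifier.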
197 Literals in anchored regexes can also be used for detecting non-matches very
198 quickly. For example, `^foo\w+` and `\w+foo$` may be able to detect a non-match
203 **Advice**: In most cases, `\b` should work well. If not, use `(?-u:\b)`
208 If the DFA comes across any non-ASCII byte, it will quit and fall back to an
215 still kick in to find prefix literals quickly, which limits how much work the
218 The second way is to give up on Unicode and use an ASCII word boundary instead.
220 instead of using `\b`, use `(?-u:\b)`.  Namely, given the regex `\b.+\b`, it
221 can be transformed into a regex that uses the DFA with `(?-u:\b).+(?-u:\b)`. It
224 wrote `(?-u)\b.+\b`, then a syntax error would be returned because `.` matches
231 non-ASCII UTF-8 bytes. This results in giving up correctness in exchange for
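
As a rough illustration of what an ASCII word boundary such as `(?-u:\b)` looks at, the sketch below computes boundary positions over raw bytes using only the standard library. The function names are hypothetical and this is not how the crate implements the assertion. It only shows the definition: a boundary exists where exactly one of the two neighbouring bytes is in `[0-9A-Za-z_]`. Every non-ASCII byte counts as a non-word byte, which is where the loss of correctness for Unicode text comes from.

```rust
/// True for the bytes an ASCII word boundary treats as "word" bytes.
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical helper: returns every byte offset in `haystack` (including
/// 0 and `haystack.len()`) at which an ASCII word boundary would match.
fn ascii_word_boundaries(haystack: &[u8]) -> Vec<usize> {
    (0..=haystack.len())
        .filter(|&i| {
            let before = i > 0 && is_ascii_word_byte(haystack[i - 1]);
            let after = i < haystack.len() && is_ascii_word_byte(haystack[i]);
            // A boundary sits between a word byte and a non-word byte
            // (or the edge of the input).
            before != after
        })
        .collect()
}
```

For the input `"aé"`, this reports boundaries at byte offsets 0 and 1, because the two bytes that encode `é` are treated as non-word bytes. A Unicode-aware `\b` would treat `é` as a word character and place the second boundary after it instead.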
237 ## Excessive counting can lead to exponential state blow up in the DFA
239 **Advice**: Don't write regexes that cause DFA state blow up if you care about
244 an exponential blow up in the number of states. This crate specifically guards
245 against exponential blow up by doing two things:
257    too frequently, then the DFA will give up and execution will fall back to
260 In effect, this crate will detect exponential state blow up and fall back to
274 An entire book was written on how to optimize Perl-style regular expressions.
276 there is no problem with using non-greedy matching or having lots of