1<html>
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
4<title>Knuth-Morris-Pratt Search</title>
5<link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7<link rel="home" href="../../index.html" title="The Boost Algorithm Library">
8<link rel="up" href="../../algorithm/Searching.html" title="Searching Algorithms">
9<link rel="prev" href="BoyerMooreHorspool.html" title="Boyer-Moore-Horspool Search">
10<link rel="next" href="../../algorithm/CXX11.html" title="C++11 Algorithms">
11</head>
12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13<table cellpadding="2" width="100%"><tr>
14<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
15<td align="center"><a href="../../../../../../index.html">Home</a></td>
16<td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
17<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19<td align="center"><a href="../../../../../../more/index.htm">More</a></td>
20</tr></table>
21<hr>
22<div class="spirit-nav">
23<a accesskey="p" href="BoyerMooreHorspool.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../../algorithm/CXX11.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
24</div>
25<div class="section">
26<div class="titlepage"><div><div><h3 class="title">
27<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt"></a><a class="link" href="KnuthMorrisPratt.html" title="Knuth-Morris-Pratt Search">Knuth-Morris-Pratt
28      Search</a>
29</h3></div></div></div>
30<h5>
31<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h0"></a>
32        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.overview"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.overview">Overview</a>
33      </h5>
34<p>
35        The header file 'knuth_morris_pratt.hpp' contains an implementation of the
36        Knuth-Morris-Pratt algorithm for searching sequences of values.
37      </p>
38<p>
39        The basic premise of the Knuth-Morris-Pratt algorithm is that when a mismatch
40        occurs, there is information in the pattern being searched for that can be
41        used to determine where the next match could begin, enabling the skipping
42        of some elements of the corpus that have already been examined.
43      </p>
44<p>
        It does this by building a table from the pattern being searched for, with
        one entry for each element in the pattern, plus one extra.
47      </p>
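<p>
        The table construction can be sketched as follows. This is a minimal,
        textbook-style illustration and not necessarily how the library builds
        its table; the helper name <code class="computeroutput">build_skip_table</code>
        and the use of <code class="computeroutput">std::string</code> are assumptions
        made for the example. It produces one entry per pattern element plus a leading
        sentinel.
      </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): build a KMP table with
// pattern.size() + 1 entries. table[i] is the length of the longest proper
// prefix of the first i pattern elements that is also a suffix of them;
// table[0] is the sentinel -1.
std::vector<std::ptrdiff_t> build_skip_table(const std::string& pattern) {
    const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(pattern.size());
    std::vector<std::ptrdiff_t> table(pattern.size() + 1);
    table[0] = -1;
    for (std::ptrdiff_t i = 1; i <= count; ++i) {
        std::ptrdiff_t j = table[i - 1];
        // Fall back through shorter prefixes until one can be extended.
        while (j >= 0 && pattern[j] != pattern[i - 1]) {
            j = table[j];
        }
        table[i] = j + 1;
    }
    assert(table.size() == pattern.size() + 1);
    return table;
}
```

<p>
        For the pattern "ABABAC" this sketch yields the seven entries
        -1, 0, 0, 1, 2, 3, 0. The entry 3 records that after matching "ABABA",
        the last three elements "ABA" are also a prefix of the pattern, so a
        search can resume from there instead of starting over.
      </p>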
48<p>
49        The algorithm was conceived in 1974 by Donald Knuth and Vaughan Pratt, and
50        independently by James H. Morris. The three published it jointly in 1977
        in the SIAM Journal on Computing: <a href="http://citeseer.ist.psu.edu/context/23820/0" target="_top">http://citeseer.ist.psu.edu/context/23820/0</a>.
52      </p>
53<p>
        However, unlike <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>,
        the Knuth-Morris-Pratt algorithm cannot be used with comparison predicates.
56      </p>
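<p>
        For comparison, this is roughly what a predicate-based search looks like
        with the standard library. The sketch uses only
        <code class="computeroutput">std::search</code>; the helper names
        <code class="computeroutput">equal_ignoring_case</code> and
        <code class="computeroutput">find_ignoring_case</code> are made up for the
        example. A search of this kind has no direct equivalent in the interface
        described on this page.
      </p>

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Case-insensitive element comparison, usable as a std::search predicate.
bool equal_ignoring_case(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Returns the offset of the first case-insensitive occurrence of pattern
// in corpus, or std::string::npos if there is none.
std::string::size_type find_ignoring_case(const std::string& corpus,
                                          const std::string& pattern) {
    std::string::const_iterator pos =
        std::search(corpus.begin(), corpus.end(),
                    pattern.begin(), pattern.end(), equal_ignoring_case);
    if (pos == corpus.end() && !pattern.empty()) {
        return std::string::npos;
    }
    return static_cast<std::string::size_type>(pos - corpus.begin());
}
```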
57<h5>
58<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h1"></a>
59        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.interface"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.interface">Interface</a>
60      </h5>
61<p>
62        Nomenclature: I refer to the sequence being searched for as the "pattern",
63        and the sequence being searched in as the "corpus".
64      </p>
65<p>
        For flexibility, the Knuth-Morris-Pratt algorithm has two interfaces: an
        object-based interface and a procedural one. The object-based interface builds
        the table in the constructor, and uses operator () to perform the search.
        The procedural interface builds the table and does the search all in one
        step. If you are going to be searching for the same pattern in multiple corpora,
        then you should use the object interface, and only build the table once.
72      </p>
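<p>
        The benefit of building the table once can be illustrated with a small
        stand-in class. This is a sketch under stated assumptions, not the Boost
        class: the name <code class="computeroutput">kmp_searcher_sketch</code> is
        hypothetical, and unlike the interface documented here it takes a
        <code class="computeroutput">std::string</code> and keeps its own copy of
        the pattern.
      </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object-style searcher (not the Boost class).
// The table is built once in the constructor and reused by every search.
class kmp_searcher_sketch {
public:
    explicit kmp_searcher_sketch(const std::string& pattern)
        : pattern_(pattern), skip_(pattern.size() + 1) {
        skip_[0] = -1;
        for (std::size_t i = 1; i <= pattern_.size(); ++i) {
            std::ptrdiff_t j = skip_[i - 1];
            while (j >= 0 && pattern_[j] != pattern_[i - 1]) j = skip_[j];
            skip_[i] = j + 1;
        }
        assert(skip_.size() == pattern_.size() + 1);
    }

    // Returns [begin, end) of the first match, (first, first) for an empty
    // pattern, or (last, last) if the pattern does not occur.
    template <typename corpusIter>
    std::pair<corpusIter, corpusIter> operator()(corpusIter first,
                                                 corpusIter last) const {
        const std::ptrdiff_t m = static_cast<std::ptrdiff_t>(pattern_.size());
        if (m == 0) return std::make_pair(first, first);
        std::ptrdiff_t matched = 0;  // pattern elements matched so far
        for (corpusIter it = first; it != last; ++it) {
            // On a mismatch, fall back to the next shorter candidate prefix.
            while (matched >= 0 && pattern_[matched] != *it) {
                matched = skip_[matched];
            }
            ++matched;
            if (matched == m) {
                corpusIter match_end = it;
                ++match_end;
                return std::make_pair(match_end - m, match_end);
            }
        }
        return std::make_pair(last, last);
    }

private:
    std::string pattern_;
    std::vector<std::ptrdiff_t> skip_;
};
```

<p>
        A single <code class="computeroutput">kmp_searcher_sketch</code> object can
        then be applied to any number of corpora, and the cost of building the table
        is paid only in the constructor.
      </p>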
73<p>
74        Here is the object interface:
75</p>
76<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">&gt;</span>
77<span class="keyword">class</span> <span class="identifier">knuth_morris_pratt</span> <span class="special">{</span>
78<span class="keyword">public</span><span class="special">:</span>
79    <span class="identifier">knuth_morris_pratt</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span>
80    <span class="special">~</span><span class="identifier">knuth_morris_pratt</span> <span class="special">();</span>
81
82    <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
83    <span class="identifier">pair</span><span class="special">&lt;</span><span class="identifier">corpusIter</span><span class="special">,</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span>
84    <span class="special">};</span>
85</pre>
86<p>
87      </p>
88<p>
89        and here is the corresponding procedural interface:
90      </p>
91<p>
92</p>
93<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
94<span class="identifier">pair</span><span class="special">&lt;</span><span class="identifier">corpusIter</span><span class="special">,</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span> <span class="identifier">knuth_morris_pratt_search</span> <span class="special">(</span>
95        <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span>
96        <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span>
97</pre>
98<p>
99      </p>
100<p>
        The procedural function is passed two pairs of iterators. The first two define
        the corpus and the second two define the pattern. In the object interface, the
        constructor takes the pattern and operator () takes the corpus. Note that the two pairs
        need not be of the same type, but they do need to "point" at the
        same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code>
        and <code class="computeroutput"><span class="identifier">corpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type.
106      </p>
107<p>
        The return value of the function is a pair of iterators pointing to the position
        of the pattern in the corpus. If the pattern is empty, it returns an empty
        range at the start of the corpus (<code class="computeroutput"><span class="identifier">corpus_first</span></code>,
        <code class="computeroutput"><span class="identifier">corpus_first</span></code>). If the pattern
        is not found, it returns an empty range at the end of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>, <code class="computeroutput"><span class="identifier">corpus_last</span></code>).
113      </p>
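<p>
        The return convention described above can be exercised with a small
        stand-in. Note the assumptions: <code class="computeroutput">search_range</code>
        and <code class="computeroutput">is_match</code> are hypothetical names, and
        the stand-in is implemented with <code class="computeroutput">std::search</code>
        rather than Knuth-Morris-Pratt. It only mimics the shape of the result so
        that the handling code can be shown.
      </p>

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <utility>

typedef std::string::const_iterator Iter;

// Hypothetical stand-in that mimics the documented return convention using
// std::search; it is not the Boost function and does not use KMP.
std::pair<Iter, Iter> search_range(Iter corpus_first, Iter corpus_last,
                                   Iter pat_first, Iter pat_last) {
    Iter pos = std::search(corpus_first, corpus_last, pat_first, pat_last);
    if (pos == corpus_last) {
        return std::make_pair(corpus_last, corpus_last);  // not found
    }
    return std::make_pair(pos, pos + std::distance(pat_first, pat_last));
}

// For a non-empty pattern, a match was found exactly when the returned
// range is non-empty.
bool is_match(const std::pair<Iter, Iter>& range) {
    return range.first != range.second;
}
```

<p>
        With a result <code class="computeroutput">r</code>, the matched elements
        are the range from <code class="computeroutput">r.first</code> up to but not
        including <code class="computeroutput">r.second</code>, so
        <code class="computeroutput">std::string(r.first, r.second)</code> copies
        the match out of a character corpus.
      </p>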
114<h5>
115<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h2"></a>
116        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.compatibility_note"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.compatibility_note">Compatibility
117        Note</a>
118      </h5>
119<p>
120        Earlier versions of this searcher returned only a single iterator. As explained
121        in <a href="https://cplusplusmusings.wordpress.com/2016/02/01/sometimes-you-get-things-wrong/" target="_top">https://cplusplusmusings.wordpress.com/2016/02/01/sometimes-you-get-things-wrong/</a>,
122        this was a suboptimal interface choice, and has been changed, starting in
123        the 1.62.0 release. Old code that is expecting a single iterator return value
124        can be updated by replacing the return value of the searcher's <code class="computeroutput"><span class="keyword">operator</span> <span class="special">()</span></code>
125        with the <code class="computeroutput"><span class="special">.</span><span class="identifier">first</span></code>
126        field of the pair.
127      </p>
128<p>
129        Instead of:
130</p>
131<pre class="programlisting"><span class="identifier">iterator</span> <span class="identifier">foo</span> <span class="special">=</span> <span class="identifier">searcher</span><span class="special">(</span><span class="identifier">a</span><span class="special">,</span> <span class="identifier">b</span><span class="special">);</span>
132</pre>
133<p>
134      </p>
135<p>
136        you now write:
137</p>
138<pre class="programlisting"><span class="identifier">iterator</span> <span class="identifier">foo</span> <span class="special">=</span> <span class="identifier">searcher</span><span class="special">(</span><span class="identifier">a</span><span class="special">,</span> <span class="identifier">b</span><span class="special">).</span><span class="identifier">first</span><span class="special">;</span>
139</pre>
140<p>
141      </p>
142<h5>
143<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h3"></a>
144        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.performance"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.performance">Performance</a>
145      </h5>
146<p>
        The execution time of the Knuth-Morris-Pratt algorithm is linear in the size
        of the corpus being searched. Building the table adds time linear in the
        length of the pattern. Its efficiency derives from the
        fact that with each unsuccessful attempt to find a match between the pattern
        and the corpus, it uses the information gained from
        that attempt to rule out as many positions of the corpus as possible where
        the pattern cannot match.
154      </p>
155<h5>
156<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h4"></a>
157        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.memory_use"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.memory_use">Memory
158        Use</a>
159      </h5>
160<p>
        The algorithm allocates a table that contains one entry for each element of the pattern, plus
        one extra. So, when searching for a 1026-byte string, the table will have
        1027 entries.
164      </p>
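<p>
        The arithmetic can be written down as a short sketch. The helper names are
        hypothetical, and the byte count assumes that each table entry is a
        <code class="computeroutput">std::ptrdiff_t</code>; the entry type actually
        used by the library may differ.
      </p>

```cpp
#include <cassert>
#include <cstddef>

// Number of table entries for a pattern of the given length:
// one per pattern element, plus one extra.
std::size_t table_entries(std::size_t pattern_length) {
    return pattern_length + 1;
}

// Approximate table size in bytes, ASSUMING each entry is a std::ptrdiff_t.
std::size_t table_bytes(std::size_t pattern_length) {
    return table_entries(pattern_length) * sizeof(std::ptrdiff_t);
}
```

<p>
        Under that assumption, on a platform where
        <code class="computeroutput">std::ptrdiff_t</code> is 8 bytes wide, the
        1027-entry table for a 1026-byte pattern occupies 8216 bytes.
      </p>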
165<h5>
166<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h5"></a>
167        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.complexity"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.complexity">Complexity</a>
168      </h5>
169<p>
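        The worst-case performance is <span class="emphasis"><em>O(2n)</em></span>, that is, at most about <span class="emphasis"><em>2n</em></span>
        element comparisons, where <span class="emphasis"><em>n</em></span>
        is the length of the corpus. The average time is <span class="emphasis"><em>O(n)</em></span>.
        The best case, in which a match is found near the start of the corpus, is sub-linear
        in <span class="emphasis"><em>n</em></span>.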
173      </p>
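<p>
        The worst-case bound can be checked empirically with an instrumented search.
        The function below is a hypothetical textbook-style sketch, not the Boost
        implementation; it counts the element comparisons made while scanning the
        corpus (table construction is not counted).
      </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instrumented KMP search (not the Boost implementation):
// returns the number of element comparisons made while scanning the corpus
// for the first occurrence of a non-empty pattern.
std::size_t count_search_comparisons(const std::string& corpus,
                                     const std::string& pattern) {
    assert(!pattern.empty());
    const std::ptrdiff_t m = static_cast<std::ptrdiff_t>(pattern.size());

    // Build the table: one entry per pattern element, plus one extra.
    std::vector<std::ptrdiff_t> skip(pattern.size() + 1);
    skip[0] = -1;
    for (std::ptrdiff_t i = 1; i <= m; ++i) {
        std::ptrdiff_t j = skip[i - 1];
        while (j >= 0 && pattern[j] != pattern[i - 1]) j = skip[j];
        skip[i] = j + 1;
    }

    std::size_t comparisons = 0;
    std::ptrdiff_t matched = 0;  // pattern elements matched so far
    for (std::size_t pos = 0; pos < corpus.size(); ++pos) {
        while (matched >= 0) {
            ++comparisons;
            if (pattern[matched] == corpus[pos]) break;
            matched = skip[matched];  // mismatch: fall back via the table
        }
        ++matched;
        if (matched == m) break;  // first occurrence found
    }
    return comparisons;
}
```

<p>
        Searching a corpus of 1000 copies of 'A' for the pattern "AAAB" forces a
        mismatch and a fallback at almost every position, yet the count stays below
        2000, in line with the <span class="emphasis"><em>2n</em></span> bound.
      </p>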
174<h5>
175<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h6"></a>
176        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.exception_safety"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.exception_safety">Exception
177        Safety</a>
178      </h5>
179<p>
180        Both the object-oriented and procedural versions of the Knuth-Morris-Pratt
181        algorithm take their parameters by value and do not use any information other
182        than what is passed in. Therefore, both interfaces provide the strong exception
183        guarantee.
184      </p>
185<h5>
186<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h7"></a>
187        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.notes"></a></span><a class="link" href="KnuthMorrisPratt.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.notes">Notes</a>
188      </h5>
189<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
190<li class="listitem">
            When using the object-based interface, the pattern must remain unchanged
            for the duration of the searches; i.e., from the time the object is constructed
            until the final call to operator () returns.
194          </li>
195<li class="listitem">
196            The Knuth-Morris-Pratt algorithm requires random-access iterators for
197            both the pattern and the corpus. It should be possible to write this
198            to use bidirectional iterators (or possibly even forward ones), but this
199            implementation does not do that.
200          </li>
201</ul></div>
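<p>
        Callers who want to confirm the iterator requirement at compile time could
        use a check along these lines. This is a sketch using only standard C++11
        type traits; the helper <code class="computeroutput">is_random_access</code>
        is a hypothetical name and is not provided by the library.
      </p>

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// True when Iter's iterator category is random-access (or derived from it).
template <typename Iter>
constexpr bool is_random_access() {
    return std::is_base_of<
        std::random_access_iterator_tag,
        typename std::iterator_traits<Iter>::iterator_category>::value;
}

// Iterators that satisfy the requirement stated in the notes above.
static_assert(is_random_access<std::vector<char>::const_iterator>(),
              "vector iterators are random-access");
static_assert(is_random_access<std::deque<int>::iterator>(),
              "deque iterators are random-access");
static_assert(is_random_access<const char*>(),
              "pointers are random-access");

// std::list provides only bidirectional iterators, so it does not qualify.
static_assert(!is_random_access<std::list<char>::iterator>(),
              "list iterators are bidirectional only");
```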
202</div>
203<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
204<td align="left"></td>
205<td align="right"><div class="copyright-footer">Copyright © 2010-2012 Marshall Clow<p>
206        Distributed under the Boost Software License, Version 1.0. (See accompanying
207        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
208      </p>
209</div></td>
210</tr></table>
211<hr>
212<div class="spirit-nav">
213<a accesskey="p" href="BoyerMooreHorspool.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../../algorithm/CXX11.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
214</div>
215</body>
216</html>
217