<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Understanding Marked Sub-Expressions and Captures</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Boost.Regex 5.1.4">
<link rel="up" href="../index.html" title="Boost.Regex 5.1.4">
<link rel="prev" href="unicode.html" title="Unicode and Boost.Regex">
<link rel="next" href="partial_matches.html" title="Partial Matches">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="unicode.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="partial_matches.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_regex.captures"></a><a class="link" href="captures.html" title="Understanding Marked Sub-Expressions and Captures">Understanding Marked Sub-Expressions
    and Captures</a>
</h2></div></div></div>
<p>
    Captures are the iterator ranges that are "captured" by marked sub-expressions
    as a regular expression is matched. Each marked sub-expression can produce
    more than one capture if it is matched more than once. This document explains
    how captures and marked sub-expressions are represented and accessed in
    Boost.Regex.
  </p>
<h5>
<a name="boost_regex.captures.h0"></a>
    <span class="phrase"><a name="boost_regex.captures.marked_sub_expressions"></a></span><a class="link" href="captures.html#boost_regex.captures.marked_sub_expressions">Marked
    sub-expressions</a>
  </h5>
<p>
    Every time a Perl regular expression contains a parenthesis group <code class="computeroutput"><span class="special">()</span></code>, it produces an extra field, known as a
    marked sub-expression. For example, the expression:
  </p>
<pre class="programlisting">(\w+)\W+(\w+)</pre>
<p>
    has two marked sub-expressions (known as $1 and $2 respectively). In addition,
    the complete match is known as $&amp;, everything before the first match as
    $`, and everything after the match as $'.
    So if the above expression is searched
    for within <code class="computeroutput"><span class="string">"@abc def--"</span></code>,
    we obtain:
  </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
        <p>
          Sub-expression
        </p>
      </th>
<th>
        <p>
          Text found
        </p>
      </th>
</tr></thead>
<tbody>
<tr>
<td>
        <p>
          $`
        </p>
      </td>
<td>
        <p>
          "@"
        </p>
      </td>
</tr>
<tr>
<td>
        <p>
          $&amp;
        </p>
      </td>
<td>
        <p>
          "abc def"
        </p>
      </td>
</tr>
<tr>
<td>
        <p>
          $1
        </p>
      </td>
<td>
        <p>
          "abc"
        </p>
      </td>
</tr>
<tr>
<td>
        <p>
          $2
        </p>
      </td>
<td>
        <p>
          "def"
        </p>
      </td>
</tr>
<tr>
<td>
        <p>
          $'
        </p>
      </td>
<td>
        <p>
          "--"
        </p>
      </td>
</tr>
</tbody>
</table></div>
<p>
    In Boost.Regex all of these are accessible via the <a class="link" href="ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a> class, which is filled
    in by the regular expression matching algorithms (<a class="link" href="ref/regex_search.html" title="regex_search"><code class="computeroutput"><span class="identifier">regex_search</span></code></a>, <a class="link" href="ref/regex_match.html" title="regex_match"><code class="computeroutput"><span class="identifier">regex_match</span></code></a>, or <a class="link" href="ref/regex_iterator.html" title="regex_iterator"><code class="computeroutput"><span class="identifier">regex_iterator</span></code></a>).
    So given:
  </p>
<pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_results</span><span class="special">&lt;</span><span class="identifier">IteratorType</span><span class="special">&gt;</span> <span class="identifier">m</span><span class="special">;</span>
</pre>
<p>
    The Perl and Boost.Regex equivalents are as follows:
  </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
        <p>
          Perl
        </p>
      </th>
<th>
        <p>
          Boost.Regex
        </p>
      </th>
</tr></thead>
<tbody>
<tr>
<td>
        <p>
          $`
        </p>
      </td>
<td>
        <p>
          <code class="computeroutput"><span class="identifier">m</span><span class="special">.</span><span class="identifier">prefix</span><span class="special">()</span></code>
        </p>
      </td>
</tr>
<tr>
<td>
        <p>
          $&amp;
        </p>
      </td>
<td>
        <p>
          <code class="computeroutput"><span class="identifier">m</span><span class="special">[</span><span class="number">0</span><span class="special">]</span></code>
        </p>
      </td>
</tr>
<tr>
<td>
        <p>
          $n
        </p>
      </td>
<td>
        <p>
          <code class="computeroutput"><span class="identifier">m</span><span class="special">[</span><span class="identifier">n</span><span class="special">]</span></code>
        </p>
      </td>
</tr>
<tr>
<td>
        <p>
          $'
        </p>
      </td>
<td>
        <p>
          <code class="computeroutput"><span class="identifier">m</span><span class="special">.</span><span class="identifier">suffix</span><span class="special">()</span></code>
        </p>
      </td>
</tr>
</tbody>
</table></div>
<p>
    In Boost.Regex each sub-expression match is represented by a <a class="link" href="ref/sub_match.html" title="sub_match"><code class="computeroutput"><span class="identifier">sub_match</span></code></a> object; this is basically
    just a pair of
    iterators denoting the start and end positions of the sub-expression
    match, with some additional operators provided so that objects of
    type <a class="link" href="ref/sub_match.html" title="sub_match"><code class="computeroutput"><span class="identifier">sub_match</span></code></a>
    behave much like a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span></code>: for example, they are implicitly
    convertible to a <code class="computeroutput"><span class="identifier">basic_string</span></code>,
    and they can be compared to a string, added to a string, or streamed out to an
    output stream.
  </p>
<h5>
<a name="boost_regex.captures.h1"></a>
    <span class="phrase"><a name="boost_regex.captures.unmatched_sub_expressions"></a></span><a class="link" href="captures.html#boost_regex.captures.unmatched_sub_expressions">Unmatched
    Sub-Expressions</a>
  </h5>
<p>
    When a regular expression match is found, not all of the marked
    sub-expressions need to have participated in the match. For example, the expression:
  </p>
<pre class="programlisting">(abc)|(def)</pre>
<p>
    can match either $1 or $2, but never both at the same time. In Boost.Regex
    you can determine which sub-expressions matched by checking the <code class="computeroutput"><span class="identifier">sub_match</span><span class="special">::</span><span class="identifier">matched</span></code> data member.
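  </p>
<p>
    As a minimal sketch of this check, the program below tests the
    <code class="computeroutput">matched</code> member of each sub-expression after matching the
    expression above. Two things in it are assumptions of the sketch rather than part
    of the text above: the helper name <code class="computeroutput">matched_alternative</code> is
    hypothetical, and the alias <code class="computeroutput">rx</code> points at the standard library's
    <code class="computeroutput">std::regex</code>, whose <code class="computeroutput">sub_match</code> exposes the same
    <code class="computeroutput">matched</code> member, so that the example builds without Boost. With
    Boost.Regex, include <code class="computeroutput">boost/regex.hpp</code> and alias
    <code class="computeroutput">boost</code> instead.
  </p>
<pre class="programlisting">#include &lt;iostream&gt;
#include &lt;regex&gt;
#include &lt;string&gt;

// Assumption of this sketch: rx aliases the standard library so that the
// example builds without Boost. With Boost.Regex, include &lt;boost/regex.hpp&gt;
// and write "namespace rx = boost;" instead.
namespace rx = std;

// Hypothetical helper: returns 1 if (abc) participated in the match,
// 2 if (def) did, and 0 if the whole text does not match.
int matched_alternative(const std::string&amp; text)
{
   static const rx::regex e("(abc)|(def)");
   rx::smatch what;
   if(!rx::regex_match(text, what, e))
      return 0;
   // Only one alternative can have participated in a successful match.
   return what[1].matched ? 1 : 2;
}

int main()
{
   std::cout &lt;&lt; matched_alternative("abc") &lt;&lt; '\n'; // prints 1
   std::cout &lt;&lt; matched_alternative("def") &lt;&lt; '\n'; // prints 2
   std::cout &lt;&lt; matched_alternative("xyz") &lt;&lt; '\n'; // prints 0
   return 0;
}
</pre>
<p>
    Testing <code class="computeroutput">matched</code> is more reliable than comparing a sub-expression
    against an empty string, because a sub-expression can participate in a match
    and still match the empty string.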
  </p>
<h5>
<a name="boost_regex.captures.h2"></a>
    <span class="phrase"><a name="boost_regex.captures.repeated_captures"></a></span><a class="link" href="captures.html#boost_regex.captures.repeated_captures">Repeated
    Captures</a>
  </h5>
<p>
    When a marked sub-expression is repeated, the sub-expression is "captured"
    multiple times; normally, however, only the final capture is available. For example,
    if
  </p>
<pre class="programlisting">(?:(\w+)\W+)+</pre>
<p>
    is matched against
  </p>
<pre class="programlisting">one fine day.</pre>
<p>
    then $1 will contain the string "day", and all the previous captures
    will have been forgotten.
  </p>
<p>
    However, Boost.Regex has an experimental feature that allows all the capture
    information to be retained. It is accessed via either the <code class="computeroutput"><span class="identifier">match_results</span><span class="special">::</span><span class="identifier">captures</span></code> member function or the <code class="computeroutput"><span class="identifier">sub_match</span><span class="special">::</span><span class="identifier">captures</span></code> member function. These functions
    return a container holding the sequence of all the captures obtained during
    the regular expression matching.
    The following example program shows how this
    information may be used:
  </p>
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">regex</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iostream</span><span class="special">&gt;</span>

<span class="keyword">void</span> <span class="identifier">print_captures</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">regx</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">text</span><span class="special">)</span>
<span class="special">{</span>
   <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="identifier">regx</span><span class="special">);</span>
   <span class="identifier">boost</span><span class="special">::</span><span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span>
   <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Expression: \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">regx</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
   <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Text: \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">text</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
   <span class="keyword">if</span><span class="special">(</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">text</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">e</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_extra</span><span class="special">))</span>
   <span class="special">{</span>
      <span class="keyword">unsigned</span> <span class="identifier">i</span><span class="special">,</span> <span class="identifier">j</span><span class="special">;</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"** Match found **\n Sub-Expressions:\n"</span><span class="special">;</span>
      <span class="keyword">for</span><span class="special">(</span><span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" $"</span> <span class="special">&lt;&lt;</span> <span class="identifier">i</span> <span class="special">&lt;&lt;</span> <span class="string">" = \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" Captures:\n"</span><span class="special">;</span>
      <span class="keyword">for</span><span class="special">(</span><span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
      <span class="special">{</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" $"</span> <span class="special">&lt;&lt;</span> <span class="identifier">i</span> <span class="special">&lt;&lt;</span> <span class="string">" = {"</span><span class="special">;</span>
         <span class="keyword">for</span><span class="special">(</span><span class="identifier">j</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">j</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">captures</span><span class="special">(</span><span class="identifier">i</span><span class="special">).</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">j</span><span class="special">)</span>
         <span class="special">{</span>
            <span class="keyword">if</span><span class="special">(</span><span class="identifier">j</span><span class="special">)</span>
               <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">", "</span><span class="special">;</span>
            <span class="keyword">else</span>
               <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" "</span><span class="special">;</span>
            <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"\""</span> <span class="special">&lt;&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">captures</span><span class="special">(</span><span class="identifier">i</span><span class="special">)[</span><span class="identifier">j</span><span class="special">]</span> <span class="special">&lt;&lt;</span> <span class="string">"\""</span><span class="special">;</span>
         <span class="special">}</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" }\n"</span><span class="special">;</span>
      <span class="special">}</span>
   <span class="special">}</span>
   <span class="keyword">else</span>
   <span class="special">{</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"** No Match found **\n"</span><span class="special">;</span>
   <span class="special">}</span>
<span class="special">}</span>

<span class="keyword">int</span> <span class="identifier">main</span><span class="special">(</span><span class="keyword">int</span> <span class="special">,</span> <span class="keyword">char</span><span class="special">*</span> <span class="special">[])</span>
<span class="special">{</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(([[:lower:]]+)|([[:upper:]]+))+"</span><span class="special">,</span> <span class="string">"aBBcccDDDDDeeeeeeee"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(.*)bar|(.*)bah"</span><span class="special">,</span> <span class="string">"abcbar"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(.*)bar|(.*)bah"</span><span class="special">,</span> <span class="string">"abcbah"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"^(?:(\\w+)|(?&gt;\\W+))*$"</span><span class="special">,</span>
      <span class="string">"now is the time for all good men to come to the aid of the party"</span><span class="special">);</span>
   <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span>
<span class="special">}</span>
</pre>
<p>
    This produces the following output:
  </p>
<pre class="programlisting">Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
Text: "aBBcccDDDDDeeeeeeee"
** Match found **
 Sub-Expressions:
 $0 = "aBBcccDDDDDeeeeeeee"
 $1 = "eeeeeeee"
 $2 = "eeeeeeee"
 $3 = "DDDDD"
 Captures:
 $0 = { "aBBcccDDDDDeeeeeeee" }
 $1 = { "a", "BB", "ccc", "DDDDD", "eeeeeeee" }
 $2 = { "a", "ccc", "eeeeeeee" }
 $3 = { "BB", "DDDDD" }
Expression: "(.*)bar|(.*)bah"
Text: "abcbar"
** Match found **
 Sub-Expressions:
 $0 = "abcbar"
 $1 = "abc"
 $2 = ""
 Captures:
 $0 = { "abcbar" }
 $1 = { "abc" }
 $2 = { }
Expression: "(.*)bar|(.*)bah"
Text: "abcbah"
** Match found **
 Sub-Expressions:
 $0 = "abcbah"
 $1 = ""
 $2 = "abc"
 Captures:
 $0 = { "abcbah" }
 $1 = { }
 $2 = { "abc" }
Expression: "^(?:(\w+)|(?&gt;\W+))*$"
Text: "now is the time for all good men to come to the aid of the party"
** Match found **
 Sub-Expressions:
 $0 = "now is the time for all good men to come to the aid of the party"
 $1 = "party"
 Captures:
 $0 = { "now is the time for all good men to come to the aid of the party" }
 $1 = { "now", "is", "the", "time", "for", "all", "good", "men", "to",
        "come", "to", "the", "aid", "of", "the", "party" }
</pre>
<p>
    Unfortunately, enabling this feature has an impact on performance even if you
    don't use it, and a much bigger impact if you do. To use this feature you
    therefore need to:
  </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
      Define BOOST_REGEX_MATCH_EXTRA for all translation units, including the
      library source (the best way to do this is to uncomment this define in
      boost/regex/user.hpp and then rebuild everything).
    </li>
<li class="listitem">
      Pass the match_extra flag to the particular algorithms where you actually
      need the captures information (regex_search, regex_match, or regex_iterator).
    </li>
</ul></div>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p>
        Distributed under the Boost Software License, Version 1.0.
        (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="unicode.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="partial_matches.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>