<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>POSIX Extended Regular Expression Syntax</title>
<link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../../index.html" title="Boost.Regex 5.1.4">
<link rel="up" href="../syntax.html" title="Regular Expression Syntax">
<link rel="prev" href="perl_syntax.html" title="Perl Regular Expression Syntax">
<link rel="next" href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="perl_syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_syntax.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="boost_regex.syntax.basic_extended"></a><a class="link" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">POSIX Extended
Regular
 Expression Syntax</a>
</h3></div></div></div>
<h4>
<a name="boost_regex.syntax.basic_extended.h0"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.synopsis"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.synopsis">Synopsis</a>
 </h4>
<p>
 The POSIX-Extended regular expression syntax is supported by the POSIX C
 regular expression APIs, and variations are used by the utilities <code class="computeroutput"><span class="identifier">egrep</span></code> and <code class="computeroutput"><span class="identifier">awk</span></code>.
 You can construct POSIX extended regular expressions in Boost.Regex by passing
 the flag <code class="computeroutput"><span class="identifier">extended</span></code> to the
 regex constructor, for example:
 </p>
<pre class="programlisting"><span class="comment">// e1 is a case-sensitive POSIX-Extended expression:</span>
<span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">);</span>
<span class="comment">// e2 is a case-insensitive POSIX-Extended expression:</span>
<span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">|</span><span class="identifier">boost</span><span
class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
</pre>
<a name="boost_regex.posix_extended_syntax"></a><h4>
<a name="boost_regex.syntax.basic_extended.h1"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.posix_extended_syntax"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.posix_extended_syntax">POSIX Extended
 Syntax</a>
 </h4>
<p>
 In POSIX-Extended regular expressions, all characters match themselves except
 for the following special characters:
 </p>
<pre class="programlisting">.[{}()\*+?|^$</pre>
<h5>
<a name="boost_regex.syntax.basic_extended.h2"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.wildcard"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.wildcard">Wildcard:</a>
 </h5>
<p>
 The single character '.' when used outside of a character set will match
 any single character except:
 </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
 The NUL character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code>
 is passed to the matching algorithms.
 </li>
<li class="listitem">
 The newline character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code>
 is passed to the matching algorithms.
 </li>
</ul></div>
<h5>
<a name="boost_regex.syntax.basic_extended.h3"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.anchors"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.anchors">Anchors:</a>
 </h5>
<p>
 A '^' character shall match the start of a line when used as the first character
 of an expression, or the first character of a sub-expression.
 </p>
<p>
 A '$' character shall match the end of a line when used as the last character
 of an expression, or the last character of a sub-expression.
 </p>
<h5>
<a name="boost_regex.syntax.basic_extended.h4"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.marked_sub_expressions"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.marked_sub_expressions">Marked
 sub-expressions:</a>
 </h5>
<p>
 A section beginning <code class="computeroutput"><span class="special">(</span></code> and ending
 <code class="computeroutput"><span class="special">)</span></code> acts as a marked sub-expression.
 Whatever matched the sub-expression is split out in a separate field by the
 matching algorithms. Marked sub-expressions can also be repeated, or referred
 to by a back-reference.
 </p>
<h5>
<a name="boost_regex.syntax.basic_extended.h5"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.repeats"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.repeats">Repeats:</a>
 </h5>
<p>
 Any atom (a single character, a marked sub-expression, or a character class)
 can be repeated with the <code class="computeroutput"><span class="special">*</span></code>,
 <code class="computeroutput"><span class="special">+</span></code>, <code class="computeroutput"><span class="special">?</span></code>,
 and <code class="computeroutput"><span class="special">{}</span></code> operators.
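 </p>
<p>
 As a rough illustration of the repeat operators covered in this section, the
 sketch below tests whole-string matches. It is written against
 <code class="computeroutput">std::regex</code> from the C++ standard library, with its
 <code class="computeroutput">extended</code> flag, purely so that it compiles without Boost.
 The helper name <code class="computeroutput">matches_extended</code> is hypothetical, and the
 sketch assumes that a <code class="computeroutput">boost::regex</code> constructed with
 <code class="computeroutput">boost::regex::extended</code> gives the same results for these
 standard-only patterns.
 </p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when the whole of `text` matches the
// POSIX-Extended expression `pattern`.
// Assumption: std::regex stands in for boost::regex here; with Boost.Regex
// the flag would be boost::regex::extended.
bool matches_extended(const std::string& pattern, const std::string& text)
{
    const std::regex e(pattern, std::regex::extended);
    return std::regex_match(text, e);  // the entire input must match
}
```

<p>
 Under those assumptions, <code class="computeroutput">matches_extended("a{2,3}", "aaa")</code>
 returns true while <code class="computeroutput">matches_extended("a{2,3}", "aaaa")</code>
 returns false, because the whole input has to match.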
 </p>
<p>
 The <code class="computeroutput"><span class="special">*</span></code> operator will match the
 preceding atom <span class="emphasis"><em>zero or more times</em></span>, for example the expression
 <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span><span class="identifier">b</span></code> will match any of the following:
 </p>
<pre class="programlisting">b
ab
aaaaaaaab
</pre>
<p>
 The <code class="computeroutput"><span class="special">+</span></code> operator will match the
 preceding atom <span class="emphasis"><em>one or more times</em></span>, for example the expression
 <code class="computeroutput"><span class="identifier">a</span><span class="special">+</span><span class="identifier">b</span></code> will match any of the following:
 </p>
<pre class="programlisting">ab
aaaaaaaab
</pre>
<p>
 But will not match:
 </p>
<pre class="programlisting">b
</pre>
<p>
 The <code class="computeroutput"><span class="special">?</span></code> operator will match the
 preceding atom <span class="emphasis"><em>zero or one times</em></span>, for example the expression
 <code class="computeroutput"><span class="identifier">ca</span><span class="special">?</span><span class="identifier">b</span></code> will match any of the following:
 </p>
<pre class="programlisting">cb
cab
</pre>
<p>
 But will not match:
 </p>
<pre class="programlisting">caab
</pre>
<p>
 An atom can also be repeated with a bounded repeat:
 </p>
<p>
 <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">}</span></code> Matches
 'a' repeated <span class="emphasis"><em>exactly n times</em></span>.
 </p>
<p>
 <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,}</span></code> Matches
 'a' repeated <span class="emphasis"><em>n or more times</em></span>.
 </p>
<p>
 <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,</span> <span class="identifier">m</span><span class="special">}</span></code> Matches 'a' repeated <span class="emphasis"><em>between n
 and m times inclusive</em></span>.
 </p>
<p>
 For example:
 </p>
<pre class="programlisting">^a{2,3}$</pre>
<p>
 Will match either of:
 </p>
<pre class="programlisting"><span class="identifier">aa</span>
<span class="identifier">aaa</span>
</pre>
<p>
 But neither of:
 </p>
<pre class="programlisting"><span class="identifier">a</span>
<span class="identifier">aaaa</span>
</pre>
<p>
 It is an error to use a repeat operator if the preceding construct cannot
 be repeated, for example:
 </p>
<pre class="programlisting"><span class="identifier">a</span><span class="special">(*)</span>
</pre>
<p>
 Will raise an error, as there is nothing for the <code class="computeroutput"><span class="special">*</span></code>
 operator to be applied to.
 </p>
<h5>
<a name="boost_regex.syntax.basic_extended.h6"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.back_references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.back_references">Back
 references:</a>
 </h5>
<p>
 An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span>
 is in the range 1-9, matches the same string that was matched by sub-expression
 <span class="emphasis"><em>n</em></span>.
 For example the expression:
 </p>
<pre class="programlisting">^(a*)[^a]*\1$</pre>
<p>
 Will match the string:
 </p>
<pre class="programlisting"><span class="identifier">aaabbaaa</span>
</pre>
<p>
 But not the string:
 </p>
<pre class="programlisting"><span class="identifier">aaabba</span>
</pre>
<div class="caution"><table border="0" summary="Caution">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Caution]" src="../../../../../../doc/src/images/caution.png"></td>
<th align="left">Caution</th>
</tr>
<tr><td align="left" valign="top"><p>
 The POSIX standard does not support back-references for "extended"
 regular expressions; this is a compatible extension to that standard.
 </p></td></tr>
</table></div>
<h5>
<a name="boost_regex.syntax.basic_extended.h7"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.alternation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.alternation">Alternation</a>
 </h5>
<p>
 The <code class="computeroutput"><span class="special">|</span></code> operator will match either
 of its arguments, so for example: <code class="computeroutput"><span class="identifier">abc</span><span class="special">|</span><span class="identifier">def</span></code> will
 match either "abc" or "def".
 </p>
<p>
 Parentheses can be used to group alternations, for example: <code class="computeroutput"><span class="identifier">ab</span><span class="special">(</span><span class="identifier">d</span><span class="special">|</span><span class="identifier">ef</span><span class="special">)</span></code>
 will match either of "abd" or "abef".
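 </p>
<p>
 Because the grouping parentheses are also a marked sub-expression, the
 alternative that actually matched can be read back from the match results.
 The sketch below shows one way this might look. It uses
 <code class="computeroutput">std::regex</code> and <code class="computeroutput">std::smatch</code>
 from the C++ standard library as stand-ins so that it compiles without Boost, on the
 assumption that <code class="computeroutput">boost::regex</code> and
 <code class="computeroutput">boost::smatch</code> behave the same way here. The function name
 <code class="computeroutput">first_group</code> is hypothetical.
 </p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches `text` for the POSIX-Extended expression
// `pattern` and returns whatever the first marked sub-expression matched,
// or an empty string when nothing matches or there is no sub-expression.
// Assumption: std::regex / std::smatch stand in for their Boost counterparts.
std::string first_group(const std::string& pattern, const std::string& text)
{
    const std::regex e(pattern, std::regex::extended);
    std::smatch what;
    if (std::regex_search(text, what, e) && what.size() > 1)
        return what[1].str();  // field 1 holds the first (...) group
    return std::string();
}
```

<p>
 Under those assumptions, <code class="computeroutput">first_group("ab(d|ef)", "xxabefyy")</code>
 yields "ef", and <code class="computeroutput">first_group("ab(d|ef)", "abd")</code> yields "d".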
 </p>
<h5>
<a name="boost_regex.syntax.basic_extended.h8"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_sets"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_sets">Character
 sets:</a>
 </h5>
<p>
 A character set is a bracket-expression starting with [ and ending with ].
 It defines a set of characters and matches any single character that is
 a member of that set.
 </p>
<p>
 A bracket expression may contain any combination of the following:
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h9"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_characters"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_characters">Single
 characters:</a>
 </h6>
<p>
 For example <code class="computeroutput"><span class="special">[</span><span class="identifier">abc</span><span class="special">]</span></code> will match any of the characters 'a', 'b',
 or 'c'.
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h10"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_ranges"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_ranges">Character
 ranges:</a>
 </h6>
<p>
 For example <code class="computeroutput"><span class="special">[</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code>
 will match any single character in the range 'a' to 'c'. By default, for
 POSIX-Extended regular expressions, a character <span class="emphasis"><em>x</em></span> is
 within the range <span class="emphasis"><em>y</em></span> to <span class="emphasis"><em>z</em></span> if it
 collates within that range; this results in locale-specific behavior.
 This
 behavior can be turned off by unsetting the <code class="computeroutput"><span class="identifier">collate</span></code>
 <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">option flag</a> - in
 which case whether a character appears within a range is determined by comparing
 the code points of the characters only.
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h11"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.negation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.negation">Negation:</a>
 </h6>
<p>
 If the bracket-expression begins with the ^ character, then it matches the
 complement of the characters it contains, for example <code class="computeroutput"><span class="special">[^</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> matches any character that is not in the
 range <code class="computeroutput"><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span></code>.
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h12"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_classes">Character
 classes:</a>
 </h6>
<p>
 An expression of the form <code class="computeroutput"><span class="special">[[:</span><span class="identifier">name</span><span class="special">:]]</span></code>
 matches the named character class "name", for example <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> matches any lower case character. See
 <a class="link" href="character_classes.html" title="Character Class Names">character class names</a>.
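 </p>
<p>
 The bracket-expression forms described so far (single characters, ranges,
 negation and named classes) can be tried out with a small counting helper.
 The sketch below is illustrative only: it uses
 <code class="computeroutput">std::regex</code> and
 <code class="computeroutput">std::sregex_iterator</code> from the C++ standard library
 as stand-ins so that it compiles without Boost, the function name
 <code class="computeroutput">count_in_set</code> is hypothetical, and the results quoted
 afterwards assume the default "C" locale with ranges compared by code point.
 </p>

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts how many characters of `text` are matched by
// the bracket expression `set`, for example "[a-c]" or "[[:digit:]]".
// Assumption: std::regex stands in for boost::regex; ranges are compared by
// code point in the default "C" locale.
std::size_t count_in_set(const std::string& set, const std::string& text)
{
    const std::regex e(set, std::regex::extended);
    // Each successful match of a bracket expression consumes one character.
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), e),
        std::sregex_iterator()));
}
```

<p>
 Under those assumptions, <code class="computeroutput">count_in_set("[a-c]", "abcd")</code>
 gives 3, and the negated form <code class="computeroutput">count_in_set("[^a-c]", "abcd")</code>
 gives 1.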
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h13"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.collating_elements"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.collating_elements">Collating
 Elements:</a>
 </h6>
<p>
 An expression of the form <code class="computeroutput"><span class="special">[[.</span><span class="identifier">col</span><span class="special">.]]</span></code> matches
 the collating element <span class="emphasis"><em>col</em></span>. A collating element is any
 single character, or any sequence of characters that collates as a single
 unit. Collating elements may also be used as the end point of a range, for
 example: <code class="computeroutput"><span class="special">[[.</span><span class="identifier">ae</span><span class="special">.]-</span><span class="identifier">c</span><span class="special">]</span></code>
 matches the character sequence "ae", plus any single character
 in the range "ae"-c, assuming that "ae" is treated as
 a single collating element in the current locale.
 </p>
<p>
 Collating elements may be used in place of escapes (which are not normally
 allowed inside character sets), for example <code class="computeroutput"><span class="special">[[.^.]</span><span class="identifier">abc</span><span class="special">]</span></code> would
 match any one of the characters 'a', 'b', 'c' or '^'.
 </p>
<p>
 As an extension, a collating element may also be specified via its <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>, for example:
 </p>
<pre class="programlisting"><span class="special">[[.</span><span class="identifier">NUL</span><span class="special">.]]</span>
</pre>
<p>
 matches a NUL character.
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h14"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.equivalence_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.equivalence_classes">Equivalence
 classes:</a>
 </h6>
<p>
 An expression of the form <code class="computeroutput"><span class="special">[[=</span><span class="identifier">col</span><span class="special">=]]</span></code>
 matches any character or collating element whose primary sort key is the
 same as that for collating element <span class="emphasis"><em>col</em></span>. As with collating
 elements, the name <span class="emphasis"><em>col</em></span> may be a <a class="link" href="collating_names.html" title="Collating Names">symbolic
 name</a>. A primary sort key is one that ignores case, accents, and
 locale-specific tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches
 any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately, implementation
 of this is reliant on the platform's collation and localisation support;
 this feature cannot be relied upon to work portably across all platforms,
 or even all locales on one platform.
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h15"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.combinations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.combinations">Combinations:</a>
 </h6>
<p>
 All of the above can be combined in one character set declaration, for example:
 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]]</span></code>.
 </p>
<h5>
<a name="boost_regex.syntax.basic_extended.h16"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes">Escapes</a>
 </h5>
<p>
 The POSIX standard defines no escape sequences for POSIX-Extended regular
 expressions, except that:
 </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
 Any special character preceded by an escape shall match itself.
 </li>
<li class="listitem">
 The effect of any ordinary character being preceded by an escape is undefined.
 </li>
<li class="listitem">
 An escape inside a character class declaration shall match itself: in
 other words the escape character is not "special" inside a
 character class declaration; so <code class="computeroutput"><span class="special">[\^]</span></code>
 will match either a literal '\' or a '^'.
 </li>
</ul></div>
<p>
 However, that's rather restrictive, so the following standard-compatible
 extensions are also supported by Boost.Regex:
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h17"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes_matching_a_specific_char"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes_matching_a_specific_char">Escapes
 matching a specific character</a>
 </h6>
<p>
 The following escape sequences are all synonyms for single characters:
 </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th><p>Escape</p></th>
<th><p>Character</p></th>
</tr></thead>
<tbody>
<tr>
<td><p>\a</p></td>
<td><p>\a</p></td>
</tr>
<tr>
<td><p>\e</p></td>
<td><p>0x1B</p></td>
</tr>
<tr>
<td><p>\f</p></td>
<td><p>\f</p></td>
</tr>
<tr>
<td><p>\n</p></td>
<td><p>\n</p></td>
</tr>
<tr>
<td><p>\r</p></td>
<td><p>\r</p></td>
</tr>
<tr>
<td><p>\t</p></td>
<td><p>\t</p></td>
</tr>
<tr>
<td><p>\v</p></td>
<td><p>\v</p></td>
</tr>
<tr>
<td><p>\b</p></td>
<td><p>
 \b (but only inside a character class declaration).
 </p></td>
</tr>
<tr>
<td><p>\cX</p></td>
<td><p>
 An ASCII escape sequence - the character whose code point is X
 % 32
 </p></td>
</tr>
<tr>
<td><p>\xdd</p></td>
<td><p>
 A hexadecimal escape sequence - matches the single character whose
 code point is 0xdd.
 </p></td>
</tr>
<tr>
<td><p>\x{dddd}</p></td>
<td><p>
 A hexadecimal escape sequence - matches the single character whose
 code point is 0xdddd.
 </p></td>
</tr>
<tr>
<td><p>\0ddd</p></td>
<td><p>
 An octal escape sequence - matches the single character whose code
 point is 0ddd.
 </p></td>
</tr>
<tr>
<td><p>\N{Name}</p></td>
<td><p>
 Matches the single character which has the symbolic name <span class="emphasis"><em>Name</em></span>.
 For example <code class="computeroutput"><span class="special">\</span><span class="identifier">N</span><span class="special">{</span><span class="identifier">newline</span><span class="special">}</span></code> matches the single character \n.
 </p></td>
</tr>
</tbody>
</table></div>
<h6>
<a name="boost_regex.syntax.basic_extended.h18"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_character_character_class"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_character_character_class">"Single
 character" character classes:</a>
 </h6>
<p>
 Any escaped character <span class="emphasis"><em>x</em></span>, if <span class="emphasis"><em>x</em></span> is
 the name of a character class shall match any character that is a member
 of that class, and any escaped character <span class="emphasis"><em>X</em></span>, if <span class="emphasis"><em>x</em></span>
 is the name of a character class, shall match any character not in that class.
 </p>
<p>
 The following are supported by default:
 </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th><p>Escape sequence</p></th>
<th><p>Equivalent to</p></th>
</tr></thead>
<tbody>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">d</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">l</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">s</span></code></p></td>
<td><p><code
class="computeroutput"><span class="special">[[:</span><span class="identifier">space</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">u</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[[:</span><span class="identifier">upper</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">w</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[[:</span><span class="identifier">word</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">D</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[^[:</span><span class="identifier">digit</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">L</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[^[:</span><span class="identifier">lower</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">S</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[^[:</span><span class="identifier">space</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">U</span></code></p></td>
<td><p><code
class="computeroutput"><span class="special">[^[:</span><span class="identifier">upper</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">W</span></code></p></td>
<td><p><code class="computeroutput"><span class="special">[^[:</span><span class="identifier">word</span><span class="special">:]]</span></code></p></td>
</tr>
</tbody>
</table></div>
<h6>
<a name="boost_regex.syntax.basic_extended.h19"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_properties"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_properties">Character
 Properties</a>
 </h6>
<p>
 The character property names in the following table are all equivalent to
 the names used in character classes.
 </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th><p>Form</p></th>
<th><p>Description</p></th>
<th><p>Equivalent character set form</p></th>
</tr></thead>
<tbody>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">pX</span></code></p></td>
<td><p>
 Matches any character that has the property X.
 </p></td>
<td><p><code class="computeroutput"><span class="special">[[:</span><span class="identifier">X</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code></p></td>
<td><p>
 Matches any character that has the property Name.
 </p></td>
<td><p><code class="computeroutput"><span class="special">[[:</span><span class="identifier">Name</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">PX</span></code></p></td>
<td><p>
 Matches any character that does not have the property X.
 </p></td>
<td><p><code class="computeroutput"><span class="special">[^[:</span><span class="identifier">X</span><span class="special">:]]</span></code></p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">P</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code></p></td>
<td><p>
 Matches any character that does not have the property Name.
 </p></td>
<td><p><code class="computeroutput"><span class="special">[^[:</span><span class="identifier">Name</span><span class="special">:]]</span></code></p></td>
</tr>
</tbody>
</table></div>
<p>
 For example <code class="computeroutput"><span class="special">\</span><span class="identifier">pd</span></code>
 matches any "digit" character, as does <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">digit</span><span class="special">}</span></code>.
 </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h20"></a>
 <span class="phrase"><a name="boost_regex.syntax.basic_extended.word_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.word_boundaries">Word
 Boundaries</a>
 </h6>
<p>
 The following escape sequences match the boundaries of words:
 </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th><p>Escape</p></th>
<th><p>Meaning</p></th>
</tr></thead>
<tbody>
<tr>
<td><p><code class="computeroutput"><span class="special">\&lt;</span></code></p></td>
<td><p>
 Matches the start of a word.
 </p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\&gt;</span></code></p></td>
<td><p>
 Matches the end of a word.
 </p></td>
</tr>
<tr>
<td><p><code class="computeroutput"><span class="special">\</span><span class="identifier">b</span></code></p></td>
<td><p>
 Matches a word boundary (the start or end of a word).
  </p>
  </td>
</tr>
<tr>
<td>
  <p>
    <code class="computeroutput"><span class="special">\</span><span class="identifier">B</span></code>
  </p>
  </td>
<td>
  <p>
    Matches only when not at a word boundary.
  </p>
  </td>
</tr>
</tbody>
</table></div>
<h6>
<a name="boost_regex.syntax.basic_extended.h21"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.buffer_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.buffer_boundaries">Buffer
  boundaries</a>
  </h6>
<p>
  The following match only at buffer boundaries: a "buffer" in this
  context is the whole of the input text that is being matched against (note
  that ^ and $ may match embedded newlines within the text).
  </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
  <p>
    Escape
  </p>
  </th>
<th>
  <p>
    Meaning
  </p>
  </th>
</tr></thead>
<tbody>
<tr>
<td>
  <p>
    \`
  </p>
  </td>
<td>
  <p>
    Matches at the start of a buffer only.
  </p>
  </td>
</tr>
<tr>
<td>
  <p>
    \'
  </p>
  </td>
<td>
  <p>
    Matches at the end of a buffer only.
  </p>
  </td>
</tr>
<tr>
<td>
  <p>
    <code class="computeroutput"><span class="special">\</span><span class="identifier">A</span></code>
  </p>
  </td>
<td>
  <p>
    Matches at the start of a buffer only (the same as \`).
  </p>
  </td>
</tr>
<tr>
<td>
  <p>
    <code class="computeroutput"><span class="special">\</span><span class="identifier">z</span></code>
  </p>
  </td>
<td>
  <p>
    Matches at the end of a buffer only (the same as \').
  </p>
  </td>
</tr>
<tr>
<td>
  <p>
    <code class="computeroutput"><span class="special">\</span><span class="identifier">Z</span></code>
  </p>
  </td>
<td>
  <p>
    Matches an optional sequence of newlines at the end of a buffer:
    equivalent to the regular expression <code class="computeroutput"><span class="special">\</span><span class="identifier">n</span><span class="special">*\</span><span class="identifier">z</span></code>
  </p>
  </td>
</tr>
</tbody>
</table></div>
<h6>
<a name="boost_regex.syntax.basic_extended.h22"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.continuation_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.continuation_escape">Continuation
  Escape</a>
  </h6>
<p>
  The sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">G</span></code>
  matches only at the end of the last match found, or at the start of the text
  being matched if no previous match was found. This escape is useful if you're
  iterating over the matches contained within a text, and you want each subsequent
  match to start where the last one ended.
  </p>
<h6>
<a name="boost_regex.syntax.basic_extended.h23"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.quoting_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.quoting_escape">Quoting
  escape</a>
  </h6>
<p>
  The escape sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span></code>
  begins a "quoted sequence": all the subsequent characters are treated
  as literals, until either the end of the regular expression or <code class="computeroutput"><span class="special">\</span><span class="identifier">E</span></code> is found.
  For example the expression: <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span><span class="special">\*+\</span><span class="identifier">Ea</span><span class="special">+</span></code> would match either of:
  </p>
<pre class="programlisting"><span class="special">\*+</span><span class="identifier">a</span>
<span class="special">\*+</span><span class="identifier">aaa</span>
</pre>
<h6>
<a name="boost_regex.syntax.basic_extended.h24"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.unicode_escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.unicode_escapes">Unicode
  escapes</a>
  </h6>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
  <p>
    Escape
  </p>
  </th>
<th>
  <p>
    Meaning
  </p>
  </th>
</tr></thead>
<tbody>
<tr>
<td>
  <p>
    <code class="computeroutput"><span class="special">\</span><span class="identifier">C</span></code>
  </p>
  </td>
<td>
  <p>
    Matches a single code point: in Boost regex this has exactly the
    same effect as a "." operator.
  </p>
  </td>
</tr>
<tr>
<td>
  <p>
    <code class="computeroutput"><span class="special">\</span><span class="identifier">X</span></code>
  </p>
  </td>
<td>
  <p>
    Matches a combining character sequence: that is any non-combining
    character followed by a sequence of zero or more combining characters.
  </p>
  </td>
</tr>
</tbody>
</table></div>
<h6>
<a name="boost_regex.syntax.basic_extended.h25"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.any_other_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.any_other_escape">Any
  other escape</a>
  </h6>
<p>
  Any other escape sequence matches the character that is escaped, for example
  \@ matches a literal '@'.
  </p>
<h5>
<a name="boost_regex.syntax.basic_extended.h26"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.operator_precedence"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.operator_precedence">Operator
  precedence</a>
  </h5>
<p>
  The order of precedence of operators is as follows:
  </p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
  Collation-related bracket symbols <code class="computeroutput"><span class="special">[==]</span>
  <span class="special">[::]</span> <span class="special">[..]</span></code>
  </li>
<li class="listitem">
  Escaped characters <code class="computeroutput"><span class="special">\</span></code>
  </li>
<li class="listitem">
  Character set (bracket expression) <code class="computeroutput"><span class="special">[]</span></code>
  </li>
<li class="listitem">
  Grouping <code class="computeroutput"><span class="special">()</span></code>
  </li>
<li class="listitem">
  Single-character-ERE duplication <code class="computeroutput"><span class="special">*</span>
  <span class="special">+</span> <span class="special">?</span>
  <span class="special">{</span><span class="identifier">m</span><span class="special">,</span><span class="identifier">n</span><span class="special">}</span></code>
  </li>
<li class="listitem">
  Concatenation
  </li>
<li class="listitem">
  Anchoring ^$
  </li>
<li class="listitem">
  Alternation <code class="computeroutput"><span class="special">|</span></code>
  </li>
</ol></div>
<h5>
<a name="boost_regex.syntax.basic_extended.h27"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.what_gets_matched"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.what_gets_matched">What
  Gets Matched</a>
  </h5>
<p>
  When there is more than one way to match a regular expression, the "best"
  possible match is obtained using the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest
  rule</a>.
  </p>
<h4>
<a name="boost_regex.syntax.basic_extended.h28"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.variations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.variations">Variations</a>
  </h4>
<h5>
<a name="boost_regex.syntax.basic_extended.h29"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.egrep"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.egrep">Egrep</a>
  </h5>
<p>
  When an expression is compiled with the <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">flag
  <code class="computeroutput"><span class="identifier">egrep</span></code></a> set, then the
  expression is treated as a newline separated list of <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended
  expressions</a>, a match is found if any of the expressions in the list
  match, for example:
  </p>
<pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"abc\ndef"</span><span
class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">egrep</span><span class="special">);</span>
</pre>
<p>
  will match either of the POSIX-Extended expressions "abc" or "def".
  </p>
<p>
  As its name suggests, this behavior is consistent with the Unix utility
  <code class="computeroutput"><span class="identifier">egrep</span></code>, and with grep when
  used with the -E option.
  </p>
<h5>
<a name="boost_regex.syntax.basic_extended.h30"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.awk"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.awk">awk</a>
  </h5>
<p>
  In addition to the <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended
  features</a> the escape character is special inside a character class
  declaration.
  </p>
<p>
  In addition, some escape sequences that are not defined as part of the POSIX-Extended
  specification are required to be supported; however, Boost.Regex supports
  these by default anyway.
  </p>
<h4>
<a name="boost_regex.syntax.basic_extended.h31"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.options"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.options">Options</a>
  </h4>
<p>
  There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions">variety
  of flags</a> that may be combined with the <code class="computeroutput"><span class="identifier">extended</span></code>
  and <code class="computeroutput"><span class="identifier">egrep</span></code> options when constructing
  the regular expression. In particular, note that the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code></a> option alters the syntax,
  while the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code>, <code class="computeroutput"><span class="identifier">nosubs</span></code>
  and <code class="computeroutput"><span class="identifier">icase</span></code> options</a>
  modify how the case and locale sensitivity are to be applied.
  </p>
<h4>
<a name="boost_regex.syntax.basic_extended.h32"></a>
  <span class="phrase"><a name="boost_regex.syntax.basic_extended.references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.references">References</a>
  </h4>
<p>
  <a href="http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target="_top">IEEE
  Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions
  and Headers, Section 9, Regular Expressions.</a>
  </p>
<p>
  <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target="_top">IEEE
  Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
  Utilities, Section 4, Utilities, egrep.</a>
  </p>
<p>
  <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/awk.html" target="_top">IEEE
  Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
  Utilities, Section 4, Utilities, awk.</a>
  </p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p>
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying
  file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
  </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="perl_syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_syntax.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>