1<html>
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
4<title>POSIX Extended Regular Expression Syntax</title>
5<link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7<link rel="home" href="../../index.html" title="Boost.Regex 5.1.4">
8<link rel="up" href="../syntax.html" title="Regular Expression Syntax">
9<link rel="prev" href="perl_syntax.html" title="Perl Regular Expression Syntax">
10<link rel="next" href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax">
11</head>
12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13<table cellpadding="2" width="100%"><tr>
14<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
15<td align="center"><a href="../../../../../../index.html">Home</a></td>
16<td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
17<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19<td align="center"><a href="../../../../../../more/index.htm">More</a></td>
20</tr></table>
21<hr>
22<div class="spirit-nav">
23<a accesskey="p" href="perl_syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_syntax.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
24</div>
25<div class="section">
26<div class="titlepage"><div><div><h3 class="title">
27<a name="boost_regex.syntax.basic_extended"></a><a class="link" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">POSIX Extended Regular
28      Expression Syntax</a>
29</h3></div></div></div>
30<h4>
31<a name="boost_regex.syntax.basic_extended.h0"></a>
32        <span class="phrase"><a name="boost_regex.syntax.basic_extended.synopsis"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.synopsis">Synopsis</a>
33      </h4>
34<p>
35        The POSIX-Extended regular expression syntax is supported by the POSIX C
        regular expression APIs, and variations are used by the utilities <code class="computeroutput"><span class="identifier">egrep</span></code> and <code class="computeroutput"><span class="identifier">awk</span></code>.
37        You can construct POSIX extended regular expressions in Boost.Regex by passing
38        the flag <code class="computeroutput"><span class="identifier">extended</span></code> to the
39        regex constructor, for example:
40      </p>
<pre class="programlisting"><span class="comment">// e1 is a case-sensitive POSIX-Extended expression:</span>
<span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">);</span>
<span class="comment">// e2 is a case-insensitive POSIX-Extended expression:</span>
<span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
</pre>
46<a name="boost_regex.posix_extended_syntax"></a><h4>
47<a name="boost_regex.syntax.basic_extended.h1"></a>
48        <span class="phrase"><a name="boost_regex.syntax.basic_extended.posix_extended_syntax"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.posix_extended_syntax">POSIX Extended
49        Syntax</a>
50      </h4>
51<p>
52        In POSIX-Extended regular expressions, all characters match themselves except
53        for the following special characters:
54      </p>
55<pre class="programlisting">.[{}()\*+?|^$</pre>
56<h5>
57<a name="boost_regex.syntax.basic_extended.h2"></a>
58        <span class="phrase"><a name="boost_regex.syntax.basic_extended.wildcard"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.wildcard">Wildcard:</a>
59      </h5>
60<p>
61        The single character '.' when used outside of a character set will match
62        any single character except:
63      </p>
64<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
65<li class="listitem">
            The NULL character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code>
67            is passed to the matching algorithms.
68          </li>
69<li class="listitem">
70            The newline character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code>
71            is passed to the matching algorithms.
72          </li>
73</ul></div>
74<h5>
75<a name="boost_regex.syntax.basic_extended.h3"></a>
76        <span class="phrase"><a name="boost_regex.syntax.basic_extended.anchors"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.anchors">Anchors:</a>
77      </h5>
78<p>
79        A '^' character shall match the start of a line when used as the first character
80        of an expression, or the first character of a sub-expression.
81      </p>
82<p>
83        A '$' character shall match the end of a line when used as the last character
84        of an expression, or the last character of a sub-expression.
85      </p>
86<h5>
87<a name="boost_regex.syntax.basic_extended.h4"></a>
88        <span class="phrase"><a name="boost_regex.syntax.basic_extended.marked_sub_expressions"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.marked_sub_expressions">Marked
89        sub-expressions:</a>
90      </h5>
91<p>
92        A section beginning <code class="computeroutput"><span class="special">(</span></code> and ending
93        <code class="computeroutput"><span class="special">)</span></code> acts as a marked sub-expression.
94        Whatever matched the sub-expression is split out in a separate field by the
        matching algorithms. Marked sub-expressions can also be repeated, or referred
        to by a back-reference.
97      </p>
98<h5>
99<a name="boost_regex.syntax.basic_extended.h5"></a>
100        <span class="phrase"><a name="boost_regex.syntax.basic_extended.repeats"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.repeats">Repeats:</a>
101      </h5>
102<p>
103        Any atom (a single character, a marked sub-expression, or a character class)
104        can be repeated with the <code class="computeroutput"><span class="special">*</span></code>,
105        <code class="computeroutput"><span class="special">+</span></code>, <code class="computeroutput"><span class="special">?</span></code>,
106        and <code class="computeroutput"><span class="special">{}</span></code> operators.
107      </p>
108<p>
109        The <code class="computeroutput"><span class="special">*</span></code> operator will match the
110        preceding atom <span class="emphasis"><em>zero or more times</em></span>, for example the expression
111        <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span><span class="identifier">b</span></code> will match any of the following:
112      </p>
113<pre class="programlisting">b
114ab
115aaaaaaaab
116</pre>
117<p>
118        The <code class="computeroutput"><span class="special">+</span></code> operator will match the
119        preceding atom <span class="emphasis"><em>one or more times</em></span>, for example the expression
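        <code class="computeroutput"><span class="identifier">a</span><span class="special">+</span><span class="identifier">b</span></code> will match any of the following: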
121      </p>
122<pre class="programlisting">ab
123aaaaaaaab
124</pre>
125<p>
126        But will not match:
127      </p>
128<pre class="programlisting">b
129</pre>
130<p>
131        The <code class="computeroutput"><span class="special">?</span></code> operator will match the
132        preceding atom <span class="emphasis"><em>zero or one times</em></span>, for example the expression
133        <code class="computeroutput"><span class="identifier">ca</span><span class="special">?</span><span class="identifier">b</span></code> will match any of the following:
134      </p>
135<pre class="programlisting">cb
136cab
137</pre>
138<p>
139        But will not match:
140      </p>
141<pre class="programlisting">caab
142</pre>
143<p>
144        An atom can also be repeated with a bounded repeat:
145      </p>
146<p>
147        <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">}</span></code> Matches
148        'a' repeated <span class="emphasis"><em>exactly n times</em></span>.
149      </p>
150<p>
151        <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,}</span></code> Matches
152        'a' repeated <span class="emphasis"><em>n or more times</em></span>.
153      </p>
154<p>
155        <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,</span> <span class="identifier">m</span><span class="special">}</span></code> Matches 'a' repeated <span class="emphasis"><em>between n
156        and m times inclusive</em></span>.
157      </p>
158<p>
159        For example:
160      </p>
161<pre class="programlisting">^a{2,3}$</pre>
162<p>
163        Will match either of:
164      </p>
165<pre class="programlisting"><span class="identifier">aa</span>
166<span class="identifier">aaa</span>
167</pre>
168<p>
169        But neither of:
170      </p>
171<pre class="programlisting"><span class="identifier">a</span>
172<span class="identifier">aaaa</span>
173</pre>
174<p>
        It is an error to use a repeat operator if the preceding construct cannot
        be repeated, for example:
177      </p>
178<pre class="programlisting"><span class="identifier">a</span><span class="special">(*)</span>
179</pre>
180<p>
181        Will raise an error, as there is nothing for the <code class="computeroutput"><span class="special">*</span></code>
182        operator to be applied to.
183      </p>
184<h5>
185<a name="boost_regex.syntax.basic_extended.h6"></a>
186        <span class="phrase"><a name="boost_regex.syntax.basic_extended.back_references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.back_references">Back
187        references:</a>
188      </h5>
189<p>
190        An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span>
191        is in the range 1-9, matches the same string that was matched by sub-expression
192        <span class="emphasis"><em>n</em></span>. For example the expression:
193      </p>
194<pre class="programlisting">^(a*)[^a]*\1$</pre>
195<p>
196        Will match the string:
197      </p>
198<pre class="programlisting"><span class="identifier">aaabbaaa</span>
199</pre>
200<p>
201        But not the string:
202      </p>
203<pre class="programlisting"><span class="identifier">aaabba</span>
204</pre>
205<div class="caution"><table border="0" summary="Caution">
206<tr>
207<td rowspan="2" align="center" valign="top" width="25"><img alt="[Caution]" src="../../../../../../doc/src/images/caution.png"></td>
208<th align="left">Caution</th>
209</tr>
210<tr><td align="left" valign="top"><p>
          The POSIX standard does not support back-references for "extended"
          regular expressions; this is a compatible extension to that standard.
213        </p></td></tr>
214</table></div>
215<h5>
216<a name="boost_regex.syntax.basic_extended.h7"></a>
217        <span class="phrase"><a name="boost_regex.syntax.basic_extended.alternation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.alternation">Alternation</a>
218      </h5>
219<p>
220        The <code class="computeroutput"><span class="special">|</span></code> operator will match either
221        of its arguments, so for example: <code class="computeroutput"><span class="identifier">abc</span><span class="special">|</span><span class="identifier">def</span></code> will
222        match either "abc" or "def".
223      </p>
224<p>
        Parentheses can be used to group alternations, for example: <code class="computeroutput"><span class="identifier">ab</span><span class="special">(</span><span class="identifier">d</span><span class="special">|</span><span class="identifier">ef</span><span class="special">)</span></code>
226        will match either of "abd" or "abef".
227      </p>
228<h5>
229<a name="boost_regex.syntax.basic_extended.h8"></a>
230        <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_sets"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_sets">Character
231        sets:</a>
232      </h5>
233<p>
        A character set is a bracket-expression starting with [ and ending with ].
        It defines a set of characters, and matches any single character that is
        a member of that set.
237      </p>
238<p>
239        A bracket expression may contain any combination of the following:
240      </p>
241<h6>
242<a name="boost_regex.syntax.basic_extended.h9"></a>
243        <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_characters"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_characters">Single
244        characters:</a>
245      </h6>
246<p>
247        For example <code class="computeroutput"><span class="special">[</span><span class="identifier">abc</span><span class="special">]</span></code>, will match any of the characters 'a', 'b',
248        or 'c'.
249      </p>
250<h6>
251<a name="boost_regex.syntax.basic_extended.h10"></a>
252        <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_ranges"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_ranges">Character
253        ranges:</a>
254      </h6>
255<p>
256        For example <code class="computeroutput"><span class="special">[</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code>
257        will match any single character in the range 'a' to 'c'. By default, for
258        POSIX-Extended regular expressions, a character <span class="emphasis"><em>x</em></span> is
259        within the range <span class="emphasis"><em>y</em></span> to <span class="emphasis"><em>z</em></span>, if it
        collates within that range; this results in locale-specific behavior. This
261        behavior can be turned off by unsetting the <code class="computeroutput"><span class="identifier">collate</span></code>
262        <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">option flag</a> - in
263        which case whether a character appears within a range is determined by comparing
264        the code points of the characters only.
265      </p>
266<h6>
267<a name="boost_regex.syntax.basic_extended.h11"></a>
268        <span class="phrase"><a name="boost_regex.syntax.basic_extended.negation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.negation">Negation:</a>
269      </h6>
270<p>
271        If the bracket-expression begins with the ^ character, then it matches the
272        complement of the characters it contains, for example <code class="computeroutput"><span class="special">[^</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> matches any character that is not in the
273        range <code class="computeroutput"><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span></code>.
274      </p>
275<h6>
276<a name="boost_regex.syntax.basic_extended.h12"></a>
277        <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_classes">Character
278        classes:</a>
279      </h6>
280<p>
281        An expression of the form <code class="computeroutput"><span class="special">[[:</span><span class="identifier">name</span><span class="special">:]]</span></code>
282        matches the named character class "name", for example <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> matches any lower case character. See
283        <a class="link" href="character_classes.html" title="Character Class Names">character class names</a>.
284      </p>
285<h6>
286<a name="boost_regex.syntax.basic_extended.h13"></a>
287        <span class="phrase"><a name="boost_regex.syntax.basic_extended.collating_elements"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.collating_elements">Collating
288        Elements:</a>
289      </h6>
290<p>
        An expression of the form <code class="computeroutput"><span class="special">[[.</span><span class="identifier">col</span><span class="special">.]]</span></code> matches
292        the collating element <span class="emphasis"><em>col</em></span>. A collating element is any
293        single character, or any sequence of characters that collates as a single
294        unit. Collating elements may also be used as the end point of a range, for
295        example: <code class="computeroutput"><span class="special">[[.</span><span class="identifier">ae</span><span class="special">.]-</span><span class="identifier">c</span><span class="special">]</span></code>
296        matches the character sequence "ae", plus any single character
297        in the range "ae"-c, assuming that "ae" is treated as
298        a single collating element in the current locale.
299      </p>
300<p>
301        Collating elements may be used in place of escapes (which are not normally
        allowed inside character sets), for example <code class="computeroutput"><span class="special">[[.^.]</span><span class="identifier">abc</span><span class="special">]</span></code> matches
        any one of the characters 'a', 'b', 'c' or '^'.
304      </p>
305<p>
306        As an extension, a collating element may also be specified via its <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>, for example:
307      </p>
308<pre class="programlisting"><span class="special">[[.</span><span class="identifier">NUL</span><span class="special">.]]</span>
309</pre>
310<p>
311        matches a NUL character.
312      </p>
313<h6>
314<a name="boost_regex.syntax.basic_extended.h14"></a>
315        <span class="phrase"><a name="boost_regex.syntax.basic_extended.equivalence_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.equivalence_classes">Equivalence
316        classes:</a>
317      </h6>
318<p>
319        An expression of the form <code class="computeroutput"><span class="special">[[=</span><span class="identifier">col</span><span class="special">=]]</span></code>,
320        matches any character or collating element whose primary sort key is the
321        same as that for collating element <span class="emphasis"><em>col</em></span>, as with collating
322        elements the name <span class="emphasis"><em>col</em></span> may be a <a class="link" href="collating_names.html" title="Collating Names">symbolic
        name</a>. A primary sort key is one that ignores case, accentuation, or
324        locale-specific tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches
325        any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation
326        of this is reliant on the platform's collation and localisation support;
327        this feature can not be relied upon to work portably across all platforms,
328        or even all locales on one platform.
329      </p>
330<h6>
331<a name="boost_regex.syntax.basic_extended.h15"></a>
332        <span class="phrase"><a name="boost_regex.syntax.basic_extended.combinations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.combinations">Combinations:</a>
333      </h6>
334<p>
335        All of the above can be combined in one character set declaration, for example:
336        <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]]</span></code>.
337      </p>
338<h5>
339<a name="boost_regex.syntax.basic_extended.h16"></a>
340        <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes">Escapes</a>
341      </h5>
342<p>
343        The POSIX standard defines no escape sequences for POSIX-Extended regular
344        expressions, except that:
345      </p>
346<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
347<li class="listitem">
348            Any special character preceded by an escape shall match itself.
349          </li>
350<li class="listitem">
351            The effect of any ordinary character being preceded by an escape is undefined.
352          </li>
353<li class="listitem">
354            An escape inside a character class declaration shall match itself: in
355            other words the escape character is not "special" inside a
356            character class declaration; so <code class="computeroutput"><span class="special">[\^]</span></code>
357            will match either a literal '\' or a '^'.
358          </li>
359</ul></div>
360<p>
361        However, that's rather restrictive, so the following standard-compatible
362        extensions are also supported by Boost.Regex:
363      </p>
364<h6>
365<a name="boost_regex.syntax.basic_extended.h17"></a>
366        <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes_matching_a_specific_char"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes_matching_a_specific_char">Escapes
367        matching a specific character</a>
368      </h6>
369<p>
370        The following escape sequences are all synonyms for single characters:
371      </p>
372<div class="informaltable"><table class="table">
373<colgroup>
374<col>
375<col>
376</colgroup>
377<thead><tr>
378<th>
379                <p>
380                  Escape
381                </p>
382              </th>
383<th>
384                <p>
385                  Character
386                </p>
387              </th>
388</tr></thead>
389<tbody>
390<tr>
391<td>
392                <p>
393                  \a
394                </p>
395              </td>
396<td>
397                <p>
398                  '\a'
399                </p>
400              </td>
401</tr>
402<tr>
403<td>
404                <p>
405                  \e
406                </p>
407              </td>
408<td>
409                <p>
410                  0x1B
411                </p>
412              </td>
413</tr>
414<tr>
415<td>
416                <p>
417                  \f
418                </p>
419              </td>
420<td>
421                <p>
422                  \f
423                </p>
424              </td>
425</tr>
426<tr>
427<td>
428                <p>
429                  \n
430                </p>
431              </td>
432<td>
433                <p>
434                  \n
435                </p>
436              </td>
437</tr>
438<tr>
439<td>
440                <p>
441                  \r
442                </p>
443              </td>
444<td>
445                <p>
446                  \r
447                </p>
448              </td>
449</tr>
450<tr>
451<td>
452                <p>
453                  \t
454                </p>
455              </td>
456<td>
457                <p>
458                  \t
459                </p>
460              </td>
461</tr>
462<tr>
463<td>
464                <p>
465                  \v
466                </p>
467              </td>
468<td>
469                <p>
470                  \v
471                </p>
472              </td>
473</tr>
474<tr>
475<td>
476                <p>
477                  \b
478                </p>
479              </td>
480<td>
481                <p>
482                  \b (but only inside a character class declaration).
483                </p>
484              </td>
485</tr>
486<tr>
487<td>
488                <p>
489                  \cX
490                </p>
491              </td>
492<td>
493                <p>
494                  An ASCII escape sequence - the character whose code point is X
495                  % 32
496                </p>
497              </td>
498</tr>
499<tr>
500<td>
501                <p>
502                  \xdd
503                </p>
504              </td>
505<td>
506                <p>
507                  A hexadecimal escape sequence - matches the single character whose
508                  code point is 0xdd.
509                </p>
510              </td>
511</tr>
512<tr>
513<td>
514                <p>
515                  \x{dddd}
516                </p>
517              </td>
518<td>
519                <p>
520                  A hexadecimal escape sequence - matches the single character whose
521                  code point is 0xdddd.
522                </p>
523              </td>
524</tr>
525<tr>
526<td>
527                <p>
528                  \0ddd
529                </p>
530              </td>
531<td>
532                <p>
533                  An octal escape sequence - matches the single character whose code
534                  point is 0ddd.
535                </p>
536              </td>
537</tr>
538<tr>
539<td>
540                <p>
541                  \N{Name}
542                </p>
543              </td>
544<td>
545                <p>
546                  Matches the single character which has the symbolic name <span class="emphasis"><em>Name</em></span>.
                  For example <code class="computeroutput"><span class="special">\</span><span class="identifier">N</span><span class="special">{</span><span class="identifier">newline</span><span class="special">}</span></code> matches the single character \n.
548                </p>
549              </td>
550</tr>
551</tbody>
552</table></div>
553<h6>
554<a name="boost_regex.syntax.basic_extended.h18"></a>
555        <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_character_character_class"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_character_character_class">"Single
556        character" character classes:</a>
557      </h6>
558<p>
        Any escaped character <span class="emphasis"><em>x</em></span>, if <span class="emphasis"><em>x</em></span> is
        the name of a character class, shall match any character that is a member
        of that class, and the corresponding upper-case escape <span class="emphasis"><em>X</em></span>
        shall match any character not in that class.
563      </p>
564<p>
565        The following are supported by default:
566      </p>
567<div class="informaltable"><table class="table">
568<colgroup>
569<col>
570<col>
571</colgroup>
572<thead><tr>
573<th>
574                <p>
575                  Escape sequence
576                </p>
577              </th>
578<th>
579                <p>
580                  Equivalent to
581                </p>
582              </th>
583</tr></thead>
584<tbody>
585<tr>
586<td>
587                <p>
588                  <code class="computeroutput"><span class="special">\</span><span class="identifier">d</span></code>
589                </p>
590              </td>
591<td>
592                <p>
593                  <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]]</span></code>
594                </p>
595              </td>
596</tr>
597<tr>
598<td>
599                <p>
600                  <code class="computeroutput"><span class="special">\</span><span class="identifier">l</span></code>
601                </p>
602              </td>
603<td>
604                <p>
605                  <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code>
606                </p>
607              </td>
608</tr>
609<tr>
610<td>
611                <p>
612                  <code class="computeroutput"><span class="special">\</span><span class="identifier">s</span></code>
613                </p>
614              </td>
615<td>
616                <p>
617                  <code class="computeroutput"><span class="special">[[:</span><span class="identifier">space</span><span class="special">:]]</span></code>
618                </p>
619              </td>
620</tr>
621<tr>
622<td>
623                <p>
624                  <code class="computeroutput"><span class="special">\</span><span class="identifier">u</span></code>
625                </p>
626              </td>
627<td>
628                <p>
629                  <code class="computeroutput"><span class="special">[[:</span><span class="identifier">upper</span><span class="special">:]]</span></code>
630                </p>
631              </td>
632</tr>
633<tr>
634<td>
635                <p>
636                  <code class="computeroutput"><span class="special">\</span><span class="identifier">w</span></code>
637                </p>
638              </td>
639<td>
640                <p>
641                  <code class="computeroutput"><span class="special">[[:</span><span class="identifier">word</span><span class="special">:]]</span></code>
642                </p>
643              </td>
644</tr>
645<tr>
646<td>
647                <p>
648                  <code class="computeroutput"><span class="special">\</span><span class="identifier">D</span></code>
649                </p>
650              </td>
651<td>
652                <p>
653                  <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">digit</span><span class="special">:]]</span></code>
654                </p>
655              </td>
656</tr>
657<tr>
658<td>
659                <p>
660                  <code class="computeroutput"><span class="special">\</span><span class="identifier">L</span></code>
661                </p>
662              </td>
663<td>
664                <p>
665                  <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">lower</span><span class="special">:]]</span></code>
666                </p>
667              </td>
668</tr>
669<tr>
670<td>
671                <p>
672                  <code class="computeroutput"><span class="special">\</span><span class="identifier">S</span></code>
673                </p>
674              </td>
675<td>
676                <p>
677                  <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">space</span><span class="special">:]]</span></code>
678                </p>
679              </td>
680</tr>
681<tr>
682<td>
683                <p>
684                  <code class="computeroutput"><span class="special">\</span><span class="identifier">U</span></code>
685                </p>
686              </td>
687<td>
688                <p>
689                  <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">upper</span><span class="special">:]]</span></code>
690                </p>
691              </td>
692</tr>
693<tr>
694<td>
695                <p>
696                  <code class="computeroutput"><span class="special">\</span><span class="identifier">W</span></code>
697                </p>
698              </td>
699<td>
700                <p>
701                  <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">word</span><span class="special">:]]</span></code>
702                </p>
703              </td>
704</tr>
705</tbody>
706</table></div>
707<h6>
708<a name="boost_regex.syntax.basic_extended.h19"></a>
709        <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_properties"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_properties">Character
710        Properties</a>
711      </h6>
712<p>
713        The character property names in the following table are all equivalent to
714        the names used in character classes.
715      </p>
716<div class="informaltable"><table class="table">
717<colgroup>
718<col>
719<col>
720<col>
721</colgroup>
722<thead><tr>
723<th>
724                <p>
725                  Form
726                </p>
727              </th>
728<th>
729                <p>
730                  Description
731                </p>
732              </th>
733<th>
734                <p>
735                  Equivalent character set form
736                </p>
737              </th>
738</tr></thead>
739<tbody>
740<tr>
741<td>
742                <p>
743                  <code class="computeroutput"><span class="special">\</span><span class="identifier">pX</span></code>
744                </p>
745              </td>
746<td>
747                <p>
748                  Matches any character that has the property X.
749                </p>
750              </td>
751<td>
752                <p>
753                  <code class="computeroutput"><span class="special">[[:</span><span class="identifier">X</span><span class="special">:]]</span></code>
754                </p>
755              </td>
756</tr>
757<tr>
758<td>
759                <p>
760                  <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code>
761                </p>
762              </td>
763<td>
764                <p>
765                  Matches any character that has the property Name.
766                </p>
767              </td>
768<td>
769                <p>
770                  <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Name</span><span class="special">:]]</span></code>
771                </p>
772              </td>
773</tr>
774<tr>
775<td>
776                <p>
777                  <code class="computeroutput"><span class="special">\</span><span class="identifier">PX</span></code>
778                </p>
779              </td>
780<td>
781                <p>
782                  Matches any character that does not have the property X.
783                </p>
784              </td>
785<td>
786                <p>
787                  <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">X</span><span class="special">:]]</span></code>
788                </p>
789              </td>
790</tr>
791<tr>
792<td>
793                <p>
794                  <code class="computeroutput"><span class="special">\</span><span class="identifier">P</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code>
795                </p>
796              </td>
797<td>
798                <p>
799                  Matches any character that does not have the property Name.
800                </p>
801              </td>
802<td>
803                <p>
804                  <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">Name</span><span class="special">:]]</span></code>
805                </p>
806              </td>
807</tr>
808</tbody>
809</table></div>
810<p>
811        For example <code class="computeroutput"><span class="special">\</span><span class="identifier">pd</span></code>
812        matches any "digit" character, as does <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">digit</span><span class="special">}</span></code>.
813      </p>
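<p>
        As a minimal sketch of the equivalences in the table above, the following
        helper compiles a pattern with the <code class="computeroutput">extended</code> flag and
        tests whether the whole text matches. The name <code class="computeroutput">matches_extended</code>
        is hypothetical and not part of Boost.Regex.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;string&gt;

// Hypothetical helper: true if all of `text` matches `pattern`
// when the pattern is compiled as a POSIX-Extended expression.
bool matches_extended(const std::string&amp; pattern, const std::string&amp; text)
{
    const boost::regex e(pattern, boost::regex::extended);
    return boost::regex_match(text, e);
}
</pre>
<p>
        With this helper, the patterns <code class="computeroutput">"\\pd+"</code>, <code class="computeroutput">"\\p{digit}+"</code>
        and <code class="computeroutput">"[[:digit:]]+"</code> should all accept the text <code class="computeroutput">"2024"</code>,
        while the negated form <code class="computeroutput">"\\Pd+"</code> should accept <code class="computeroutput">"abc"</code>
        and reject <code class="computeroutput">"2024"</code>.
      </p>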
814<h6>
815<a name="boost_regex.syntax.basic_extended.h20"></a>
816        <span class="phrase"><a name="boost_regex.syntax.basic_extended.word_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.word_boundaries">Word
817        Boundaries</a>
818      </h6>
819<p>
820        The following escape sequences match the boundaries of words:
821      </p>
822<div class="informaltable"><table class="table">
823<colgroup>
824<col>
825<col>
826</colgroup>
827<thead><tr>
828<th>
829                <p>
830                  Escape
831                </p>
832              </th>
833<th>
834                <p>
835                  Meaning
836                </p>
837              </th>
838</tr></thead>
839<tbody>
840<tr>
841<td>
842                <p>
843                  <code class="computeroutput"><span class="special">\&lt;</span></code>
844                </p>
845              </td>
846<td>
847                <p>
848                  Matches the start of a word.
849                </p>
850              </td>
851</tr>
852<tr>
853<td>
854                <p>
855                  <code class="computeroutput"><span class="special">\&gt;</span></code>
856                </p>
857              </td>
858<td>
859                <p>
860                  Matches the end of a word.
861                </p>
862              </td>
863</tr>
864<tr>
865<td>
866                <p>
867                  <code class="computeroutput"><span class="special">\</span><span class="identifier">b</span></code>
868                </p>
869              </td>
870<td>
871                <p>
872                  Matches a word boundary (the start or end of a word).
873                </p>
874              </td>
875</tr>
876<tr>
877<td>
878                <p>
879                  <code class="computeroutput"><span class="special">\</span><span class="identifier">B</span></code>
880                </p>
881              </td>
882<td>
883                <p>
884                  Matches only when not at a word boundary.
885                </p>
886              </td>
887</tr>
888</tbody>
889</table></div>
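<p>
        The word-boundary escapes above can be used to find whole words only. The
        sketch below counts stand-alone occurrences of a word. The function name
        <code class="computeroutput">count_whole_word</code> is hypothetical, and the code assumes that
        the word consists only of letters or digits so that it needs no escaping.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;cstddef&gt;
#include &lt;string&gt;

// Counts occurrences of `word` that start and end on a word boundary.
// Assumption: `word` contains no regular expression metacharacters.
std::size_t count_whole_word(const std::string&amp; text, const std::string&amp; word)
{
    const boost::regex e("\\&lt;" + word + "\\&gt;", boost::regex::extended);
    std::size_t n = 0;
    for (boost::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it)
        ++n;
    return n;
}
</pre>
<p>
        For the text <code class="computeroutput">"cat concat cat's catalog"</code> and the word
        <code class="computeroutput">"cat"</code> this should count two occurrences: the first word and
        the "cat" in "cat's". The occurrences inside "concat" and "catalog" fail
        the start-of-word and end-of-word tests respectively.
      </p>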
890<h6>
891<a name="boost_regex.syntax.basic_extended.h21"></a>
892        <span class="phrase"><a name="boost_regex.syntax.basic_extended.buffer_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.buffer_boundaries">Buffer
893        boundaries</a>
894      </h6>
895<p>
896        The following match only at buffer boundaries: a "buffer" in this
897        context is the whole of the input text that is being matched against (note
898        that ^ and $ may match embedded newlines within the text).
899      </p>
900<div class="informaltable"><table class="table">
901<colgroup>
902<col>
903<col>
904</colgroup>
905<thead><tr>
906<th>
907                <p>
908                  Escape
909                </p>
910              </th>
911<th>
912                <p>
913                  Meaning
914                </p>
915              </th>
916</tr></thead>
917<tbody>
918<tr>
919<td>
920                <p>
921                  \`
922                </p>
923              </td>
924<td>
925                <p>
926                  Matches at the start of a buffer only.
927                </p>
928              </td>
929</tr>
930<tr>
931<td>
932                <p>
933                  \'
934                </p>
935              </td>
936<td>
937                <p>
938                  Matches at the end of a buffer only.
939                </p>
940              </td>
941</tr>
942<tr>
943<td>
944                <p>
945                  <code class="computeroutput"><span class="special">\</span><span class="identifier">A</span></code>
946                </p>
947              </td>
948<td>
949                <p>
950                  Matches at the start of a buffer only (the same as \`).
951                </p>
952              </td>
953</tr>
954<tr>
955<td>
956                <p>
957                  <code class="computeroutput"><span class="special">\</span><span class="identifier">z</span></code>
958                </p>
959              </td>
960<td>
961                <p>
962                  Matches at the end of a buffer only (the same as \').
963                </p>
964              </td>
965</tr>
966<tr>
967<td>
968                <p>
969                  <code class="computeroutput"><span class="special">\</span><span class="identifier">Z</span></code>
970                </p>
971              </td>
972<td>
973                <p>
974                  Matches an optional sequence of newlines at the end of a buffer:
975                  equivalent to the regular expression <code class="computeroutput"><span class="special">\</span><span class="identifier">n</span><span class="special">*\</span><span class="identifier">z</span></code>
976                </p>
977              </td>
978</tr>
979</tbody>
980</table></div>
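<p>
        To illustrate how the buffer-boundary escapes differ from the line anchors,
        the following sketch searches a text for a POSIX-Extended pattern. The helper
        name <code class="computeroutput">found</code> is hypothetical.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;string&gt;

// Hypothetical helper: true if `pattern` (POSIX-Extended) matches
// anywhere within `text`.
bool found(const std::string&amp; pattern, const std::string&amp; text)
{
    const boost::regex e(pattern, boost::regex::extended);
    return boost::regex_search(text, e);
}
</pre>
<p>
        Given the two-line text <code class="computeroutput">"one\ntwo\n"</code>, the pattern
        <code class="computeroutput">"^two"</code> should be found because ^ may match after the embedded
        newline, whereas <code class="computeroutput">"\\Atwo"</code> should not, since "two" is not
        at the start of the buffer. Likewise <code class="computeroutput">"two\\Z"</code> should be found
        because only a newline follows, while <code class="computeroutput">"two\\z"</code> should not.
      </p>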
981<h6>
982<a name="boost_regex.syntax.basic_extended.h22"></a>
983        <span class="phrase"><a name="boost_regex.syntax.basic_extended.continuation_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.continuation_escape">Continuation
984        Escape</a>
985      </h6>
986<p>
        The sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">G</span></code>
        matches only at the end of the last match found, or at the start of the text
        being matched if no previous match was found. This escape is useful if you're
        iterating over the matches contained within a text, and you want each subsequent
        match to start where the last one ended.
992      </p>
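<p>
        A minimal sketch of the iteration described above: the function below (a
        hypothetical name, not part of the library) collects the leading run of
        comma-terminated numbers from a text, using <code class="computeroutput">\G</code> so that
        each match must begin exactly where the previous one ended.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Collects the numbers at the front of `text`, each followed by a comma.
// Because of \G, iteration stops at the first position where the
// pattern no longer fits, rather than skipping ahead.
std::vector&lt;std::string&gt; leading_numbers(const std::string&amp; text)
{
    const boost::regex e("\\G([[:digit:]]+),", boost::regex::extended);
    std::vector&lt;std::string&gt; out;
    for (boost::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it)
        out.push_back((*it)[1].str());
    return out;
}
</pre>
<p>
        For the input <code class="computeroutput">"12,34,x56,78,"</code> this should yield only "12"
        and "34": the stray "x" breaks the chain, so "56" and "78" are never reached.
        Without the leading <code class="computeroutput">\G</code> all four numbers would be found.
      </p>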
993<h6>
994<a name="boost_regex.syntax.basic_extended.h23"></a>
995        <span class="phrase"><a name="boost_regex.syntax.basic_extended.quoting_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.quoting_escape">Quoting
996        escape</a>
997      </h6>
998<p>
999        The escape sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span></code>
1000        begins a "quoted sequence": all the subsequent characters are treated
1001        as literals, until either the end of the regular expression or <code class="computeroutput"><span class="special">\</span><span class="identifier">E</span></code> is found.
1002        For example the expression: <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span><span class="special">\*+\</span><span class="identifier">Ea</span><span class="special">+</span></code> would match either of:
1003      </p>
1004<pre class="programlisting"><span class="special">\*+</span><span class="identifier">a</span>
1005<span class="special">\*+</span><span class="identifier">aaa</span>
1006</pre>
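<p>
        The same example can be sketched in C++, where each backslash in the expression
        has to be doubled inside a string literal. The function name below is hypothetical.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;string&gt;

// True if all of `text` matches the expression \Q\*+\Ea+ shown above.
bool matches_quoted_example(const std::string&amp; text)
{
    // The C++ literal "\\Q\\*+\\Ea+" is the regular expression \Q\*+\Ea+ :
    // the quoted part \*+ is taken literally, then a+ is a normal repeat.
    const boost::regex e("\\Q\\*+\\Ea+", boost::regex::extended);
    return boost::regex_match(text, e);
}
</pre>
<p>
        Called with the C++ strings <code class="computeroutput">"\\*+a"</code> or <code class="computeroutput">"\\*+aaa"</code>
        the function should return true. It should return false for <code class="computeroutput">"*+a"</code>,
        because the backslash inside the quoted sequence is itself a literal.
      </p>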
1007<h6>
1008<a name="boost_regex.syntax.basic_extended.h24"></a>
1009        <span class="phrase"><a name="boost_regex.syntax.basic_extended.unicode_escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.unicode_escapes">Unicode
1010        escapes</a>
1011      </h6>
1012<div class="informaltable"><table class="table">
1013<colgroup>
1014<col>
1015<col>
1016</colgroup>
1017<thead><tr>
1018<th>
1019                <p>
1020                  Escape
1021                </p>
1022              </th>
1023<th>
1024                <p>
1025                  Meaning
1026                </p>
1027              </th>
1028</tr></thead>
1029<tbody>
1030<tr>
1031<td>
1032                <p>
1033                  <code class="computeroutput"><span class="special">\</span><span class="identifier">C</span></code>
1034                </p>
1035              </td>
1036<td>
1037                <p>
1038                  Matches a single code point: in Boost regex this has exactly the
1039                  same effect as a "." operator.
1040                </p>
1041              </td>
1042</tr>
1043<tr>
1044<td>
1045                <p>
1046                  <code class="computeroutput"><span class="special">\</span><span class="identifier">X</span></code>
1047                </p>
1048              </td>
1049<td>
1050                <p>
1051                  Matches a combining character sequence: that is any non-combining
1052                  character followed by a sequence of zero or more combining characters.
1053                </p>
1054              </td>
1055</tr>
1056</tbody>
1057</table></div>
1058<h6>
1059<a name="boost_regex.syntax.basic_extended.h25"></a>
1060        <span class="phrase"><a name="boost_regex.syntax.basic_extended.any_other_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.any_other_escape">Any
1061        other escape</a>
1062      </h6>
1063<p>
1064        Any other escape sequence matches the character that is escaped, for example
1065        \@ matches a literal '@'.
1066      </p>
1067<h5>
1068<a name="boost_regex.syntax.basic_extended.h26"></a>
1069        <span class="phrase"><a name="boost_regex.syntax.basic_extended.operator_precedence"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.operator_precedence">Operator
1070        precedence</a>
1071      </h5>
1072<p>
        The order of precedence of operators is as follows:
1074      </p>
1075<div class="orderedlist"><ol class="orderedlist" type="1">
1076<li class="listitem">
1077            Collation-related bracket symbols <code class="computeroutput"><span class="special">[==]</span>
1078            <span class="special">[::]</span> <span class="special">[..]</span></code>
1079          </li>
1080<li class="listitem">
1081            Escaped characters <code class="computeroutput"><span class="special">\</span></code>
1082          </li>
1083<li class="listitem">
1084            Character set (bracket expression) <code class="computeroutput"><span class="special">[]</span></code>
1085          </li>
1086<li class="listitem">
1087            Grouping <code class="computeroutput"><span class="special">()</span></code>
1088          </li>
1089<li class="listitem">
1090            Single-character-ERE duplication <code class="computeroutput"><span class="special">*</span>
1091            <span class="special">+</span> <span class="special">?</span>
1092            <span class="special">{</span><span class="identifier">m</span><span class="special">,</span><span class="identifier">n</span><span class="special">}</span></code>
1093          </li>
1094<li class="listitem">
1095            Concatenation
1096          </li>
1097<li class="listitem">
            Anchoring <code class="computeroutput"><span class="special">^</span> <span class="special">$</span></code>
1099          </li>
1100<li class="listitem">
1101            Alternation <code class="computeroutput"><span class="special">|</span></code>
1102          </li>
1103</ol></div>
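<p>
        A few consequences of the ordering above can be checked directly. The sketch
        below (with a hypothetical function name) returns true only if every listed
        expectation holds.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;

// Checks some effects of operator precedence in POSIX-Extended syntax.
bool precedence_demo()
{
    using boost::regex;
    using boost::regex_match;

    const regex alt("ab|cd", regex::extended);       // concatenation binds tighter than |
    const regex grouped("a(b|c)d", regex::extended); // grouping overrides that
    const regex star("ab*", regex::extended);        // * applies to "b" only
    const regex group_star("(ab)*", regex::extended);// * applies to the whole group

    return regex_match("ab", alt)
        &amp;&amp; !regex_match("acd", alt)
        &amp;&amp; regex_match("acd", grouped)
        &amp;&amp; regex_match("abbb", star)
        &amp;&amp; !regex_match("abab", star)
        &amp;&amp; regex_match("abab", group_star);
}
</pre>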
1104<h5>
1105<a name="boost_regex.syntax.basic_extended.h27"></a>
1106        <span class="phrase"><a name="boost_regex.syntax.basic_extended.what_gets_matched"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.what_gets_matched">What
1107        Gets Matched</a>
1108      </h5>
1109<p>
        When there is more than one way to match a regular expression, the "best"
1111        possible match is obtained using the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest
1112        rule</a>.
1113      </p>
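<p>
        As a small illustration of the rule, the sketch below returns the text of the
        first match found for a pattern compiled with the given syntax flags. The helper
        name <code class="computeroutput">first_match</code> is hypothetical.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;string&gt;

// Returns the text matched by the first match of `pattern` in `text`,
// or an empty string if there is no match.
std::string first_match(const std::string&amp; pattern, const std::string&amp; text,
                        boost::regex::flag_type flags)
{
    const boost::regex e(pattern, flags);
    boost::smatch m;
    return boost::regex_search(text, m, e) ? m.str() : std::string();
}
</pre>
<p>
        Searching <code class="computeroutput">"xab"</code> for <code class="computeroutput">"a|ab"</code> with
        <code class="computeroutput">boost::regex::extended</code> should return "ab", the longest of
        the matches that start at the leftmost possible position. With
        <code class="computeroutput">boost::regex::perl</code> the same call should return "a", because
        Perl syntax takes the first alternative that matches.
      </p>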
1114<h4>
1115<a name="boost_regex.syntax.basic_extended.h28"></a>
1116        <span class="phrase"><a name="boost_regex.syntax.basic_extended.variations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.variations">Variations</a>
1117      </h4>
1118<h5>
1119<a name="boost_regex.syntax.basic_extended.h29"></a>
1120        <span class="phrase"><a name="boost_regex.syntax.basic_extended.egrep"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.egrep">Egrep</a>
1121      </h5>
1122<p>
        When an expression is compiled with the <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">flag
        <code class="computeroutput"><span class="identifier">egrep</span></code></a> set, the
        expression is treated as a newline-separated list of <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended
        expressions</a>; a match is found if any of the expressions in the list
        matches, for example:
1128      </p>
1129<pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"abc\ndef"</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">egrep</span><span class="special">);</span>
1130</pre>
1131<p>
        will match either of the POSIX-Extended expressions "abc" or "def".
1133      </p>
1134<p>
1135        As its name suggests, this behavior is consistent with the Unix utility
1136        <code class="computeroutput"><span class="identifier">egrep</span></code>, and with grep when
1137        used with the -E option.
1138      </p>
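<p>
        A grep-like filter can be sketched on top of this flag. The function below
        (a hypothetical name) keeps the lines that contain a match for any of the
        newline-separated expressions in <code class="computeroutput">patterns</code>.
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Returns the lines in which at least one of the newline-separated
// POSIX-Extended expressions in `patterns` finds a match.
std::vector&lt;std::string&gt; egrep_lines(const std::vector&lt;std::string&gt;&amp; lines,
                                     const std::string&amp; patterns)
{
    const boost::regex e(patterns, boost::regex::egrep);
    std::vector&lt;std::string&gt; hits;
    for (const std::string&amp; line : lines)
        if (boost::regex_search(line, e))
            hits.push_back(line);
    return hits;
}
</pre>
<p>
        With the pattern list <code class="computeroutput">"abc\ndef"</code> and the lines "xabcx",
        "nothing" and "undefined", the result should contain "xabcx" and "undefined".
      </p>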
1139<h5>
1140<a name="boost_regex.syntax.basic_extended.h30"></a>
1141        <span class="phrase"><a name="boost_regex.syntax.basic_extended.awk"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.awk">awk</a>
1142      </h5>
1143<p>
        When the <code class="computeroutput"><span class="identifier">awk</span></code> flag is set, then in addition to the <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended
        features</a>, the escape character is special inside a character class
        declaration.
1147      </p>
1148<p>
        In addition, some escape sequences that are not defined as part of the POSIX-Extended
        specification are required to be supported; however, Boost.Regex supports
        these by default anyway.
1152      </p>
1153<h4>
1154<a name="boost_regex.syntax.basic_extended.h31"></a>
1155        <span class="phrase"><a name="boost_regex.syntax.basic_extended.options"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.options">Options</a>
1156      </h4>
1157<p>
        There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions">variety
        of flags</a> that may be combined with the <code class="computeroutput"><span class="identifier">extended</span></code>
        and <code class="computeroutput"><span class="identifier">egrep</span></code> options when constructing
        the regular expression. In particular, note that the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code></a> option alters the syntax,
        while the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code>, <code class="computeroutput"><span class="identifier">nosubs</span></code>
        and <code class="computeroutput"><span class="identifier">icase</span></code> options</a>
        modify how the case and locale sensitivity are to be applied.
1165      </p>
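<p>
        Flags are combined with the bitwise-or operator when the expression is constructed.
        As a minimal sketch (the function name is hypothetical), the following compiles
        a POSIX-Extended expression that ignores case:
      </p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt;
#include &lt;string&gt;

// True if all of `text` matches `pattern`, compiled as a POSIX-Extended
// expression with case-insensitive matching.
bool matches_ignoring_case(const std::string&amp; pattern, const std::string&amp; text)
{
    const boost::regex e(pattern, boost::regex::extended | boost::regex::icase);
    return boost::regex_match(text, e);
}
</pre>
<p>
        Here <code class="computeroutput">matches_ignoring_case("hello", "HeLLo")</code> should return
        true, while the same pattern should reject <code class="computeroutput">"help"</code>.
      </p>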
1166<h4>
1167<a name="boost_regex.syntax.basic_extended.h32"></a>
1168        <span class="phrase"><a name="boost_regex.syntax.basic_extended.references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.references">References</a>
1169      </h4>
1170<p>
1171        <a href="http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target="_top">IEEE
        Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions
1173        and Headers, Section 9, Regular Expressions.</a>
1174      </p>
1175<p>
1176        <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target="_top">IEEE
        Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
1178        Utilities, Section 4, Utilities, egrep.</a>
1179      </p>
1180<p>
1181        <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/awk.html" target="_top">IEEE
        Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
1183        Utilities, Section 4, Utilities, awk.</a>
1184      </p>
1185</div>
1186<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
1187<td align="left"></td>
1188<td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p>
1189        Distributed under the Boost Software License, Version 1.0. (See accompanying
1190        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
1191      </p>
1192</div></td>
1193</tr></table>
1194<hr>
1195<div class="spirit-nav">
1196<a accesskey="p" href="perl_syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_syntax.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
1197</div>
1198</body>
1199</html>
1200