[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer Supported Regular Expressions]

[table Regular expressions support
    [[Expression]   [Meaning]]
    [[`x`]          [Match the character `x`]]
    [[`.`]          [Match any character except newline (or optionally *any* character)]]
    [[`"..."`]      [All characters taken as literals between double quotes, except escape sequences]]
    [[`[xyz]`]      [A character class; in this case matches `x`, `y` or `z`]]
    [[`[abj-oZ]`]   [A character class with a range in it; matches `a`, `b`, any
                     letter from `j` through `o`, or a `Z`]]
    [[`[^A-Z]`]     [A negated character class, i.e. any character but those in
                     the class. In this case, any character except an uppercase
                     letter]]
    [[`r*`]         [Zero or more r's (greedy), where r is any regular expression]]
    [[`r*?`]        [Zero or more r's (abstemious), where r is any regular expression]]
    [[`r+`]         [One or more r's (greedy)]]
    [[`r+?`]        [One or more r's (abstemious)]]
    [[`r?`]         [Zero or one r's (greedy), i.e. optional]]
    [[`r??`]        [Zero or one r's (abstemious), i.e. optional]]
    [[`r{2,5}`]     [Anywhere between two and five r's (greedy)]]
    [[`r{2,5}?`]    [Anywhere between two and five r's (abstemious)]]
    [[`r{2,}`]      [Two or more r's (greedy)]]
    [[`r{2,}?`]     [Two or more r's (abstemious)]]
    [[`r{4}`]       [Exactly four r's]]
    [[`{NAME}`]     [The macro `NAME` (see below)]]
    [[`"[xyz]\"foo"`]  [The literal string `[xyz]\"foo`]]
    [[`\X`]         [If `X` is `a`, `b`, `e`, `n`, `r`, `f`, `t`, `v` then the
                     ANSI-C interpretation of `\X`. Otherwise a literal `X`
                     (used to escape operators such as `*`)]]
    [[`\0`]         [A NUL character (ASCII code 0)]]
    [[`\123`]       [The character with octal value 123]]
    [[`\x2a`]       [The character with hexadecimal value 2a]]
    [[`\cX`]        [A named control character `X`]]
    [[`\a`]         [A shortcut for Alert (bell)]]
    [[`\b`]         [A shortcut for Backspace]]
    [[`\e`]         [A shortcut for ESC (escape character `0x1b`)]]
    [[`\n`]         [A shortcut for newline]]
    [[`\r`]         [A shortcut for carriage return]]
    [[`\f`]         [A shortcut for form feed `0x0c`]]
    [[`\t`]         [A shortcut for horizontal tab `0x09`]]
    [[`\v`]         [A shortcut for vertical tab `0x0b`]]
    [[`\d`]         [A shortcut for `[0-9]`]]
    [[`\D`]         [A shortcut for `[^0-9]`]]
    [[`\s`]         [A shortcut for `[\x20\t\n\r\f\v]`]]
    [[`\S`]         [A shortcut for `[^\x20\t\n\r\f\v]`]]
    [[`\w`]         [A shortcut for `[a-zA-Z0-9_]`]]
    [[`\W`]         [A shortcut for `[^a-zA-Z0-9_]`]]
    [[`(r)`]        [Match an `r`; parentheses are used to override precedence
                     (see below)]]
    [[`(?r-s:pattern)`] [Apply option `r` and omit option `s` while interpreting
                     `pattern`. Options may be zero or more of the characters
                     `i` or `s`. `i` means case-insensitive; `-i` means
                     case-sensitive. `s` makes `.` match any single character
                     whatsoever; `-s` makes `.` match any character except `\n`.]]
    [[`rs`]         [The regular expression `r` followed by the regular
                     expression `s` (a sequence)]]
    [[`r|s`]        [Either an `r` or an `s`]]
    [[`^r`]         [An `r` but only at the beginning of a line (i.e. when just
                     starting to scan, or right after a newline has been
                     scanned)]]
    [[`r$`]         [An `r` but only at the end of a line (i.e. just before a
                     newline)]]
]
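
The difference between the greedy and abstemious (non-greedy) forms in the table can be illustrated with a small sketch. It uses `std::regex` (ECMAScript grammar) purely as a stand-in, on the assumption that both quantifier styles behave the same way for these simple patterns; the lexer itself is not exercised, and `first_match` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matched by
// `pattern`, or an empty string when nothing matches.
// NOTE: std::regex is only a stand-in here; it is not the lexer's engine.
std::string first_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);  // ECMAScript grammar by default
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

With this helper, `first_match("a.*b", "a1b2b")` yields `a1b2b` (the greedy form consumes as much as it can), while `first_match("a.*?b", "a1b2b")` yields `a1b` (the abstemious form stops at the first opportunity). Likewise, `a{2,5}` takes five characters from `aaaaaaa`, whereas `a{2,5}?` takes only two.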

[note POSIX character classes are not currently supported, due to performance issues
when creating them in wide character mode.]

[tip  If you want to build tokens for syntaxes that recognize items like quotes
      (`"'"`, `'"'`) and backslash (`\`), here is example syntax to get you started.
      The lesson here is that both C++ and regular expressions require
      escaping with `\` for some constructs, and the two levels of escaping
      cascade.
      ``
          quote1         = "'";            // match single "'"
          quote2         = "\\\"";         // match single '"'
          literal_quote1 = "\\\\'";        // match backslash followed by single "'"
          literal_quote2 = "\\\\\\\"";     // match backslash followed by single '"'
          literal_backslash = "\\\\\\\\";  // match two backslashes
      ``
]
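
To see the cascade described in the tip, it can help to look at the characters that actually reach the regular expression engine once the C++ compiler has consumed its own level of escaping. The sketch below pairs three of the patterns from the tip with C++11 raw string literals that spell out the same characters directly. The function names are hypothetical and chosen only for illustration.

```cpp
#include <string>

// Pattern text as written with ordinary string literals. The C++ compiler
// removes one level of backslash escaping before the regex engine sees it.
std::string quote2_pattern()            { return "\\\""; }      // engine sees: \"
std::string literal_quote2_pattern()    { return "\\\\\\\""; }  // engine sees: \\\"
std::string literal_backslash_pattern() { return "\\\\\\\\"; }  // engine sees: \\\\

// The same patterns as raw string literals, which skip the C++ level of
// escaping so that only the regular expression escapes remain visible.
std::string quote2_raw()            { return R"(\")"; }
std::string literal_quote2_raw()    { return R"(\\\")"; }
std::string literal_backslash_raw() { return R"(\\\\)"; }
```

Each ordinary literal compares equal to its raw counterpart. For example, `literal_quote2_pattern()` is the four characters `\\\"`, which the regular expression syntax reads as an escaped backslash followed by an escaped double quote.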

[heading Regular Expression Precedence]

* `r*` has the highest precedence (`+`, `?`, `{n,m}` have the same precedence as `*`)
* `rs` has the next highest precedence
* `r|s` has the lowest precedence
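
As a small illustration of alternation having the lowest precedence, the sketch below checks that `ab|cd` groups as `(ab)|(cd)` and not as `a(b|c)d`. It uses `std::regex` only as a stand-in, assuming the grouping rules agree for these simple patterns; `matches_whole` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
// NOTE: std::regex is only a stand-in here; it is not the lexer's engine.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

Here `matches_whole("ab|cd", "ab")` and `matches_whole("ab|cd", "cd")` are true, while `matches_whole("ab|cd", "abd")` is false. Adding parentheses changes the grouping: `a(b|c)d` matches `abd` and `acd` but not `ab`.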

[heading Macros]

Regular expressions can be given a name and referred to in rules using the
syntax `{NAME}` where `NAME` is the name you have given to the macro. A macro
name can be at most 30 characters long and must start with a `_` or a letter.
Subsequent characters can be `_`, `-`, a letter or a decimal digit.
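
The naming rules above can be sketched as a small check. `is_valid_macro_name` is a hypothetical helper, not part of any library API, and it assumes that "letter" means whatever `std::isalpha` accepts in the current C locale (ASCII letters by default).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: checks a candidate macro name against the rules
// stated above. It does not consult the lexer in any way.
bool is_valid_macro_name(const std::string& name)
{
    // At most 30 characters, and at least one.
    if (name.empty() || name.size() > 30)
        return false;

    // First character: '_' or a letter.
    const unsigned char first = static_cast<unsigned char>(name[0]);
    if (first != '_' && !std::isalpha(first))
        return false;

    // Remaining characters: '_', '-', a letter or a decimal digit.
    for (std::string::size_type i = 1; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (c != '_' && c != '-' && !std::isalnum(c))
            return false;
    }
    return true;
}
```

Under these assumptions, names such as `WORD` and `_id-2` pass, while `2fast` (starts with a digit), `-x` (starts with `-`) and any name longer than 30 characters are rejected.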

[endsect]