[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer Supported Regular Expressions]

[table Regular expressions support
    [[Expression]   [Meaning]]
    [[`x`]          [Match the character `x`]]
    [[`.`]          [Match any character except newline (or optionally *any* character)]]
    [[`"..."`]      [All characters taken as literals between double quotes, except escape sequences]]
    [[`[xyz]`]      [A character class; in this case matches `x`, `y` or `z`]]
    [[`[abj-oZ]`]   [A character class with a range in it; matches `a`, `b`, any
                     letter from `j` through `o`, or `Z`]]
    [[`[^A-Z]`]     [A negated character class, i.e. any character but those in
                     the class. In this case, any character except an uppercase
                     letter]]
    [[`r*`]         [Zero or more r's (greedy), where r is any regular expression]]
    [[`r*?`]        [Zero or more r's (abstemious, i.e. non-greedy), where r is any regular expression]]
    [[`r+`]         [One or more r's (greedy)]]
    [[`r+?`]        [One or more r's (abstemious)]]
    [[`r?`]         [Zero or one r's (greedy), i.e. optional]]
    [[`r??`]        [Zero or one r's (abstemious), i.e. optional]]
    [[`r{2,5}`]     [Anywhere between two and five r's (greedy)]]
    [[`r{2,5}?`]    [Anywhere between two and five r's (abstemious)]]
    [[`r{2,}`]      [Two or more r's (greedy)]]
    [[`r{2,}?`]     [Two or more r's (abstemious)]]
    [[`r{4}`]       [Exactly four r's]]
    [[`{NAME}`]     [The macro `NAME` (see below)]]
    [[`"[xyz]\"foo"`] [The literal string `[xyz]\"foo`]]
    [[`\X`]         [If X is `a`, `b`, `e`, `n`, `r`, `f`, `t`, `v` then the
                     ANSI-C interpretation of `\X`.
                     Otherwise a literal `X`
                     (used to escape operators such as `*`)]]
    [[`\0`]         [A NUL character (ASCII code 0)]]
    [[`\123`]       [The character with octal value 123]]
    [[`\x2a`]       [The character with hexadecimal value 2a]]
    [[`\cX`]        [A named control character `X`]]
    [[`\a`]         [A shortcut for alert (bell)]]
    [[`\b`]         [A shortcut for backspace]]
    [[`\e`]         [A shortcut for ESC (escape character `0x1b`)]]
    [[`\n`]         [A shortcut for newline]]
    [[`\r`]         [A shortcut for carriage return]]
    [[`\f`]         [A shortcut for form feed `0x0c`]]
    [[`\t`]         [A shortcut for horizontal tab `0x09`]]
    [[`\v`]         [A shortcut for vertical tab `0x0b`]]
    [[`\d`]         [A shortcut for `[0-9]`]]
    [[`\D`]         [A shortcut for `[^0-9]`]]
    [[`\s`]         [A shortcut for `[\x20\t\n\r\f\v]`]]
    [[`\S`]         [A shortcut for `[^\x20\t\n\r\f\v]`]]
    [[`\w`]         [A shortcut for `[a-zA-Z0-9_]`]]
    [[`\W`]         [A shortcut for `[^a-zA-Z0-9_]`]]
    [[`(r)`]        [Match an `r`; parentheses are used to override precedence
                     (see below)]]
    [[`(?r-s:pattern)`] [Apply option 'r' and omit option 's' while interpreting pattern.
                     Options may be zero or more of the characters 'i' or 's'.
                     'i' means case-insensitive. '-i' means case-sensitive.
                     's' alters the meaning of the '.' syntax to match any single character whatsoever.
                     '-s' alters the meaning of '.' to match any character except '`\n`'.]]
    [[`rs`]         [The regular expression `r` followed by the regular
                     expression `s` (a sequence)]]
    [[`r|s`]        [Either an `r` or an `s`]]
    [[`^r`]         [An `r` but only at the beginning of a line (i.e. when just
                     starting to scan, or right after a newline has been
                     scanned)]]
    [[`r$`]         [An `r` but only at the end of a line (i.e. just before a
                     newline)]]
]

[note POSIX character classes are not currently supported, due to performance issues
when creating them in wide character mode.]
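
The escape sequences in the table above are regular-expression escapes. When a
pattern is written as an ordinary C++ string literal, the compiler applies its
own escaping first, so every backslash and double quote in the pattern has to be
escaped once more. The following is a minimal sketch of that second level only;
`to_cpp_literal_body` is a hypothetical helper written for illustration and is
not part of __lex__ or any Boost library.

```cpp
#include <string>

// Hypothetical helper (illustration only, not a library function):
// given the regular expression text the lexer should receive, return the
// characters that must be typed between the double quotes of an ordinary
// (non-raw) C++ string literal to produce exactly that text.
std::string to_cpp_literal_body(const std::string& regex)
{
    std::string out;
    for (char c : regex)
    {
        // In a C++ string literal a backslash starts an escape sequence and
        // a double quote ends the literal, so both need a leading backslash.
        if (c == '\\' || c == '"')
            out += '\\';
        out += c;
    }
    return out;
}
```

For instance, the pattern `\d+` has to be typed as `"\\d+"`, and a pattern that
matches a backslash followed by a double quote (`\\\"` at the regular expression
level) becomes `"\\\\\\\""`. C++11 raw string literals such as `R"(\d+)"` skip
the C++ level of escaping altogether, which may be easier to read where they
are available.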

[tip If you want to build tokens for syntaxes that recognize items like quotes
     (`"'"`, `'"'`) and backslash (`\`), here is example syntax to get you started.
     The lesson here is that both C++ string literals and regular expressions
     require escaping with `\` for some constructs, and the two levels of
     escaping cascade.
     ``
     quote1 = "'";                      // match single "'"
     quote2 = "\\\"";                   // match single '"'
     literal_quote1 = "\\\\'";          // match backslash followed by single "'"
     literal_quote2 = "\\\\\\\"";       // match backslash followed by single '"'
     literal_backslash = "\\\\\\\\";    // match two backslashes
     ``
]

[heading Regular Expression Precedence]

* `rs` has highest precedence
* `r*` has next highest (`+`, `?`, `{n,m}` have the same precedence as `*`)
* `r|s` has the lowest precedence

[heading Macros]

Regular expressions can be given a name and referred to in rules using the
syntax `{NAME}` where `NAME` is the name you have given to the macro. A macro
name can be at most 30 characters long and must start with a `_` or a letter.
Subsequent characters can be `_`, `-`, a letter or a decimal digit.

[endsect]
