[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:char Character Parsers]

This module provides parsers for single characters. It currently
includes literal chars (e.g. `'x'`, `L'x'`), `char_` (single
characters, ranges and character sets) and the encoding-specific
character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.).

[heading Module Header]

    // forwards to <boost/spirit/home/qi/char.hpp>
    #include <boost/spirit/include/qi_char.hpp>

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]

[heading Description]

The `char_` parser matches single characters. The `char_` parser has an
associated __char_encoding_namespace__. This is needed when doing basic
operations such as inhibiting case sensitivity and dealing with
character ranges.

There are various forms of `char_`.

[heading char_]

The no argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character

[heading char_(ch)]

The single argument form of `char_` (with a character argument) matches
the supplied character.

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)

[heading char_(first, last)]

`char_` with two arguments matches a range of characters.

    char_('a','z')      // lowercase alphabetic characters
    char_(L'0',L'9')    // digits

A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note that the first character must come /before/ the second,
according to the underlying __char_encoding_namespace__.

Character mapping is inherently platform dependent. The standard does
not guarantee, for example, that `'A' < 'Z'`. This is why Spirit2
purposely attaches a specific __char_encoding_namespace__ (such as ASCII
or ISO-8859-1) to the `char_` parser to eliminate such ambiguities.

[note *Sparse bit vectors*

To accommodate 16, 32 and 64 bit characters, the char-set statically
selects its implementation: a `std::bitset` when the character type is
not wider than 8 bits, and otherwise a sparse bit/boolean set which uses
a sorted vector of disjoint ranges (`range_run`). The set is constructed
from ranges such that adjacent or overlapping ranges are coalesced.

`range_runs` are very space-economical in situations where there are lots
of ranges and a few individual disjoint values. Searching is O(log n)
where n is the number of ranges.]
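
The idea behind the sparse representation can be sketched in standard
C++. The `range_set` class below is a hypothetical, simplified
illustration and is not Spirit's actual `range_run` implementation. It
only shows the two properties described in the note: ranges that overlap
or touch are coalesced on insertion, and lookup is a binary search over
the stored ranges.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a range-based character set.
// Values are held as long long so that hi + 1 cannot overflow for any
// 8, 16 or 32 bit character code.
class range_set
{
public:
    typedef std::pair<long long, long long> range;   // inclusive bounds

    // Add the inclusive range lo..hi, coalescing it with every stored
    // range that overlaps it or is directly adjacent to it.
    void add(long long lo, long long hi)
    {
        // Binary search for the first run that ends at lo - 1 or later.
        std::vector<range>::iterator it = std::lower_bound(
            runs_.begin(), runs_.end(), lo,
            [](range const& r, long long v) { return r.second + 1 < v; });

        // Absorb every following run that starts at hi + 1 or earlier.
        std::vector<range>::iterator stop = it;
        for (; stop != runs_.end() && stop->first <= hi + 1; ++stop)
        {
            lo = std::min(lo, stop->first);
            hi = std::max(hi, stop->second);
        }
        it = runs_.erase(it, stop);
        runs_.insert(it, range(lo, hi));
    }

    // O(log n) membership test, where n is the number of stored runs.
    bool contains(long long v) const
    {
        std::vector<range>::const_iterator it = std::lower_bound(
            runs_.begin(), runs_.end(), v,
            [](range const& r, long long x) { return r.second < x; });
        return it != runs_.end() && it->first <= v;
    }

    std::size_t run_count() const { return runs_.size(); }

private:
    std::vector<range> runs_;   // sorted, pairwise disjoint, non-adjacent
};
```

In this sketch, adding `'a'..'f'` and then `'g'..'z'` leaves a single
stored run, because the two ranges are adjacent. A large block such as
`0x4E00..0x9FFF` costs one entry instead of one bit per character.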

[heading char_(def)]

Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), the string is regarded as a char-set definition string following
a syntax that resembles POSIX style regular expression character sets
(except that double quotes delimit the set elements instead of square
brackets and there is no special negation `^` character). Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal digits
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E
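
To make the definition string syntax concrete, here is a rough sketch in
standard C++ of how such a string could be expanded into a membership
table for 8 bit characters. The function `expand_char_set` is
hypothetical and is not Spirit's implementation. In particular, treating
a leading or trailing `-` as an ordinary element is an assumption of this
sketch.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Spirit: expand a definition string
// such as "a-zA-Z" into a 256-entry membership table.
std::bitset<256> expand_char_set(std::string const& def)
{
    std::bitset<256> set;
    std::size_t i = 0;
    while (i < def.size())
    {
        unsigned char const lo = static_cast<unsigned char>(def[i]);
        if (i + 2 < def.size() && def[i + 1] == '-')
        {
            // "x-y" with a character on both sides denotes a range.
            unsigned char const hi = static_cast<unsigned char>(def[i + 2]);
            for (unsigned c = lo; c <= hi; ++c)
                set.set(c);
            i += 3;
        }
        else
        {
            // A single element (assumed to include a stray '-').
            set.set(lo);
            ++i;
        }
    }
    return set;
}
```

Under these assumptions, `expand_char_set("a-zA-Z")` contains the 52
ASCII letters, and `expand_char_set("actgACTG")` contains exactly the
eight listed characters.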

[heading lit(ch)]

`lit`, when passed a single character, behaves like the single argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.

[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.]

Examples:

    'x'
    lit('x')
    lit(L'x')
    lit(c) // c is a char

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char.hpp>
    #include <boost/spirit/include/qi_char_.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
    [[`ns::char_`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`c`, `f`, `l`]    [A literal char, e.g. `'x'`, `L'x'` or anything that can be
                        converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
                        that evaluates to anything that can be converted to a `char`
                        or `wchar_t`.]]
    [[`ns`]             [A __char_encoding_namespace__.]]
    [[`cs`]             [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
                        that specifies a char-set definition string following a syntax
                        that resembles POSIX style regular expression character sets
                        (without the enclosing square brackets and without the
                        negation `^` character).]]
    [[`cp`]             [A char parser, a char range parser or a char set parser.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`c`]              [Create a char parser from a char, `c`.]]
    [[`lit(c)`]         [Create a char parser from a char, `c`.]]
    [[`ns::char_`]      [Create a char parser that matches any character in the
                        `ns` encoding.]]
    [[`ns::char_(c)`]   [Create a char parser with `ns` encoding from a char, `c`.]]
    [[`ns::char_(f, l)`][Create a char-range parser that matches characters from
                        range (`f` to `l`, inclusive) with `ns` encoding.]]
    [[`ns::char_(cs)`]  [Create a char-set parser with `ns` encoding from a char-set
                        definition string, `cs`.]]
    [[`~cp`]            [Negate `cp`. The result is a negated char parser that
                        matches any character in the `ns` encoding except the
                        characters matched by `cp`.]]
]

[heading Attributes]

[table
    [[Expression]       [Attribute]]
    [[`c`]              [__unused__ or if `c` is a __qi_lazy_argument__, the character
                        type returned by invoking it.]]
    [[`lit(c)`]         [__unused__ or if `c` is a __qi_lazy_argument__, the character
                        type returned by invoking it.]]
    [[`ns::char_`]      [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(c)`]   [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(cs)`]  [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`~cp`]            [The attribute of `cp`.]]
]

[heading Complexity]

[:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_char]

Basic literals:

[reference_char_literals]

Range:

[reference_char_range]

Character set:

[reference_char_set]

Lazy `char_` using __phoenix__:

[reference_char_phoenix]

[endsect] [/ Char]

[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]

[heading Description]

The library has the full repertoire of single character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char_class.hpp>
    #include <boost/spirit/include/qi_char_class.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`ns::alnum`]]
    [[`ns::alpha`]]
    [[`ns::blank`]]
    [[`ns::cntrl`]]
    [[`ns::digit`]]
    [[`ns::graph`]]
    [[`ns::lower`]]
    [[`ns::print`]]
    [[`ns::punct`]]
    [[`ns::space`]]
    [[`ns::upper`]]
    [[`ns::xdigit`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`ns`]             [A __char_encoding_namespace__.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`ns::alnum`]      [Matches alpha-numeric characters]]
    [[`ns::alpha`]      [Matches alphabetic characters]]
    [[`ns::blank`]      [Matches spaces or tabs]]
    [[`ns::cntrl`]      [Matches control characters]]
    [[`ns::digit`]      [Matches numeric digits]]
    [[`ns::graph`]      [Matches non-space printing characters]]
    [[`ns::lower`]      [Matches lower case letters]]
    [[`ns::print`]      [Matches printable characters]]
    [[`ns::punct`]      [Matches punctuation symbols]]
    [[`ns::space`]      [Matches spaces, tabs, returns, and newlines]]
    [[`ns::upper`]      [Matches upper case letters]]
    [[`ns::xdigit`]     [Matches hexadecimal digits]]
]

[heading Attributes]

[:The character type of the __char_encoding_namespace__, `ns`.]

[heading Complexity]

[:O(N)]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_char_class]

Basic usage:

[reference_char_class]

[endsect] [/ Char Classification]

[endsect]