• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1[#manual]
2[section User manual]
3
4[section What is a parser]
5
6See the [link parser parser] section of the [link reference reference] for the
7explanation of what a parser is.
8
9[section The input of the parsers]
10
11Parsers take a [link string `string`] as input, which represents a string
12for template metaprograms. For example the string `"Hello World!"` can be
13defined the following way:
14
15  string<'H','e','l','l','o',' ','W','o','r','l','d','!'>
16
17This syntax makes the input of the parsers difficult to read. Metaparse works
18with compilers using C++98, but the input of the parsers has to be defined the
19way it is described above.
20
21Based on `constexpr`, a feature provided by C++11, Metaparse provides a macro,
22[link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`] for defining strings:
23
24  BOOST_METAPARSE_STRING("Hello World!")
25
26This defines a [link string `string`] as well, however, it is easier to
27read. The maximum length of the string that can be defined this way is limited,
28however, this limit is configurable. It is specified by the
29`BOOST_METAPARSE_LIMIT_STRING_SIZE` macro.
30
31[endsect]
32
33[section Source positions]
34
35A source position is described using a compile-time data structure. The
36following functions can be used to query it:
37
38* [link get_col `get_col`]
39* [link get_line `get_line`]
40
41The beginning of the input is [link start `start`] which requires
42`<boost/metaparse/start.hpp>` to be included.
43
44[endsect]
45
46[section Error handling]
47
48An error is described using a compile-time data structure. It contains
49information about the source position where the error was detected and some
50[link parsing_error_message description] about the error.
51[link debug_parsing_error `debug_parsing_error`] can be used to display the
52error message. Metaparse provides the
53[link BOOST_METAPARSE_DEFINE_ERROR `BOOST_METAPARSE_DEFINE_ERROR`] macro for
54defining simple [link parsing_error_message parsing error message]s.
55
56[endsect]
57
58[section Some examples of simple parsers]
59
60* A parser that parses nothing and always succeeds is
61  [link return_ `return_`].
62* A parser that always fails is [link fail `fail`].
63* A parser that parses one character and returns the parsed character as the
64  result is [link one_char `one_char`].
65
66[endsect]
67
68[section Combining parsers]
69
70Complex parsers can be built by combining simple parsers. The parser library
71contains a number of parser combinators that build new parsers from already
72existing ones.
73
74For example
75[link accept_when `accept_when`]`<Parser, Predicate, RejectErrorMsg>` is a
76parser. It uses `Parser` to parse the input. When `Parser` rejects the input,
77the combinator returns the error `Parser` failed with. When `Parser` is
78successful, the combinator validates the result using `Predicate`. If the
79predicate returns true, the combinator accepts the input, otherwise it generates
80an error with the message `RejectErrorMsg`.
81
82Having [link accept_when `accept_when`], [link one_char `one_char`] can be
83used to build parsers that accept only digit characters, only whitespaces, etc.
84For example [link digit `digit`] accepts only digit characters:
85
86  typedef
87    boost::metaparse::accept_when<
88      boost::metaparse::one_char,
89      boost::metaparse::util::is_digit,
90      boost::metaparse::errors::digit_expected
91    >
92    digit;
93
94[endsect]
95
96[section Sequence]
97
98The result of a successful parsing is some value and the remaining string that
99was not parsed. The remaining string can be processed by another parser. The
100parser library provides a parser combinator, [link sequence `sequence`],
101that takes a number of parsers as arguments and builds a new parser from them
102that:
103
104* Parses the input using the first parser
105* If parsing succeeds, it parses the remaining string with the second parser
106* It continues applying the parsers in order as long as they succeed
107* If all of them succeed, it returns the list of results
108* If any of the parsers fails, the combinator fails as well and returns the
109  error the first failing parser returned with
110
111[endsect]
112
113[#repetition]
114[section Repetition]
115
116It is a common thing to parse a list of things of unknown length. As an example
117let's start with something simple: the text is a list of numbers. For example:
118
119  11 13 3 21
120
121We want the result of parsing to be the sum of these values. Metaparse provides
122the [link int_ `int_`] parser we can use to parse one of these numbers.
123Metaparse provides the [link token `token`] combinator to consume the
124whitespaces after the number. So the following parser parses one number and the
125whitespaces after it:
126
127  using int_token = token<int_>;
128
129The result of parsing is a boxed integer value: the value of the parsed number.
130For example parsing
131[link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`]`("13  ")` gives
132`boost::mpl::int_<13>` as the result.
133
134Our example input is a list of numbers. Each number can be parsed by
135`int_token`:
136
137[$images/metaparse/repeated_diag0.png [width 70%]]
138
139This diagram shows how the repeated application of `int_token` can parse the
140example input. Metaparse provides the [link repeated `repeated`] parser to
141easily implement this. The result of parsing is a typelist: the list of the
142individual numbers.
143
144[$images/metaparse/repeated_diag1.png [width 70%]]
145
146This diagram shows how [link repeated `repeated`]`<int_token>` works. It uses
147the `int_token` parser repeatedly and builds a `boost::mpl::vector` from the
148results it provides.
149
150But we need the sum of these, so we need to summarise the result. We can do this
151by wrapping our parser, [link repeated `repeated`]`<int_token>` with
152[link transform `transform`]. That gives us the opportunity to specify a
153function transforming this typelist to some other value - the sum of the
154elements in our case. Initially let's ignore how to summarise the elements in
155the vector. Let's assume that it can be implemented by a lambda expression and
156use `boost::mpl::lambda<...>::type` representing that lambda expression. Here is
157an example using [link transform `transform`] and this lambda expression:
158
159  using sum_parser =
160    transform<
161      repeated<int_token>,
162      boost::mpl::lambda<...>::type
163    >;
164
165The  [link transform `transform`]`<>` parser combinator wraps the
166[link repeated `repeated`]`<int_token>` to build the parser we need. Here is a
167diagram showing how it works:
168
169[$images/metaparse/repeated_diag2.png [width 70%]]
170
171As the diagram shows, the
172[link transform `transform`]`<`[link repeated `repeated`]`<int_token>, ...>`
173parser parses the input using [link repeated `repeated`]`<int_token>` and then
174does some processing on the result of parsing.
175
176Let's implement the missing lambda expression that tells
177[link transform `transform`] how to change the result coming from
178[link repeated `repeated`]`<int_token>`. We can summarise the numbers in a
179typelist by using Boost.MPL's `fold` or `accumulate`. Here is an example doing
180that:
181
182  using sum_op = mpl::lambda<mpl::plus<mpl::_1, mpl::_2>>::type;
183
184  using sum_parser =
185    transform<
186      repeated<int_token>,
187      mpl::lambda<
188        mpl::fold<mpl::_1, mpl::int_<0>, sum_op>
189      >::type
190    >;
191
192Here is an extended version of the above diagram showing what happens here:
193
194[$images/metaparse/repeated_diag3.png [width 70%]]
195
196This example parses the input, builds the list of numbers and then loops over it
197and summarises the values. It starts with the second argument of `fold`,
198`int_<0>` and adds every item of the list of numbers (which is the result of
199the parser [link repeated `repeated`]`<int_token>`) one by one.
200
201[note
202  Note that [link transform `transform`] wraps another parser,
203  [link repeated `repeated`]`<int_token>` here. It parses the input with that
204  parser, gets the result of that parsing and changes that result.
205  [link transform `transform`] itself will be a parser returning that updated
206  result.
207]
208
209[#introducing-foldl]
210[section Introducing foldl]
211
212It works, however, this is rather inefficient: it has a loop parsing the
213integers one by one, building a typelist and then it loops over this typelist to
214summarise the result. Using template metaprograms in your applications can have
215a serious impact on the compiler's memory usage and the speed of the
216compilation, therefore I recommend being careful with these things.
217
218Metaparse offers more efficient ways of achieving the same result. You don't
219need two loops: you can merge them together and add every number to your summary
220right after parsing it. Metaparse offers the [link foldl `foldl`] for this.
221
222With [link foldl `foldl`] you specify:
223
224* the parser to parse the individual elements of the list
225  (which is `int_token` in our example)
226* the initial value used for folding (which is `int_<0>` in our example)
227* the forward operation merging the sub-result we have so far and the value
228  coming from the last application of the parser (this was `sum_op` in our
229  example)
230
231Our parser can be implemented this way:
232
233  using better_sum_parser = foldl<int_token, mpl::int_<0>, sum_op>;
234
235As you can see the implementation of the parser is more compact.
236Here is a diagram showing what happens when you use this parser to parse some
237input:
238
239[$images/metaparse/foldl_diag1.png [width 70%]]
240
241As you can see, not only the implementation of the parser is more compact, but
242it achieves the same result by doing less as well. It parses the input by
243applying `int_token` repeatedly, just like the previous solution. But it
244produces the final result without building a typelist as an internal step. Here
245is how it works internally:
246
247[$images/metaparse/foldl_diag2.png [width 70%]]
248
249It summarises the results of the repeated `int_token` application using
250`sum_op`. This implementation is more efficient. It accepts an empty string as a
251valid input: the sum of it is `0`. It may be good for you, in which case you are
252done. If you don't wan to accept it, you can use [link foldl1 `foldl1`] instead
253of [link foldl `foldl`]. This is the same, but it rejects empty input.
254(Metaparse offers [link repeated1 `repeated1`] as well if you choose the first
255approach and would like to reject empty string)
256
257[endsect]
258
259[#introducing-foldr]
260[section Introducing foldr]
261
262[note
263  Note that if you are reading this manual for the first time, you probably want
264  to skip this section and proceed with
265  [link introducing-foldl_start_with_parser Introducing foldl_start_with_parser]
266]
267
268You might have noticed that Metaparse offers [link foldr `foldr`] as well. The
269difference between [link foldl `foldl`] and [link foldr `foldr`] is the
270direction in which the results are summarised. (`l` stands for ['from the Left]
271and `r` stands for ['from the Right]) Here is a diagram showing how
272`better_sum_parser` works if it is implemented using [link foldr `foldr`]:
273
274[$images/metaparse/foldr_diag1.png [width 70%]]
275
276As you can see this is very similar to using [link foldl `foldl`], but the
277results coming out of the individual applications of `int_token` are summarised
278in a right-to-left order. As `sum_op` is addition, it does not affect the end
279result, but in other cases it might.
280
281[note
282  Note that the implementation of [link foldl `foldl`] is more efficient than
283  [link foldr `foldr`]. Prefer [link foldl `foldl`] whenever possible.
284]
285
286As you might expect it, Metaparse offers [link foldr1 `foldr1`] as well, which
287folds from the right and rejects empty input.
288
289[endsect]
290
291[#introducing-foldl_start_with_parser]
292[section Introducing foldl_start_with_parser]
293
294Let's change the grammar of our little language. Instead of a list of numbers,
295let's expect numbers separated by a `+` symbol. Our example input becomes the
296following:
297
298  BOOST_METAPARSE_STRING("11 + 13 + 3 + 21")
299
300Parsing it with [link foldl `foldl`] or [link repeated `repeated`] is difficult:
301there has to be a `+` symbol before every element ['except] the first one. None
302of the already introduced repetition constructs offer a way of treating the
303first element in a different way.
304
305If we forget about the first number for a moment, the rest of the input is
306`"+ 13 + 3 + 21"`. This can easily be parsed by [link foldl `foldl`] (or
307[link repeated `repeated`]):
308
309  using plus_token = token<lit_c<'+'>>;
310  using plus_int = last_of<plus_token, int_token>;
311
312  using sum_parser2 = foldl<plus_int, int_<0>, sum_op>;
313
314It uses `plus_int`, that is [link last_of `last_of`]`<plus_token, int_token>`
315as the parser that is used repeatedly to get the numbers. It does the following:
316
317* Uses `plus_token` to parse the `+` symbol and any whitespace that might follow
318  it.
319* Uses then `int_token` to parse the number
320* Combines the above two with [link last_of `last_of`] to use both parsers in
321  order and keep only the result of using the second one (the result of parsing
322  the `+` symbol is thrown away - we don't care about it).
323
324This way [link last_of `last_of`]`<plus_token, int_token>` returns the value of
325the number as the result of parsing, just like our previous parser, `int_token`
326did. Because of this, it can be used as a drop-in replacement of `int_token` in
327the previous example and we get a parser for our updated language. Or at least
328for all number except the first one.
329
330This [link foldl `foldl`] can not parse the first element, because it expects a
331`+` symbol before every number. You might think of making the `+` symbol
332optional in the above approach - don't do that. It makes the parser accept
333`"11 + 13 3 21"` as well as the `+` symbol is now optional ['everywhere].
334
335What you could do is parsing the first element with `int_token`, the rest of
336the elements with the above [link foldl `foldl`]-based solution and add the
337result of the two. This is left as an exercise to the reader.
338
339Metaparse offers [link foldl_start_with_parser `foldl_start_with_parser`] to
340implement this. [link foldl_start_with_parser `foldl_start_with_parser`] is the
341same as [link foldl `foldl`]. The difference is that instead of an initial value
342to combine the list elements with it takes an ['initial parser]:
343
344  using plus_token = token<lit_c<'+'>>;
345  using plus_int = last_of<plus_token, int_token>;
346
347  using sum_parser3 = foldl_start_with_parser<plus_int, int_token, sum_op>;
348
349[link foldl_start_with_parser `foldl_start_with_parser`] starts with applying
350that initial parser and uses the result it returns as the initial value for
351folding. It does the same as [link foldl `foldl`] after that. The following
352diagram shows how it can be used to parse a list of numbers separated by `+`
353symbols:
354
355[$images/metaparse/foldl_start_with_parser_diag1.png [width 70%]]
356
357As the diagram shows, it start parsing the list of numbers with `int_token`,
358uses its value as the starting value for folding (earlier approaches were using
359the value `int_<0>` as this starting value). Then it parses all elements of the
360list by using `plus_int` multiple times.
361
362[endsect]
363
364[#introducing-foldr_start_with_parser]
365[section Introducing foldr_start_with_parser]
366
367[note
368  Note that if you are reading this manual for the first time, you probably want
369  to skip this section and try creating some parsers using
370  [link foldl_start_with_parser `foldl_start_with_parser`] instead.
371]
372
373[@foldl_start_with_parser.hpp `foldl_start_with_parser`] has its
374['from the right] pair,
375[link foldr_start_with_parser `foldr_start_with_parser`]. It uses the same
376elements as [link foldl_start_with_parser `foldl_start_with_parser`] but in a
377different order. Here is a parser for our example language implemented with
378[link foldr_start_with_parser `foldr_start_with_parser`]:
379
380  using plus_token = token<lit_c<'+'>>;
381  using int_plus = first_of<int_token, plus_token>;
382
383  using sum_parser4 = foldr_start_with_parser<int_plus, int_token, sum_op>;
384
385Note that it uses `int_plus` instead of `plus_int`. This is because the parser
386the initial value for folding comes from is used after `int_plus` has parsed the
387input as many times as it could. It might sound strange for the first time, but
388the following diagram should help you understand how it works:
389
390[$images/metaparse/foldr_start_with_parser_diag1.png [width 70%]]
391
392As you can see, it starts with the parser that is applied repeatedly on the
393input, thus instead of parsing `plus_token int_token` repeatedly, we need to
394parse `int_token plus_token` repeatedly. The last number is not followed by `+`,
395thus `int_plus` fails to parse it and it stops the iteration.
396[link foldr_start_with_parser `foldr_start_with_parser`] then uses the other
397parser, `int_token` to parse the input. It succeeds and the result it returns is
398used as the starting value for folding from the right.
399
400[note
401  Note that as the above description also suggests, the implementation of
402  [link foldl_start_with_parser `foldl_start_with_parser`] is more efficient
403  than [link foldr_start_with_parser `foldr_start_with_parser`]. Prefer
404  [link foldl_start_with_parser `foldl_start_with_parser`] whenever possible.
405]
406
407[endsect]
408
409[#introducing-foldl_reject_incomplete_start_with_parser]
410[section Introducing foldl_reject_incomplete_start_with_parser]
411
412Using a parser built with
413[link foldl_start_with_parser `foldl_start_with_parser`] we can parse the input
414when the input is correct. However, it is not always the case. Consider the
415following input for example:
416
417  BOOST_METAPARSE_STRING("11 + 13 + 3 + 21 +")
418
419This is an invalid expression. However, if we parse it using the
420[link foldl_start_with_parser `foldl_start_with_parser`]-based parser presented
421earlier (`sum_parser3`), it accepts the input and the result is `48`. This is
422because [link foldl_start_with_parser `foldl_start_with_parser`] parses the
423input ['as long as it can]. It parses the first`int_token` (`11`) and then it
424starts parsing the `plus_int` elements (`+ 13`, `+ 3`, `+ 21`). After parsing
425all of these, it tries to parse the remaining `" +"` input using `plus_int`
426which fails and therefore
427[link foldl_start_with_parser `foldl_start_with_parser`] stops after `+ 21`.
428
429The problem is that the parser parses the longest sub-expression starting from
430the beginning, that represents a valid expression. The rest is ignored. The
431parser can be wrapped by [link entire_input `entire_input`] to make sure to
432reject expressions with invalid extra characters at the end, however, that
433won't make the error message useful. ([link entire_input `entire_input`] can
434only tell the author of the invalid expression that after `+ 21` is something
435wrong).
436
437Metaparse provides
438[link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`],
439which does the same as [link foldl_start_with_parser `foldl_start_with_parser`],
440except that once no further repetitions are found, it checks ['where] the
441repeated parser (in our example `plus_int`) fails. When it can make any progress
442(eg. it finds a `+` symbol), then
443[link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`]
444assumes, that the expression's author intended to make the repetition longer,
445but made a mistake and propagates the error message coming from that last broken
446expression.
447
448[$images/metaparse/foldl_reject_incomplete_start_with_parser_diag1.png [width 70%]]
449
450The above diagram shows how
451[link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`]
452parses the example invalid input and how it fails. This can be used for better
453error reporting from the parsers.
454
455Other folding parsers also have their `f` version. (eg.
456[link foldr_reject_incomplete `foldr_reject_incomplete`],
457[link foldl_reject_incomplete1 `foldl_reject_incomplete1`], etc).
458
459[endsect]
460
461[#finding-the-right-folding-parser-combinator]
462[section Finding the right folding parser combinator]
463
464As you might have noticed, there are a lot of different folding parser
465combinators. To help you find the right one, the following naming convention is
466used:
467
468[$images/metaparse/folds.png [width 70%]]
469
470[note
471  Note that there is no `foldr_reject_incomplete_start_with_parser`. The `p`
472  version of the right-folding parsers applies the special parser, whose result
473  is the initial value, after the repeated elements. Therefore, when the parser
474  parsing one repeated element fails, `foldr_start_with_parser` would apply that
475  special final parser instead of checking how the repeated element's parser
476  failed.
477]
478
479[endsect]
480
481[endsect]
482
483[#result_types]
484[section What can be built from a compile-time string?]
485
486Parsers built using Metaparse are template metaprograms parsing text (or code)
487at compile-time. Here is a list of things that can be the "result" of parsing:
488
489* A ['type]. An example for this is a parser parsing a `printf` format string
490  and returning the typelist (eg. `boost::mpl::vector`) of the expected
491  arguments.
492* A ['constant value]. An example for this is the result of a calculator
493  language. See the [link getting_started Getting Started] section for further
494  details.
495* A ['runtime object]. A static runtime object can be generated that might be
496  used at runtime. An example for this is parsing regular expressions at
497  compile-time and building `boost::xpressive::sregex` objects. See the
498  `regex` example of Metaparse for an example.
499* A C++ ['function], which might be called at runtime. A C++ function can be
500  generated that can be called at runtime. It is good for generating native
501  (and optimised) code from EDSLs. See the `compile_to_native_code` example of
502  Metaparse as an example for this.
503* A [link metafunction_class ['template metafunction class]]. The result of
504  parsing might be a type, which is a
505  [link metafunction_class template metafunction class]. This is good for
506  building an EDSL for template metaprogramming. See the `meta_hs` example of
507  Metaparse as an example for this.
508
509[endsect]
510
511[section Grammars]
512
513Metaparse provides a way to define grammars in a syntax that resembles EBNF. The
514[link grammar `grammar`] template can be used to define a grammar. It can be
515used the following way:
516
517  grammar<BOOST_METAPARSE_STRING("plus_exp")>
518    ::import<BOOST_METAPARSE_STRING("int_token"), token<int_>>::type
519
520    ::rule<BOOST_METAPARSE_STRING("ws ::= (' ' | '\n' | '\r' | '\t')*")>::type
521    ::rule<BOOST_METAPARSE_STRING("plus_token ::= '+' ws"), front<_1>>::type
522    ::rule<BOOST_METAPARSE_STRING("plus_exp ::= int_token (plus_token int_token)*"), plus_action>::type
523
524The code above defines a parser from a grammar definition. The start symbol of
525the grammar is `plus_exp`. The lines beginning with `::rule` define rules.
526Rules optionally have a semantic action, which is a metafunction class that
527transforms the result of parsing after the rule has been applied.
528Existing parsers can be bound to names and be used in the rules by importing
529them. Lines beginning with `::import` bind existing parsers to names.
530
531The result of a grammar definition is a parser which can be given to other
532parser combinators or be used directly. Given that grammars can import existing
533parsers and build new ones, they are parser combinators as well.
534
535[endsect]
536
537[endsect]
538
539[section Parsing based on `constexpr`]
540
541Metaparse is based on template metaprogramming, however, C++11 provides
542`constexpr`, which can be used for parsing at compile-time as well. While
543implementing parsers based on `constexpr` is easier for a C++ developer, since
544its syntax resembles the regular syntax of the language, the result of parsing
545has to be a `constexpr` value. Parsers based on template metaprogramming can
546build types as the result of parsing. These types may be boxed `constexpr`
547values but can be metafunction classes, classes with static functions which can
548be called at runtime, etc.
549
550When a parser built with Metaparse needs a sub-parser for processing a part of
551the input text and generating a `constexpr` value as the result of parsing, one
552can implement the sub-parser based on `constexpr` functions. Metaparse
553can be integrated with them and lift their results into C++ template
554metaprogramming. An example demonstrating this feature can be found among the
555examples (`constexpr_parser`). This capability makes it possible to integrate
556Metaparse with parsing libraries based on `constexpr`.
557
558[endsect]
559
560[section What types of grammars can be used?]
561
562It is possible to write parsers for ['context free grammars] using Metaparse.
563However, this is not the most general category of grammars that can be used. As
564Metaparse is a highly extendable framework, it is not clear what should be
565considered to be the limit of Metaparse itself. For example Metaparse provides
566the [link accept_when `accept_when`] [link parser_combinator parser combinator].
567It can be used to provide arbitrary predicates for enabled/disabling a specific
568rule. One can go as far as providing the Turing machine (as a
569[link metafunction metafunction]) of the entire grammar as a predicate, so one
570can build parsers for ['unrestricted grammars] that can be parsed using a Turing
571machine. Note that such a parser would not be considered to be a parser built
572with Metaparse, however, it is not clear how far a solution might go and still
573be considered using Metaparse.
574
575Metaparse assumes that the parsers are ['deterministic], as they have only "one"
576result. It is of course possible to write parsers and combinators that return a
577set (or list or some other container) of results as that "one" result, but that
578can be considered building a new parser library. There is no clear boundary for
579Metaparse.
580
581Metaparse supports building ['top-down parsers] and ['left-recursion] is not
582supported as it would lead to infinite recursion. ['Right-recursion] is
583supported, however, in most cases the
584[link repetition iterative parser combinators] provide better alternatives.
585
586[endsect]
587
588[endsect]
589
590