[/==============================================================================
    Copyright (C) 2001-2015 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:roman Roman Numerals]

This example demonstrates:

* The Symbol Table
* Non-terminal rules

[heading Symbol Table]

The symbol table holds a dictionary of symbols where each symbol is a sequence
of characters. The template class can work efficiently with 8-, 16-, 32-, and
even 64-bit characters. Mutable data of type `T` is associated with each symbol.

Traditionally, symbol tables are managed separately from the BNF grammar,
through semantic actions. Contrary to standard practice, the Spirit symbol table
class `symbols` is a parser: an object of it may be used anywhere in the EBNF
grammar specification. It is an example of a dynamic parser. A dynamic parser is
characterized by its ability to modify its behavior at run time. Initially, an
empty `symbols` object matches nothing. At any time, symbols may be added or
removed, thus dynamically altering its behavior.

Each entry in a symbol table may have an associated mutable data slot. In this
regard, one can view the symbol table as an associative container (or map) of
key-value pairs where the keys are strings.

The `symbols` class expects one template parameter to specify the data type
associated with each symbol: its attribute. X3 provides several namespaces with
versions of the `symbols` class for different character encodings: `ascii`,
`standard`, `standard_wide`, `iso8859_1`, and `unicode`. The default `symbols`
type in the main `x3` namespace is the `standard` one.

Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
mind that the data associated with each slot is the parser's attribute (which is
passed to attached semantic actions).

    struct hundreds_ : x3::symbols<unsigned>
    {
        hundreds_()
        {
            add
                ("C"    , 100)
                ("CC"   , 200)
                ("CCC"  , 300)
                ("CD"   , 400)
                ("D"    , 500)
                ("DC"   , 600)
                ("DCC"  , 700)
                ("DCCC" , 800)
                ("CM"   , 900)
            ;
        }

    } hundreds;

Here's a parser for roman tens (10..90):

    struct tens_ : x3::symbols<unsigned>
    {
        tens_()
        {
            add
                ("X"    , 10)
                ("XX"   , 20)
                ("XXX"  , 30)
                ("XL"   , 40)
                ("L"    , 50)
                ("LX"   , 60)
                ("LXX"  , 70)
                ("LXXX" , 80)
                ("XC"   , 90)
            ;
        }

    } tens;

And finally, here's one for ones (1..9):

    struct ones_ : x3::symbols<unsigned>
    {
        ones_()
        {
            add
                ("I"    , 1)
                ("II"   , 2)
                ("III"  , 3)
                ("IV"   , 4)
                ("V"    , 5)
                ("VI"   , 6)
                ("VII"  , 7)
                ("VIII" , 8)
                ("IX"   , 9)
            ;
        }

    } ones;

Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
They are all parsers.

[heading Rules]

Up until now, we've been inlining our parser expressions, passing them directly
to the `phrase_parse` function. The expression evaluates into a temporary,
unnamed parser which is passed into the `phrase_parse` function, used, and then
destroyed. This is fine for small parsers. When the expressions get complicated,
you'll want to break them into smaller, easier-to-understand pieces, name them,
and refer to them from other parser expressions by name.

A parser expression can be assigned to what is called a "rule". There are
various ways to declare rules. The simplest form is:

    rule<ID> const r = "some-name";

[heading Rule ID]

At the very least, the rule needs an identification tag. This ID can be any
struct or class type, and it need not be defined: a forward declaration
suffices. In subsequent tutorials, we will see that the rule ID can provide
additional functionality for error handling and annotation.

[heading Rule Name]

The name is optional, but it is useful for debugging and error handling, as
we'll see later. Notice that rule `r` is declared `const`. Rules are immutable
and are best declared as `const`. Rules are lightweight and can be passed around
by value. A rule's only data member is its name.

[note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and
`parse` without having to specify the skip parser type in the rule's declaration]

[heading Rule Attributes]

For our next example, there's one more rule form you should know about:

    rule<ID, Attribute> const r = "some-name";

The `Attribute` parameter specifies the attribute type of the rule. You've seen
that our parsers can have an attribute. Recall that the `double_` parser has
an attribute of `double`. To be precise, these are /synthesized/ attributes:
the parser "synthesizes" the attribute value. If you think of a parser as a
function, its synthesized attribute is the function's return value.

[heading Rule Definition]

After having declared a rule, you need a definition for the rule. Example:

    auto const r_def = double_ >> *(',' >> double_);

By convention, rule definitions have a `_def` suffix. `BOOST_SPIRIT_DEFINE`
relies on this convention: for a rule `r`, it expects a definition named
`r_def`. Like rules, rule definitions are immutable and are best declared as
`const`.

[#__tutorial_spirit_define__]
[heading BOOST_SPIRIT_DEFINE]

Now that we have a rule and its definition, we tie the rule to its definition
using the `BOOST_SPIRIT_DEFINE` macro:

    BOOST_SPIRIT_DEFINE(r);

Behind the scenes, the macro defines a `parse_rule` function in the namespace
where it is invoked (here, the client namespace). That function tells X3 how
to invoke the rule. So for each rule defined using `BOOST_SPIRIT_DEFINE`, there
is an overloaded `parse_rule` function. At parse time, Spirit X3 recursively
calls the appropriate `parse_rule` function.

[note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules.
Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`]

[heading Grammars]

Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete
entity for encapsulating rules. In X3, a grammar is simply a logical group of
rules that work together, typically with a single top-level start rule which
serves as the main entry point. The rules of an X3 grammar are typically grouped
in a namespace. The roman numeral grammar is a simple example of a grammar:

    namespace parser
    {
        using x3::eps;
        using x3::lit;
        using x3::_val;
        using x3::_attr;
        using ascii::char_;

        // Non-capturing: a lambda at namespace scope may not have a capture-default.
        auto set_zero = [](auto& ctx){ _val(ctx) = 0; };
        auto add1000 = [](auto& ctx){ _val(ctx) += 1000; };
        auto add = [](auto& ctx){ _val(ctx) += _attr(ctx); };

        x3::rule<class roman, unsigned> const roman = "roman";

        auto const roman_def =
            eps                 [set_zero]
            >>
            (
                -(+lit('M')     [add1000])
                >>  -hundreds   [add]
                >>  -tens       [add]
                >>  -ones       [add]
            )
        ;

        BOOST_SPIRIT_DEFINE(roman);
    }

Things to take notice of:

* The start rule's attribute is `unsigned`.

* `_val(ctx)` gets a reference to the rule's synthesized attribute.

* `_attr(ctx)` gets a reference to the synthesized attribute of the parser the
  semantic action is attached to (for example, the value of a matched
  `hundreds` entry).

* `eps` is a special Spirit parser that consumes no input but is always
  successful. We use it to initialize the rule's synthesized attribute to
  zero before anything else. The actual parser starts at `+lit('M')`,
  parsing roman thousands. Using `eps` this way is good for pre- and
  post-initialization.

* The rule `roman` and the definition `roman_def` are const objects.

* The rule's ID is `class roman`. C++ allows you to declare the class
  in the template argument list itself, as you can see in the example:

    x3::rule<class roman, unsigned> const roman = "roman";

[heading Let's Parse!]

    bool r = parse(iter, end, roman, result);

    if (r && iter == end)
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "result = " << result << std::endl;
        std::cout << "-------------------------\n";
    }
    else
    {
        std::string rest(iter, end);
        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << rest << "\"\n";
        std::cout << "-------------------------\n";
    }

`roman` is our roman numeral parser. This time around we are using the
no-skipping version of the parse functions. We do not want to skip any spaces!
We are also passing in an attribute, `unsigned result`, which will receive the
parsed value.

The full cpp file for this example can be found here:
[@../../../example/x3/roman.cpp roman.cpp]

[endsect]