[/==============================================================================
    Copyright (C) 2001-2015 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:roman Roman Numerals]

This example demonstrates:

* The Symbol Table
* Non-terminal rules

[heading Symbol Table]

The symbol table holds a dictionary of symbols where each symbol is a sequence
of characters. The template class can work efficiently with 8-, 16-, 32- and
even 64-bit characters. Mutable data of type T is associated with each symbol.

Traditionally, symbol table management is maintained separately outside the BNF
grammar through semantic actions. Contrary to standard practice, the Spirit
symbol table class `symbols` is a parser, an object of which may be used
anywhere in the EBNF grammar specification. It is an example of a dynamic
parser. A dynamic parser is characterized by its ability to modify its behavior
at run time. Initially, an empty symbols object matches nothing. At any time,
symbols may be added or removed, thus dynamically altering its behavior.

Each entry in a symbol table may have an associated mutable data slot. In this
regard, one can view the symbol table as an associative container (or map) of
key-value pairs where the keys are strings.

The symbols class expects one template parameter to specify the data type
associated with each symbol: its attribute. X3 provides several namespaces
with versions of the symbols class for handling different character
encodings, including ascii, standard, standard_wide, iso8859_1, and unicode.
The default symbol parser type in the main x3 namespace is standard.

Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
mind that the data associated with each slot is the parser's attribute (which is
passed to attached semantic actions).

    struct hundreds_ : x3::symbols<unsigned>
    {
        hundreds_()
        {
            add
                ("C"    , 100)
                ("CC"   , 200)
                ("CCC"  , 300)
                ("CD"   , 400)
                ("D"    , 500)
                ("DC"   , 600)
                ("DCC"  , 700)
                ("DCCC" , 800)
                ("CM"   , 900)
            ;
        }

    } hundreds;

Here's a parser for roman tens (10..90):

    struct tens_ : x3::symbols<unsigned>
    {
        tens_()
        {
            add
                ("X"    , 10)
                ("XX"   , 20)
                ("XXX"  , 30)
                ("XL"   , 40)
                ("L"    , 50)
                ("LX"   , 60)
                ("LXX"  , 70)
                ("LXXX" , 80)
                ("XC"   , 90)
            ;
        }

    } tens;

and, finally, for ones (1..9):

    struct ones_ : x3::symbols<unsigned>
    {
        ones_()
        {
            add
                ("I"    , 1)
                ("II"   , 2)
                ("III"  , 3)
                ("IV"   , 4)
                ("V"    , 5)
                ("VI"   , 6)
                ("VII"  , 7)
                ("VIII" , 8)
                ("IX"   , 9)
            ;
        }

    } ones;

Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
They are all parsers.

[heading Rules]

Up until now, we've been inlining our parser expressions, passing them directly
to the `phrase_parse` function. The expression evaluates into a temporary,
unnamed parser which is passed into the `phrase_parse` function, used, and then
destroyed. This is fine for small parsers. When the expressions get complicated,
you'd want to break the expressions into smaller easier-to-understand pieces,
name them, and refer to them from other parser expressions by name.

A parser expression can be assigned to what is called a "rule". There are
various ways to declare rules. The simplest form is:

    rule<ID> const r = "some-name";

[heading Rule ID]

At the very least, the rule needs an identification tag. This ID can be any
struct or class type and need not be defined. Forward declaration would suffice.
In subsequent tutorials, we will see that the rule ID can have additional
functionalities for error handling and annotation.

[heading Rule Name]

The name is optional, but is useful for debugging and error handling, as
we'll see later. Notice that rule `r` is declared `const`. Rules are
immutable and are best declared as `const`. Rules are lightweight and can be
passed around by value. A rule's only member variable is a `std::string`: its
name.

[note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and
`parse` without having to specify the skip parser]

[heading Rule Attributes]

For our next example, there's one more rule form you should know about:

    rule<ID, Attribute> const r = "some-name";

The Attribute parameter specifies the attribute type of the rule. You've seen
that our parsers can have an attribute. Recall that the `double_` parser has
an attribute of `double`. To be precise, these are /synthesized/ attributes.
The parser "synthesizes" the attribute value. If you think of a parser as a
function, its synthesized attribute is the function's return value.

[heading Rule Definition]

After having declared a rule, you need a definition for the rule. Example:

    auto const r_def = double_ >> *(',' >> double_);

By convention, rule definitions have a _def suffix. Like rules, rule definitions
are immutable and are best declared as `const`.

[#__tutorial_spirit_define__]
[heading BOOST_SPIRIT_DEFINE]

Now that we have a rule and its definition, we tie the rule with a rule
definition using the `BOOST_SPIRIT_DEFINE` macro:

    BOOST_SPIRIT_DEFINE(r);

Behind the scenes, what's actually happening is that we are defining a `parse_rule`
function in the client namespace that tells X3 how to invoke the rule. For example,
given a rule named `my_rule` and a corresponding definition named `my_rule_def`,
`BOOST_SPIRIT_DEFINE(my_rule)` expands to this code:

    template <typename Iterator, typename Context>
    inline bool parse_rule(
        decltype(my_rule)
      , Iterator& first, Iterator const& last
      , Context const& context, decltype(my_rule)::attribute_type& attr)
    {
        using boost::spirit::x3::unused;
        static auto const def_ = my_rule_def;
        return def_.parse(first, last, context, unused, attr);
    }

And so for each rule defined using `BOOST_SPIRIT_DEFINE`, there is an
overloaded `parse_rule` function. At parse time, Spirit X3 recursively calls
the appropriate `parse_rule` function.
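
The overload selection can be pictured with a toy model that does not use X3 at
all. Everything in this sketch (`toy_rule`, `toy_parse_rule`, `matches`) is
invented for illustration and is far simpler than the library's real machinery.
It only shows how a distinct ID type per rule lets ordinary overload resolution
pick the right function.

```cpp
// Stand-in for x3::rule: the ID tag makes every rule a distinct type.
template <typename ID>
struct toy_rule
{
    char const* name;
};

toy_rule<struct digit_id> const digit{"digit"};
toy_rule<struct letter_id> const letter{"letter"};

// One overload per rule, selected purely by the rule's type.
inline bool toy_parse_rule(decltype(digit), char c)
{
    return c >= '0' && c <= '9';
}

inline bool toy_parse_rule(decltype(letter), char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Generic caller: it knows nothing about individual rules and relies on
// overload resolution to find the matching toy_parse_rule.
template <typename Rule>
bool matches(Rule const& r, char c)
{
    return toy_parse_rule(r, c);
}
```

Since the two rules have different ID tags, `matches(digit, '7')` and
`matches(letter, 'x')` resolve to different overloads at compile time.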

[note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules.
Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`]

[heading Grammars]

Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete
entity for encapsulating rules. In X3, a grammar is simply a logical group of
rules that work together, typically with a single top-level start rule which
serves as the main entry point. X3 grammars are grouped using namespaces.
The roman numeral grammar is a very nice and simple example of a grammar:

    namespace parser
    {
        using x3::eps;
        using x3::lit;
        using x3::_val;
        using x3::_attr;
        using ascii::char_;

        auto set_zero = [](auto& ctx){ _val(ctx) = 0; };
        auto add1000 = [](auto& ctx){ _val(ctx) += 1000; };
        auto add = [](auto& ctx){ _val(ctx) += _attr(ctx); };

        x3::rule<class roman, unsigned> const roman = "roman";

        auto const roman_def =
            eps                 [set_zero]
            >>
            (
                -(+lit('M')     [add1000])
                >>  -hundreds   [add]
                >>  -tens       [add]
                >>  -ones       [add]
            )
        ;

        BOOST_SPIRIT_DEFINE(roman);
    }

Things to take notice of:

* The start rule's attribute is `unsigned`.

* `_val(ctx)` gets a reference to the rule's synthesized attribute.

* `_attr(ctx)` gets a reference to the parser's synthesized attribute.

* `eps` is a special Spirit parser that consumes no input but is always
  successful. We use it to initialize the rule's synthesized attribute to
  zero before anything else. The actual parser starts at `+lit('M')`,
  parsing roman thousands. Using `eps` this way is good for doing pre- and
  post-initialization.

* The rule `roman` and the definition `roman_def` are const objects.

* The rule's ID is `class roman`. C++ allows you to declare the class
  right in the template argument list, as you can see in the example:

    x3::rule<class roman, unsigned> const roman = "roman";

[heading Let's Parse!]

    bool r = parse(iter, end, roman, result);

    if (r && iter == end)
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "result = " << result << std::endl;
        std::cout << "-------------------------\n";
    }
    else
    {
        std::string rest(iter, end);
        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \": " << rest << "\"\n";
        std::cout << "-------------------------\n";
    }

`roman` is our roman numeral parser. This time around we are using the
no-skipping version of the parse functions. We do not want to skip any spaces!
We are also passing in an attribute, `unsigned result`, which will receive the
parsed value.

The full cpp file for this example can be found here:
[@../../../example/x3/roman.cpp roman.cpp]

[endsect]
