[/==============================================================================
    Copyright (C) 2001-2015 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:roman Roman Numerals]

This example demonstrates:

* The Symbol Table
* Non-terminal rules

[heading Symbol Table]

The symbol table holds a dictionary of symbols where each symbol is a sequence
of characters. The template class can work efficiently with 8, 16, 32 and even
64 bit characters. Mutable data of type T are associated with each symbol.

Traditionally, symbol table management is maintained separately outside the BNF
grammar through semantic actions. Contrary to standard practice, the Spirit
symbol table class `symbols` is a parser, an object of which may be used
anywhere in the EBNF grammar specification. It is an example of a dynamic
parser. A dynamic parser is characterized by its ability to modify its behavior
at run time. Initially, an empty symbols object matches nothing. At any time,
symbols may be added or removed, thus dynamically altering its behavior.

Each entry in a symbol table may have an associated mutable data slot. In this
regard, one can view the symbol table as an associative container (or map) of
key-value pairs where the keys are strings.

The symbols class expects one template parameter to specify the data type
associated with each symbol: its attribute. There are several
namespaces in X3 where you can find versions of the symbols class
for handling different character encodings, including ascii, standard,
standard_wide, iso8859_1, and unicode. The default symbol parser type in
the main x3 namespace is standard.

Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
mind that the data associated with each slot is the parser's attribute (which is
passed to attached semantic actions).

    struct hundreds_ : x3::symbols<unsigned>
    {
        hundreds_()
        {
            add
                ("C"    , 100)
                ("CC"   , 200)
                ("CCC"  , 300)
                ("CD"   , 400)
                ("D"    , 500)
                ("DC"   , 600)
                ("DCC"  , 700)
                ("DCCC" , 800)
                ("CM"   , 900)
            ;
        }

    } hundreds;

Here's a parser for roman tens (10..90):

    struct tens_ : x3::symbols<unsigned>
    {
        tens_()
        {
            add
                ("X"    , 10)
                ("XX"   , 20)
                ("XXX"  , 30)
                ("XL"   , 40)
                ("L"    , 50)
                ("LX"   , 60)
                ("LXX"  , 70)
                ("LXXX" , 80)
                ("XC"   , 90)
            ;
        }

    } tens;

and, finally, for ones (1..9):

    struct ones_ : x3::symbols<unsigned>
    {
        ones_()
        {
            add
                ("I"    , 1)
                ("II"   , 2)
                ("III"  , 3)
                ("IV"   , 4)
                ("V"    , 5)
                ("VI"   , 6)
                ("VII"  , 7)
                ("VIII" , 8)
                ("IX"   , 9)
            ;
        }

    } ones;

Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
They are all parsers.

[heading Rules]

Up until now, we've been inlining our parser expressions, passing them directly
to the `phrase_parse` function. The expression evaluates into a temporary,
unnamed parser which is passed into the `phrase_parse` function, used, and then
destroyed. This is fine for small parsers. When the expressions get complicated,
you will want to break them into smaller, easier-to-understand pieces,
name them, and refer to them from other parser expressions by name.

A parser expression can be assigned to what is called a "rule". There are
various ways to declare rules. The simplest form is:

    rule<ID> const r = "some-name";

[heading Rule ID]

At the very least, the rule needs an identification tag. This ID can be any
struct or class type and need not be defined. A forward declaration suffices.
In subsequent tutorials, we will see that the rule ID can have additional
functionality for error handling and annotation.

[heading Rule Name]

The name is optional, but is useful for debugging and error handling, as
we'll see later. Notice that rule `r` is declared `const`. Rules are
immutable and are best declared as `const`. Rules are lightweight and can be
passed around by value. A rule's only member variable is a `std::string`: its
name.

[note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and
`parse` without having to specify the skip parser]

[heading Rule Attributes]

For our next example, there's one more rule form you should know about:

    rule<ID, Attribute> const r = "some-name";

The Attribute parameter specifies the attribute type of the rule. You've seen
that our parsers can have an attribute. Recall that the `double_` parser has
an attribute of `double`. To be precise, these are /synthesized/ attributes.
The parser "synthesizes" the attribute value. If you think of a parser as a
function, its synthesized attribute is the function's return value.

[heading Rule Definition]

After having declared a rule, you need a definition for the rule. Example:

    auto const r_def = double_ >> *(',' >> double_);

By convention, rule definitions have a `_def` suffix. Like rules, rule definitions
are immutable and are best declared as `const`.

[#__tutorial_spirit_define__]
[heading BOOST_SPIRIT_DEFINE]

Now that we have a rule and its definition, we tie the rule to its
definition using the `BOOST_SPIRIT_DEFINE` macro:

    BOOST_SPIRIT_DEFINE(r);

Behind the scenes, what's actually happening is that we are defining a `parse_rule`
function in the client namespace that tells X3 how to invoke the rule.
And so for each rule defined using `BOOST_SPIRIT_DEFINE`, there is an
overloaded `parse_rule` function.
At parse time, Spirit X3 recursively calls
the appropriate `parse_rule` function.

[note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules.
Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`]

[heading Grammars]

Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete
entity for encapsulating rules. In X3, a grammar is simply a logical group of
rules that work together, typically with a single top-level start rule which
serves as the main entry point. X3 grammars are grouped using namespaces.
The roman numeral grammar is a very nice and simple example of a grammar:

    namespace parser
    {
        using x3::eps;
        using x3::lit;
        using x3::_val;
        using x3::_attr;
        using ascii::char_;

        auto set_zero = [](auto& ctx){ _val(ctx) = 0; };
        auto add1000 = [](auto& ctx){ _val(ctx) += 1000; };
        auto add = [](auto& ctx){ _val(ctx) += _attr(ctx); };

        x3::rule<class roman, unsigned> const roman = "roman";

        auto const roman_def =
            eps                 [set_zero]
            >>
            (
                -(+lit('M')     [add1000])
                >>  -hundreds   [add]
                >>  -tens       [add]
                >>  -ones       [add]
            )
        ;

        BOOST_SPIRIT_DEFINE(roman);
    }

Things to take notice of:

* The start rule's attribute is `unsigned`.

* `_val(ctx)` gets a reference to the rule's synthesized attribute.

* `_attr(ctx)` gets a reference to the parser's synthesized attribute.

* `eps` is a special Spirit parser that consumes no input but is always
  successful. We use it to initialize the rule's synthesized
  attribute to zero before anything else. The actual parser starts at
  `+lit('M')`, parsing roman thousands. Using `eps` this way is good
  for doing pre- and post-initialization.

* The rule `roman` and the definition `roman_def` are const objects.

* The rule's ID is `class roman`.
  C++ allows you to declare the class
  right in the template argument list, as you can see in the example:

      x3::rule<class roman, unsigned> const roman = "roman";

[heading Let's Parse!]

    bool r = parse(iter, end, roman, result);

    if (r && iter == end)
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "result = " << result << std::endl;
        std::cout << "-------------------------\n";
    }
    else
    {
        std::string rest(iter, end);
        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << rest << "\"\n";
        std::cout << "-------------------------\n";
    }

`roman` is our roman numeral parser. This time around we are using the
no-skipping version of the parse functions. We do not want to skip any spaces!
We are also passing in an attribute, `unsigned result`, which will receive the
parsed value.

The full cpp file for this example can be found here:
[@../../../example/x3/roman.cpp roman.cpp]

[endsect]