[/==============================================================================
    Copyright (C) 2001-2015 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:roman Roman Numerals]

This example demonstrates:

* The symbol table
* Non-terminal rules

[heading Symbol Table]

The symbol table holds a dictionary of symbols where each symbol is a sequence
of characters. The template class can work efficiently with 8, 16, 32 and even
64 bit characters. Mutable data of type T are associated with each symbol.

Traditionally, symbol table management is maintained separately outside the BNF
grammar through semantic actions. Contrary to standard practice, the Spirit
symbol table class `symbols` is a parser. A `symbols` object may be used
anywhere in the EBNF grammar specification. It is an example of a dynamic
parser. A dynamic parser is characterized by its ability to modify its behavior
at run time. Initially, an empty `symbols` object matches nothing. At any time,
symbols may be added or removed, thus dynamically altering its behavior.

Each entry in a symbol table may have an associated mutable data slot. In this
regard, one can view the symbol table as an associative container (or map) of
key-value pairs where the keys are strings.

The `symbols` class expects one template parameter to specify the data type
associated with each symbol: its attribute. There are several namespaces in X3
where you can find versions of the `symbols` class for handling different
character encodings, including `ascii`, `standard`, `standard_wide`,
`iso8859_1`, and `unicode`. The default symbol parser type in the main `x3`
namespace is `standard`.


Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
mind that the data associated with each slot is the parser's attribute (which is
passed to attached semantic actions).

    struct hundreds_ : x3::symbols<unsigned>
    {
        hundreds_()
        {
            add
                ("C"    , 100)
                ("CC"   , 200)
                ("CCC"  , 300)
                ("CD"   , 400)
                ("D"    , 500)
                ("DC"   , 600)
                ("DCC"  , 700)
                ("DCCC" , 800)
                ("CM"   , 900)
            ;
        }

    } hundreds;

Here's a parser for roman tens (10..90):

    struct tens_ : x3::symbols<unsigned>
    {
        tens_()
        {
            add
                ("X"    , 10)
                ("XX"   , 20)
                ("XXX"  , 30)
                ("XL"   , 40)
                ("L"    , 50)
                ("LX"   , 60)
                ("LXX"  , 70)
                ("LXXX" , 80)
                ("XC"   , 90)
            ;
        }

    } tens;

And, finally, a parser for ones (1..9):

    struct ones_ : x3::symbols<unsigned>
    {
        ones_()
        {
            add
                ("I"    , 1)
                ("II"   , 2)
                ("III"  , 3)
                ("IV"   , 4)
                ("V"    , 5)
                ("VI"   , 6)
                ("VII"  , 7)
                ("VIII" , 8)
                ("IX"   , 9)
            ;
        }

    } ones;

Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
They are all parsers.

[heading Rules]

Up until now, we've been inlining our parser expressions, passing them directly
to the `phrase_parse` function. The expression evaluates into a temporary,
unnamed parser which is passed into the `phrase_parse` function, used, and then
destroyed. This is fine for small parsers. When the expressions get complicated,
you'll want to break them into smaller, easier-to-understand pieces,
name them, and refer to them from other parser expressions by name.

A parser expression can be assigned to what is called a "rule". There are
various ways to declare rules. The simplest form is:

    rule<ID> const r = "some-name";

[heading Rule ID]

At the very least, the rule needs an identification tag. This ID can be any
struct or class type and need not be defined. A forward declaration suffices.
In subsequent tutorials, we will see that the rule ID can provide additional
functionality for error handling and annotation.

[heading Rule Name]

The name is optional, but is useful for debugging and error handling, as
we'll see later. Notice that rule `r` is declared `const`. Rules are
immutable and are best declared as `const`. Rules are lightweight and can be
passed around by value. A rule's only member variable is a `std::string`: its
name.

[note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and
`parse` without having to specify the skip parser.]

[heading Rule Attributes]

For our next example, there's one more rule form you should know about:

    rule<ID, Attribute> const r = "some-name";

The `Attribute` parameter specifies the attribute type of the rule. You've seen
that our parsers can have an attribute. Recall that the `double_` parser has
an attribute of `double`. To be precise, these are /synthesized/ attributes.
The parser "synthesizes" the attribute value. If you think of a parser as a
function, its synthesized attribute is the function's return value.

[heading Rule Definition]

After having declared a rule, you need a definition for the rule. Example:

    auto const r_def = double_ >> *(',' >> double_);

By convention, rule definitions have a `_def` suffix. Like rules, rule
definitions are immutable and are best declared as `const`.

[#__tutorial_spirit_define__]
[heading BOOST_SPIRIT_DEFINE]

Now that we have a rule and its definition, we tie the rule to its
definition using the `BOOST_SPIRIT_DEFINE` macro:

    BOOST_SPIRIT_DEFINE(r);

Behind the scenes, what's actually happening is that we are defining a `parse_rule`
function in the client namespace that tells X3 how to invoke the rule.
For example,
given a rule named `my_rule` and a corresponding definition named `my_rule_def`,
`BOOST_SPIRIT_DEFINE(my_rule)` expands to this code:

    template <typename Iterator, typename Context>
    inline bool parse_rule(
        decltype(my_rule)
      , Iterator& first, Iterator const& last
      , Context const& context, decltype(my_rule)::attribute_type& attr)
    {
        using boost::spirit::x3::unused;
        static auto const def_ = my_rule_def;
        return def_.parse(first, last, context, unused, attr);
    }

And so for each rule defined using `BOOST_SPIRIT_DEFINE`, there is an
overloaded `parse_rule` function. At parse time, Spirit X3 recursively calls
the appropriate `parse_rule` function.

[note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules.
Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`]

[heading Grammars]

Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete
entity for encapsulating rules. In X3, a grammar is simply a logical group of
rules that work together, typically with a single top-level start rule which
serves as the main entry point. X3 grammars are grouped using namespaces.
The roman numeral grammar is a nice, simple example of a grammar:

    namespace parser
    {
        using x3::eps;
        using x3::lit;
        using x3::_val;
        using x3::_attr;
        using ascii::char_;

        // Semantic actions. They capture nothing: a lambda at namespace
        // scope may not have a capture-default such as [&].
        auto set_zero = [](auto& ctx){ _val(ctx) = 0; };
        auto add1000 = [](auto& ctx){ _val(ctx) += 1000; };
        auto add = [](auto& ctx){ _val(ctx) += _attr(ctx); };

        x3::rule<class roman, unsigned> const roman = "roman";

        auto const roman_def =
            eps                 [set_zero]
            >>
            (
                -(+lit('M')     [add1000])
                >>  -hundreds   [add]
                >>  -tens       [add]
                >>  -ones       [add]
            )
        ;

        BOOST_SPIRIT_DEFINE(roman);
    }

Things to take notice of:

* The start rule's attribute is `unsigned`.

* `_val(ctx)` gets a reference to the rule's synthesized attribute.

* `_attr(ctx)` gets a reference to the parser's synthesized attribute.

* `eps` is a special Spirit parser that consumes no input but is always
  successful. We use it to initialize the rule's synthesized attribute to
  zero before anything else. The actual parser starts at `+lit('M')`,
  parsing roman thousands. Using `eps` this way is good for doing pre- and
  post-initialization.

* The rule `roman` and the definition `roman_def` are const objects.

* The rule's ID is `class roman`. C++ allows you to declare the class
  directly in the template argument list, as you can see in the example:

        x3::rule<class roman, unsigned> const roman = "roman";

[heading Let's Parse!]

    bool r = parse(iter, end, roman, result);

    if (r && iter == end)
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "result = " << result << std::endl;
        std::cout << "-------------------------\n";
    }
    else
    {
        std::string rest(iter, end);
        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << rest << "\"\n";
        std::cout << "-------------------------\n";
    }

`roman` is our roman numeral parser. This time around we are using the
no-skipping version of the parse functions. We do not want to skip any spaces!
We are also passing in an attribute, `unsigned result`, which will receive the
parsed value.

The full cpp file for this example can be found here:
[@../../../example/x3/roman.cpp roman.cpp]

[endsect]