[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Roman Numerals]

This example demonstrates:

* symbol table
* rule
* grammar

[heading Symbol Table]

The symbol table holds a dictionary of symbols where each symbol is a sequence
of characters (a `char`, `wchar_t`, `int`, enumeration etc.). The template
class, parameterized by the character type, can work efficiently with 8, 16, 32
and even 64 bit characters. Mutable data of type `T` are associated with each
symbol.

Traditionally, symbol table management is maintained separately outside the BNF
grammar through semantic actions. Contrary to standard practice, the Spirit
symbol table class `symbols` is a parser, and an object of this class may be
used anywhere in the EBNF grammar specification. It is an example of a dynamic
parser. A dynamic parser is characterized by its ability to modify its behavior
at run time. Initially, an empty symbols object matches nothing. At any time,
symbols may be added or removed, thus dynamically altering its behavior.

Each entry in a symbol table has an associated mutable data slot. In this
regard, one can view the symbol table as an associative container (or map) of
key-value pairs where the keys are strings.

The `symbols` class expects two template parameters. The first parameter
specifies the character type of the symbols. The second specifies the data type
associated with each symbol: its attribute.

Here's a parser for roman hundreds (100..900) using the symbol table.
Keep in mind that the data associated with each slot is the parser's attribute
(which is passed to attached semantic actions).

[import ../../example/qi/roman.cpp]

[tutorial_roman_hundreds]

Here's a parser for roman tens (10..90):

[tutorial_roman_tens]

and, finally, for ones (1..9):

[tutorial_roman_ones]

Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
They are all parsers.

[heading Rules]

Up until now, we've been inlining our parser expressions, passing them directly
to the `phrase_parse` function. The expression evaluates into a temporary,
unnamed parser which is passed into the `phrase_parse` function, used, and then
destroyed. This is fine for small parsers. When the expressions get complicated,
you'd want to break the expressions into smaller easier-to-understand pieces,
name them, and refer to them from other parser expressions by name.

A parser expression can be assigned to what is called a "rule". There are
various ways to declare rules. The simplest form is:

    rule<Iterator> r;

At the very least, the rule needs to know the iterator type it will be working
on. This rule cannot be used with `phrase_parse`. It can only be used with the
`parse` function -- a version that does not do white space skipping (does not
have the skipper argument). If you want to have it skip white spaces, you need
to pass in the type of the skip parser, as in the next form:

    rule<Iterator, Skipper> r;

Example:

    rule<std::string::iterator, space_type> r;

This type of rule can be used for both `phrase_parse` and `parse`.

For our next example, there's one more rule form you should know about:

    rule<Iterator, Signature> r;

or

    rule<Iterator, Signature, Skipper> r;

[tip All rule template arguments after Iterator can be supplied in any order.]

The Signature specifies the attributes of the rule.
You've seen that our parsers can have an attribute. Recall that the `double_`
parser has an attribute of `double`. To be precise, these are /synthesized/
attributes. The parser "synthesizes" the attribute value. Think of them as
function return values.

There's another type of attribute called "inherited" attribute. We won't need
them for now, but it's good that you be aware of such attributes. You can think
of them as function arguments. And, rightly so, the rule signature is a function
signature of the form:

    result(arg1, arg2, ..., argN)

After having declared a rule, you can now assign any parser expression to it.
Example:

    r = double_ >> *(',' >> double_);

[heading Grammars]

A grammar encapsulates one or more rules. It has the same template parameters as
the rule. You declare a grammar by:

# deriving a struct (or class) from the `grammar` class template
# declaring one or more rules as member variables
# initializing the base grammar class by giving it the start rule (it's the
  first rule that gets called when the grammar starts parsing)
# initializing your rules in your constructor

The roman numeral grammar is a very nice and simple example of a grammar:

[tutorial_roman_grammar]

Things to take notice of:

* The grammar and start rule signature is `unsigned()`. It has a synthesized
  attribute (return value) of type `unsigned` with no inherited attributes
  (arguments).

* We did not specify a skip-parser. We don't want to skip in between the
  numerals.

* `roman::base_type` is a typedef for `grammar<Iterator, unsigned()>`. If
  `roman` were not a template, you could simply write: `base_type(start)`

* It's best to make your grammar templates such that they can be reused
  for different iterator types.

* `_val` is another __phoenix__ placeholder representing the rule's synthesized
  attribute.

* `eps` is a special Spirit parser that consumes no input but is always
  successful. We use it to initialize `_val`, the rule's synthesized
  attribute, to zero before anything else. The actual parser starts at
  `+lit('M')`, parsing roman thousands. Using `eps` this way is good
  for doing pre- and post-initializations.

* The expression `a || b` reads: match `a` or `b`, in sequence. That is, if
  both `a` and `b` match, they must be in sequence; this is equivalent to
  `a >> -b | b`, but more efficient.

[heading Let's Parse!]

[tutorial_roman_grammar_parse]

`roman_parser` is an object of type `roman`, our roman numeral parser. This time
around we are using the no-skipping version of the parse functions. We do not
want to skip any spaces! We are also passing in an attribute, `unsigned result`,
which will receive the parsed value.

The full cpp file for this example can be found here: [@../../example/qi/roman.cpp]


[endsect]