<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content=
    "HTML Tidy for Windows (vers 1st February 2003), see www.w3.org"
    name="generator">
    <title>
      Quick Start
    </title>
    <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
    <link rel="stylesheet" href="theme/style.css" type="text/css">
  </head>
  <body>
    <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
      <tr>
        <td width="10"></td>
        <td width="85%">
          <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Quick
          Start</b></font>
        </td>
        <td width="112">
          <a href="http://spirit.sf.net"><img src="theme/spirit.gif"
          width="112" height="48" align="right" border="0"></a>
        </td>
      </tr>
    </table><br>
    <table border="0">
      <tr>
        <td width="10"></td>
        <td width="30">
          <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
        </td>
        <td width="30">
          <a href="introduction.html"><img src="theme/l_arr.gif" border="0">
          </a>
        </td>
        <td width="30">
          <a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0">
          </a>
        </td>
      </tr>
    </table>
    <h2>
      <b>Why would you want to use Spirit?</b>
    </h2>
    <p>
      Spirit is designed to be a practical parsing tool. At the very least, the
      ability to generate a fully working parser from a formal EBNF
      specification inlined in C++ significantly reduces development time.
      While it may be practical to use a full-blown, stand-alone parser
      generator such as YACC or ANTLR when we want to develop a computer
      language such as C or Pascal, it is certainly overkill to bring in the
      big guns when we wish to write extremely small micro-parsers. At that end
      of the spectrum, programmers typically approach the job at hand not as a
      formal parsing task but through ad hoc hacks using primitive tools such as
      <tt>scanf</tt>.
      True, there are regular-expression libraries (such as <a href=
      "http://www.boost.org/libs/regex/index.html">boost regex</a>) and scanners
      (such as <a href="http://www.boost.org/libs/tokenizer/index.html">boost
      tokenizer</a>), but these tools do not scale well when we need to write
      more elaborate parsers. Attempting to write even a moderately complex
      parser using these tools leads to code that is hard to understand and
      maintain.
    </p>
    <p>
      One prime objective is to make the tool easy to use. When one thinks of a
      parser generator, the usual reaction is "it must be big and complex with
      a steep learning curve." Not so. Spirit is designed to be fully scalable.
      The framework is structured in layers, so you can learn it on an
      as-needed basis after mastering only the minimal core and basic concepts.
    </p>
    <p>
      For development simplicity and ease of deployment, the entire framework
      consists of only header files, with no libraries to link against or
      build. Just put the Spirit distribution in your include path, compile and
      run. Code size? Very tight. In the quick start example that we shall
      present in a short while, the code size is dominated by the instantiation
      of <tt>std::vector</tt> and <tt>std::iostream</tt>.
    </p>
    <h2>
      <b>Trivial Example #1</b></h2>
    <p>Create a parser that will parse
      a floating-point number.
    </p>
<pre><code><font color="#000000">    </font></code><span class="identifier">real_p</span>
</pre>
    <p>
      (You've got to admit, that's trivial!) The above code actually generates
      a Spirit <tt>real_parser</tt> (a built-in parser) which parses a
      floating-point number. Take note that, as a Spirit convention, parsers
      that are meant to be used directly by the user end with "<tt>_p</tt>" in
      their names. Spirit has many predefined parsers, and consistent naming
      conventions help keep you from going insane!
    </p>
    <h2>
      <b>Trivial Example #2</b></h2>
    <p>
      Create a parser that will accept a line consisting of two floating-point
      numbers.
    </p>

<pre><code><font color="#000000">    </font></code><code><span class="identifier">real_p</span> <span class="special">&gt;&gt;</span> <span class="identifier">real_p</span></code>
</pre>
    <p>
      Here you see the familiar floating-point numeric parser
      <code><tt>real_p</tt></code> used twice, once for each number. What's
      that <tt class="operators">&gt;&gt;</tt> operator doing in there? Well,
      they had to be separated by something, and this was chosen as the
      "followed by" sequence operator. The above expression creates a parser
      from two simpler parsers, gluing them together with the sequence operator.
      The result is a parser that is a composition of smaller parsers.
      Whitespace between numbers can implicitly be consumed depending on how
      the parser is invoked (see below).
    </p>
    <p>
      Note: when we combine parsers, we end up with a "bigger" parser, but it's
      still a parser. Parsers can get bigger and bigger, nesting more and more,
      but whenever you glue two parsers together, you end up with one bigger
      parser. This is an important concept.
    </p>
    <h2>
      <b>Trivial Example #3</b></h2>
    <p>
      Create a parser that will accept an arbitrary number of floating-point
      numbers. (Arbitrary means anything from zero to infinity.)
    </p>

<pre><code><font color="#000000">    </font></code><code><span class="special">*</span><span class="identifier">real_p</span></code>
</pre>
    <p>
      This is like a regular-expression Kleene Star, though the syntax might
      look a bit odd for a C++ programmer not used to seeing the <tt class=
      "operators">*</tt> operator overloaded like this. Actually, if you know
      regular expressions it may look odd too, since the star comes
      <b>before</b> the expression it modifies. C'est la vie.
      Blame it on the fact that we
      must work with the syntax rules of C++.
    </p>
    <p>
      Any expression that evaluates to a parser may be used with the Kleene
      Star. Keep in mind, though, that due to C++ operator precedence rules you
      may need to put complex expressions in parentheses.
      The Kleene Star is also known as a Kleene Closure, but we call it the
      Star in most places.
    </p>
    <h3>
      <b><a name="list_of_numbers"></a>Example #4 [A Just Slightly Less Trivial Example]</b>
    </h3>
    <p>
      This example will create a parser that accepts a comma-delimited list of numbers and puts the numbers in a vector.
    </p>
    <h4><strong>Step 1. Create the parser</strong></h4>
<pre><code><font color="#000000">    </font></code><code><span class="identifier">real_p</span> <span class="special">&gt;&gt;</span> <span class="special">*(</span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">','</span><span class="special">)</span> <span class="special">&gt;&gt;</span> <span class="identifier">real_p</span><span class="special">)</span></code>
</pre>
    <p>
      Notice <tt>ch_p(',')</tt>. It is a literal character parser that
      recognizes the comma <tt>','</tt>. In this case, the Kleene Star is
      modifying a more complex parser, namely, the one generated by the
      expression:
    </p>

<pre><code><font color="#000000">    </font></code><code><span class="special">(</span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">','</span><span class="special">)</span> <span class="special">&gt;&gt;</span> <span class="identifier">real_p</span><span class="special">)</span></code>
</pre>
    <p>
      Note that this is a case where the parentheses are necessary: they make
      the Kleene Star apply to the complete expression above.
    </p>
    <h4>
      <b><strong>Step 2.
</strong>Using a Parser (now that it's created)</b></h4>
    <p>
      Now that we have created a parser, how do we use it? As with any C++
      temporary object, we can either store it in a variable or call
      functions directly on it.
    </p>
    <p>
      We'll gloss over some low-level C++ details and just get to the good
      stuff.
    </p>
    <p>
      If <b><tt>r</tt></b> is a rule (don't worry about exactly what rules are
      for now; this will be discussed later. Suffice it to say that a rule is
      a placeholder variable that can hold a parser), then we store the parser
      as a rule like this:
    </p>

<pre><code><font color="#000000">    </font></code><code><font color="#000000"><span class="identifier">r</span> <span class="special">=</span> <span class="identifier">real_p</span> <span class="special">&gt;&gt; *(</span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">','</span><span class="special">) &gt;&gt;</span> <span class="identifier">real_p</span><span class="special">);</span></font></code>
</pre>
    <p>
      Not too exciting, just an assignment like any other C++ expression you've
      used for years. The cool thing about storing a parser in a rule is this:
      rules are parsers, and now you can refer to the parser <b>by name</b>.
      (In this case the name is <tt><b>r</b></tt>.) Notice that this is now a
      full assignment expression, so we terminate it with a semicolon,
      "<tt>;</tt>".
    </p>
    <p>
      That's it. We're done with defining the parser. The next step is
      invoking this parser to do its work. There are a couple of ways to do
      this. For now, we shall use the free <tt>parse</tt> function that takes
      in a <tt>char const*</tt>.
      The function accepts three arguments:
    </p>
    <blockquote>
      <p>
        <img src="theme/bullet.gif" width="12" height="12"> The null-terminated
        <tt>const char*</tt> input<br>
        <img src="theme/bullet.gif" width="12" height="12"> The parser
        object<br>
        <img src="theme/bullet.gif" width="12" height="12"> Another parser
        called the <b>skip parser</b>
      </p>
    </blockquote>
    <p>
      In our example, we wish to skip spaces and tabs. Spirit's repertoire of
      predefined parsers includes one named <tt>space_p</tt>, a very simple
      parser that recognizes whitespace. We
      shall use <tt>space_p</tt> as our skip parser. The skip parser is the one
      responsible for skipping characters in between parser elements such as
      the <tt>real_p</tt> and the <tt>ch_p</tt>.
    </p>
    <p>
      OK, so now let's parse!
    </p>

<pre><code><font color="#000000">    </font></code><code><font color="#000000"><span class="identifier">r</span> <span class="special">=</span> <span class="identifier">real_p</span> <span class="special">&gt;&gt;</span> <span class="special">*(</span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">','</span><span class="special">)</span> <span class="special">&gt;&gt;</span> <span class="identifier">real_p</span><span class="special">);
</span>    <span class="identifier">parse</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">r</span><span class="special">,</span> <span class="identifier">space_p</span><span class="special">)</span> <span class="comment">// Not a full statement yet, patience...</span></font></code>
</pre>
    <p>
      The parse function returns an object (of type <tt>parse_info</tt>) that
      holds, among other things, the result of the parse.
      In this example, we
      need to know:
    </p>
    <blockquote>
      <p>
        <img src="theme/bullet.gif" width="12" height="12"> Did the parser
        successfully recognize the input <tt>str</tt>?<br>
        <img src="theme/bullet.gif" width="12" height="12"> Did the parser
        <b>fully</b> parse and consume the input up to its end?
      </p>
    </blockquote>
    <p>
      To get a complete picture of what we have so far, let us also wrap this
      parser inside a function:
    </p>

<pre><code><font color="#000000">    </font></code><code><font color="#000000"><span class="keyword">bool
</span>    <span class="identifier">parse_numbers</span><span class="special">(</span><span class="keyword">char</span> <span class="keyword">const</span><span class="special">*</span> <span class="identifier">str</span><span class="special">)
    {
</span>        <span class="keyword">return</span> <span class="identifier">parse</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">real_p</span> <span class="special">&gt;&gt;</span> <span class="special">*(</span><span class="literal">','</span> <span class="special">&gt;&gt;</span> <span class="identifier">real_p</span><span class="special">),</span> <span class="identifier">space_p</span><span class="special">).</span><span class="identifier">full</span><span class="special">;
    }</span></font></code>
</pre>
    <p>
      Note that in this case we dropped the named rule and inlined the parser
      directly in the call to parse. Upon calling parse, the expression
      evaluates to a temporary, unnamed parser, which is passed into the
      parse() function, used, and then destroyed.
    </p>
    <table border="0" width="80%" align="center">
      <tr>
        <td class="note_box">
          <img src="theme/note.gif" width="16" height="16"><b>char and wchar_t
          operands</b><br>
          <br>
          The careful reader may notice that the parser expression has
          <tt class="quotes">','</tt> instead of <tt>ch_p(',')</tt> as the
          previous examples did. This is OK because of C++ operator
          overloading rules. There are <tt>&gt;&gt;</tt> operators that are
          overloaded to accept a <tt>char</tt> or <tt>wchar_t</tt> argument on
          their left or right (but not both). An operator may be overloaded if
          at least one of its parameters is a user-defined type. In this case,
          <tt>real_p</tt> is the second argument to <tt>operator<span class=
          "operators">&gt;&gt;</span></tt>, and so the proper overload of
          <tt class="operators">&gt;&gt;</tt> is used, converting
          <tt class="quotes">','</tt> into a character literal parser.<br>
          <br>
          The problem with omitting the <tt>ch_p</tt> call should be obvious:
          <tt>'a' &gt;&gt; 'b'</tt> is <b>not</b> a Spirit parser; it is a
          numeric expression that right-shifts the ASCII (or another encoding)
          value of <tt class="quotes">'a'</tt> by the ASCII value of
          <tt class="quotes">'b'</tt>. However, both <tt>ch_p('a') &gt;&gt;
          'b'</tt> and <tt>'a' &gt;&gt; ch_p('b')</tt> are Spirit sequence
          parsers for the letter <tt class="quotes">'a'</tt> followed by
          <tt class="quotes">'b'</tt>. You'll get used to it, sooner or
          later.
        </td>
      </tr>
    </table>
    <p>
      Take note that the object returned from the parse function has a member
      called <tt>full</tt>, which is true if both of our requirements above
      are met (i.e. the parser fully parsed the input).
    </p>
    <h4>
      <b> Step 3. Semantic Actions</b></h4>
    <p>
      Our parser above is really nothing but a recognizer.
      It answers the
      question <i class="quotes">"did the input match our grammar?"</i>, but it
      does not remember any data, nor does it perform any side effects.
      Remember: we want to put the parsed numbers into a vector. This is done
      in an <b>action</b> that is linked to a particular parser. For example,
      whenever we parse a real number, we wish to store the parsed number after
      a successful match. In other words, we now wish to extract information
      from the parser, and semantic actions do this. Semantic actions may be
      attached to any point in the grammar specification. These actions are C++
      functions or functors that are called whenever a part of the parser
      successfully recognizes a portion of the input. Say you have a parser
      <b>P</b> and a C++ function <b>F</b>; you can make the parser call
      <b>F</b> whenever it matches an input by attaching <b>F</b>:
    </p>

<pre><code><font color="#000000">    </font></code><code><font color="#000000"><span class="identifier">P</span><span class="special">[&amp;</span><span class="identifier">F</span><span class="special">]</span></font></code>
</pre>
    <p>
      Or if <b>F</b> is a function object (a functor):
    </p>

<pre><code><font color="#000000">    </font></code><code><font color="#000000"><span class="identifier">P</span><span class="special">[</span><span class="identifier">F</span><span class="special">]</span></font></code>
</pre>
    <p>
      The function/functor signature depends on the type of the parser to which
      it is attached. The parser <tt>real_p</tt> passes a single argument: the
      parsed number.
      Thus, if we were to attach a function <b>F</b> to
      <tt>real_p</tt>, we need <b>F</b> to be declared as:
    </p>

<pre><code>    </code><code><span class="keyword">void</span> <span class="identifier">F</span><span class="special">(</span><span class="keyword">double</span> <span class="identifier">n</span><span class="special">);</span></code></pre>
    <p>
      For our example, however, we can again take advantage of some predefined
      semantic functors and functor generators (<img src="theme/lens.gif"
      width="15" height="16"> A functor generator is a function that returns
      a functor). For our purpose, Spirit has a functor generator
      <tt>push_back_a(c)</tt>. In brief, this semantic action, when called,
      <b>appends</b> the parsed value it receives from the parser it is
      attached to, to the container <tt>c</tt>.
    </p>
    <p>
      Finally, here is our complete comma-separated list parser:
    </p>

<pre><code><font color="#000000">    </font></code><code><font color="#000000"><span class="keyword">bool
</span>    <span class="identifier">parse_numbers</span><span class="special">(</span><span class="keyword">char</span> <span class="keyword">const</span><span class="special">*</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">vector</span><span class="special">&lt;</span><span class="keyword">double</span><span class="special">&gt;&amp;</span> <span class="identifier">v</span><span class="special">)
    {
</span>        <span class="keyword">return</span> <span class="identifier">parse</span><span class="special">(</span><span class="identifier">str</span><span class="special">,

</span>            <span class="comment">//  Begin grammar
</span>            <span class="special">(
</span>                <span class="identifier">real_p</span><span class="special">[</span><span class="identifier">push_back_a</span><span class="special">(</span><span
class="identifier">v</span><span class="special">)]</span> <span class="special">&gt;&gt;</span> <span class="special">*(</span><span class="literal">','</span> <span class="special">&gt;&gt;</span> <span class="identifier">real_p</span><span class="special">[</span><span class="identifier">push_back_a</span><span class="special">(</span><span class="identifier">v</span><span class="special">)])
            )
</span>            <span class="special">,
</span>            <span class="comment">//  End grammar

</span>            <span class="identifier">space_p</span><span class="special">).</span><span class="identifier">full</span><span class="special">;
    }</span></font></code>
</pre>
    <p>
      This is the same parser as above, this time with appropriate semantic
      actions attached at strategic places to extract the parsed numbers and
      stuff them in the vector <tt>v</tt>. The <tt>parse_numbers</tt> function
      returns true when successful.
    </p>
    <p>
      <img src="theme/lens.gif" width="15" height="16"> The full source code
      can be <a href="../example/fundamental/number_list.cpp">viewed here</a>.
      This is part of the Spirit distribution.
    </p>
    <table border="0">
      <tr>
        <td width="10"></td>
        <td width="30">
          <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
        </td>
        <td width="30">
          <a href="introduction.html"><img src="theme/l_arr.gif" border="0">
          </a>
        </td>
        <td width="30">
          <a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0">
          </a>
        </td>
      </tr>
    </table><br>
    <hr size="1">
    <p class="copyright">
      Copyright &copy; 1998-2003 Joel de Guzman<br>
      Copyright &copy; 2002 Chris Uzdavinis<br>
      <br>
      <font size="2">Use, modification and distribution is subject to the
      Boost Software License, Version 1.0.
      (See accompanying file
      LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)</font>
    </p>
    <blockquote>

    </blockquote>
  </body>
</html>