1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> 2<html> 3 <head> 4 <meta content= 5 "HTML Tidy for Windows (vers 1st February 2003), see www.w3.org" 6 name="generator"> 7 <title> 8 Basic Concepts 9 </title> 10 <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> 11 <link rel="stylesheet" href="theme/style.css" type="text/css"> 12 </head> 13 <body> 14 <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> 15 <tr> 16 <td width="10"></td> 17 <td width="85%"> 18 <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Basic 19 Concepts</b></font> 20 </td> 21 <td width="112"> 22 <a href="http://spirit.sf.net"><img src="theme/spirit.gif" 23 width="112" height="48" align="right" border="0"></a> 24 </td> 25 </tr> 26 </table><br> 27 <table border="0"> 28 <tr> 29 <td width="10"></td> 30 <td width="30"> 31 <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a> 32 </td> 33 <td width="30"> 34 <a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a> 35 </td> 36 <td width="30"> 37 <a href="organization.html"><img src="theme/r_arr.gif" border="0"> 38 </a> 39 </td> 40 </tr> 41 </table> 42 <p> 43 There are a few fundamental concepts that need to be understood well: 1) 44 The <strong>Parser</strong>, 2) <strong>Match</strong>, 3) The 45 <strong>Scanner</strong>, and 4) <strong>Semantic Actions</strong>. These 46 basic concepts interact with one another, and the functionalities of each 47 interweave throughout the framework to make it one coherent whole. 48 </p> 49 <table width="48%" border="0" align="center"> 50 <tr> 51 <td height="211"> 52 <img src="theme/intro1.png"> 53 </td> 54 </tr> 55 </table> 56 <h2> 57 The Parser 58 </h2> 59 <p> 60 Central to the framework is the parser. The parser does the actual work 61 of recognizing a linear input stream of data read sequentially from start 62 to end by the scanner. The parser attempts to match the input following a 63 well-defined set of specifications known as grammar rules. The parser 64 reports the success or failure to its client through a match object. When 65 successful, the parser calls a client-supplied semantic action. Finally, 66 the semantic action extracts structural information depending on the data 67 passed by the parser and the hierarchical context of the parser it is 68 attached to. 69 </p> 70 <p> 71 Parsers come in different flavors. The Spirit framework comes bundled 72 with an extensive set of pre-defined parsers that perform various parsing 73 tasks from the trivial to the complex. The parser, as a concept, has a 74 public conceptual interface contract. Following the contract, anyone can 75 write a conforming parser that will play along well with the framework's 76 predefined components. We shall provide a blueprint detailing the 77 conceptual interface of the parser later. 78 </p> 79 <p> 80 Clients of the framework generally do not need to write their own 81 hand-coded parsers at all. Spirit has an immense repertoire of 82 pre-defined parsers covering all aspects of syntax and semantic analysis. 83 We shall examine this repertoire of parsers in the following sections. In 84 the rare case where a specific functionality is not available, it is 85 extremely easy to write a user-defined parser. The ease in writing a 86 parser entity is the main reason for Spirit's extensibility. 87 </p> 88 <h2> 89 Primitives and Composites 90 </h2> 91 <p> 92 Spirit parsers fall into two categories: <b>primitives</b> and 93 <b>composites</b>. These two categories are more or less synonymous to 94 terminals and non-terminals in parsing lingo. Primitives are 95 non-decomposable atomic units. Composites on the other hand are parsers 96 that are composed of other parsers which can in turn be a primitive or 97 another composite. To illustrate, consider the Spirit expression: 98 </p> 99 100<pre><code><font color="#000000"> </font></code><code><span class="identifier">real_p</span> <span class= 101"special">>></span> <span class="special">*(</span><span class= 102"literal">','</span> <span class="special">>></span> <span class= 103"identifier">real_p</span><span class="special">)</span></code> 104</pre> 105<p> 106 <tt><tt>real_p</tt></tt> is a primitive parser that can parse real 107 numbers. The quoted comma <tt class="quotes">','</tt> in the expression 108 is a shortcut and is equivalent to <tt>ch_p<span class= 109 "operators">(</span><span class="quotes">','</span><span class= 110 "operators">)</span></tt>, which is another primitive parser that 111 recognizes single characters. 112 </p> 113 <p> 114 The expression above corresponds to the following parse tree: 115 </p> 116 <table width="29%" border="0" align="center"> 117 <tr> 118 <td> 119 <img src="theme/intro7.png"> 120 </td> 121 </tr> 122 </table> 123 <p> 124 The expression: 125 </p> 126 127<pre><code><font color="#000000"> </font></code><span class= 128 "literal">','</span> <span class="special">>></span> <span class= 129 "identifier">real_p</span> 130</pre> 131<p> 132 composes a <b>sequence</b> parser. The <tt>sequence</tt> parser is a 133 composite parser comprising two parsers: the one on its left hand side 134 (lhs), <tt>ch_p<span class="operators">(</span><span class= 135 "quotes">','</span><span class="operators">)</span></tt> ; and the other 136 on its right hand side (rhs), <tt>real_p</tt>. This composite parser, 137 when called, calls its lhs and rhs in sequence and reports a successful 138 match only if both are successful. 139 </p> 140 <table width="14%" border="0" align="center"> 141 <tr> 142 <td> 143 <img src="theme/intro2.png"> 144 </td> 145 </tr> 146 </table> 147 <p> 148 The <tt>sequence</tt> parser is a binary composite. It is composed of two 149 parsers. There are unary composites as well. Unary composites hold only a 150 single subject. Like the binary composite, the unary composite may change 151 the behavior of its embedded subject. One particular example is the 152 <b>Kleene star</b>. The Kleene star, when called to parse, calls its sole 153 subject zero or more times. "Zero or more" implies that the Kleene star 154 always returns a successful match, possibly matching the null string: "". 155 </p> 156 <p> 157 The expression: 158 </p> 159 160<pre><code><font color="#000000"> </font></code><code><span class= 161"special">*(</span><span class="literal">','</span> <span class= 162"special">>></span> <span class= 163 "identifier">real_p</span><span class="special">)</span></code> 164</pre> 165 <p> 166 wraps the whole sequence composite above inside a <tt>kleene_star</tt>. 167 </p> 168 <table width="17%" border="0" align="center"> 169 <tr> 170 <td> 171 <img src="theme/intro3.png"> 172 </td> 173 </tr> 174 </table> 175 <p> 176 Finally, the full expression composes a <tt>real_p</tt> primitive parser 177 and the <tt>kleene_star</tt> we have above into another higher level 178 <tt>sequence</tt> parser composite. 179 </p> 180 <table width="34%" border="0" align="center"> 181 <tr> 182 <td> 183 <img src="theme/intro4.png"> 184 </td> 185 </tr> 186 </table> 187 <p> 188 A few simple classes, when composed and structured in a hierarchy, form a 189 very powerful object-oriented recursive-descent parsing engine. These 190 classes provide the infrastructure needed for the construction of 191 more-complex parsers. The final parser composite is a non-deterministic 192 recursive-descent parser with infinite look-ahead. 193 </p> 194 <p> 195 Top-down descent traverses the hierarchy. The outer <tt>sequence</tt> 196 calls the leftmost <tt>real_p</tt> parser. If successful, the 197 <tt>kleene_star</tt> is called next. The <tt>kleene_star</tt> calls the 198 inner <tt>sequence</tt> repeatedly in a loop until it fails to match, or 199 the input is exhausted. Inside, <tt>ch_p(',')</tt> and then 200 <tt>real_p</tt> are called in sequence. The following diagram illustrates 201 what is happening, somewhat reminiscent of Pascal syntax diagrams. 202 </p> 203 <table width="37%" border="0" align="center"> 204 <tr> 205 <td> 206 <img src="theme/intro5.png"> 207 </td> 208 </tr> 209 </table> 210 <p> 211 The flexibility of object embedding and composition combined with 212 recursion opens up a unique approach to parsing. Subclasses are free to 213 form aggregates and algorithms of arbitrary complexity. Complex parsers 214 can be created with the composition of only a few primitive classes. 215 </p> 216 <p> 217 The framework is designed to be fully open-ended and extensible. New 218 primitives or composites, from the trivial to the complex, may be added 219 any time. Composition happens (statically) at compile time. This is 220 possible through the expressive flexibility of C++ expression templates 221 and template meta-programming. 222 </p> 223 <p> 224 The result is a composite composed of primitives and smaller composites. 225 This embedding strategy gives us the ability to build hierarchical 226 structures that fully model EBNF expressions of arbitrary complexity. 227 Later on, we shall see more primitive and composite building blocks. 228 </p> 229 <h2> 230 The Scanner 231 </h2> 232 <p> 233 Like the parser, the scanner is also an abstract concept. The task of the 234 scanner is to feed the sequential input data stream to the parser. The 235 scanner is composed of two STL conforming forward iterators, first and 236 last, where first is held by reference and last, by value. The first 237 iterator is held by reference to allow re-positioning by the parser. A 238 set of policies governs how the scanner behaves. Parsers extract data 239 from the scanner and position the iterator appropriately through its 240 member functions. 241 </p> 242 <p> 243 Knowledge of the intricacies of these policies is not required at all in 244 most cases. However, knowledge of the scanner's basic API is required to 245 write fully-conforming Spirit parsers. The scanner's API will be outlined 246 in a separate section. In addition, for the power users and the 247 adventurous among us, a full section will be devoted to covering the 248 scanner policies. The scanner policies make Spirit very flexible and 249 extensible. For instance, some of the policies may be modified to filter 250 data. A practical example is a scanner policy that does not distinguish 251 upper and lower case whereby making it useful for parsing case 252 insensitive input. Another example is a scanner policy that strips white 253 spaces from the input. 254 </p> 255 <h2> 256 The Match 257 </h2> 258 <p> 259 The parser has a conceptual parse member function taking in a scanner and 260 returning a match object. The primary function of the match object is to 261 report parsing success (or failure) back to the parser's caller; i.e., it 262 evaluates to true if the parse function is successful, false otherwise. 263 If the parse is successful, the match object may also be queried to 264 report the number of characters matched (using <tt>match.length()</tt>). 265 The length is non-negative if the match is successful, and the typical 266 length of a parse failure is -1. A zero length is perfectly valid and 267 still represents a successful match. 268 </p> 269 <p> 270 Parsers may have attribute data associated with it. For example, the 271 real_p parser has a numeric datum associated with it. This attribute is 272 the parsed number. This attribute is passed on to the returned match 273 object. The match object may be queried to get this attribute. This datum 274 is valid only when the match is successful. 275 </p> 276 <h2> 277 Semantic Actions 278 </h2> 279 <p> 280 A composite parser forms a hierarchy. Parsing proceeds from the topmost 281 parent parser which delegates and apportions the parsing task to its 282 children recursively to its children's children and so on until a 283 primitive is reached. By attaching semantic actions to various points in 284 this hierarchy, in effect we can transform the flat linear input stream 285 into a structured representation. This is essentially what parsers do. 286 </p> 287 <p> 288 Recall our example above: 289 </p> 290 291<pre><code><font color="#000000"> </font></code><code><span class= 292"identifier">real_p</span> <span class= 293 "special">>></span> <span class="special">*(</span><span class= 294 "literal">','</span> <span class="special">>></span> <span class= 295 "identifier">real_p</span><span class="special">)</span></code> 296</pre> 297<p> 298 By hooking a function (or functor) into the real_p parsers, we can 299 extract the numbers from the input: 300 </p> 301 302<pre><code><font color="#000000"> </font></code><span class= 303"identifier">real_p</span><span class="special">[&</span><span class= 304"identifier">f</span><span class="special">]</span> <span class= 305"special">>></span> <span class="special">*(</span><span class= 306"literal">','</span> <span class="special">>></span> <span class= 307"identifier">real_p</span><span class="special">[&</span><span class= 308"identifier">f</span><span class="special">])</span> 309</pre> 310<table width="41%" border="0" align="center"> 311 <tr> 312 <td> 313 <img src="theme/intro6.png"> 314 </td> 315 </tr> 316 </table> 317 318<p> where <tt>f</tt> is a function that takes in a single argument. The <tt><span class="operators">[&</span>f<span class= 319 "operators">]</span></tt> hooks the parser with the function such that when 320 <tt>real_p</tt> recognizes a valid number, the function <tt>f</tt> is called. 321 It is up to the function then to do what is appropriate. For example, it can 322 stuff the numbers in a vector. Or perhaps, if the grammar is changed slightly 323 by replacing <tt class="quotes">','</tt> with <tt class="quotes">'+'</tt>, then 324 we have a primitive calculator that computes sums. The function <tt>f</tt> then 325 can then be made to add all incoming numbers.<br> 326</p> 327<table border="0"> 328 <tr> 329 <td width="10"></td> 330 <td width="30"> 331 <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a> 332 </td> 333 <td width="30"> 334 <a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a> 335 </td> 336 <td width="30"> 337 <a href="organization.html"><img src="theme/r_arr.gif" border="0"> 338 </a> 339 </td> 340 </tr> 341 </table><br> 342 <hr size="1"> 343 <p class="copyright"> 344 Copyright © 1998-2003 Joel de Guzman<br> 345 <br> 346 <font size="2">Use, modification and distribution is subject to the Boost 347 Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or 348 copy at http://www.boost.org/LICENSE_1_0.txt)</font> 349 </p> 350 <p> 351 352 </p> 353 </body> 354</html> 355