1<html> 2<head> 3<title>Rationale</title> 4<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 5<link rel="stylesheet" href="theme/style.css" type="text/css"> 6</head> 7 8<body> 9<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> 10 <tr> 11 <td width="10"> 12 </td> 13 <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Rationale</b></font></td> 14 <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> 15 </tr> 16</table> 17<br> 18<table border="0"> 19 <tr> 20 <td width="10"></td> 21 <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> 22 <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td> 23 <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td> 24 </tr> 25</table> 26<p><img src="theme/lens.gif" width="15" height="16"> <strong>Virtual functions: 27 From static to dynamic C++</strong></p> 28<p>Rules straddle the border between static and dynamic C++. In effect, a rule 29 transforms compile-time polymorphism (using templates) into run-time polymorphism 30 (using virtual functions). This is necessary due to C++'s inability to automatically 31 declare a variable of a type deduced from an arbitrarily complex expression 32 in the right-hand side (rhs) of an assignment. Basically, we want to do something 33 like:</p> 34<pre><code><font color="#000000"> <span class=identifier>T </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>an_arbitrarily_complex_expression</span><span class=special>;</span></font></code></pre> 35<p>without having to know or care about the resulting type of the right-hand side 36 (rhs) of the assignment expression. Apart from this, we also need a facility 37 to forward declare an unknown type:</p> 38<pre><code><font color="#000000"><span class=special> </span><span class=identifier>T </span><span class=identifier>rule</span><span class=special>; 39 </span><span class=special>... 40 </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></font></code></pre> 41<p>These limitations lead us to this implementation of rules. This comes at the 42 expense of the overhead of a virtual function call, once through each invocation 43 of a rule.</p> 44<p><img src="theme/lens.gif" width="15" height="16"> <strong>Multiple declaration 45 </strong> </p> 46<p>Some BNF variants allow multiple declarations of a <tt>rule</tt>. The declarations 47 are taken as alternatives. Example:</p> 48<pre> 49 <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a</span><span class=special>; </span><span class=identifier> 50 r </span><span class=special>= </span><span class=identifier>b</span><span class=special>;</span></code></pre> 51<p> is equivalent to: </p> 52<pre> 53 <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></code></pre> 54<p>Spirit v1.3 allowed this behavior. However, the current version of Spirit <b>no 55 longer</b> allows this because experience shows that this behavior leads to 56 unwanted gotchas (for instance, it does not allow rules to be held in containers). 57 In the current release of Spirit, a second assignment to a rule will simply 58 redefine it. The old definition is destructed. This follows more closely C++ 59 semantics and is more in line with what the user expects the rule to behave.</p> 60<p><img src="theme/lens.gif" width="15" height="16"> <b>Sequencing Syntax</b> 61 <br> 62 <br> 63 The comma operator as in a, b seems to be a better candidate, syntax-wise. But 64 then the problem is with its precedence. It has the lowest precedence in C/C++, 65 which makes it virtually useless. <br> 66 <br> 67 Bjarne Stroustrup, in his article <a href="references.html#generalized_overloading">"Generalizing 68 Overloading for C++2000"</a> talks about overloading whitespace. Such a 69 feature would allow juxtapositioning of parser objects exactly as we do in (E)BNF 70 (e.g. a b | c instead of a >> b | c). Unfortunately, the article was dated 71 April 1, 1998. Oh well.</p> 72<p><img src="theme/lens.gif" width="15" height="16"> <b>Forward iterators</b><br> 73 <br> 74 In general, the scanner expects at least a standard conforming forward iterator. 75 Forward iterators are needed for backtracking where the iterator needs to be 76 saved and restored later. Generally speaking, Spirit is a backtracking parser. 77 The implication of this is that at some point, the iterator position will have 78 to be saved to allow the parser to backtrack to a previous point. Thus, for 79 backtracking to work, the framework requires at least a forward iterator.<br> 80 <br> 81 Some parsers might require more specialized iterators (bi-directional or even 82 random access). Perhaps in the future, deterministic parsers when added to the 83 framework, will perform no backtracking and will need just a single token lookahead, 84 hence will require input iterators only.</p> 85<p><img src="theme/lens.gif" width="15" height="16"><b> Why are subrules important?</b><br> 86 <br> 87 Subrules open up the opportunity to do aggressive meta programming as well because 88 they do not rely on virtual functions. The virtual function is the meta-programmer's 89 hell. Not only does it slow down the program due to the virtual function indirect 90 call, it is also an opaque wall where no metaprogram can get past. It kills 91 all meta-information beyond the virtual function call. Worse, the virtual function 92 cannot be templated. Which means that its arguments have to be tied to a actual 93 types. Many problems stem from this limitation. <br> 94 <br> 95 While Spirit is a currently classified as a non-deterministic recursive-descent 96 parser, Doug Gregor first noted that other parsing techniques apart from top-down 97 recursive descent may be applied. For instance, apart from non-deterministic 98 recursive descent, deterministic LL(1) and LR(1) can theoretically be implemented 99 using the same expression template front end. Spirit rules use virtual functions 100 to encode the RHS parser expression in an opaque abstract parser type. While 101 it serves its purpose well, the rule's virtual functions are the stumbling blocks 102 to more advanced metaprogramming. Subrules are free from virtual functions.</p> 103<p><img src="theme/lens.gif" width="15" height="16"><b> <a name="exhaustive_rd"></a>Exhaustive 104 backtracking and greedy RD</b></p> 105<p>Spirit doesn't do exhaustive backtracking like regular expressions are expected 106 to. For example:</p> 107<pre> <span class="special">*</span>chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">) >></span> chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">);</span></pre> 108<p>will always fail to match because Spirit's Kleene star does not back off when 109 the rest of the rule fails to match. </p> 110<p>Actually, there's a solution to this greedy RD problem. Such a scheme is discussed 111 in section 6.6.2 of <a 112href="http://www.cs.vu.nl/%7Edick/PTAPG.html">Parsing Techniques: A Practical 113 Guide</a>. The trick involves passing a <em>tail</em> parser (in addition to 114 the scanner) to each parser. The start parser will then simply be: <tt>start 115 >> end_p;</tt> (end_p is the start's tail). </p> 116<p>Spirit is greedy --using straight forward, naive RD. It is certainly possible 117 to implement the fully backtracking scheme presented above, but there will be 118 also certainly be a performance hit. The scheme will always try to match all 119 possible parser paths (full parser hierarchy traversal) until it reaches a point 120 of certainty, that the whole thing matches or fails to match. </p> 121<table border="0" width="80%" align="center"> 122 <tr> 123 <td class="note_box"><p><img src="theme/note.gif" width="16" height="16"><b>Backtracking 124 and Greedy RD </b><br> 125 <br> 126 Spirit is quite consistent and intuitive about when it backtracks and 127 to where, although it may not be obvious to those coming from different 128 backgrounds. In general, any (sub)parser will, given the same input, always 129 match the same portion of the input (or fail to match the input at all). 130 This means that Spirit is inherently greedy. Spirit will only backtrack 131 when a (sub)parser fails to match the input, and it will always backtrack 132 to the next choice point upward (not backward) in the parser structure. 133 In other words abb|ab will match "ab", as will a(bb|b), but 134 (ab|a)b won't because the (ab|a) subparser will always match the 'b' after 135 the 'a' if it is available.</p> 136 <p>--Rainer Deyke</p> 137 </td> 138 </tr> 139</table> 140<p>There's a strong preference on "simplicity with all the knobs when you 141 need them" approach, right now. On the other hand, the flexibility of Spirit 142 makes it possible to have different optional schemes available. It might be 143 possible to implement an exhaustive backtracking RD scheme as an optional feature 144 in the future. </p> 145<table border="0"> 146 <tr> 147 <td width="10"></td> 148 <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> 149 <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td> 150 <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td> 151 </tr> 152</table> 153<br> 154<hr size="1"> 155<p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> 156 <br> 157 <font size="2">Use, modification and distribution is subject to the Boost Software 158 License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at 159 http://www.boost.org/LICENSE_1_0.txt)</font></p> 160<p class="copyright"> </p> 161</body> 162</html> 163