1[/============================================================================== 2 Copyright (C) 2001-2011 Joel de Guzman 3 Copyright (C) 2001-2011 Hartmut Kaiser 4 5 Distributed under the Boost Software License, Version 1.0. (See accompanying 6 file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) 7===============================================================================/] 8 9[section Rationale] 10 11[heading Naming] 12 13Why use the name "Spirit", "Qi" and "Karma"? Because xpressive names 14have a better spirit, brings qi to your software and will enhance your 15karma so they can heal your (con)fusion and make you wave like a phoenix 16from the ashes. (Joachim Faulhaber) 17 18[heading Type Erasure: From static to dynamic C++] 19 20Rules straddle the border between static and dynamic C++. In effect, a 21rule transforms compile-time polymorphism (using templates) into 22run-time polymorphism (using virtual functions). This is necessary due 23to C++'s inability to automatically declare a variable of a type deduced 24from an arbitrarily complex expression in the right-hand side (rhs) of 25an assignment. Basically, we want to do something like: 26 27 T rule = an_arbitrarily_complex_expression; 28 29without having to know or care about the resulting type of the 30right-hand side (rhs) of the assignment expression. This can be done 31with modern C++ 0x compilers using `auto`: 32 33 auto rule = an_arbitrarily_complex_expression; 34 35Apart from this, we also need a facility to forward declare an unknown 36type: 37 38 T rule; 39 ... 40 rule = a | b; 41 42These limitations lead us to this implementation of rules. This comes at 43the expense of the overhead of a type-erased call which is an indirect 44function call that connot be inlined, once through each invocation of a 45rule. 46 47[heading Multiple declaration] 48 49Some BNF variants allow multiple declarations of a rule. The 50declarations are taken as alternatives. Example: 51 52 r = a; 53 r = b; 54 55is equivalent to: 56 57 r = a | b; 58 59Spirit v1.3 allowed this behavior. However, the current version of 60Spirit no longer allows this because experience shows that this behavior 61leads to unwanted gotchas (for instance, it does not allow rules to be 62held in containers). In the current release of Spirit, a second 63assignment to a rule will simply redefine it. The old definition is 64destructed. This follows more closely C++ semantics and is more in line 65with what the user expects the rule to behave. 66 67[heading Sequencing Syntax] 68 69The comma operator as in `a, b` seems to be a better candidate, 70syntax-wise. But then the problem is with its precedence. It has the 71lowest precedence in C/C++, which makes it virtually useless. 72 73Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000" 74talks about overloading whitespace. Such a feature would allow 75juxtapositioning of parser objects exactly as we do in (E)BNF (e.g. a b 76| c instead of a >> b | c). Unfortunately, the article was dated April 771, 1998. Oh well. 78 79[heading Forward iterators] 80 81In general, the expected iterator is at least a standard conforming 82forward iterator. Forward iterators are needed for backtracking where 83the iterator needs to be saved and restored later. Generally speaking, 84Spirit is a backtracking parser. The implication of this is that at some 85point, the iterator position will have to be saved to allow the parser 86to backtrack to a previous point. Thus, for backtracking to work, the 87framework requires at least a forward iterator. 88 89There are remedies of course. In cases where we need to use input 90iterators, you can use the __multi_pass__ iterator to make the forward 91iterators. 92 93Some parsers might require more specialized iterators (bi-directional or 94even random access). Perhaps in the future, deterministic parsers when 95added to the framework, will perform no backtracking and will need just 96a single token lookahead, hence will require input iterators only. 97 98[heading Exhaustive backtracking and greedy RD] 99 100Spirit doesn't do exhaustive backtracking like regular expressions are 101expected to. For example: 102 103 *char_('a') >> char_('a'); 104 105will always fail to match because Spirit's Kleene star does not back off 106when the rest of the rule fails to match. 107 108Actually, there's a solution to this greedy RD problem. Such a scheme is 109discussed in section 6.6.2 of Parsing Techniques: A Practical Guide. The 110trick involves passing a tail parser (in addition to the scanner) to 111each parser. The start parser will then simply be: 112 113 start >> end; 114 115(end is the start's tail). 116 117Spirit is greedy --using straight forward, naive RD. It is certainly 118possible to implement the fully backtracking scheme presented above, but 119there will be also certainly be a performance hit. The scheme will 120always try to match all possible parser paths (full parser hierarchy 121traversal) until it reaches a point of certainty, that the whole thing 122matches or fails to match. 123 124[:Backtracking and Greedy RD 125 126Spirit is quite consistent and intuitive about when it backtracks and to 127where, although it may not be obvious to those coming from different 128backgrounds. In general, any (sub)parser will, given the same input, 129always match the same portion of the input (or fail to match the input 130at all). This means that Spirit is inherently greedy. Spirit will only 131backtrack when a (sub)parser fails to match the input, and it will 132always backtrack to the next choice point upward (not backward) in the 133parser structure. In other words abb|ab will match `"ab"`, as will 134`a(bb|b)`, but `(ab|a)b` won't because the `(ab|a)` subparser will 135always match the `'b'` after the `'a'` if it is available. 136 137--Rainer Deyke] 138 139This is the very nature of __peg__. 140 141There's a strong preference on "simplicity with all the knobs when you 142need them" approach, right now. On the other hand, the flexibility of 143Spirit makes it possible to have different optional schemes available. 144It might be possible to implement an exhaustive backtracking RD scheme 145as an optional feature in the future. 146 147[endsect] 148