<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>

<title>Primitives</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css"></head>
<body>
<table background="theme/bkd2.gif" border="0" cellspacing="2" width="100%">
  <tbody><tr>
    <td width="10">
    </td>
    <td width="85%">
      <font face="Verdana, Arial, Helvetica, sans-serif" size="6"><b>Primitives</b></font>
    </td>
    <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" align="right" border="0" height="48" width="112"></a></td>
  </tr>
</tbody></table>
<br>
<table border="0">
  <tbody><tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="organization.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="30"><a href="operators.html"><img src="theme/r_arr.gif" border="0"></a></td>
  </tr>
</tbody></table>
<p>The framework predefines some parser primitives. These are the most basic building
  blocks that the client uses to build more complex parsers. These primitive parsers
  are template classes, making them very flexible.</p>
<p>These primitive parsers can be instantiated directly or through a templatized
  helper function. Generally, the helper function is far simpler to deal with
  as it involves less typing.</p>
<p>We have seen the character literal parser before through the generator function
  <tt>ch_p</tt>, which is not really a parser but, rather, a parser generator.
  Class <tt>chlit&lt;CharT&gt;</tt> is the actual template class behind the character
  literal parser. To instantiate a <tt>chlit</tt> object directly, you provide
  the character type, <tt>CharT</tt>, as a template parameter. This type typically
  corresponds to the input type, usually <tt>char</tt> or <tt>wchar_t</tt>. 
The following expression creates
  a temporary parser object which will recognize the single letter <span class="quotes">'X'</span>.</p>
<pre><code><font color="#000000"><span class="identifier"> </span><span class="identifier">chlit</span><span class="special">&lt;</span><span class="keyword">char</span><span class="special">&gt;(</span><span class="literal">'X'</span><span class="special">);</span></font></code></pre>
<p>Using <tt>chlit</tt>'s generator function <tt>ch_p</tt> simplifies the usage
  of the <tt>chlit&lt;&gt;</tt> class (this is true of most Spirit parser classes
  since most have corresponding generator functions). It is convenient to call
  the function because the compiler will deduce the template type through argument
  deduction for us. The example above could be expressed less verbosely using
  the <tt>ch_p</tt> helper function.</p>
<pre><code><font color="#000000"><span class="special"> </span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">'X'</span><span class="special">) </span><span class="comment">// equivalent to chlit&lt;char&gt;('X') object</span></font></code></pre>
<table align="center" border="0" width="80%">
  <tbody><tr>
    <td class="note_box"><img src="theme/lens.gif" height="16" width="15"> <b>Parser
      generators</b><br>
      <br>
      Whenever you see an invocation of the parser generator function, it is equivalent
      to the parser itself. Therefore, we often call <tt>ch_p</tt> a character
      parser, even if, technically speaking, it is a function that generates a
      character parser.</td>
  </tr>
</tbody></table>
<p>The following grammar snippet shows these forms in action:</p>
<pre><code><span class="comment"> </span><span class="comment">// a rule can "store" a parser object. 
Rules are covered<br> </span><span class="comment">// later, but for now just consider a rule as an opaque type<br> </span><span class="identifier">rule</span><span class="special">&lt;&gt; </span><span class="identifier">r1</span><span class="special">, </span><span class="identifier">r2</span><span class="special">, </span><span class="identifier">r3</span><span class="special">;<br><br> </span><span class="identifier">chlit</span><span class="special">&lt;</span><span class="keyword">char</span><span class="special">&gt; </span><span class="identifier">x</span><span class="special">(</span><span class="literal">'X'</span><span class="special">); </span><span class="comment">// declare a parser named x<br><br> </span><span class="identifier">r1 </span><span class="special">= </span><span class="identifier">chlit</span><span class="special">&lt;</span><span class="keyword">char</span><span class="special">&gt;(</span><span class="literal">'X'</span><span class="special">); </span><span class="comment">// explicit declaration<br> </span><span class="identifier">r2 </span><span class="special">= </span><span class="identifier">x</span><span class="special">; </span><span class="comment">// using x<br> </span><span class="identifier">r3 </span><span class="special">= </span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">'X'</span><span class="special">); </span><span class="comment">// using the generator</span></code></pre>
<h2>chlit and ch_p</h2>
<p>Matches a single character literal. <tt>chlit</tt> has a single template type
  parameter which defaults to <tt>char</tt> (i.e. <tt>chlit&lt;&gt;</tt> is equivalent
  to <tt>chlit&lt;char&gt;</tt>). This type parameter is the character type that
  <tt>chlit</tt> will recognize when parsing. The generator function version, <tt>ch_p</tt>,
  deduces the template type parameter from the actual function argument. 
The <tt>chlit</tt>
  class constructor accepts a single parameter: the character it will match the
  input against. Examples:</p>
<pre><code><span class="comment"> </span><span class="identifier">r1 </span><span class="special">= </span><span class="identifier">chlit</span><span class="special">&lt;&gt;(</span><span class="literal">'X'</span><span class="special">);<br> </span><span class="identifier">r2 </span><span class="special">= </span><span class="identifier">chlit</span><span class="special">&lt;</span><span class="keyword">wchar_t</span><span class="special">&gt;(</span><span class="identifier">L</span><span class="literal">'X'</span><span class="special">);<br> </span><span class="identifier">r3 </span><span class="special">= </span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">'X'</span><span class="special">);</span></code></pre>
<p>Going back to our original example:</p>
<pre><code><span class="special"> </span><span class="identifier">group </span><span class="special">= </span><span class="literal">'(' </span><span class="special">&gt;&gt; </span><span class="identifier">expr </span><span class="special">&gt;&gt; </span><span class="literal">')'</span><span class="special">;<br> </span><span class="identifier">expr1 </span><span class="special">= </span><span class="identifier">integer </span><span class="special">| </span><span class="identifier">group</span><span class="special">;<br> </span><span class="identifier">expr2 </span><span class="special">= </span><span class="identifier">expr1 </span><span class="special">&gt;&gt; </span><span class="special">*((</span><span class="literal">'*' </span><span class="special">&gt;&gt; </span><span class="identifier">expr1</span><span class="special">) </span><span class="special">| </span><span class="special">(</span><span class="literal">'/' </span><span class="special">&gt;&gt; </span><span class="identifier">expr1</span><span class="special">));<br> </span><span class="identifier">expr 
</span><span class="special">= </span><span class="identifier">expr2 </span><span class="special">&gt;&gt; </span><span class="special">*((</span><span class="literal">'+' </span><span class="special">&gt;&gt; </span><span class="identifier">expr2</span><span class="special">) </span><span class="special">| </span><span class="special">(</span><span class="literal">'-' </span><span class="special">&gt;&gt; </span><span class="identifier">expr2</span><span class="special">));</span></code></pre>
<p>Here, the character literals <tt class="quotes">'('</tt>, <tt class="quotes">')'</tt>,
  <tt class="quotes">'+'</tt>, <tt class="quotes">'-'</tt>, <tt class="quotes">'*'</tt>
  and <tt class="quotes">'/'</tt> in the grammar declaration are <tt>chlit</tt>
  objects that are implicitly created behind the scenes.</p>
<table align="center" border="0" width="80%">
  <tbody><tr>
    <td class="note_box"><img src="theme/lens.gif" height="16" width="15"> <b>char
      operands</b> <br>
      <br>
      This works because of two special templatized overloads of <tt>operator<span class="operators">&gt;&gt;</span></tt>
      that take (<tt>char</tt>, <tt>ParserT</tt>) or (<tt>ParserT</tt>, <tt>char</tt>). 
      These functions convert the character into a <tt>chlit</tt> object.</td>
  </tr>
</tbody></table>
<p>One may prefer to declare these explicitly as:</p>
<pre><code><span class="special"> </span><span class="identifier">chlit</span><span class="special">&lt;&gt; </span><span class="identifier">plus</span><span class="special">(</span><span class="literal">'+'</span><span class="special">);<br> </span><span class="identifier">chlit</span><span class="special">&lt;&gt; </span><span class="identifier">minus</span><span class="special">(</span><span class="literal">'-'</span><span class="special">);<br> </span><span class="identifier">chlit</span><span class="special">&lt;&gt; </span><span class="identifier">times</span><span class="special">(</span><span class="literal">'*'</span><span class="special">);<br> </span><span class="identifier">chlit</span><span class="special">&lt;&gt; </span><span class="identifier">divide</span><span class="special">(</span><span class="literal">'/'</span><span class="special">);<br> </span><span class="identifier">chlit</span><span class="special">&lt;&gt; </span><span class="identifier">oppar</span><span class="special">(</span><span class="literal">'('</span><span class="special">);<br> </span><span class="identifier">chlit</span><span class="special">&lt;&gt; </span><span class="identifier">clpar</span><span class="special">(</span><span class="literal">')'</span><span class="special">);</span></code></pre>
<h2>range and range_p</h2>
<p>A <tt>range</tt> of characters is created from a low/high character pair. Such
  a parser matches a single character that is in the <tt>range</tt>, including
  both endpoints. Like <tt>chlit</tt>, <tt>range</tt> has a single template type
  parameter which defaults to <tt>char</tt>. The <tt>range</tt> class constructor
  accepts two parameters: the bounds of the character range (<i>from</i> and <i>to</i>, inclusive)
  it will match the input against. The generator function version is <tt>range_p</tt>. 
  Examples:</p>
<pre><code><span class="special"> </span><span class="identifier">range</span><span class="special">&lt;&gt;(</span><span class="literal">'A'</span><span class="special">,</span><span class="literal">'Z'</span><span class="special">) </span><span class="comment">// matches 'A'..'Z'<br> </span><span class="identifier">range_p</span><span class="special">(</span><span class="literal">'a'</span><span class="special">,</span><span class="literal">'z'</span><span class="special">) </span><span class="comment">// matches 'a'..'z'</span></code></pre>
<p>Note that the first character must be "before" the second, according
  to the underlying character encoding. The <tt>range</tt> parser, like <tt>chlit</tt>, is a
  single character parser.</p>
<table align="center" border="0" width="80%">
  <tbody><tr>
    <td class="note_box"><img src="theme/alert.gif" height="16" width="16"><b>
      Character mapping</b><br>
      <br>
      Character mapping is inherently platform dependent. The standard does not
      guarantee, for example, that 'A' &lt; 'Z'. In many cases, however,
      we know which character set we are using, such as ASCII, ISO-8859-1
      or Unicode. Take care, though, when porting to another platform.</td>
  </tr>
</tbody></table>
<h2>strlit and str_p</h2>
<p>This parser matches a string literal. <tt>strlit</tt> has a single template
  type parameter: an iterator type. Internally, <tt>strlit</tt> holds a begin/end
  iterator pair pointing to a string or a container of characters. The <tt>strlit</tt>
  attempts to match the current input stream with this string. The template type
  parameter defaults to <tt>char const<span class="operators">*</span></tt>. <tt>strlit</tt>
  has two constructors. The first accepts a null-terminated character pointer.
  This constructor may be used to build <tt>strlit</tt> objects from quoted string literals.
  The second constructor takes a first/last iterator pair. 
The generator function
  version is <tt>str_p</tt>. Examples:</p>
<pre><code><span class="comment"> </span><span class="identifier">strlit</span><span class="special">&lt;&gt;(</span><span class="string">"Hello World"</span><span class="special">)<br> </span><span class="identifier">str_p</span><span class="special">(</span><span class="string">"Hello World"</span><span class="special">)<br><br> </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string </span><span class="identifier">msg</span><span class="special">(</span><span class="string">"Hello World"</span><span class="special">);<br> </span><span class="identifier">strlit</span><span class="special">&lt;</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">::</span><span class="identifier">const_iterator</span><span class="special">&gt;(</span><span class="identifier">msg</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(), </span><span class="identifier">msg</span><span class="special">.</span><span class="identifier">end</span><span class="special">());</span></code></pre>
<table align="center" border="0" width="80%">
  <tbody><tr>
    <td class="note_box"><img src="theme/note.gif" height="16" width="16"> <b>Character
      and phrase level parsing</b><br>
      <br>
      Typical parsers regard the processing of characters (symbols that form words
      or lexemes) and phrases (words that form sentences) as separate domains. 
      Entities such as reserved words, operators, literal strings, numerical constants,
      etc., which constitute the terminals of a grammar, are usually extracted
      first in a separate lexical analysis stage.<br>
      <br>
      At this point, as is evident in the examples we have seen so far, it is important
      to note that, contrary to standard practice, the Spirit framework handles
      parsing tasks at both the character level and the phrase level. One
      may consider that a lexical analyzer is seamlessly integrated in the Spirit
      framework.<br>
      <br>
      Although the Spirit parser library does not need a separate lexical analyzer,
      there is no reason why we cannot have one. One can always have as many parser
      layers as needed. In theory, one may create a preprocessor, a lexical analyzer
      and a parser proper, all using the same framework.</td>
  </tr>
</tbody></table>
<h2>chseq and chseq_p</h2>
<p>Matches a character sequence. <tt>chseq</tt> has the same template type parameters
  and constructor parameters as <tt>strlit</tt>. The generator function version is <tt>chseq_p</tt>.
  Examples:</p>
<pre><code><span class="special"> </span><span class="identifier">chseq</span><span class="special">&lt;&gt;(</span><span class="string">"ABCDEFG"</span><span class="special">)<br> </span><span class="identifier">chseq_p</span><span class="special">(</span><span class="string">"ABCDEFG"</span><span class="special">)</span></code></pre>
<p><tt>strlit</tt> is an implicit lexeme. That is, it works solely on the character
  level. <tt>chseq</tt>, <tt>strlit</tt>'s twin, on the other hand, can work on
  both the character and phrase levels. This simply means that it can
  ignore white space between the characters of the string. 
For example:</p>
<pre><code><span class="special"> </span><span class="identifier">chseq</span><span class="special">&lt;&gt;(</span><span class="string">"ABCDEFG"</span><span class="special">)</span></code></pre>
<p>can parse:</p>
<pre><span class="special"> </span><span class="identifier">ABCDEFG<br> </span><span class="identifier">A </span><span class="identifier">B </span><span class="identifier">C </span><span class="identifier">D </span><span class="identifier">E </span><span class="identifier">F </span><span class="identifier">G<br> </span><span class="identifier">AB </span><span class="identifier">CD </span><span class="identifier">EFG</span></pre>
<h2>More character parsers</h2>
<p>The framework also predefines the full repertoire of single character parsers:</p>
<table align="center" border="0" width="90%">
  <tbody><tr>
    <td class="table_title" colspan="2">Single character parsers</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>anychar_p</b></td>
    <td class="table_cells" width="70%">Matches any single character (including
      the null terminator: '\0')</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>alnum_p</b></td>
    <td class="table_cells" width="70%">Matches alpha-numeric characters</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>alpha_p</b></td>
    <td class="table_cells" width="70%">Matches alphabetic characters</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>blank_p</b></td>
    <td class="table_cells" width="70%">Matches spaces or tabs</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>cntrl_p</b></td>
    <td class="table_cells" width="70%">Matches control characters</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>digit_p</b></td>
    <td class="table_cells" width="70%">Matches numeric digits</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>graph_p</b></td>
    <td class="table_cells"
width="70%">Matches non-space printing characters</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>lower_p</b></td>
    <td class="table_cells" width="70%">Matches lower case letters</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>print_p</b></td>
    <td class="table_cells" width="70%">Matches printable characters</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>punct_p</b></td>
    <td class="table_cells" width="70%">Matches punctuation symbols</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>space_p</b></td>
    <td class="table_cells" width="70%">Matches spaces, tabs, returns, and newlines</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>upper_p</b></td>
    <td class="table_cells" width="70%">Matches upper case letters</td>
  </tr>
  <tr>
    <td class="table_cells" width="30%"><b>xdigit_p</b></td>
    <td class="table_cells" width="70%">Matches hexadecimal digits</td>
  </tr>
</tbody></table>
<h2><a name="negation"></a>negation ~</h2>
<p>Single character parsers such as <tt>chlit</tt>, <tt>range</tt>, <tt>anychar_p</tt>,
  <tt>alnum_p</tt>, etc. can be negated. For example:</p>
<pre><code><span class="special"> ~</span><span class="identifier">ch_p</span><span class="special">(</span><span class="literal">'x'</span><span class="special">)</span></code></pre>
<p>matches any character except <tt>'x'</tt>. Double negation of a character parser
  cancels out the negation. 
For example, <tt>~~alpha_p</tt> is equivalent to <tt>alpha_p</tt>.</p>
<h2>eol_p</h2>
<p>Matches the end of line (CR/LF and combinations thereof).</p>
<h2><b>nothing_p</b></h2>
<p>Never matches anything and always fails.</p>
<h2>end_p</h2>
<p>Matches the end of input (returns a successful match with 0 length when the
  input is exhausted).</p><h2>eps_p</h2>
<p>The <strong>Epsilon</strong> (<tt>epsilon_p</tt> and <tt>eps_p</tt>) is a multi-purpose
  parser that returns a zero length match. See <a href="epsilon.html">Epsilon</a> for details.</p>
<table border="0">
  <tbody><tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="organization.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="30"><a href="operators.html"><img src="theme/r_arr.gif" border="0"></a></td>
  </tr>
</tbody></table>
<br>
<hr size="1">
<p class="copyright">Copyright &copy; 1998-2003 Joel de Guzman<br>
  Copyright &copy; 2003 Martin Wille<br>
  <br>
  <font size="2">Use, modification and distribution is subject to the Boost Software
  License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt) </font> </p>
<p> </p>
</body></html>