<html>
<head>
<title>Character Sets</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>

<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
  <tr>
    <td width="10">
    </td>
    <td width="85%">
      <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Character Sets</b></font>
    </td>
    <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
  </tr>
</table>
<br>
<table border="0">
  <tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="loops.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="30"><a href="confix.html"><img src="theme/r_arr.gif" border="0"></a></td>
  </tr>
</table>
<p>The character set <tt>chset</tt> matches a set of characters over a finite
  range bounded by the limits of its template parameter <tt>CharT</tt>. This class
  is an optimization of a parser that acts on a set of single characters. The
  template class is parameterized by the character type <tt>CharT</tt> and can
  work efficiently with 8, 16, 32 and even 64 bit characters.</p>
<pre><span class=identifier>    </span><span class=keyword>template </span><span class=special>&lt;</span><span class=keyword>typename </span><span class=identifier>CharT </span><span class=special>= </span><span class=keyword>char</span><span class=special>&gt;
    </span><span class=keyword>class </span><span class=identifier>chset</span><span class=special>;</span></pre>
<p>A <tt>chset</tt> can be constructed from literals (e.g. <tt>'x'</tt>), <tt>ch_p</tt>
  or <tt>chlit&lt;&gt;</tt>, <tt>range_p</tt> or <tt>range&lt;&gt;</tt>, <tt>anychar_p</tt>
  and <tt>nothing_p</tt> (see <a href="primitives.html">primitives</a>), or copy-constructed
  from another <tt>chset</tt>.
  The <tt>chset</tt> class uses a copy-on-write scheme
  that enables instances to be passed around easily by value.</p>
<table width="80%" border="0" align="center">
  <tr>
    <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Sparse
      bit vectors</b><br>
      <br>
      To accommodate 16, 32 and 64 bit characters, the <tt>chset</tt> class
      selects its implementation at compile time: a <tt>std::bitset</tt> when the
      character type is not greater than 8 bits, and otherwise a sparse bit/boolean set that
      uses a sorted vector of disjoint ranges (<tt>range_run</tt>). The set is
      constructed from ranges such that adjacent or overlapping ranges are coalesced.<br>
      <br>
      <tt>range_run</tt>s are very space-economical in situations where there are lots
      of ranges and a few individual disjoint values. Searching is O(log n), where
      n is the number of ranges.</td>
  </tr>
</table>
<p> Examples:<br>
</p>
<pre><span class=identifier>    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s1</span><span class=special>(</span><span class=literal>'x'</span><span class=special>);
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s2</span><span class=special>(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>s1</span><span class=special>);</span></pre>
<p>Character sets may also be constructed from a definition string
  whose syntax resembles POSIX-style regular expression character sets,
  except that double quotes delimit the set elements instead of square brackets
  and there is no special negation <tt>^</tt> character.</p>
<pre>    <span class=identifier>range </span><span class=special>= </span><span class=identifier>anychar_p </span><span class=special>&gt;&gt; </span><span class=literal>'-' </span><span class=special>&gt;&gt; </span><span class=identifier>anychar_p</span><span class=special>;
    </span><span class=identifier>set </span><span class=special>= *(</span><span class=identifier>range </span><span class=special>| </span><span class=identifier>anychar_p</span><span class=special>);</span></pre>
<p>Since the set is defined using a C string, the usual C/C++ string literal
  syntax rules, including escape sequences, apply. Examples:<br>
</p>
<pre>    <span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s1</span><span class=special>(</span><span class=string>"a-zA-Z"</span><span class=special>);       </span><span class=comment>// alphabetic characters
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s2</span><span class=special>(</span><span class=string>"0-9a-fA-F"</span><span class=special>);    </span><span class=comment>// hexadecimal characters
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s3</span><span class=special>(</span><span class=string>"actgACTG"</span><span class=special>);     </span><span class=comment>// DNA identifiers
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s4</span><span class=special>(</span><span class=string>"\x7f\x7e"</span><span class=special>);     </span><span class=comment>// Hexadecimal 0x7F and 0x7E</span></pre>
<p>The standard Spirit set operators apply (see <a href="operators.html">operators</a>),
  plus an additional character-set-specific inverse (negation <tt>~</tt>) operator:</p>

<table width="90%" border="0" align="center">
  <tr>
    <td class="table_title" colspan="2">Character set operators</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>~a</b></td>
    <td class="table_cells" width="72%">Set inverse</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a | b</b></td>
    <td class="table_cells" width="72%">Set union</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a &amp; b</b></td>
    <td class="table_cells" width="72%">Set intersection</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a - b</b></td>
    <td class="table_cells" width="72%">Set difference</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a ^ b</b></td>
    <td class="table_cells" width="72%">Set xor</td>
  </tr>
</table>
<p>where the operands <tt>a</tt> and <tt>b</tt> are both <tt>chset</tt>s, or one of the operands is
  a literal character, <tt>ch_p</tt> or <tt>chlit</tt>, <tt>range_p</tt> or <tt>range</tt>,
  <tt>anychar_p</tt> or <tt>nothing_p</tt>. Special optimized overloads are provided
  for <tt>anychar_p</tt> and <tt>nothing_p</tt> operands. A <tt>nothing_p</tt>
  operand is converted to an empty set, while an <tt>anychar_p</tt> operand is
  converted to a set containing the full range of the character type used
  (e.g. 0-255 for unsigned 8 bit chars).</p>
<p>A special case is <tt>~anychar_p</tt>, which yields <tt>nothing_p</tt>, but
  <tt>~nothing_p</tt> is illegal. Inversion of <tt>anychar_p</tt> is asymmetrical,
  a one-way trip comparable to converting <tt>T*</tt> to a <tt>void*</tt>.</p>
<table width="90%" border="0" align="center">
  <tr>
    <td class="table_title" colspan="2">Special conversions</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>chset&lt;CharT&gt;(nothing_p)</b></td>
    <td class="table_cells" width="72%">empty set</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>chset&lt;CharT&gt;(anychar_p)</b></td>
    <td class="table_cells" width="72%">full range of CharT (e.g.
      0-255 for unsigned
      8 bit chars)</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>~anychar_p</b></td>
    <td class="table_cells" width="72%">nothing_p</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>~nothing_p</b></td>
    <td class="table_cells" width="72%">illegal</td>
  </tr>
</table>

<p></p><table border="0">
  <tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="loops.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="30"><a href="confix.html"><img src="theme/r_arr.gif" border="0"></a></td>
  </tr>
</table>
<br>
<hr size="1">
<p class="copyright">Copyright &copy; 1998-2003 Joel de Guzman<br>
  <br>
<font size="2">Use, modification and distribution is subject to the Boost Software
    License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
    http://www.boost.org/LICENSE_1_0.txt) </font> </p>
</body>
</html>