<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
  <meta http-equiv="Content-Language" content="en-us">
  <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
  <meta name="GENERATOR" content="Microsoft FrontPage 6.0">
  <meta name="ProgId" content="FrontPage.Editor.Document">

  <title>Boost Char Separator</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
  <p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
  "86"><br></p>

  <h1>char_separator&lt;Char, Traits&gt;</h1>

  <p>The <tt>char_separator</tt> class breaks a sequence of characters into
  tokens based on character delimiters, much in the same way that
  <tt>strtok()</tt> does (but without all the evils of non-reentrancy and
  destruction of the input sequence).</p>

  <p>The <tt>char_separator</tt> class is used in conjunction with the
  <a href="token_iterator.htm"><tt>token_iterator</tt></a> or <a href=
  "tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.</p>

  <h2>Definitions</h2>

  <p>The <tt>strtok()</tt> function does not include the delimiter
  characters it matches in the output sequence of tokens. However, sometimes
  it is useful to have the delimiters show up in the output sequence, so
  <tt>char_separator</tt> provides this as an option. We refer to
  delimiters that show up as output tokens as <b><i>kept delimiters</i></b>
  and delimiters that do not show up as output tokens as <b><i>dropped
  delimiters</i></b>.</p>

  <p>When two delimiters appear next to each other in the input sequence,
  there is the question of whether to output an <b><i>empty token</i></b> or
  to skip ahead. The behaviour of <tt>strtok()</tt> is to skip ahead, so it
  never returns an empty token.
The <tt>char_separator</tt> class provides both options.</p>

  <h2>Examples</h2>

  <p>This first example shows how to use <tt>char_separator</tt> as a
  replacement for the <tt>strtok()</tt> function. We've specified three
  character delimiters, and they will not show up as output tokens. We have
  not specified any kept delimiters, and by default any empty tokens will be
  ignored.</p>

  <blockquote>
    <pre>
// char_sep_example_1.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
#include &lt;cstdlib&gt; // for EXIT_SUCCESS

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  // Drop '-', ';' and '|'; no kept delimiters; empty tokens dropped (default).
  boost::char_separator&lt;char&gt; sep("-;|");
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;Hello&gt; &lt;world&gt; &lt;foo&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt;
</pre>
  </blockquote>

  <p>The next example shows tokenizing with two dropped delimiters '-' and
  ';' and a single kept delimiter '|'.
We also specify that empty tokens
  should show up in the output when two delimiters are next to each
  other.</p>

  <blockquote>
    <pre>
// char_sep_example_2.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
#include &lt;cstdlib&gt; // for EXIT_SUCCESS

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  // Drop '-' and ';', keep '|' as a token, and keep empty tokens.
  boost::char_separator&lt;char&gt; sep("-;", "|", boost::keep_empty_tokens);
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;&gt; &lt;&gt; &lt;Hello&gt; &lt;|&gt; &lt;world&gt; &lt;|&gt; &lt;&gt; &lt;|&gt; &lt;&gt; &lt;foo&gt; &lt;&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; &lt;|&gt; &lt;&gt;
</pre>
  </blockquote>

  <p>The final example shows tokenizing on punctuation and whitespace
  characters using the default constructor of
  <tt>char_separator</tt>.</p>

  <blockquote>
    <pre>
// char_sep_example_3.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
#include &lt;cstdlib&gt; // for EXIT_SUCCESS

int main()
{
  std::string str = "This is, a test";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; Tok;
  // Default constructed: whitespace is dropped, punctuation is kept.
  boost::char_separator&lt;char&gt; sep;
  Tok tok(str, sep);
  for (Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;This&gt; &lt;is&gt; &lt;,&gt; &lt;a&gt; &lt;test&gt;
</pre>
  </blockquote>

  <h2>Template parameters</h2>

  <table border summary="">
    <tr>
      <th>Parameter</th>

      <th>Description</th>

      <th>Default</th>
    </tr>

    <tr>
      <td><tt>Char</tt></td>

      <td>The type of elements within a token, typically <tt>char</tt>.</td>

      <td>&nbsp;</td>
    </tr>

    <tr>
      <td><tt>Traits</tt></td>

      <td>The character traits class for <tt>Char</tt>.</td>

      <td><tt>std::char_traits&lt;Char&gt;</tt></td>
    </tr>
  </table>

  <h2>Model of</h2><a href="tokenizerfunction.htm">Tokenizer Function</a>

  <h2>Members</h2>
  <hr>
  <pre>
explicit char_separator(const Char* dropped_delims,
                        const Char* kept_delims = "",
                        empty_token_policy empty_tokens = drop_empty_tokens)
</pre>

  <p>This creates a <tt>char_separator</tt> object, which can then be used to
  create a <a href="token_iterator.htm"><tt>token_iterator</tt></a> or
  <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing. The
  <tt>dropped_delims</tt> and <tt>kept_delims</tt> are strings of characters
  in which each character is used as a delimiter during tokenizing. Whenever a
  delimiter is seen in the input sequence, the current token is finished and
  a new token begins. The delimiters in <tt>dropped_delims</tt> do not show
  up as tokens in the output, whereas the delimiters in <tt>kept_delims</tt>
  do show up as tokens. If <tt>empty_tokens</tt> is
  <tt>drop_empty_tokens</tt>, then empty tokens will not show up in the
  output. If <tt>empty_tokens</tt> is <tt>keep_empty_tokens</tt>, then empty
  tokens will show up in the output.</p>
  <hr>
  <pre>
explicit char_separator()
</pre>

  <p>This creates a <tt>char_separator</tt> that uses
  <tt>std::isspace()</tt> to identify dropped delimiters and
  <tt>std::ispunct()</tt> to identify kept delimiters. In addition, empty
  tokens are dropped.</p>
  <hr>
  <pre>
template &lt;typename InputIterator, typename Token&gt;
bool operator()(InputIterator&amp; next, InputIterator end, Token&amp; tok)
</pre>

  <p>This function is called by the <a href=
  "token_iterator.htm"><tt>token_iterator</tt></a> to perform tokenizing.
The
  user typically does not call this function directly.</p>
  <hr>

  <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
  "../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
  height="31" width="88"></a></p>

  <p>Revised
  <!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
  December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>

  <p><i>Copyright &copy; 2001-2002 Jeremy Siek and John R. Bandela</i></p>

  <p><i>Distributed under the Boost Software License, Version 1.0. (See
  accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
  copy at <a href=
  "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>