<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
  <meta http-equiv="Content-Language" content="en-us">
  <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

  <title>Type-safe 'printf-like' format class</title>
</head>

<body bgcolor="#FFFFFF" text="#000000">
  <h1><img align="middle" alt="boost.png (6897 bytes)" height="86" src=
  "../../../boost.png" width="277">Type-safe 'printf-like' <b>format
  class</b></h1>

  <h2>Choices made</h2>

  <p>"Le pourquoi du comment" ("the why of the how")</p>
  <hr>

  <h3>The syntax of the format-string</h3>

  <p>Format is a new library. One of its goals is to provide a replacement
  for printf, which means that format must be able to parse a format-string
  designed for printf, apply it to the given arguments, and produce the same
  result as printf would have.<br>
  With this constraint, there were roughly three possible choices for the
  syntax of the format-string:</p>

  <ol>
    <li>Use exactly the same syntax as printf. It is well known by many
    experienced users, and fits almost all needs. But with C++ streams, the
    type-conversion character, which is crucial to determine the end of a
    directive, is only useful for setting some associated formatting options
    (%x for hexadecimal output, etc.), since the stream already knows the
    argument's type. It would be better to make this mandatory
    type-conversion character optional, with a modified meaning.</li>

    <li>Extend printf syntax while maintaining compatibility, by using
    characters and constructs that are not valid printf syntax, e.g.
    "%1%", "%[1]", "%|1$d|", etc.
    Using begin/end marks, all sorts of extensions can be considered.</li>

    <li>Provide a non-legacy mode, in parallel with the printf-compatible
    one, that can be designed to fit other objectives without the constraint
    of compatibility with the existing printf syntax.<br>
    But designing a replacement for printf's syntax that would be clearly
    better, and just as powerful, is a different task from building a format
    class. When such a syntax is designed, we should consider splitting
    Boost.Format into two separate libraries: one working hand in hand with
    this new syntax, and another supporting the legacy syntax (possibly a
    fast version, built with safety improvements over snprintf or the
    like).</li>
  </ol>In the absence of a complete, clever new syntax clearly better adapted
  to C++ streams than printf's, the second approach was chosen. Boost.Format
  uses printf's syntax, with extensions (tabulations, centered alignment)
  expressed as extensions of that syntax.<br>
  In addition, alternative compatible notations are provided to address
  printf's weaknesses:

  <ul>
    <li><i>"%<b>N</b>%"</i> as a simpler positional, typeless and optionless
    notation.</li>

    <li><i>%|spec|</i> as a way to enclose a printf directive in a more
    visually evident structure, while making printf's 'type-conversion
    character' optional.</li>
  </ul>
  <hr>

  <h3>Why are arguments passed through an operator rather than a function
  call?</h3>
  For some people, the drawback of the operator approach is that it might be
  confusing.
  It is a common warning that overloading operators too much gets people
  really confused.<br>
  But since format objects will be used in specific contexts (most often
  right after a "cout &lt;&lt; "), and such code does look like a
  format-string followed by its arguments:

  <blockquote>
    <pre>
format(" %s at %s with %s\n") % x % y % z;
</pre>
  </blockquote>we can hope it won't confuse people that much.

  <p>Another fear about operators is precedence problems. What if I someday
  write <b>format("%s") % x+y</b><br>
  instead of <i>format("%s") % (x+y)</i>?<br>
  The mistake shows up at compile time, so it is detected immediately.<br>
  Indeed, this line calls <i>tmp = operator%(format("%s"), x)</i><br>
  and then <i>operator+(tmp, y)</i>.<br>
  tmp is a format object, for which no implicit conversion is defined, so the
  call to operator+ fails (unless you define such an operator, of course).
  You can therefore safely assume that precedence mistakes will be caught at
  compile time.</p>

  <p>On the other hand, the function approach has a real drawback: it
  requires defining lots of function templates like:</p>

  <blockquote>
    <pre>
template &lt;class T1, class T2, ..., class TN&gt;
string format(string s, const T1&amp; x1, ..., const TN&amp; xN);
</pre>
  </blockquote>and even if we defined those for N up to 500, that would still
  be a limitation that C's printf does not have.<br>
  Also, since format emulates printf in some cases but is far from fully
  equivalent to it, it is best to give it a radically different appearance,
  and operator calls achieve that very well.
  <p>Anyhow, if we actually chose the function-template approach, it would
  only be able to print classes T for which there is an</p>

  <blockquote>
    <pre>
operator&lt;&lt;(stream, const T&amp;)
</pre>
  </blockquote>because allowing both const and non-const references produces a
  combinatorial explosion: for 10 arguments alone, 2<sup>10</sup> = 1024
  overloads would be needed.<br>
  (Providing overloads on both T&amp; and const T&amp; is at the frontier of
  defects in the C++ standard, and thus is far from guaranteed to be
  supported, although several compilers currently support such
  overloads.)<br>
  A class that only provides the non-const equivalent is very likely badly
  designed, but this is still one more unjustified restriction on the
  user.<br>
  Also, some manipulators are functions, and cannot be passed as const
  references, so the function-call approach does not support manipulators
  well.

  <p>In conclusion, a dedicated binary operator is the simplest, most robust,
  and least restrictive mechanism for passing arguments when the number of
  arguments cannot be known at compile time.</p>
  <hr>

  <h3>Why operator% rather than a member function 'with(..)'?</h3>
  Technically,

  <blockquote>
    <pre>
format(fstr) % x1 % x2 % x3;
</pre>
  </blockquote>has the same structure as

  <blockquote>
    <pre>
format(fstr).with(x1).with(x2).with(x3);
</pre>
  </blockquote>which does not have any precedence problem. The only drawback is
  that it is harder for the eye to catch what is done in this line than when
  operators are used: with calls to .with(..), it looks just like any other
  line of code. So it may be a better solution, depending on taste. The extra
  characters and the overall cluttered look of a line using 'with(..)' were
  enough for me to opt for a true operator.
  <hr>

  <h3>Why operator% rather than the usual formatting operator&lt;&lt;?</h3>

  <ul>
    <li>Because passing arguments to a format object is <b>not</b> the same
    as sending variables sequentially into a stream, and because a format
    object is neither a stream nor a manipulator.<br>
    We use an operator to pass arguments; format uses them as a function
    would, simply taking them one by one.<br>
    Format objects cannot provide stream-like behaviour. When you implement a
    format object that acts like a manipulator, returning a stream, you make
    the user believe it is exactly like a stream manipulator, and sooner or
    later that belief is proven wrong.<br>
    The most obvious example of this difference in behaviour is

    <blockquote>
      <pre>
cout &lt;&lt; format("%s %s ") &lt;&lt; x;
cout &lt;&lt; y;  // uh-oh: format is not really a stream manipulator, so y bypasses it
</pre>
    </blockquote>
  </li>

  <li>The precedence of % is higher than that of &lt;&lt;. This can be
    viewed as a problem, because arguments involving + and - then need to be
    enclosed in parentheses, which is not necessary with '&lt;&lt;'. But if
    the user forgets, the mistake is caught at compile time, and hopefully he
    won't forget again.<br>
    On the other hand, the higher precedence makes format's behaviour very
    straightforward:

    <blockquote>
      <pre>
cout &lt;&lt; format("%s %s ") % x % y &lt;&lt; endl;
</pre>
    </blockquote>is treated exactly like:

    <blockquote>
      <pre>
cout &lt;&lt; ( format("%s %s ") % x % y ) &lt;&lt; endl;
</pre>
    </blockquote>So with %, the lifetime of a format object does not interfere
    with the surrounding stream context. This is the simplest possible
    behaviour, and the user can keep using the stream after the format
    object.<br>
    <br>
    With operator&lt;&lt;, things are much more problematic in this
    situation.
    This line:

    <blockquote>
      <pre>
cout &lt;&lt; format("%s %s ") &lt;&lt; x &lt;&lt; y &lt;&lt; endl;
</pre>
    </blockquote>is parsed as:

    <blockquote>
      <pre>
( ( ( cout &lt;&lt; format("%s %s ") ) &lt;&lt; x ) &lt;&lt; y ) &lt;&lt; endl;
</pre>
    </blockquote>Several alternative implementations chose operator&lt;&lt;,
    and there is only one way to make it work: the first call to

    <blockquote>
      <pre>
operator&lt;&lt;(ostream&amp;, format const&amp;)
</pre>
    </blockquote>returns a proxy encapsulating both the final destination
    (cout) and the format-string information.<br>
    Passing an argument to the format and passing one to the final
    destination after the format is complete then become indistinguishable.
    This is a problem.

    <p>I examined several possible implementations, and none is completely
    satisfying.<br>
    For example, to catch user mistakes, it makes sense to raise exceptions
    when the user passes too many arguments. But in this context, extra
    arguments are most likely aimed at the final destination. There are
    several choices here:</p>

    <ul>
      <li>You can give up detecting excess arguments, and have the proxy's
      template member operator&lt;&lt;(const T&amp;) simply forward all extra
      arguments to cout.</li>

      <li>Require the user to close the format arguments with a special
      manipulator, 'endf', like this:

        <blockquote>
          <pre>
cout &lt;&lt; format("%s %s ") &lt;&lt; x &lt;&lt; y &lt;&lt; endf &lt;&lt; endl;
</pre>
        </blockquote>You can define endf as a function that returns the final
        destination stored inside the proxy. Then everything works: after
        endf, the user is calling &lt;&lt; on cout again.
      </li>

      <li>An intermediate solution is to address the most frequent case,
      where the user simply wants to output one more manipulator to cout
      (std::flush, endl, etc.):
        <blockquote>
          <pre>
cout &lt;&lt; format("%s %s \n") &lt;&lt; x &lt;&lt; y &lt;&lt; flush;
</pre>
        </blockquote>The solution is then to overload the proxy's
        operator&lt;&lt; for manipulators. This way you don't need endf, but
        outputting a non-manipulator item right after the format arguments is
        an error.
      </li>
    </ul><br>
    The most complete solution is the one with the endf manipulator. With
    operator%, there is no need for this end-format function; moreover, you
    instantly see which arguments go into the format object and which go to
    the stream.
  </li>

  <li>Aesthetically: '%' is the same character as the one used inside the
  format-string, and it is quite nice to use that same character for
  passing each argument. '&lt;&lt;' is two characters, '%' is one, and '%'
  is also visually smaller. Overall, it makes the code easier to read (we
  see what goes with what):

    <blockquote>
      <pre>
cout &lt;&lt; format("%s %s %s") %x %y %z &lt;&lt; "And avg is" &lt;&lt; format("%s\n") %avg;
</pre>
    </blockquote>compared to:

    <blockquote>
      <pre>
cout &lt;&lt; format("%s %s %s") &lt;&lt; x &lt;&lt; y &lt;&lt; z &lt;&lt; endf &lt;&lt; "And avg is" &lt;&lt; format("%s\n") &lt;&lt; avg;
</pre>
    </blockquote>"&lt;&lt;" misleadingly puts the arguments at the same level as
    any object passed to the stream.
  </li>

  <li>Python also uses % for formatting, so it is not so "unheard of"
  ;-)</li>
</ul>
<hr>

<h3>Why operator% rather than operator() or operator[]?</h3>

<p>operator() has the merit of being the natural way to send an argument into
a function, and some think that the meaning of operator[] fits format's usage
well.<br>
Technically, they are as good as operator%, but quite ugly.
(That is a matter of taste.)<br>
And deep down, using operator% to pass arguments that are referred to by "%"
in the format-string seems much more natural to me than using those
operators.</p>
<hr>

<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>

<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->02 December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38510" --></p>

<p><i>Copyright &copy; 2001 Samuel Krempp</i></p>

<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>