1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> 2<html> 3<head> 4<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 5<title>Appendices</title> 6<link rel="stylesheet" href="../../../doc/src/boostbook.css" type="text/css"> 7<meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> 8<link rel="home" href="../index.html" title="The Boost C++ Libraries BoostBook Documentation Subset"> 9<link rel="up" href="../xpressive.html" title="Chapter 46. Boost.Xpressive"> 10<link rel="prev" href="../boost_xpressive/acknowledgments.html" title="Acknowledgments"> 11<link rel="next" href="../yap.html" title="Chapter 47. Boost.YAP"> 12</head> 13<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> 14<table cellpadding="2" width="100%"><tr> 15<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../boost.png"></td> 16<td align="center"><a href="../../../index.html">Home</a></td> 17<td align="center"><a href="../../../libs/libraries.htm">Libraries</a></td> 18<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> 19<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> 20<td align="center"><a href="../../../more/index.htm">More</a></td> 21</tr></table> 22<hr> 23<div class="spirit-nav"> 24<a accesskey="p" href="../boost_xpressive/acknowledgments.html"><img src="../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../xpressive.html"><img src="../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../yap.html"><img src="../../../doc/src/images/next.png" alt="Next"></a> 25</div> 26<div class="section"> 27<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 28<a name="xpressive.appendices"></a><a class="link" href="appendices.html" title="Appendices">Appendices</a> 29</h2></div></div></div> 30<div class="toc"><dl class="toc"> 31<dt><span class="section"><a href="appendices.html#boost_xpressive.appendices.appendix_1__history">Appendix 32 1: History</a></span></dt> 33<dt><span class="section"><a href="appendices.html#boost_xpressive.appendices.appendix_2__not_yet_implemented">Appendix 34 2: Not Yet Implemented</a></span></dt> 35<dt><span class="section"><a href="appendices.html#boost_xpressive.appendices.appendix_3__differences_from_boost_regex">Appendix 36 3: Differences from Boost.Regex</a></span></dt> 37<dt><span class="section"><a href="appendices.html#boost_xpressive.appendices.perf">Appendix 4: Performance 38 Comparison</a></span></dt> 39<dt><span class="section"><a href="appendices.html#xpressive.appendices.appendix_5__implementation_notes">Appendix 40 5: Implementation Notes</a></span></dt> 41</dl></div> 42<div class="section"> 43<div class="titlepage"><div><div><h3 class="title"> 44<a name="boost_xpressive.appendices.appendix_1__history"></a><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history" title="Appendix 1: History">Appendix 45 1: History</a> 46</h3></div></div></div> 47<h3> 48<a name="boost_xpressive.appendices.appendix_1__history.h0"></a> 49 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_2_1_0_6_12_2008"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_2_1_0_6_12_2008">Version 50 2.1.0 6/12/2008</a> 51 </h3> 52<p> 53 New Features: 54 </p> 55<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 56<li class="listitem"> 57 <code class="computeroutput"><span class="identifier">skip</span><span class="special">()</span></code> 58 primitive for static regexes, which allows you to specify parts of the 59 input string to ignore during regex matching. 60 </li> 61<li class="listitem"> 62 Range-based <code class="computeroutput"><span class="identifier">regex_replace</span><span class="special">()</span></code> algorithm interface. 63 </li> 64<li class="listitem"> 65 <code class="computeroutput"><span class="identifier">regex_replace</span><span class="special">()</span></code> 66 accepts formatter objects and formatter lambda expressions in addition 67 to format strings. 68 </li> 69</ul></div> 70<p> 71 Bugs Fixed: 72 </p> 73<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"> 74 Semantic actions in look-aheads, look-behinds and independent sub-expressions 75 execute eagerly instead of causing a crash. 76 </li></ul></div> 77<h3> 78<a name="boost_xpressive.appendices.appendix_1__history.h1"></a> 79 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_2_0_1_10_23_2007"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_2_0_1_10_23_2007">Version 80 2.0.1 10/23/2007</a> 81 </h3> 82<p> 83 Bugs Fixed: 84 </p> 85<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"> 86 <code class="computeroutput"><span class="identifier">sub_match</span><span class="special"><></span></code> 87 constructor copies singular iterator causing debug assert. 88 </li></ul></div> 89<h3> 90<a name="boost_xpressive.appendices.appendix_1__history.h2"></a> 91 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_2_0_0__10_12_2007"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_2_0_0__10_12_2007">Version 92 2.0.0, 10/12/2007</a> 93 </h3> 94<p> 95 New Features: 96 </p> 97<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 98<li class="listitem"> 99 Semantic actions 100 </li> 101<li class="listitem"> 102 Custom assertions 103 </li> 104<li class="listitem"> 105 Named captures 106 </li> 107<li class="listitem"> 108 Dynamic regex grammars 109 </li> 110<li class="listitem"> 111 Recursive dynamic regexes with <code class="literal">(?R)</code> construct 112 </li> 113<li class="listitem"> 114 Support for searching non-character data 115 </li> 116<li class="listitem"> 117 Better errors for invalid static regexes 118 </li> 119<li class="listitem"> 120 Range-based regex algorithm interface 121 </li> 122<li class="listitem"> 123 <code class="computeroutput"><span class="identifier">match_flag_type</span><span class="special">::</span><span class="identifier">format_perl</span></code>, <code class="computeroutput"><span class="identifier">match_flag_type</span><span class="special">::</span><span class="identifier">format_sed</span></code>, 124 and <code class="computeroutput"><span class="identifier">match_flag_type</span><span class="special">::</span><span class="identifier">format_all</span></code> 125 </li> 126<li class="listitem"> 127 <code class="computeroutput"><span class="keyword">operator</span><span class="special">+(</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="identifier">sub_match</span><span class="special"><>)</span></code> 128 and variants 129 </li> 130<li class="listitem"> 131 Version 2 regex traits get <code class="computeroutput"><span class="identifier">tolower</span><span class="special">()</span></code> and <code class="computeroutput"><span class="identifier">toupper</span><span class="special">()</span></code> 132 </li> 133</ul></div> 134<p> 135 Bugs Fixed: 136 </p> 137<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"> 138 Complementing single-character sets like <code class="computeroutput"><span class="special">~(</span><span class="identifier">set</span><span class="special">=</span><span class="char">'a'</span><span class="special">)</span></code> works. 139 </li></ul></div> 140<h3> 141<a name="boost_xpressive.appendices.appendix_1__history.h3"></a> 142 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_1_0_2__april_27__2007"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_1_0_2__april_27__2007">Version 143 1.0.2, April 27, 2007</a> 144 </h3> 145<p> 146 Bugs Fixed: 147 </p> 148<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"> 149 Back-references greater than nine work as advertized. 150 </li></ul></div> 151<p> 152 This is the version that shipped as part of Boost 1.34. 153 </p> 154<h3> 155<a name="boost_xpressive.appendices.appendix_1__history.h4"></a> 156 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_1_0_1__october_2__2006"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_1_0_1__october_2__2006">Version 157 1.0.1, October 2, 2006</a> 158 </h3> 159<p> 160 Bugs Fixed: 161 </p> 162<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"> 163 <code class="computeroutput"><span class="identifier">match_results</span><span class="special">::</span><span class="identifier">position</span><span class="special">()</span></code> 164 works for nested results. 165 </li></ul></div> 166<h3> 167<a name="boost_xpressive.appendices.appendix_1__history.h5"></a> 168 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_1_0_0__march_16__2006"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_1_0_0__march_16__2006">Version 169 1.0.0, March 16, 2006</a> 170 </h3> 171<p> 172 Version 1.0! 173 </p> 174<h3> 175<a name="boost_xpressive.appendices.appendix_1__history.h6"></a> 176 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_0_9_6__august_19__2005"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_0_9_6__august_19__2005">Version 177 0.9.6, August 19, 2005</a> 178 </h3> 179<p> 180 The version reviewed for acceptance into Boost. The review began September 181 8, 2005. Xpressive was accepted into Boost on September 28, 2005. 182 </p> 183<h3> 184<a name="boost_xpressive.appendices.appendix_1__history.h7"></a> 185 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_0_9_3__june_30__2005"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_0_9_3__june_30__2005">Version 186 0.9.3, June 30, 2005</a> 187 </h3> 188<p> 189 New Features: 190 </p> 191<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 192<li class="listitem"> 193 TR1-style regex_traits interface 194 </li> 195<li class="listitem"> 196 Speed enhancements 197 </li> 198<li class="listitem"> 199 <code class="computeroutput"><span class="identifier">syntax_option_type</span><span class="special">::</span><span class="identifier">ignore_white_space</span></code> 200 </li> 201</ul></div> 202<h3> 203<a name="boost_xpressive.appendices.appendix_1__history.h8"></a> 204 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_0_9_0__september_2__2004"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_0_9_0__september_2__2004">Version 205 0.9.0, September 2, 2004</a> 206 </h3> 207<p> 208 New Features: 209 </p> 210<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"> 211 It sort of works. 212 </li></ul></div> 213<h3> 214<a name="boost_xpressive.appendices.appendix_1__history.h9"></a> 215 <span class="phrase"><a name="boost_xpressive.appendices.appendix_1__history.version_0_0_1__november_16__2003"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_1__history.version_0_0_1__november_16__2003">Version 216 0.0.1, November 16, 2003</a> 217 </h3> 218<p> 219 Announcement of xpressive: <a href="http://lists.boost.org/Archives/boost/2003/11/56312.php" target="_top">http://lists.boost.org/Archives/boost/2003/11/56312.php</a> 220 </p> 221</div> 222<div class="section"> 223<div class="titlepage"><div><div><h3 class="title"> 224<a name="boost_xpressive.appendices.appendix_2__not_yet_implemented"></a><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_2__not_yet_implemented" title="Appendix 2: Not Yet Implemented">Appendix 225 2: Not Yet Implemented</a> 226</h3></div></div></div> 227<p> 228 The following features are planned for xpressive 2.X: 229 </p> 230<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 231<li class="listitem"> 232 <code class="computeroutput"><span class="identifier">syntax_option_type</span><span class="special">::</span><span class="identifier">collate</span></code> 233 </li> 234<li class="listitem"> 235 Collation sequences such as <code class="literal">[.a.]</code> 236 </li> 237<li class="listitem"> 238 Equivalence classes like <code class="literal">[=a=]</code> 239 </li> 240<li class="listitem"> 241 Control of nested results generation with <code class="computeroutput"><span class="identifier">syntax_option_type</span><span class="special">::</span><span class="identifier">nosubs</span></code>, 242 and a <code class="computeroutput"><span class="identifier">nosubs</span><span class="special">()</span></code> 243 modifier for static xpressive. 244 </li> 245</ul></div> 246<p> 247 Here are some wish-list features. You or your company should consider hiring 248 me to implement them! 249 </p> 250<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 251<li class="listitem"> 252 Optimized DFA back-end for simple, fast regexing. 253 </li> 254<li class="listitem"> 255 Different regex compiler front ends for basic, extended, awk, grep and 256 egrep regex syntax. 257 </li> 258<li class="listitem"> 259 Fine-grained control over the dynamic regex syntax 260 </li> 261<li class="listitem"> 262 Optional integration with ICU for full Unicode support. 263 </li> 264<li class="listitem"> 265 Improved localization support, possibly as a custom facet for <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>. 266 </li> 267</ul></div> 268</div> 269<div class="section"> 270<div class="titlepage"><div><div><h3 class="title"> 271<a name="boost_xpressive.appendices.appendix_3__differences_from_boost_regex"></a><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_3__differences_from_boost_regex" title="Appendix 3: Differences from Boost.Regex">Appendix 272 3: Differences from Boost.Regex</a> 273</h3></div></div></div> 274<p> 275 Since many of xpressive's users are likely to be familiar with the <a href="../../../libs/regex" target="_top">Boost.Regex</a> library, I would be remiss if 276 I failed to point out some important differences between xpressive and <a href="../../../libs/regex" target="_top">Boost.Regex</a>. In particular:<br> 277 </p> 278<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 279<li class="listitem"> 280 <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 281 is a template on the iterator type, not the character type. 282 </li> 283<li class="listitem"> 284 <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 285 cannot be constructed directly from a string; rather, you must use <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">()</span></code> 286 or <code class="computeroutput"><span class="identifier">regex_compiler</span><span class="special"><></span></code> 287 to build a regex object from a string. 288 </li> 289<li class="listitem"> 290 <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 291 does not have an <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> member function; rather, the <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> 292 member function is in the <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">regex_compiler</span><span class="special"><></span></code> factory. 293 </li> 294<li class="listitem"> 295 <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 296 has a subset of <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code>'s 297 members. <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 298 does not. The members lacking are: <code class="computeroutput"><span class="identifier">assign</span><span class="special">()</span></code>, <code class="computeroutput"><span class="keyword">operator</span><span class="special">[]()</span></code>, <code class="computeroutput"><span class="identifier">max_size</span><span class="special">()</span></code>, <code class="computeroutput"><span class="identifier">begin</span><span class="special">()</span></code>, <code class="computeroutput"><span class="identifier">end</span><span class="special">()</span></code>, <code class="computeroutput"><span class="identifier">size</span><span class="special">()</span></code>, <code class="computeroutput"><span class="identifier">compare</span><span class="special">()</span></code>, and <code class="computeroutput"><span class="keyword">operator</span><span class="special">=(</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><>)</span></code>. 299 </li> 300<li class="listitem"> 301 Other member functions that exist in <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> but do not exist in <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 302 are: <code class="computeroutput"><span class="identifier">set_expression</span><span class="special">()</span></code>, 303 <code class="computeroutput"><span class="identifier">get_allocator</span><span class="special">()</span></code>, 304 <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code>, 305 <code class="computeroutput"><span class="identifier">getloc</span><span class="special">()</span></code>, 306 <code class="computeroutput"><span class="identifier">getflags</span><span class="special">()</span></code>, 307 and <code class="computeroutput"><span class="identifier">str</span><span class="special">()</span></code>. 308 </li> 309<li class="listitem"> 310 <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 311 does not have a RegexTraits template parameter. Customization of regex 312 syntax and localization behavior will be controlled by <code class="computeroutput"><span class="identifier">regex_compiler</span><span class="special"><></span></code> 313 and a custom regex facet for <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>. 314 </li> 315<li class="listitem"> 316 <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">basic_regex</span><span class="special"><></span></code> 317 and <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">match_results</span><span class="special"><></span></code> 318 do not have an Allocator template parameter. This is by design. 319 </li> 320<li class="listitem"> 321 <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code> and 322 <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code> 323 have moved from the <code class="computeroutput"><span class="identifier">match_flag_type</span></code> 324 enum to the <code class="computeroutput"><span class="identifier">syntax_option_type</span></code> 325 enum, and they have changed names to <code class="computeroutput"><span class="identifier">not_dot_null</span></code> 326 and <code class="computeroutput"><span class="identifier">not_dot_newline</span></code>. 327 </li> 328<li class="listitem"> 329 The following <code class="computeroutput"><span class="identifier">syntax_option_type</span></code> 330 enumeration values are not supported: <code class="computeroutput"><span class="identifier">escape_in_lists</span></code>, 331 <code class="computeroutput"><span class="identifier">char_classes</span></code>, <code class="computeroutput"><span class="identifier">intervals</span></code>, <code class="computeroutput"><span class="identifier">limited_ops</span></code>, 332 <code class="computeroutput"><span class="identifier">newline_alt</span></code>, <code class="computeroutput"><span class="identifier">bk_plus_qm</span></code>, <code class="computeroutput"><span class="identifier">bk_braces</span></code>, 333 <code class="computeroutput"><span class="identifier">bk_parens</span></code>, <code class="computeroutput"><span class="identifier">bk_refs</span></code>, <code class="computeroutput"><span class="identifier">bk_vbar</span></code>, 334 <code class="computeroutput"><span class="identifier">use_except</span></code>, <code class="computeroutput"><span class="identifier">failbit</span></code>, <code class="computeroutput"><span class="identifier">literal</span></code>, 335 <code class="computeroutput"><span class="identifier">perlex</span></code>, <code class="computeroutput"><span class="identifier">basic</span></code>, <code class="computeroutput"><span class="identifier">extended</span></code>, 336 <code class="computeroutput"><span class="identifier">emacs</span></code>, <code class="computeroutput"><span class="identifier">awk</span></code>, <code class="computeroutput"><span class="identifier">grep</span></code> 337 ,<code class="computeroutput"><span class="identifier">egrep</span></code>, <code class="computeroutput"><span class="identifier">sed</span></code>, <code class="computeroutput"><span class="identifier">JavaScript</span></code>, 338 <code class="computeroutput"><span class="identifier">JScript</span></code>. 339 </li> 340<li class="listitem"> 341 The following <code class="computeroutput"><span class="identifier">match_flag_type</span></code> 342 enumeration values are not supported: <code class="computeroutput"><span class="identifier">match_not_bob</span></code>, 343 <code class="computeroutput"><span class="identifier">match_not_eob</span></code>, <code class="computeroutput"><span class="identifier">match_perl</span></code>, <code class="computeroutput"><span class="identifier">match_posix</span></code>, 344 and <code class="computeroutput"><span class="identifier">match_extra</span></code>. 345 </li> 346</ul></div> 347<p> 348 Also, in the current implementation, the regex algorithms in xpressive will 349 not detect pathological behavior and abort by throwing an exception. It is 350 up to you to write efficient patterns that do not behave pathologically. 351 </p> 352</div> 353<div class="section"> 354<div class="titlepage"><div><div><h3 class="title"> 355<a name="boost_xpressive.appendices.perf"></a><a class="link" href="appendices.html#boost_xpressive.appendices.perf" title="Appendix 4: Performance Comparison">Appendix 4: Performance 356 Comparison</a> 357</h3></div></div></div> 358<div class="toc"><dl class="toc"> 359<dt><span class="section"><a href="appendices.html#boost_xpressive.appendices.perf.perf_gcc">xpressive 360 vs. Boost.Regex with GCC (Cygwin)</a></span></dt> 361<dt><span class="section"><a href="appendices.html#boost_xpressive.appendices.perf.perf_msvc">xpressive 362 vs. Boost.Regex with Visual C++</a></span></dt> 363</dl></div> 364<p> 365 The performance of xpressive is competitive with <a href="../../../libs/regex" target="_top">Boost.Regex</a>. 366 I have run performance benchmarks comparing static xpressive, dynamic xpressive 367 and <a href="../../../libs/regex" target="_top">Boost.Regex</a> on two platforms: gcc 368 (Cygwin) and Visual C++. The tests include short matches and long searches. 369 For both platforms, xpressive comes off well on short matches and roughly 370 on par with <a href="../../../libs/regex" target="_top">Boost.Regex</a> on long searches. 371 </p> 372<p> 373 <disclaimer> As with all benchmarks, the true test is how xpressive 374 performs with <span class="emphasis"><em>your</em></span> patterns, <span class="emphasis"><em>your</em></span> 375 input, and <span class="emphasis"><em>your</em></span> platform, so if performance matters 376 in your application, it's best to run your own tests. </disclaimer> 377 </p> 378<div class="section"> 379<div class="titlepage"><div><div><h4 class="title"> 380<a name="boost_xpressive.appendices.perf.perf_gcc"></a><a class="link" href="appendices.html#boost_xpressive.appendices.perf.perf_gcc" title="xpressive vs. Boost.Regex with GCC (Cygwin)">xpressive 381 vs. Boost.Regex with GCC (Cygwin)</a> 382</h4></div></div></div> 383<p> 384 Below are the results of a performance comparison between: 385 </p> 386<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 387<li class="listitem"> 388 static xpressive 389 </li> 390<li class="listitem"> 391 dynamic xpressive 392 </li> 393<li class="listitem"> 394 <a href="../../../libs/regex" target="_top">Boost.Regex</a> 395 </li> 396</ul></div> 397<div class="variablelist"> 398<p class="title"><b>Test Specifications</b></p> 399<dl class="variablelist"> 400<dt><span class="term">Hardware:</span></dt> 401<dd><p> 402 hyper-threaded 3GHz Xeon with 1Gb RAM 403 </p></dd> 404<dt><span class="term">Operating System:</span></dt> 405<dd><p> 406 Windows XP Pro + Cygwin 407 </p></dd> 408<dt><span class="term">Compiler:</span></dt> 409<dd><p> 410 GNU C++ version 3.4.4 (Cygwin special) 411 </p></dd> 412<dt><span class="term">C++ Standard Library:</span></dt> 413<dd><p> 414 GNU libstdc++ version 3.4.4 415 </p></dd> 416<dt><span class="term"><a href="../../../libs/regex" target="_top">Boost.Regex</a> Version:</span></dt> 417<dd><p> 418 1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE 419 </p></dd> 420<dt><span class="term">xpressive Version:</span></dt> 421<dd><p> 422 0.9.6a 423 </p></dd> 424</dl> 425</div> 426<h3> 427<a name="boost_xpressive.appendices.perf.perf_gcc.h0"></a> 428 <span class="phrase"><a name="boost_xpressive.appendices.perf.perf_gcc.comparison_1__short_matches"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.perf.perf_gcc.comparison_1__short_matches">Comparison 429 1: Short Matches</a> 430 </h3> 431<p> 432 The following tests evaluate the time taken to match the expression to 433 the input string. For each result, the top number has been normalized relative 434 to the fastest time, so 1.0 is as good as it gets. The bottom number (in 435 parentheses) is the actual time in seconds. The best time has been marked 436 in green. 437 </p> 438<div class="informaltable"> 439<h5> 440<a name="id-1.3.47.7.5.4.7.1"></a><span class="table-title">Short Matches</span> 441</h5> 442<table class="table"> 443<colgroup> 444<col> 445<col> 446<col> 447<col> 448<col> 449</colgroup> 450<thead><tr> 451<th>static xpressive</th> 452<th>dynamic xpressive</th> 453<th>Boost</th> 454<th>Text</th> 455<th>Expression</th> 456</tr></thead> 457<tbody> 458<tr> 459<td><span class="highlight">1<p></p>(8.79e‑07s)</span></td> 460<td><span class="highlight">1.08<p></p>(9.54e‑07s)</span></td> 461<td>2.51<p></p>(2.2e‑06s)</td> 462<td>100- this is a line of ftp response which contains a message string</td> 463<td><code class="literal">^([0-9]+)(\-| |$)(.*)$</code></td> 464</tr> 465<tr> 466<td><span class="highlight">1.06<p></p>(1.07e‑06s)</span></td> 467<td><span class="highlight">1<p></p>(1.01e‑06s)</span></td> 468<td>4.01<p></p>(4.06e‑06s)</td> 469<td>1234-5678-1234-456</td> 470<td><code class="literal">([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}</code></td> 471</tr> 472<tr> 473<td><span class="highlight">1<p></p>(1.4e‑06s)</span></td> 474<td>1.13<p></p>(1.58e‑06s)</td> 475<td>2.89<p></p>(4.05e‑06s)</td> 476<td>john_maddock@compuserve.com</td> 477<td><code class="literal">^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> 478</tr> 479<tr> 480<td><span class="highlight">1<p></p>(1.28e‑06s)</span></td> 481<td>1.16<p></p>(1.49e‑06s)</td> 482<td>3.07<p></p>(3.94e‑06s)</td> 483<td>foo12@foo.edu</td> 484<td><code class="literal">^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> 485</tr> 486<tr> 487<td><span class="highlight">1<p></p>(1.22e‑06s)</span></td> 488<td>1.2<p></p>(1.46e‑06s)</td> 489<td>3.22<p></p>(3.93e‑06s)</td> 490<td>bob.smith@foo.tv</td> 491<td><code class="literal">^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> 492</tr> 493<tr> 494<td><span class="highlight">1.04<p></p>(8.64e‑07s)</span></td> 495<td><span class="highlight">1<p></p>(8.34e‑07s)</span></td> 496<td>2.5<p></p>(2.09e‑06s)</td> 497<td>EH10 2QQ</td> 498<td><code class="literal">^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> 499</tr> 500<tr> 501<td>1.11<p></p>(9.09e‑07s)</td> 502<td><span class="highlight">1<p></p>(8.19e‑07s)</span></td> 503<td>2.47<p></p>(2.03e‑06s)</td> 504<td>G1 1AA</td> 505<td><code class="literal">^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> 506</tr> 507<tr> 508<td>1.12<p></p>(9.38e‑07s)</td> 509<td><span class="highlight">1<p></p>(8.34e‑07s)</span></td> 510<td>2.5<p></p>(2.08e‑06s)</td> 511<td>SW1 1ZZ</td> 512<td><code class="literal">^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> 513</tr> 514<tr> 515<td><span class="highlight">1<p></p>(7.9e‑07s)</span></td> 516<td><span class="highlight">1.06<p></p>(8.34e‑07s)</span></td> 517<td>2.49<p></p>(1.96e‑06s)</td> 518<td>4/1/2001</td> 519<td><code class="literal">^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td> 520</tr> 521<tr> 522<td><span class="highlight">1<p></p>(8.19e‑07s)</span></td> 523<td><span class="highlight">1.04<p></p>(8.49e‑07s)</span></td> 524<td>2.4<p></p>(1.97e‑06s)</td> 525<td>12/12/2001</td> 526<td><code class="literal">^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td> 527</tr> 528<tr> 529<td><span class="highlight">1.09<p></p>(8.95e‑07s)</span></td> 530<td><span class="highlight">1<p></p>(8.19e‑07s)</span></td> 531<td>2.4<p></p>(1.96e‑06s)</td> 532<td>123</td> 533<td><code class="literal">^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> 534</tr> 535<tr> 536<td>1.11<p></p>(8.79e‑07s)</td> 537<td><span class="highlight">1<p></p>(7.9e‑07s)</span></td> 538<td>2.57<p></p>(2.03e‑06s)</td> 539<td>+3.14159</td> 540<td><code class="literal">^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> 541</tr> 542<tr> 543<td><span class="highlight">1.09<p></p>(8.94e‑07s)</span></td> 544<td><span class="highlight">1<p></p>(8.19e‑07s)</span></td> 545<td>2.47<p></p>(2.03e‑06s)</td> 546<td>-3.14159</td> 547<td><code class="literal">^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> 548</tr> 549</tbody> 550</table> 551</div> 552<h3> 553<a name="boost_xpressive.appendices.perf.perf_gcc.h1"></a> 554 <span class="phrase"><a name="boost_xpressive.appendices.perf.perf_gcc.comparison_2__long_searches"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.perf.perf_gcc.comparison_2__long_searches">Comparison 555 2: Long Searches</a> 556 </h3> 557<p> 558 The next test measures the time to find <span class="emphasis"><em>all</em></span> matches 559 in a long English text. The text is the <a href="http://www.gutenberg.org/dirs/3/2/0/3200/3200.zip" target="_top">complete 560 works of Mark Twain</a>, from <a href="http://promo.net/pg/" target="_top">Project 561 Gutenberg</a>. The text is 19Mb long. As above, the top number is the 562 normalized time and the bottom number is the actual time. The best time 563 is in green. 564 </p> 565<div class="informaltable"> 566<h5> 567<a name="id-1.3.47.7.5.4.10.1"></a><span class="table-title">Long Searches</span> 568</h5> 569<table class="table"> 570<colgroup> 571<col> 572<col> 573<col> 574<col> 575</colgroup> 576<thead><tr> 577<th>static xpressive</th> 578<th>dynamic xpressive</th> 579<th>Boost</th> 580<th>Expression</th> 581</tr></thead> 582<tbody> 583<tr> 584<td><span class="highlight">1<p></p>(0.0263s)</span></td> 585<td><span class="highlight">1<p></p>(0.0263s)</span></td> 586<td>1.78<p></p>(0.0469s)</td> 587<td><code class="literal">Twain</code></td> 588</tr> 589<tr> 590<td><span class="highlight">1<p></p>(0.0234s)</span></td> 591<td><span class="highlight">1<p></p>(0.0234s)</span></td> 592<td>1.79<p></p>(0.042s)</td> 593<td><code class="literal">Huck[[:alpha:]]+</code></td> 594</tr> 595<tr> 596<td>1.84<p></p>(1.26s)</td> 597<td>2.21<p></p>(1.51s)</td> 598<td><span class="highlight">1<p></p>(0.687s)</span></td> 599<td><code class="literal">[[:alpha:]]+ing</code></td> 600</tr> 601<tr> 602<td><span class="highlight">1.09<p></p>(0.192s)</span></td> 603<td>2<p></p>(0.351s)</td> 604<td><span class="highlight">1<p></p>(0.176s)</span></td> 605<td><code class="literal">^[^ 606]*?Twain</code></td> 607</tr> 608<tr> 609<td>1.41<p></p>(0.08s)</td> 610<td>1.21<p></p>(0.0684s)</td> 611<td><span class="highlight">1<p></p>(0.0566s)</span></td> 612<td><code class="literal">Tom|Sawyer|Huckleberry|Finn</code></td> 613</tr> 614<tr> 615<td>1.56<p></p>(0.195s)</td> 616<td>1.12<p></p>(0.141s)</td> 617<td><span class="highlight">1<p></p>(0.125s)</span></td> 618<td><code class="literal">(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td> 619</tr> 620</tbody> 621</table> 622</div> 623</div> 624<div class="section"> 625<div class="titlepage"><div><div><h4 class="title"> 626<a name="boost_xpressive.appendices.perf.perf_msvc"></a><a class="link" href="appendices.html#boost_xpressive.appendices.perf.perf_msvc" title="xpressive vs. Boost.Regex with Visual C++">xpressive 627 vs. Boost.Regex with Visual C++</a> 628</h4></div></div></div> 629<p> 630 Below are the results of a performance comparison between: 631 </p> 632<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 633<li class="listitem"> 634 static xpressive 635 </li> 636<li class="listitem"> 637 dynamic xpressive 638 </li> 639<li class="listitem"> 640 <a href="../../../libs/regex" target="_top">Boost.Regex</a> 641 </li> 642</ul></div> 643<div class="variablelist"> 644<p class="title"><b>Test Specifications</b></p> 645<dl class="variablelist"> 646<dt><span class="term">Hardware:</span></dt> 647<dd><p> 648 hyper-threaded 3GHz Xeon with 1Gb RAM 649 </p></dd> 650<dt><span class="term">Operating System:</span></dt> 651<dd><p> 652 Windows XP Pro 653 </p></dd> 654<dt><span class="term">Compiler:</span></dt> 655<dd><p> 656 Visual C++ .NET 2003 (7.1) 657 </p></dd> 658<dt><span class="term">C++ Standard Library:</span></dt> 659<dd><p> 660 Dinkumware, version 313 661 </p></dd> 662<dt><span class="term"><a href="../../../libs/regex" target="_top">Boost.Regex</a> Version:</span></dt> 663<dd><p> 664 1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE 665 </p></dd> 666<dt><span class="term">xpressive Version:</span></dt> 667<dd><p> 668 0.9.6a 669 </p></dd> 670</dl> 671</div> 672<h3> 673<a name="boost_xpressive.appendices.perf.perf_msvc.h0"></a> 674 <span class="phrase"><a name="boost_xpressive.appendices.perf.perf_msvc.comparison_1__short_matches"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.perf.perf_msvc.comparison_1__short_matches">Comparison 675 1: Short Matches</a> 676 </h3> 677<p> 678 The following tests evaluate the time taken to match the expression to 679 the input string. For each result, the top number has been normalized relative 680 to the fastest time, so 1.0 is as good as it gets. The bottom number (in 681 parentheses) is the actual time in seconds. The best time has been marked 682 in green. 683 </p> 684<div class="informaltable"> 685<h5> 686<a name="id-1.3.47.7.5.5.7.1"></a><span class="table-title">Short Matches</span> 687</h5> 688<table class="table"> 689<colgroup> 690<col> 691<col> 692<col> 693<col> 694<col> 695</colgroup> 696<thead><tr> 697<th>static xpressive</th> 698<th>dynamic xpressive</th> 699<th>Boost</th> 700<th>Text</th> 701<th>Expression</th> 702</tr></thead> 703<tbody> 704<tr> 705<td><span class="highlight">1<p></p>(3.2e‑007s)</span></td> 706<td>1.37<p></p>(4.4e‑007s)</td> 707<td>2.38<p></p>(7.6e‑007s)</td> 708<td>100- this is a line of ftp response which contains a message string</td> 709<td><code class="literal">^([0-9]+)(\-| |$)(.*)$</code></td> 710</tr> 711<tr> 712<td><span class="highlight">1<p></p>(6.4e‑007s)</span></td> 713<td>1.12<p></p>(7.15e‑007s)</td> 714<td>1.72<p></p>(1.1e‑006s)</td> 715<td>1234-5678-1234-456</td> 716<td><code class="literal">([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}</code></td> 717</tr> 718<tr> 719<td><span class="highlight">1<p></p>(9.82e‑007s)</span></td> 720<td>1.3<p></p>(1.28e‑006s)</td> 721<td>1.61<p></p>(1.58e‑006s)</td> 722<td>john_maddock@compuserve.com</td> 723<td><code class="literal">^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> 724</tr> 725<tr> 726<td><span class="highlight">1<p></p>(8.94e‑007s)</span></td> 727<td>1.3<p></p>(1.16e‑006s)</td> 728<td>1.7<p></p>(1.52e‑006s)</td> 729<td>foo12@foo.edu</td> 730<td><code class="literal">^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> 731</tr> 732<tr> 733<td><span class="highlight">1<p></p>(9.09e‑007s)</span></td> 734<td>1.28<p></p>(1.16e‑006s)</td> 735<td>1.67<p></p>(1.52e‑006s)</td> 736<td>bob.smith@foo.tv</td> 737<td><code class="literal">^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> 738</tr> 739<tr> 740<td><span class="highlight">1<p></p>(3.06e‑007s)</span></td> 741<td><span class="highlight">1.07<p></p>(3.28e‑007s)</span></td> 742<td>1.95<p></p>(5.96e‑007s)</td> 743<td>EH10 2QQ</td> 744<td><code class="literal">^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> 745</tr> 746<tr> 747<td><span class="highlight">1<p></p>(3.13e‑007s)</span></td> 748<td><span class="highlight">1.09<p></p>(3.42e‑007s)</span></td> 749<td>1.86<p></p>(5.81e‑007s)</td> 750<td>G1 1AA</td> 751<td><code class="literal">^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> 752</tr> 753<tr> 754<td><span class="highlight">1<p></p>(3.2e‑007s)</span></td> 755<td><span class="highlight">1.09<p></p>(3.5e‑007s)</span></td> 756<td>1.86<p></p>(5.96e‑007s)</td> 757<td>SW1 1ZZ</td> 758<td><code class="literal">^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> 759</tr> 760<tr> 761<td><span class="highlight">1<p></p>(2.68e‑007s)</span></td> 762<td>1.22<p></p>(3.28e‑007s)</td> 763<td>2<p></p>(5.36e‑007s)</td> 764<td>4/1/2001</td> 765<td><code class="literal">^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td> 766</tr> 767<tr> 768<td><span class="highlight">1<p></p>(2.76e‑007s)</span></td> 769<td>1.16<p></p>(3.2e‑007s)</td> 770<td>1.94<p></p>(5.36e‑007s)</td> 771<td>12/12/2001</td> 772<td><code class="literal">^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td> 773</tr> 774<tr> 775<td><span class="highlight">1<p></p>(2.98e‑007s)</span></td> 776<td><span class="highlight">1.03<p></p>(3.06e‑007s)</span></td> 777<td>1.85<p></p>(5.51e‑007s)</td> 778<td>123</td> 779<td><code class="literal">^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> 780</tr> 781<tr> 782<td><span class="highlight">1<p></p>(3.2e‑007s)</span></td> 783<td>1.12<p></p>(3.58e‑007s)</td> 784<td>1.81<p></p>(5.81e‑007s)</td> 785<td>+3.14159</td> 786<td><code class="literal">^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> 787</tr> 788<tr> 789<td><span class="highlight">1<p></p>(3.28e‑007s)</span></td> 790<td>1.11<p></p>(3.65e‑007s)</td> 791<td>1.77<p></p>(5.81e‑007s)</td> 792<td>-3.14159</td> 793<td><code class="literal">^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> 794</tr> 795</tbody> 796</table> 797</div> 798<h3> 799<a name="boost_xpressive.appendices.perf.perf_msvc.h1"></a> 800 <span class="phrase"><a name="boost_xpressive.appendices.perf.perf_msvc.comparison_2__long_searches"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.perf.perf_msvc.comparison_2__long_searches">Comparison 801 2: Long Searches</a> 802 </h3> 803<p> 804 The next test measures the time to find <span class="emphasis"><em>all</em></span> matches 805 in a long English text. The text is the <a href="http://www.gutenberg.org/dirs/3/2/0/3200/3200.zip" target="_top">complete 806 works of Mark Twain</a>, from <a href="http://promo.net/pg/" target="_top">Project 807 Gutenberg</a>. The text is 19Mb long. As above, the top number is the 808 normalized time and the bottom number is the actual time. The best time 809 is in green. 810 </p> 811<div class="informaltable"> 812<h5> 813<a name="id-1.3.47.7.5.5.10.1"></a><span class="table-title">Long Searches</span> 814</h5> 815<table class="table"> 816<colgroup> 817<col> 818<col> 819<col> 820<col> 821</colgroup> 822<thead><tr> 823<th>static xpressive</th> 824<th>dynamic xpressive</th> 825<th>Boost</th> 826<th>Expression</th> 827</tr></thead> 828<tbody> 829<tr> 830<td><span class="highlight">1<p></p>(0.019s)</span></td> 831<td><span class="highlight">1<p></p>(0.019s)</span></td> 832<td>2.98<p></p>(0.0566s)</td> 833<td><code class="literal">Twain</code></td> 834</tr> 835<tr> 836<td><span class="highlight">1<p></p>(0.0176s)</span></td> 837<td><span class="highlight">1<p></p>(0.0176s)</span></td> 838<td>3.17<p></p>(0.0556s)</td> 839<td><code class="literal">Huck[[:alpha:]]+</code></td> 840</tr> 841<tr> 842<td>3.62<p></p>(1.78s)</td> 843<td>3.97<p></p>(1.95s)</td> 844<td><span class="highlight">1<p></p>(0.492s)</span></td> 845<td><code class="literal">[[:alpha:]]+ing</code></td> 846</tr> 847<tr> 848<td>2.32<p></p>(0.344s)</td> 849<td>3.06<p></p>(0.453s)</td> 850<td><span class="highlight">1<p></p>(0.148s)</span></td> 851<td><code class="literal">^[^ 852]*?Twain</code></td> 853</tr> 854<tr> 855<td><span class="highlight">1<p></p>(0.0576s)</span></td> 856<td><span class="highlight">1.05<p></p>(0.0606s)</span></td> 857<td>1.15<p></p>(0.0664s)</td> 858<td><code class="literal">Tom|Sawyer|Huckleberry|Finn</code></td> 859</tr> 860<tr> 861<td>1.24<p></p>(0.164s)</td> 862<td>1.44<p></p>(0.191s)</td> 863<td><span class="highlight">1<p></p>(0.133s)</span></td> 864<td><code class="literal">(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td> 865</tr> 866</tbody> 867</table> 868</div> 869</div> 870</div> 871<div class="section"> 872<div class="titlepage"><div><div><h3 class="title"> 873<a name="xpressive.appendices.appendix_5__implementation_notes"></a><a class="link" href="appendices.html#xpressive.appendices.appendix_5__implementation_notes" title="Appendix 5: Implementation Notes">Appendix 874 5: Implementation Notes</a> 875</h3></div></div></div> 876<div class="toc"><dl class="toc"><dt><span class="section"><a href="appendices.html#boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___">Cycle 877 collection with <code class="literal">tracking_ptr<></code></a></span></dt></dl></div> 878<div class="section"> 879<div class="titlepage"><div><div><h4 class="title"> 880<a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___"></a><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___" title="Cycle collection with tracking_ptr<>">Cycle 881 collection with <code class="literal">tracking_ptr<></code></a> 882</h4></div></div></div> 883<p> 884 In xpressive, regex objects can refer to each other and themselves by value 885 or by reference. In addition, they ref-count their referenced regexes to 886 keep them alive. This creates the possibility for cyclic reference counts, 887 and raises the possibility of memory leaks. xpressive avoids leaks by using 888 a type called <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code>. This doc describes at a high level 889 how <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code> 890 works. 891 </p> 892<h3> 893<a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.h0"></a> 894 <span class="phrase"><a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.constraints"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.constraints">Constraints</a> 895 </h3> 896<p> 897 Our solution must meet the following design constraints: 898 </p> 899<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 900<li class="listitem"> 901 No dangling references: All objects referred to directly or indirectly 902 must be kept alive as long as the references are needed. 903 </li> 904<li class="listitem"> 905 No leaks: all objects must be freed eventually. 906 </li> 907<li class="listitem"> 908 No user intervention: The solution must not require users to explicitly 909 invoke some cycle collection routine. 910 </li> 911<li class="listitem"> 912 Clean-up is no-throw: The collection phase will likely be called from 913 a destructor, so it must never throw an exception under any circumstance. 914 </li> 915</ul></div> 916<h3> 917<a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.h1"></a> 918 <span class="phrase"><a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.handle_body_idiom"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.handle_body_idiom">Handle-Body 919 Idiom</a> 920 </h3> 921<p> 922 To use <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code>, 923 you must separate your type into a handle and a body. In the case of xpressive, 924 the handle type is called <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><></span></code> and the body is called <code class="computeroutput"><span class="identifier">regex_impl</span><span class="special"><></span></code>. 925 The handle will store a <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code> to the body. 926 </p> 927<p> 928 The body type must inherit from <code class="computeroutput"><span class="identifier">enable_reference_tracking</span><span class="special"><></span></code>. This gives the body the bookkeeping 929 data structures that <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code> will use. In particular, it gives 930 the body: 931 </p> 932<div class="orderedlist"><ol class="orderedlist" type="1"> 933<li class="listitem"> 934 <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">set</span><span class="special"><</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="identifier">body</span><span class="special">></span> 935 <span class="special">></span> <span class="identifier">refs_</span></code> 936 : collection of bodies to which this body refers, and 937 </li> 938<li class="listitem"> 939 <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">set</span><span class="special"><</span><span class="identifier">weak_ptr</span><span class="special"><</span><span class="identifier">body</span><span class="special">></span> 940 <span class="special">></span> <span class="identifier">deps_</span></code> 941 : collection of bodies which refer to this body. 942 </li> 943</ol></div> 944<h3> 945<a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.h2"></a> 946 <span class="phrase"><a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.references_and_dependencies"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.references_and_dependencies">References 947 and Dependencies</a> 948 </h3> 949<p> 950 We refer to (1) above as the "references" and (2) as the "dependencies". 951 It is crucial to the understanding of <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code> to recognize that the set of references 952 includes both those objects that are referred to directly as well as those 953 that are referred to indirectly (that is, through another reference). The 954 same is true for the set of dependencies. In other words, each body holds 955 a ref-count directly to every other body that it needs. 956 </p> 957<p> 958 Why is this important? Because it means that when a body no longer has 959 a handle referring to it, all its references can be released immediately 960 without fear of creating dangling references. 961 </p> 962<p> 963 References and dependencies cross-pollinate. Here's how it works: 964 </p> 965<div class="orderedlist"><ol class="orderedlist" type="1"> 966<li class="listitem"> 967 When one object acquires another as a reference, the second object 968 acquires the first as a dependency. 969 </li> 970<li class="listitem"> 971 In addition, the first object acquires all of the second object's references, 972 and the second object acquires all of the first object's dependencies. 973 </li> 974<li class="listitem"> 975 When an object picks up a new reference, the reference is also added 976 to all dependent objects. 977 </li> 978<li class="listitem"> 979 When an object picks up a new dependency, the dependency is also added 980 to all referenced objects. 981 </li> 982<li class="listitem"> 983 An object is never allowed to have itself as a dependency. Objects 984 may have themselves as references, and often do. 985 </li> 986</ol></div> 987<p> 988 Consider the following code: 989 </p> 990<pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">expr</span><span class="special">;</span> 991<span class="special">{</span> 992 <span class="identifier">sregex</span> <span class="identifier">group</span> <span class="special">=</span> <span class="char">'('</span> <span class="special">>></span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">expr</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> <span class="comment">// (1)</span> 993 <span class="identifier">sregex</span> <span class="identifier">fact</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span> <span class="special">|</span> <span class="identifier">group</span><span class="special">;</span> <span class="comment">// (2)</span> 994 <span class="identifier">sregex</span> <span class="identifier">term</span> <span class="special">=</span> <span class="identifier">fact</span> <span class="special">>></span> <span class="special">*((</span><span class="char">'*'</span> <span class="special">>></span> <span class="identifier">fact</span><span class="special">)</span> <span class="special">|</span> <span class="special">(</span><span class="char">'/'</span> <span class="special">>></span> <span class="identifier">fact</span><span class="special">));</span> <span class="comment">// (3)</span> 995 <span class="identifier">expr</span> <span class="special">=</span> <span class="identifier">term</span> <span class="special">>></span> <span class="special">*((</span><span class="char">'+'</span> <span class="special">>></span> <span class="identifier">term</span><span class="special">)</span> <span class="special">|</span> <span class="special">(</span><span class="char">'-'</span> <span class="special">>></span> <span class="identifier">term</span><span class="special">));</span> <span class="comment">// (4)</span> 996<span class="special">}</span> <span class="comment">// (5)</span> 997</pre> 998<p> 999 Here is how the references and dependencies propagate, line by line: 1000 </p> 1001<div class="informaltable"><table class="table"> 1002<colgroup> 1003<col> 1004<col> 1005</colgroup> 1006<thead><tr> 1007<th> 1008 <p> 1009 Expression 1010 </p> 1011 </th> 1012<th> 1013 <p> 1014 Effects 1015 </p> 1016 </th> 1017</tr></thead> 1018<tbody> 1019<tr> 1020<td> 1021 <p> 1022 1) <code class="computeroutput"><span class="identifier">sregex</span> <span class="identifier">group</span> 1023 <span class="special">=</span> <span class="char">'('</span> 1024 <span class="special">>></span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">expr</span><span class="special">)</span> <span class="special">>></span> 1025 <span class="char">')'</span><span class="special">;</span></code> 1026 </p> 1027 </td> 1028<td> 1029 <p> 1030 <code class="literal">group: cnt=1 refs={expr} deps={}<br> expr: cnt=2 refs={} 1031 deps={group}</code> 1032 </p> 1033 </td> 1034</tr> 1035<tr> 1036<td> 1037 <p> 1038 2) <code class="computeroutput"><span class="identifier">sregex</span> <span class="identifier">fact</span> 1039 <span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span> <span class="special">|</span> 1040 <span class="identifier">group</span><span class="special">;</span></code> 1041 </p> 1042 </td> 1043<td> 1044 <p> 1045 <code class="literal">group: cnt=2 refs={expr} deps={fact}<br> expr: cnt=3 1046 refs={} deps={group,fact}<br> fact: cnt=1 refs={expr,group} 1047 deps={}</code> 1048 </p> 1049 </td> 1050</tr> 1051<tr> 1052<td> 1053 <p> 1054 3) <code class="computeroutput"><span class="identifier">sregex</span> <span class="identifier">term</span> 1055 <span class="special">=</span> <span class="identifier">fact</span> 1056 <span class="special">>></span> <span class="special">*((</span><span class="char">'*'</span> <span class="special">>></span> 1057 <span class="identifier">fact</span><span class="special">)</span> 1058 <span class="special">|</span> <span class="special">(</span><span class="char">'/'</span> <span class="special">>></span> 1059 <span class="identifier">fact</span><span class="special">));</span></code> 1060 </p> 1061 </td> 1062<td> 1063 <p> 1064 <code class="literal">group: cnt=3 refs={expr} deps={fact,term}<br> expr: 1065 cnt=4 refs={} deps={group,fact,term}<br> fact: cnt=2 refs={expr,group} 1066 deps={term}<br> term: cnt=1 refs={expr,group,fact} deps={}</code> 1067 </p> 1068 </td> 1069</tr> 1070<tr> 1071<td> 1072 <p> 1073 4) <code class="computeroutput"><span class="identifier">expr</span> <span class="special">=</span> 1074 <span class="identifier">term</span> <span class="special">>></span> 1075 <span class="special">*((</span><span class="char">'+'</span> 1076 <span class="special">>></span> <span class="identifier">term</span><span class="special">)</span> <span class="special">|</span> 1077 <span class="special">(</span><span class="char">'-'</span> 1078 <span class="special">>></span> <span class="identifier">term</span><span class="special">));</span></code> 1079 </p> 1080 </td> 1081<td> 1082 <p> 1083 <code class="literal">group: cnt=5 refs={expr,group,fact,term} deps={expr,fact,term}<br> 1084 expr: cnt=5 refs={expr,group,fact,term} deps={group,fact,term}<br> 1085 fact: cnt=5 refs={expr,group,fact,term} deps={expr,group,term}<br> 1086 term: cnt=5 refs={expr,group,fact,term} deps={expr,group,fact}</code> 1087 </p> 1088 </td> 1089</tr> 1090<tr> 1091<td> 1092 <p> 1093 5) <code class="computeroutput"><span class="special">}</span></code> 1094 </p> 1095 </td> 1096<td> 1097 <p> 1098 <code class="literal">expr: cnt=2 refs={expr,group,fact,term} deps={group,fact,term}</code> 1099 </p> 1100 </td> 1101</tr> 1102</tbody> 1103</table></div> 1104<p> 1105 This shows how references and dependencies propagate when creating cycles 1106 of objects. After line (4), which closes the cycle, every object has a 1107 ref-count on every other object, even to itself. So how does this not leak? 1108 Read on. 1109 </p> 1110<h3> 1111<a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.h3"></a> 1112 <span class="phrase"><a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.cycle_breaking"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.cycle_breaking">Cycle 1113 Breaking</a> 1114 </h3> 1115<p> 1116 Now that the bodies have their sets of references and dependencies, the 1117 hard part is done. All that remains is to decide when and where to break 1118 the cycle. That is the job of <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code>, which is part of the handle. The 1119 <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code> 1120 holds 2 <code class="computeroutput"><span class="identifier">shared_ptr</span></code>s. The 1121 first, obviously, is the <code class="computeroutput"><span class="identifier">shared_ptr</span><span class="special"><</span><span class="identifier">body</span><span class="special">></span></code> -- the reference to the body to which 1122 this handle refers. The other <code class="computeroutput"><span class="identifier">shared_ptr</span></code> 1123 is used to break the cycle. It ensures that when all the handles to a body 1124 go out of scope, the body's set of references is cleared. 1125 </p> 1126<p> 1127 This suggests that more than one handle can refer to a body. In fact, 1128 <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code> 1129 gives you copy-on-write semantics -- when you copy a handle, the body is 1130 shared. That makes copies very efficient. Eventually, all the handles to 1131 a particular body go out of scope. When that happens, the ref count to 1132 the body might still be greater than 0 because some other body (or this 1133 body itself!) might be holding a reference to it. However, we are certain 1134 that the cycle-breaker's ref-count goes to 0 because the cycle-breaker 1135 only lives in handles. No more handles, no more cycle-breakers. 1136 </p> 1137<p> 1138 What does the cycle-breaker do? Recall that the body has a set of references 1139 of type <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">set</span><span class="special"><</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="identifier">body</span><span class="special">></span> <span class="special">></span></code>. Let's call this type "references_type". 1140 The cycle-breaker is a <code class="computeroutput"><span class="identifier">shared_ptr</span><span class="special"><</span><span class="identifier">references_type</span><span class="special">></span></code>. It uses a custom deleter, which is 1141 defined as follows: 1142 </p> 1143<pre class="programlisting"><span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">DerivedT</span><span class="special">></span> 1144<span class="keyword">struct</span> <span class="identifier">reference_deleter</span> 1145<span class="special">{</span> 1146 <span class="keyword">void</span> <span class="keyword">operator</span> <span class="special">()(</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">set</span><span class="special"><</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="identifier">DerivedT</span><span class="special">></span> <span class="special">></span> <span class="special">*</span><span class="identifier">refs</span><span class="special">)</span> <span class="keyword">const</span> 1147 <span class="special">{</span> 1148 <span class="identifier">refs</span><span class="special">-></span><span class="identifier">clear</span><span class="special">();</span> 1149 <span class="special">}</span> 1150<span class="special">};</span> 1151</pre> 1152<p> 1153 The job of to the cycle breaker is to ensure that when the last handle 1154 to a body goes away, the body's set of references is cleared. That's it. 1155 </p> 1156<p> 1157 We can clearly see how this guarantees that all bodies are cleaned up eventually. 1158 Once every handle has gone out of scope, all the bodies' sets of references 1159 will be cleared, leaving none with a non-zero ref-count. No leaks, guaranteed. 1160 </p> 1161<p> 1162 It's a bit harder to see how this guarantees no dangling references. Imagine 1163 that there are 3 bodies: A, B and C. A refers to B which refers to C. Now 1164 all the handles to B go out of scope, so B's set of references is cleared. 1165 Doesn't this mean that C gets deleted, even though it is being used (indirectly) 1166 by A? It doesn't. This situation can never occur because we propagated 1167 the references and dependencies above such that A will be holding a reference 1168 directly to C in addition to B. When B's set of references is cleared, 1169 no bodies get deleted, because they are all still in use by A. 1170 </p> 1171<h3> 1172<a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.h4"></a> 1173 <span class="phrase"><a name="boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.future_work"></a></span><a class="link" href="appendices.html#boost_xpressive.appendices.appendix_5__implementation_notes.cycle_collection_with___tracking_ptr___.future_work">Future 1174 Work</a> 1175 </h3> 1176<p> 1177 All these <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">set</span></code>s and <code class="computeroutput"><span class="identifier">shared_ptr</span></code>s 1178 and <code class="computeroutput"><span class="identifier">weak_ptr</span></code>s! Very inefficient. 1179 I used them because they were handy. I could probably do better. 1180 </p> 1181<p> 1182 Also, some objects stick around longer than they need to. Consider: 1183 </p> 1184<pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">b</span><span class="special">;</span> 1185<span class="special">{</span> 1186 <span class="identifier">sregex</span> <span class="identifier">a</span> <span class="special">=</span> <span class="identifier">_</span><span class="special">;</span> 1187 <span class="identifier">b</span> <span class="special">=</span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">a</span><span class="special">);</span> 1188 <span class="identifier">b</span> <span class="special">=</span> <span class="identifier">_</span><span class="special">;</span> 1189<span class="special">}</span> 1190<span class="comment">// a is still alive here!</span> 1191</pre> 1192<p> 1193 Due to the way references and dependencies are propagated, the <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">set</span></code> of references can only grow. It never 1194 shrinks, even when some references are no longer needed. For xpressive 1195 this isn't an issue. The graphs of referential objects generally stay small 1196 and isolated. If someone were to try to use <code class="computeroutput"><span class="identifier">tracking_ptr</span><span class="special"><></span></code> as a general ref-count-cycle-collection 1197 mechanism, this problem would have to be addressed. 1198 </p> 1199</div> 1200</div> 1201</div> 1202<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> 1203<td align="left"></td> 1204<td align="right"><div class="copyright-footer">Copyright © 2007 Eric Niebler<p> 1205 Distributed under the Boost Software License, Version 1.0. (See accompanying 1206 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) 1207 </p> 1208</div></td> 1209</tr></table> 1210<hr> 1211<div class="spirit-nav"> 1212<a accesskey="p" href="../boost_xpressive/acknowledgments.html"><img src="../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../xpressive.html"><img src="../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../yap.html"><img src="../../../doc/src/images/next.png" alt="Next"></a> 1213</div> 1214</body> 1215</html> 1216