1<html> 2<head> 3<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 4<title>String Sort</title> 5<link rel="stylesheet" href="../../../../../../../../doc/src/boostbook.css" type="text/css"> 6<meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> 7<link rel="home" href="../../../../index.html" title="Boost.Sort"> 8<link rel="up" href="../sort_hpp.html" title="Spreadsort"> 9<link rel="prev" href="float_sort.html" title="Float Sort"> 10<link rel="next" href="rationale.html" title="Rationale"> 11</head> 12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> 13<table cellpadding="2" width="100%"><tr> 14<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../boost.png"></td> 15<td align="center"><a href="../../../../../../../../index.html">Home</a></td> 16<td align="center"><a href="../../../../../../../../libs/libraries.htm">Libraries</a></td> 17<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> 18<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> 19<td align="center"><a href="../../../../../../../../more/index.htm">More</a></td> 20</tr></table> 21<hr> 22<div class="spirit-nav"> 23<a accesskey="p" href="float_sort.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a> 24</div> 25<div class="section"> 26<div class="titlepage"><div><div><h5 class="title"> 27<a name="sort.single_thread.spreadsort.sort_hpp.string_sort"></a><a class="link" href="string_sort.html" title="String Sort">String 28 Sort</a> 29</h5></div></div></div> 30<p> 31 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 32 is a hybrid radix-based/comparison-based algorithm that sorts strings 33 of characters (or arrays of binary data) in ascending order. 34 </p> 35<p> 36 The simplest version (no functors) sorts strings of items that can cast 37 to an unsigned data type (such as an unsigned char), have a &lt; 38 operator, have a size function, and have a data() function that returns 39 a pointer to an array of characters, such as a std::string. The functor 40 version can sort any data type that has a strict weak ordering, via templating, 41 but requires definitions of a get_char (acts like x[offset] on a string 42 or a byte array), get_length (returns length of the string being sorted), 43 and a comparison functor. Individual characters returned by get_char 44 must support the != operator and have an unsigned value that defines 45 their lexicographical order. 46 </p> 47<p> 48 This algorithm is not efficient for character types larger than 2 bytes 49 each, and is optimized for one-byte character strings. For this reason, 50 <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a> 51 will be called instead if the character type is of size > 2. 52 </p> 53<p> 54 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 55 has a special optimization for identical substrings. This adds some overhead 56 on random data, but identical substrings are common in real strings. 57 </p> 58<p> 59 reverse_string_sort sorts strings in reverse (decending) order, but is 60 otherwise identical. <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 61 is sufficiently flexible that it should sort any data type that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a> 62 can, assuming the user provides appropriate functors that index into 63 a key. 64 </p> 65<p> 66 <a href="../../../../../../doc/graph/windows_string_sort.htm" target="_top">Windows String Sort</a> 67 </p> 68<p> 69 <a href="../../../../../../doc/graph/osx_string_sort.htm" target="_top">OSX String Sort</a> 70 </p> 71<div class="section"> 72<div class="titlepage"><div><div><h6 class="title"> 73<a name="sort.single_thread.spreadsort.sort_hpp.string_sort.stringsort_examples"></a><a class="link" href="string_sort.html#sort.single_thread.spreadsort.sort_hpp.string_sort.stringsort_examples" title="String Sort Examples">String 74 Sort Examples</a> 75</h6></div></div></div> 76<p> 77 See <a href="../../../../../../example/stringfunctorsample.cpp" target="_top">stringfunctorsample.cpp</a> 78 for an example of how to sort structs using a string key and all functors: 79 </p> 80<pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span> 81 <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> 82 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">a</span><span class="special">;</span> 83 <span class="special">}</span> 84<span class="special">};</span> 85</pre> 86<pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span> 87 <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> 88 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">[</span><span class="identifier">offset</span><span class="special">];</span> 89 <span class="special">}</span> 90<span class="special">};</span> 91</pre> 92<pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span> 93 <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span><span class="special">{</span> <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">}</span> 94<span class="special">};</span> 95</pre> 96<p> 97 and these functors are used thus: 98 </p> 99<pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span> 100</pre> 101<p> 102 See <a href="../../../../../../example/generalizedstruct.cpp" target="_top">generalizedstruct.cpp</a> 103 for a working example of a generalized approach to sort structs by 104 a sequence of integer, float, and multiple string keys using string_sort: 105 </p> 106<pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">DATA_TYPE</span> <span class="special">{</span> 107 <span class="identifier">time_t</span> <span class="identifier">birth</span><span class="special">;</span> 108 <span class="keyword">float</span> <span class="identifier">net_worth</span><span class="special">;</span> 109 <span class="identifier">string</span> <span class="identifier">first_name</span><span class="special">;</span> 110 <span class="identifier">string</span> <span class="identifier">last_name</span><span class="special">;</span> 111<span class="special">};</span> 112 113<span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">birth_size</span> <span class="special">=</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">time_t</span><span class="special">);</span> 114<span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">first_name_offset</span> <span class="special">=</span> <span class="identifier">birth_size</span> <span class="special">+</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="keyword">float</span><span class="special">);</span> 115<span class="keyword">static</span> <span class="keyword">const</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">uint64_t</span> <span class="identifier">base_mask</span> <span class="special">=</span> <span class="number">0xff</span><span class="special">;</span> 116 117<span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span> 118 <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> 119 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">)</span> <span class="special">{</span> 120 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">;</span> 121 <span class="special">}</span> 122 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">)</span> <span class="special">{</span> 123 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">;</span> 124 <span class="special">}</span> 125 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">)</span> <span class="special">{</span> 126 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">;</span> 127 <span class="special">}</span> 128 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">;</span> 129 <span class="special">}</span> 130<span class="special">};</span> 131 132<span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span> 133 <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> 134 <span class="comment">// Sort date as a signed int, returning the appropriate byte.</span> 135 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span> 136 <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">birth_size</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span> 137 <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">&</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special"><<</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">bit_shift</span><span class="special">;</span> 138 <span class="comment">// Handling the sign bit. Unnecessary if the data is always positive.</span> 139 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span> 140 <span class="keyword">return</span> <span class="identifier">result</span> <span class="special">^</span> <span class="number">128</span><span class="special">;</span> 141 <span class="special">}</span> 142 143 <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span> 144 <span class="special">}</span> 145 146 <span class="comment">// Sort a signed float. This requires reversing the order of negatives</span> 147 <span class="comment">// because of the way floats are represented in bits.</span> 148 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">first_name_offset</span><span class="special">)</span> <span class="special">{</span> 149 <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">first_name_offset</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span> 150 <span class="keyword">unsigned</span> <span class="identifier">key</span> <span class="special">=</span> <span class="identifier">float_mem_cast</span><span class="special"><</span><span class="keyword">float</span><span class="special">,</span> <span class="keyword">unsigned</span><span class="special">>(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">);</span> 151 <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">key</span> <span class="special">&</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special"><<</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">bit_shift</span><span class="special">;</span> 152 <span class="comment">// Handling the sign.</span> 153 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special"><</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span> 154 <span class="keyword">return</span> <span class="number">255</span> <span class="special">-</span> <span class="identifier">result</span><span class="special">;</span> 155 <span class="special">}</span> 156 <span class="comment">// Increasing positives so they are higher than negatives.</span> 157 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span> 158 <span class="keyword">return</span> <span class="number">128</span> <span class="special">+</span> <span class="identifier">result</span><span class="special">;</span> 159 <span class="special">}</span> 160 161 <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span> 162 <span class="special">}</span> 163 164 <span class="comment">// Sort a string that is before the end. This approach supports embedded</span> 165 <span class="comment">// nulls. If embedded nulls are not required, then just delete the "* 2"</span> 166 <span class="comment">// and the inside of the following if just becomes:</span> 167 <span class="comment">// return x.first_name[offset - first_name_offset];</span> 168 <span class="keyword">const</span> <span class="keyword">unsigned</span> <span class="identifier">first_name_end_offset</span> <span class="special">=</span> 169 <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span><span class="special">;</span> 170 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span> 171 <span class="keyword">int</span> <span class="identifier">char_offset</span> <span class="special">=</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_offset</span><span class="special">;</span> 172 <span class="comment">// This signals that the string continues.</span> 173 <span class="keyword">if</span> <span class="special">(!(</span><span class="identifier">char_offset</span> <span class="special">&</span> <span class="number">1</span><span class="special">))</span> <span class="special">{</span> 174 <span class="keyword">return</span> <span class="number">1</span><span class="special">;</span> 175 <span class="special">}</span> 176 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">[</span><span class="identifier">char_offset</span> <span class="special">>></span> <span class="number">1</span><span class="special">];</span> 177 <span class="special">}</span> 178 179 <span class="comment">// This signals that the string has ended, so that shorter strings come</span> 180 <span class="comment">// before longer ones.</span> 181 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span> 182 <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> 183 <span class="special">}</span> 184 185 <span class="comment">// The final string needs no special consideration.</span> 186 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">[</span><span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_end_offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">];</span> 187 <span class="special">}</span> 188<span class="special">};</span> 189 190<span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span> 191 <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> 192 <span class="keyword">return</span> <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span> <span class="special">+</span> <span class="number">1</span> <span class="special">+</span> 193 <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> 194 <span class="special">}</span> 195<span class="special">};</span> 196</pre> 197<pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span> 198</pre> 199<p> 200 Other examples: 201 </p> 202<p> 203 <a href="../../../../../../example/stringsample.cpp" target="_top">String sort.</a> 204 </p> 205<p> 206 <a href="../../../../../../example/reversestringsample.cpp" target="_top">Reverse string sort.</a> 207 </p> 208<p> 209 <a href="../../../../../../example/wstringsample.cpp" target="_top">Wide character string 210 sort.</a> 211 </p> 212<p> 213 <a href="../../../../../../example/caseinsensitive.cpp" target="_top">Case insensitive string 214 sort.</a> 215 </p> 216<p> 217 <a href="../../../../../../example/charstringsample.cpp" target="_top">Sort structs using 218 a string key and indexing functors.</a> 219 </p> 220<p> 221 <a href="../../../../../../example/reversestringfunctorsample.cpp" target="_top">Sort structs 222 using a string keynd all functors in reverse order.</a> 223 </p> 224</div> 225</div> 226<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> 227<td align="left"></td> 228<td align="right"><div class="copyright-footer">Copyright © 2014-2017 Steven 229 Ross, Francisco Tapia, Orson Peters<p> 230 Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost 231 Software License, Version 1.0</a>. 232 </p> 233</div></td> 234</tr></table> 235<hr> 236<div class="spirit-nav"> 237<a accesskey="p" href="float_sort.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a> 238</div> 239</body> 240</html> 241