1<html> 2<head> 3<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 4<title>Why spreadsort?</title> 5<link rel="stylesheet" href="../../../../../../../../../doc/src/boostbook.css" type="text/css"> 6<meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> 7<link rel="home" href="../../../../../index.html" title="Boost.Sort"> 8<link rel="up" href="../rationale.html" title="Rationale"> 9<link rel="prev" href="hybrid_radix.html" title="Hybrid Radix"> 10<link rel="next" href="unstable_sort.html" title="Unstable Sorting"> 11</head> 12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> 13<table cellpadding="2" width="100%"><tr> 14<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../../boost.png"></td> 15<td align="center"><a href="../../../../../../../../../index.html">Home</a></td> 16<td align="center"><a href="../../../../../../../../../libs/libraries.htm">Libraries</a></td> 17<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> 18<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> 19<td align="center"><a href="../../../../../../../../../more/index.htm">More</a></td> 20</tr></table> 21<hr> 22<div class="spirit-nav"> 23<a accesskey="p" href="hybrid_radix.html"><img src="../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../rationale.html"><img src="../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unstable_sort.html"><img src="../../../../../../../../../doc/src/images/next.png" alt="Next"></a> 24</div> 25<div class="section"> 26<div class="titlepage"><div><div><h6 class="title"> 27<a name="sort.single_thread.spreadsort.sort_hpp.rationale.why_spreadsort"></a><a class="link" href="why_spreadsort.html" title="Why spreadsort?">Why 28 spreadsort?</a> 29</h6></div></div></div> 30<p> 31 The <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/spreadso_idm45878543880736.html" title="Function template spreadsort">spreadsort</a></code></code> 32 algorithm used in this library is designed to provide best possible 33 worst-case performance, while still being cache-friendly. It provides 34 the better of <span class="emphasis"><em>(N*log(K/S + S))</em></span> and <span class="emphasis"><em>(N*log(N))</em></span> 35 worst-case time, where <span class="emphasis"><em>K</em></span> is the log of the range. 36 The log of the range is normally the length in bits of the data type; 37 32 for a 32-bit integer. 38 </p> 39<p> 40 <code class="computeroutput">flash_sort</code> (another hybrid algorithm), by comparison is 41 <span class="emphasis"><em>(N)</em></span> for evenly distributed lists. The problem 42 is, <code class="computeroutput">flash_sort</code> is merely an MSD <a href="http://en.wikipedia.org/wiki/Radix_sort" target="_top">radix 43 sort</a> combined with <span class="emphasis"><em>(N*N)</em></span> insertion sort 44 to deal with small subsets where the MSD Radix Sort is inefficient, 45 so it is inefficient with chunks of data around the size at which it 46 switches to <code class="computeroutput">insertion_sort</code>, and ends up operating as an 47 enhanced MSD Radix Sort. For uneven distributions this makes it especially 48 inefficient. 49 </p> 50<p> 51 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/integer__idm45878544022768.html" title="Function template integer_sort">integer_sort</a></code></code> 52 and <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/float_so_idm45878545215024.html" title="Function template float_sort">float_sort</a></code></code> 53 use <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a> 54 instead, which provides <span class="emphasis"><em>(N*log(N))</em></span> performance 55 for these medium-sized pieces. Also, <code class="computeroutput">flash_sort</code>'s <span class="emphasis"><em>(N)</em></span> 56 performance for even distributions comes at the cost of cache misses, 57 which on modern architectures are extremely expensive, and in testing 58 on modern systems ends up being slower than cutting up the data in 59 multiple, cache-friendly steps. Also worth noting is that on most modern 60 computers, <code class="computeroutput">log2(available RAM)/log2(L1 cache size)</code> is 61 around 3, where a cache miss takes more than 3 times as long as an 62 in-cache random-access, and the size of <span class="emphasis"><em>max_splits</em></span> 63 is tuned to the size of the cache. On a computer where cache misses 64 aren't this expensive, <span class="emphasis"><em>max_splits</em></span> could be increased 65 to a large value, or eliminated entirely, and <code class="computeroutput">integer_sort/float_sort</code> 66 would have the same <span class="emphasis"><em>(N)</em></span> performance on even distributions. 67 </p> 68<p> 69 Adaptive Left Radix (ALR) is similar to <code class="computeroutput">flash_sort</code>, but 70 more cache-friendly. It still uses insertion_sort. Because ALR uses 71 <span class="emphasis"><em>(N*N)</em></span> <code class="computeroutput">insertion_sort</code>, it isn't efficient 72 to use the comparison-based fallback sort on large lists, and if the 73 data is clustered in small chunks just over the fallback size with 74 a few outliers, radix-based sorting iterates many times doing little 75 sorting with high overhead. Asymptotically, ALR is still <span class="emphasis"><em>(N*log(K/S 76 + S))</em></span>, but with a very small <span class="emphasis"><em>S</em></span> (about 77 2 in the worst case), which compares unfavorably with the 11 default 78 value of <span class="emphasis"><em>max_splits</em></span> for Spreadsort. 79 </p> 80<p> 81 ALR also does not have the <span class="emphasis"><em>(N*log(N))</em></span> fallback, 82 so for small lists that are not evenly distributed it is extremely 83 inefficient. See the <code class="computeroutput">alrbreaker</code> and <code class="computeroutput">binaryalrbreaker</code> 84 testcases for examples; either replace the call to sort with a call 85 to ALR and update the ALR_THRESHOLD at the top, or as a quick comparison 86 make <code class="computeroutput">get_max_count return ALR_THRESHOLD</code> (20 by default 87 based upon the paper). These small tests take 4-10 times as long with 88 ALR as <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a> 89 in the author's testing, depending on the test system, because they 90 are trying to sort a highly uneven distribution. Normal Spreadsort 91 does much better with them, because <code class="computeroutput">get_max_count</code> is designed 92 around minimizing worst-case runtime. 93 </p> 94<p> 95 <code class="computeroutput">burst_sort</code> is an efficient hybrid algorithm for strings 96 that uses substantial additional memory. 97 </p> 98<p> 99 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 100 uses minimal additional memory by comparison. Speed comparisons between 101 the two haven't been made, but the better memory efficiency makes 102 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 103 more general. 104 </p> 105<p> 106 <code class="computeroutput">postal_sort</code> and <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 107 are similar. A direct performance comparison would be welcome, but 108 an efficient version of <code class="computeroutput">postal_sort</code> was not found in a 109 search for source. 110 </p> 111<p> 112 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 113 is most similar to the <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American 114 flag sort</a> algorithm. The main difference is that it doesn't 115 bother trying to optimize how empty buckets/piles are handled, instead 116 just checking to see if all characters at the current index are equal. 117 Other differences are using <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a> 118 as the fallback algorithm, and a larger fallback size (256 vs. 16), 119 which makes empty pile handling less important. 120 </p> 121<p> 122 Another difference is not applying the stack-size restriction. Because 123 of the equality check in <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code>, 124 it would take <span class="emphasis"><em>m*m</em></span> memory worth of strings to force 125 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 126 to create a stack of depth <span class="emphasis"><em>m</em></span>. This problem isn't 127 a realistic one on modern systems with multi-megabyte stacksize limits, 128 where main memory would be exhausted holding the long strings necessary 129 to exceed the stacksize limit. <code class="literal"><code class="computeroutput"><a class="link" href="../../../../../boost/sort/spreadsort/string_s_idm45878543797168.html" title="Function template string_sort">string_sort</a></code></code> 130 can be thought of as modernizing <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American 131 flag sort</a> to take advantage of <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a> 132 as a fallback algorithm. In the author's testing, <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American 133 flag sort</a> (on <code class="computeroutput">std::strings</code>) had comparable runtime 134 to <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>, 135 but making a hybrid of the two allows reduced overhead and substantially 136 superior performance. 137 </p> 138</div> 139<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> 140<td align="left"></td> 141<td align="right"><div class="copyright-footer">Copyright © 2014-2017 Steven 142 Ross, Francisco Tapia, Orson Peters<p> 143 Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost 144 Software License, Version 1.0</a>. 145 </p> 146</div></td> 147</tr></table> 148<hr> 149<div class="spirit-nav"> 150<a accesskey="p" href="hybrid_radix.html"><img src="../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../rationale.html"><img src="../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unstable_sort.html"><img src="../../../../../../../../../doc/src/images/next.png" alt="Next"></a> 151</div> 152</body> 153</html> 154