<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Performance</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Chapter 1. Fiber">
<link rel="up" href="../index.html" title="Chapter 1. Fiber">
<link rel="prev" href="worker.html" title="Running with worker threads">
<link rel="next" href="tuning.html" title="Tuning">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="worker.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tuning.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="fiber.performance"></a><a class="link" href="performance.html" title="Performance">Performance</a>
</h2></div></div></div>
<p>
      Performance measurements were taken using <code class="computeroutput"><span class="identifier">std</span><span
class="special">::</span><span class="identifier">chrono</span><span class="special">::</span><span class="identifier">high_resolution_clock</span></code>,
      with overhead corrections. The code was compiled with gcc-6.3.1, using build
      options: variant = release, optimization = speed. Tests were executed on dual
      Intel XEON E5 2620v4 2.2GHz (16C/32T) with 64GB RAM, running Linux (x86_64).
    </p>
<p>
      Measurements headed 1C/1T were run in a single-threaded process.
    </p>
<p>
      The <a href="https://github.com/atemerev/skynet" target="_top">microbenchmark <span class="emphasis"><em>skynet</em></span></a>
      from Alexander Temerev was ported and used for performance measurements. At
      the root, the test spawns 10 threads-of-execution (ToEs), e.g. actors, goroutines
      or fibers. Each spawned ToE spawns 10 additional ToEs, and so on, until <span class="bold"><strong>1,000,000</strong></span>
      ToEs have been created. The ToEs return their ordinal numbers (0 ... 999,999), which
      are summed at the previous level and sent back upstream until they reach the
      root. The test was run 10-20 times, producing a range of values for each measurement.
    </p>
<div class="table">
<a name="fiber.performance.time_per_actor_erlang_process_goroutine__other_languages___average_over_1_000_000_"></a><p class="title"><b>Table 1.2.
time per actor/erlang process/goroutine (other languages) (average over 1,000,000)</b></p>
<div class="table-contents"><table class="table" summary="time per actor/erlang process/goroutine (other languages) (average over 1,000,000)">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                Haskell | stack-1.4.0/ghc-8.0.1
              </p>
            </th>
<th>
              <p>
                Go | go1.8.1
              </p>
            </th>
<th>
              <p>
                Erlang | erts-8.3
              </p>
            </th>
</tr></thead>
<tbody><tr>
<td>
              <p>
                0.05 µs - 0.06 µs
              </p>
            </td>
<td>
              <p>
                0.42 µs - 0.49 µs
              </p>
            </td>
<td>
              <p>
                0.63 µs - 0.73 µs
              </p>
            </td>
</tr></tbody>
</table></div>
</div>
<br class="table-break"><p>
      Pthreads are created with a stack size of 8kB, while <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">thread</span></code>
      uses the system default (1MB - 2MB). The microbenchmark could <span class="bold"><strong>not</strong></span>
      be <span class="bold"><strong>run</strong></span> with 1,000,000 threads because of
      <span class="bold"><strong>resource exhaustion</strong></span> (pthread and std::thread).
      Instead, the test was run with only <span class="bold"><strong>10,000</strong></span> threads.
    </p>
<div class="table">
<a name="fiber.performance.time_per_thread__average_over_10_000___unable_to_spawn_1_000_000_threads_"></a><p class="title"><b>Table 1.3.
time per thread (average over 10,000 - unable to spawn 1,000,000 threads)</b></p>
<div class="table-contents"><table class="table" summary="time per thread (average over 10,000 - unable to spawn 1,000,000 threads)">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                pthread
              </p>
            </th>
<th>
              <p>
                <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">thread</span></code>
              </p>
            </th>
<th>
              <p>
                <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">async</span></code>
              </p>
            </th>
</tr></thead>
<tbody><tr>
<td>
              <p>
                54 µs - 73 µs
              </p>
            </td>
<td>
              <p>
                52 µs - 73 µs
              </p>
            </td>
<td>
              <p>
                106 µs - 122 µs
              </p>
            </td>
</tr></tbody>
</table></div>
</div>
<br class="table-break"><p>
      The test utilizes 16 cores with simultaneous multithreading (SMT) enabled (32 logical
      CPUs). The fiber stacks are allocated by <a class="link" href="stack.html#class_fixedsize_stack"><code class="computeroutput">fixedsize_stack</code></a>.
    </p>
<p>
      As the benchmark shows, the memory allocation algorithm is significant for
      performance in a multithreaded environment. The tests use glibc’s memory allocation
      algorithm (based on ptmalloc2) as well as Google’s <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html" target="_top">TCMalloc</a>
      (via linkflags="-ltcmalloc").<a href="#ftn.fiber.performance.f0" class="footnote" name="fiber.performance.f0"><sup class="footnote">[9]</sup></a>
    </p>
<p>
      In the <a class="link" href="scheduling.html#class_work_stealing"><code class="computeroutput">work_stealing</code></a> scheduling algorithm, each thread has
      its own local queue. Fibers that are ready to run are pushed to and popped
      from the local queue.
      If the queue runs out of ready fibers, fibers are stolen
      from the local queues of other participating threads.
    </p>
<div class="table">
<a name="fiber.performance.time_per_fiber__average_over_1_000_000_"></a><p class="title"><b>Table 1.4. time per fiber (average over 1,000,000)</b></p>
<div class="table-contents"><table class="table" summary="time per fiber (average over 1,000,000)">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                fiber (16C/32T, work stealing, tcmalloc)
              </p>
            </th>
<th>
              <p>
                fiber (1C/1T, round robin, tcmalloc)
              </p>
            </th>
</tr></thead>
<tbody><tr>
<td>
              <p>
                0.05 µs - 0.09 µs
              </p>
            </td>
<td>
              <p>
                1.69 µs - 1.79 µs
              </p>
            </td>
</tr></tbody>
</table></div>
</div>
<br class="table-break"><div class="footnotes">
<br><hr style="width:100; text-align:left;margin-left: 0">
<div id="ftn.fiber.performance.f0" class="footnote"><p><a href="#fiber.performance.f0" class="para"><sup class="para">[9] </sup></a>
        Tais B. Ferreira, Rivalino Matias, Autran Macedo, Lucio B. Araujo, <span class="quote">“<span class="quote">An
        Experimental Study on Memory Allocators in Multicore and Multithreaded Applications</span>”</span>,
        PDCAT ’11 Proceedings of the 2011 12th International Conference on Parallel
        and Distributed Computing, Applications and Technologies, pages 92-98
      </p></div>
</div>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright © 2013 Oliver Kowalke<p>
        Distributed under the Boost Software License, Version 1.0.
        (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="worker.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tuning.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>