<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Performance</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Chapter 1. Fiber">
<link rel="up" href="../index.html" title="Chapter 1. Fiber">
<link rel="prev" href="worker.html" title="Running with worker threads">
<link rel="next" href="tuning.html" title="Tuning">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="worker.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tuning.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="fiber.performance"></a><a class="link" href="performance.html" title="Performance">Performance</a>
</h2></div></div></div>
<p>
      Performance measurements were taken using <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">chrono</span><span class="special">::</span><span class="identifier">high_resolution_clock</span></code>,
      with overhead corrections. The code was compiled with gcc-6.3.1, using the build
      options: variant = release, optimization = speed. Tests were executed on a dual
      Intel XEON E5 2620v4 2.2GHz machine (16C/32T, 64GB RAM), running Linux (x86_64).
    </p>
<p>
      Measurements labelled 1C/1T were taken in a single-threaded process.
    </p>
<p>
      The <a href="https://github.com/atemerev/skynet" target="_top">microbenchmark <span class="emphasis"><em>skynet</em></span></a>
      by Alexander Temerev was ported and used for the performance measurements. At
      the root, the test spawns 10 threads-of-execution (ToE), e.g. actors, goroutines
      or fibers. Each spawned ToE spawns 10 further ToEs, and so on, until <span class="bold"><strong>1,000,000</strong></span>
      ToEs have been created. Each ToE returns its ordinal number (0 ... 999,999); the
      numbers are summed at the previous level and sent back upstream until the sum
      reaches the root. The test was run 10-20 times, producing a range of values for
      each measurement.
    </p>
<div class="table">
<a name="fiber.performance.time_per_actor_erlang_process_goroutine__other_languages___average_over_1_000_000_"></a><p class="title"><b>Table 1.2. time per actor/erlang process/goroutine (other languages) (average over
      1,000,000)</b></p>
<div class="table-contents"><table class="table" summary="time per actor/erlang process/goroutine (other languages) (average over
      1,000,000)">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                Haskell | stack-1.4.0/ghc-8.0.1
              </p>
            </th>
<th>
              <p>
                Go | go1.8.1
              </p>
            </th>
<th>
              <p>
                Erlang | erts-8.3
              </p>
            </th>
</tr></thead>
<tbody><tr>
<td>
              <p>
                0.05 µs - 0.06 µs
              </p>
            </td>
<td>
              <p>
                0.42 µs - 0.49 µs
              </p>
            </td>
<td>
              <p>
                0.63 µs - 0.73 µs
              </p>
            </td>
</tr></tbody>
</table></div>
</div>
<br class="table-break"><p>
      Pthreads are created with a stack size of 8kB, while <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">thread</span></code>s
      use the system default (1MB - 2MB). The microbenchmark could <span class="bold"><strong>not</strong></span>
      be <span class="bold"><strong>run</strong></span> with 1,000,000 threads because of
      <span class="bold"><strong>resource exhaustion</strong></span> (both pthread and std::thread).
      Instead, the test was run with only <span class="bold"><strong>10,000</strong></span> threads.
    </p>
<div class="table">
<a name="fiber.performance.time_per_thread__average_over_10_000___unable_to_spawn_1_000_000_threads_"></a><p class="title"><b>Table 1.3. time per thread (average over 10,000 - unable to spawn 1,000,000 threads)</b></p>
<div class="table-contents"><table class="table" summary="time per thread (average over 10,000 - unable to spawn 1,000,000 threads)">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                pthread
              </p>
            </th>
<th>
              <p>
                <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">thread</span></code>
              </p>
            </th>
<th>
              <p>
                <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">async</span></code>
              </p>
            </th>
</tr></thead>
<tbody><tr>
<td>
              <p>
                54 µs - 73 µs
              </p>
            </td>
<td>
              <p>
                52 µs - 73 µs
              </p>
            </td>
<td>
              <p>
                106 µs - 122 µs
              </p>
            </td>
</tr></tbody>
</table></div>
</div>
<br class="table-break"><p>
      The test utilizes 16 cores with simultaneous multithreading enabled (32 logical
      CPUs). The fiber stacks are allocated by <a class="link" href="stack.html#class_fixedsize_stack"><code class="computeroutput">fixedsize_stack</code></a>.
    </p>
<p>
      As the benchmark shows, the memory allocation algorithm is significant for
      performance in a multithreaded environment. The tests use glibc’s memory allocation
      algorithm (based on ptmalloc2) as well as Google’s <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html" target="_top">TCMalloc</a>
      (via linkflags="-ltcmalloc").<a href="#ftn.fiber.performance.f0" class="footnote" name="fiber.performance.f0"><sup class="footnote">[9]</sup></a>
    </p>
<p>
      In the <a class="link" href="scheduling.html#class_work_stealing"><code class="computeroutput">work_stealing</code></a> scheduling algorithm, each thread has
      its own local queue. Fibers that are ready to run are pushed to and popped
      from the local queue. If the queue runs out of ready fibers, fibers are stolen
      from the local queues of other participating threads.
    </p>
<div class="table">
<a name="fiber.performance.time_per_fiber__average_over_1_000_000_"></a><p class="title"><b>Table 1.4. time per fiber (average over 1,000,000)</b></p>
<div class="table-contents"><table class="table" summary="time per fiber (average over 1,000,000)">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                fiber (16C/32T, work stealing, tcmalloc)
              </p>
            </th>
<th>
              <p>
                fiber (1C/1T, round robin, tcmalloc)
              </p>
            </th>
</tr></thead>
<tbody><tr>
<td>
              <p>
                0.05 µs - 0.09 µs
              </p>
            </td>
<td>
              <p>
                1.69 µs - 1.79 µs
              </p>
            </td>
</tr></tbody>
</table></div>
</div>
<br class="table-break"><div class="footnotes">
<br><hr style="width:100; text-align:left;margin-left: 0">
<div id="ftn.fiber.performance.f0" class="footnote"><p><a href="#fiber.performance.f0" class="para"><sup class="para">[9] </sup></a>
        Tais B. Ferreira, Rivalino Matias, Autran Macedo, Lucio B. Araujo, <span class="quote">“<span class="quote">An
        Experimental Study on Memory Allocators in Multicore and Multithreaded Applications</span>”</span>,
        PDCAT ’11: Proceedings of the 2011 12th International Conference on Parallel
        and Distributed Computing, Applications and Technologies, pages 92-98.
      </p></div>
</div>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright © 2013 Oliver Kowalke<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="worker.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tuning.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>