1<html>
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
4<title>Tuning</title>
5<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
6<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7<link rel="home" href="../index.html" title="Chapter 1. Fiber">
8<link rel="up" href="../index.html" title="Chapter 1. Fiber">
9<link rel="prev" href="performance.html" title="Performance">
10<link rel="next" href="custom.html" title="Customization">
11</head>
12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13<table cellpadding="2" width="100%"><tr>
14<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
15<td align="center"><a href="../../../../../index.html">Home</a></td>
16<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
17<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19<td align="center"><a href="../../../../../more/index.htm">More</a></td>
20</tr></table>
21<hr>
22<div class="spirit-nav">
23<a accesskey="p" href="performance.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="custom.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
24</div>
25<div class="section">
26<div class="titlepage"><div><div><h2 class="title" style="clear: both">
27<a name="fiber.tuning"></a><a name="tuning"></a><a class="link" href="tuning.html" title="Tuning">Tuning</a>
28</h2></div></div></div>
29<h4>
30<a name="fiber.tuning.h0"></a>
31      <span class="phrase"><a name="fiber.tuning.disable_synchronization"></a></span><a class="link" href="tuning.html#fiber.tuning.disable_synchronization">Disable
32      synchronization</a>
33    </h4>
34<p>
35      With <a class="link" href="overview.html#cross_thread_sync"><code class="computeroutput"><span class="identifier">BOOST_FIBERS_NO_ATOMICS</span></code></a>
36      defined at the compiler’s command line, synchronization between fibers (in different
37      threads) is disabled. This is acceptable if the application is single threaded
38      and/or fibers are not synchronized between threads.
39    </p>
40<h4>
41<a name="fiber.tuning.h1"></a>
42      <span class="phrase"><a name="fiber.tuning.memory_allocation"></a></span><a class="link" href="tuning.html#fiber.tuning.memory_allocation">Memory
43      allocation</a>
44    </h4>
45<p>
46      The memory allocation algorithm is significant for performance in a multithreaded
47      environment, especially for <span class="bold"><strong>Boost.Fiber</strong></span> where
48      fiber stacks are allocated on the heap. The default user-level memory allocator
49      (UMA) of glibc is ptmalloc2, but it can be replaced by another UMA that fits
50      the concrete workload better. For instance, Google’s <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html" target="_top">TCmalloc</a>
51      achieves better performance in the <span class="emphasis"><em>skynet</em></span> microbenchmark
52      than glibc’s default memory allocator.
53    </p>
54<h4>
55<a name="fiber.tuning.h2"></a>
56      <span class="phrase"><a name="fiber.tuning.scheduling_strategies"></a></span><a class="link" href="tuning.html#fiber.tuning.scheduling_strategies">Scheduling
57      strategies</a>
58    </h4>
59<p>
60      The fibers in a thread are coordinated by a fiber manager. Fibers trade control
61      cooperatively, rather than preemptively. Depending on the workload, several
62      strategies for scheduling the fibers are possible <a href="#ftn.fiber.tuning.f0" class="footnote" name="fiber.tuning.f0"><sup class="footnote">[10]</sup></a>; each can be implemented on the basis of <a class="link" href="scheduling.html#class_algorithm"><code class="computeroutput">algorithm</code></a>.
63    </p>
64<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
65<li class="listitem">
66          work-stealing: ready fibers are held in a local queue; when the fiber-scheduler's
67          local queue runs out of ready fibers, it randomly selects another fiber-scheduler
68          and tries to steal a ready fiber from the victim (implemented in <a class="link" href="scheduling.html#class_work_stealing"><code class="computeroutput">work_stealing</code></a> and
69          <a class="link" href="numa.html#class_numa_work_stealing"><code class="computeroutput">numa::work_stealing</code></a>)
70        </li>
71<li class="listitem">
72          work-requesting: ready fibers are held in a local queue; when the fiber-scheduler's
73          local queue runs out of ready fibers, it randomly selects another fiber-scheduler
74          and requests a ready fiber; the victim fiber-scheduler sends a ready fiber
75          back
76        </li>
77<li class="listitem">
78          work-sharing: ready fibers are held in a global queue; fiber-schedulers
79          concurrently push and pop ready fibers to/from the global queue (implemented
80          in <a class="link" href="scheduling.html#class_shared_work"><code class="computeroutput">shared_work</code></a>)
81        </li>
82<li class="listitem">
83          work-distribution: fibers that become ready are proactively distributed
84          to idle fiber-schedulers or fiber-schedulers with low load
85        </li>
86<li class="listitem">
87          work-balancing: a dedicated (helper) fiber-scheduler periodically collects
88          information about all fiber-schedulers running in other threads and redistributes
89          ready fibers among them
90        </li>
91</ul></div>
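<p>
      The work-stealing strategy from the list above can be sketched with the
      standard library alone. This is a minimal illustration, not the code of
      <code class="computeroutput">work_stealing</code>: the names
      <code class="computeroutput">ready_queue</code> and
      <code class="computeroutput">pick_next</code> are hypothetical, plain
      integers stand in for ready fibers, and the choice that the owner takes
      from the front while a thief takes from the back is an assumption of the
      sketch.
    </p>

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <random>
#include <vector>

// Hypothetical per-scheduler ready queue; ints stand in for ready fibers.
class ready_queue {
    std::deque<int> items_;
    std::mutex mtx_;
public:
    void push(int fiber) {
        std::lock_guard<std::mutex> lk(mtx_);
        items_.push_back(fiber);
    }
    // The owning scheduler takes the oldest ready fiber (front).
    bool pop(int& out) {
        std::lock_guard<std::mutex> lk(mtx_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }
    // A thief takes from the opposite end (back); an assumption of this sketch.
    bool steal(int& out) {
        std::lock_guard<std::mutex> lk(mtx_);
        if (items_.empty()) return false;
        out = items_.back();
        items_.pop_back();
        return true;
    }
};

// Pick the next fiber for scheduler `self`: local queue first,
// otherwise try to steal from one randomly selected victim.
template <typename Rng>
bool pick_next(std::vector<ready_queue>& queues, std::size_t self,
               Rng& rng, int& out) {
    if (queues[self].pop(out)) return true;
    if (queues.size() < 2) return false;
    std::uniform_int_distribution<std::size_t> dist(0, queues.size() - 2);
    std::size_t victim = dist(rng);
    if (victim >= self) ++victim;  // never select our own queue
    return queues[victim].steal(out);
}
```

<p>
      A scheduler with an empty local queue calls
      <code class="computeroutput">pick_next</code> and either receives a stolen
      fiber or learns that the selected victim had nothing to offer; a real
      scheduler would then retry with another victim or suspend.
    </p>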
92<h4>
93<a name="fiber.tuning.h3"></a>
94      <span class="phrase"><a name="fiber.tuning.ttas_locks"></a></span><a class="link" href="tuning.html#fiber.tuning.ttas_locks">TTAS
95      locks</a>
96    </h4>
97<p>
98      Boost.Fiber internally uses spinlocks to protect critical regions if fibers
99      running on different threads interact. Spinlocks are implemented as TTAS (test-test-and-set)
100      locks, i.e. the spinlock tests the lock before calling an atomic exchange.
101      This strategy helps to reduce the cache line invalidations triggered by acquiring/releasing
102      the lock.
103    </p>
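<p>
      As an illustration of the test-before-exchange idea, the following is a
      minimal sketch of a TTAS lock built on
      <code class="computeroutput">std::atomic</code>. The class name
      <code class="computeroutput">ttas_spinlock</code> is hypothetical and the
      code is not the spinlock used inside Boost.Fiber; it omits the CPU relaxing
      and back-off described in the following sections.
    </p>

```cpp
#include <atomic>

// Hypothetical test-test-and-set spinlock (illustrative only).
class ttas_spinlock {
    std::atomic<bool> locked_{false};
public:
    void lock() noexcept {
        for (;;) {
            // First test: spin on a plain load, which only reads the
            // cache line and does not invalidate it in other cores.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            // The lock looked free, so attempt the atomic exchange.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    bool try_lock() noexcept {
        // Same pattern without spinning: test first, then exchange.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }
};
```

<p>
      Because the class provides <code class="computeroutput">lock()</code>,
      <code class="computeroutput">try_lock()</code> and
      <code class="computeroutput">unlock()</code>, it can be used with
      <code class="computeroutput">std::lock_guard</code>.
    </p>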
104<h4>
105<a name="fiber.tuning.h4"></a>
106      <span class="phrase"><a name="fiber.tuning.spin_wait_loop"></a></span><a class="link" href="tuning.html#fiber.tuning.spin_wait_loop">Spin-wait
107      loop</a>
108    </h4>
109<p>
110      A lock is considered under contention if a thread repeatedly fails to acquire
111      the lock because some other thread was faster. Waiting for a short time lets
112      other threads finish before trying to enter the critical section again. While
113      busy waiting on the lock, relaxing the CPU (via the pause/yield instruction) gives
114      the CPU a hint that the code is in a spin-wait loop.
115    </p>
116<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
117<li class="listitem">
118          prevents expensive pipeline flushes (speculatively executed load and compare
119          instructions are not pushed into the pipeline)
120        </li>
121<li class="listitem">
122          another hardware thread (simultaneous multithreading) can get a time slice
123        </li>
124<li class="listitem">
125          it does delay a few CPU cycles, but this is necessary to prevent starvation
126        </li>
127</ul></div>
128<p>
129      It is obvious that this strategy is useless on single-core systems because
130      the lock can only be released if the thread gives up its time slice in order to
131      let other threads run. The macro BOOST_FIBERS_SPIN_SINGLE_CORE replaces the
132      CPU hints (pause/yield mnemonic) by informing the operating system (via <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>) that the thread gives up its time slice
133      and the operating system switches to another thread.
134    </p>
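<p>
      The difference between relaxing the CPU and yielding the thread can be
      sketched as follows. The helper names
      <code class="computeroutput">cpu_relax</code> and
      <code class="computeroutput">spin_until_set</code> as well as the macro
      <code class="computeroutput">SKETCH_SPIN_SINGLE_CORE</code> are
      hypothetical; the macro only mimics the role of
      BOOST_FIBERS_SPIN_SINGLE_CORE and is not part of Boost.Fiber. The sketch
      assumes an x86 target for the pause hint and falls back to
      <code class="computeroutput">std::this_thread::yield()</code> elsewhere.
    </p>

```cpp
#include <atomic>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define SKETCH_HAVE_PAUSE 1
#endif

// Hypothetical helper: hint that the caller is inside a spin-wait loop.
inline void cpu_relax() noexcept {
#if defined(SKETCH_SPIN_SINGLE_CORE) || !defined(SKETCH_HAVE_PAUSE)
    // Single-core build (or no pause hint available): give up the
    // time slice so the thread holding the lock can run.
    std::this_thread::yield();
#else
    // Multi-core x86 build: issue the pause instruction.
    _mm_pause();
#endif
}

// Spin until `flag` becomes true; returns how often the CPU was relaxed.
inline unsigned spin_until_set(const std::atomic<bool>& flag) noexcept {
    unsigned spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        cpu_relax();
        ++spins;
    }
    return spins;
}
```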
135<h4>
136<a name="fiber.tuning.h5"></a>
137      <span class="phrase"><a name="fiber.tuning.exponential_back_off"></a></span><a class="link" href="tuning.html#fiber.tuning.exponential_back_off">Exponential
138      back-off</a>
139    </h4>
140<p>
141      The macro BOOST_FIBERS_RETRY_THRESHOLD determines how many times the CPU iterates
142      in the spin-wait loop before yielding the thread or blocking in futex-wait.
143      The spinlock tracks how many times the thread failed to acquire the lock. The
144      higher the contention, the longer the thread should back off. A <span class="quote">“<span class="quote">Binary
145      Exponential Backoff</span>”</span> algorithm together with a randomized contention
146      window is utilized for this purpose. BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD
147      determines the upper limit of the contention window (expressed as the exponent
148      to the base two).
149    </p>
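<p>
      A binary exponential back-off with a randomized contention window might
      look like the sketch below. The constants mirror the default values listed
      in the parameter table (64 and 16), but the names
      <code class="computeroutput">retry_threshold</code>,
      <code class="computeroutput">contention_window_threshold</code>,
      <code class="computeroutput">contention_window</code> and
      <code class="computeroutput">backoff_iterations</code> are hypothetical,
      and the exact formula used inside Boost.Fiber may differ.
    </p>

```cpp
#include <cstddef>
#include <random>

// Hypothetical stand-ins for BOOST_FIBERS_RETRY_THRESHOLD and
// BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD (default values from the table).
constexpr std::size_t retry_threshold = 64;
constexpr std::size_t contention_window_threshold = 16;

// True once busy spinning should stop and the fallback (yield or
// futex-wait) should be used instead.
constexpr bool use_fallback(std::size_t retries) noexcept {
    return retries >= retry_threshold;
}

// Size of the contention window after `collisions` failed attempts:
// 2^collisions, with the exponent capped at the threshold.
constexpr std::size_t contention_window(std::size_t collisions) noexcept {
    return std::size_t{1}
           << (collisions < contention_window_threshold
                   ? collisions
                   : contention_window_threshold);
}

// Number of relax iterations to wait, drawn uniformly from
// [0, contention_window(collisions)].
template <typename Rng>
std::size_t backoff_iterations(std::size_t collisions, Rng& rng) {
    std::uniform_int_distribution<std::size_t> dist(
        0, contention_window(collisions));
    return dist(rng);
}
```

<p>
      Drawing the wait time at random from the window keeps threads that
      collided at the same moment from retrying in lockstep.
    </p>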
150<h4>
151<a name="fiber.tuning.h6"></a>
152      <span class="phrase"><a name="fiber.tuning.speculative_execution__hardware_transactional_memory_"></a></span><a class="link" href="tuning.html#fiber.tuning.speculative_execution__hardware_transactional_memory_">Speculative
153      execution (hardware transactional memory)</a>
154    </h4>
155<p>
156      Boost.Fiber uses spinlocks to protect critical regions that can be used together
157      with transactional memory (see section <a class="link" href="speculation.html#speculation">Speculative
158      execution</a>).
159    </p>
160<div class="note"><table border="0" summary="Note">
161<tr>
162<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../doc/src/images/note.png"></td>
163<th align="left">Note</th>
164</tr>
165<tr><td align="left" valign="top"><p>
166        TSX is enabled if the property <code class="computeroutput"><span class="identifier">htm</span><span class="special">=</span><span class="identifier">tsx</span></code> is
167        specified on the b2 command line and <code class="computeroutput"><span class="identifier">BOOST_USE_TSX</span></code>
168        is applied to the compiler.
169      </p></td></tr>
170</table></div>
171<div class="note"><table border="0" summary="Note">
172<tr>
173<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../doc/src/images/note.png"></td>
174<th align="left">Note</th>
175</tr>
176<tr><td align="left" valign="top"><p>
177        A TSX-transaction will be aborted if the floating point state is modified
178        inside a critical region. As a consequence, floating point operations, e.g.
179        store/load of floating point related registers during a fiber (context) switch,
180        are disabled.
181      </p></td></tr>
182</table></div>
183<h4>
184<a name="fiber.tuning.h7"></a>
185      <span class="phrase"><a name="fiber.tuning.numa_systems"></a></span><a class="link" href="tuning.html#fiber.tuning.numa_systems">NUMA
186      systems</a>
187    </h4>
188<p>
189      Modern multi-socket systems are usually designed as <a class="link" href="numa.html#numa">NUMA
190      systems</a>. A suitable fiber scheduler like <a class="link" href="numa.html#class_numa_work_stealing"><code class="computeroutput">numa::work_stealing</code></a> reduces
191      remote memory access (latency).
192    </p>
193<h4>
194<a name="fiber.tuning.h8"></a>
195      <span class="phrase"><a name="fiber.tuning.parameters"></a></span><a class="link" href="tuning.html#fiber.tuning.parameters">Parameters</a>
196    </h4>
197<div class="table">
198<a name="fiber.tuning.parameters_that_migh_be_defiend_at_compiler_s_command_line"></a><p class="title"><b>Table 1.5. Parameters that might be defined at the compiler's command line</b></p>
199<div class="table-contents"><table class="table" summary="Parameters that might be defined at the compiler's command line">
200<colgroup>
201<col>
202<col>
203<col>
204</colgroup>
205<thead><tr>
206<th>
207              <p>
208                Parameter
209              </p>
210            </th>
211<th>
212              <p>
213                Default value
214              </p>
215            </th>
216<th>
217              <p>
218                Effect on Boost.Fiber
219              </p>
220            </th>
221</tr></thead>
222<tbody>
223<tr>
224<td>
225              <p>
226                BOOST_FIBERS_NO_ATOMICS
227              </p>
228            </td>
229<td>
230              <p>
231                -
232              </p>
233            </td>
234<td>
235              <p>
236                no multithreading support, all atomics removed, no synchronization
237                between fibers running in different threads
238              </p>
239            </td>
240</tr>
241<tr>
242<td>
243              <p>
244                BOOST_FIBERS_SPINLOCK_STD_MUTEX
245              </p>
246            </td>
247<td>
248              <p>
249                -
250              </p>
251            </td>
252<td>
253              <p>
254                <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">mutex</span></code> used inside spinlock
255              </p>
256            </td>
257</tr>
258<tr>
259<td>
260              <p>
261                BOOST_FIBERS_SPINLOCK_TTAS
262              </p>
263            </td>
264<td>
265              <p>
266                +
267              </p>
268            </td>
269<td>
270              <p>
271                spinlock with test-test-and-swap on shared variable
272              </p>
273            </td>
274</tr>
275<tr>
276<td>
277              <p>
278                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE
279              </p>
280            </td>
281<td>
282              <p>
283                -
284              </p>
285            </td>
286<td>
287              <p>
288                spinlock with test-test-and-swap on shared variable, adaptive retries
289                while busy waiting
290              </p>
291            </td>
292</tr>
293<tr>
294<td>
295              <p>
296                BOOST_FIBERS_SPINLOCK_TTAS_FUTEX
297              </p>
298            </td>
299<td>
300              <p>
301                -
302              </p>
303            </td>
304<td>
305              <p>
306                spinlock with test-test-and-swap on shared variable, suspend on futex
307                after certain number of retries
308              </p>
309            </td>
310</tr>
311<tr>
312<td>
313              <p>
314                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX
315              </p>
316            </td>
317<td>
318              <p>
319                -
320              </p>
321            </td>
322<td>
323              <p>
324                spinlock with test-test-and-swap on shared variable, adaptive retries
325                while busy waiting, suspend on futex after certain number of retries
326              </p>
327            </td>
328</tr>
329<tr>
330<td>
331              <p>
332                BOOST_FIBERS_SPINLOCK_TTAS + BOOST_USE_TSX
333              </p>
334            </td>
335<td>
336              <p>
337                -
338              </p>
339            </td>
340<td>
341              <p>
342                spinlock with test-test-and-swap and speculative execution (Intel
343                TSX required)
344              </p>
345            </td>
346</tr>
347<tr>
348<td>
349              <p>
350                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE + BOOST_USE_TSX
351              </p>
352            </td>
353<td>
354              <p>
355                -
356              </p>
357            </td>
358<td>
359              <p>
360                spinlock with test-test-and-swap on shared variable, adaptive retries
361                while busy waiting and speculative execution (Intel TSX required)
362              </p>
363            </td>
364</tr>
365<tr>
366<td>
367              <p>
368                BOOST_FIBERS_SPINLOCK_TTAS_FUTEX + BOOST_USE_TSX
369              </p>
370            </td>
371<td>
372              <p>
373                -
374              </p>
375            </td>
376<td>
377              <p>
378                spinlock with test-test-and-swap on shared variable, suspend on futex
379                after certain number of retries and speculative execution (Intel
380                TSX required)
381              </p>
382            </td>
383</tr>
384<tr>
385<td>
386              <p>
387                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX + BOOST_USE_TSX
388              </p>
389            </td>
390<td>
391              <p>
392                -
393              </p>
394            </td>
395<td>
396              <p>
397                spinlock with test-test-and-swap on shared variable, adaptive retries
398                while busy waiting, suspend on futex after certain number of retries and
399                speculative execution (Intel TSX required)
400              </p>
401            </td>
402</tr>
403<tr>
404<td>
405              <p>
406                BOOST_FIBERS_SPIN_SINGLE_CORE
407              </p>
408            </td>
409<td>
410              <p>
411                -
412              </p>
413            </td>
414<td>
415              <p>
416                on single core machines with multiple threads, yield thread (<code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>)
417                after collisions
418              </p>
419            </td>
420</tr>
421<tr>
422<td>
423              <p>
424                BOOST_FIBERS_RETRY_THRESHOLD
425              </p>
426            </td>
427<td>
428              <p>
429                64
430              </p>
431            </td>
432<td>
433              <p>
434                max number of retries while busy spinning, then use the fallback
435              </p>
436            </td>
437</tr>
438<tr>
439<td>
440              <p>
441                BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD
442              </p>
443            </td>
444<td>
445              <p>
446                16
447              </p>
448            </td>
449<td>
450              <p>
451                max size of the contention window, expressed as the exponent to the base
452                two
453              </p>
454            </td>
455</tr>
456<tr>
457<td>
458              <p>
459                BOOST_FIBERS_SPIN_BEFORE_SLEEP0
460              </p>
461            </td>
462<td>
463              <p>
464                32
465              </p>
466            </td>
467<td>
468              <p>
469                max number of retries that relax the processor before the thread
470                sleeps for 0s
471              </p>
472            </td>
473</tr>
474<tr>
475<td>
476              <p>
477                BOOST_FIBERS_SPIN_BEFORE_YIELD
478              </p>
479            </td>
480<td>
481              <p>
482                64
483              </p>
484            </td>
485<td>
486              <p>
487                max number of retries where the thread sleeps for 0s before yield
488                thread (<code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>)
489              </p>
490            </td>
491</tr>
492</tbody>
493</table></div>
494</div>
495<br class="table-break"><div class="footnotes">
496<br><hr style="width:100; text-align:left;margin-left: 0">
497<div id="ftn.fiber.tuning.f0" class="footnote"><p><a href="#fiber.tuning.f0" class="para"><sup class="para">[10] </sup></a>
498        1024cores.net: <a href="http://www.1024cores.net/home/scalable-architecture/task-scheduling-strategies" target="_top">Task
499        Scheduling Strategies</a>
500      </p></div>
501</div>
502</div>
503<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
504<td align="left"></td>
505<td align="right"><div class="copyright-footer">Copyright © 2013 Oliver Kowalke<p>
506        Distributed under the Boost Software License, Version 1.0. (See accompanying
507        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
508      </p>
509</div></td>
510</tr></table>
511<hr>
512<div class="spirit-nav">
513<a accesskey="p" href="performance.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="custom.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
514</div>
515</body>
516</html>
517