<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Tuning</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Chapter 1. Fiber">
<link rel="up" href="../index.html" title="Chapter 1. Fiber">
<link rel="prev" href="performance.html" title="Performance">
<link rel="next" href="custom.html" title="Customization">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="performance.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="custom.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="fiber.tuning"></a><a name="tuning"></a><a class="link" href="tuning.html" title="Tuning">Tuning</a>
</h2></div></div></div>
<h4>
<a name="fiber.tuning.h0"></a>
      <span class="phrase"><a name="fiber.tuning.disable_synchronization"></a></span><a
class="link" href="tuning.html#fiber.tuning.disable_synchronization">Disable
      synchronization</a>
    </h4>
<p>
      With <a class="link" href="overview.html#cross_thread_sync"><code class="computeroutput"><span class="identifier">BOOST_FIBERS_NO_ATOMICS</span></code></a>
      defined on the compiler’s command line, synchronization between fibers (in different
      threads) is disabled. This is acceptable if the application is single-threaded
      and/or fibers are not synchronized between threads.
    </p>
<h4>
<a name="fiber.tuning.h1"></a>
      <span class="phrase"><a name="fiber.tuning.memory_allocation"></a></span><a class="link" href="tuning.html#fiber.tuning.memory_allocation">Memory
      allocation</a>
    </h4>
<p>
      The memory allocation algorithm is significant for performance in a multithreaded
      environment, especially for <span class="bold"><strong>Boost.Fiber</strong></span>, where
      fiber stacks are allocated on the heap. The default user-level memory allocator
      (UMA) of glibc is ptmalloc2, but it can be replaced by another UMA that better
      fits the concrete workload. For instance, Google’s <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html" target="_top">TCMalloc</a>
      achieves better performance in the <span class="emphasis"><em>skynet</em></span> microbenchmark
      than glibc’s default memory allocator.
    </p>
<h4>
<a name="fiber.tuning.h2"></a>
      <span class="phrase"><a name="fiber.tuning.scheduling_strategies"></a></span><a class="link" href="tuning.html#fiber.tuning.scheduling_strategies">Scheduling
      strategies</a>
    </h4>
<p>
      The fibers in a thread are coordinated by a fiber manager. Fibers trade control
      cooperatively, rather than preemptively.
      Depending on the workload, several
      strategies for scheduling the fibers are possible <a href="#ftn.fiber.tuning.f0" class="footnote" name="fiber.tuning.f0"><sup class="footnote">[10]</sup></a>; they can be implemented via <a class="link" href="scheduling.html#class_algorithm"><code class="computeroutput">algorithm</code></a>.
    </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
          work-stealing: ready fibers are held in a local queue; when the fiber-scheduler's
          local queue runs out of ready fibers, it randomly selects another fiber-scheduler
          and tries to steal a ready fiber from the victim (implemented in <a class="link" href="scheduling.html#class_work_stealing"><code class="computeroutput">work_stealing</code></a> and
          <a class="link" href="numa.html#class_numa_work_stealing"><code class="computeroutput">numa::work_stealing</code></a>)
        </li>
<li class="listitem">
          work-requesting: ready fibers are held in a local queue; when the fiber-scheduler's
          local queue runs out of ready fibers, it randomly selects another fiber-scheduler
          and requests a ready fiber; the victim fiber-scheduler sends a ready fiber
          back
        </li>
<li class="listitem">
          work-sharing: ready fibers are held in a global queue; fiber-schedulers
          concurrently push and pop ready fibers to/from the global queue (implemented
          in <a class="link" href="scheduling.html#class_shared_work"><code class="computeroutput">shared_work</code></a>)
        </li>
<li class="listitem">
          work-distribution: fibers that become ready are proactively distributed
          to idle fiber-schedulers or fiber-schedulers with low load
        </li>
<li class="listitem">
          work-balancing: a dedicated (helper) fiber-scheduler periodically collects
          information about all fiber-schedulers running in other threads and re-distributes
          ready fibers among them
        </li>
</ul></div>
<h4>
<a name="fiber.tuning.h3"></a>
      <span
class="phrase"><a name="fiber.tuning.ttas_locks"></a></span><a class="link" href="tuning.html#fiber.tuning.ttas_locks">TTAS
      locks</a>
    </h4>
<p>
      Boost.Fiber internally uses spinlocks to protect critical regions if fibers
      running on different threads interact. Spinlocks are implemented as TTAS (test-test-and-set)
      locks, i.e. the spinlock tests the lock before calling an atomic exchange.
      This strategy helps to reduce the cache line invalidations triggered by acquiring/releasing
      the lock.
    </p>
<h4>
<a name="fiber.tuning.h4"></a>
      <span class="phrase"><a name="fiber.tuning.spin_wait_loop"></a></span><a class="link" href="tuning.html#fiber.tuning.spin_wait_loop">Spin-wait
      loop</a>
    </h4>
<p>
      A lock is considered under contention if a thread repeatedly fails to acquire
      the lock because some other thread was faster. Waiting for a short time lets
      other threads finish before trying to enter the critical section again. While
      busy waiting on the lock, relaxing the CPU (via pause/yield mnemonic) gives
      the CPU a hint that the code is in a spin-wait loop.
    </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
          prevents expensive pipeline flushes (speculatively executed load and compare
          instructions are not pushed to the pipeline)
        </li>
<li class="listitem">
          another hardware thread (simultaneous multithreading) can get a time slice
        </li>
<li class="listitem">
          it does delay a few CPU cycles, but this is necessary to prevent starvation
        </li>
</ul></div>
<p>
      This strategy is useless on single-core systems because
      the lock can only be released if the thread gives up its time slice in order to
      let other threads run.
      The macro BOOST_FIBERS_SPIN_SINGLE_CORE replaces the
      CPU hints (pause/yield mnemonic) with a call to <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>, which informs the operating system that the thread gives up its time slice
      so that the operating system can switch to another thread.
    </p>
<h4>
<a name="fiber.tuning.h5"></a>
      <span class="phrase"><a name="fiber.tuning.exponential_back_off"></a></span><a class="link" href="tuning.html#fiber.tuning.exponential_back_off">Exponential
      back-off</a>
    </h4>
<p>
      The macro BOOST_FIBERS_RETRY_THRESHOLD determines how many times the CPU iterates
      in the spin-wait loop before yielding the thread or blocking in futex-wait.
      The spinlock tracks how many times the thread failed to acquire the lock. The
      higher the contention, the longer the thread should back off. A <span class="quote">“<span class="quote">Binary
      Exponential Backoff</span>”</span> algorithm together with a randomized contention
      window is used for this purpose. BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD
      determines the upper limit of the contention window (expressed as the exponent
      to the base two).
    </p>
<h4>
<a name="fiber.tuning.h6"></a>
      <span class="phrase"><a name="fiber.tuning.speculative_execution__hardware_transactional_memory_"></a></span><a class="link" href="tuning.html#fiber.tuning.speculative_execution__hardware_transactional_memory_">Speculative
      execution (hardware transactional memory)</a>
    </h4>
<p>
      Boost.Fiber uses spinlocks to protect critical regions that can be used together
      with transactional memory (see section <a class="link" href="speculation.html#speculation">Speculative
      execution</a>).
    </p>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>
        TSX is enabled if the property <code class="computeroutput"><span class="identifier">htm</span><span class="special">=</span><span class="identifier">tsx</span></code> is
        specified on the b2 command line and <code class="computeroutput"><span class="identifier">BOOST_USE_TSX</span></code>
        is passed to the compiler.
      </p></td></tr>
</table></div>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>
        A TSX transaction will be aborted if the floating point state is modified
        inside a critical region. As a consequence, floating point operations, e.g.
        store/load of floating point related registers during a fiber (context) switch,
        are disabled.
      </p></td></tr>
</table></div>
<h4>
<a name="fiber.tuning.h7"></a>
      <span class="phrase"><a name="fiber.tuning.numa_systems"></a></span><a class="link" href="tuning.html#fiber.tuning.numa_systems">NUMA
      systems</a>
    </h4>
<p>
      Modern multi-socket systems are usually designed as <a class="link" href="numa.html#numa">NUMA
      systems</a>. A suitable fiber scheduler like <a class="link" href="numa.html#class_numa_work_stealing"><code class="computeroutput">numa::work_stealing</code></a> reduces
      remote memory access (latency).
    </p>
<h4>
<a name="fiber.tuning.h8"></a>
      <span class="phrase"><a name="fiber.tuning.parameters"></a></span><a class="link" href="tuning.html#fiber.tuning.parameters">Parameters</a>
    </h4>
<div class="table">
<a name="fiber.tuning.parameters_that_migh_be_defiend_at_compiler_s_command_line"></a><p class="title"><b>Table 1.5. Parameters that might be defined at the compiler's command line</b></p>
<div class="table-contents"><table class="table" summary="Parameters that migh be defiend at compiler's command line">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                Parameter
              </p>
            </th>
<th>
              <p>
                Default value
              </p>
            </th>
<th>
              <p>
                Effect on Boost.Fiber
              </p>
            </th>
</tr></thead>
<tbody>
<tr>
<td>
              <p>
                BOOST_FIBERS_NO_ATOMICS
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                no multithreading support, all atomics removed, no synchronization
                between fibers running in different threads
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_STD_MUTEX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">mutex</span></code> used inside spinlock
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS
              </p>
            </td>
<td>
              <p>
                +
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_FUTEX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, suspend on futex
                after a certain number of retries
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting, suspend on futex after a certain number of retries
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap and speculative execution (Intel
                TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting and speculative execution (Intel TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_FUTEX + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, suspend on futex
                after a certain number of retries and speculative execution (Intel
                TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting, suspend on futex after a certain number of retries and
                speculative execution (Intel TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPIN_SINGLE_CORE
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                on single-core machines with multiple threads, yield the thread (<code
class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>)
                after collisions
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_RETRY_THRESHOLD
              </p>
            </td>
<td>
              <p>
                64
              </p>
            </td>
<td>
              <p>
                max number of retries while busy spinning, then use fallback
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD
              </p>
            </td>
<td>
              <p>
                16
              </p>
            </td>
<td>
              <p>
                max size of the contention window, expressed as the exponent to the base
                two
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPIN_BEFORE_SLEEP0
              </p>
            </td>
<td>
              <p>
                32
              </p>
            </td>
<td>
              <p>
                max number of retries that relax the processor before the thread
                sleeps for 0s
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPIN_BEFORE_YIELD
              </p>
            </td>
<td>
              <p>
                64
              </p>
            </td>
<td>
              <p>
                max number of retries where the thread sleeps for 0s before yielding
                the thread (<code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>)
              </p>
            </td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="footnotes">
<br><hr style="width:100; text-align:left;margin-left: 0">
<div id="ftn.fiber.tuning.f0" class="footnote"><p><a href="#fiber.tuning.f0" class="para"><sup class="para">[10] </sup></a>
        1024cores.net: <a href="http://www.1024cores.net/home/scalable-architecture/task-scheduling-strategies" target="_top">Task
        Scheduling Strategies</a>
      </p></div>
</div>
</div>
<table
xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright © 2013 Oliver Kowalke<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="performance.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="custom.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>