[library Boost.Lockfree
    [quickbook 1.4]
    [authors [Blechmann, Tim]]
    [copyright 2008-2011 Tim Blechmann]
    [category algorithms]
    [purpose
        lockfree concurrent data structures
    ]
    [id lockfree]
    [dirname lockfree]
    [license
        Distributed under the Boost Software License, Version 1.0.
        (See accompanying file LICENSE_1_0.txt or copy at
        [@http://www.boost.org/LICENSE_1_0.txt])
    ]
]

[c++]


[/ Images ]

[def _note_                         [$images/note.png]]
[def _alert_                        [$images/caution.png]]
[def _detail_                       [$images/note.png]]
[def _tip_                          [$images/tip.png]]

[/ Links ]

[def _lockfree_                      [^boost.lockfree]]

[section Introduction & Motivation]

[h2 Introduction & Terminology]

The term *non-blocking* denotes concurrent data structures that do not use traditional synchronization primitives such as
guards to ensure thread-safety. Maurice Herlihy and Nir Shavit (compare [@http://books.google.com/books?id=pFSwuqtJgxYC
"The Art of Multiprocessor Programming"]) distinguish between three types of non-blocking data structures, each having different
properties:

* data structures are *wait-free* if every concurrent operation is guaranteed to be finished in a finite number of
  steps. It is therefore possible to give worst-case guarantees for the number of steps per operation.

* data structures are *lock-free* if some concurrent operations are guaranteed to be finished in a finite number of
  steps. While it is in theory possible that some operations never make any progress, this is very unlikely to happen in
  practical applications.

* data structures are *obstruction-free* if a concurrent operation is guaranteed to be finished in a finite number of
  steps, unless another concurrent operation interferes.


Some data structures can only be implemented in a lock-free manner if they are used under certain restrictions. The
relevant aspects for the implementation of _lockfree_ are the number of producer and consumer threads.
*Single-producer* (*sp*) or *multiple-producer* (*mp*) means that only a single thread or multiple concurrent threads are
allowed to add data to a data structure. *Single-consumer* (*sc*) or *multiple-consumer* (*mc*) denotes the equivalent for
the removal of data from the data structure.


[h2 Properties of Non-Blocking Data Structures]

Non-blocking data structures do not rely on locks and mutexes to ensure thread-safety. The synchronization is done completely in
user-space without any direct interaction with the operating system [footnote Spinlocks do not
directly interact with the operating system either. However, it is possible that the owning thread is preempted by the
operating system, which violates the lock-free property.]. This implies that they are not prone to issues like priority
inversion (a high-priority thread needs to wait for a low-priority thread).

Instead of relying on guards, non-blocking data structures require *atomic operations* (specific CPU instructions executed
without interruption). This means that any thread sees the state either before or after the operation, but no
intermediate state can be observed. Not all hardware supports the same set of atomic instructions. If an atomic operation is not
available in hardware, it can be emulated in software using guards. However, this has the obvious drawback of losing the
lock-free property.


[h2 Performance of Non-Blocking Data Structures]

When discussing the performance of non-blocking data structures, one has to distinguish between *amortized* and
*worst-case* costs. The definitions of 'lock-free' and 'wait-free' only mention the upper bound of an operation. Therefore
lock-free data structures are not necessarily the best choice for every use case.
In order to maximise the throughput of an
application, one should consider high-performance concurrent data structures [footnote
[@http://threadingbuildingblocks.org/ Intel's Threading Building Blocks library] provides many efficient concurrent data structures,
which are not necessarily lock-free.].

Lock-free data structures will be a better choice in order to optimize the latency of a system or to avoid priority inversion,
which may be necessary in real-time applications. In general, we advise considering whether lock-free data structures are necessary or
whether concurrent data structures are sufficient. In any case, we advise performing benchmarks with different data structures for a
specific workload.


[h2 Sources of Blocking Behavior]

Apart from locks and mutexes (which we are not using in _lockfree_ anyway), there are three other aspects that could violate
lock-freedom:

[variablelist
    [[Atomic Operations]
     [Some architectures do not provide the necessary atomic operations natively in hardware. If they are not
      available, they are emulated in software using spinlocks, which is itself blocking.
     ]
    ]

    [[Memory Allocations]
     [Allocating memory from the operating system is not lock-free. This makes it impossible to implement true
      dynamically-sized non-blocking data structures. The node-based data structures of _lockfree_ use a memory pool to allocate the
      internal nodes. If this memory pool is exhausted, memory for new nodes has to be allocated from the operating system. However,
      all data structures of _lockfree_ can be configured to avoid memory allocations (instead the specific calls will fail).
      This is especially useful for real-time systems that require lock-free memory allocations.
     ]
    ]

    [[Exception Handling]
     [The C++ exception handling does not give any guarantees about its real-time behavior.
      We therefore do
      not encourage the use of exceptions and exception handling in lock-free code.]
    ]
]

[h2 Data Structures]

_lockfree_ implements three lock-free data structures:

[variablelist
    [[[classref boost::lockfree::queue]]
     [a lock-free multi-producer/multi-consumer queue]
    ]

    [[[classref boost::lockfree::stack]]
     [a lock-free multi-producer/multi-consumer stack]
    ]

    [[[classref boost::lockfree::spsc_queue]]
     [a wait-free single-producer/single-consumer queue (commonly known as ringbuffer)]
    ]
]

[h3 Data Structure Configuration]

The data structures can be configured with [@boost:/libs/parameter/doc/html/index.html Boost.Parameter]-style templates:

[variablelist
    [[[classref boost::lockfree::fixed_sized]]
     [Configures the data structure as *fixed sized*. The internal nodes are stored inside an array and they are addressed by
      array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
      type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
      to achieve lock-freedom.
     ]
    ]

    [[[classref boost::lockfree::capacity]]
     [Sets the *capacity* of a data structure at compile-time. This implies that a data structure is fixed-sized.
     ]
    ]

    [[[classref boost::lockfree::allocator]]
     [Defines the allocator. _lockfree_ supports stateful allocators and is compatible with [@boost:/libs/interprocess/index.html Boost.Interprocess] allocators.]
    ]
]


[endsect]

[section Examples]

[h2 Queue]

The [classref boost::lockfree::queue boost::lockfree::queue] class implements a multi-writer/multi-reader queue.
The
following example shows how integer values are produced and consumed by 4 threads each:

[import ../examples/queue.cpp]
[queue_example]

The program output is:

[pre
produced 40000000 objects.
consumed 40000000 objects.
]


[h2 Stack]

The [classref boost::lockfree::stack boost::lockfree::stack] class implements a multi-writer/multi-reader stack. The
following example shows how integer values are produced and consumed by 4 threads each:

[import ../examples/stack.cpp]
[stack_example]


The program output is:

[pre
produced 4000000 objects.
consumed 4000000 objects.
]

[h2 Waitfree Single-Producer/Single-Consumer Queue]

The [classref boost::lockfree::spsc_queue boost::lockfree::spsc_queue] class implements a wait-free single-producer/single-consumer queue. The
following example shows how integer values are produced and consumed by 2 separate threads:

[import ../examples/spsc_queue.cpp]
[spsc_queue_example]


The program output is:

[pre
produced 10000000 objects.
consumed 10000000 objects.
]

[endsect]


[section Rationale]

[section Data Structures]

_lockfree_ implements well-known data structures. The queue is based on
[@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
the stack is based on [@http://books.google.com/books?id=YQg3HAAACAAJ Systems programming: coping with parallelism by R. K. Treiber]
and the spsc_queue is considered 'folklore' and is implemented in several open-source projects including the Linux kernel. All
data structures are discussed in detail in [@http://books.google.com/books?id=pFSwuqtJgxYC "The Art of Multiprocessor Programming" by Herlihy & Shavit].
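
To illustrate the 'folklore' single-producer/single-consumer ring buffer mentioned above, the following is a minimal sketch
written against C++11 =std::atomic=. It is not the implementation of [classref boost::lockfree::spsc_queue]; the names
=spsc_ring= and =transfer_sum= are hypothetical and exist only for this example. The sketch assumes exactly one producer
thread and one consumer thread, and it keeps one slot empty to tell a full buffer from an empty one, so a ring of N slots
holds at most N-1 elements.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical illustration only; not the boost::lockfree::spsc_queue code.
template <typename T, std::size_t N>
class spsc_ring {
    static_assert(N >= 2, "one slot is always kept empty");

    T buf_[N];
    std::atomic<std::size_t> head_{0}; // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail_{0}; // next slot to write; written only by the producer

public:
    // Producer side. Returns false instead of blocking when the ring is full.
    bool push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;                              // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);  // publish the element
        return true;
    }

    // Consumer side. Returns false instead of blocking when the ring is empty.
    bool pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;                              // empty
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release); // release the slot
        return true;
    }
};

// Sends 1..count through the ring from one thread to another and returns the sum
// seen by the consumer. The retry loops belong to this demo, not to push/pop.
long transfer_sum(int count) {
    assert(count >= 0);
    spsc_ring<int, 64> ring;
    long sum = 0;

    std::thread producer([&] {
        for (int i = 1; i <= count; ++i)
            while (!ring.push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        int v = 0;
        for (int n = 0; n < count; ++n) {
            while (!ring.pop(v)) std::this_thread::yield();
            sum += v;
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

Each call to =push= or =pop= finishes in a bounded number of steps regardless of what the other thread does, which is why
this design is usually described as wait-free. The callers in =transfer_sum= retry on failure, but that is a choice of the
demo code, not a property of the data structure.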

[endsect]

[section Memory Management]

The lock-free [classref boost::lockfree::queue] and [classref boost::lockfree::stack] classes are node-based data structures,
based on a linked list. Memory management of lock-free data structures is a non-trivial problem, because we need to prevent
one thread from freeing an internal node while another thread still uses it. _lockfree_ uses a simple approach: it does not return any memory
to the operating system. Instead, the data structures maintain a *free-list* of nodes in order to reuse them later. This is done for two reasons:
first, depending on the implementation of the memory allocator, freeing the memory may block (so the implementation would not
be lock-free anymore), and second, most memory reclamation algorithms are patented.

[endsect]

[section ABA Prevention]

The ABA problem is a common problem when implementing lock-free data structures. The problem occurs when updating an atomic
variable using a =compare_exchange= operation: thread 1 reads the value A, computes a new value C, and uses
=compare_exchange= to write C only if the current value is still A. This might be a problem if, in the meantime,
thread 2 changes the value from A to B and back to A, because thread 1 does not observe the change of state. The common way to
avoid the ABA problem is to associate a version counter with the value and change both atomically.

_lockfree_ uses a =tagged_ptr= helper class which associates a pointer with an integer tag. This usually requires a double-width
=compare_exchange=, which is not available on all platforms. IA32 did not provide the =cmpxchg8b= opcode before the Pentium
processor, and double-width =compare_exchange= is also lacking on many RISC architectures like PPC. Early x86_64 processors also did not provide a =cmpxchg16b=
instruction.
On 64bit platforms one can work around this issue, because often not the full 64bit address space is used.
On x86_64, for example,
only 48bit are used for the address, so we can use the remaining 16bit for the ABA prevention tag. For details please consult the
implementation of the =boost::lockfree::detail::tagged_ptr= class.

For lock-free operations on 32bit platforms without double-width =compare_exchange=, we support a third approach: by using a
fixed-sized array to store the internal nodes, we can avoid 32bit pointers and use 16bit indices into the array
instead. However, this is only possible for fixed-sized data structures, which have an upper bound on the number of internal nodes.

[endsect]

[section Interprocess Support]

The _lockfree_ data structures have basic support for [@boost:/libs/interprocess/index.html Boost.Interprocess]. The only
problem is the blocking emulation of lock-free atomics, which in the current implementation is not guaranteed to be interprocess-safe.

[endsect]

[endsect]

[xinclude autodoc.xml]

[section Appendices]

[section Supported Platforms & Compilers]

_lockfree_ has been tested on the following platforms:

* g++ 4.4, 4.5 and 4.6, linux, x86 & x86_64
* clang++ 3.0, linux, x86 & x86_64

[endsect]

[section Future Developments]

* More data structures (set, hash table, deque)
* Backoff schemes (exponential backoff or elimination)

[endsect]

[section References]

# [@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
In Symposium on Principles of Distributed Computing, pages 267–275, 1996.
# [@http://books.google.com/books?id=pFSwuqtJgxYC Maurice Herlihy & Nir Shavit. The Art of Multiprocessor Programming], Morgan Kaufmann Publishers, 2008

[endsect]

[endsect]