==============================================
LLVM Atomic Instructions and Concurrency Guide
==============================================

LLVM supports instructions which are well-defined in the presence of threads and
asynchronous signals.

The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:

* The C++11 ``<atomic>`` header. (`C++11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)

* Proper semantics for Java-style memory, for both ``volatile`` and regular
  shared variables. (`Java Specification
  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)

* gcc-compatible ``__sync_*`` builtins. (`Description
  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)

* Other scenarios with atomic semantics, including ``static`` variables with
  non-trivial constructors in C++.
Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ volatile,
which ensures that every volatile load and store happens and is performed in the
stated order. A couple of examples: if a SequentiallyConsistent store is
immediately followed by another SequentiallyConsistent store to the same
address, the first store can be erased. This transformation is not allowed for a
pair of volatile stores. On the other hand, a non-volatile non-atomic load can
be moved across a volatile load freely, but not an Acquire load.
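A minimal C illustration of this difference, with illustrative variable names
(whether an optimizer actually performs the permitted elimination is a separate
question; the point is only what it is allowed to do):

.. code-block:: c

  #include <stdatomic.h>

  _Atomic int a;
  volatile int v;

  void atomic_pair(void) {
    /* The first of these two stores may legally be erased: dropping it
       corresponds to a valid execution in which the second store follows
       immediately. */
    atomic_store_explicit(&a, 1, memory_order_seq_cst);
    atomic_store_explicit(&a, 2, memory_order_seq_cst);
  }

  void volatile_pair(void) {
    /* Both volatile stores must be performed, in this order. */
    v = 1;
    v = 2;
  }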
This document is intended as a guide for anyone writing a frontend for LLVM or
working on optimization passes for LLVM, explaining how to deal with
instructions with special semantics in the presence of concurrency. It is not
intended to be a precise specification of those semantics.
Optimization outside atomic
===========================

The basic ``load`` and ``store`` allow a variety of optimizations, but can
lead to undefined results in a concurrent environment; see `NotAtomic`_. This
section specifically goes into the one optimizer restriction which applies in
concurrent environments, because any optimization dealing with stores needs to
be aware of it.

The basic restriction is that the compiler must not introduce stores which did
not exist in the original program. Consider the following example:
.. code-block:: c

  /* C code, for readability; run through clang -O2 -S -emit-llvm to get
     equivalent IR */
  int x;
  void f(int* a) {
    for (int i = 0; i < 100; i++) {
      if (a[i])
        x += 1;
    }
  }
The following is equivalent in non-concurrent situations:

.. code-block:: c

  int x;
  void f(int* a) {
    int xtemp = x;
    for (int i = 0; i < 100; i++) {
      if (a[i])
        xtemp += 1;
    }
    x = xtemp;
  }

However, LLVM is not allowed to transform the former into the latter: doing so
would introduce a store to ``x`` which does not exist in the original program,
which is undefined behavior if another thread can access ``x`` at the same time
(for example, if ``a[i]`` is always false and another thread writes ``x``
concurrently).
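By contrast, promoting the global to a temporary is allowed when the store
already happens unconditionally, because no store is introduced which did not
exist in the original program. A hedged sketch (the function name is
illustrative, and whether LLVM actually performs the promotion depends on the
pass pipeline):

.. code-block:: c

  int x;
  void g(int* a) {
    for (int i = 0; i < 100; i++) {
      /* x is stored on every iteration, so keeping it in a register for the
         duration of the loop and storing it once afterwards adds no new
         store. */
      x += a[i];
    }
  }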
Atomic instructions
===================

For cases where simple loads and stores are not sufficient, LLVM provides
various atomic instructions. The exact guarantees provided depend on the
ordering; see `Atomic orderings`_ below.

``load atomic`` and ``store atomic`` provide the same basic functionality as
non-atomic loads and stores, but provide additional guarantees in situations
where threads and signals are involved.

``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
atomic store (where the store is conditional for ``cmpxchg``), but no other
memory operation can happen on any thread between the load and store.

A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation. SequentiallyConsistent fences additionally offer some more
complicated guarantees, see the C++11 standard for details.

Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
operations wider than what the target supports natively must instead be lowered
to runtime library calls (see the libcall sections below).
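The C11 constructs below correspond to these IR instructions when compiled with
clang; this is a hedged sketch with illustrative names, and the exact lowering
depends on the target:

.. code-block:: c

  #include <stdatomic.h>
  #include <stdbool.h>

  _Atomic int counter;

  int  load_it(void)   { return atomic_load(&counter); }         /* load atomic   */
  void store_it(int v) { atomic_store(&counter, v); }            /* store atomic  */
  int  add_it(int v)   { return atomic_fetch_add(&counter, v); } /* atomicrmw add */

  bool cas_it(int expected, int desired) {
    /* cmpxchg: the store happens only if the value still equals expected. */
    return atomic_compare_exchange_strong(&counter, &expected, desired);
  }

  void fence_it(void) {
    atomic_thread_fence(memory_order_seq_cst);                   /* fence */
  }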
Atomic orderings
================

In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed in order of strength; each
level includes all the guarantees of the previous level except for
Acquire/Release. (See also `LangRef Ordering <LangRef.html#ordering>`_.)
NotAtomic
---------
NotAtomic is the obvious: a normal load or store which is not atomic. (This isn't
really a level of atomicity, but is listed here for comparison.) This is
essentially a regular load or store. If there is a race on a given memory
location, loads from that location return undef.

Relevant standard
  This is intended to match shared variables in C/C++, and to be used in any
  other context where memory access is necessary and a race is impossible. (The
  precise definition is in `LangRef Memory Model <LangRef.html#memmodel>`_.)
Notes for frontends
  The rule is essentially that all memory accessed with basic loads and stores
  by multiple threads should be protected by a lock or other synchronization;
  otherwise, you are likely to run into undefined behavior. If your frontend is
  for a "safe" language like Java, use Unordered to load and store any shared
  variable. Note that NotAtomic volatile loads and stores are not properly
  atomic; do not try to use them as a substitute.
Notes for optimizers
  Introducing loads to shared variables along a codepath where they would not
  otherwise exist is allowed; introducing stores to shared variables is not. See
  `Optimization outside atomic`_.
Notes for code generation
  The one interesting restriction here is that it is not allowed to write to
  bytes outside of the bytes relevant to a store. This is mostly relevant to
  unaligned stores: it is not allowed in general to convert an unaligned store
  into two aligned stores of the same width as the unaligned store. Backends are
  also expected to generate an i8 store as an i8 store, and not an instruction
  which writes to surrounding bytes. (If you are writing a backend for an
  architecture which cannot satisfy these restrictions and cares about
  concurrency, please send an email to llvm-dev.)
Unordered
---------
Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior. It also
guarantees the operation to be lock-free, so it does not depend on the data
being part of a special atomic structure or depend on a separate per-process
global lock. Note that code generation will fail for unsupported atomic
operations; if you need such an operation, use explicit locking.

Relevant standard
  This is intended to match the Java memory model for shared variables.
Notes for frontends
  This cannot be used for synchronization, but is useful for Java and other
  "safe" languages which need to guarantee that the generated code never
  exhibits undefined behavior. Note that this guarantee is cheap on common
  platforms for loads of a native width, but can be expensive or unavailable for
  wider loads, like a 64-bit store on ARM. (A frontend for Java or other "safe"
  languages would normally split a 64-bit store on ARM into two 32-bit unordered
  stores.)
Notes for optimizers
  In terms of the optimizer, this prohibits any transformation that transforms a
  single load into multiple loads, transforms a store into multiple stores,
  narrows a store, or stores a value which would not be stored otherwise. Some
  examples of unsafe optimizations are narrowing an assignment into a bitfield,
  rematerializing a load, and turning loads and stores into a memcpy call.
  Reordering unordered operations is safe, though, and optimizers should
  take advantage of that because unordered operations are common in languages
  that need them.
Notes for code generation
  These operations are required to be atomic in the sense that if you use
  unordered loads and unordered stores, a load cannot see a value which was
  never stored. A normal load or store instruction is usually sufficient, but
  note that an unordered load or store cannot be split into multiple
  instructions (or an instruction which does multiple memory operations, like
  ``LDRD`` on ARM without LPAE, or not naturally-aligned ``LDRD`` on LPAE ARM).
Monotonic
---------
Monotonic is the weakest level of atomicity that can be used in synchronization
primitives, although it does not provide any general synchronization. It
essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
  standards for the exact definition.
Notes for frontends
  If you are writing a frontend which uses this directly, use with caution. The
  guarantees in terms of synchronization are very weak, so make sure these are
  only used in a pattern which you know is correct. Generally, these would
  either be used for atomic operations which do not protect other memory (like
  an atomic counter; see the sketch at the end of this section), or along with
  a ``fence``.
Notes for optimizers
  In terms of the optimizer, this can be treated as a read+write on the relevant
  memory location (and alias analysis will take advantage of that). In addition,
  it is legal to reorder non-atomic and Unordered loads around Monotonic
  loads. CSE/DSE and a few other optimizations are allowed, but Monotonic
  operations are unlikely to be used in ways which would make those
  optimizations useful.
Notes for code generation
  Code generation is essentially the same as that for unordered for loads and
  stores. No fences are required. ``cmpxchg`` and ``atomicrmw`` are required
  to appear as a single operation.
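As a concrete example of the "atomic counter" use mentioned in the frontend
notes, here is a small C sketch (names illustrative) using
``memory_order_relaxed``, the C spelling of Monotonic; the counter protects no
other memory, so the per-address ordering guarantee is all that is needed:

.. code-block:: c

  #include <stdatomic.h>

  static _Atomic unsigned long events;

  void count_event(void) {
    atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
  }

  unsigned long read_events(void) {
    return atomic_load_explicit(&events, memory_order_relaxed);
  }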
Acquire
-------
Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
  used for C++11/C11 ``memory_order_consume``.
Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation.
Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move stores from before an Acquire load or read-modify-write
  operation to after it, and move non-Acquire loads from before an Acquire
  operation to after it.
Notes for code generation
  Architectures with weak memory ordering (essentially everything relevant today
  except x86 and SPARC) require some sort of fence to maintain the Acquire
  semantics. The precise fences required vary widely by architecture, but for
  a simple implementation, most architectures provide a barrier which is strong
  enough for everything (``dmb`` on ARM, ``sync`` on PowerPC, etc.). Putting
  such a fence after the equivalent Monotonic operation is sufficient to
  maintain Acquire semantics for a memory operation.
Release
-------
Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_release``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Release only provides a semantic guarantee when paired with an Acquire
  operation.
Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move loads from after a Release store or read-modify-write
  operation to before it, and move non-Release stores from after a Release
  operation to before it.
Notes for code generation
  See the section on Acquire; a fence before the relevant operation is usually
  sufficient for Release. Note that a store-store fence is not sufficient to
  implement Release semantics; store-store fences are generally not exposed to
  the IR because they are extremely difficult to use correctly.
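To make the Acquire/Release pairing concrete, here is a hedged C sketch of the
usual publication pattern (names are illustrative): the Release store makes the
write to ``payload`` visible to any thread whose Acquire load observes
``ready == 1``.

.. code-block:: c

  #include <stdatomic.h>

  int payload;                 /* ordinary, non-atomic data being published */
  _Atomic int ready;

  void publisher(void) {
    payload = 42;
    /* Release: everything written before this store becomes visible to a
       thread that observes it with an Acquire load. */
    atomic_store_explicit(&ready, 1, memory_order_release);
  }

  int consumer(void) {
    /* Acquire: pairs with the Release store above. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
      ;                        /* spin until the flag is observed */
    return payload;            /* guaranteed to read 42 */
  }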
AcquireRelease
--------------
AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).
Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acq_rel``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation, and vice versa.
Notes for optimizers
  In general, optimizers should treat this like a nothrow call; the possible
  optimizations are usually not interesting.
Notes for code generation
  This operation has Acquire and Release semantics; see the sections on Acquire
  and Release.
SequentiallyConsistent
----------------------
SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
Notes for frontends
  If a frontend is exposing atomic operations, these are much easier to reason
  about for the programmer than other kinds of operations, and using them is
  generally a practical performance tradeoff.
Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. For
  SequentiallyConsistent loads and stores, the same reorderings are allowed as
  for Acquire loads and Release stores, except that SequentiallyConsistent
  operations may not be reordered.
Notes for code generation
  SequentiallyConsistent loads minimally require the same barriers as Acquire
  operations, and SequentiallyConsistent stores require Release barriers.
  Additionally, the code generator must enforce ordering between
  SequentiallyConsistent stores followed by SequentiallyConsistent loads. This
  is usually done by emitting either a full fence before the loads or a full
  fence after the stores; which is preferred varies by architecture.
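The guarantee beyond Acquire/Release is the total order, which matters for
store-load patterns such as this hedged sketch of the classic two-flag mutual
exclusion idiom (names illustrative); with only Release stores and Acquire
loads, both threads could fail to see the other's flag and both could enter:

.. code-block:: c

  #include <stdatomic.h>
  #include <stdbool.h>

  _Atomic int flag0, flag1;

  /* Thread 0 runs this; thread 1 runs the mirror image with the flags
     swapped. Because all four operations participate in a single total
     order, at most one of the two threads can observe the other's flag as 0
     and enter. */
  bool try_enter_thread0(void) {
    atomic_store_explicit(&flag0, 1, memory_order_seq_cst);
    return atomic_load_explicit(&flag1, memory_order_seq_cst) == 0;
  }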
Atomics and IR optimization
===========================

Predicates for optimizer writers to query:
* ``isSimple()``: A load or store which is not volatile or atomic. This is
  what, for example, memcpyopt would check for operations it might transform.

* ``isUnordered()``: A load or store which is not volatile and at most
  Unordered. This would be checked, for example, by LICM before hoisting an
  operation.

* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note
  that they return true for any operation which is volatile or at least
  Monotonic.

* ``isStrongerThan``/``isAtLeastOrStrongerThan``: These are predicates on
  orderings. They can be useful for passes that are aware of atomics, for
  example to do DSE across a single atomic access, but not across a
  release-acquire pair (see MemoryDependencyAnalysis for an example of this).

* Alias analysis: Note that AA will return ModRef for anything Acquire or
  Release, and for the address accessed by any Monotonic operation.
To support optimizing around atomic operations, make sure you are using the
right predicates; everything should work if that is done. If your pass should
optimize some atomic operations (Unordered operations in particular), make sure
it doesn't replace an atomic load or store with a non-atomic operation.
Some examples of how optimizations interact with atomic operations:

* DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can
  be DSE'ed in some cases, but it's tricky to reason about, and not especially
  important. It is possible in some cases for DSE to operate across a stronger
  atomic operation, but it is fairly tricky; DSE delegates this reasoning to
  MemoryDependencyAnalysis.

* Folding a load: Any atomic load from a constant global can be constant-folded,
  because it cannot be observed. Similar reasoning allows SROA to operate on
  atomic loads and stores.
Atomics and Codegen
===================

Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass when
the target requests it (see ``shouldInsertFencesForAtomic()`` below).

The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
marked volatile very conservatively. This should get fixed at some point.
One very important property of the atomic operations is that if your backend
supports any inline lock-free atomic operations of a given size, you should
support *ALL* operations of that size in a lock-free manner.
AtomicExpandPass can help with that: it will expand all atomic operations to the
proper ``__atomic_*`` libcalls for any size above the maximum set by
``setMaxAtomicSizeInBitsSupported`` (which defaults to 0).
On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
generated. ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg``
uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``. Depending
on the users of the result, some ``atomicrmw`` operations can be translated into
operations like ``LOCK AND``, but that does not work in general.
On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
and SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions. ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
on ARM, etc.).
It is often easiest for backends to use AtomicExpandPass to lower some of the
atomic constructs. Here are some lowerings it can do:

* cmpxchg -> loop with load-linked/store-conditional

* large loads/stores -> ll-sc/cmpxchg

* strong atomic accesses -> monotonic accesses + fences by overriding
  ``shouldInsertFencesForAtomic()``, ``emitLeadingFence()``, and
  ``emitTrailingFence()``

* atomic rmw -> loop with cmpxchg or load-linked/store-conditional

* expansion to ``__atomic_*`` libcalls for unsupported sizes

For an example of all of these, look at the ARM backend.
Libcalls: __atomic_*
====================

The ``__atomic_*`` library functions come in generic versions, which take a size
parameter and can operate on data of any size or alignment, and size-specialized
versions, which can only be used with *naturally-aligned* pointers of the
appropriate size. In the signatures below, "N" is one of 1, 2, 4, 8, and 16, and
"iN" is the appropriate integer type of that size; if no such integer type
exists, the specialization cannot be used::

  iN __atomic_load_N(iN *ptr, int ordering)
  void __atomic_store_N(iN *ptr, iN val, int ordering)
  iN __atomic_exchange_N(iN *ptr, iN val, int ordering)
  bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired, int success_order, int failure_order)
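As an illustration of when these libcalls appear, here is a hedged C sketch
using the clang/gcc ``__atomic`` builtins on a type that is too wide for inline
lock-free atomics on typical targets (the struct and its size are illustrative;
whether a libcall is emitted depends on the target's supported atomic widths):

.. code-block:: c

  #include <stdbool.h>

  /* A four-long struct (32 bytes on typical 64-bit targets): usually lowered
     to calls such as __atomic_load(32, p, &result, order) rather than inline
     instructions. */
  struct widget { long a, b, c, d; };

  struct widget load_widget(struct widget *p) {
    struct widget result;
    __atomic_load(p, &result, __ATOMIC_SEQ_CST);
    return result;
  }

  bool cas_widget(struct widget *p, struct widget *expected,
                  struct widget desired) {
    return __atomic_compare_exchange(p, expected, &desired, /*weak=*/false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  }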
Finally there are some read-modify-write functions, which are only available in
the size-specific variants (any other sizes use a ``__atomic_compare_exchange``
loop)::

  iN __atomic_fetch_add_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_sub_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_and_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_or_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_xor_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_nand_N(iN *ptr, iN val, int ordering)
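The ``__atomic_compare_exchange`` loop mentioned above has the following shape,
shown here as a portable C11 sketch implementing fetch-add for ``double``, an
operation with no dedicated libcall of its own (the function name is
illustrative):

.. code-block:: c

  #include <stdatomic.h>

  double fetch_add_double(_Atomic double *ptr, double val) {
    double old = atomic_load_explicit(ptr, memory_order_relaxed);
    /* Retry until no other thread modified *ptr between our read and our
       compare-exchange; a failed CAS refreshes 'old' with the current value. */
    while (!atomic_compare_exchange_weak_explicit(ptr, &old, old + val,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed))
      ;
    return old;
  }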
This set of library functions has some interesting implementation requirements
to take note of:

- They support all sizes and alignments -- including those which cannot be
  implemented natively on any existing hardware. Therefore, they will certainly
  use mutexes for some sizes/alignments.

- As a consequence, they cannot be shipped in a statically linked
  compiler-support library, as they have state which must be shared amongst all
  DSOs loaded in the program. They must be provided in a shared library used by
  all objects.

- The set of atomic sizes supported lock-free must be a superset of the sizes
  any compiler can emit. That is: if a new compiler introduces support for
  inline lock-free atomics of size N, the ``__atomic_*`` functions must also have
  a lock-free implementation for size N. This is a requirement so that code
  produced by an old compiler (which will have called the ``__atomic_*`` function)
  interoperates with code produced by the new compiler (which will use the native
  atomic instructions).
Note that it's possible to write an entirely target-independent implementation
of these library functions by using the compiler atomic builtins themselves to
implement the operations on naturally-aligned pointers of supported sizes, and a
generic mutex implementation otherwise.
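A hedged sketch of the mutex-based fallback path of such an implementation,
assuming the generic ``__atomic_load(size, ptr, ret, ordering)`` shape used by
GCC's libatomic interface (the function and lock names are illustrative; a real
implementation would shard locks by address and dispatch to lock-free builtins
for supported sizes):

.. code-block:: c

  #include <pthread.h>
  #include <stddef.h>
  #include <string.h>

  static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

  void example_atomic_load(size_t size, void *ptr, void *ret, int ordering) {
    (void)ordering;            /* the mutex already provides full ordering */
    pthread_mutex_lock(&fallback_lock);
    memcpy(ret, ptr, size);
    pthread_mutex_unlock(&fallback_lock);
  }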
Libcalls: __sync_*
==================

Some targets or OS/target combinations can support lock-free atomics, but for
various reasons, it is not practical to emit the instructions inline. There are
two typical examples of this.

Some CPUs support multiple instruction sets which can be switched back and forth
on function-call boundaries. For example, MIPS supports the MIPS16 ISA, which
has a smaller instruction encoding than the usual MIPS32 ISA. ARM, similarly,
has the Thumb ISA. In MIPS16 and earlier versions of Thumb, the atomic
instructions are not encodable. However, those instructions are available via a
function call to a function with the longer encoding.
Additionally, a few OS/target pairs provide kernel-supported lock-free
atomics. ARM/Linux is an example of this: the kernel provides a
function which on older CPUs contains a "magically-restartable" atomic sequence
(which looks atomic so long as there's only one CPU), and contains actual atomic
instructions on newer multicore models. This sort of functionality can typically
be provided on any architecture, if all CPUs which are missing atomic
compare-and-swap support are uniprocessor (no SMP). This is almost always the
case. The only common architecture without that property is SPARC -- SPARCV8 SMP
systems were common, yet it doesn't support any sort of compare-and-swap
operation.
In either of these cases, the Target in LLVM can claim support for atomics of an
appropriate size, and then implement some subset of the operations via libcalls
to a ``__sync_*`` function. Such functions *must* not use locks in their
implementation, because unlike the ``__atomic_*`` routines used by
AtomicExpandPass, these may be mixed-and-matched with native instructions by the
target lowering.
Further, these routines do not need to be shared, as they are stateless. So,
there is no issue with having multiple copies included in one binary. Thus,
typically these routines are implemented by the statically-linked compiler
runtime support library.

LLVM will emit a call to an appropriate ``__sync_*`` routine if the target
ISelLowering code has set the corresponding ``ATOMIC_CMPXCHG``, ``ATOMIC_SWAP``,
or ``ATOMIC_LOAD_*`` operation to "Expand", and if it has opted-into the
availability of those library functions.
The full set of functions that may be called by LLVM is (for ``N`` being 1, 2,
4, 8, or 16)::

  iN __sync_val_compare_and_swap_N(iN *ptr, iN expected, iN desired)
  iN __sync_lock_test_and_set_N(iN *ptr, iN val)
  iN __sync_fetch_and_add_N(iN *ptr, iN val)
  iN __sync_fetch_and_sub_N(iN *ptr, iN val)
  iN __sync_fetch_and_and_N(iN *ptr, iN val)
  iN __sync_fetch_and_or_N(iN *ptr, iN val)
  iN __sync_fetch_and_xor_N(iN *ptr, iN val)
  iN __sync_fetch_and_nand_N(iN *ptr, iN val)
  iN __sync_fetch_and_max_N(iN *ptr, iN val)
  iN __sync_fetch_and_umax_N(iN *ptr, iN val)
  iN __sync_fetch_and_min_N(iN *ptr, iN val)
  iN __sync_fetch_and_umin_N(iN *ptr, iN val)
This list doesn't include any function for atomic load or store; all known
architectures support atomic loads and stores directly (possibly by emitting a
fence on either side of a normal load or store).
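As an illustration of how such routines can stay lock-free, here is a hedged C
sketch of a fetch-and-add entry point written as a retry loop around
compare-and-swap; for a 4-byte operand this corresponds to
``__sync_val_compare_and_swap_4`` in the list above, and the type-generic
``__sync_val_compare_and_swap`` builtin is used so the sketch compiles as
ordinary C (the function name is illustrative):

.. code-block:: c

  int example_sync_fetch_and_add_4(int *ptr, int val) {
    int old = *ptr;                  /* plain read as a first guess */
    for (;;) {
      int seen = __sync_val_compare_and_swap(ptr, old, old + val);
      if (seen == old)
        return old;                  /* CAS succeeded: return the old value */
      old = seen;                    /* lost a race: retry with the new value */
    }
  }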