
Lines Matching +full:max +full:- +full:virtual +full:- +full:functions

19 documentation at tools/memory-model/.  Nevertheless, even this memory
37 Note also that it is possible that a barrier may be a no-op for an
48 - Device operations.
49 - Guarantees.
53 - Varieties of memory barrier.
54 - What may not be assumed about memory barriers?
55 - Address-dependency barriers (historical).
56 - Control dependencies.
57 - SMP barrier pairing.
58 - Examples of memory barrier sequences.
59 - Read memory barriers vs load speculation.
60 - Multicopy atomicity.
64 - Compiler barrier.
65 - CPU memory barriers.
69 - Lock acquisition functions.
70 - Interrupt disabling functions.
71 - Sleep and wake-up functions.
72 - Miscellaneous functions.
74 (*) Inter-CPU acquiring barrier effects.
76 - Acquires vs memory accesses.
80 - Interprocessor interaction.
81 - Atomic operations.
82 - Accessing devices.
83 - Interrupts.
91 - Cache coherency vs DMA.
92 - Cache coherency vs MMIO.
96 - And then there's the Alpha.
97 - Virtual Machine Guests.
101 - Circular buffers.
115-132 [diagram: CPU 1 <-----> Memory <-----> CPU 2, with both CPUs also connected to a shared Device]
158 STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4
159 STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3
160 STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4
161 STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4
162 STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3
163 STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4
164 STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4
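The combinations listed above come from a two-CPU sequence along the following
lines (a reconstructed sketch; the initial values A == 1 and B == 2 are assumed
from the loaded values shown):

	int A = 1, B = 2;		/* assumed initial values */

	void cpu1(void)
	{
		A = 3;			/* the two stores whose order ...    */
		B = 4;			/* ... CPU 2 may perceive either way */
	}

	void cpu2(void)
	{
		int x, y;

		x = B;			/* may load 2 or 4 */
		y = A;			/* may load 1 or 3 */
	}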
202 -----------------
224 ----------
238 emits a memory-barrier instruction, so that a DEC Alpha CPU will
309 And there are anti-guarantees:
312 generate code to modify these using non-atomic read-modify-write
319 non-atomic read-modify-write sequences can cause an update to one
326 "char", two-byte alignment for "short", four-byte alignment for
327 "int", and either four-byte or eight-byte alignment for "long",
328 on 32-bit and 64-bit systems, respectively. Note that these
330 using older pre-C11 compilers (for example, gcc 4.6). The portion
336 of adjacent bit-fields all having nonzero width
342 NOTE 2: A bit-field and an adjacent non-bit-field member
344 to two bit-fields, if one is declared inside a nested
346 are separated by a zero-length bit-field declaration,
347 or if they are separated by a non-bit-field member
349 bit-fields in the same structure if all members declared
350 between them are also bit-fields, no matter what the
351 sizes of those intervening bit-fields happen to be.
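A minimal sketch of the anti-guarantee above (the field names are purely
illustrative): an update to one bit-field may be compiled into a non-atomic
read-modify-write of the whole memory location, corrupting a concurrent update
to its neighbour, whereas properly sized and aligned members are safe:

	struct flags {
		int	f1 : 4;		/* f1 and f2 share one memory location:    */
		int	f2 : 4;		/* updating f1 can be a non-atomic RMW     */
					/* that corrupts a concurrent update to f2 */
		char	owned_by_a;	/* separate, properly aligned members may  */
		char	owned_by_b;	/* be updated concurrently without such    */
					/* interference                            */
	};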
359 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
375 ---------------------------
394 address-dependency barriers; see the "SMP barrier pairing" subsection.
397 (2) Address-dependency barriers (historical).
398 [!] This section is marked as HISTORICAL: it covers the long-obsolete
400 implicit in all marked accesses. For more up-to-date information,
404 An address-dependency barrier is a weaker form of read barrier. In the
407 the second load will be directed), an address-dependency barrier would
411 An address-dependency barrier is a partial ordering on interdependent
417 considered can then perceive. An address-dependency barrier issued by
422 the address-dependency barrier.
434 [!] Note that address-dependency barriers should normally be paired with
437 [!] Kernel release v5.9 removed kernel APIs for explicit address-
440 address-dependency barriers.
444 A read barrier is an address-dependency barrier plus a guarantee that all
452 Read memory barriers imply address-dependency barriers, and so can
476 This acts as a one-way permeable barrier. It guarantees that all memory
491 This also acts as a one-way permeable barrier. It guarantees that all
502 -not- guaranteed to act as a full memory barrier. However, after an
513 RELEASE variants in addition to fully-ordered and relaxed (no barrier
530 ----------------------------------------------
549 (*) There is no guarantee that some intervening piece of off-the-CPU
556 Documentation/driver-api/pci/pci.rst
557 Documentation/core-api/dma-api-howto.rst
558 Documentation/core-api/dma-api.rst
561 ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
562 ----------------------------------------
563 [!] This section is marked as HISTORICAL: it covers the long-obsolete
565 in all marked accesses. For more up-to-date information, including
571 to this section are those working on DEC Alpha architecture-specific code
574 address-dependency barriers.
576 [!] While address dependencies are observed in both load-to-load and
577 load-to-store relations, address-dependency barriers are not necessary
578 for load-to-store situations.
580 The requirement of address-dependency barriers is a little subtle, and
593 [!] READ_ONCE_OLD() corresponds to the READ_ONCE() of pre-4.15 kernels, which
594 doesn't imply an address-dependency barrier.
611 To deal with this, READ_ONCE() provides an implicit address-dependency barrier
621 <implicit address-dependency barrier>
630 even-numbered cache lines and the other bank processes odd-numbered cache
631 lines. The pointer P might be stored in an odd-numbered cache line, and the
632 variable B might be stored in an even-numbered cache line. Then, if the
633 even-numbered bank of the reading CPU's cache is extremely busy while the
634 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
638 An address-dependency barrier is not required to order dependent writes
655 Therefore, no address-dependency barrier is required to order the read into
657 even without an implicit address-dependency barrier of modern READ_ONCE():
662 of dependency ordering is to -prevent- writes to the data structure, along
673 The address-dependency barrier is very important to the RCU system,
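A minimal sketch of the publication pattern being described, reusing the P and
B naming from the text above (the producer-side smp_wmb() is assumed, as in the
usual pairing):

	/* CPU 1 */
	B = 4;
	smp_wmb();		/* order the store to B before publishing &B */
	WRITE_ONCE(P, &B);

	/* CPU 2 */
	Q = READ_ONCE(P);	/* implicit address-dependency barrier */
	D = *Q;			/* if Q == &B, then D is guaranteed to be 4, */
				/* not a stale value of B                    */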
681 --------------------
687 A load-load control dependency requires a full read memory barrier, not
688 simply an (implicit) address-dependency barrier to make it work correctly.
692 <implicit address-dependency barrier>
699 dependency, but rather a control dependency that the CPU may short-circuit
710 However, stores are not speculated. This means that ordering -is- provided
711 for load-store control dependencies, as in the following example:
726 variable 'a' is always non-zero, it would be well within its rights
756 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
759 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
779 In contrast, without explicit memory barriers, two-legged-if control
799 if (q % MAX) {
807 If MAX is defined to be 1, then the compiler knows that (q % MAX) is
819 relying on this ordering, you should make sure that MAX is greater than
823 BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
824 if (q % MAX) {
836 You must also be careful not to rely too much on boolean short-circuit
851 out-guess your code. More generally, although READ_ONCE() does force
855 In addition, control dependencies apply only to the then-clause and
856 else-clause of the if-statement in question. In particular, it does
857 not necessarily apply to code following the if-statement:
871 conditional-move instructions, as in this fanciful pseudo-assembly
884 In short, control dependencies apply only to the stores in the then-clause
885 and else-clause of the if-statement in question (including functions
886 invoked by those two clauses), not to code following that if-statement.
897 However, they do -not- guarantee any other sort of ordering:
906 to carry out the stores. Please note that it is -not- sufficient
912 (*) Control dependencies require at least one run-time conditional
924 (*) Control dependencies apply only to the then-clause and else-clause
925 of the if-statement containing the control dependency, including
926 any functions that these two clauses call. Control dependencies
927 do -not- apply to code following the if-statement containing the
932 (*) Control dependencies do -not- provide multicopy atomicity. If you
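A minimal sketch of a load-store control dependency (the variable names are
illustrative):

	q = READ_ONCE(a);
	if (q)
		WRITE_ONCE(b, 1);	/* ordered after the load of a by the  */
					/* control dependency: stores are not  */
					/* speculated, but the ordering covers */
					/* only this clause                    */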
940 -------------------
942 When dealing with CPU-CPU interactions, certain types of memory barrier should
949 with an address-dependency barrier, a control dependency, an acquire barrier,
951 read barrier, control dependency, or an address-dependency barrier pairs
970 <implicit address-dependency barrier>
990 match the loads after the read barrier or the address-dependency barrier, and
995-999 [diagram: CPU 1's barrier-ordered stores (WRITE_ONCE(a, 1) ... WRITE_ONCE(d, 4)) pairing with CPU 2's barrier-ordered loads (v = READ_ONCE(c) ... y = READ_ONCE(b))]
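A minimal sketch of the basic pairing, assuming two shared ints a and b:

	int a, b;

	void cpu1(void)
	{
		WRITE_ONCE(a, 1);
		smp_wmb();		/* pairs with the smp_rmb() in cpu2() */
		WRITE_ONCE(b, 2);
	}

	void cpu2(void)
	{
		int x, y;

		x = READ_ONCE(b);
		smp_rmb();		/* pairs with the smp_wmb() in cpu1() */
		y = READ_ONCE(a);	/* x == 2 guarantees y == 1 */
	}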
1003 ------------------------------------
1022-1036 [diagram: CPU 1 commits a sequence of stores (C=3, ...); at the write barrier (wwwwww) all stores prior to the barrier must become perceptible to the rest of the system before further stores (..., D=4) may take place]
1043 Secondly, address-dependency barriers act as partial orderings on address-
1059-1082 [diagram: without an address-dependency barrier, CPU 2 can load C == &B yet still perceive the old value of B (B == 7): the load of X holds up the maintenance of coherence of B, giving an apparently incorrect perception of B]
1089 If, however, an address-dependency barrier were to be placed between the load
1100 <address-dependency barrier>
1105-1127 [diagram: the same sequence with an address-dependency barrier (aaaaaa) on CPU 2, which makes sure all effects prior to the store of C are perceptible to subsequent loads, so B is seen as 2]
1145-1162 [diagram: CPU 1 stores A=1, issues a write barrier (wwwwww), then stores B=2; with no read barrier, CPU 2 may observe B == 2 while still loading the old A == 0]
1182-1199 [diagram: the same sequence with a read barrier (rrrrrr) on CPU 2 between its loads of B and A; the read barrier causes all effects prior to the store of B to be perceptible to CPU 2, so A is loaded as 1]
1219-1239 [diagram: CPU 2 loads A both before and after its read barrier; the first load may still return A == 0, but once B == 2 has been observed, the load after the barrier is guaranteed to return A == 1]
1245-1265 [diagram: alternatively, CPU 1's store to A may become perceptible before the read barrier completes, in which case both of CPU 2's loads of A return 1]
1274 ----------------------------------------
1278 other loads, and so do the load in advance - even though they haven't actually
1283 It may turn out that the CPU didn't actually need the value - perhaps because a
1284 branch circumvented the load - in which case it can discard the value or just
1298-1311 [diagram: CPU 2, busy with a division, speculates on the LOAD of A ahead of time; once the divisions are complete, the speculated value is used by the LOAD with immediate effect]
1314 Placing a read barrier or an address-dependency barrier just before the second
1329-1345 [diagram: with a read barrier before the second load and no intervening update to A, the speculated value is confirmed and simply used]
1351-1367 [diagram: if another CPU has an update or invalidation pending, the speculation is discarded and an updated value of A is retrieved]
1371 --------------------
1380 time to all -other- CPUs. The remainder of this document discusses this
1404 multicopy-atomic systems, CPU B's load must return either the same value
1414 able to compensate for non-multicopy atomicity. For example, suppose
1425 This substitution allows non-multicopy atomicity to run rampant: in
1431 example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
1436 General barriers can compensate not only for non-multicopy atomicity,
1437 but can also generate additional ordering that can ensure that -all-
1438 CPUs will perceive the same order of -all- operations. In contrast, a
1439 chain of release-acquire pairs do not provide this additional ordering,
1480 Furthermore, because of the release-acquire relationship between cpu0()
1486 However, the ordering provided by a release-acquire chain is local
1497 writes in order, CPUs not involved in the release-acquire chain might
1499 the weak memory-barrier instructions used to implement smp_load_acquire()
1502 store to u as happening -after- cpu1()'s load from v, even though
1508 -not- ensure that any particular value will be read. Therefore, the
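A minimal sketch of a release-acquire chain, assuming shared ints u and z; the
guarantee shown holds for the CPUs participating in the chain, while, as the
text above notes, a CPU outside the chain may still disagree about the order of
the stores:

	int u, z;

	void cpu0(void)
	{
		WRITE_ONCE(u, 1);
		smp_store_release(&z, 1);		/* starts the chain  */
	}

	void cpu1(void)
	{
		if (smp_load_acquire(&z) == 1)		/* joins the chain   */
			smp_store_release(&z, 2);	/* extends the chain */
	}

	void cpu2(void)
	{
		int r1, r2;

		r1 = smp_load_acquire(&z);	/* joins the chain            */
		r2 = READ_ONCE(u);		/* r1 == 2 guarantees r2 == 1 */
	}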
1533 ----------------
1540 This is a general barrier -- there are no read-read or write-write
1550 interrupt-handler code and the code that was interrupted.
1555 The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
1556 optimizations that, while perfectly safe in single-threaded code, can
1585 for single-threaded code, is almost certainly not what the developer
1606 single-threaded code, but can be fatal in concurrent code:
1624 single-threaded code, so you need to tell the compiler about cases
1638 This transformation is a win for single-threaded code because it
1650 do the following and MAX is a preprocessor macro with the value 1:
1652 while ((tmp = READ_ONCE(a)) % MAX)
1656 to MAX will always be zero, again allowing the compiler to optimize
1657 the code into near-nonexistence. (It will still load from the
1685 between process-level code and an interrupt handler:
1701 win for single-threaded code:
1762 In single-threaded code, this is not only safe, but also saves
1764 could cause some other CPU to see a spurious value of 42 -- even
1765 if variable 'a' was never zero -- when loading variable 'b'.
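A minimal sketch of the transformation being described, following the 'a', 'b'
and 42 of the text above; the compiler may turn:

	if (a)
		b = a;
	else
		b = 42;

into the branch-avoiding form:

	b = 42;
	if (a)
		b = a;

letting another CPU observe the spurious 42; using WRITE_ONCE() for both stores
to 'b' prevents the compiler from inventing that intermediate store.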
1774 damaging, but they can result in cache-line bouncing and thus in
1779 with a single memory-reference instruction, prevents "load tearing"
1782 16-bit store instructions with 7-bit immediate fields, the compiler
1783 might be tempted to use two 16-bit store-immediate instructions to
1784 implement the following 32-bit store:
1791 This optimization can therefore be a win in single-threaded code.
1815 implement these three assignment statements as a pair of 32-bit
1816 loads followed by a pair of 32-bit stores. This would result in
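A minimal sketch of the store-tearing case being described:

	p = 0x00010002;			/* may be torn into two 16-bit        */
					/* store-immediate instructions       */

	WRITE_ONCE(p, 0x00010002);	/* forced to be a single 32-bit store */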
1836 -------------------
1848 All memory barriers except the address-dependency barriers imply a compiler
1862 systems because it is assumed that a CPU will appear to be self-consistent,
1864 However, see the subsection on "Virtual Machine Guests" below.
1873 windows. These barriers are required even on non-SMP systems as they affect
1878 There are some more advanced barrier functions:
1890 These are for use with atomic RMW functions that do not imply memory
1892 RMW functions that do not imply a memory barrier are e.g. add,
1893 subtract, (failed) conditional operations, _relaxed functions,
1898 These are also used for atomic RMW bitop functions that do not imply a
1904 obj->dead = 1;
1906 atomic_dec(&obj->ref_count);
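The barrier belongs between these two statements; a sketch of the full pattern:

	obj->dead = 1;
	smp_mb__before_atomic();	/* order the store to ->dead before */
	atomic_dec(&obj->ref_count);	/* the atomic decrement             */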
1920 DMA capable device. See Documentation/core-api/dma-api.rst file for more
1928 if (desc->status != DEVICE_OWN) {
1933 read_data = desc->data;
1934 desc->data = write_data;
1940 desc->status = DEVICE_OWN;
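A sketch of the full descriptor pattern these lines are taken from, with the
dma_rmb()/dma_wmb() calls in place:

	if (desc->status != DEVICE_OWN) {
		dma_rmb();			/* don't read data until we own */
						/* the descriptor               */

		read_data = desc->data;		/* read/modify the payload      */
		desc->data = write_data;

		dma_wmb();			/* flush the modifications      */
						/* before the status update     */

		desc->status = DEVICE_OWN;	/* hand ownership back          */
	}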
1964 For example, after a non-temporal write to pmem region, we use pmem_wmb()
1975 For memory accesses with write-combining attributes (e.g. those returned
1978 write-combining memory accesses before this macro with those after it when
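A minimal sketch of the pmem case, assuming memcpy_flushcache() as the
non-temporal copy and an illustrative commit flag:

	memcpy_flushcache(pmem_buf, src, len);	/* non-temporal writes to pmem  */
	pmem_wmb();				/* ensure they reach the        */
						/* persistence domain first     */
	*pmem_commit_flag = 1;			/* then publish the commit flag */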
1985 Some of the other functions in the linux kernel imply memory barriers, amongst
1986 which are locking and scheduling functions.
1993 LOCK ACQUISITION FUNCTIONS
1994 --------------------------
2041 one-way barriers is that the effects of instructions outside of a critical
2062 RELEASE may -not- be assumed to be a full memory barrier.
2087 -could- occur.
2102 a sleep-unlock race, but the locking primitive needs to resolve
2107 anything at all - especially with respect to I/O accesses - unless combined
2110 See also the section on "Inter-CPU acquiring barrier effects".
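A minimal sketch of the one-way nature of ACQUIRE and RELEASE (the variables
are illustrative):

	*A = a;			/* may be reordered into the critical section */
	spin_lock(&mylock);	/* ACQUIRE: *B below cannot move up past this */
	*B = b;			/* stays within the critical section          */
	spin_unlock(&mylock);	/* RELEASE: *B above cannot move down past    */
				/* this                                       */
	*C = c;			/* may be reordered into the critical section */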
2139 INTERRUPT DISABLING FUNCTIONS
2140 -----------------------------
2142 Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
2148 SLEEP AND WAKE-UP FUNCTIONS
2149 ---------------------------
2174 STORE current->state
2217 STORE current->state ...
2219 LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
2220 STORE task->state
2243 The available waker functions include:
2261 In terms of memory ordering, these functions all provide the same guarantees of
2265 order multiple stores before the wake-up with respect to loads of those stored
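A minimal sketch of the sleeper/waker pattern being described, using the
event_indicated and event_wait_queue names from the text:

	/* sleeper */
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE); /* implies a barrier   */
							 /* after setting state */
		if (READ_ONCE(event_indicated))
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);

	/* waker */
	WRITE_ONCE(event_indicated, 1);
	wake_up(&event_wait_queue);	/* orders the store above against the */
					/* sleeper only if a wakeup actually  */
					/* occurs                             */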
2300 MISCELLANEOUS FUNCTIONS
2301 -----------------------
2303 Other functions that imply barriers:
2309 INTER-CPU ACQUIRING BARRIER EFFECTS
2318 ---------------------------
2351 be a problem as a single-threaded linear piece of code will still appear to
2365 --------------------------
2390 To wake up a particular waiter, the up_read() or up_write() functions have to:
2405 LOAD waiter->list.next;
2406 LOAD waiter->task;
2407 STORE waiter->task;
2429 LOAD waiter->task;
2430 STORE waiter->task;
2438 LOAD waiter->list.next;
2439 --- OOPS ---
2446 LOAD waiter->list.next;
2447 LOAD waiter->task;
2448 smp_mb();
2449 STORE waiter->task;
2459 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
2466 -----------------
2477 -----------------
2486 efficient to reorder, combine or merge accesses - something that would cause
2490 routines - such as inb() or writel() - which know how to make such accesses
2492 use of memory barriers unnecessary, if the accessor functions are used to refer
2496 See Documentation/driver-api/device-io.rst for more information.
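A minimal sketch of accessor usage (the register offsets are illustrative):

	writel(DMA_START, dev->regs + DMA_CTRL);	/* ordered MMIO write */
	status = readl(dev->regs + DMA_STATUS);		/* ordered MMIO read  */

Because the accessors themselves make the accesses appropriately sequential
with respect to the same peripheral, no explicit memory barrier is needed
between them.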
2500 ----------
2506 This may be alleviated - at least in part - by disabling local interrupts (a
2508 the interrupt-disabled section in the driver. While the driver's interrupt
2515 under interrupt-disablement and then the driver's interrupt handler is invoked:
2534 accesses performed in an interrupt - and vice versa - unless implicit or
2544 likely, then interrupt-disabling locks should be used to guarantee ordering.
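A minimal sketch of using an interrupt-disabling lock to order driver code
against its own interrupt handler (the names are illustrative; the handler
takes drv->lock as well):

	spin_lock_irqsave(&drv->lock, flags);
	writel(next_cmd, drv->regs + CMD_REG);	/* cannot race with the handler */
	spin_unlock_irqrestore(&drv->lock, flags);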
2552 specific. Therefore, drivers which are inherently non-portable may rely on
2556 series of accessor functions that provide various degrees of ordering
2604 The ordering properties of __iomem pointers obtained with non-default
2614 bullets 2-5 above) but they are still guaranteed to be ordered with
2622 register-based, memory-mapped FIFOs residing on peripherals that are not
2628 The inX() and outX() accessors are intended to access legacy port-mapped
2634 internal virtual memory mapping, the portable ordering guarantees
2639 Device drivers may expect outX() to emit a non-posted write transaction
2657 little-endian and will therefore perform byte-swapping operations on big-endian
2665 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2669 of arch-specific code.
2672 stream in any order it feels like - or even in parallel - provided that if an
2678 [*] Some instructions have more than one effect - such as changing the
2679 condition codes, changing registers or changing memory - and different
2705-2723 [diagram: each CPU core issues accesses through a memory access queue into its cache; the caches are linked by a cache coherency mechanism, which in turn connects to memory and to a device]
2754 ----------------------
2771 See Documentation/core-api/cachetlb.rst for more information on cache
2776 -----------------------
2832 (*) the CPU's data cache may affect the ordering, and while cache-coherency
2833 mechanisms may alleviate this - once the store has actually hit the cache
2834 - there's no guarantee that the coherency management will be propagated in
2845 However, it is guaranteed that a CPU will be self-consistent: it will see its
2872 are -not- optional in the above example, as there are architectures
2907 --------------------------
2911 two semantically-related cache lines updated at separate times. This is where
2912 the address-dependency barrier really becomes necessary as this synchronises
2921 VIRTUAL MACHINE GUESTS
2922 ----------------------
2924 Guests running within virtual machines might be affected by SMP effects even if
2927 barriers for this use-case would be possible but is often suboptimal.
2929 To handle this case optimally, low-level virt_mb() etc macros are available.
2931 identical code for SMP and non-SMP systems. For example, virtual machine guests
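A minimal sketch of a guest publishing a descriptor to a hypervisor-shared
ring, assuming an illustrative ring layout; virt_wmb() generates the same code
whether or not the guest is built with SMP support, since the host may be
running on other CPUs regardless:

	ring->desc[idx] = req;		/* fill in the descriptor           */
	virt_wmb();			/* order the fill before publishing */
	ring->avail_idx = idx + 1;	/* make it visible to the host      */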
2945 ----------------
2950 Documentation/core-api/circular-buffers.rst