Lines Matching refs:store
21 <li style="margin:0"><a href="#ss_ll">Store/store and load/load</a></li>
22 <li style="margin:0"><a href="#ls_sl">Load/store and store/load</a></li>
202 <p>To get into a situation where we see B=5 before we see the store to A, either
214 stores, it does not guarantee that a store followed by a load will be observed
244 SMP, the store to A and the load from B in thread 1 can be “observed” in a
293 continue executing instructions past the one that did the store, possibly
311 either way it won’t see the store performed by core 1. (“A” could be in core
318 performance penalty on every store operation. Relaxing the rules for the
326 store #1 to <strong>finish</strong> being published before it can start on store
345 is meant by “observing” a load or store. Suppose core 1 executes “A = 1”. The
346 store is <em>initiated</em> when the CPU executes the instruction. At some
347 point later, possibly through cache coherence activity, the store is
349 <em>complete</em> until the store arrives in main memory, but the memory
431 <li>store followed by another store</li>
433 <li>load followed by store</li>
434 <li>store followed by load</li>
437 <h4 id="ss_ll">Store/store and load/load</h4>
455 <p>Thread 1 needs to ensure that the store to A happens before the store to B.
456 This is a “store/store” situation. Similarly, thread 2 needs to ensure that the
463 lines, with minimal cache coherency. If the store to A stays local but the
464 store to B is published, core 2 will see B=1 but won’t see the update to A. On
480 <em>store/store barrier</em><br />
488 <p>The store/store barrier guarantees that <strong>all observers</strong> will
494 <p>Since the store/store barrier guarantees that thread 2 observes the stores in
500 <p>The store/store barrier could work by flushing all
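<p>As a concrete sketch of this placement in C11 (the thread1/thread2 names are
illustrative; C11&#8217;s release and acquire fences are at least as strong as the
bare store/store and load/load barriers described here):</p>
<pre><code>#include &lt;stdatomic.h&gt;

int A = 0;
atomic_int B = 0;

void thread1(void) {
    A = 1;
    atomic_thread_fence(memory_order_release);  /* store/store barrier */
    atomic_store_explicit(&amp;B, 1, memory_order_relaxed);
}

void thread2(void) {
    while (atomic_load_explicit(&amp;B, memory_order_relaxed) != 1)
        ;  /* spin until thread 1's store to B is observed */
    atomic_thread_fence(memory_order_acquire);  /* load/load barrier */
    /* The fences pair up, so A == 1 is guaranteed to be visible here. */
}
</code></pre>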
514 <h4 id="ls_sl">Load/store and store/load</h4>
517 store/load barrier. Here’s an example where a load/store barrier is
533 <p>Thread 2 could observe thread 1’s store of B=1 before it observes thread 1’s
534 load from A, and as a result store A=41 before thread 1 has a chance to read A.
535 Inserting a load/store barrier in each thread solves the problem:</p>
544 <em>load/store barrier</em><br />
547 <em>load/store barrier</em><br />
554 <p>A store to local cache may be observed before a load from main memory,
557 while that’s in progress execution continues. The store to B happens in local
564 thread 2 store to A before thread 1’s read if thread 1 guarantees the load/store
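<p>The same example rendered as a C11 sketch: C11 has no bare load/store fence,
so a release fence stands in on the storing side and an acquire fence on the
loading side (each includes load/store ordering; the function names are
illustrative):</p>
<pre><code>#include &lt;stdatomic.h&gt;

atomic_int A = 0, B = 0;

void thread1(void) {
    int reg = atomic_load_explicit(&amp;A, memory_order_relaxed);
    /* load/store barrier: the store to B can't be observed before the
       load from A (a release fence also orders prior loads before
       later stores) */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&amp;B, 1, memory_order_relaxed);
    (void)reg;  /* silence unused-variable warning */
}

void thread2(void) {
    while (atomic_load_explicit(&amp;B, memory_order_relaxed) != 1)
        ;  /* spin */
    atomic_thread_fence(memory_order_acquire);  /* load/store barrier */
    atomic_store_explicit(&amp;A, 41, memory_order_relaxed);
}
</code></pre>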
572 <p>As mentioned earlier, store/load barriers are the only kind required on x86
583 <li>Alpha provides “rmb” (load/load), “wmb” (store/store), and “mb” (full).
588 <li>ARMv7 has “dmb st” (store/store) and “dmb sy” (full).</li>
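<p>For illustration, the ARMv7 instructions named above can be wrapped as
GCC/Clang inline asm (the wrapper names are illustrative):</p>
<pre><code>/* ARMv7 barrier wrappers; the "memory" clobber also prevents the
   compiler itself from reordering across the asm. */
static inline void dmb_st(void) { __asm__ __volatile__("dmb st" ::: "memory"); }  /* store/store */
static inline void dmb_sy(void) { __asm__ __volatile__("dmb sy" ::: "memory"); }  /* full barrier */
</code></pre>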
614 <em>store/store barrier</em><br />
644 <em>store/store barrier</em><br />
662 load or store. It can let you avoid the need for an explicit barrier in certain
710 store in thread 1 causes something to happen in thread 2 which causes something
712 that order. (Inserting a load/store barrier in thread 2 fixes this.)</p>
753 load 0 from A, increment it to 1, and store it back, leaving a final result of
787 location is doubleword-aligned and special load/store instructions are used.
795 store.</p>
809 conditional store instruction is used to try to write the data back. If the
810 reservation is still in place, the store succeeds; if not, the store will fail.
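<p>A sketch of an atomic increment built on this retry idea, using C11&#8217;s weak
compare-and-swap, which on LL/SC machines typically compiles to a
load-exclusive/store-conditional pair (a spurious failure corresponds to a
lost reservation):</p>
<pre><code>#include &lt;stdatomic.h&gt;

void atomic_increment(atomic_int *addr) {
    int old = atomic_load_explicit(addr, memory_order_relaxed);
    /* Retry until our store-conditional-style update succeeds. */
    while (!atomic_compare_exchange_weak_explicit(
               addr, &amp;old, old + 1,
               memory_order_relaxed, memory_order_relaxed)) {
        /* the CAS refreshed 'old' with the current value; try again */
    }
}
</code></pre>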
922 other threads will stay out until they observe the store of 0. If it takes a
934 When releasing the spinlock, we issue the barrier and then the atomic store.
954 the store of zero to the lock word is observed after any loads or stores in the
955 critical section above it. In other words, we need a load/store and store/store
957 SMP -- only store/load barriers are required. The implementation of
959 barrier followed by a simple store. No CPU barrier is required.</p>
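<p>A minimal C11 spinlock along these lines (lock_word is an illustrative
name; on x86 the release store is exactly a compiler barrier followed by a
simple store, with no CPU barrier instruction):</p>
<pre><code>#include &lt;stdatomic.h&gt;

atomic_int lock_word = 0;  /* 0 = free, 1 = held */

void spin_lock(void) {
    /* Atomically set the lock word to 1; the acquire ordering plays the
       role of the barrier issued after the atomic op. */
    while (atomic_exchange_explicit(&amp;lock_word, 1, memory_order_acquire) != 0)
        ;  /* spin until we observe the store of 0 and win the exchange */
}

void spin_unlock(void) {
    /* Release store: load/store + store/store ordering, then the plain
       store of 0 -- the "barrier, then the atomic store" recipe above. */
    atomic_store_explicit(&amp;lock_word, 0, memory_order_release);
}
</code></pre>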
1030 move a store “downward” across another store’s release barrier.</li>
1031 <li>A load followed by a store can’t be reordered, because neither instruction
1033 <li>A store followed by a load <strong>can</strong> be reordered, because each
1037 <p>Hence, you only need store/load barriers on x86 SMP.</p>
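<p>A sketch of the one case that does need a fence on x86: one side of a
Dekker-style store-then-load handshake (a seq_cst fence typically compiles to
mfence on x86; the other thread is symmetric with the flags swapped):</p>
<pre><code>#include &lt;stdatomic.h&gt;

atomic_int flag0 = 0, flag1 = 0;

void thread0(void) {
    atomic_store_explicit(&amp;flag0, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* store/load barrier */
    if (atomic_load_explicit(&amp;flag1, memory_order_relaxed) == 0) {
        /* flag1 not yet set: safe to proceed (sketch only) */
    }
}
</code></pre>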
1116 <p>Without a memory barrier, the store to <code>gGlobalThing</code> could be observed before
1162 <p>We need to replace the store with:</p>
1230 releasing store. This means that compilers and code optimizers are free to
1254 usual ways, for example the compiler could move a non-volatile load or store “above” a
1255 volatile store, but couldn’t move it “below”. Volatile accesses may not be
1401 <p>Now the problem should be obvious: the store to <code>helper</code> is
1406 <p>You could try to ensure that the store to <code>helper</code> happens after
1450 it can’t make any assumptions about <code>data2</code>, because that store was
1451 performed after the volatile store.</p>
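<p>The primer&#8217;s example is Java; rendered as a C11 sketch for consistency with
the other snippets here (Helper&#8217;s fields and the get_helper name are
illustrative):</p>
<pre><code>#include &lt;stdatomic.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;pthread.h&gt;

typedef struct { int data1; int data2; } Helper;

static _Atomic(Helper *) helper = NULL;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;

Helper *get_helper(void) {
    /* Acquire load pairs with the release store below: a non-NULL
       pointer implies the fields written before publication. */
    Helper *h = atomic_load_explicit(&amp;helper, memory_order_acquire);
    if (h == NULL) {
        pthread_mutex_lock(&amp;mu);
        h = atomic_load_explicit(&amp;helper, memory_order_relaxed);
        if (h == NULL) {
            h = malloc(sizeof *h);
            h->data1 = 1;
            h->data2 = 2;  /* must precede the release store; a store
                              made after it would carry no guarantee,
                              as the text notes for data2 */
            atomic_store_explicit(&amp;helper, h, memory_order_release);
        }
        pthread_mutex_unlock(&amp;mu);
    }
    return h;
}
</code></pre>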
1772 <p>As we saw in an earlier section, we need to insert a store/load barrier
1779 <th>volatile store</th>
1783 <em>load/load + load/store barrier</em></code></td>
1784 <td><code><em>store/store barrier</em><br />
1786 <em>store/load barrier</em></code></td>
1790 <p>The volatile load is just an acquiring load. The volatile store is similar
1791 to a releasing store, but we’ve omitted load/store from the pre-store barrier,
1792 and added a store/load barrier afterward.</p>
1796 issue the store/load barrier before the volatile load instead and get the same
1798 with the store.</p>
1801 atomic operation and skip the explicit store/load barrier. On x86, for example,
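<p>A discrete-fence sketch of the volatile-store column above (C11 has no bare
store/store fence, so the stronger release fence stands in; a seq_cst fence
supplies the trailing store/load barrier):</p>
<pre><code>#include &lt;stdatomic.h&gt;

atomic_int v;  /* stand-in for a Java volatile field */

void volatile_store(int value) {
    atomic_thread_fence(memory_order_release);   /* store/store barrier */
    atomic_store_explicit(&amp;v, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* store/load barrier */
}
</code></pre>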