=======================================================
A Tour Through TREE_RCU's Grace-Period Memory Ordering
=======================================================

Introduction
============

This document gives a rough visual overview of how Tree RCU's
grace-period memory ordering guarantee is provided.

What Is Tree RCU's Grace Period Memory Ordering Guarantee?
===========================================================

RCU grace periods provide extremely strong memory-ordering guarantees
for non-idle non-offline code. Any code that happens after the end of a
given RCU grace period is guaranteed to see the effects of all accesses
prior to the beginning of that grace period that are within RCU
read-side critical sections. Similarly, any code that happens before
the beginning of a given RCU grace period is guaranteed not to see the
effects of any accesses following the end of that grace period that are
within RCU read-side critical sections.

Note well that RCU-sched read-side critical sections include any region
of code for which preemption is disabled. Given that each individual
machine instruction can be thought of as an extremely small region of
preemption-disabled code, one can think of ``synchronize_rcu()`` as
``smp_mb()`` on steroids.

RCU updaters use this guarantee by splitting their updates into two
phases, one of which is executed before the grace period and the other
of which is executed after the grace period. In the most common use
case, phase one removes an element from a linked RCU-protected data
structure, and phase two frees that element. For this to work, any
readers that have witnessed state prior to the phase-one update (in the
common case, removal) must not witness state following the phase-two
update (in the common case, freeing).
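
For example, a removal-then-free update might be structured as follows,
a minimal sketch using a hypothetical ``struct foo`` that is not taken
from any particular kernel subsystem::

    struct foo {
            struct list_head list;
            struct rcu_head rh;
    };

    /* Phase two, run only after a full grace period has elapsed. */
    static void foo_reclaim(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct foo, rh));
    }

    /* Phase one: make the element unreachable to new readers. */
    static void foo_remove(struct foo *p)
    {
            list_del_rcu(&p->list);
            call_rcu(&p->rh, foo_reclaim);
    }

Any reader that might still hold a reference to ``p`` at the time of
the ``list_del_rcu()`` is guaranteed to have completed before
``foo_reclaim()`` frees the element.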

The RCU implementation provides this guarantee using a judicious
combination of lock-based critical sections, memory barriers, and
per-CPU processing, as is described in the following sections.

Tree RCU Grace Period Memory Ordering Building Blocks
=====================================================

The workhorse for RCU's grace-period memory ordering is the critical
section for the ``rcu_node`` structure's ``->lock``. These critical
sections use helper functions for lock acquisition, including
``raw_spin_lock_rcu_node()``, ``raw_spin_trylock_rcu_node()``, and
``raw_spin_lock_irqsave_rcu_node()``. Their lock-release counterparts
are ``raw_spin_unlock_rcu_node()``, ``raw_spin_unlock_irq_rcu_node()``,
and ``raw_spin_unlock_irqrestore_rcu_node()``, respectively.

The key point is that the lock-acquisition functions, including
``raw_spin_trylock_rcu_node()``, all invoke
``smp_mb__after_unlock_lock()`` immediately after successful
acquisition of the lock.
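
Conceptually, each acquisition-side helper simply pairs the lock
operation with the full barrier, along the following lines (a sketch
patterned after the helpers in ``kernel/rcu/rcu.h``)::

    #define raw_spin_lock_rcu_node(p) \
    do { \
            raw_spin_lock(&ACCESS_PRIVATE(p, lock)); \
            smp_mb__after_unlock_lock(); \
    } while (0)

On architectures whose unlock-lock pairs already provide full ordering,
``smp_mb__after_unlock_lock()`` compiles to nothing at all.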

Therefore, for any given ``rcu_node`` structure, any access happening
before one of the above lock-release functions will be seen by all CPUs
as happening before any access happening after a later one of the above
lock-acquisition functions. Furthermore, any access happening before
one of the above lock-release functions on any given CPU will be seen
by all CPUs as happening before any access happening after a later one
of the above lock-acquisition functions executing on that same CPU,
even if the lock-release and lock-acquisition functions are operating
on different ``rcu_node`` structures. Tree RCU uses these two ordering
guarantees to form an ordering network among all CPUs that were in any
way involved in the grace period, including any CPUs that came online
or went offline during the grace period in question.

The following litmus test, a simplified sketch in which ``rnp``
references a single ``rcu_node`` structure throughout, exhibits the
ordering effects of these lock-acquisition and lock-release
functions::
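
    int x, y;
    int r1, r2, r3;

    void task0(void)
    {
            raw_spin_lock_rcu_node(rnp);
            WRITE_ONCE(x, 1);
            raw_spin_unlock_rcu_node(rnp);
    }

    void task1(void)
    {
            raw_spin_lock_rcu_node(rnp);
            r1 = READ_ONCE(x);
            WRITE_ONCE(y, 1);
            raw_spin_unlock_rcu_node(rnp);
    }

    void task2(void)
    {
            r2 = READ_ONCE(y);
            smp_mb();
            r3 = READ_ONCE(x);
    }

If ``r1 == 1``, then ``task1()``'s critical section came after
``task0()``'s, and the ``smp_mb__after_unlock_lock()`` within
``raw_spin_lock_rcu_node()`` forbids the outcome
``r1 == 1 && r2 == 1 && r3 == 0``: any CPU that sees ``task1()``'s
store to ``y`` is guaranteed to also see ``task0()``'s earlier store
to ``x``.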

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But the chain of rcu_node-structure lock acquisitions guarantees      |
| that new readers will see all of the updater's pre-grace-period       |
| accesses and also guarantees that the updater's post-grace-period     |
| accesses will be seen after all of the prior readers' accesses. So    |
| why do we need smp_mb__after_unlock_lock()?                           |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Because we must provide ordering for RCU's polling grace-period       |
| primitives, for example, get_state_synchronize_rcu() and              |
| poll_state_synchronize_rcu(). Consider this code::                    |
|                                                                       |
|  CPU 0                                     CPU 1                      |
|  ----                                      ----                       |
|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
|  g = get_state_synchronize_rcu()           smp_mb()                   |
|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
|          continue;                                                    |
|  r0 = READ_ONCE(Y)                                                    |
|                                                                       |
| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
| happen, even if CPU 1 is in an RCU extended quiescent state           |
| (idle or offline) and thus won't interact directly with the RCU       |
| core processing at all.                                               |
+-----------------------------------------------------------------------+

This approach must be extended to include idle CPUs, which need
RCU's grace-period memory ordering guarantee to extend to any
RCU read-side critical sections preceding and following the current
idle sojourn.
This case is handled by calls to the strongly ordered
``atomic_add_return()`` read-modify-write atomic operation that
is invoked within ``ct_kernel_exit_state()`` at idle-entry
time and within ``ct_kernel_enter_state()`` at idle-exit time.
The grace-period kthread invokes first ``ct_rcu_watching_cpu_acquire()``
(preceded by a full memory barrier) and then
``rcu_watching_snap_stopped_since()`` (both of which rely on acquire
semantics) to detect idle CPUs.
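
The effect of these fully ordered operations can be modeled by a simple
per-CPU counter in which odd values denote non-idle execution and even
values denote idleness. This is only a schematic model, not the
kernel's actual context-tracking state::

    static DEFINE_PER_CPU(atomic_t, watching_model);

    /* Invoked at both idle entry and idle exit. */
    static void model_idle_transition(void)
    {
            /* Fully ordered: accesses cannot leak across the transition. */
            atomic_add_return(1, this_cpu_ptr(&watching_model));
    }

    /* Grace-period kthread: take an initial snapshot of a CPU's count. */
    static int model_snapshot(int cpu)
    {
            return atomic_read_acquire(&per_cpu(watching_model, cpu));
    }

    /* Later: idle at the snapshot, or passed through idle since then? */
    static bool model_idle_since(int cpu, int snap)
    {
            return !(snap & 1) ||
                   atomic_read_acquire(&per_cpu(watching_model, cpu)) != snap;
    }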

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But what about CPUs that remain offline for the entire grace period?  |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Such CPUs will be offline at the beginning of the grace period, so    |
| the grace period won't expect quiescent states from them. Races       |
| between grace-period start and CPU-hotplug operations are mediated    |
| by the CPU's leaf ``rcu_node`` structure's ``->lock`` as described    |
| above.                                                                |
+-----------------------------------------------------------------------+

The approach must be extended to handle one final case, that of waking a
task blocked in ``synchronize_rcu()``. This task might be affined to
a CPU that is not yet aware that the grace period has ended, and thus
might not yet be subject to the grace period's memory ordering.
Therefore, there is an ``smp_mb()`` after the return from
``wait_for_completion()`` in the ``synchronize_rcu()`` code path.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| What? Where??? I don't see any ``smp_mb()`` after the return from     |
| ``wait_for_completion()``!!!                                          |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| That would be because I spotted the need for that ``smp_mb()``        |
| during the creation of this documentation, and it is therefore        |
| unlikely to hit mainline before v4.14. Kudos to Lance Roy, Will       |
| Deacon, Peter Zijlstra, and Jonathan Cameron for asking questions     |
| that sensitized me to the rather elaborate sequence of events that    |
| demonstrate the need for this memory barrier.                         |
+-----------------------------------------------------------------------+

Tree RCU's grace-period memory-ordering guarantees rely most heavily on
the ``rcu_node`` structure's ``->lock`` field, so much so that it is
necessary to abbreviate this pattern in the diagrams in the next
section. For example, consider the ``rcu_prepare_for_idle()`` function
shown below, which is one of several functions that enforce ordering of
newly arrived RCU callbacks against future grace periods::

     1 static void rcu_prepare_for_idle(void)
     2 {
     3   bool needwake;
     4   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
     5   struct rcu_node *rnp;
     6   int tne;
     7
     8   lockdep_assert_irqs_disabled();
     9   if (rcu_rdp_is_offloaded(rdp))
    10     return;
    11
    12   /* Handle nohz enablement switches conservatively. */
    13   tne = READ_ONCE(tick_nohz_active);
    14   if (tne != rdp->tick_nohz_enabled_snap) {
    15     if (!rcu_segcblist_empty(&rdp->cblist))
    16       invoke_rcu_core(); /* force nohz to see update. */
    17     rdp->tick_nohz_enabled_snap = tne;
    18     return;
    19   }
    20   if (!tne)
    21     return;
    22
    23   /*
    24    * If we have not yet accelerated this jiffy, accelerate all
    25    * callbacks on this CPU.
    26    */
    27   if (rdp->last_accelerate == jiffies)
    28     return;
    29   rdp->last_accelerate = jiffies;
    30   if (rcu_segcblist_pend_cbs(&rdp->cblist)) {
    31     rnp = rdp->mynode;
    32     raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
    33     needwake = rcu_accelerate_cbs(rnp, rdp);
    34     raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
    35     if (needwake)
    36       rcu_gp_kthread_wake();
    37   }
    38 }

But the only part of ``rcu_prepare_for_idle()`` that really matters for
this discussion is the ``->lock`` critical section on lines 32-34. This
function is therefore abbreviated as follows:

.. kernel-figure:: rcu_node-lock.svg

The box represents the ``rcu_node`` structure's ``->lock`` critical
section, with the double line on top representing the additional
``smp_mb__after_unlock_lock()``.

Tree RCU Grace Period Memory Ordering Components
================================================

Tree RCU's grace-period memory-ordering guarantee is provided by a
number of RCU components:

#. `Callback Registry`_
#. `Grace-Period Initialization`_
#. `Self-Reported Quiescent States`_
#. `Dynamic Tick Interface`_
#. `CPU-Hotplug Interface`_
#. `Forcing Quiescent States`_
#. `Grace-Period Cleanup`_
#. `Callback Invocation`_

Each of the following sections looks at the corresponding component in
detail.

Callback Registry
^^^^^^^^^^^^^^^^^

If RCU's grace-period guarantee is to mean anything at all, any access
that happens before a given invocation of ``call_rcu()`` must also
happen before the corresponding grace period. The implementation of
this portion of RCU's grace period guarantee is shown in the following
figure:

.. kernel-figure:: TreeRCU-callback-registry.svg

Because ``call_rcu()`` normally acts only on CPU-local state, it
provides no ordering guarantees, either for itself or for phase one of
the update (which again will usually be removal of an element from an
RCU-protected data structure). It simply enqueues the ``rcu_head``
structure on a per-CPU list, which cannot become associated with a grace
period until a later call to ``rcu_accelerate_cbs()``, as shown in the
diagram above.

One set of code paths shown on the left invokes ``rcu_accelerate_cbs()``
via ``note_gp_changes()``, either directly from ``call_rcu()`` (if the
current CPU is inundated with queued ``rcu_head`` structures) or more
likely from an ``RCU_SOFTIRQ`` handler. Another code path in the middle
is taken only in kernels built with ``CONFIG_RCU_FAST_NO_HZ=y``, which
invokes ``rcu_accelerate_cbs()`` via ``rcu_prepare_for_idle()``. The
final code path on the right is taken only in kernels built with
``CONFIG_HOTPLUG_CPU=y``, which invokes ``rcu_accelerate_cbs()`` via
``rcu_advance_cbs()``, ``rcu_migrate_callbacks()``,
``rcutree_migrate_callbacks()``, and ``takedown_cpu()``, which in turn
is invoked on a surviving CPU after the outgoing CPU has been completely
offlined.

There are a few other code paths within grace-period processing that
opportunistically invoke ``rcu_accelerate_cbs()``. However, either way,
all of the CPU's recently queued ``rcu_head`` structures are associated
with a future grace-period number under the protection of the CPU's leaf
``rcu_node`` structure's ``->lock``. In all cases, there is full
ordering against any prior critical section for that same ``rcu_node``
structure's ``->lock``, and also full ordering against any of the
current task's or CPU's prior critical sections for any ``rcu_node``
structure's ``->lock``.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But what about ``synchronize_rcu()``?                                 |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| The ``synchronize_rcu()`` passes ``call_rcu()`` to                    |
| ``wait_rcu_gp()``, which invokes it. So either way, it eventually     |
| comes down to ``call_rcu()``.                                         |
+-----------------------------------------------------------------------+

Grace-Period Initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Grace-period initialization is carried out by the grace-period kernel
thread, which makes several passes over the ``rcu_node`` tree within the
``rcu_gp_init()`` function. This means that showing the full flow of
ordering through the grace-period computation will require duplicating
this tree. If you find this confusing, please note that the state of
the ``rcu_node`` changes over time, just like Heraclitus's river.
However, to keep the ``rcu_node`` river tractable, the grace-period
kernel thread's traversals are presented in multiple parts, starting in
this section with the various phases of grace-period initialization.

The first ordering-related grace-period initialization action is to
advance the ``rcu_state`` structure's ``->gp_seq`` grace-period-number
counter, as shown below:

.. kernel-figure:: TreeRCU-gp-init-1.svg

The actual increment is carried out using ``smp_store_release()``, which
helps reject false-positive RCU CPU stall detection. Note that only the
root ``rcu_node`` structure is touched.
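
In other words, the advance amounts to a release store of the freshly
incremented counter, along these lines (a sketch that elides the
low-order state bits carried by the real ``->gp_seq`` counter)::

    smp_store_release(&rcu_state.gp_seq, rcu_state.gp_seq + 1);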

The first pass through the ``rcu_node`` tree updates bitmasks based on
CPUs having come online or gone offline since the start of the previous
grace period. In the common case where the number of online CPUs for
this ``rcu_node`` structure has not transitioned to or from zero, this
pass will scan only the leaf ``rcu_node`` structures. However, if the
number of online CPUs for a given leaf ``rcu_node`` structure has
transitioned from zero, ``rcu_init_new_rnp()`` will be invoked for the
first incoming CPU. Similarly, if the number of online CPUs for a given
leaf ``rcu_node`` structure has transitioned to zero,
``rcu_cleanup_dead_rnp()`` will be invoked for the last outgoing CPU.
The diagram below shows the path of ordering if the leftmost
``rcu_node`` structure onlines its first CPU and if the next
``rcu_node`` structure has no online CPUs (or, alternatively if the
leftmost ``rcu_node`` structure offlines its last CPU and if the next
``rcu_node`` structure has no online CPUs).

.. kernel-figure:: TreeRCU-gp-init-2.svg

The final ``rcu_gp_init()`` pass through the ``rcu_node`` tree traverses
breadth-first, setting each ``rcu_node`` structure's ``->gp_seq`` field
to the newly advanced value from the ``rcu_state`` structure, as shown
in the following diagram:

.. kernel-figure:: TreeRCU-gp-init-3.svg

This change will also cause each CPU's next call to
``__note_gp_changes()`` to notice that a new grace period has started,
as described in the next section. But because the grace-period kthread
started the grace period at the root (with the advancing of the
``rcu_state`` structure's ``->gp_seq`` field) before setting each leaf
``rcu_node`` structure's ``->gp_seq`` field, each CPU's observation of
the start of the grace period will happen after the actual start of the
grace period.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But what about the CPU that started the grace period? Why wouldn't it |
| see the start of the grace period right when it started that grace    |
| period?                                                               |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| In a deep philosophical and overly anthropomorphized sense, yes, the  |
| CPU starting the grace period is immediately aware of having done so. |
| However, if we instead assume that RCU is not self-aware, then even   |
| the CPU starting the grace period does not really become aware of the |
| start of this grace period until its first call to                    |
| ``__note_gp_changes()``. On the other hand, this CPU potentially gets |
| early notification because it invokes ``__note_gp_changes()`` during  |
| its last ``rcu_gp_init()`` pass through its leaf ``rcu_node``         |
| structure.                                                            |
+-----------------------------------------------------------------------+

Self-Reported Quiescent States
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When all entities that might block the grace period have reported
quiescent states (or as described in a later section, had quiescent
states reported on their behalf), the grace period can end. Online
non-idle CPUs report their own quiescent states, as shown in the
following diagram:

.. kernel-figure:: TreeRCU-qs.svg

This is for the last CPU to report a quiescent state, which signals the
end of the grace period. Earlier quiescent states would push up the
``rcu_node`` tree only until they encountered an ``rcu_node`` structure
that is waiting for additional quiescent states. However, ordering is
nevertheless preserved because some later quiescent state will acquire
that ``rcu_node`` structure's ``->lock``.

Any number of events can lead up to a CPU invoking ``note_gp_changes``
(or alternatively, directly invoking ``__note_gp_changes()``), at which
point that CPU will notice the start of a new grace period while holding
its leaf ``rcu_node`` lock. Therefore, all execution shown in this
diagram happens after the start of the grace period. In addition, this
CPU will consider any RCU read-side critical section that started before
the invocation of ``__note_gp_changes()`` to have started before the
grace period, and thus a critical section that the grace period must
wait on.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But an RCU read-side critical section might have started after the    |
| beginning of the grace period (the advancing of ``->gp_seq`` from     |
| earlier), so why should the grace period wait on such a critical      |
| section?                                                              |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| It is indeed not necessary for the grace period to wait on such a     |
| critical section. However, it is permissible to wait on it. And it    |
| is furthermore important to wait on it, as this lazy approach is far  |
| more scalable than a “big bang” all-at-once grace-period start could  |
| possibly be.                                                          |
+-----------------------------------------------------------------------+

If the CPU does a context switch, a quiescent state will be noted by
``rcu_note_context_switch()`` on the left. On the other hand, if the CPU
takes a scheduler-clock interrupt while executing in usermode, a
quiescent state will be noted by ``rcu_sched_clock_irq()`` on the right.
Either way, the passage through a quiescent state will be noted in a
per-CPU variable.

The next time an ``RCU_SOFTIRQ`` handler executes on this CPU (for
example, after the next scheduler-clock interrupt), ``rcu_core()`` will
invoke ``rcu_check_quiescent_state()``, which will notice the recorded
quiescent state, and invoke ``rcu_report_qs_rdp()``. If
``rcu_report_qs_rdp()`` verifies that the quiescent state really does
apply to the current grace period, it invokes ``rcu_report_qs_rnp()``,
which traverses up the ``rcu_node`` tree as shown at the bottom of the
diagram, clearing bits from each ``rcu_node`` structure's ``->qsmask``
field, and propagating up the tree when the result is zero.

Note that this traversal passes upwards out of a given ``rcu_node``
structure only if the current CPU is reporting the last quiescent state
for the subtree headed by that ``rcu_node`` structure. A key point is
that if a CPU's traversal stops at a given ``rcu_node`` structure, then
there will be a later traversal by another CPU (or perhaps the same
one) that proceeds upwards from that point, and the ``rcu_node``
``->lock`` guarantees that the first CPU's quiescent state happens
before the remainder of the second CPU's traversal. Applying this line
of thought repeatedly shows that all CPUs' quiescent states happen
before the last CPU traverses through the root ``rcu_node`` structure,
the “last CPU” being the one that clears the last bit in the root
``rcu_node`` structure's ``->qsmask`` field.
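
This drive toward the root can be sketched as follows (a simplification
of ``rcu_report_qs_rnp()``, with sanity checks, tracing, and wakeup
details omitted, and with the caller assumed to hold ``rnp->lock``)::

    for (;;) {
            rnp->qsmask &= ~mask;   /* Report quiescent state(s). */
            if (rnp->qsmask != 0) {
                    /* Other quiescent states still outstanding here. */
                    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
                    return;
            }
            if (!rnp->parent) {
                    /* Cleared the last bit at the root. */
                    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
                    rcu_gp_kthread_wake();  /* Grace period may end. */
                    return;
            }
            mask = rnp->grpmask;    /* This rcu_node's bit in its parent. */
            raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
            rnp = rnp->parent;
            raw_spin_lock_irqsave_rcu_node(rnp, flags);
    }

The unlock of each ``rcu_node`` structure followed by the
barrier-augmented lock of its parent forms exactly the chain of
critical sections that orders the first CPU's quiescent state before
the remainder of any later traversal.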

Dynamic Tick Interface
^^^^^^^^^^^^^^^^^^^^^^

Due to energy-efficiency considerations, RCU is forbidden from
disturbing idle CPUs. CPUs are therefore required to notify RCU when
entering or leaving idle state, which they do via fully ordered
value-returning atomic operations on a per-CPU variable. The ordering
effects are as shown below:

.. kernel-figure:: TreeRCU-dyntick.svg

The RCU grace-period kernel thread samples the per-CPU idleness variable
while holding the corresponding CPU's leaf ``rcu_node`` structure's
``->lock``. This means that any RCU read-side critical sections that
precede the idle period (the oval near the top of the diagram above)
will happen before the end of the current grace period. Similarly, the
beginning of the current grace period will happen before any RCU
read-side critical sections that follow the idle period (the oval near
the bottom of the diagram above).

Plumbing this into the full grace-period execution is described
`below <Forcing Quiescent States_>`__.

CPU-Hotplug Interface
^^^^^^^^^^^^^^^^^^^^^

RCU is also forbidden from disturbing offline CPUs, which might well be
powered off and removed from the system completely. CPUs are therefore
required to notify RCU of their comings and goings as part of the
corresponding CPU hotplug operations. The ordering effects are shown
below:

.. kernel-figure:: TreeRCU-hotplug.svg

Because CPU hotplug operations are much less frequent than idle
transitions, they are heavier weight, and thus acquire the CPU's leaf
``rcu_node`` structure's ``->lock`` and update this structure's
``->qsmaskinitnext``. The RCU grace-period kernel thread samples this
mask to detect CPUs having gone offline since the beginning of this
grace period.
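
The incoming-CPU side of this handshake reduces to an update of that
mask under the leaf ``->lock``, along these lines (a simplified sketch
of the online-notification path)::

    mask = rdp->grpmask;    /* This CPU's bit in its leaf rcu_node. */
    raw_spin_lock_irqsave_rcu_node(rnp, flags);
    WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);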

Plumbing this into the full grace-period execution is described
`below <Forcing Quiescent States_>`__.

Forcing Quiescent States
^^^^^^^^^^^^^^^^^^^^^^^^

As noted above, idle and offline CPUs cannot report their own quiescent
states, and therefore the grace-period kernel thread must do the
reporting on their behalf. This process is called “forcing quiescent
states”, it is repeated every few jiffies, and its ordering effects are
shown below:

.. kernel-figure:: TreeRCU-gp-fqs.svg

As shown in the above diagram, the force-quiescent-state processing
scans the leaf ``rcu_node`` structures, and if there are no new
quiescent states due to recently idled and/or offlined CPUs, then only
the leaf ``rcu_node`` structures are acquired and released. However, if
there is a newly offlined CPU as illustrated on the left or a newly
idled CPU as illustrated on the right, the corresponding quiescent
state will be reported upwards. As with self-reported quiescent states,
the upwards driving stops once it reaches an ``rcu_node`` structure
that has quiescent states outstanding from other CPUs.
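
The scan itself can be sketched as follows (a simplification of the
force-quiescent-state code path, in which the hypothetical
``cpu_no_longer_watching()`` stands in for the actual idleness and
offline checks)::

    rcu_for_each_leaf_node(rnp) {
            raw_spin_lock_irqsave_rcu_node(rnp, flags);
            mask = 0;
            for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
                    if (cpu_no_longer_watching(cpu))
                            mask |= leaf_node_cpu_bit(rnp, cpu);
            }
            if (mask)
                    /* Releases rnp->lock, possibly driving upwards. */
                    rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
            else
                    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
    }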

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| The leftmost drive to root stopped before it reached the root         |
| ``rcu_node`` structure, which means that there are still CPUs         |
| subject to this grace period. How can the grace period end before     |
| all of their quiescent states have been reported?                     |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| It cannot. The drive to root stopped because it reached an            |
| ``rcu_node`` structure with quiescent states still outstanding from   |
| other CPUs. Once those CPUs report their quiescent states, a later    |
| drive will proceed all the way to the root, as can be seen in the     |
| `stitched-together diagram <Putting It All Together_>`__.             |
+-----------------------------------------------------------------------+

Grace-Period Cleanup
^^^^^^^^^^^^^^^^^^^^

Grace-period cleanup first scans the ``rcu_node`` tree breadth-first
advancing all the ``->gp_seq`` fields, then it advances the
``rcu_state`` structure's ``->gp_seq`` field. The ordering effects are
shown below:

.. kernel-figure:: TreeRCU-gp-cleanup.svg

As indicated by the oval at the bottom of the diagram, once grace-period
cleanup is complete, the next grace period can begin.
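
The cleanup-time propagation can be sketched as follows (a
simplification of ``rcu_gp_cleanup()``, with callback bookkeeping and
wakeups omitted)::

    new_gp_seq = rcu_state.gp_seq;
    rcu_seq_end(&new_gp_seq);
    rcu_for_each_node_breadth_first(rnp) {
            raw_spin_lock_irq_rcu_node(rnp);
            WRITE_ONCE(rnp->gp_seq, new_gp_seq);
            raw_spin_unlock_irq_rcu_node(rnp);
    }
    rcu_seq_end(&rcu_state.gp_seq);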

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But when precisely does the grace period end?                         |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| There is no useful single point at which the grace period can be said |
| to end. The earliest reasonable candidate is as soon as the last CPU  |
| has reported its quiescent state, but it may be some milliseconds     |
| before RCU becomes aware of this. The latest reasonable candidate is  |
| once the ``rcu_state`` structure's ``->gp_seq`` field has been        |
| updated, but it is quite possible that some CPUs have already         |
| completed phase two of their updates by that time. In short, if you   |
| are going to work with RCU, you need to learn to embrace              |
| uncertainty.                                                          |
+-----------------------------------------------------------------------+

Callback Invocation
^^^^^^^^^^^^^^^^^^^

Once a given CPU's leaf ``rcu_node`` structure's ``->gp_seq`` field has
been updated, that CPU can begin invoking its RCU callbacks that were
waiting for this grace period to end. These callbacks are identified by
``rcu_advance_cbs()``, which is usually invoked by
``__note_gp_changes()``. As shown in the diagram below, this invocation
can be triggered by the scheduling-clock interrupt
(``rcu_sched_clock_irq()`` on the left) or by idle entry
(``rcu_cleanup_after_idle()`` on the right, but only for kernels built
with ``CONFIG_RCU_FAST_NO_HZ=y``). Either way, ``RCU_SOFTIRQ`` is
raised, which results in ``rcu_do_batch()`` invoking the callbacks,
which in turn allows those callbacks to carry out (either directly or
indirectly via wakeup) the needed phase-two processing for each update.

.. kernel-figure:: TreeRCU-callback-invocation.svg

Please note that callback invocation can also be prompted by any number
of corner-case code paths, for example, when a CPU notes that it has
excessive numbers of callbacks queued. In all cases, the CPU acquires
its leaf ``rcu_node`` structure's ``->lock`` before invoking callbacks,
which preserves the required ordering against the newly completed grace
period.

However, if the callback function communicates to other CPUs, for
example, doing a wakeup, then it is that function's responsibility to
maintain ordering. For example, if the callback function wakes up a task
that runs on some other CPU, proper ordering must be in place in both
the callback function and the task being awakened. To see why this is
important, consider the top half of the `grace-period cleanup`_
diagram. The callback might be running on a CPU corresponding to the
leftmost leaf ``rcu_node`` structure, and awaken a task that is to run
on a CPU corresponding to the rightmost leaf ``rcu_node`` structure,
and the grace-period kernel thread might not yet have reached the
rightmost leaf. In this case, the grace period's memory ordering might
not yet have reached that CPU, so again the callback function and the
awakened task must supply proper ordering.
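
One way to supply that ordering is to pair a release store in the
callback with an acquire load in the awakened task, as in the following
sketch (the ``struct gp_waiter`` here is hypothetical, not taken from
the kernel sources)::

    struct gp_waiter {
            struct rcu_head rh;
            int done;
            struct wait_queue_head wq;  /* init_waitqueue_head() at setup. */
    };

    static void gp_waiter_cb(struct rcu_head *rhp)
    {
            struct gp_waiter *w = container_of(rhp, struct gp_waiter, rh);

            /* Release: orders the callback's prior accesses before ->done. */
            smp_store_release(&w->done, 1);
            wake_up(&w->wq);
    }

    /* Runs on some other CPU; pairs with the release store above. */
    static void gp_waiter_wait(struct gp_waiter *w)
    {
            wait_event(w->wq, smp_load_acquire(&w->done));
    }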

Putting It All Together
=======================

A stitched-together diagram is here:

.. kernel-figure:: TreeRCU-gp.svg