=================================================
A Tour Through TREE_RCU's Expedited Grace Periods
=================================================

There are two flavors of RCU (RCU-preempt and RCU-sched), with an
earlier third RCU-bh flavor having been implemented in terms of the
other two.

RCU-preempt Expedited Grace Periods
-----------------------------------

``CONFIG_PREEMPT=y`` kernels implement RCU-preempt. The overall flow of
the handling of a given CPU by an RCU-preempt expedited grace period is
shown in the following diagram:

.. kernel-figure:: ExpRCUFlow.svg

The IPI handler (``rcu_exp_handler()``)
can check to see if the CPU is currently running in an RCU read-side
critical section. If so, it sets state so that the outermost
``rcu_read_unlock()``
invocation will provide the needed quiescent-state report.
This flag-setting avoids the previous forced preemption of all
CPUs that might have RCU read-side critical sections.
In addition, this flag-setting is done so as to avoid increasing
the overhead of the common-case fastpath through the scheduler.
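
The flag-based scheme above can be sketched as a toy single-CPU model.
All of the names below (``toy_*``, ``nesting``, ``exp_pending``) are
illustrative assumptions for this sketch, not the kernel's actual data
structures:

```c
#include <assert.h>

/* Toy single-CPU model of deferred quiescent-state reporting: the IPI
 * handler merely sets a flag when it interrupts a read-side critical
 * section, and the outermost unlock reports the quiescent state. */
static int nesting;      /* read-side critical-section nesting depth */
static int exp_pending;  /* an expedited GP is waiting on this CPU */
static int qs_reported;  /* quiescent state reported to that GP */

static void toy_read_lock(void)
{
	nesting++;
}

static void toy_read_unlock(void)
{
	if (--nesting == 0 && exp_pending) {
		exp_pending = 0;
		qs_reported = 1;  /* deferred report at outermost unlock */
	}
}

static void toy_exp_handler(void)
{
	if (nesting)
		exp_pending = 1;  /* cheap: no forced preemption needed */
	else
		qs_reported = 1;  /* not in a critical section: report now */
}
```

The point of the sketch is that the IPI handler itself stays cheap, and
the cost of reporting is paid only on the outermost unlock, keeping the
common-case scheduler fastpath untouched.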

Again because this is preemptible RCU, an RCU read-side critical section
can be preempted, in which case the preempted task continues to block
the expedited grace period until it resumes and reaches its outermost
``rcu_read_unlock()``.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| Why not just directly check the state of all                          |
| the CPUs? After all, that would avoid all those real-time-unfriendly  |
| IPIs.                                                                 |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Because we want the RCU read-side critical sections to run fast,      |
| which means that no memory barriers or atomic instructions may be     |
| used, which in turn means that the CPU's state cannot be sampled      |
| safely from some other CPU. And even if it could be, such remote      |
| testing would not help the worst-case latency that real-time          |
| applications require.                                                 |
|                                                                       |
| One way to prevent your real-time application from getting hit with   |
| these IPIs is to build your kernel with ``CONFIG_NO_HZ_FULL=y``.      |
+-----------------------------------------------------------------------+

RCU-sched Expedited Grace Periods
---------------------------------

``CONFIG_PREEMPT=n`` kernels implement RCU-sched. The overall flow of
the handling of a given CPU by an RCU-sched expedited grace period is
shown in the following diagram:

.. kernel-figure:: ExpSchedFlow.svg

As with RCU-preempt, RCU-sched's ``synchronize_rcu_expedited()`` ignores
offline and idle CPUs, again because they are in remotely detectable
quiescent states. However, because ``rcu_read_lock_sched()`` and
``rcu_read_unlock_sched()`` leave no trace of their invocation, in
general it is not possible to tell whether or not the current CPU is in
an RCU read-side critical section. The best that RCU-sched's
``rcu_exp_handler()`` can do is to check for idle, on the off-chance
that the CPU went idle while the IPI was in flight. If the CPU is idle,
then ``rcu_exp_handler()`` reports the quiescent state.

Expedited Grace Period and CPU Hotplug
--------------------------------------

The expedited grace-period machinery must decide which CPUs to IPI:
attempting to IPI offline CPUs would result
in splats, but failing to IPI online CPUs can result in too-short grace
periods, which in turn can result in memory corruption. Expedited grace
periods therefore track the set of online CPUs as follows:

#. The number of CPUs that have ever been online is tracked by the
   ``rcu_state`` structure's ``->ncpus`` field. The ``rcu_state``
   structure's ``->ncpus_snap`` field tracks the number of CPUs that
   had been noted as of the beginning of the most recent expedited
   grace period.
#. Each CPU coming online for the first time sets its bit in its leaf
   ``rcu_node`` structure's ``->expmaskinitnext`` field. The
   ``rcu_node`` structure's ``->expmaskinit`` field tracks the
   CPUs that had come online as of the most recent expedited grace
   period.
#. At the beginning of each expedited grace period, the
   ``rcu_state`` structure's ``->ncpus`` and ``->ncpus_snap`` fields are
   compared. If they differ,
   that is, when the ``rcu_node`` structure's ``->expmaskinitnext``
   fields have accumulated new bits, each ``rcu_node`` structure updates
   its ``->expmaskinit`` field from its ``->expmaskinitnext`` field.
#. Each ``rcu_node`` structure's ``->expmaskinit`` field is used to
   initialize that structure's ``->expmask`` at the beginning of each
   expedited grace period.
#. CPUs going offline clear their bits in their leaf ``rcu_node``
   structure's ``->qsmaskinitnext`` field, so any CPU with that bit
   clear can be ignored by the expedited grace period.
#. For each non-idle CPU that RCU believes is currently online, the
   expedited grace period sends an IPI. If the CPU has gone offline in
   the meantime, the IPI will fail, in which case the expedited grace
   period waits for the
   concurrent CPU-hotplug operation to complete.
#. In the case of RCU-sched, one of the last acts of an outgoing CPU is
   to report a quiescent state on behalf of
   that CPU. However, this is likely paranoia-induced redundancy.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| Why all the dancing around with multiple masks and counters? Why      |
| not just use a single mask tracking the online CPUs?                  |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| A single mask certainly *sounds* simpler, at least until you try      |
| working out all of the possible race conditions                       |
| between grace-period initialization and CPU-hotplug operations. For   |
| example, suppose that an expedited grace period is starting while a   |
| CPU-offline operation is progressing up the tree. This situation can  |
| leave the masks at the various levels of the tree inconsistent, which |
| will result in grace-period hangs. In short, that way lies madness,   |
| to say nothing of a great many bugs, hangs, and deadlocks.            |
|                                                                       |
| In contrast, the current multi-mask multi-counter scheme ensures that |
| grace-period initialization will always see consistent masks up and   |
| down the tree, which is not possible with the                         |
| single-mask method.                                                   |
|                                                                       |
| This is a form of `lazy                                               |
| synchronization <http://www.cs.columbia.edu/~library/TR-repository/re |
| ports/reports-1992/cucs-039-92.ps.gz>`__.                             |
| Lazily recording CPU-hotplug events at the beginning of the next      |
| grace period greatly simplifies maintenance of the CPU-tracking       |
| bitmasks.                                                             |
+-----------------------------------------------------------------------+
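
The lazy propagation described in the list above can be sketched in
userspace C. The flat ``exp_state`` structure and the function names
below are illustrative assumptions; the kernel spreads these fields
across the ``rcu_state`` and ``rcu_node`` structures:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flat model of lazy CPU-hotplug tracking; not the
 * kernel's actual layout. */
struct exp_state {
	int ncpus;                /* CPUs that have ever been online */
	int ncpus_snap;           /* ->ncpus as of the last expedited GP */
	uint64_t expmaskinitnext; /* bits set as CPUs come online */
	uint64_t expmaskinit;     /* snapshot used to start each GP */
	uint64_t expmask;         /* CPUs the current GP must wait on */
};

/* A CPU coming online touches only the "next" state, so no tight
 * synchronization with grace-period initialization is needed. */
static void cpu_online(struct exp_state *es, int cpu)
{
	es->expmaskinitnext |= 1ULL << cpu;
	es->ncpus++;
}

/* Expedited GP start: note accumulated hotplug events lazily. */
static void exp_gp_start(struct exp_state *es)
{
	if (es->ncpus != es->ncpus_snap) {
		es->expmaskinit = es->expmaskinitnext;
		es->ncpus_snap = es->ncpus;
	}
	es->expmask = es->expmaskinit;
}
```

A CPU that comes online after a grace period has started is simply not
part of that grace period's ``->expmask``, which is safe because such a
CPU cannot have pre-existing readers.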

Expedited Grace Period Refinements
----------------------------------

Idle-CPU Checks
~~~~~~~~~~~~~~~

Each expedited grace period checks the dyntick-idle state of each CPU,
and avoids sending IPIs to CPUs that are idle.

For RCU-sched, there is an additional check: If the IPI has interrupted
the idle loop, then ``rcu_exp_handler()`` reports the quiescent state
immediately.

For RCU-preempt, there is no specific check for idle in the IPI handler
(``rcu_exp_handler()``), but because RCU read-side critical sections are
not permitted within the idle loop, if
the CPU is within an RCU read-side critical section, the CPU cannot
possibly be idle.

Batching via Sequence Counter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If each grace-period request were carried out separately, expedited
grace periods would have abysmal scalability and problematic high-load
characteristics. Because each grace-period operation can serve an
unlimited number of updates, it is important that a single expedited
grace-period operation be able to cover all requests in the
corresponding batch.

This batching is controlled by a sequence counter named
``->expedited_sequence`` in the ``rcu_state`` structure. This counter
has an odd value when there is an expedited grace period in progress
and an even value otherwise, so that dividing the counter value by two
gives the number of completed grace periods.

Funnel Locking and Wait/Wakeup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A single expedited
grace-period operation can serve many requests, which means there must
be an efficient way to identify which of many concurrent tasks will
actually initiate the grace period, and an efficient way for the
remaining tasks to wait for it to complete. This is the job of the
funnel lock, which operates over the ``rcu_node`` tree: each task works
its way from a leaf ``rcu_node`` structure toward the root. If the
task's desired sequence number is not already recorded at a given
level, the task at that ``rcu_node``
structure records its desired grace-period sequence number in the
``->exp_seq_rq`` field and moves up to the next level in the tree.
Otherwise, if the ``->exp_seq_rq`` field already contains the sequence
number, some earlier task will initiate the grace period, so this task
blocks on one of four wait queues in the ``->exp_wq[]`` array, using the
second-from-bottom and third-from-bottom bits as an index. An
``->exp_lock`` field in the ``rcu_node`` structure synchronizes access
to these fields.

The following diagrams illustrate this process, with the
white cells representing the ``->exp_seq_rq`` field and the red cells
representing the elements of the ``->exp_wq[]`` array.

.. kernel-figure:: Funnel0.svg

Suppose that Tasks A and B each attempt to start an expedited grace
period at a time when the ``rcu_state`` structure's
``->expedited_sequence`` field is zero, so adding three and clearing the
bottom bit results in the value two, which both tasks attempt to record
in the ``->exp_seq_rq`` field of their respective ``rcu_node``
structures:

.. kernel-figure:: Funnel1.svg

Suppose that Task A wins, recording its desired grace-period sequence
number and proceeding up the tree:

.. kernel-figure:: Funnel2.svg

Task B, seeing that its desired
sequence number is already recorded, blocks on ``->exp_wq[1]``.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| Why ``->exp_wq[1]``? Given that the value of these tasks' desired     |
| grace-period sequence number is two, why not instead block on         |
| ``->exp_wq[2]``?                                                      |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Recall that the bottom bit of the desired sequence number indicates   |
| whether or not a grace period is currently in progress. It is         |
| therefore necessary to shift the sequence number right one bit        |
| position to obtain the number of the grace period, which results in   |
| ``->exp_wq[1]``.                                                      |
+-----------------------------------------------------------------------+
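
The arithmetic above can be checked with a short sketch. The helper
names below are our own illustrative choices, not the kernel's:

```c
#include <assert.h>

/* Illustrative helpers for the expedited sequence counter described
 * above: the bottom bit indicates a grace period in progress. */

/* Sequence value at which all pre-existing readers are known done:
 * "adding three and clearing the bottom bit". */
static unsigned long exp_seq_snap(unsigned long s)
{
	return (s + 3) & ~1UL;
}

/* Wait-queue index: the second-from-bottom and third-from-bottom
 * bits of the snapshotted sequence number. */
static int exp_wq_index(unsigned long snap)
{
	return (snap >> 1) & 0x3;
}
```

With the counter at zero, a requester's snapshot is two and it waits on
``->exp_wq[1]``; with the counter odd (a grace period in flight), the
snapshot skips ahead to the end of the *next* grace period.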

Tasks C and D also attempt to start an expedited grace period, compute
the same
desired grace-period sequence number, and see that both leaf
``rcu_node`` structures already have that value recorded. They
therefore block on their respective ``rcu_node`` structures'
``->exp_wq[1]`` fields, as shown below:

.. kernel-figure:: Funnel3.svg

Task A now acquires the ``rcu_state`` structure's ``->exp_mutex`` and
initiates the grace period, which increments ``->expedited_sequence``.
Therefore, if Tasks E and F arrive at this point, they will compute a
desired grace-period sequence number of four:

.. kernel-figure:: Funnel4.svg

Tasks E and F duly record the value four in their respective
``rcu_node`` structures' ``->exp_seq_rq`` fields, resulting in the
following state:

.. kernel-figure:: Funnel5.svg

Once the first grace period has completed, Task A again increments
``->expedited_sequence``, acquires the ``->exp_wake_mutex`` and then
releases the ``->exp_mutex``. This results in the following state:

.. kernel-figure:: Funnel6.svg

Task E can then acquire ``->exp_mutex`` and increment
``->expedited_sequence`` to the value three. If new tasks G and H arrive
at this point, they will compute a desired grace-period sequence number
of six and block on ``->exp_wq[3]``:

.. kernel-figure:: Funnel7.svg

Once Task E's grace period completes, Task E awakens the tasks blocked
on the ``->exp_wq`` waitqueues, resulting in the following state:

.. kernel-figure:: Funnel8.svg

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| What happens if Task E's grace period completes before Task A has     |
| finished its wakeups?                                                 |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Then Task E will block on the ``->exp_wake_mutex``, which will also   |
| prevent it from releasing ``->exp_mutex``, which in turn will prevent |
| the next grace period from starting. This last is important in        |
| preventing overflow of the ``->exp_wq[]`` array.                      |
+-----------------------------------------------------------------------+

Use of Workqueues
~~~~~~~~~~~~~~~~~

In earlier implementations, the task requesting the expedited grace
period also drove it to completion. This straightforward approach had
the disadvantage of needing to account for POSIX signals sent to user
tasks, so more recent implementations use the Linux kernel's
`workqueues <https://www.kernel.org/doc/Documentation/core-api/workqueue.rst>`__.

The requesting task still does counter snapshotting and funnel-lock
processing, but the task reaching the top of the funnel lock does a
``schedule_work()`` invocation so that a
workqueue kthread does the actual grace-period processing. Because
workqueue kthreads do not accept POSIX signals, grace-period-wait
processing need not allow for POSIX signals. In addition, this approach
allows wakeups for the previous expedited grace period to be overlapped
with processing for the next expedited grace period. Because there are
only four sets of waitqueues, it is necessary to ensure that the
previous grace period's wakeups complete before the next grace period's
wakeups start. This is handled by having the ``->exp_mutex`` guard
expedited grace-period processing and the ``->exp_wake_mutex`` guard
wakeups. The key point is that the ``->exp_mutex`` is not released until
the first wakeup is complete, which means that the ``->exp_wake_mutex``
has already been acquired at that point.

Stall Warnings
~~~~~~~~~~~~~~

The expedited grace-period machinery detects stalled CPUs on its own.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But why not just let the normal grace-period machinery detect the     |
| stalls, given that it is already in place and doing this for normal   |
| grace periods?                                                        |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Because it is quite possible that at a given time there is no         |
| normal grace period in progress, in which case the normal             |
| grace-period machinery cannot detect the stall.                       |
+-----------------------------------------------------------------------+

The expedited stall-warning timeout is derived from the normal
RCU CPU stall-warning time. If this time is exceeded, any CPUs or
``rcu_node`` structures blocking the current grace period are printed.

Mid-boot operation
~~~~~~~~~~~~~~~~~~

The use of workqueues has the advantage that the expedited grace-period
code need not worry about POSIX signals. Unfortunately, it has the
corresponding disadvantage that workqueues cannot be used until they are
initialized, which does not happen until some time after the scheduler
spawns the first task. Given that there are parts of the boot-time
process that
really do want to execute grace periods during this mid-boot “dead
zone”, expedited grace periods must do something else during this time.

What they do is to fall back to the old practice of requiring that the
requesting task
drive the grace period during the mid-boot dead zone. Before mid-boot, a
synchronous grace period is a no-op. Some time after mid-boot,
workqueues are used.

Non-expedited non-SRCU synchronous grace periods must also operate
normally during mid-boot. This is handled by causing non-expedited grace
periods to take the expedited code path during mid-boot.
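
The mid-boot fallback described above amounts to a three-way choice,
sketched here in plain C. The enums and ``pick_gp_driver()`` are purely
illustrative; the kernel encodes this decision differently:

```c
#include <assert.h>

/* Hypothetical sketch (names are ours, not the kernel's): how the
 * driver of a synchronous grace period is chosen by boot phase. */
enum boot_phase { EARLY_BOOT, MID_BOOT_DEAD_ZONE, FULLY_BOOTED };
enum gp_driver {
	GP_NOOP,             /* only one task exists: nothing to wait for */
	GP_REQUESTER_DRIVEN, /* workqueues not yet usable */
	GP_WORKQUEUE_DRIVEN  /* normal operation */
};

static enum gp_driver pick_gp_driver(enum boot_phase phase)
{
	switch (phase) {
	case EARLY_BOOT:
		return GP_NOOP;
	case MID_BOOT_DEAD_ZONE:
		return GP_REQUESTER_DRIVEN;
	default:
		return GP_WORKQUEUE_DRIVEN;
	}
}
```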

The current code assumes that there are no POSIX signals during the
mid-boot dead zone. However, if an overwhelming need for POSIX signals
somehow arises, appropriate adjustments can be made to the expedited
stall-warning code. One such adjustment would reinstate the
pre-workqueue stall-warning checks, but only during the mid-boot dead
zone.

Summary
~~~~~~~

Expedited grace periods use a sequence-number approach to promote
batching, so that a single grace-period operation can serve numerous
requests. A funnel lock is used to efficiently identify the one task
out of a concurrent group that will request the grace period on behalf
of the entire group, with the sequence number tracked in the
``rcu_state``
structure. The actual grace-period processing is carried out by a
workqueue kthread.

CPU-hotplug operations are noted lazily in order to prevent the need for
tight synchronization between expedited grace periods and CPU-hotplug
operations. The dyntick-idle counters are used to avoid sending IPIs to
idle CPUs, at least in the common case. RCU-preempt and RCU-sched use
different IPI handlers, reflecting the different ways in which the two
flavors detect quiescent states, but otherwise share the expedited grace
period's processing.

Expedited grace periods carry out their processing
reasonably efficiently. However, for non-time-critical tasks, normal
grace periods should be used instead, because their longer duration
permits much higher degrees of batching, and thus much lower per-request
overheads.