Lines Matching +full:pre +full:- +full:its
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
5 <meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
16 <li> <a href="#Data-Structure Relationships">
17 Data-Structure Relationships</a>
30 <li> <a href="#RCU-Specific Fields in the task_struct Structure">
31 RCU-Specific Fields in the <tt>task_struct</tt> Structure</a>
36 <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
38 <p>RCU is for all intents and purposes a large state machine, and its
57 which results in a three-level <tt>rcu_node</tt> tree.
61 </p><p>The purpose of this combining tree is to allow per-CPU events
62 such as quiescent states, dyntick-idle transitions,
65 Quiescent states are recorded by the per-CPU <tt>rcu_data</tt> structures,
66 and other events are recorded by the leaf-level <tt>rcu_node</tt>
77 </p><p>As can be seen from the diagram, on a 64-bit system
78 a two-level tree with 64 leaves can accommodate 1,024 CPUs, with a fanout
89 Because there are more types of events that affect the leaf-level
92 64, the contention on these structures' <tt>->structures</tt>
100 that the fanout for the non-leaf <tt>rcu_node</tt> structures
105 contention problems on the non-leaf <tt>rcu_node</tt> structures,
107 parameter to reduce the non-leaf fanout as needed.
120 a 32-bit system), then RCU will automatically add more levels to the
122 For example, if you are crazy enough to build a 64-bit system with 65,536
127 </p><p>RCU currently permits up to a four-level tree, which on a 64-bit system
129 32-bit systems.
134 </p><p>This multi-level combining tree allows us to get most of the
136 benefits of partitioning, even though RCU grace-period detection is
141 This means that at the leaf-level <tt>rcu_node</tt> structure, only
144 more extreme: Only one access out of sixty-four will progress up
148 No matter how many CPUs there are in the system, at most 64 quiescent-state
167 turns off the scheduling-clock interrupts on idle CPUs, which in
170 CPUs whose scheduling-clock interrupts have been turned off are
171 said to be in <i>dyntick-idle mode</i>.
172 RCU must handle dyntick-idle CPUs specially
180 </p><p>However, if a CPU is in dyntick-idle mode, it is in that mode
208 its own synchronization:
227 read-side performance.
240 tracks grace periods, serves as short-term repository
241 for callbacks orphaned by CPU-hotplug events,
243 tracks expedited grace-period state,
247 tree that propagates quiescent-state
249 grace-period information from the root to the leaves.
250 It provides local copies of the grace-period state in order
256 RCU read-side critical section.
259 per-<tt>rcu_node</tt> priority-boosting
261 Finally, it records CPU-hotplug state in order to determine
263 <li> <tt>rcu_data</tt>: This per-CPU structure is the
264 focus of quiescent-state detection and RCU callback queuing.
265 It also tracks its relationship to the corresponding leaf
266 <tt>rcu_node</tt> structure to allow more-efficient
270 copy of the grace-period information to allow for-free
273 Finally, this structure records past dyntick-idle state
276 This per-CPU structure tracks the current dyntick-idle
284 within the RCU-protected data structure.
301 synchronize with CPU-hotplug events,
315 <pre>
319 </pre>
345 <tt>->node[]</tt> array as shown in the following figure:
350 breadth-first traversal of the tree is implemented as a simple
355 </p><p>Each entry of the <tt>->level</tt> array references
366 </p><p>For whatever it is worth, if you draw the tree to be tree-shaped
367 rather than array-shaped, it is easy to draw a planar representation:
371 </p><p>Finally, the <tt>->rda</tt> field references a per-CPU
377 <h5>Grace-Period Tracking</h5>
382 <pre>
384 </pre>
387 the <tt>->gp_seq</tt> field contains the current grace-period
391 In other words, if the bottom two bits of <tt>->gp_seq</tt> are
395 <tt>->lock</tt> field.
397 </p><p>There are <tt>->gp_seq</tt> fields
412 <pre>
416 </pre>
418 <p>The <tt>->gp_max</tt> field tracks the duration of the longest
420 It is protected by the root <tt>rcu_node</tt>'s <tt>->lock</tt>.
422 <p>The <tt>->name</tt> field points to the name of the RCU flavor
424 The <tt>->abbr</tt> field contains a one-character abbreviation,
425 for example, “s” for RCU-sched.
431 tree that propagates quiescent-state
433 grace-period information from the root down to the leaves.
434 They provides local copies of the grace-period state in order
440 RCU read-side critical section.
443 per-<tt>rcu_node</tt> priority-boosting
445 Finally, they record CPU-hotplug state in order to determine
456 <pre>
463 </pre>
465 <p>The <tt>->parent</tt> pointer references the <tt>rcu_node</tt>
470 The <tt>->level</tt> field gives the level in the tree, with
471 the root being at level zero, its children at level one, and so on.
472 The <tt>->grpnum</tt> field gives this node's position within
473 the children of its parent, so this number can range between 0 and 31
474 on 32-bit systems and between 0 and 63 on 64-bit systems.
475 The <tt>->level</tt> and <tt>->grpnum</tt> fields are
477 The <tt>->grpmask</tt> field is the bitmask counterpart of
478 <tt>->grpnum</tt>, and therefore always has exactly one bit set.
480 structure in its parent's bitmasks, which are described later.
481 Finally, the <tt>->grplo</tt> and <tt>->grphi</tt> fields
493 <pre>
495 </pre>
504 <h5>Grace-Period Tracking</h5>
509 <pre>
512 </pre>
514 <p>The <tt>rcu_node</tt> structures' <tt>->gp_seq</tt> fields are
520 <tt>->gp_seq</tt> field is zero, then this <tt>rcu_node</tt>
526 <p>The <tt>->gp_seq_needed</tt> fields record the
527 furthest-in-the-future grace period request seen by the corresponding
529 the value of the <tt>->gp_seq</tt> field equals or exceeds that of
530 the <tt>->gp_seq_needed</tt> field.
538 Won't wrapping of the <tt>->gp_seq</tt> field cause
543 No, because if the <tt>->gp_seq_needed</tt> field lags behind the
544 <tt>->gp_seq</tt> field, the <tt>->gp_seq_needed</tt> field
546 Modulo-arithmetic comparisons therefore will always get the
552 <h5>Quiescent-State Tracking</h5>
560 <pre>
565 </pre>
567 <p>The <tt>->qsmask</tt> field tracks which of this
574 Similarly, the <tt>->expmask</tt> field tracks which
580 much lower grace-period latency, for example, consuming a few
581 tens of microseconds worth of CPU time to reduce grace-period
583 The <tt>->qsmaskinit</tt> field tracks which of this
586 This mask is used to initialize <tt>->qsmask</tt>,
587 and <tt>->expmaskinit</tt> is used to initialize
588 <tt>->expmask</tt> and the beginning of the
600 Lockless grace-period computation! Such a tantalizing possibility!
607 <li> <font color="ffffff">CPU 0 has been in dyntick-idle
620 middle of an RCU read-side critical section, and notices
627 to report its quiescent state up the <tt>rcu_node</tt>
634 <li> <font color="ffffff">CPU 0 now reports its quiescent
637 That grace period might now end before the RCU read-side
645 grace-period sequence number in <tt>->gp_seq</tt>.
650 <h5>Blocked-Task Management</h5>
653 midst of their RCU read-side critical sections, and these tasks
656 in a separate article on RCU read-side processing.
660 <pre>
665 </pre>
667 <p>The <tt>->blkd_tasks</tt> field is a list header for
669 As tasks undergo context switches within RCU read-side critical
671 (via the <tt>task_struct</tt>'s <tt>->rcu_node_entry</tt>
672 field) onto the head of the <tt>->blkd_tasks</tt> list for the
675 As these tasks later exit their RCU read-side critical sections,
682 That pointer is stored in <tt>->gp_tasks</tt> for normal
683 grace periods and in <tt>->exp_tasks</tt> for expedited
689 removes itself from the <tt>->blkd_tasks</tt> list,
695 all hard-affinitied to the largest-numbered CPU in the system.
696 Then if task T1 blocked in an RCU read-side
698 then task T2 blocked in an RCU read-side critical section,
700 in an RCU read-side critical section, then the state of the
701 last leaf <tt>rcu_node</tt> structure's blocked-task list
712 <tt>rcu_read_unlock()</tt> that ends their RCU read-side critical
716 The <tt>->wait_blkd_tasks</tt> field indicates whether or not
722 C-preprocessor expressions as follows:
724 <pre>
791 </pre>
794 is currently limited to four, as specified by lines 21-24
796 For 32-bit systems, this allows 16*32*32*32=524,288 CPUs, which
798 For 64-bit systems, 16*64*64*64=4,194,304 CPUs is allowed, which
800 This four-level tree also allows kernels built with
807 tree permits better testing of RCU's combining-tree code.
810 are permitted at each non-leaf level of the <tt>rcu_node</tt> tree.
819 the <tt>->qsmask</tt> field on a 64-bit system, results in
821 <tt>->lock</tt> fields.
827 Lines 11-19 perform this computation.
829 </p><p>Lines 21-24 compute the maximum number of CPUs supported by
830 a single-level (which contains a single <tt>rcu_node</tt> structure),
831 two-level, three-level, and four-level <tt>rcu_node</tt> tree,
839 C-preprocessor variables, respectively.
841 </p><p>These variables are used to control the C-preprocessor <tt>#if</tt>
842 statement spanning lines 26-66 that computes the number of
846 C-preprocessor variable by lines 27, 35, 44, and 54.
854 46-47, and 56-58.
855 Lines 31-33, 40-42, 50-52, and 62-63 create initializers
856 for lockdep lock-class names.
857 Finally, lines 64-66 produce an error if the maximum number of
866 <pre>
880 </pre>
891 grace period is current, hence the <tt>->gp_seq</tt> field.
899 The <tt>->head</tt> pointer references the first callback or
902 Each element of the <tt>->tails[]</tt> array references the
903 <tt>->next</tt> pointer of the last callback in the corresponding
904 segment of the list, or the list's <tt>->head</tt> pointer if
907 not empty, then the array element is identical to its predecessor.
910 This relationship between the <tt>->head</tt> pointer, the
911 <tt>->tails[]</tt> array, and the callbacks is shown in this
916 </p><p>In this figure, the <tt>->head</tt> pointer references the
919 The <tt>->tails[RCU_DONE_TAIL]</tt> array element references
920 the <tt>->head</tt> pointer itself, indicating that none
922 The <tt>->tails[RCU_WAIT_TAIL]</tt> array element references callback
923 CB 2's <tt>->next</tt> pointer, which indicates that
927 The <tt>->tails[RCU_NEXT_READY_TAIL]</tt> array element
928 references the same RCU callback that <tt>->tails[RCU_WAIT_TAIL]</tt>
931 The <tt>->tails[RCU_NEXT_TAIL]</tt> array element references
932 CB 4's <tt>->next</tt> pointer, indicating that all the
935 Note that the <tt>->tails[RCU_NEXT_TAIL]</tt> array element
936 always references the last RCU callback's <tt>->next</tt> pointer
938 the <tt>->head</tt> pointer.
942 <tt>->tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt>
953 </p><p>The <tt>->gp_seq[]</tt> array records grace-period
962 </p><p>The <tt>->len</tt> counter contains the number of
963 callbacks in <tt>->head</tt>, and the
964 <tt>->len_lazy</tt> contains the number of those callbacks that
968 <p><b>Important note</b>: It is the <tt>->len</tt> field that
970 this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>->head</tt>
972 The reason for this is that all the ready-to-invoke callbacks
974 all at once at callback-invocation time.
976 high-priority process just woke up on this CPU, then the remaining
978 Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts
980 again it is the <tt>->len</tt> count that accurately reflects whether
983 Of course, off-CPU sampling of the <tt>->len</tt> count requires
991 <p>The <tt>rcu_data</tt> maintains the per-CPU state for the
996 focus of quiescent-state detection and RCU callback queuing.
997 It also tracks its relationship to the corresponding leaf
998 <tt>rcu_node</tt> structure to allow more-efficient
1002 copy of the grace-period information to allow for-free
1005 Finally, this structure records past dyntick-idle state
1016 <pre>
1023 </pre>
1025 <p>The <tt>->cpu</tt> field contains the number of the
1026 corresponding CPU, the <tt>->rsp</tt> pointer references
1029 and the <tt>->mynode</tt> field references the corresponding
1031 The <tt>->mynode</tt> is used to propagate quiescent states
1033 <p>The <tt>->dynticks</tt> pointer references the
1036 Recall that a single per-CPU instance of the <tt>rcu_dynticks</tt>
1041 </p><p>The <tt>->grpmask</tt> field indicates the bit in
1042 the <tt>->mynode->qsmask</tt> corresponding to this
1045 The <tt>->beenonline</tt> flag is set whenever the corresponding
1049 <h5>Quiescent-State and Grace-Period Tracking</h5>
1054 <pre>
1061 </pre>
1063 <p>The <tt>->gp_seq</tt> and <tt>->gp_seq_needed</tt>
1069 arbitrarily far behind for CPUs in dyntick-idle mode (but these counters
1070 will catch up upon exit from dyntick-idle mode).
1072 <tt>->gp_seq</tt> are zero, then this <tt>rcu_data</tt>
1089 to carefully manage the numbers on a per-node basis.
1097 <p>The <tt>->cpu_no_qs</tt> flag indicates that the
1099 while the <tt>->core_needs_qs</tt> flag indicates that the
1101 The <tt>->gpwrap</tt> field indicates that the corresponding
1104 will cause the CPU to disregard the values of its counters on
1105 its next exit from idle.
1113 <p>In the absence of CPU-hotplug events, RCU callbacks are invoked by
1115 This is strictly a cache-locality optimization: callbacks can and
1124 <pre>
1133 </pre>
1135 <p>The <tt>->cblist</tt> structure is the segmented callback list
1137 The CPU advances the callbacks in its <tt>rcu_data</tt> structure
1140 that the value of its <tt>rcu_data</tt> structure's
1141 <tt>->gp_seq</tt> field differs from that of its leaf
1144 <tt>->gp_seq</tt> field is updated at the beginnings and ends of each
1148 The <tt>->qlen_last_fqs_check</tt> and
1149 <tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent
1153 </p><p>The <tt>->n_cbs_invoked</tt>,
1154 <tt>->n_cbs_orphaned</tt>, and <tt>->n_cbs_adopted</tt>
1158 The <tt>->n_nocbs_invoked</tt> is used when the CPU's callbacks
1162 Finally, the <tt>->blimit</tt> counter is the maximum number of
1165 <h5>Dyntick-Idle Handling</h5>
1170 <pre>
1173 </pre>
1175 The <tt>->dynticks_snap</tt> field is used to take a snapshot
1176 of the corresponding CPU's dyntick-idle state when forcing
1178 Finally, the <tt>->dynticks_fqs</tt> field is used to
1180 dyntick-idle state, and is used for tracing and debugging purposes.
1185 <p>The <tt>rcu_dynticks</tt> maintains the per-CPU dyntick-idle state
1191 Its fields are as follows:
1193 <pre>
1200 </pre>
1202 <p>The <tt>->dynticks_nesting</tt> field counts the
1205 NMIs, irqs, and tracers are counted by the <tt>->dynticks_nmi_nesting</tt>
1211 <tt>->dynticks_nmi_nesting</tt> value of nine.
1213 of reasons why this CPU cannot be permitted to enter dyntick-idle
1214 mode, aside from process-level transitions.
1216 <p>However, it turns out that when running in non-idle kernel context,
1219 Therefore, whenever the <tt>->dynticks_nesting</tt> field is
1220 incremented up from zero, the <tt>->dynticks_nmi_nesting</tt> field
1222 <tt>->dynticks_nesting</tt> field is decremented down to zero,
1223 the the <tt>->dynticks_nmi_nesting</tt> field is set to zero.
1226 <tt>->dynticks_nmi_nesting</tt> field every time the corresponding
1229 </p><p>The <tt>->dynticks</tt> field counts the corresponding
1230 CPU's transitions to and from dyntick-idle mode, so that this counter
1231 has an even value when the CPU is in dyntick-idle mode and an odd
1234 </p><p>The <tt>->rcu_need_heavy_qs</tt> field is used
1237 it is willing to call for heavy-weight dyntick-counter operations.
1238 This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
1241 </p><p>The <tt>->rcu_qs_ctr</tt> field is used to record
1244 must be quite lightweight, as in a non-atomic increment of this
1245 per-CPU field.
1247 </p><p>Finally, the <tt>->rcu_urgent_qs</tt> field is used to record
1251 This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
1252 code, which, if nothing else, non-atomically increment <tt>->rcu_qs_ctr</tt>
1259 Why not simply combine the <tt>->dynticks_nesting</tt>
1260 and <tt>->dynticks_nmi_nesting</tt> counters into a
1262 the corresponding CPU is non-idle?
1268 from a made-up interrupt.
1273 <p>Additional fields are present for some special-purpose
1280 These structures are normally embedded within RCU-protected data
1287 <pre>
1290 </pre>
1292 <p>The <tt>->next</tt> field is used
1295 The <tt>->func</tt> field is a pointer to the function
1299 However, <tt>kfree_rcu()</tt> uses the <tt>->func</tt>
1301 structure within the enclosing RCU-protected data structure.
1311 Given that the callback function <tt>->func</tt>
1314 enclosing RCU-protected data structure?
1319 type of RCU-protected data structure.
1321 macro in the Linux kernel (or other pointer-manipulation facilities
1328 <h3><a name="RCU-Specific Fields in the task_struct Structure">
1329 RCU-Specific Fields in the <tt>task_struct</tt> Structure</a></h3>
1334 <pre>
1347 </pre>
1349 <p>The <tt>->rcu_read_lock_nesting</tt> field records the
1350 nesting level for RCU read-side critical sections, and
1351 the <tt>->rcu_read_unlock_special</tt> field is a bitmask
1354 The <tt>->rcu_node_entry</tt> field is used to form lists of
1355 tasks that have blocked within preemptible-RCU read-side critical
1356 sections and the <tt>->rcu_blocked_node</tt> field references
1358 or <tt>NULL</tt> if it is not blocked within a preemptible-RCU
1359 read-side critical section.
1361 <p>The <tt>->rcu_tasks_nvcsw</tt> field tracks the number of
1363 beginning of the current tasks-RCU grace period,
1364 <tt>->rcu_tasks_holdout</tt> is set if the current tasks-RCU
1365 grace period is waiting on this task, <tt>->rcu_tasks_holdout_list</tt>
1367 and <tt>->rcu_tasks_idle_cpu</tt> tracks which CPU this
1379 <pre>
1382 3 return &rsp->node[0];
1386 7 for ((rnp) = &(rsp)->node[0]; \
1387 8 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
1390 11 for ((rnp) = &(rsp)->node[0]; \
1391 12 (rnp) < (rsp)->level[NUM_RCU_LVLS - 1]; (rnp)++)
1394 15 for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
1395 16 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
1396 </pre>
1400 <tt>->node[]</tt> array, which is the root <tt>rcu_node</tt>
1406 <tt>->node[]</tt> array, performing a breadth-first traversal by
1425 In the single-node case,
1426 <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> is a no-op
1438 Finally, in <tt>CONFIG_NO_HZ_IDLE</tt> kernels, each CPU's dyntick-idle
1449 for helping me get this document into a more human-readable state.