
The direct use of 'struct static_key' and of the static_key_true()/
static_key_false() tests is deprecated. The updated API replacements are::

	DEFINE_STATIC_KEY_TRUE(key);
	DEFINE_STATIC_KEY_FALSE(key);
	static_branch_likely()
	static_branch_unlikely()
Abstract
========

Static keys allow the inclusion of seldom used features in
performance-sensitive fast-path kernel code, via a GCC feature and a code
patching technique. A quick example::
	DEFINE_STATIC_KEY_FALSE(key);

	...

	if (static_branch_unlikely(&key))
		do unlikely code
	else
		do likely code

	...

	static_branch_enable(&key);

	...

	static_branch_disable(&key);
The static_branch_unlikely() branch will be generated into the code with as
little impact on the likely code path as possible.
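
To make this concrete, here is a minimal sketch of the pattern above. The
key name, the pr_info() message and the surrounding functions are invented
for illustration; only the static key calls are the real API::

	#include <linux/jump_label.h>
	#include <linux/printk.h>

	/* False by default: the debug path costs only a no-op. */
	static DEFINE_STATIC_KEY_FALSE(net_debug_key);

	void rx_fast_path(unsigned int len)
	{
		if (static_branch_unlikely(&net_debug_key))
			pr_info("rx: len=%u\n", len);	/* rarely enabled */

		/* ... normal fast-path processing ... */
	}

	/* Flip the branch from slow-path control code, e.g. a sysfs store. */
	void net_debug_set(bool on)
	{
		if (on)
			static_branch_enable(&net_debug_key);
		else
			static_branch_disable(&net_debug_key);
	}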
Motivation
==========

Tracepoints are implemented using a conditional branch, and the conditional
check requires reading a global variable for each tracepoint. Although the
overhead of this check is small, it increases when the memory cache comes
under pressure, and tracepoints are often dormant (disabled), providing no
direct kernel functionality. It is therefore highly desirable to reduce
their impact as much as possible. Although tracepoints are the original
motivation for this work, other kernel code paths should be able to make use
of the static keys facility.
Solution
========

gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label:

http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01556.html

Using the 'asm goto', we can create branches that are either taken or not taken
by default, without the need to check memory. Then, at run-time, we can patch
the branch site to change the branch direction.
For example, if we have a simple branch that is disabled by default::

	if (static_branch_unlikely(&key))
		printk("I am the true branch\n");
Thus, by default the 'printk' will not be emitted, and the code generated will
consist of a single atomic 'no-op' instruction (5 bytes on x86) in the
straight-line code path. When the branch is 'flipped', we patch the 'no-op'
in the straight-line code path with a 'jump' instruction to the out-of-line
true branch. Thus, changing branch direction is expensive but branch
selection is basically 'free'. That is the basic tradeoff of this
optimization. This low-level patching mechanism is called 'jump label
patching', and it gives the basis for the static keys facility.
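
Illustratively (hand-written x86 pseudo-assembly, not actual compiler
output), the patching rewrites a single instruction in the hot path::

	# branch disabled (the default for this key):
	nopl   0x0(%rax,%rax,1)         # 5-byte atomic no-op; fall through
	...likely code continues...

	# after static_branch_enable(&key) patches the site:
	jmp    <out-of-line true branch> # 5-byte jmp to the unlikely code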
Static key label API, usage and examples
========================================
In order to make use of this optimization you must first define a key::

	DEFINE_STATIC_KEY_TRUE(key);

or::

	DEFINE_STATIC_KEY_FALSE(key);
The key must be global, that is, it can't be allocated on the stack or
dynamically allocated at run-time.
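
For example (a minimal sketch; 'foo_key' is a made-up name), a key shared
across translation units is declared in a header and defined at file scope
in exactly one .c file::

	/* foo.h */
	DECLARE_STATIC_KEY_FALSE(foo_key);

	/* foo.c -- file scope: never on the stack, never kmalloc()ed */
	DEFINE_STATIC_KEY_FALSE(foo_key);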
The key is then used in code as::

	if (static_branch_unlikely(&key))
		do unlikely code
	else
		do likely code
or::

	if (static_branch_likely(&key))
		do likely code
	else
		do unlikely code

Keys defined via DEFINE_STATIC_KEY_TRUE() or DEFINE_STATIC_KEY_FALSE() may be
used in either static_branch_likely() or static_branch_unlikely() statements,
as the sketch below shows.
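
For instance (a hedged sketch; the key and helper names are invented), a
default-true key typically pairs with static_branch_likely()::

	/* On by default; can be disabled at run-time. */
	static DEFINE_STATIC_KEY_TRUE(fast_path_key);

	static void fast_path(void);	/* hypothetical helpers */
	static void slow_path(void);

	void do_work(void)
	{
		if (static_branch_likely(&fast_path_key))
			fast_path();	/* default: only a no-op of overhead */
		else
			slow_path();	/* taken after static_branch_disable() */
	}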
Branch(es) can be set true via::

	static_branch_enable(&key);

or false via::

	static_branch_disable(&key);
The branch(es) can then be switched via reference counts::

	static_branch_inc(&key);
	...
	static_branch_dec(&key);
Thus, 'static_branch_inc()' means 'make the branch true', and
'static_branch_dec()' means 'make the branch false', with appropriate
reference counting. For example, if the key is initialized true, a
static_branch_dec() will switch the branch to false, and a subsequent
static_branch_inc() will change the branch back to true. Likewise, if the
key is initialized false, a static_branch_inc() will change the branch to
true, and a subsequent static_branch_dec() will again make the branch false.
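
A common use of the counted form (a sketch; the names are invented) is a
feature that stays enabled for as long as at least one user has requested
it::

	static DEFINE_STATIC_KEY_FALSE(accounting_key);

	void accounting_user_register(void)
	{
		/* The first inc flips the branch true; later incs just count. */
		static_branch_inc(&accounting_key);
	}

	void accounting_user_unregister(void)
	{
		/* The branch goes false only when the last user leaves. */
		static_branch_dec(&accounting_key);
	}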
Note that switching branches results in some locks being taken, particularly
the CPU hotplug lock (in order to avoid races against CPUs being brought in
the kernel while the kernel is getting patched). Calling the static key API
from within a hotplug notifier is thus a sure deadlock recipe. In order to
still allow use of the functionality, *_cpuslocked() variants of the above
functions are provided for callers that already hold the hotplug lock.
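
For example (a sketch; the callback name and key are invented), inside a CPU
hotplug callback the hotplug lock is already held, so the _cpuslocked variant
must be used::

	static int foo_cpu_online(unsigned int cpu)
	{
		static_branch_enable_cpuslocked(&foo_key);
		return 0;
	}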
4) Architecture level code patching interface, 'jump labels'
============================================================
There are a few functions and macros that architectures must implement in
order to take advantage of this optimization. If there is no architecture
support, we simply fall back to a traditional, load, test, and jump sequence.
Also, the struct jump_entry table must be at least 4-byte aligned because the
static_key->entry field makes use of the two least significant bits.
* ``__always_inline bool arch_static_branch(struct static_key *key, bool branch)``,
  see: arch/x86/include/asm/jump_label.h

* ``__always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)``,
  see: arch/x86/include/asm/jump_label.h
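
For a feel of what an architecture must supply, here is an abridged sketch
modeled on the x86 arch_static_branch() (macro names and the exact
__jump_table record layout vary by kernel version and architecture)::

	static __always_inline bool arch_static_branch(struct static_key *key,
						       bool branch)
	{
		/* Emit a no-op in the straight-line path and record the
		 * (site, target, key) triple in the __jump_table section,
		 * so the patching code can later rewrite the no-op into a
		 * jmp to l_yes. */
		asm_volatile_goto("1:"
			".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
			".pushsection __jump_table, \"aw\"\n\t"
			_ASM_ALIGN "\n\t"
			_ASM_PTR "1b, %l[l_yes], %c0 + %c1\n\t"
			".popsection\n\t"
			: : "i" (key), "i" (branch) : : l_yes);

		return false;	/* site not patched: branch not taken */
	l_yes:
		return true;	/* site patched to jump here: branch taken */
	}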
5) Static keys / jump label analysis, results (x86_64)
======================================================

As an example, let's add the following branch to 'getppid()', such that the
system call now looks like::

  SYSCALL_DEFINE0(getppid)
  {
        int pid;

  +     if (static_branch_unlikely(&key))
  +             printk("I am the true branch\n");

        rcu_read_lock();
        pid = task_tgid_vnr(rcu_dereference(current->real_parent));
        rcu_read_unlock();

        return pid;
  }
Without the jump label optimization, the function instead begins by loading
and testing the key::

  …f810441f0:	8b 05 8a 52 d8 00	mov    0xd8528a(%rip),%eax	# ffffffff81dc9480 <key>
Thus, the disabled jump label case adds a 'mov', a 'test' and a 'jne'
instruction, whereas the jump label case just has a 'no-op' or a 'jmp 0'.
(The 'jmp 0' is patched to a 5-byte atomic no-op instruction at boot time.)
Thus, the disabled jump label case adds::

  6 (mov) + 2 (test) + 2 (jne) = 10 bytes - 5 (the 5-byte 'jmp 0') = 5 additional bytes.
If we then include the padding bytes, the jump label code saves 16 total bytes
of instruction memory for this small function. In this case the non-jump label
function is 80 bytes long; thus, we have saved 20% of the instruction
footprint. We can in fact improve this even further, since the 5-byte no-op
really can be a 2-byte no-op, as we can reach the branch with a 2-byte jmp.
However, we have not yet implemented optimal no-op sizes (they are currently
hard-coded).
Since there are a number of static key API uses in the scheduler paths,
'pipe-test' (also known as 'perf bench sched pipe') can be used to show the
performance improvement. Testing was done on 3.3.0-rc2:
jump label disabled::

 Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):

        855.700314 task-clock                #    0.534 CPUs utilized            ( +-  0.11% )
           200,003 context-switches          #    0.234 M/sec                    ( +-  0.00% )
                 0 CPU-migrations            #    0.000 M/sec                    ( +- 39.58% )
               487 page-faults               #    0.001 M/sec                    ( +-  0.02% )
     1,474,374,262 cycles                    #    1.723 GHz                      ( +-  0.17% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     1,178,049,567 instructions              #    0.80  insns per cycle          ( +-  0.06% )
       208,368,926 branches                  #  243.507 M/sec                    ( +-  0.06% )
         5,569,188 branch-misses             #    2.67% of all branches          ( +-  0.54% )

       1.601607384 seconds time elapsed                                          ( +-  0.07% )
jump label enabled::

 Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):

        841.043185 task-clock                #    0.533 CPUs utilized            ( +-  0.12% )
           200,004 context-switches          #    0.238 M/sec                    ( +-  0.00% )
                 0 CPU-migrations            #    0.000 M/sec                    ( +- 40.87% )
               487 page-faults               #    0.001 M/sec                    ( +-  0.05% )
     1,432,559,428 cycles                    #    1.703 GHz                      ( +-  0.18% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     1,175,363,994 instructions              #    0.82  insns per cycle          ( +-  0.04% )
       206,859,359 branches                  #  245.956 M/sec                    ( +-  0.04% )
         4,884,119 branch-misses             #    2.36% of all branches          ( +-  0.85% )
The percentage of saved branches is .7%, and we've saved 12% on
'branch-misses'. This is where we would expect to get the most savings, since
this optimization is about reducing the number of branches.