1 .. SPDX-License-Identifier: GPL-2.0
10 coherence of one cache line stored in multiple CPU's caches; then
17 char name[16];
20 Member 'refcount'(A) and 'name'(B) _share_ one cache line like below::
22 +-----------+ +-----------+
24 +-----------+ +-----------+
28 +----------------------+ +----------------------+
30 +----------------------+ +----------------------+
32 ---------------------------+------------------+-----------------------------
34 +----------------------+
36 +----------------------+
38 +----------------------+
40 'refcount' is modified frequently, but 'name' is set once at object
43 and 'name' being read by other CPUs, all those reading CPUs have to
44 reload the whole cache line over and over due to the 'sharing', even
45 though 'name' is never changed.
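The layout problem described above can be made concrete with a small user-space sketch. It is not the kernel's own definition: the struct names ``foo_shared`` and ``foo_split``, the plain ``int`` refcount, and the 64-byte ``CACHE_LINE_SIZE`` are assumptions made for illustration, and the offsets only map onto cache lines if the object itself starts on a cache line boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; real hardware may use larger lines. */
#define CACHE_LINE_SIZE 64

/* Hypothetical layout with the problem: both members fit in one line. */
struct foo_shared {
	int  refcount;   /* modified frequently */
	char name[16];   /* set once, then only read */
};

/* Hypothetical fix: push 'name' to the start of the next cache line. */
struct foo_split {
	int refcount;
	alignas(CACHE_LINE_SIZE) char name[16];
};

/* 'refcount' and 'name' end within the first 64 bytes: they share a line. */
static_assert(offsetof(struct foo_shared, name) +
	      sizeof(((struct foo_shared *)0)->name) <= CACHE_LINE_SIZE,
	      "members expected to share one cache line");

/* After the split, 'name' begins exactly one cache line later. */
static_assert(offsetof(struct foo_split, name) == CACHE_LINE_SIZE,
	      "'name' expected on its own cache line");
```

The cost of the split is size: ``struct foo_split`` grows to two full cache lines, which is why this kind of padding is worth it only for data that is genuinely hot.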
47 There are many real-world cases of performance regressions caused by
49 mm_struct struct, whose cache line layout change triggered a
65 members could be purposely put in the same cache line to make them
75 purposely put in one cache line.
76 * global data being put together in one cache line. Some kernel
78 which can easily be grouped together and put into one cache line.
80 without being noticed (cache line is usually 64 bytes or more),
83 The following 'mitigation' section provides real-world examples.
94 once hotspots are detected, tools like 'perf-c2c' and 'pahole' can
99 perf-c2c can capture the cache lines with most false sharing hits,
100 decoded functions (line number of file) accessing that cache line,
101 and in-line offset of the data. Simple commands are::
103 $ perf c2c record -ag sleep 3
104 $ perf c2c report --call-graph none -k vmlinux
106 When running the above while testing will-it-scale's tlb_flush1 case,
115 #----------------------------------------------------------------------
117 #----------------------------------------------------------------------
124 A nice introduction to perf-c2c is [3]_.
126 'pahole' decodes data structure layouts delimited in cache line
127 granularity. Users can match the offset in perf-c2c output with
137 unnecessary to hyper-optimize every rarely used data structure or
146 * Separate hot global data into its own dedicated cache line, even if it
148 cache line and TLB entries.
150 - Commit 91b6d3256356 ("net: cache align tcp_memory_allocated, tcp_sockets_allocated")
156 - Commit 802f1d522d5f ("mm: page_counter: re-layout structure to reduce false sharing")
159 For example, for a global variable, use compare(read)-then-write instead
170 …- Commit 7b1002f7cfe5 ("bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false shari…
171 - Commit 292648ac5cf1 ("mm: gup: allow FOLL_PIN to scale in SMP")
173 * Turn hot global data into 'per-cpu data + global data' when possible,
174 or reasonably increase the threshold for syncing per-cpu data to
177 - Commit 520f897a3554 ("ext4: use percpu_counters for extent_status cache hits/misses")
178 - Commit 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
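Two of the mitigations listed above, compare(read)-then-write and 'per-cpu data + global data' with a sync threshold, can be sketched in user-space C11. This is only an illustration under stated assumptions: ``hot_flag``, ``global_count``, ``SYNC_BATCH`` and the function names are hypothetical, a thread-local variable stands in for real per-cpu data, and none of this is taken from the commits referenced above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical threshold for folding local counts into the global one. */
#define SYNC_BATCH 32

static atomic_bool hot_flag;            /* hypothetical hot global flag */
static atomic_long global_count;        /* shared total, written rarely */
static _Thread_local long local_count;  /* stand-in for per-cpu data */

/*
 * Compare(read)-then-write: only store when the value actually changes,
 * so readers' copies of the cache line are not invalidated needlessly.
 * Returns true if a store was issued.
 */
bool set_flag_if_needed(bool value)
{
	if (atomic_load_explicit(&hot_flag, memory_order_relaxed) == value)
		return false;
	atomic_store_explicit(&hot_flag, value, memory_order_relaxed);
	return true;
}

/*
 * Local-plus-global counting: accumulate privately and touch the shared
 * cache line only once the local delta reaches SYNC_BATCH.
 */
void counter_add(long n)
{
	local_count += n;
	if (local_count >= SYNC_BATCH || local_count <= -SYNC_BATCH) {
		atomic_fetch_add_explicit(&global_count, local_count,
					  memory_order_relaxed);
		local_count = 0;
	}
}

/* Approximate read: may lag by up to SYNC_BATCH - 1 per thread. */
long counter_read_approx(void)
{
	return atomic_load_explicit(&global_count, memory_order_relaxed);
}
```

A larger ``SYNC_BATCH`` means fewer writes to the shared line but a less accurate global value, which is the trade-off behind "reasonably increase the threshold".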
184 * Be aware of cache line boundaries
185 * Group mostly read-only fields together
201 line sharing of data members.
205 .. [2] https://lore.kernel.org/lkml/CAHk-=whoqV=cX5VC80mmR9rr+Z+yQ6fiQZm36Fb-izsanHg23w@mail.gmail.…
206 .. [3] https://joemario.github.io/blog/2016/09/01/c2c-blog/