
Lines Matching +full:lock +full:- +full:offset

1 .. SPDX-License-Identifier: GPL-2.0
47 There are many real-world cases of performance regressions caused by
66 cache hot and save cacheline/TLB, like a lock and the data protected
68 not work when the lock is heavily contended, as the lock owner CPU
69 could write to the data, while other CPUs are busy spinning on the lock.
74 * lock (spinlock/mutex/semaphore) and data protected by it are
83 The following 'mitigation' section provides real-world examples.
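The lock-and-data case above can be checked by hand from field offsets. The following is a minimal userspace sketch, not kernel code: ``struct demo_lock``, ``struct demo_queue`` and ``cacheline_of()`` are hypothetical names, and the 64-byte cache line size is an assumption that does not hold on every architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. Real hardware may differ. */
#define CACHELINE_BYTES 64

/* Hypothetical userspace stand-in for a spinlock. */
struct demo_lock {
	int locked;
};

/*
 * Hypothetical structure that keeps the lock next to the data it
 * protects. This is cache friendly when uncontended, but waiters
 * spinning on 'lock' share a cache line with 'head' and 'tail',
 * which the lock owner is writing.
 */
struct demo_queue {
	struct demo_lock lock;
	int head;
	int tail;
};

/*
 * Cache line index of a byte offset, assuming the enclosing object
 * starts on a cache line boundary.
 */
static size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* Returns 1 when the lock and the first data field share a line. */
static int lock_shares_line_with_data(void)
{
	return cacheline_of(offsetof(struct demo_queue, lock)) ==
	       cacheline_of(offsetof(struct demo_queue, head));
}
```

With this layout the helper reports that the lock and the data sit in the same cache line, which is the arrangement that becomes a problem once the lock is heavily contended.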
94 once hotspots are detected, tools like 'perf-c2c' and 'pahole' can
99 perf-c2c can capture the cache lines with most false sharing hits,
101 and in-line offset of the data. Simple commands are::
103 $ perf c2c record -ag sleep 3
104 $ perf c2c report --call-graph none -k vmlinux
106 When running the above while testing will-it-scale's tlb_flush1 case,
124 A nice introduction to perf-c2c is [3]_.
127 granularity. Users can match the offset in perf-c2c output with
137 unnecessary to hyper-optimize every rarely used data structure or
150 - Commit 91b6d3256356 ("net: cache align tcp_memory_allocated, tcp_sockets_allocated")
156 - Commit 802f1d522d5f ("mm: page_counter: re-layout structure to reduce false sharing")
159 For some global variables, for example, use compare(read)-then-write instead
170 - Commit 7b1002f7cfe5 ("bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false shari…
171 - Commit 292648ac5cf1 ("mm: gup: allow FOLL_PIN to scale in SMP")
173 * Turn hot global data into 'per-cpu data + global data' when possible,
174 or reasonably increase the threshold for syncing per-cpu data to
177 - Commit 520f897a3554 ("ext4: use percpu_counters for extent_status cache hits/misses")
178 - Commit 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
185 * Group mostly read-only fields together
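The mitigations listed above can be sketched in userspace C11. This is only an illustration under stated assumptions, not how the referenced commits are implemented: every name here (``struct demo_stats``, ``set_flag_once()``, ``struct local_counter``, ``counter_add()``, ``SYNC_BATCH``) is hypothetical, the cache line is assumed to be 64 bytes, and kernel code would use its own cache line alignment annotations and per-cpu helpers rather than ``alignas`` and a per-thread structure.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cache lines */
#define SYNC_BATCH 32		/* hypothetical sync threshold */

/*
 * Mostly read-only fields are grouped together, and the write-hot
 * counter is pushed onto its own cache line.
 */
struct demo_stats {
	int limit;				/* read-mostly */
	int flags;				/* read-mostly */
	alignas(CACHELINE_BYTES) atomic_long total;	/* write-hot */
};

/*
 * Compare(read)-then-write: skip the store when the flag is already
 * set, so the cache line can stay shared among readers. The check and
 * the store are not atomic as a pair, which is acceptable here only
 * because setting the flag is idempotent.
 */
static bool set_flag_once(atomic_int *flag)
{
	if (atomic_load_explicit(flag, memory_order_relaxed))
		return false;
	atomic_store_explicit(flag, 1, memory_order_relaxed);
	return true;
}

/* Per-thread delta, standing in for per-cpu data in this sketch. */
struct local_counter {
	alignas(CACHELINE_BYTES) long delta;
};

/*
 * Accumulate locally and fold into the shared total only once the
 * local delta reaches SYNC_BATCH in either direction.
 */
static void counter_add(struct local_counter *lc, atomic_long *total, long n)
{
	lc->delta += n;
	if (lc->delta >= SYNC_BATCH || lc->delta <= -SYNC_BATCH) {
		atomic_fetch_add_explicit(total, lc->delta,
					  memory_order_relaxed);
		lc->delta = 0;
	}
}
```

A larger ``SYNC_BATCH`` means fewer writes to the shared cache line at the cost of a less precise global value, which is the trade-off behind "reasonably increase the threshold".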
205 .. [2] https://lore.kernel.org/lkml/CAHk-=whoqV=cX5VC80mmR9rr+Z+yQ6fiQZm36Fb-izsanHg23w@mail.gmail.…
206 .. [3] https://joemario.github.io/blog/2016/09/01/c2c-blog/