
Lines Matching refs:CPU

60 for each CPU if the device supports enough queues, or otherwise at least
77 this to notify a CPU when new packets arrive on the given queue. The
79 that can route each interrupt to a particular CPU. The active mapping
81 an IRQ may be handled on any CPU. Because a non-negligible part of packet
98 receive queue overflows due to a saturated CPU, because in default
104 a separate CPU. For interrupt handling, HT has shown no benefit in
105 initial tests, so limit the number of queues to the number of CPU cores
114 Whereas RSS selects the queue and hence CPU that will run the hardware
115 interrupt handler, RPS selects the CPU to perform protocol processing
117 on the desired CPU’s backlog queue and waking up the CPU for processing.
130 The first step in determining the target CPU for RPS is to calculate a
143 of the list. The indexed CPU is the target for processing the packet,
144 and the packet is queued to the tail of that CPU’s backlog queue. At
147 processing on the remote CPU, and any queued packets are then processed
163 CPU. Documentation/core-api/irq/irq-affinity.rst explains how CPUs are assigned to
172 CPU. If NUMA locality is not an issue, this could also be all CPUs in
174 interrupting CPU from the map since that already performs much work.
177 receive queue is mapped to each CPU, then RPS is probably redundant
180 share the same memory domain as the interrupting CPU for that queue.
188 to the same CPU is CPU load imbalance if flows vary in packet rate.
195 during CPU contention by dropping packets from large flows slightly
197 destination CPU approaches saturation. Once a CPU's input packet
213 turned on. It is implemented for each CPU independently (to avoid lock
214 and cache contention) and toggled per CPU by setting the relevant bit
215 in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU
222 the same that selects a CPU in RPS, but as the number of buckets can
237 where a single connection taking up 50% of a CPU indicates a problem.
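The flow-limit behaviour matched above — per-CPU state, hash buckets shared with RPS selection, and preferential dropping of large flows once the input queue is saturated — might be sketched roughly like this. The window size, bucket count, and half-share threshold are illustrative assumptions (the kernel tracks a fixed-size history of recent packet hashes per CPU), not the exact implementation.

```python
from collections import deque

class FlowLimit:
    # Per-CPU flow-limit sketch: remember which hash bucket each of the
    # last `history` enqueued packets fell into, and drop a new packet
    # if its flow already accounts for more than half of that window
    # while the CPU's input queue is over half full.
    def __init__(self, buckets: int = 4096, history: int = 128):
        self.buckets = buckets
        self.window = deque(maxlen=history)
        self.counts = [0] * buckets

    def admit(self, flow_hash: int, backlog_len: int, backlog_max: int) -> bool:
        if backlog_len <= backlog_max // 2:
            return True  # queue not under pressure: admit everything
        b = flow_hash % self.buckets
        if len(self.window) == self.window.maxlen:
            self.counts[self.window.popleft()] -= 1
        self.window.append(b)
        self.counts[b] += 1
        # Large flows (over half the recent window) are dropped first,
        # slightly ahead of small flows.
        return self.counts[b] <= self.window.maxlen // 2
```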
254 kernel processing of packets to the CPU where the application thread
256 to enqueue packets onto the backlog of another CPU and to wake up that
257 CPU.
263 The CPU recorded in each entry is the one which last processed the flow.
264 If an entry does not hold a valid CPU, then packets mapped to that entry
266 same CPU. Indeed, with many flows and few CPUs, it is very likely that
269 rps_sock_flow_table is a global flow table that contains the *desired* CPU
270 for flows: the CPU that is currently processing the flow in userspace.
271 Each table value is a CPU index that is updated during calls to recvmsg
275 When the scheduler moves a thread to a new CPU while it has outstanding
276 receive packets on the old CPU, packets may arrive out of order. To
279 receive queue of each device. Each table value stores a CPU index and a
280 counter. The CPU index represents the *current* CPU onto which packets
282 and userspace processing occur on the same CPU, and hence the CPU index
285 enqueued for kernel processing on the old CPU.
288 CPU's backlog when a packet in this flow was last enqueued. Each backlog
292 been enqueued onto the currently designated CPU for flow i (of course,
297 CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
299 are compared. If the desired CPU for the flow (found in the
300 rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
301 table), the packet is enqueued onto that CPU’s backlog. If they differ,
302 the current CPU is updated to match the desired CPU if one of the
305 - The current CPU's queue head counter >= the recorded tail counter
307 - The current CPU is unset (>= nr_cpu_ids)
308 - The current CPU is offline
311 CPU. These rules aim to ensure that a flow only moves to a new CPU when
312 there are no packets outstanding on the old CPU, as the outstanding
314 CPU.
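Putting the matched rules together, the RFS enqueue decision — compare the desired CPU from rps_sock_flow_table against the current CPU from rps_dev_flow_table, and move the flow only when the old CPU's backlog has drained past the recorded tail counter, or the current CPU is unset or offline — can be sketched as below. The data shapes and names here are illustrative; the kernel packs the CPU index and counter into a single table entry.

```python
NR_CPU_IDS = 8          # illustrative CPU count
CPU_UNSET = NR_CPU_IDS  # "no valid CPU" sentinel (>= nr_cpu_ids)

def rfs_target_cpu(desired_cpu: int, dev_entry: dict,
                   backlog_heads: list, online: set) -> int:
    # desired_cpu   - from rps_sock_flow_table: the CPU on which
    #                 recvmsg/sendmsg last ran for this flow
    # dev_entry     - rps_dev_flow_table entry: 'cpu' (current CPU) and
    #                 'tail' (that CPU's backlog tail counter recorded
    #                 when this flow last enqueued a packet there)
    # backlog_heads - per-CPU head counters (packets dequeued so far)
    # online        - set of online CPUs
    cur = dev_entry['cpu']
    if cur == desired_cpu:
        return cur
    # Move only when no packets from this flow can still be queued on
    # the old CPU (or that CPU is unusable), so packets cannot be
    # processed out of order after the switch.
    if (cur >= NR_CPU_IDS
            or cur not in online
            or backlog_heads[cur] >= dev_entry['tail']):
        dev_entry['cpu'] = desired_cpu
        return desired_cpu
    return cur  # outstanding packets: stay put to preserve ordering
```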
357 directly to a CPU local to the thread consuming the data. The target CPU
358 will either be the same CPU where the application runs, or at least a CPU
359 which is local to the application thread’s CPU in the cache hierarchy.
368 The hardware queue for a flow is derived from the CPU recorded in
369 rps_dev_flow_table. The stack consults a CPU to hardware queue map which
372 functions in the cpu_rmap (“CPU affinity reverse map”) kernel library
373 to populate the map. For each CPU, the corresponding queue in the map is
374 set to be one whose processing CPU is closest in cache locality.
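The cpu_rmap population described above — for each CPU, record the queue whose interrupt-handling CPU is closest in cache locality — might look roughly like this. The distance metric and map shapes are assumptions made for illustration; the real cpu_rmap library derives locality from the topology and IRQ affinities.

```python
def build_rx_cpu_rmap(nr_cpus: int, queue_irq_cpu: dict, distance) -> dict:
    # queue_irq_cpu: queue index -> CPU that services its interrupt
    #                (i.e. the queue's IRQ affinity).
    # distance(a, b): smaller means CPUs a and b are closer in the
    #                 cache hierarchy (0 = same CPU); supplied by the
    #                 caller as a toy stand-in for topology lookups.
    # For each CPU, pick the queue whose interrupt-handling CPU is
    # nearest -- the reverse map consulted when steering a flow's
    # packets to a hardware receive queue.
    return {cpu: min(queue_irq_cpu,
                     key=lambda q: distance(cpu, queue_irq_cpu[q]))
            for cpu in range(nr_cpus)}
```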
383 of CPU to queues is automatically deduced from the IRQ affinities
401 a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
408 these queues are processed on a CPU within this set. This choice
411 (contention can be eliminated completely if each CPU has its own
424 associating a given CPU to a given application thread. The application
429 in keeping the CPU overhead low. Transmit completion work is locked into
431 avoids the overhead of triggering an interrupt on another CPU. When the
444 running CPU as a key into the CPU-to-queue lookup table. If the
486 system, XPS is preferably configured so that each CPU maps onto one queue.
488 queue can also map onto one CPU, resulting in exclusive pairings that
491 with the CPU that processes transmit completions for that queue
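The transmit-side lookup matched above — the running CPU used as the key into a CPU-to-queue table, ideally with exclusive CPU/queue pairings — can be sketched as follows. Function names are illustrative; the per-queue CPU sets mirror what an administrator writes to the xps_cpus sysfs files.

```python
def build_cpu_to_queue_map(xps_cpus: list) -> dict:
    # xps_cpus: per-transmit-queue CPU sets, mirroring what is written
    # to /sys/class/net/<dev>/queues/tx-<n>/xps_cpus, inverted here
    # into a CPU -> [queue, ...] lookup table.
    cpu_map = {}
    for queue, cpus in enumerate(xps_cpus):
        for cpu in cpus:
            cpu_map.setdefault(cpu, []).append(queue)
    return cpu_map

def xps_select_queue(cpu: int, cpu_map: dict, flow_hash: int = 0):
    # With the recommended one-queue-per-CPU configuration each list
    # holds a single entry; when several queues match, the kernel picks
    # among them by flow hash (mimicked here).
    queues = cpu_map.get(cpu)
    if not queues:
        return None  # no XPS mapping: fall back to other selection logic
    return queues[flow_hash % len(queues)]
```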