.. SPDX-License-Identifier: GPL-2.0

=====================================
Scaling in the Linux Networking Stack
=====================================

This document describes a set of complementary techniques in the Linux
networking stack to increase parallelism and improve performance for
multi-processor systems.

The following technologies are described:

- RSS: Receive Side Scaling
- RPS: Receive Packet Steering
- RFS: Receive Flow Steering
- Accelerated Receive Flow Steering
- XPS: Transmit Packet Steering

RSS: Receive Side Scaling
=========================

Contemporary NICs support multiple receive and transmit descriptor queues
(multi-queue). On reception, a NIC can send different packets to different
queues to distribute processing among CPUs. This mechanism is
generally known as "Receive-side Scaling" (RSS). The goal of RSS and
the other scaling techniques is to increase performance uniformly.
Multi-queue distribution can also be used for traffic prioritization, but
that is not the focus of these techniques.

The filter used in RSS is typically a hash function over the network
and/or transport layer headers -- for example, a 4-tuple hash over
IP addresses and TCP ports of a packet. The most common hardware
implementation of RSS uses a 128-entry indirection table where each entry
stores a queue number.

Some use cases, such as forwarding and monitoring, require
both directions of the flow to land on the same Rx queue (and CPU).
"Symmetric-XOR" is a type of RSS algorithm that achieves this hash
symmetry by XORing the source and destination fields of the packet
before computing the hash.

Some advanced NICs allow steering packets to queues based on
programmable filters. For example, webserver bound TCP port 80 packets
can be directed to their own receive queue. Such "n-tuple" filters can
be configured from ethtool (--config-ntuple).
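
For instance, a rule steering TCP port 80 traffic to receive queue 2
might look as follows (the interface name eth0 and the queue index are
illustrative assumptions, not taken from this document)::

  # ethtool -N eth0 flow-type tcp4 dst-port 80 action 2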

RSS Configuration
-----------------

The driver for a multi-queue capable NIC typically provides a kernel
module parameter for specifying the number of hardware queues to
configure. The indirection table can be retrieved and modified at
runtime using ethtool
commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the
indirection table could be done to give different queues different
relative weights.
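
As an illustrative sketch (the interface name and queue count are
assumptions), the table can be inspected and then rebalanced evenly
across the first 8 queues with::

  # ethtool -x eth0
  # ethtool -X eth0 equal 8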

Each receive queue has a separate IRQ associated with it. The NIC triggers
this to notify a CPU when new packets arrive on the given queue. The
signaling path for PCIe devices uses message signaled interrupts (MSI-X),
which can route each interrupt to a particular CPU. By default,
an IRQ may be handled on any CPU. Because a non-negligible part of packet
processing takes place in receive interrupt handling, it is advantageous
to spread receive interrupts between CPUs. To manually adjust the IRQ
affinity of each interrupt see
Documentation/core-api/irq/irq-affinity.rst. Some systems
will be running irqbalance, a daemon that dynamically optimizes IRQ
assignments and as a result may override any manual settings.
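
For example, to pin one queue's interrupt to CPU 2 (the IRQ number 30
below is hypothetical; look up the real one in /proc/interrupts)::

  # echo 4 > /proc/irq/30/smp_affinity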

One approach is to allocate as many queues as there are CPUs (or the
NIC maximum, if lower). The most efficient high-rate configuration
is likely the one with the smallest number of receive queues where no
receive queue overflows due to a saturated CPU.

Per-cpu load can be observed using the mpstat utility, but note that on
processors with hyperthreading (HT), each hyperthread is represented as
a separate CPU. For interrupt handling, HT has shown no benefit in
initial tests, so limit the number of queues to the number of CPU cores
in the system.
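
For example, per-CPU utilization can be sampled at one-second
intervals with::

  $ mpstat -P ALL 1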

Modern NICs support creating multiple co-existing RSS configurations
which are selected based on explicit matching rules. This can be very
useful when an application wants to constrain the set of queues receiving
traffic, e.g. for a particular destination port or IP address.
The example below shows how to direct all traffic to TCP port 22 to
queues 0 and 1.

To create an additional RSS context use::

  # ethtool -X eth0 hfunc toeplitz context new
  New RSS context is 1

The kernel reports back the ID of the allocated context (the default,
always-present RSS context has ID 0). The new context can be queried
and modified using the same APIs as the default context::

  # ethtool -x eth0 context 1
  RX flow hash indirection table for eth0 with 13 RX ring(s):
      0:      0     1     2     3     4     5     6     7
  (...)
  # ethtool -X eth0 equal 2 context 1
  # ethtool -x eth0 context 1
  RX flow hash indirection table for eth0 with 13 RX ring(s):
      0:      0     1     0     1     0     1     0     1
  (...)

To make use of the new context direct traffic to it using an n-tuple
filter::

  # ethtool -N eth0 flow-type tcp6 dst-port 22 context 1
  Added rule with ID 1023

When done, remove the context and the rule::

  # ethtool -N eth0 delete 1023
  # ethtool -X eth0 context 1 delete

RPS: Receive Packet Steering
============================

Receive Packet Steering (RPS) is logically a software implementation of
RSS. RPS selects the CPU to perform protocol processing above the
interrupt handler. This is accomplished by placing the packet
on the desired CPU's backlog queue and waking up the CPU for processing.
RPS has some advantages over RSS: 1) it can be used with any NIC,
2) software filters can easily be added to hash over new protocols,
3) it does not increase hardware device interrupt rate (although it does
introduce inter-processor interrupts (IPIs)).

The first step in determining the target CPU for RPS is to calculate a
flow hash over the packet's addresses or ports (2-tuple or 4-tuple hash
depending on the protocol). This serves as a consistent hash of the
associated flow of the packet. The hash is either provided by hardware
or will be computed in the stack. It is saved in
skb->hash and can be used elsewhere in the stack as a hash of the
packet's flow. At the end of the receive bottom half, IPIs are sent to
any CPUs for which packets have been queued. The IPI wakes backlog
processing on the remote CPU, and any queued packets are then processed
up the networking stack.

RPS Configuration
-----------------

RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on
by default for SMP). Even when compiled in, RPS remains disabled until
explicitly configured. The list of CPUs to which RPS may forward traffic
can be configured for each receive queue using a sysfs file entry::

  /sys/class/net/<dev>/queues/rx-<n>/rps_cpus

This file implements a bitmap of CPUs. RPS is disabled when it is zero
(the default), in which case packets are processed on the interrupting
CPU. Documentation/core-api/irq/irq-affinity.rst explains how CPUs are
assigned to the bitmap.
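
For example (the interface name eth0 is an assumption), to allow RPS on
queue 0 to steer packets to CPUs 0-3::

  # echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

The value is a hexadecimal CPU bitmap, so f selects CPUs 0 through 3.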

For a multi-queue system, if RSS is configured so that a hardware
receive queue is mapped to each CPU, then RPS is probably redundant
and unnecessary. If there are fewer hardware queues than CPUs, then
RPS might be beneficial if the rps_cpus for each queue are the ones that
share the same memory domain as the interrupting CPU for that queue.

RPS Flow Limit
--------------

RPS scales kernel receive processing across CPUs without introducing
reordering. The trade-off to sending all packets from the same flow
to the same CPU is CPU load imbalance if flows vary in packet rate.
In the extreme case a single flow dominates traffic. Especially on
common server workloads with many concurrent connections, such
behavior indicates a problem such as a misconfiguration or spoofed
source Denial of Service attack.

Flow Limit is an optional RPS feature that prioritizes small flows
during CPU contention by dropping packets from large flows slightly
ahead of those from small flows. Once a CPU's input packet queue
exceeds half the maximum queue length (as set by sysctl
net.core.netdev_max_backlog), the kernel starts a per-flow packet
count over the last 256 packets. If a flow exceeds a set ratio (by
default, half) of these packets when a new packet arrives, then the
new packet is dropped.
No packets are dropped when the input packet queue length is below
the threshold, so flow limit does not sever connections outright:
even large flows maintain connectivity.

Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not
turned on. It is implemented for each CPU independently (to avoid lock
and cache contention) and toggled per CPU by setting the relevant bit
in sysctl net.core.flow_limit_cpu_bitmap.

Per-flow rate is calculated by hashing each packet into a hashtable
bucket and incrementing a per-bucket counter. The hash function is
the same that selects a CPU in RPS, but as the number of buckets can
be much larger than the number of CPUs, flow limit has finer-grained
identification of large flows and fewer false positives.

Flow limit is useful on systems with many concurrent connections,
where a single connection taking up 50% of a CPU indicates a problem.
In such environments, enable the feature on all CPUs that handle
network rx interrupts (as set in /proc/irq/N/smp_affinity).

The feature depends on the input packet queue length to exceed
the flow limit threshold (50%) plus the flow history length (256).
Setting net.core.netdev_max_backlog to either 1000 or 10000 has
performed well in experiments.
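
A minimal sketch of such a configuration (the CPU mask and backlog
value below are illustrative, not prescribed)::

  # sysctl -w net.core.netdev_max_backlog=10000
  # echo ff > /proc/sys/net/core/flow_limit_cpu_bitmap

The bitmap takes the same hexadecimal CPU mask format as rps_cpus, so
ff enables flow limit on CPUs 0 through 7.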

RFS: Receive Flow Steering
==========================

While RPS steers packets solely based on hash, and thus generally
provides good load distribution, it does not take into account
application locality. The goal of RFS is to increase datacache hit
rate by steering kernel processing of packets to the CPU where the
application thread
consuming the packet is running. RFS relies on the same RPS mechanisms
to enqueue packets onto the backlog of another CPU and to wake up that
CPU.

In RFS, the flow hash is used as an index into flow lookup tables that
map flows to CPUs. The global rps_sock_flow_table records the *desired*
CPU for each flow: the CPU where the flow was last processed in
userspace. If the scheduler moves a thread while it has outstanding
receive packets on the old CPU, packets may arrive out of order. To
avoid this, RFS uses a second, per-queue table, rps_dev_flow_table,
whose entries record the *current* CPU onto which packets for a flow
are enqueued for kernel processing. Ideally, kernel
and userspace processing occur on the same CPU, and hence the CPU index
in both tables is identical. This is likely false if the scheduler has
recently migrated a userspace thread while the kernel still has packets
enqueued for kernel processing on the old CPU.

The counter in each rps_dev_flow_table entry records the length of the
current CPU's backlog when a packet in the flow was last enqueued. Each
backlog queue has a head counter that is incremented on dequeue. A tail
counter is computed as head counter + queue length.

When selecting the CPU for packet processing, the rps_sock_flow table
and the rps_dev_flow table of the queue that the packet was received on
are compared. If the desired CPU for the flow (found in the
rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
table), the packet is enqueued onto that CPU's backlog. If they differ,
the current CPU is updated to match the desired CPU if one of the
following is true:

  - The current CPU's queue head counter >= the recorded tail counter
    value in rps_dev_flow[i]
  - The current CPU is unset (>= nr_cpu_ids)
  - The current CPU is offline

After this check, the packet is sent to the (possibly updated) current
CPU. These rules aim to ensure that a flow only moves to a new CPU when
there are no packets outstanding on the old CPU, as the outstanding
packets could arrive later than those about to be processed on the new
CPU.

RFS Configuration
-----------------

RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on
by default for SMP). The functionality remains disabled until explicitly
configured. The number of entries in the global flow table is set
through::

  /proc/sys/net/core/rps_sock_flow_entries

The number of entries in the per-queue flow table is set through::

  /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt

Both of these need to be set before RFS is enabled for a receive queue.
Values for both are rounded up to the nearest power of two. The
suggested flow count depends on the expected number of active connections
at any given time, which may be significantly less than the number of
open connections. We have found that a value of 32768 for
rps_sock_flow_entries works fairly well on a moderately loaded server.

For a single-queue device, the rps_flow_cnt value for the single queue
would normally be configured to the same value as rps_sock_flow_entries.
For a multi-queue device, the rps_flow_cnt for each queue might be
configured as rps_sock_flow_entries / N, where N is the number of
queues. So for instance, if rps_sock_flow_entries is set to 32768 and
there are 16 configured receive queues, rps_flow_cnt for each queue
might be configured as 2048.
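
As a concrete sketch of the values suggested above (the interface name
eth0 and the 16-queue count are assumptions)::

  # echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
  # for n in $(seq 0 15); do echo 2048 > /sys/class/net/eth0/queues/rx-$n/rps_flow_cnt; done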

Accelerated RFS
===============

Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load
balancing mechanism that uses soft state to steer flows based on where
the application thread consuming the packets of each flow is running.
The hardware queue for a flow is derived from the CPU recorded in
rps_dev_flow_table, using a CPU-to-hardware-queue map which
is maintained by the NIC driver. This is an auto-generated reverse map of
the IRQ affinity table shown by /proc/interrupts.

Accelerated RFS Configuration
-----------------------------

Accelerated RFS is only available if the kernel is compiled with
CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
It also requires that ntuple filtering is enabled via ethtool. The map
of CPU to queues is automatically deduced from the IRQ affinities
configured for each receive queue by the driver, so no additional
configuration should be necessary.
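
The one manual step is typically enabling ntuple filtering, for
example (interface name assumed)::

  # ethtool -K eth0 ntuple on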

XPS: Transmit Packet Steering
=============================

Transmit Packet Steering is a mechanism for intelligently selecting
which transmit queue to use when transmitting a packet on a multi-queue
device. This can be accomplished by recording two kinds of maps, either
a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
to hardware transmit queue(s).

1. XPS using CPUs map

The goal of this mapping is usually to assign queues exclusively to a
subset of CPUs, where the transmit completions for
these queues are processed on a CPU within this set. This choice
provides two benefits. First, contention on the device queue lock is
significantly reduced since fewer CPUs contend for the same queue
(contention can be eliminated completely if each CPU has its own
transmit queue). Secondly, cache miss rate on transmit completion is
reduced, in particular for data cache lines that hold the sk_buff
structures.

2. XPS using receive queues map

This mapping is used to pick transmit queue based on the receive
queue(s) map configuration set by the administrator. A set of receive
queues can be mapped to a set of transmit queues (many:many), although
the common use case is a 1:1 mapping. This will enable sending packets
on the same queue associations for transmit and receive. This is useful for
busy polling multi-threaded workloads where there are challenges in
associating a given CPU to a given application thread. The application
threads are not pinned to CPUs and each thread handles packets
received on a single queue. The receive queue number is cached in the
socket for the connection. In this model, sending the packets on the same
transmit queue corresponding to the associated receive queue keeps
the CPU overhead low. Transmit completion work is locked into
the same queue-association that a given application is polling on. This
avoids the overhead of triggering an interrupt on another CPU. When the
application cleans up the packets during the busy poll, transmit completion
may be processed in the same thread context, reducing latency.

XPS is configured per transmit queue by setting a bitmap of
CPUs/receive-queues that may use that queue to transmit. The reverse
mapping, from CPUs to transmit queues or from receive-queues to transmit
queues, is computed and maintained for each network device. When
transmitting the first packet in a flow, the function get_xps_queue()
is called to select a queue. This function uses the ID of the receive
queue
for the socket connection for a match in the receive queue-to-transmit
queue lookup table. Alternatively, it can use the ID of the
running CPU as a key into the CPU-to-queue lookup table. If the
ID matches a single queue, that is used for transmission. If multiple
queues match, one is selected by using the flow hash to compute an index
into the set. When selecting the transmit queue based on receive queue(s),
the transmit queue is not validated against the receive device, as that
would require an expensive lookup in the datapath.

The queue chosen for transmitting a particular flow is saved in the
corresponding socket structure for the flow (e.g. a TCP connection).
This transmit queue is used for subsequent packets sent on the flow to
prevent out of order (ooo) packets. The same queue is used until
skb->ooo_okay is set for a packet in the flow. This flag indicates that
there are no outstanding packets in the flow, so the transmit queue can
change without the risk of generating out of order packets.

XPS Configuration
-----------------

XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
default for SMP). If compiled in, it is driver dependent whether, and
how, XPS is configured at device init. The mapping of CPUs/receive-queues
to transmit queue can be inspected and configured using sysfs.

For selection based on CPUs map::

  /sys/class/net/<dev>/queues/tx-<n>/xps_cpus

For selection based on receive-queues map::

  /sys/class/net/<dev>/queues/tx-<n>/xps_rxqs

For a network device with a single transmission queue, XPS configuration
has no effect, since there is no choice in this case. In a multi-queue
system, XPS is preferably configured so that each CPU maps onto one
queue. If there are as many queues as there are CPUs in the system,
then each queue can also map onto one CPU, resulting in exclusive
pairings that
experience no contention. If there are fewer queues than CPUs, then the
best CPUs to share a given queue are probably those that share the
cache with the CPU that processes transmit completions for that queue
(transmit interrupts).

For transmit queue selection based on receive queue(s), XPS has to be
explicitly configured, mapping receive-queue(s) to transmit queue(s). If
the user configuration for the receive-queue map does not apply, then the
transmit queue is selected based on the CPUs map.
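
A sketch of both maps (the interface name, CPU mask, and queue pairing
are assumptions): dedicate transmit queue 0 to CPUs 0 and 1 via the
CPUs map, and pair it with receive queue 0 via the receive-queues map::

  # echo 3 > /sys/class/net/eth0/queues/tx-0/xps_cpus
  # echo 1 > /sys/class/net/eth0/queues/tx-0/xps_rxqs

Both files take hexadecimal bitmaps: 3 selects CPUs 0 and 1 in
xps_cpus, and 1 selects receive queue 0 in xps_rxqs.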

Per TX Queue rate limitation
============================

These are rate-limitation mechanisms implemented by HW, where currently
a max-rate attribute is supported, by setting a Mbps value to::

  /sys/class/net/<dev>/queues/tx-<n>/tx_maxrate

A value of zero means disabled, and this is the default.
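
For example, to cap transmit queue 0 at roughly 1 Gbit/s (interface
name assumed)::

  # echo 1000 > /sys/class/net/eth0/queues/tx-0/tx_maxrate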

Authors
=======

- Tom Herbert (therbert@google.com)
- Willem de Bruijn (willemb@google.com)