================================
Documentation for /proc/sys/net/
================================

Copyright

Copyright (c) 1999

 - Terrehon Bowden <terrehon@pacbell.net>
 - Bodo Bauer <bb@ricochet.net>

Copyright (c) 2000

 - Jorge Nerin <comandante@zaralinux.com>

Copyright (c) 2009

 - Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in index.rst.

------------------------------------------------------------------------------

This file contains the documentation for the sysctl files in
/proc/sys/net.

The interface to the networking parts of the kernel is located in
/proc/sys/net. The following table shows all possible subdirectories. You may
see only some of them, depending on your kernel's configuration.


Table: Subdirectories in /proc/sys/net

 ========= =================== = ========== ===================
 Directory Content               Directory  Content
 ========= =================== = ========== ===================
 802       E802 protocol         mptcp      Multipath TCP
 appletalk Appletalk protocol    netfilter  Network Filter
 ax25      AX25                  netrom     NET/ROM
 bridge    Bridging              rose       X.25 PLP layer
 core      General parameters    tipc       TIPC
 ethernet  Ethernet protocol     unix       Unix domain sockets
 ipv4      IP version 4          x25        X.25 protocol
 ipv6      IP version 6
 ========= =================== = ========== ===================

1. /proc/sys/net/core - Network core options
============================================

bpf_jit_enable
--------------

This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
and efficient infrastructure that allows executing bytecode at various
hook points. It is used in a number of Linux kernel subsystems such
as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints)
and security (e.g. seccomp). LLVM has a BPF back end that can compile
restricted C into a sequence of BPF instructions.
After program load through bpf(2) and passing the in-kernel verifier, a JIT
then translates these BPF proglets into native CPU instructions. There are
two flavors of JITs: the newer eBPF JIT, currently supported on:

 - x86_64
 - x86_32
 - arm64
 - arm32
 - ppc64
 - ppc32
 - sparc64
 - mips64
 - s390x
 - riscv64
 - riscv32

And the older cBPF JIT, supported on the following archs:

 - mips
 - sparc

eBPF JITs are a superset of cBPF JITs, meaning the kernel will
migrate cBPF instructions into eBPF instructions and then JIT
compile them transparently. Older cBPF JITs can only translate
tcpdump filters, seccomp rules, etc., but not the eBPF programs
loaded through bpf(2) mentioned above.

Values:

 - 0 - disable the JIT (default value)
 - 1 - enable the JIT
 - 2 - enable the JIT and ask the compiler to emit traces to the kernel log.

bpf_jit_harden
--------------

This enables hardening for the BPF JIT compiler. It is supported by eBPF
JIT backends. Enabling hardening trades off performance, but can
mitigate JIT spraying.

Values:

 - 0 - disable JIT hardening (default value)
 - 1 - enable JIT hardening for unprivileged users only
 - 2 - enable JIT hardening for all users

bpf_jit_kallsyms
----------------

When the BPF JIT compiler is enabled, compiled images live at addresses
unknown to the kernel, meaning they show up neither in traces nor
in /proc/kallsyms. This enables export of these addresses, which can
be used for debugging/tracing. If bpf_jit_harden is enabled, this
feature is disabled.

Values:

 - 0 - disable JIT kallsyms export (default value)
 - 1 - enable JIT kallsyms export for privileged users only

bpf_jit_limit
-------------

This enforces a global limit for memory allocations to the BPF JIT
compiler in order to reject unprivileged JIT requests once it has
been surpassed.
bpf_jit_limit contains the value of the global limit
in bytes.

dev_weight
----------

The maximum number of packets that the kernel can handle on a NAPI interrupt.
It is a per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
aggregated packet is counted as one packet in this context.

Default: 64

dev_weight_rx_bias
------------------

RPS (e.g. RFS, aRFS) processing competes with the registered NAPI poll function
of the driver for the netdev_budget of each softirq cycle. This parameter
influences the proportion of the configured netdev_budget that is spent on
RPS-based packet processing during RX softirq cycles. It is further meant for
making the current dev_weight adaptable for asymmetric CPU needs on the RX/TX
side of the network stack (see dev_weight_tx_bias). It is effective on a per-CPU
basis. The value is derived from dev_weight and calculated multiplicatively
(dev_weight * dev_weight_rx_bias).

Default: 1

dev_weight_tx_bias
------------------

Scales the maximum number of packets that can be processed during a TX softirq cycle.
Effective on a per-CPU basis. Allows scaling of the current dev_weight for asymmetric
net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.

Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).

Default: 1

default_qdisc
-------------

The default queuing discipline to use for network devices. This allows
overriding the default of pfifo_fast with an alternative. Since the default
queuing discipline is created without additional parameters, it is best suited
to queuing disciplines that work well without configuration, like stochastic
fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin,
which require setting up classes and bandwidths.
Note that physical multiqueue
interfaces still use mq as the root qdisc, which in turn uses this default for
its leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead
default to noqueue.

Default: pfifo_fast

busy_read
---------

Low latency busy poll timeout for socket reads (needs CONFIG_NET_RX_BUSY_POLL).
Approximate time in microseconds to busy loop waiting for packets on the device
queue. This sets the default value of the SO_BUSY_POLL socket option.
It can be set or overridden per socket by setting the socket option
SO_BUSY_POLL, which is the preferred method of enabling. If you need to enable
the feature globally via sysctl, a value of 50 is recommended.

Will increase power usage.

Default: 0 (off)

busy_poll
---------

Low latency busy poll timeout for poll and select (needs CONFIG_NET_RX_BUSY_POLL).
Approximate time in microseconds to busy loop waiting for events.
The recommended value depends on the number of sockets you poll on:
50 for several sockets, 100 for several hundred.
For more than that you probably want to use epoll.
Note that only sockets with SO_BUSY_POLL set will be busy polled,
so you want to either selectively set SO_BUSY_POLL on those sockets or set
net.core.busy_read globally.

Will increase power usage.

Default: 0 (off)

rmem_default
------------

The default setting of the socket receive buffer in bytes.

rmem_max
--------

The maximum receive socket buffer size in bytes.

tstamp_allow_data
-----------------

Allow processes to receive TX timestamps looped together with the original
packet contents. If disabled, transmit timestamp requests from unprivileged
processes are dropped unless the SOF_TIMESTAMPING_OPT_TSONLY timestamping flag
is set.

Default: 1 (on)


wmem_default
------------

The default setting (in bytes) of the socket send buffer.
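
The buffer-size entries above (rmem_default, rmem_max, wmem_default and so on)
are plain files holding one integer each. The Python sketch below reads them by
mapping a dotted sysctl name such as net.core.rmem_max onto its path under
/proc/sys, the same naming convention sysctl(8) uses. The helper names
sysctl_path and read_int_sysctl, and the root parameter, are hypothetical and
exist only so the sketch can be pointed at a test directory.

```python
from pathlib import Path

# Hypothetical helpers (not a kernel or library API): translate a dotted
# sysctl name into a file below /proc/sys and read the integer it holds.
PROC_SYS = "/proc/sys"


def sysctl_path(name: str, root: str = PROC_SYS) -> Path:
    """Map e.g. 'net.core.rmem_max' to <root>/net/core/rmem_max."""
    return Path(root, *name.split("."))


def read_int_sysctl(name: str, root: str = PROC_SYS) -> int:
    """Read a sysctl file that contains a single integer value."""
    return int(sysctl_path(name, root).read_text().strip())


if __name__ == "__main__":
    for name in ("net.core.rmem_default", "net.core.rmem_max",
                 "net.core.wmem_default", "net.core.wmem_max"):
        try:
            print(f"{name} = {read_int_sysctl(name)} bytes")
        except OSError as err:  # e.g. not Linux, or /proc not mounted
            print(f"{name}: unavailable ({err})")
```

This sketch splits on every dot, so names whose components themselves contain a
dot (for example VLAN interfaces such as eth0.100 under net.ipv4.conf) are not
handled.
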

wmem_max
--------

The maximum send socket buffer size in bytes.

message_burst and message_cost
------------------------------

These parameters are used to limit the warning messages written to the kernel
log from the networking code. They enforce a rate limit to make a
denial-of-service attack impossible. A higher message_cost factor results in
fewer messages being written. message_burst controls when messages will
be dropped. The default settings limit warning messages to one every five
seconds.

warnings
--------

This sysctl is now unused.

This was used to control console messages from the networking stack that
occur because of problems on the network like duplicate addresses or bad
checksums.

These messages are now emitted at KERN_DEBUG and can generally be enabled
and controlled by the dynamic_debug facility.

netdev_budget
-------------

Maximum number of packets taken from all interfaces in one polling cycle (NAPI
poll). In one polling cycle, interfaces which are registered to polling are
probed in a round-robin manner. Also, a polling cycle may not exceed
netdev_budget_usecs microseconds, even if netdev_budget has not been
exhausted.

netdev_budget_usecs
-------------------

Maximum number of microseconds in one NAPI polling cycle. Polling
will exit when either netdev_budget_usecs have elapsed during the
poll cycle or the number of packets processed reaches netdev_budget.

netdev_max_backlog
------------------

Maximum number of packets queued on the INPUT side when the interface
receives packets faster than the kernel can process them.

netdev_rss_key
--------------

RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
randomly generated.
Some user-space programs might need to gather its content even if drivers do
not provide ethtool -x support yet.
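
A user-space program that gathers the host key can decode the file's
colon-separated hex text into raw bytes. The Python sketch below does that and
keeps the first 40 bytes, matching the 40-byte key that RSS drivers use. The
function names are hypothetical, and treating an all-zero key as "not yet
filled by any driver" is an assumption based on the NUL-byte behavior
documented for this file.

```python
from pathlib import Path

RSS_KEY_PATH = Path("/proc/sys/net/core/netdev_rss_key")
DRIVER_KEY_LEN = 40  # bytes that most RSS drivers actually use


def parse_rss_key(text: str) -> bytes:
    """Decode 'aa:bb:cc:...' text into raw key bytes."""
    return bytes.fromhex(text.strip().replace(":", ""))


def driver_key(key: bytes, length: int = DRIVER_KEY_LEN) -> bytes:
    """Return the prefix of the host key that a typical driver would use."""
    return key[:length]


def key_is_unset(key: bytes) -> bool:
    """Assumption: an all-zero key means no driver has filled it yet."""
    return not any(key)


if __name__ == "__main__":
    try:
        key = parse_rss_key(RSS_KEY_PATH.read_text())
    except OSError as err:  # e.g. not Linux, or no permission
        print(f"cannot read {RSS_KEY_PATH}: {err}")
    else:
        print(f"{len(key)} bytes, unset={key_is_unset(key)}")
        print(driver_key(key).hex(":"))
```

Taking a prefix mirrors how the 40-byte key printed by ethtool -x starts with
the same bytes as the longer key in this file.
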

::

  myhost:~# cat /proc/sys/net/core/netdev_rss_key
  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)

The file contains NUL bytes if no driver has ever called the
netdev_rss_key_fill() function.

Note:
  /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
  but most drivers only use 40 bytes of it.

::

  myhost:~# ethtool -x eth0
  RX flow hash indirection table for eth0 with 8 RX ring(s):
      0:    0     1     2     3     4     5     6     7
  RSS hash key:
  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89

netdev_tstamp_prequeue
----------------------

If set to 0, RX packet timestamps can be sampled after RPS processing, when
the target CPU processes packets. This may delay timestamps, but allows the
load to be distributed across several CPUs.

If set to 1 (default), timestamps are sampled as soon as possible, before
queueing.

netdev_unregister_timeout_secs
------------------------------

Unregister network device timeout in seconds.
This option controls the timeout used to issue a warning while
waiting for a network device refcount to drop to 0 during device
unregistration. A lower value may be useful during bisection to detect
a leaked reference faster. A larger value may be useful to prevent false
warnings on slow/loaded systems.
Default value is 10, minimum 1, maximum 3600.

optmem_max
----------

Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
of struct cmsghdr structures with appended data.

fb_tunnels_only_for_init_net
----------------------------

Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
sit0, ip6tnl0, ip6gre0) are automatically created.
There are 3 possibilities:

(a) value = 0; respective fallback tunnels are created when the module is
    loaded in every net namespace (backward-compatible behavior).
(b) value = 1; [kcmd value: initns] respective fallback tunnels are
    created only in the init net namespace and every other net namespace
    will not have them.
(c) value = 2; [kcmd value: none] fallback tunnels are not created
    when a module is loaded in any net namespace. Setting the value to
    "2" is pointless after boot if these modules are built-in, so there
    is a kernel command-line option that can change this default. Please
    refer to Documentation/admin-guide/kernel-parameters.txt for
    additional details.

Not creating fallback tunnels gives userspace control to create only
whatever is needed and to avoid creating redundant devices.

Default: 0 (for compatibility reasons)

devconf_inherit_init_net
------------------------

Controls if a new network namespace should inherit all current
settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
default, we keep the current behavior: for IPv4 we inherit all current
settings from init_net and for IPv6 we reset all settings to default.

If set to 1, both IPv4 and IPv6 settings are forced to inherit from
current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
forced to reset to their default values. If set to 3, both IPv4 and IPv6
settings are forced to inherit from current ones in the netns where this
new netns has been created.

Default: 0 (for compatibility reasons)

2. /proc/sys/net/unix - Parameters for Unix domain sockets
==========================================================

There is only one file in this directory.
unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain
socket's buffer. It will not take effect unless the PF_UNIX flag is specified.


3. /proc/sys/net/ipv4 - IPv4 settings
=====================================

Please see: Documentation/networking/ip-sysctl.rst and
Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.


4. Appletalk
============

The /proc/sys/net/appletalk directory holds the Appletalk configuration data
when Appletalk is loaded. The configurable parameters are:

aarp-expiry-time
----------------

The amount of time we keep an ARP entry before expiring it. Used to age out
old hosts.

aarp-resolve-time
-----------------

The amount of time we will spend trying to resolve an Appletalk address.

aarp-retransmit-limit
---------------------

The number of times we will retransmit a query before giving up.

aarp-tick-time
--------------

Controls the rate at which expiries are checked.

The directory /proc/net/appletalk holds the list of active Appletalk sockets
on a machine.

The fields indicate the DDP type, the local address (in network:node format),
the remote address, the size of the transmit pending queue, the size of the
received queue (bytes waiting for applications to read), the state and the uid
owning the socket.

/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
shows the name of the interface, its Appletalk address, the network range on
that address (or network number for phase 1 networks), and the status of the
interface.

/proc/net/atalk_route lists each known network route. It lists the target
(network) that the route leads to, the router (may be directly connected), the
route flags, and the device the route is using.

5. TIPC
=======

tipc_rmem
---------

The TIPC protocol now has a tunable for the receive memory, similar to
tcp_rmem, i.e.
a vector of 3 INTEGERs: (min, default, max).

::

  # cat /proc/sys/net/tipc/tipc_rmem
  4252725 34021800 68043600
  #

The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
are scaled (shifted) versions of that same value. Note that the min value
is not currently used in any meaningful way, but the triplet is
preserved in order to be consistent with things like tcp_rmem.

named_timeout
-------------

TIPC name table updates are distributed asynchronously in a cluster, without
any form of transaction handling. This means that different race scenarios are
possible. One such scenario is that a name withdrawal sent out by one node and
received by another node may arrive after a second, overlapping name publication
has already been accepted from a third node, although the conflicting updates
may originally have been issued in the correct sequential order.
If named_timeout is nonzero, failed topology updates will be placed on a defer
queue until another event arrives that clears the error, or until the timeout
expires. The value is in milliseconds.
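
As a worked check of the tipc_rmem description, the sample triplet can be
parsed and compared against its max value. In the sample output,
default = max >> 1 (68043600 / 2 = 34021800) and min = max >> 4
(68043600 / 16 = 4252725). These shift amounts are read off the example, not
taken from the kernel source, so treat them as an assumption that may not hold
on every kernel. The names TipcRmem, parse_tipc_rmem and matches_sample_shifts
are hypothetical.

```python
from typing import NamedTuple


class TipcRmem(NamedTuple):
    minimum: int
    default: int
    maximum: int


def parse_tipc_rmem(text: str) -> TipcRmem:
    """Parse the 'min default max' line of /proc/sys/net/tipc/tipc_rmem."""
    fields = [int(v) for v in text.split()]
    if len(fields) != 3:
        raise ValueError(f"expected 3 integers, got {len(fields)}")
    return TipcRmem(*fields)


def matches_sample_shifts(r: TipcRmem) -> bool:
    """Assumption taken from the sample: default = max >> 1, min = max >> 4."""
    return r.default == r.maximum >> 1 and r.minimum == r.maximum >> 4
```

Reading the live file works the same way, e.g.
parse_tipc_rmem(open("/proc/sys/net/tipc/tipc_rmem").read()), provided TIPC is
loaded so that the file exists.
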