================================
Documentation for /proc/sys/net/
================================

Copyright

Copyright (c) 1999

  - Terrehon Bowden <terrehon@pacbell.net>
  - Bodo Bauer <bb@ricochet.net>

Copyright (c) 2000

  - Jorge Nerin <comandante@zaralinux.com>

Copyright (c) 2009

  - Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in index.rst.

------------------------------------------------------------------------------

This file contains the documentation for the sysctl files in
/proc/sys/net

The interface to the networking parts of the kernel is located in
/proc/sys/net. The following table shows all possible subdirectories. You may
see only some of them, depending on your kernel's configuration.


Table : Subdirectories in /proc/sys/net

 ========= =================== = ========== ==================
 Directory Content               Directory  Content
 ========= =================== = ========== ==================
 802       E802 protocol         mptcp      Multipath TCP
 appletalk Appletalk protocol    netfilter  Network Filter
 ax25      AX25                  netrom     NET/ROM
 bridge    Bridging              rose       X.25 PLP layer
 core      General parameter     tipc       TIPC
 ethernet  Ethernet protocol     unix       Unix domain sockets
 ipv4      IP version 4          x25        X.25 protocol
 ipv6      IP version 6
 ========= =================== = ========== ==================

1. /proc/sys/net/core - Network core options
============================================

bpf_jit_enable
--------------

This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
and efficient infrastructure that allows bytecode to be executed at
various hook points. It is used in a number of Linux kernel subsystems
such as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes,
tracepoints) and security (e.g. seccomp). LLVM has a BPF back end that
can compile restricted C into a sequence of BPF instructions.
After program load through bpf(2) and passing a verifier in the kernel,
a JIT will then translate these BPF proglets into native CPU instructions.
There are two flavors of JITs, the newer eBPF JIT currently supported on:

  - x86_64
  - x86_32
  - arm64
  - arm32
  - ppc64
  - sparc64
  - mips64
  - s390x
  - riscv64
  - riscv32

And the older cBPF JIT supported on the following archs:

  - mips
  - ppc
  - sparc

eBPF JITs are a superset of cBPF JITs, meaning the kernel will
migrate cBPF instructions into eBPF instructions and then JIT
compile them transparently. Older cBPF JITs can only translate
tcpdump filters, seccomp rules, etc, but not the eBPF programs
mentioned above that are loaded through bpf(2).

Values:

  - 0 - disable the JIT (default value)
  - 1 - enable the JIT
  - 2 - enable the JIT and ask the compiler to emit traces on kernel log.

bpf_jit_harden
--------------

This enables hardening for the BPF JIT compiler. Only eBPF JIT backends
are supported. Enabling hardening costs some performance, but can
mitigate JIT spraying.

Values:

  - 0 - disable JIT hardening (default value)
  - 1 - enable JIT hardening for unprivileged users only
  - 2 - enable JIT hardening for all users

bpf_jit_kallsyms
----------------

When the BPF JIT compiler is enabled, the compiled images are at addresses
unknown to the kernel, meaning they neither show up in traces nor
in /proc/kallsyms. This enables export of these addresses, which can
be used for debugging/tracing. If bpf_jit_harden is enabled, this
feature is disabled.

Values:

  - 0 - disable JIT kallsyms export (default value)
  - 1 - enable JIT kallsyms export for privileged users only

bpf_jit_limit
-------------

This enforces a global limit for memory allocations to the BPF JIT
compiler in order to reject unprivileged JIT requests once it has
been surpassed.
bpf_jit_limit contains the value of the global limit
in bytes.

dev_weight
----------

The maximum number of packets that the kernel can handle on a NAPI interrupt.
It is a per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
aggregated packet is counted as one packet in this context.

Default: 64

dev_weight_rx_bias
------------------

RPS (e.g. RFS, aRFS) processing competes with the registered NAPI poll function
of the driver for the per softirq cycle netdev_budget. This parameter influences
the proportion of the configured netdev_budget that is spent on RPS based packet
processing during RX softirq cycles. It is further meant for making the current
dev_weight adaptable for asymmetric CPU needs on the RX/TX side of the network
stack (see dev_weight_tx_bias). It is effective on a per CPU basis. Determination
is based on dev_weight and is calculated multiplicatively
(dev_weight * dev_weight_rx_bias).

Default: 1

dev_weight_tx_bias
------------------

Scales the maximum number of packets that can be processed during a TX softirq cycle.
Effective on a per CPU basis. Allows scaling of the current dev_weight for asymmetric
net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.

Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).

Default: 1

default_qdisc
-------------

The default queuing discipline to use for network devices. This allows
overriding the default of pfifo_fast with an alternative. Since the default
queuing discipline is created without additional parameters, it is best suited
to queuing disciplines that work well without configuration like stochastic
fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
which require setting up classes and bandwidths.
Note that physical multiqueue
interfaces still use mq as root qdisc, which in turn uses this default for its
leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead
default to noqueue.

Default: pfifo_fast

busy_read
---------

Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in microseconds to busy loop waiting for packets on the device
queue. This sets the default value of the SO_BUSY_POLL socket option.
Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
which is the preferred method of enabling. If you need to enable the feature
globally via sysctl, a value of 50 is recommended.

Will increase power usage.

Default: 0 (off)

busy_poll
---------

Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in microseconds to busy loop waiting for events.
Recommended value depends on the number of sockets you poll on.
For several sockets 50, for several hundreds 100.
For more than that you probably want to use epoll.
Note that only sockets with SO_BUSY_POLL set will be busy polled,
so you want to either selectively set SO_BUSY_POLL on those sockets or set
the net.core.busy_read sysctl globally.

Will increase power usage.

Default: 0 (off)

rmem_default
------------

The default setting of the socket receive buffer in bytes.

rmem_max
--------

The maximum receive socket buffer size in bytes.

tstamp_allow_data
-----------------

Allow processes to receive tx timestamps looped together with the original
packet contents. If disabled, transmit timestamp requests from unprivileged
processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.

Default: 1 (on)


wmem_default
------------

The default setting (in bytes) of the socket send buffer.
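
As a rough illustration of how rmem_default and wmem_default relate to a
socket, the sketch below reads the two files and prints the buffer sizes that
a freshly created UDP socket reports. The helper names ``read_core_sysctl``
and ``fresh_udp_buffers`` are made up for this example, and the expectation
that an untouched UDP socket starts from these defaults is an assumption about
typical Linux behaviour rather than something this document guarantees.

```python
import socket
from pathlib import Path
from typing import Optional


def read_core_sysctl(name: str, root: str = "/proc/sys/net/core") -> Optional[int]:
    """Return the integer stored in <root>/<name>, or None if it cannot be read."""
    try:
        return int(Path(root, name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing file, unreadable file, or non-numeric content.
        return None


def fresh_udp_buffers() -> dict:
    """Report the buffer sizes of a newly created, untouched UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return {
            "SO_RCVBUF": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            "SO_SNDBUF": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        }


if __name__ == "__main__":
    # Hypothetical comparison: sysctl defaults next to what a socket reports.
    for name in ("rmem_default", "wmem_default"):
        print(name, read_core_sysctl(name))
    print(fresh_udp_buffers())
```

The ``root`` parameter exists only so the reader can be pointed at a scratch
directory when experimenting on a machine without /proc/sys/net/core.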

wmem_max
--------

The maximum send socket buffer size in bytes.

message_burst and message_cost
------------------------------

These parameters are used to limit the warning messages written to the kernel
log from the networking code. They enforce a rate limit to make a
denial-of-service attack impossible. A higher message_cost factor results in
fewer messages that will be written. message_burst controls when messages will
be dropped. The default settings limit warning messages to one every five
seconds.

warnings
--------

This sysctl is now unused.

This was used to control console messages from the networking stack that
occur because of problems on the network like duplicate address or bad
checksums.

These messages are now emitted at KERN_DEBUG and can generally be enabled
and controlled by the dynamic_debug facility.

netdev_budget
-------------

Maximum number of packets taken from all interfaces in one polling cycle (NAPI
poll). In one polling cycle interfaces which are registered to polling are
probed in a round-robin manner. Also, a polling cycle may not exceed
netdev_budget_usecs microseconds, even if netdev_budget has not been
exhausted.

netdev_budget_usecs
-------------------

Maximum number of microseconds in one NAPI polling cycle. Polling
will exit when either netdev_budget_usecs have elapsed during the
poll cycle or the number of packets processed reaches netdev_budget.

netdev_max_backlog
------------------

Maximum number of packets queued on the INPUT side when the interface
receives packets faster than the kernel can process them.

netdev_rss_key
--------------

RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
randomly generated.
Some user space might need to gather its content even if drivers do not
provide ethtool -x support yet.

::

  myhost:~# cat /proc/sys/net/core/netdev_rss_key
  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)

The file contains NUL bytes if no driver has ever called the
netdev_rss_key_fill() function.

Note:
  /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
  but most drivers only use 40 bytes of it.

::

  myhost:~# ethtool -x eth0
  RX flow hash indirection table for eth0 with 8 RX ring(s):
      0:    0     1     2     3     4     5     6     7
  RSS hash key:
  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89

netdev_tstamp_prequeue
----------------------

If set to 0, RX packet timestamps can be sampled after RPS processing, when
the target CPU processes packets. This might add some delay to timestamps, but
permits distributing the load across several CPUs.

If set to 1 (default), timestamps are sampled as soon as possible, before
queueing.

optmem_max
----------

Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
of struct cmsghdr structures with appended data.

fb_tunnels_only_for_init_net
----------------------------

Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
sit0, ip6tnl0, ip6gre0) are automatically created. There are 3 possibilities:
(a) value = 0; respective fallback tunnels are created when the module is
loaded in every net namespace (backward compatible behavior).
(b) value = 1; [kcmd value: initns] respective fallback tunnels are
created only in the init net namespace and every other net namespace will
not have them.
(c) value = 2; [kcmd value: none] fallback tunnels are not created
when a module is loaded in any net namespace. Setting the value to
"2" is pointless after boot if these modules are built-in, so there is
a kernel command-line option that can change this default.
Please refer to
Documentation/admin-guide/kernel-parameters.txt for additional details.

Not creating fallback tunnels lets userspace create only what is needed
and avoids creating redundant devices.

Default: 0 (for compatibility reasons)

devconf_inherit_init_net
------------------------

Controls if a new network namespace should inherit all current
settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
default, we keep the current behavior: for IPv4 we inherit all current
settings from init_net and for IPv6 we reset all settings to default.

If set to 1, both IPv4 and IPv6 settings are forced to inherit from
current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
forced to reset to their default values. If set to 3, both IPv4 and IPv6
settings are forced to inherit from current ones in the netns where this
new netns has been created.

Default: 0 (for compatibility reasons)

2. /proc/sys/net/unix - Parameters for Unix domain sockets
==========================================================

There is only one file in this directory.
unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain
socket's buffer. It will not take effect unless the PF_UNIX flag is specified.


3. /proc/sys/net/ipv4 - IPv4 settings
=====================================

Please see: Documentation/networking/ip-sysctl.rst and
Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.


4. Appletalk
============

The /proc/sys/net/appletalk directory holds the Appletalk configuration data
when Appletalk is loaded. The configurable parameters are:

aarp-expiry-time
----------------

The amount of time we keep an ARP entry before expiring it. Used to age out
old hosts.

aarp-resolve-time
-----------------

The amount of time we will spend trying to resolve an Appletalk address.

aarp-retransmit-limit
---------------------

The number of times we will retransmit a query before giving up.

aarp-tick-time
--------------

Controls the rate at which expires are checked.

The directory /proc/net/appletalk holds the list of active Appletalk sockets
on a machine.

The fields indicate the DDP type, the local address (in network:node format),
the remote address, the size of the transmit pending queue, the size of the
received queue (bytes waiting for applications to read), the state and the uid
owning the socket.

/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
shows the name of the interface, its Appletalk address, the network range on
that address (or network number for phase 1 networks), and the status of the
interface.

/proc/net/atalk_route lists each known network route. It lists the target
(network) that the route leads to, the router (may be directly connected), the
route flags, and the device the route is using.

5. TIPC
=======

tipc_rmem
---------

The TIPC protocol now has a tunable for the receive memory, similar to
tcp_rmem, i.e. a vector of 3 INTEGERs: (min, default, max)

::

    # cat /proc/sys/net/tipc/tipc_rmem
    4252725 34021800        68043600
    #

The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
are scaled (shifted) versions of that same value. Note that the min value
is not at this point in time used in any meaningful way, but the triplet is
preserved in order to be consistent with things like tcp_rmem.

named_timeout
-------------

TIPC name table updates are distributed asynchronously in a cluster, without
any form of transaction handling. This means that different race scenarios are
possible.
One such scenario is that a name withdrawal sent out by one node and received
by another node may arrive after a second, overlapping name publication has
already been accepted from a third node, although the conflicting updates
may originally have been issued in the correct sequential order.
If named_timeout is nonzero, failed topology updates will be placed on a defer
queue until another event arrives that clears the error, or until the timeout
expires. Value is in milliseconds.
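
Returning to the tipc_rmem triplet described earlier in this section, the
sketch below parses the (min, default, max) vector and checks how the sample
values shown there relate to each other. The names ``TipcRmem`` and
``parse_tipc_rmem`` are invented for this example, and the shift amounts are
only what the sample output happens to show, not a documented guarantee.

```python
from typing import NamedTuple


class TipcRmem(NamedTuple):
    """The (min, default, max) receive memory triplet, in that order."""
    minimum: int
    default: int
    maximum: int


def parse_tipc_rmem(text: str) -> TipcRmem:
    """Parse the three whitespace-separated integers found in tipc_rmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 integers, got {len(fields)}")
    return TipcRmem(*(int(field) for field in fields))


# Sample content copied from the tipc_rmem example in this document.
sample = "4252725 34021800        68043600\n"
rmem = parse_tipc_rmem(sample)

# In this sample, default is max shifted right by 1 and min by 4.
print(rmem.maximum // rmem.default, rmem.maximum // rmem.minimum)  # 2 16
```

On a live system the same function could be fed the text read from
/proc/sys/net/tipc/tipc_rmem, which exists only when TIPC is available.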