------------------------------------------------------------------------------
                       T H E  /proc   F I L E S Y S T E M
------------------------------------------------------------------------------
/proc/sys         Terrehon Bowden <terrehon@pacbell.net>        October 7 1999
                  Bodo Bauer <bb@ricochet.net>

2.4.x update      Jorge Nerin <comandante@zaralinux.com>      November 14 2000
move /proc/sys    Shen Feng <shen@cn.fujitsu.com>                 April 1 2009
------------------------------------------------------------------------------
Version 1.3                                              Kernel version 2.2.12
                                              Kernel version 2.4.0-test11-pre4
------------------------------------------------------------------------------
fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>       June 9 2009

Table of Contents
-----------------

  0     Preface
  0.1   Introduction/Credits
  0.2   Legal Stuff

  1     Collecting System Information
  1.1   Process-Specific Subdirectories
  1.2   Kernel data
  1.3   IDE devices in /proc/ide
  1.4   Networking info in /proc/net
  1.5   SCSI info
  1.6   Parallel port info in /proc/parport
  1.7   TTY info in /proc/tty
  1.8   Miscellaneous kernel statistics in /proc/stat
  1.9   Ext4 file system parameters

  2     Modifying System Parameters

  3     Per-Process Parameters
  3.1   /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
        score
  3.2   /proc/<pid>/oom_score - Display current oom-killer score
  3.3   /proc/<pid>/io - Display the IO accounting fields
  3.4   /proc/<pid>/coredump_filter - Core dump filtering settings
  3.5   /proc/<pid>/mountinfo - Information about mounts
  3.6   /proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm

  4     Configuring procfs
  4.1   Mount options

------------------------------------------------------------------------------
Preface
------------------------------------------------------------------------------

0.1 Introduction/Credits
------------------------

This documentation is part of a soon (or so we hope) to be released book on the
SuSE Linux distribution. As there is no complete documentation for the
/proc file system and we've used many freely available sources to write these
chapters, it seems only fair to give the work back to the Linux community.
This work is based on the 2.2.* kernel version and the upcoming 2.4.*. I'm
afraid it's still far from complete, but we hope it will be useful. As far as
we know, it is the first 'all-in-one' document about the /proc file system. It
is focused on the Intel x86 hardware, so if you are looking for PPC, ARM,
SPARC, AXP, etc., features, you probably won't find what you are looking for.
It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But
additions and patches are welcome and will be added to this document if you
mail them to Bodo.

We'd like to thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of
other people for help compiling this documentation. We'd also like to extend a
special thank you to Andi Kleen for documentation, which we relied on heavily
to create this document, as well as the additional information he provided.
Thanks to everybody else who contributed source or docs to the Linux kernel
and helped create a great piece of software... :)

If you have any comments, corrections or additions, please don't hesitate to
contact Bodo Bauer at bb@ricochet.net. We'll be happy to add them to this
document.

The latest version of this document is available online at
http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html

If the above address does not work for you, you could try the kernel
mailing list at linux-kernel@vger.kernel.org and/or try to reach me at
comandante@zaralinux.com.

0.2 Legal Stuff
---------------

We don't guarantee the correctness of this document, and if you come to us
complaining about how you screwed up your system because of incorrect
documentation, we won't feel responsible...

------------------------------------------------------------------------------
CHAPTER 1: COLLECTING SYSTEM INFORMATION
------------------------------------------------------------------------------

------------------------------------------------------------------------------
In This Chapter
------------------------------------------------------------------------------
* Investigating the properties of the pseudo file system /proc and its
  ability to provide information on the running Linux system
* Examining /proc's structure
* Uncovering various information about the kernel and the processes running
  on the system
------------------------------------------------------------------------------


The proc file system acts as an interface to internal data structures in the
kernel. It can be used to obtain information about the system and to change
certain kernel parameters at runtime (sysctl).

First, we'll take a look at the read-only parts of /proc. In Chapter 2, we
show you how you can use /proc/sys to change settings.

1.1 Process-Specific Subdirectories
-----------------------------------

The directory /proc contains (among other things) one subdirectory for each
process running on the system, which is named after the process ID (PID).

The link self points to the process reading the file system. Each process
subdirectory has the entries listed in Table 1-1.


Table 1-1: Process specific entries in /proc
..............................................................................
 File         Content
 clear_refs   Clears page referenced bits shown in smaps output
 cmdline      Command line arguments
 cpu          Current and last cpu in which it was executed   (2.4)(smp)
 cwd          Link to the current working directory
 environ      Values of environment variables
 exe          Link to the executable of this process
 fd           Directory, which contains all file descriptors
 maps         Memory maps to executables and library files    (2.4)
 mem          Memory held by this process
 root         Link to the root directory of this process
 stat         Process status
 statm        Process memory status information
 status       Process status in human readable form
 wchan        If CONFIG_KALLSYMS is set, a pre-decoded wchan
 pagemap      Page table
 stack        Report full stack trace, enable via CONFIG_STACKTRACE
 smaps        an extension based on maps, showing the memory consumption of
              each mapping
..............................................................................

For example, to get the status information of a process, all you have to do is
read the file /proc/PID/status:

  >cat /proc/self/status
  Name:   cat
  State:  R (running)
  Tgid:   5452
  Pid:    5452
  PPid:   743
  TracerPid:      0                                             (2.4)
  Uid:    501     501     501     501
  Gid:    100     100     100     100
  FDSize: 256
  Groups: 100 14 16
  VmPeak:     5004 kB
  VmSize:     5004 kB
  VmLck:         0 kB
  VmHWM:       476 kB
  VmRSS:       476 kB
  VmData:      156 kB
  VmStk:        88 kB
  VmExe:        68 kB
  VmLib:      1412 kB
  VmPTE:        20 kB
  VmSwap:        0 kB
  Threads:        1
  SigQ:   0/28578
  SigPnd: 0000000000000000
  ShdPnd: 0000000000000000
  SigBlk: 0000000000000000
  SigIgn: 0000000000000000
  SigCgt: 0000000000000000
  CapInh: 00000000fffffeff
  CapPrm: 0000000000000000
  CapEff: 0000000000000000
  CapBnd: ffffffffffffffff
  voluntary_ctxt_switches:        0
  nonvoluntary_ctxt_switches:     1

This shows you nearly the same information you would get if you viewed it with
the ps command.
In fact, ps uses the proc file system to obtain its
information. But you get a more detailed view of the process by reading the
file /proc/PID/status. Its fields are described in Table 1-2.

The statm file contains more detailed information about the process
memory usage. Its seven fields are explained in Table 1-3. The stat file
contains detailed information about the process itself. Its fields are
explained in Table 1-4.

(for SMP CONFIG users)
To make accounting scalable, RSS-related information is handled
asynchronously and the value may not be very precise. To see a precise
snapshot of a moment, you can read the /proc/<pid>/smaps file, which scans the
page table. It's slow but very precise.

Table 1-2: Contents of the status files (as of 2.6.30-rc7)
..............................................................................
 Field                       Content
 Name                        filename of the executable
 State                       state (R is running, S is sleeping, D is sleeping
                             in an uninterruptible wait, Z is zombie,
                             T is traced or stopped)
 Tgid                        thread group ID
 Pid                         process id
 PPid                        process id of the parent process
 TracerPid                   PID of process tracing this process (0 if not)
 Uid                         Real, effective, saved set, and file system UIDs
 Gid                         Real, effective, saved set, and file system GIDs
 FDSize                      number of file descriptor slots currently allocated
 Groups                      supplementary group list
 VmPeak                      peak virtual memory size
 VmSize                      total program size
 VmLck                       locked memory size
 VmHWM                       peak resident set size ("high water mark")
 VmRSS                       size of memory portions
 VmData                      size of private data segments
 VmStk                       size of stack segments
 VmExe                       size of text segment
 VmLib                       size of shared library code
 VmPTE                       size of page table entries
 VmSwap                      size of swap usage (the number of referred swapents)
 Threads                     number of threads
 SigQ                        number of signals queued/max.
                             number for queue
 SigPnd                      bitmap of pending signals for the thread
 ShdPnd                      bitmap of shared pending signals for the process
 SigBlk                      bitmap of blocked signals
 SigIgn                      bitmap of ignored signals
 SigCgt                      bitmap of caught signals
 CapInh                      bitmap of inheritable capabilities
 CapPrm                      bitmap of permitted capabilities
 CapEff                      bitmap of effective capabilities
 CapBnd                      bitmap of capabilities bounding set
 Cpus_allowed                mask of CPUs on which this process may run
 Cpus_allowed_list           Same as previous, but in "list format"
 Mems_allowed                mask of memory nodes allowed to this process
 Mems_allowed_list           Same as previous, but in "list format"
 voluntary_ctxt_switches     number of voluntary context switches
 nonvoluntary_ctxt_switches  number of non-voluntary context switches
..............................................................................

Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
..............................................................................
 Field    Content
 size     total program size (pages)          (same as VmSize in status)
 resident size of memory portions (pages)     (same as VmRSS in status)
 shared   number of pages that are shared     (i.e. backed by a file)
 trs      number of pages that are 'code'     (not including libs; broken,
                                               includes data segment)
 lrs      number of pages of library          (always 0 on 2.6)
 drs      number of pages of data/stack       (including libs; broken,
                                               includes library text)
 dt       number of dirty pages               (always 0 on 2.6)
..............................................................................


Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
..............................................................................
 Field          Content
  pid           process id
  tcomm         filename of the executable
  state         state (R is running, S is sleeping, D is sleeping in an
                uninterruptible wait, Z is zombie, T is traced or stopped)
  ppid          process id of the parent process
  pgrp          pgrp of the process
  sid           session id
  tty_nr        tty the process uses
  tty_pgrp      pgrp of the tty
  flags         task flags
  min_flt       number of minor faults
  cmin_flt      number of minor faults, including children's
  maj_flt       number of major faults
  cmaj_flt      number of major faults, including children's
  utime         user mode jiffies
  stime         kernel mode jiffies
  cutime        user mode jiffies, including children's
  cstime        kernel mode jiffies, including children's
  priority      priority level
  nice          nice level
  num_threads   number of threads
  it_real_value (obsolete, always 0)
  start_time    time the process started after system boot
  vsize         virtual memory size
  rss           resident set memory size
  rsslim        current limit in bytes on the rss
  start_code    address above which program text can run
  end_code      address below which program text can run
  start_stack   address of the start of the main process stack
  esp           current value of ESP
  eip           current value of EIP
  pending       bitmap of pending signals
  blocked       bitmap of blocked signals
  sigign        bitmap of ignored signals
  sigcatch      bitmap of caught signals
  wchan         address where process went to sleep
  0             (place holder)
  0             (place holder)
  exit_signal   signal to send to parent thread on exit
  task_cpu      which CPU the task is scheduled on
  rt_priority   realtime priority
  policy        scheduling policy (man sched_setscheduler)
  blkio_ticks   time spent waiting for block IO
  gtime         guest time of the task in jiffies
  cgtime        guest time of the task children in jiffies
  start_data    address above which program data+bss is placed
  end_data      address below which program data+bss is placed
  start_brk     address above which program heap can be expanded with brk()
..............................................................................

The /proc/PID/maps file contains the currently mapped memory regions and
their access permissions.

The format is:

address           perms offset  dev   inode      pathname

08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
a7cb2000-a7eb2000 rw-p 00000000 00:00 0
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0          [stack:1001]
a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
a800b000-a800e000 rw-p 00000000 00:00 0
a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
a8024000-a8027000 rw-p 00000000 00:00 0
a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
aff35000-aff4a000 rw-p 00000000 00:00 0          [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]

where "address" is the address space in the process that it occupies, "perms"
is a set of permissions:

 r = read
 w = write
 x = execute
 s = shared
 p = private (copy on write)

"offset" is the offset into the mapping, "dev" is the device (major:minor), and
"inode" is the inode on that device. 0 indicates that no inode is associated
with the memory region, as the case would be with BSS (uninitialized data).
The "pathname" shows the name of the file associated with this mapping.
If the mapping is not associated with a file:

 [heap]                   = the heap of the program
 [stack]                  = the stack of the main process
 [stack:1001]             = the stack of the thread with tid 1001
 [vdso]                   = the "virtual dynamic shared object",
                            the kernel system call handler
 [anon:<name>]            = an anonymous mapping that has been
                            named by userspace

 or if empty, the mapping is anonymous.

The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint
of the individual tasks of a process. In this file you will see a mapping marked
as [stack] if that task sees it as a stack. This is a key difference from the
content of /proc/PID/maps, where you will see all mappings that are being used
as stack by all of those tasks. Hence, for the example above, the task-level
map, i.e. /proc/PID/task/TID/maps for thread 1001 will look like this:

08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
a7cb2000-a7eb2000 rw-p 00000000 00:00 0
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0          [stack]
a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
a800b000-a800e000 rw-p 00000000 00:00 0
a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
a8024000-a8027000 rw-p 00000000 00:00 0
a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
aff35000-aff4a000 rw-p 00000000 00:00 0
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
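
As a rough illustration of the maps format described above, the following
Python sketch splits one line into its six fields. The names MapEntry,
parse_maps_line, region_size and is_private are made up for this example and
are not part of any kernel or library interface. The sketch assumes the line
follows the "address perms offset dev inode pathname" layout shown above.

```python
from typing import NamedTuple, Optional


class MapEntry(NamedTuple):
    """One line of /proc/PID/maps (hypothetical helper type)."""
    start: int               # first address of the region
    end: int                 # address just past the end of the region
    perms: str               # four characters, e.g. "r-xp"
    offset: int              # offset into the mapping
    dev: str                 # device as "major:minor"
    inode: int               # 0 when no inode is associated
    pathname: Optional[str]  # None when the field is empty (anonymous)


def parse_maps_line(line: str) -> MapEntry:
    # The first five fields never contain whitespace, but the pathname
    # may, so split at most five times and keep the remainder intact.
    parts = line.split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    start, end = (int(x, 16) for x in addr.split("-"))
    pathname = parts[5].strip() if len(parts) > 5 else None
    return MapEntry(start, end, perms, int(offset, 16), dev, int(inode),
                    pathname or None)


def region_size(entry: MapEntry) -> int:
    """Size of the mapped region in bytes."""
    return entry.end - entry.start


def is_private(entry: MapEntry) -> bool:
    """True for 'p' (private, copy on write), False for 's' (shared)."""
    return entry.perms[3] == "p"


if __name__ == "__main__":
    # Two lines taken from the example listing above.
    for text in (
        "08048000-08049000 r-xp 00000000 03:00 8312       /opt/test",
        "a7cb1000-a7cb2000 ---p 00000000 00:00 0",
    ):
        entry = parse_maps_line(text)
        print(entry.pathname, region_size(entry), is_private(entry))
```

On a live system the same function could be applied to every line read from
/proc/self/maps; the sample strings are used here so that the sketch does not
depend on the machine it runs on.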

The /proc/PID/smaps is an extension based on maps, showing the memory
consumption for each of the process's mappings. For each mapping there
is a series of lines such as the following:

08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
Size:               1084 kB
Rss:                 892 kB
Pss:                 374 kB
Shared_Clean:        892 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:          892 kB
Anonymous:             0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:              374 kB
Name:           name from userspace

The first of these lines shows the same information as is displayed for the
mapping in /proc/PID/maps. The remaining lines show the size of the mapping
(size), the amount of the mapping that is currently resident in RAM (RSS), the
process' proportional share of this mapping (PSS), and the number of clean and
dirty shared and private pages in the mapping.

The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing it.
So if a process has 1000 pages all to itself, and 1000 shared with one other
process, its PSS will be 1500.
Note that even a page which is part of a MAP_SHARED mapping, but has only
a single pte mapped, i.e. is currently used by only one process, is accounted
as private and not as shared.
"Referenced" indicates the amount of memory currently marked as referenced or
accessed.
"Anonymous" shows the amount of memory that does not belong to any file. Even
a mapping associated with a file may contain anonymous pages: when the mapping
is MAP_PRIVATE and a page is modified, the file page is replaced by a private
anonymous copy.
"Swap" shows how much would-be-anonymous memory is also used, but out on
swap.
"SwapPss" shows proportional swap share of this mapping.
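
The PSS arithmetic above, and the way a per-process total can be obtained by
adding up one field over all mappings, can be sketched in Python as follows.
The function names pss_pages and sum_smaps_field are invented for this
example, and the sample text is the /bin/bash entry shown above. The sketch
assumes the requested field is one of the "kB" fields, not "Name".

```python
def pss_pages(private_pages, shared):
    """Proportional set size in pages.

    `shared` is an iterable of (pages, sharers) pairs, where `sharers`
    counts every process mapping those pages, including this one.
    """
    return private_pages + sum(pages / sharers for pages, sharers in shared)


def sum_smaps_field(smaps_text, field):
    """Add up one kB field (e.g. "Pss" or "Rss") over all mappings."""
    total_kb = 0
    for line in smaps_text.splitlines():
        key, sep, rest = line.partition(":")
        # The mapping header line also contains ':' (in the dev column),
        # but its key never equals a field name, so it is skipped.
        if sep and key == field:
            total_kb += int(rest.split()[0])
    return total_kb


SAMPLE = """\
08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
Size:               1084 kB
Rss:                 892 kB
Pss:                 374 kB
Shared_Clean:        892 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
"""

if __name__ == "__main__":
    # 1000 private pages plus 1000 pages shared with one other process.
    print(pss_pages(1000, [(1000, 2)]))
    print(sum_smaps_field(SAMPLE, "Pss"), "kB")
```

With real data, smaps_text would be the contents of /proc/PID/smaps, which
holds one such block per mapping.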

The "Name" field will only be present on a mapping that has been named by
userspace, and will show the name passed in by userspace.

This file is only present if the CONFIG_MMU kernel configuration option is
enabled.

The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
bits on both physical and virtual pages associated with a process.
To clear the bits for all the pages associated with the process
    > echo 1 > /proc/PID/clear_refs

To clear the bits for the anonymous pages associated with the process
    > echo 2 > /proc/PID/clear_refs

To clear the bits for the file mapped pages associated with the process
    > echo 3 > /proc/PID/clear_refs

To reset the peak resident set size ("high water mark") to the process's
current value:
    > echo 5 > /proc/PID/clear_refs

Any other value written to /proc/PID/clear_refs will have no effect.

The /proc/PID/pagemap gives the PFN, which can be used to find the pageflags
using /proc/kpageflags and number of times a page is mapped using
/proc/kpagecount. For detailed explanation, see Documentation/vm/pagemap.txt.

1.2 Kernel data
---------------

Similar to the process entries, the kernel data files give information about
the running kernel. The files used to obtain this information are contained in
/proc and are listed in Table 1-5. Not all of these will be present in your
system. Which files are there, and which are missing, depends on the kernel
configuration and the loaded modules.

Table 1-5: Kernel info in /proc
..............................................................................
 File         Content
 apm          Advanced power management info
 buddyinfo    Kernel memory allocator information (see text)    (2.5)
 bus          Directory containing bus specific information
 cmdline      Kernel command line
 cpuinfo      Info about the CPU
 devices      Available devices (block and character)
 dma          Used DMA channels
 filesystems  Supported filesystems
 driver       Various drivers grouped here, currently rtc       (2.4)
 execdomains  Execdomains, related to security                  (2.4)
 fb           Frame Buffer devices                              (2.4)
 fs           File system parameters, currently nfs/exports     (2.4)
 ide          Directory containing info about the IDE subsystem
 interrupts   Interrupt usage
 iomem        Memory map                                        (2.4)
 ioports      I/O port usage
 irq          Masks for irq to cpu affinity                     (2.4)(smp?)
 isapnp       ISA PnP (Plug&Play) Info                          (2.4)
 kcore        Kernel core image (can be ELF or A.OUT(deprecated in 2.4))
 kmsg         Kernel messages
 ksyms        Kernel symbol table
 loadavg      Load average of last 1, 5 & 15 minutes
 locks        Kernel locks
 meminfo      Memory info
 misc         Miscellaneous
 modules      List of loaded modules
 mounts       Mounted filesystems
 net          Networking info (see text)
 pagetypeinfo Additional page allocator information (see text)  (2.5)
 partitions   Table of partitions known to the system
 pci          Deprecated info of PCI bus (new way -> /proc/bus/pci/,
              decoupled by lspci)                               (2.4)
 rtc          Real time clock
 scsi         SCSI info (see text)
 slabinfo     Slab pool info
 softirqs     softirq usage
 stat         Overall statistics
 swaps        Swap space utilization
 sys          See chapter 2
 sysvipc      Info of SysVIPC Resources (msg, sem, shm)         (2.4)
 tty          Info of tty drivers
 uptime       System uptime
 version      Kernel version
 video        bttv info of video resources                      (2.4)
 vmallocinfo  Show vmalloced areas
..............................................................................

You can, for example, check which interrupts are currently in use and what
they are used for by looking in the file /proc/interrupts:

  > cat /proc/interrupts
             CPU0
    0:    8728810          XT-PIC  timer
    1:        895          XT-PIC  keyboard
    2:          0          XT-PIC  cascade
    3:     531695          XT-PIC  aha152x
    4:    2014133          XT-PIC  serial
    5:      44401          XT-PIC  pcnet_cs
    8:          2          XT-PIC  rtc
   11:          8          XT-PIC  i82365
   12:     182918          XT-PIC  PS/2 Mouse
   13:          1          XT-PIC  fpu
   14:    1232265          XT-PIC  ide0
   15:          7          XT-PIC  ide1
  NMI:          0

In 2.4.* a couple of lines were added to this file, LOC & ERR (this time the
output is from an SMP machine):

  > cat /proc/interrupts

             CPU0       CPU1
    0:    1243498    1214548    IO-APIC-edge  timer
    1:       8949       8958    IO-APIC-edge  keyboard
    2:          0          0          XT-PIC  cascade
    5:      11286      10161    IO-APIC-edge  soundblaster
    8:          1          0    IO-APIC-edge  rtc
    9:      27422      27407    IO-APIC-edge  3c503
   12:     113645     113873    IO-APIC-edge  PS/2 Mouse
   13:          0          0          XT-PIC  fpu
   14:      22491      24012    IO-APIC-edge  ide0
   15:       2183       2415    IO-APIC-edge  ide1
   17:      30564      30414   IO-APIC-level  eth0
   18:        177        164   IO-APIC-level  bttv
  NMI:    2457961    2457959
  LOC:    2457882    2457881
  ERR:       2155

NMI is incremented in this case because every timer interrupt generates an NMI
(Non Maskable Interrupt) which is used by the NMI Watchdog to detect lockups.

LOC is the local interrupt counter of the internal APIC of every CPU.

ERR is incremented in the case of errors in the IO-APIC bus (the bus that
connects the CPUs in an SMP system). This means that an error has been
detected; the IO-APIC automatically retries the transmission, so it should not
be a big problem, but you should read the SMP-FAQ.

In 2.6.2* /proc/interrupts was expanded again. This time the goal was for
/proc/interrupts to display every IRQ vector in use by the system, not
just those considered 'most important'.
The new vectors are:

  THR -- interrupt raised when a machine check threshold counter
  (typically counting ECC corrected errors of memory or cache) exceeds
  a configurable threshold. Only available on some systems.

  TRM -- a thermal event interrupt occurs when a temperature threshold
  has been exceeded for the CPU. This interrupt may also be generated
  when the temperature drops back to normal.

  SPU -- a spurious interrupt is some interrupt that was raised then lowered
  by some IO device before it could be fully processed by the APIC. Hence
  the APIC sees the interrupt but does not know what device it came from.
  For this case the APIC will generate the interrupt with an IRQ vector
  of 0xff. This might also be generated by chipset bugs.

  RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are
  sent from one CPU to another per the needs of the OS. Typically,
  their statistics are used by kernel developers and interested users to
  determine the occurrence of interrupts of the given type.

The above IRQ vectors are displayed only when relevant. For example,
the threshold vector does not exist on x86_64 platforms. Others are
suppressed when the system is a uniprocessor. As of this writing, only
i386 and x86_64 platforms support the new IRQ vector displays.

Of some interest is the introduction of the /proc/irq directory to 2.4.
It can be used to set IRQ to CPU affinity. This means that you can "hook" an
IRQ to only one CPU, or exclude a CPU from handling IRQs. The irq subdir
contains one subdir for each IRQ, and two files: default_smp_affinity and
prof_cpu_mask.

For example
  > ls /proc/irq/
  0  10  12  14  16  18  2  4  6  8  prof_cpu_mask
  1  11  13  15  17  19  3  5  7  9  default_smp_affinity
  > ls /proc/irq/0/
  smp_affinity

smp_affinity is a bitmask in which you can specify which CPUs can handle the
IRQ. You can set it by doing:

  > echo 1 > /proc/irq/10/smp_affinity

This means that only the first CPU will handle the IRQ, but you can also echo
5, which means that only the first and third CPU can handle the IRQ.

The contents of each smp_affinity file is the same by default:

  > cat /proc/irq/0/smp_affinity
  ffffffff

There is an alternate interface, smp_affinity_list, which allows specifying
a cpu range instead of a bitmask:

  > cat /proc/irq/0/smp_affinity_list
  1024-1031

The default_smp_affinity mask applies to all non-active IRQs, which are the
IRQs which have not yet been allocated/activated, and hence which lack a
/proc/irq/[0-9]* directory.

The node file on an SMP system shows the node to which the device using the IRQ
reports itself as being attached. This hardware locality information does not
include information about any possible driver locality preference.

prof_cpu_mask specifies which CPUs are to be profiled by the system wide
profiler. Default value is ffffffff (all cpus if there are only 32 of them).

The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
between all the CPUs which are allowed to handle it. As usual the kernel has
more info than you and does a better job than you, so the defaults are the
best choice for almost everyone. [Note this applies only to those IO-APICs
that support "Round Robin" interrupt distribution.]

There are three more important subdirectories in /proc: net, scsi, and sys.
The general rule is that the contents, or even the existence of these
directories, depend on your kernel configuration.
If SCSI is not enabled, the
directory scsi may not exist. The same is true with the net, which is there
only when networking support is present in the running kernel.

The slabinfo file gives information about memory usage at the slab level.
Linux uses slab pools for memory management above page level in version 2.2.
Commonly used objects have their own slab pool (such as network buffers,
directory cache, and so on).

..............................................................................

> cat /proc/buddyinfo

Node 0, zone      DMA      0      4      5      4      4      3 ...
Node 0, zone   Normal      1      0      0      1    101      8 ...
Node 0, zone  HighMem      2      0      0      1      1      0 ...

External fragmentation is a problem under some workloads, and buddyinfo is a
useful tool for helping diagnose these problems. Buddyinfo will give you a
clue as to how big an area you can safely allocate, or why a previous
allocation failed.

Each column represents the number of pages of a certain order which are
available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
available in ZONE_NORMAL, etc...

More information relevant to external fragmentation can be found in
pagetypeinfo.
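
To make the column arithmetic of buddyinfo concrete, here is a small Python
sketch. The helper names parse_buddyinfo_line, free_pages and
largest_free_order are invented for this example. Because the sample output
above is truncated with "...", the sketch only uses the six columns (orders 0
to 5) that are actually shown, so its totals are lower bounds for that
machine.

```python
def parse_buddyinfo_line(line):
    """Split 'Node N, zone NAME c0 c1 ...' into (node, zone, counts)."""
    head, _, tail = line.partition("zone")
    node = int(head.replace("Node", "").replace(",", "").strip())
    fields = tail.split()
    zone = fields[0]
    counts = [int(c) for c in fields[1:]]  # counts[i] = free chunks of order i
    return node, zone, counts


def free_pages(counts):
    """Total free pages: each chunk of order i holds 2^i pages."""
    return sum(n << order for order, n in enumerate(counts))


def largest_free_order(counts):
    """Highest order with at least one free chunk, or None if all are 0."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return order
    return None


if __name__ == "__main__":
    # Only the columns visible in the truncated sample above are used.
    for text in (
        "Node 0, zone      DMA      0      4      5      4      4      3",
        "Node 0, zone   Normal      1      0      0      1    101      8",
        "Node 0, zone  HighMem      2      0      0      1      1      0",
    ):
        node, zone, counts = parse_buddyinfo_line(text)
        print(node, zone, free_pages(counts), largest_free_order(counts))
```

The largest order with a non-zero count gives a hint about the biggest
contiguous area that could currently be allocated from that zone.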

> cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      1      1      0
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      1      1      2      1      2      1      1      0      1      0      2
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable    103     54     77      1      1      1     11      8      7      1      9
Node    0, zone    DMA32, type  Reclaimable      0      0      2      1      0      0      0      0      1      0      0
Node    0, zone    DMA32, type      Movable    169    152    113     91     77     54     39     13      6      1    452
Node    0, zone    DMA32, type      Reserve      1      2      2      2      2      0      1      1      1      1      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
Node 0, zone      DMA            2            0            5            1            0
Node 0, zone    DMA32           41            6          967            2            0

Fragmentation avoidance in the kernel works by grouping pages of different
migrate types into the same contiguous regions of memory called page blocks.
A page block is typically the size of the default hugepage size e.g. 2MB on
X86-64. By keeping pages grouped based on their ability to move, the kernel
can reclaim pages within a page block to satisfy a high-order allocation.

The pagetypeinfo begins with information on the size of a page block. It
then gives the same type of information as buddyinfo except broken down
by migrate-type and finishes with details on how many page blocks of each
type exist.

If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
from libhugetlbfs http://sourceforge.net/projects/libhugetlbfs/), one can
make an estimate of the likely number of huge pages that can be allocated
at a given point in time. All the "Movable" blocks should be allocatable
unless memory has been mlock()'d.
Some of the Reclaimable blocks should
also be allocatable although a lot of filesystem metadata may have to be
reclaimed to achieve this.

..............................................................................

meminfo:

Provides information about distribution and utilization of memory. This
varies by architecture and compile options. The following is from a
16GB PIII, which has highmem enabled. You may not have all of these fields.

(The "Locked" field in /proc/PID/smaps indicates whether the mapping is locked
in memory or not.)

> cat /proc/meminfo

MemTotal:     16344972 kB
MemFree:      13634064 kB
Buffers:          3656 kB
Cached:        1195708 kB
SwapCached:          0 kB
Active:         891636 kB
Inactive:      1077224 kB
HighTotal:    15597528 kB
HighFree:     13629632 kB
LowTotal:       747444 kB
LowFree:          4432 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:             968 kB
Writeback:           0 kB
AnonPages:      861800 kB
Mapped:         280372 kB
Slab:           284364 kB
SReclaimable:   159856 kB
SUnreclaim:     124508 kB
PageTables:      24448 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   7669796 kB
Committed_AS:   100056 kB
VmallocTotal:   112216 kB
VmallocUsed:       428 kB
VmallocChunk:   111088 kB

    MemTotal: Total usable ram (i.e. physical ram minus a few reserved
              bits and the kernel binary code)
     MemFree: The sum of LowFree+HighFree
     Buffers: Relatively temporary storage for raw disk blocks;
              shouldn't get tremendously large (20MB or so)
      Cached: in-memory cache for files read from the disk (the
              pagecache).  Doesn't include SwapCached
  SwapCached: Memory that once was swapped out, is swapped back in but
              still also is in the swapfile (if memory is needed it
              doesn't need to be swapped out AGAIN because it is already
              in the swapfile. This saves I/O)
      Active: Memory that has been used more recently and usually not
              reclaimed unless absolutely necessary.
    Inactive: Memory which has been less recently used.  It is more
              eligible to be reclaimed for other purposes
   HighTotal:
    HighFree: Highmem is all memory above ~860MB of physical memory.
              Highmem areas are for use by userspace programs, or
              for the pagecache.  The kernel must use tricks to access
              this memory, making it slower to access than lowmem.
    LowTotal:
     LowFree: Lowmem is memory which can be used for everything that
              highmem can be used for, but it is also available for the
              kernel's use for its own data structures.  Among many
              other things, it is where everything from the Slab is
              allocated.  Bad things happen when you're out of lowmem.
   SwapTotal: total amount of swap space available
    SwapFree: amount of swap space that is currently unused
       Dirty: Memory which is waiting to get written back to the disk
   Writeback: Memory which is actively being written back to the disk
   AnonPages: Non-file backed pages mapped into userspace page tables
      Mapped: files which have been mmaped, such as libraries
        Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
  SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
  PageTables: amount of memory dedicated to the lowest level of page
              tables.
NFS_Unstable: NFS pages sent to the server, but not yet committed to stable
              storage
      Bounce: Memory used for block device "bounce buffers"
WritebackTmp: Memory used by FUSE for temporary writeback buffers
 CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
              this is the total amount of memory currently available to
              be allocated on the system. This limit is only adhered to
              if strict overcommit accounting is enabled (mode 2 in
              'vm.overcommit_memory').
              The CommitLimit is calculated with the following formula:
              CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap
              where 'vm.overcommit_ratio' is a percentage.
              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.
              For more details, see the memory overcommit documentation
              in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system.
              The committed memory is a sum of all of the memory which
              has been allocated by processes, even if it has not been
              "used" by them as of yet. A process which malloc()'s 1G
              of memory, but only touches 300M of it will only show up
              as using 300M of memory even if it has the address space
              allocated for the entire 1G. This 1G is memory which has
              been "committed" to by the VM and can be used at any time
              by the allocating application. With strict overcommit
              enabled on the system (mode 2 in 'vm.overcommit_memory'),
              allocations which would exceed the CommitLimit (detailed
              above) will not be permitted. This is useful if one needs
              to guarantee that processes will not fail due to lack of
              memory once that memory has been successfully allocated.
VmallocTotal: total size of vmalloc memory area
 VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contiguous block of vmalloc area which is free

..............................................................................

vmallocinfo:

Provides information about vmalloc()ed/vmap()ed areas.
One line per area,
containing the virtual address range of the area, size in bytes,
caller information of the creator, and optional information depending
on the kind of area:

 pages=nr    number of pages
 phys=addr   if a physical address was specified
 ioremap     I/O mapping (ioremap() and friends)
 vmalloc     vmalloc() area
 vmap        vmap()ed pages
 user        VM_USERMAP area
 vpages      buffer for page pointers was vmalloced (huge area)
 N<node>=nr  (Only on NUMA kernels)
             Number of pages allocated on memory node <node>

> cat /proc/vmallocinfo
0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
  /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
  /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
  phys=7fee8000 ioremap
0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
  phys=7fee7000 ioremap
0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
  /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
  pages=2 vmalloc N1=2
0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
  /0x130 [x_tables] pages=4 vmalloc N0=4
0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
   pages=14 vmalloc N2=14
0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
   pages=4 vmalloc N1=4
0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
   pages=2 vmalloc N1=2
0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
   pages=10 vmalloc N0=10

..............................................................................
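
As an illustration of the vmallocinfo line format described above, the
following minimal Python sketch splits one entry into its fields. It is not
part of any kernel tool: the function name parse_vmallocinfo_line and the
dictionary keys are hypothetical, and the sketch assumes that the wrapped
example lines (the "..." continuations shown above) have been joined back
into one line per area.

```python
import re

# Hypothetical helper for illustration only: parse one logical
# /proc/vmallocinfo line (continuation lines already joined).
KEYVAL = re.compile(r"^(pages|phys|N\d+)=(\S+)$")
FLAGS = {"ioremap", "vmalloc", "vmap", "user", "vpages"}

def parse_vmallocinfo_line(line):
    fields = line.split()
    # First field is the virtual address range, second the size in bytes.
    start, end = (int(x, 16) for x in fields[0].split("-"))
    entry = {"start": start, "end": end, "size": int(fields[1]),
             "caller": None, "flags": [], "pages": None, "phys": None,
             "per_node": {}}
    for tok in fields[2:]:
        m = KEYVAL.match(tok)
        if m:
            key, val = m.groups()
            if key == "pages":
                entry["pages"] = int(val)
            elif key == "phys":
                entry["phys"] = int(val, 16)   # printed in hex, no 0x prefix
            else:
                entry["per_node"][int(key[1:])] = int(val)  # N<node>=nr
        elif tok in FLAGS:
            entry["flags"].append(tok)
        elif entry["caller"] is None:
            entry["caller"] = tok              # first free-form token
    return entry

sample = ("0xffffc20000000000-0xffffc20000201000 2101248 "
          "alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc "
          "N0=128 N1=128 N2=128 N3=128")
info = parse_vmallocinfo_line(sample)
print(info["size"], info["pages"], info["flags"], info["per_node"])
```

Tokens the sketch does not recognise, such as a module name in brackets, are
ignored once the caller has been recorded.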

softirqs:

Provides counts of softirq handlers serviced since boot time, for each cpu.

> cat /proc/softirqs
                CPU0       CPU1       CPU2       CPU3
      HI:          0          0          0          0
   TIMER:      27166      27120      27097      27034
  NET_TX:          0          0          0         17
  NET_RX:         42          0          0         39
   BLOCK:          0          0        107       1121
 TASKLET:          0          0          0        290
   SCHED:      27035      26983      26971      26746
 HRTIMER:          0          0          0          0
     RCU:       1678       1769       2178       2250


1.3 IDE devices in /proc/ide
----------------------------

The subdirectory /proc/ide contains information about all IDE devices of which
the kernel is aware. There is one subdirectory for each IDE controller, the
file drivers and a link for each IDE device, pointing to the device directory
in the controller specific subtree.

The file drivers contains general information about the drivers used for the
IDE devices:

  > cat /proc/ide/drivers
  ide-cdrom version 4.53
  ide-disk version 1.08

More detailed information can be found in the controller specific
subdirectories. These are named ide0, ide1 and so on. Each of these
directories contains the files shown in table 1-6.


Table 1-6: IDE controller info in /proc/ide/ide?
..............................................................................
 File    Content
 channel IDE channel (0 or 1)
 config  Configuration (only for PCI/IDE bridge)
 mate    Mate name
 model   Type/Chipset of IDE controller
..............................................................................

Each device connected to a controller has a separate subdirectory in the
controller's directory. The files listed in table 1-7 are contained in these
directories.


Table 1-7: IDE device information
..............................................................................
 File             Content
 cache            The cache
 capacity         Capacity of the medium (in 512Byte blocks)
 driver           driver and version
 geometry         physical and logical geometry
 identify         device identify block
 media            media type
 model            device identifier
 settings         device setup
 smart_thresholds IDE disk management thresholds
 smart_values     IDE disk management values
..............................................................................

The most interesting file is settings. This file contains a nice overview of
the drive parameters:

  # cat /proc/ide/ide0/hda/settings
  name                    value           min             max             mode
  ----                    -----           ---             ---             ----
  bios_cyl                526             0               65535           rw
  bios_head               255             0               255             rw
  bios_sect               63              0               63              rw
  breada_readahead        4               0               127             rw
  bswap                   0               0               1               r
  file_readahead          72              0               2097151         rw
  io_32bit                0               0               3               rw
  keepsettings            0               0               1               rw
  max_kb_per_request      122             1               127             rw
  multcount               0               0               8               rw
  nice1                   1               0               1               rw
  nowerr                  0               0               1               rw
  pio_mode                write-only      0               255             w
  slow                    0               0               1               rw
  unmaskirq               0               0               1               rw
  using_dma               0               0               1               rw


1.4 Networking info in /proc/net
--------------------------------

The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the
additional values you get for IP version 6 if you configure the kernel to
support this. Table 1-9 lists the files and their meaning.


Table 1-8: IPv6 info in /proc/net
..............................................................................
 File       Content
 udp6       UDP sockets (IPv6)
 tcp6       TCP sockets (IPv6)
 raw6       Raw device statistics (IPv6)
 igmp6      IP multicast addresses, which this host joined (IPv6)
 if_inet6   List of IPv6 interface addresses
 ipv6_route Kernel routing table for IPv6
 rt6_stats  Global IPv6 routing tables statistics
 sockstat6  Socket statistics (IPv6)
 snmp6      SNMP data (IPv6)
..............................................................................


Table 1-9: Network info in /proc/net
..............................................................................
 File          Content
 arp           Kernel ARP table
 dev           network devices with statistics
 dev_mcast     the Layer2 multicast groups a device is listening to
               (interface index, label, number of references, number of bound
               addresses).
 dev_stat      network device status
 ip_fwchains   Firewall chain linkage
 ip_fwnames    Firewall chain names
 ip_masq       Directory containing the masquerading tables
 ip_masquerade Major masquerading table
 netstat       Network statistics
 raw           raw device statistics
 route         Kernel routing table
 rpc           Directory containing rpc info
 rt_cache      Routing cache
 snmp          SNMP data
 sockstat      Socket statistics
 tcp           TCP sockets
 tr_rif        Token ring RIF routing table
 udp           UDP sockets
 unix          UNIX domain sockets
 wireless      Wireless interface data (Wavelan etc)
 igmp          IP multicast addresses, which this host joined
 psched        Global packet scheduler parameters.
 netlink       List of PF_NETLINK sockets
 ip_mr_vifs    List of multicast virtual interfaces
 ip_mr_cache   List of multicast routing cache
..............................................................................

You can use this information to see which network devices are available in
your system and how much traffic was routed over those devices:

  > cat /proc/net/dev
  Inter-|Receive                                                   |[...
   face |bytes    packets errs drop fifo frame compressed multicast|[...
      lo:  908188   5596     0    0    0     0          0         0 [...
    ppp0:15475140  20721   410    0    0   410          0         0 [...
    eth0:  614530   7085     0    0    0     0          0         1 [...

  ...] Transmit
  ...] bytes    packets errs drop fifo colls carrier compressed
  ...]  908188     5596    0    0    0     0       0          0
  ...] 1375103    17405    0    0    0     0       0          0
  ...] 1703981     5535    0    0    0     3       0          0

In addition, each Channel Bond interface has its own directory.
For
example, the bond0 device will have a directory called /proc/net/bond0/.
It will contain information that is specific to that bond, such as the
current slaves of the bond, the link status of the slaves, and how
many times each slave's link has failed.

1.5 SCSI info
-------------

If you have a SCSI host adapter in your system, you'll find a subdirectory
named after the driver for this adapter in /proc/scsi. You'll also see a list
of all recognized SCSI devices in /proc/scsi:

  > cat /proc/scsi/scsi
  Attached devices:
  Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor: IBM      Model: DGHS09U          Rev: 03E0
    Type:   Direct-Access                    ANSI SCSI revision: 03
  Host: scsi0 Channel: 00 Id: 06 Lun: 00
    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04
    Type:   CD-ROM                           ANSI SCSI revision: 02


The directory named after the driver has one file for each adapter found in
the system. These files contain information about the controller, including
the IRQ used and the IO address range. The amount of information shown
depends on the adapter you use. The example shows the output for an Adaptec
AHA-2940 SCSI adapter:

  > cat /proc/scsi/aic7xxx/0

  Adaptec AIC7xxx driver version: 5.1.19/3.2.4
  Compile Options:
    TCQ Enabled By Default : Disabled
    AIC7XXX_PROC_STATS     : Disabled
    AIC7XXX_RESET_DELAY    : 5
  Adapter Configuration:
             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
                             Ultra Wide Controller
      PCI MMAPed I/O Base: 0xeb001000
   Adapter SEEPROM Config: SEEPROM found and used.
        Adaptec SCSI BIOS: Enabled
                      IRQ: 10
                     SCBs: Active 0, Max Active 2,
                           Allocated 15, HW 16, Page 255
               Interrupts: 160328
        BIOS Control Word: 0x18b6
     Adapter Control Word: 0x005b
     Extended Translation: Enabled
  Disconnect Enable Flags: 0xffff
       Ultra Enable Flags: 0x0001
   Tag Queue Enable Flags: 0x0000
  Ordered Queue Tag Flags: 0x0000
  Default Tag Queue Depth: 8
      Tagged Queue By Device array for aic7xxx host instance 0:
        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
      Actual queue depth per device for aic7xxx host instance 0:
        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
  Statistics:
  (scsi0:0:0:0)
    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8
    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0)
    Total transfers 160151 (74577 reads and 85574 writes)
  (scsi0:0:6:0)
    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15
    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0)
    Total transfers 0 (0 reads and 0 writes)


1.6 Parallel port info in /proc/parport
---------------------------------------

The directory /proc/parport contains information about the parallel ports of
your system. It has one subdirectory for each port, named after the port
number (0,1,2,...).

These directories contain the four files shown in Table 1-10.


Table 1-10: Files in /proc/parport
..............................................................................
 File      Content
 autoprobe Any IEEE-1284 device ID information that has been acquired.
 devices   list of the device drivers using that port. A + will appear by the
           name of the device currently using the port (it might not appear
           against any).
 hardware  Parallel port's base address, IRQ line and DMA channel.
 irq       IRQ that parport is using for that port.
           This is in a separate
           file to allow you to alter it by writing a new value in (IRQ
           number or none).
..............................................................................

1.7 TTY info in /proc/tty
-------------------------

Information about the available and actually used tty's can be found in the
directory /proc/tty. You'll find entries for drivers and line disciplines in
this directory, as shown in Table 1-11.


Table 1-11: Files in /proc/tty
..............................................................................
 File          Content
 drivers       list of drivers and their usage
 ldiscs        registered line disciplines
 driver/serial usage statistic and status of single tty lines
..............................................................................

To see which tty's are currently in use, you can simply look into the file
/proc/tty/drivers:

  > cat /proc/tty/drivers
  pty_slave            /dev/pts      136   0-255 pty:slave
  pty_master           /dev/ptm      128   0-255 pty:master
  pty_slave            /dev/ttyp       3   0-255 pty:slave
  pty_master           /dev/pty        2   0-255 pty:master
  serial               /dev/cua        5   64-67 serial:callout
  serial               /dev/ttyS       4   64-67 serial
  /dev/tty0            /dev/tty0       4       0 system:vtmaster
  /dev/ptmx            /dev/ptmx       5       2 system
  /dev/console         /dev/console    5       1 system:console
  /dev/tty             /dev/tty        5       0 system:/dev/tty
  unknown              /dev/tty        4    1-63 console


1.8 Miscellaneous kernel statistics in /proc/stat
-------------------------------------------------

Various pieces of information about kernel activity are available in the
/proc/stat file.  All of the numbers reported in this file are aggregates
since the system first booted.
For a quick look, simply cat the file:

  > cat /proc/stat
  cpu  2255 34 2290 22625563 6290 127 456 0 0
  cpu0 1132 34 1441 11311718 3675 127 438 0 0
  cpu1 1123 0 849 11313845 2614 0 18 0 0
  intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
  ctxt 1990473
  btime 1062191376
  processes 2915
  procs_running 1
  procs_blocked 0
  softirq 183433 0 21755 12 39 1137 231 21459 2263

The very first  "cpu" line aggregates the  numbers in all  of the other "cpuN"
lines.  These numbers identify the amount of time the CPU has spent performing
different kinds of work.  Time units are in USER_HZ (typically hundredths of a
second).  The meanings of the columns are as follows, from left to right:

- user: normal processes executing in user mode
- nice: niced processes executing in user mode
- system: processes executing in kernel mode
- idle: twiddling thumbs
- iowait: waiting for I/O to complete
- irq: servicing interrupts
- softirq: servicing softirqs
- steal: involuntary wait
- guest: running a normal guest
- guest_nice: running a niced guest

The "intr" line gives counts of interrupts  serviced since boot time, for each
of the  possible system interrupts.   The first  column  is the  total of  all
interrupts serviced; each  subsequent column is the  total for that particular
interrupt.

The "ctxt" line gives the total number of context switches across all CPUs.

The "btime" line gives  the time at which the  system booted, in seconds since
the Unix epoch.

The "processes" line gives the number  of processes and threads created, which
includes (but  is not limited  to) those  created by  calls to the  fork() and
clone() system calls.

The "procs_running" line gives the total number of threads that are
running or ready to run (i.e., the total number of runnable threads).
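
The "cpu" columns listed above can be turned into named fields with a few
lines of Python. This is only a minimal sketch: the names CPU_FIELDS,
parse_cpu_line and busy_fraction are hypothetical, and the code assumes that
a kernel may print fewer columns than the list names (the sample "cpu" line
above has nine values), in which case the missing trailing columns are simply
left out. It also assumes that guest time is already included in user and
nice time, so the guest columns are not added to the total again.

```python
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice")

def parse_cpu_line(line):
    """Return (label, {column: ticks}) for a 'cpu' or 'cpuN' line."""
    label, *values = line.split()
    if not label.startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    # zip() stops at the shorter sequence, so absent columns are omitted.
    return label, dict(zip(CPU_FIELDS, map(int, values)))

def busy_fraction(ticks):
    """Fraction of accounted time not spent in idle or iowait.

    Assumption: guest/guest_nice overlap user/nice, so only the first
    eight columns contribute to the total.
    """
    total = sum(ticks.get(name, 0) for name in CPU_FIELDS[:8])
    idle = ticks["idle"] + ticks.get("iowait", 0)
    return (total - idle) / total

# Sample line taken from the /proc/stat output shown above.
label, ticks = parse_cpu_line("cpu  2255 34 2290 22625563 6290 127 456 0 0")
print(label, ticks["user"], ticks["idle"], busy_fraction(ticks))
```

Since the values are in USER_HZ ticks accumulated since boot, a utilisation
figure for a recent interval would need two samples and their difference;
the sketch above only looks at a single snapshot.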

The "procs_blocked" line gives  the number of  processes currently blocked,
waiting for I/O to complete.

The "softirq" line gives counts of softirqs serviced since boot time, for each
of the possible system softirqs. The first column is the total of all
softirqs serviced; each subsequent column is the total for that particular
softirq.


1.9 Ext4 file system parameters
-------------------------------

Information about mounted ext4 file systems can be found in
/proc/fs/ext4.  Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0).   The files in each per-device directory are shown
in Table 1-12, below.

Table 1-12: Files in /proc/fs/ext4/<devname>
..............................................................................
 File            Content
 mb_groups       details of multiblock allocator buddy cache of free blocks
..............................................................................

1.10 /proc/consoles
-------------------
Shows registered system console lines.

To see which character device lines are currently used for the system console
/dev/console, you may simply look into the file /proc/consoles:

  > cat /proc/consoles
  tty0                 -WU (ECp)       4:7
  ttyS0                -W- (Ep)        4:64

The columns are:

  device               name of the device
  operations           R = can do read operations
                       W = can do write operations
                       U = can do unblank
  flags                E = it is enabled
                       C = it is preferred console
                       B = it is primary boot console
                       p = it is used for printk buffer
                       b = it is not a TTY but a Braille device
                       a = it is safe to use when cpu is offline
  major:minor          major and minor number of the device separated by a colon

------------------------------------------------------------------------------
Summary
------------------------------------------------------------------------------
The /proc file system serves information about the running system. It not only
allows access to process data but also allows you to request the kernel status
by reading files in the hierarchy.

The directory structure of /proc reflects the types of information and makes
it easy, if not obvious, to see where to look for specific data.
------------------------------------------------------------------------------

------------------------------------------------------------------------------
CHAPTER 2: MODIFYING SYSTEM PARAMETERS
------------------------------------------------------------------------------

------------------------------------------------------------------------------
In This Chapter
------------------------------------------------------------------------------
* Modifying kernel parameters by writing into files found in /proc/sys
* Exploring the files which modify certain parameters
* Review of the /proc/sys file tree
------------------------------------------------------------------------------


A very  interesting part of /proc is  the directory /proc/sys. This is not only
a source  of  information,  it also allows  you to change parameters within the
kernel. Be  very  careful  when attempting this. You can optimize your system,
but you  can  also  cause  it  to  crash.  Never  alter kernel parameters on a
production system.  Set  up  a  development machine and test to make sure that
everything works  the  way  you want it to. You may have no alternative but to
reboot the machine once an error has been made.

To change  a  value,  simply  echo  the new value into the file. An example is
given below  in the section on the file system data. You need to be root to do
this. You  can  create  your  own  boot script to perform this every time your
system boots.

The files  in /proc/sys can be used to fine tune and monitor miscellaneous and
general things  in  the operation of the Linux kernel. Since some of the files
can inadvertently  disrupt  your  system,  it  is  advisable  to  read  both
documentation and  source  before actually making adjustments. In any case, be
very careful  when  writing  to  any  of these files.
The entries in /proc may
change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
review the kernel documentation in the directory /usr/src/linux/Documentation.
This chapter  is  heavily  based  on the documentation included in the pre 2.2
kernels, and became part of it in version 2.2.1 of the Linux kernel.

Please see the Documentation/sysctl/ directory for descriptions of these
entries.

------------------------------------------------------------------------------
Summary
------------------------------------------------------------------------------
Certain aspects  of  kernel  behavior  can be modified at runtime, without the
need to  recompile  the kernel, or even to reboot the system. The files in the
/proc/sys tree  can  not only be read, but also modified. You can use the echo
command to write values into these files, thereby changing the default settings
of the kernel.
------------------------------------------------------------------------------

------------------------------------------------------------------------------
CHAPTER 3: PER-PROCESS PARAMETERS
------------------------------------------------------------------------------

3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
---------------------------------------------------------------------------------

These files can be used to adjust the badness heuristic used to select which
process gets killed in out of memory conditions.

The badness heuristic assigns a value to each candidate task ranging from 0
(never kill) to 1000 (always kill) to determine which process is targeted.  The
units are roughly a proportion along that range of allowed memory the process
may allocate from based on an estimation of its current memory and swap use.
For example, if a task is using all allowed memory, its badness score will be
1000.
If it is using half of its allowed memory, its score will be 500.

There is an additional factor included in the badness score: root
processes are given 3% extra memory over other tasks.

The amount of "allowed" memory depends on the context in which the oom killer
was called.  If it is due to the memory assigned to the allocating task's cpuset
being exhausted, the allowed memory represents the set of mems assigned to that
cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
memory represents the set of mempolicy nodes.  If it is due to a memory
limit (or swap limit) being reached, the allowed memory is that configured
limit.  Finally, if it is due to the entire system being out of memory, the
allowed memory represents all allocatable resources.

The value of /proc/<pid>/oom_score_adj is added to the badness score before it
is used to determine which task to kill.  Acceptable values range from -1000
(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
polarize the preference for oom killing either by always preferring a certain
task or completely disabling it.  The lowest possible value, -1000, is
equivalent to disabling oom killing entirely for that task since it will always
report a badness score of 0.

Consequently, it is very simple for userspace to define the amount of memory to
consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
example, is roughly equivalent to allowing the remainder of tasks sharing the
same system, cpuset, mempolicy, or memory controller resources to use at least
50% more memory.  A value of -500, on the other hand, would be roughly
equivalent to discounting 50% of the task's allowed memory from being considered
as scoring against the task.

For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
be used to tune the badness score.
Its acceptable values range from -16
(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
scaled linearly with /proc/<pid>/oom_score_adj.

Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the
other with its scaled value.

The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
requires CAP_SYS_RESOURCE.

NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see
Documentation/feature-removal-schedule.txt.

Caveat: when a parent task is selected, the oom killer will sacrifice any first
generation children with separate address spaces instead, if possible.  This
prevents servers and important system daemons from being killed and loses the
minimal amount of work.


3.2 /proc/<pid>/oom_score - Display current oom-killer score
-------------------------------------------------------------

This file can be used to check the current score used by the oom-killer for
any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
process should be killed in an out-of-memory situation.


3.3  /proc/<pid>/io - Display the IO accounting fields
-------------------------------------------------------

This file contains IO statistics for each running process.

Example
-------

test:/tmp # dd if=/dev/zero of=/tmp/test.dat &
[1] 3828

test:/tmp # cat /proc/3828/io
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0


Description
-----------

rchar
-----

I/O counter: chars read
The number of bytes which this task has caused to be read from storage.
This
is simply the sum of bytes which this process passed to read() and pread().
It includes things like tty IO and it is unaffected by whether or not actual
physical disk IO was required (the read might have been satisfied from
pagecache)


wchar
-----

I/O counter: chars written
The number of bytes which this task has caused, or shall cause to be written
to disk. Similar caveats apply here as with rchar.


syscr
-----

I/O counter: read syscalls
Attempt to count the number of read I/O operations, i.e. syscalls like read()
and pread().


syscw
-----

I/O counter: write syscalls
Attempt to count the number of write I/O operations, i.e. syscalls like
write() and pwrite().


read_bytes
----------

I/O counter: bytes read
Attempt to count the number of bytes which this process really did cause to
be fetched from the storage layer. Done at the submit_bio() level, so it is
accurate for block-backed filesystems. <please add status regarding NFS and
CIFS at a later time>


write_bytes
-----------

I/O counter: bytes written
Attempt to count the number of bytes which this process caused to be sent to
the storage layer. This is done at page-dirtying time.


cancelled_write_bytes
---------------------

The big inaccuracy here is truncate. If a process writes 1MB to a file and
then deletes the file, it will in fact perform no writeout. But it will have
been accounted as having caused 1MB of write.
In other words: The number of bytes which this process caused to not happen,
by truncating pagecache. A task can cause "negative" IO too. If this task
truncates some dirty pagecache, some IO which another task has been accounted
for (in its write_bytes) will not be happening.
We _could_ just subtract that
from the truncating task's write_bytes, but there is information loss in doing
that.


Note
----

At its current implementation state, this is a bit racy on 32-bit machines: if
process A reads process B's /proc/pid/io while process B is updating one of
those 64-bit counters, process A could see an intermediate result.


More information about this can be found within the taskstats documentation in
Documentation/accounting.

3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
---------------------------------------------------------------
When a process is dumped, all anonymous memory is written to a core file as
long as the size of the core file isn't limited. But sometimes we don't want
to dump some memory segments, for example, huge shared memory. Conversely,
sometimes we want to save file-backed memory segments into a core file, not
only the individual files.

/proc/<pid>/coredump_filter allows you to customize which memory segments
will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
of memory types. If a bit of the bitmask is set, memory segments of the
corresponding memory type are dumped, otherwise they are not dumped.

The following 7 memory types are supported:
  - (bit 0) anonymous private memory
  - (bit 1) anonymous shared memory
  - (bit 2) file-backed private memory
  - (bit 3) file-backed shared memory
  - (bit 4) ELF header pages in file-backed private memory areas (it is
            effective only if bit 2 is cleared)
  - (bit 5) hugetlb private memory
  - (bit 6) hugetlb shared memory

  Note that MMIO pages such as frame buffer are never dumped and vDSO pages
  are always dumped regardless of the bitmask status.

  Note that bits 0-4 don't affect any hugetlb memory. hugetlb memory is only
  affected by bits 5-6.
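
To make the bitmask concrete, the following minimal Python sketch decodes a
coredump_filter value into the memory types listed above. The names
COREDUMP_BITS and decode_coredump_filter are hypothetical and exist only for
this illustration; the bit meanings are copied from the list above.

```python
# Bit number -> memory type, as listed for coredump_filter.
COREDUMP_BITS = {
    0: "anonymous private memory",
    1: "anonymous shared memory",
    2: "file-backed private memory",
    3: "file-backed shared memory",
    4: "ELF header pages in file-backed private memory areas",
    5: "hugetlb private memory",
    6: "hugetlb shared memory",
}

def decode_coredump_filter(value):
    """Return the memory types selected by a coredump_filter mask.

    'value' may be an int or a hexadecimal string, with or without a
    leading "0x".
    """
    mask = int(value, 16) if isinstance(value, str) else value
    return [name for bit, name in sorted(COREDUMP_BITS.items())
            if mask & (1 << bit)]

for value in ("0x23", "0x21"):
    print(value, decode_coredump_filter(value))
```

The same function can be fed the text read from a coredump_filter file,
assuming that text is a hexadecimal number.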

The default value of coredump_filter is 0x23; this means all anonymous memory
segments and hugetlb private memory are dumped.

If you don't want to dump any of the shared memory segments attached to pid
1234, write 0x21 to the process's proc file.

  $ echo 0x21 > /proc/1234/coredump_filter

When a new process is created, the process inherits the bitmask status from its
parent. It is useful to set up coredump_filter before the program runs.
For example:

  $ echo 0x7 > /proc/self/coredump_filter
  $ ./some_program

3.5 /proc/<pid>/mountinfo - Information about mounts
--------------------------------------------------------

This file contains lines of the form:

36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)

(1) mount ID:  unique identifier of the mount (may be reused after umount)
(2) parent ID:  ID of parent (or of self for the top of the mount tree)
(3) major:minor:  value of st_dev for files on filesystem
(4) root:  root of the mount within the filesystem
(5) mount point:  mount point relative to the process's root
(6) mount options:  per mount options
(7) optional fields:  zero or more fields of the form "tag[:value]"
(8) separator:  marks the end of the optional fields
(9) filesystem type:  name of filesystem of the form "type[.subtype]"
(10) mount source:  filesystem specific information or "none"
(11) super options:  per super block options

Parsers should ignore all unrecognised optional fields.  Currently the
possible optional fields are:

shared:X          mount is shared in peer group X
master:X          mount is slave to peer group X
propagate_from:X  mount is slave and receives propagation from peer group X (*)
unbindable        mount is unbindable

(*) X is the closest dominant peer group under the process's root.
If X is the immediate master of the mount, or if there's no dominant peer
group under the same root, then only the "master:X" field is present
and not the "propagate_from:X" field.

For more information on mount propagation see:

  Documentation/filesystems/sharedsubtree.txt


3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
--------------------------------------------------------
These files provide a method to access a task's comm value. They also allow
a task to set its own comm value or that of one of its thread siblings. The
comm value is limited in size compared to the cmdline value, so writing
anything longer than the kernel's TASK_COMM_LEN (currently 16 bytes, including
the terminating null byte) will result in a truncated comm value.


------------------------------------------------------------------------------
Configuring procfs
------------------------------------------------------------------------------

4.1 Mount options
---------------------

The following mount options are supported:

  hidepid=  Set /proc/<pid>/ access mode.
  gid=      Set the group authorized to learn processes information.

hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories
(default).

hidepid=1 means users may not access any /proc/<pid>/ directories but their
own. Sensitive files like cmdline, sched*, status are now protected against
other users. This makes it impossible to learn whether any user runs a
specific program (provided the program doesn't reveal itself by its behaviour).
As an additional bonus, because /proc/<pid>/cmdline is inaccessible to other
users, poorly written programs passing sensitive information via program
arguments are now protected against local eavesdroppers.

hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other
users.
This does not hide the fact that a process with a specific pid value exists
(that can be learned by other means, e.g. by "kill -0 $PID"), but it does hide
the process's uid and gid, which could otherwise be learned by stat()'ing
/proc/<pid>/. It greatly complicates an intruder's task of gathering
information about running processes: whether some daemon runs with elevated
privileges, whether another user runs some sensitive program, whether other
users run any program at all, etc.

gid= defines a group authorized to learn processes information otherwise
prohibited by hidepid=. If you use some daemon like identd which needs to learn
information about processes, just add identd to this group.
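
A script that needs to know which of these settings is in effect can look for
hidepid= and gid= in the option string of the proc mount. The sketch below is
only an illustration: the function name parse_proc_mount_options and the
sample option strings are made up for this example, and it assumes the options
appear in the same key=value form that is used above.

```python
def parse_proc_mount_options(options):
    """Extract hidepid= and gid= from a comma-separated mount option string.

    Values are returned as strings. A missing hidepid= is reported as "0",
    the default described above; a missing gid= is reported as None.
    """
    settings = {"hidepid": "0", "gid": None}
    for option in options.split(","):
        key, sep, value = option.partition("=")
        # Only key=value options that we know about are recorded.
        if sep and key in settings:
            settings[key] = value
    return settings


if __name__ == "__main__":
    # Hypothetical option string; real systems will differ.
    print(parse_proc_mount_options("rw,nosuid,nodev,noexec,hidepid=2,gid=1001"))
```

Keeping the values as strings avoids guessing how a particular kernel prints
them; a caller that expects the numeric form can convert them with int().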