1Demonstrations of nfsdist, the Linux eBPF/bcc version. 2 3nfsdist traces NFS reads, writes, opens, and getattr, and summarizes their 4latency as a power-of-2 histogram. For example: 5 6 7./nfsdist.py 8 9Tracing NFS operation latency... Hit Ctrl-C to end. 10 11operation = read 12 usecs : count distribution 13 0 -> 1 : 4 | | 14 2 -> 3 : 0 | | 15 4 -> 7 : 0 | | 16 8 -> 15 : 7107 |************** | 17 16 -> 31 : 19864 |****************************************| 18 32 -> 63 : 1494 |*** | 19 64 -> 127 : 491 | | 20 128 -> 255 : 1810 |*** | 21 256 -> 511 : 6356 |************ | 22 512 -> 1023 : 4860 |********* | 23 1024 -> 2047 : 3070 |****** | 24 2048 -> 4095 : 1853 |*** | 25 4096 -> 8191 : 921 |* | 26 8192 -> 16383 : 122 | | 27 16384 -> 32767 : 15 | | 28 32768 -> 65535 : 5 | | 29 65536 -> 131071 : 2 | | 30 131072 -> 262143 : 1 | | 31 32operation = write 33 usecs : count distribution 34 0 -> 1 : 0 | | 35 2 -> 3 : 0 | | 36 4 -> 7 : 0 | | 37 8 -> 15 : 1 | | 38 16 -> 31 : 0 | | 39 32 -> 63 : 9 | | 40 64 -> 127 : 19491 |****************************************| 41 128 -> 255 : 3064 |****** | 42 256 -> 511 : 940 |* | 43 512 -> 1023 : 365 | | 44 1024 -> 2047 : 312 | | 45 2048 -> 4095 : 119 | | 46 4096 -> 8191 : 31 | | 47 8192 -> 16383 : 84 | | 48 16384 -> 32767 : 31 | | 49 32768 -> 65535 : 5 | | 50 65536 -> 131071 : 3 | | 51 131072 -> 262143 : 0 | | 52 262144 -> 524287 : 1 | | 53 54operation = getattr 55 usecs : count distribution 56 0 -> 1 : 27 |****************************************| 57 2 -> 3 : 2 |** | 58 4 -> 7 : 3 |**** | 59 8 -> 15 : 0 | | 60 16 -> 31 : 0 | | 61 32 -> 63 : 0 | | 62 64 -> 127 : 0 | | 63 128 -> 255 : 0 | | 64 256 -> 511 : 2 |** | 65 512 -> 1023 : 2 |** | 66 67operation = open 68 usecs : count distribution 69 0 -> 1 : 0 | | 70 2 -> 3 : 0 | | 71 4 -> 7 : 0 | | 72 8 -> 15 : 0 | | 73 16 -> 31 : 0 | | 74 32 -> 63 : 0 | | 75 64 -> 127 : 0 | | 76 128 -> 255 : 0 | | 77 256 -> 511 : 2 |****************************************| 78 79 80In this example you can see that the read traffic is rather bi-modal, with about 8126K reads falling within 8 - 30 usecs and about 18K reads spread between 128 - 828191 usecs. Write traffic is largely clustered in the 64 - 127 usecs bracket. 83The faster read traffic is probably coming from a filesystem cache and the slower 84traffic from disk. The reason why the writes are so consistently fast is because 85this example test was run on a couple of VM's and I believe the hypervisor was 86caching all the write traffic to memory. 87 88This "latency" is measured from when the operation was issued from the VFS 89interface to the file system, to when it completed. This spans everything: 90RPC latency, network latency, file system CPU cycles, file system locks, run 91queue latency, etc. This is a better measure of the latency suffered by 92applications reading from a NFS share and can better expose problems 93experienced by NFS clients. 94 95Note that this only traces the common NFS operations (read, write, open and 96getattr). I chose to include getattr as a significant percentage of NFS 97traffic end up being getattr calls and are a good indicator of problems 98with an NFS server. 99 100An optional interval and a count can be provided, as well as -m to show the 101distributions in milliseconds. For example: 102 103./nfsdist -m 1 5 104Tracing NFS operation latency... Hit Ctrl-C to end. 105 10611:02:39: 107 108operation = write 109 msecs : count distribution 110 0 -> 1 : 1 | | 111 2 -> 3 : 24 |******** | 112 4 -> 7 : 114 |****************************************| 113 8 -> 15 : 9 |*** | 114 16 -> 31 : 1 | | 115 32 -> 63 : 1 | | 116 11711:02:40: 118 119operation = write 120 msecs : count distribution 121 0 -> 1 : 0 | | 122 2 -> 3 : 11 |*** | 123 4 -> 7 : 111 |****************************************| 124 8 -> 15 : 13 |**** | 125 16 -> 31 : 1 | | 126 12711:02:41: 128 129operation = write 130 msecs : count distribution 131 0 -> 1 : 0 | | 132 2 -> 3 : 21 |****** | 133 4 -> 7 : 137 |****************************************| 134 8 -> 15 : 3 | | 135 136This shows a write workload, with writes hovering primarily in the 4-7ms range. 137 138USAGE message: 139 140 141./nfsdist -h 142usage: nfsdist.py [-h] [-T] [-m] [-p PID] [interval] [count] 143 144Summarize NFS operation latency 145 146positional arguments: 147 interval output interval, in seconds 148 count number of outputs 149 150optional arguments: 151 -h, --help show this help message and exit 152 -T, --notimestamp don't include timestamp on interval output 153 -m, --milliseconds output in milliseconds 154 -p PID, --pid PID trace this PID only 155 156examples: 157 ./nfsdist # show operation latency as a histogram 158 ./nfsdist -p 181 # trace PID 181 only 159 ./nfsdist 1 10 # print 1 second summaries, 10 times 160 ./nfsdist -m 5 # 5s summaries, milliseconds 161