1Demonstrations of nfsslower, the Linux eBPF/bcc version. 2 3nfsslower show NFS reads, writes, opens and getattrs, slower than a 4threshold. For example: 5 6./nfsslower.py 7Tracing NFS operations that are slower than 10 ms 8TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME 911:25:16 dd 21295 W 1048576 15360 14.84 1.test 1011:25:16 dd 21295 W 1048576 16384 12.73 1.test 1111:25:16 dd 21295 W 1048576 17408 24.27 1.test 1211:25:16 dd 21295 W 1048576 18432 22.93 1.test 1311:25:16 dd 21295 W 1048576 19456 14.65 1.test 1411:25:16 dd 21295 W 1048576 20480 12.58 1.test 1511:25:16 dd 21297 W 1048576 6144 10.50 1.test.w 1611:25:16 dd 21297 W 1048576 7168 16.65 1.test.w 1711:25:16 dd 21297 W 1048576 8192 13.01 1.test.w 1811:25:16 dd 21297 W 1048576 9216 14.06 1.test.w 19 20This shows NFS writes from dd each 1MB in size to 2 different files. The 21writes all had latency higher than 10ms. 22 23This "latency" is measured from when the operation was issued from the VFS 24interface to the file system, to when it completed. This spans everything: 25RPC latency, network latency, file system CPU cycles, file system locks, run 26queue latency, etc. This is a better measure of the latency suffered by 27applications reading from a NFS share and can better expose problems 28experienced by NFS clients. 29 30Note that this only traces the common NFS operations (read,write,open and 31getattr). I chose to include getattr as a significant percentage of NFS 32traffic end up being getattr calls and are a good indicator of problems 33with an NFS server. 34 35The threshold can be provided as an argument. E.g. I/O slower than 1 ms: 36 37./nfsslower.py 1 38Tracing NFS operations that are slower than 1 ms 39TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME 4011:40:16 cp 21583 R 131072 0 4.35 1.test 4111:40:16 cp 21583 R 131072 256 1.87 1.test 4211:40:16 cp 21583 R 131072 384 2.99 1.test 4311:40:16 cp 21583 R 131072 512 4.19 1.test 4411:40:16 cp 21583 R 131072 640 4.25 1.test 4511:40:16 cp 21583 R 131072 768 4.65 1.test 4611:40:16 cp 21583 R 131072 1280 1.08 1.test 4711:40:16 cp 21583 R 131072 1408 3.29 1.test 4811:40:16 cp 21583 R 131072 1792 3.12 1.test 4911:40:16 cp 21583 R 131072 3712 3.55 1.test 5011:40:16 cp 21583 R 131072 3840 1.12 1.test 5111:40:16 cp 21583 R 131072 4096 3.23 1.test 5211:40:16 cp 21583 R 131072 4224 2.73 1.test 5311:40:16 cp 21583 R 131072 4352 2.73 1.test 5411:40:16 cp 21583 R 131072 4480 6.09 1.test 5511:40:16 cp 21583 R 131072 5120 4.40 1.test 56[...] 57 58This shows all NFS_READS that were more than 1ms. Depending on your 59latency to your fileserver, you might need to tweak this value to 60remove 61 62A threshold of 0 will trace all operations. Warning: the output will be 63verbose, as it will include all file system cache hits. 64 65./nfsslower.py 0 66Tracing NFS operations 6711:56:50 dd 21852 W 1048576 0 0.42 1.test 6811:56:50 dd 21852 W 1048576 1024 0.46 1.test 6911:56:50 dd 21852 W 1048576 2048 0.36 1.test 7011:56:50 cp 21854 G 0 0 0.35 1.test 7111:56:50 cp 21854 O 0 0 0.33 1.test 7211:56:50 cp 21854 G 0 0 0.00 1.test 7311:56:50 cp 21854 R 131072 0 0.07 1.test 7411:56:50 cp 21854 R 131072 128 0.02 1.test 7511:56:50 cp 21854 R 131072 256 0.02 1.test 7611:56:50 cp 21854 R 131072 384 0.02 1.test 7711:56:50 cp 21854 R 131072 512 0.02 1.test 7811:56:50 cp 21854 R 131072 640 0.02 1.test 7911:56:50 cp 21854 R 131072 768 0.02 1.test 8011:56:50 cp 21854 R 131072 896 0.02 1.test 8111:56:50 cp 21854 R 131072 1024 0.02 1.test 8211:56:50 cp 21854 R 131072 1152 0.02 1.test 8311:56:50 cp 21854 R 131072 1280 0.02 1.test 8411:56:50 cp 21854 R 131072 1408 0.02 1.test 8511:56:50 cp 21854 R 131072 1536 0.02 1.test 8611:56:50 cp 21854 R 131072 1664 0.02 1.test 8711:56:50 cp 21854 R 131072 1792 0.02 1.test 8811:56:50 cp 21854 R 131072 1920 0.02 1.test 8911:56:50 cp 21854 R 131072 2048 0.02 1.test 9011:56:50 cp 21854 R 131072 2176 0.04 1.test 9111:56:50 cp 21854 R 131072 2304 0.02 1.test 9211:56:50 cp 21854 R 131072 2432 0.03 1.test 9311:56:50 cp 21854 R 131072 2560 0.03 1.test 9411:56:50 cp 21854 R 131072 2688 0.02 1.test 9511:56:50 cp 21854 R 131072 2816 0.03 1.test 9611:56:50 cp 21854 R 131072 2944 0.02 1.test 9711:56:50 cp 21854 R 0 3072 0.00 1.test 9811:56:50 ls 21855 G 0 0 0.00 1.test 9911:56:50 ls 21856 G 0 0 0.36 music 10011:56:50 ls 21856 G 0 0 0.00 music 10111:56:50 ls 21856 G 0 0 0.00 test 10211:56:50 ls 21856 G 0 0 0.00 ff 10311:56:50 ls 21856 G 0 0 0.00 34.log 10411:56:50 ls 21856 G 0 0 0.00 vmlinuz-linux 10511:56:50 ls 21856 G 0 0 0.00 2.test 10611:56:50 ls 21856 G 0 0 0.00 rt.log 10711:56:50 ls 21856 G 0 0 0.00 1.lod 10811:56:50 ls 21856 G 0 0 0.00 COPYRIGHT.txt 10911:56:50 ls 21856 G 0 0 0.00 gg 11011:56:50 ls 21856 G 0 0 0.00 qw.log 11111:56:50 ls 21856 G 0 0 0.00 README.md 11211:56:50 ls 21856 G 0 0 0.00 1.log 113 114The output now includes open operations ("O"), and reads ("R") wand getattrs ("G"). 115A cp operation 116 117 118A -j option will print just the fields (parsable output, csv): 119 120./nfsslower.py -j 0 121ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE 12287054476520,dd,22754,W,1048576,0,425,1.test 12387054482916,dd,22754,W,1048576,1048576,320,1.test 12487054488179,dd,22754,W,1048576,2097152,389,1.test 12587054511340,cp,22756,G,0,0,371,1.test 12687054511685,cp,22756,O,0,0,306,1.test 12787054511700,cp,22756,G,0,0,2,1.test 12887054512325,cp,22756,R,131072,0,56,1.test 12987054512432,cp,22756,R,131072,131072,22,1.test 13087054512520,cp,22756,R,131072,262144,32,1.test 13187054512600,cp,22756,R,131072,393216,21,1.test 13287054512678,cp,22756,R,131072,524288,21,1.test 13387054512803,cp,22756,R,131072,655360,56,1.test 134 135This may be useful for visualizing with another tool, for example, for 136producing a scatter plot of ENDTIME vs LATENCY, to look for time-based 137patterns. 138 139USAGE message: 140 141usage: nfsslower.py [-h] [-j] [-p PID] [min_ms] 142 143Trace READ, WRITE, OPEN and GETATTR NFS calls slower than a threshold,supports NFSv{3,4} 144 145positional arguments: 146 min_ms Minimum IO duration to trace in ms (default=10ms) 147 148optional arguments: 149 -h, --help show this help message and exit 150 -j, --csv just print fields: comma-separated values 151 -p PID, --pid PID Trace this pid only 152 153 ./nfsslower # trace operations slower than 10ms 154 ./nfsslower 1 # trace operations slower than 1ms 155 ./nfsslower -j 1 # ... 1 ms, parsable output (csv) 156 ./nfsslower 0 # trace all nfs operations 157 ./nfsslower -p 121 # trace pid 121 only 158 159