1Demonstrations of argdist. 2 3 4argdist probes functions you specify and collects parameter values into a 5histogram or a frequency count. This can be used to understand the distribution 6of values a certain parameter takes, filter and print interesting parameters 7without attaching a debugger, and obtain general execution statistics on 8various functions. 9 10For example, suppose you want to find what allocation sizes are common in 11your application: 12 13# ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size' 14[01:42:29] 15p:c:malloc(size_t size):size_t:size 16 COUNT EVENT 17[01:42:30] 18p:c:malloc(size_t size):size_t:size 19 COUNT EVENT 20[01:42:31] 21p:c:malloc(size_t size):size_t:size 22 COUNT EVENT 23 1 size = 16 24[01:42:32] 25p:c:malloc(size_t size):size_t:size 26 COUNT EVENT 27 2 size = 16 28[01:42:33] 29p:c:malloc(size_t size):size_t:size 30 COUNT EVENT 31 3 size = 16 32[01:42:34] 33p:c:malloc(size_t size):size_t:size 34 COUNT EVENT 35 4 size = 16 36^C 37 38It seems that the application is allocating blocks of size 16. The COUNT 39column contains the number of occurrences of a particular event, and the 40EVENT column describes the event. In this case, the "size" parameter was 41probed and its value was 16, repeatedly. 42 43Now, suppose you wanted a histogram of buffer sizes passed to the write() 44function across the system: 45 46# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len' 47[01:45:22] 48p:c:write(int fd, void *buf, size_t len):size_t:len 49 len : count distribution 50 0 -> 1 : 0 | | 51 2 -> 3 : 2 |************* | 52 4 -> 7 : 0 | | 53 8 -> 15 : 2 |************* | 54 16 -> 31 : 0 | | 55 32 -> 63 : 6 |****************************************| 56[01:45:23] 57p:c:write(int fd, void *buf, size_t len):size_t:len 58 len : count distribution 59 0 -> 1 : 0 | | 60 2 -> 3 : 11 |*************** | 61 4 -> 7 : 0 | | 62 8 -> 15 : 4 |***** | 63 16 -> 31 : 0 | | 64 32 -> 63 : 28 |****************************************| 65 64 -> 127 : 12 |***************** | 66[01:45:24] 67p:c:write(int fd, void *buf, size_t len):size_t:len 68 len : count distribution 69 0 -> 1 : 0 | | 70 2 -> 3 : 21 |**************** | 71 4 -> 7 : 0 | | 72 8 -> 15 : 6 |**** | 73 16 -> 31 : 0 | | 74 32 -> 63 : 52 |****************************************| 75 64 -> 127 : 26 |******************** | 76^C 77 78It seems that most writes fall into three buckets: very small writes of 2-3 79bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes. 80 81But these are writes across the board -- what if you wanted to focus on writes 82to STDOUT? 83 84# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1' 85[01:47:17] 86p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1 87 len : count distribution 88 0 -> 1 : 0 | | 89 2 -> 3 : 0 | | 90 4 -> 7 : 0 | | 91 8 -> 15 : 1 |****************************************| 92 16 -> 31 : 0 | | 93 32 -> 63 : 1 |****************************************| 94[01:47:18] 95p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1 96 len : count distribution 97 0 -> 1 : 0 | | 98 2 -> 3 : 0 | | 99 4 -> 7 : 0 | | 100 8 -> 15 : 2 |************* | 101 16 -> 31 : 0 | | 102 32 -> 63 : 3 |******************** | 103 64 -> 127 : 6 |****************************************| 104[01:47:19] 105p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1 106 len : count distribution 107 0 -> 1 : 0 | | 108 2 -> 3 : 0 | | 109 4 -> 7 : 0 | | 110 8 -> 15 : 3 |********* | 111 16 -> 31 : 0 | | 112 32 -> 63 : 5 |*************** | 113 64 -> 127 : 13 |****************************************| 114^C 115 116The "fd==1" part is a filter that is applied to every invocation of write(). 117Only if the filter condition is true, the value is recorded. 118 119You can also use argdist to trace kernel functions. For example, suppose you 120wanted a histogram of kernel allocation (kmalloc) sizes across the system, 121printed twice with 3 second intervals: 122 123# ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size' 124[01:50:00] 125p::__kmalloc(size_t size):size_t:size 126 size : count distribution 127 0 -> 1 : 0 | | 128 2 -> 3 : 0 | | 129 4 -> 7 : 0 | | 130 8 -> 15 : 6 |****************************************| 131[01:50:03] 132p::__kmalloc(size_t size):size_t:size 133 size : count distribution 134 0 -> 1 : 0 | | 135 2 -> 3 : 0 | | 136 4 -> 7 : 0 | | 137 8 -> 15 : 22 |****************************************| 138 16 -> 31 : 0 | | 139 32 -> 63 : 0 | | 140 64 -> 127 : 5 |********* | 141 128 -> 255 : 2 |*** | 142 143Occasionally, numeric information isn't enough and you want to capture strings. 144What are the strings printed by puts() across the system? 145 146# ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str' 147[01:53:54] 148p:c:puts(char *str):char*:str 149 COUNT EVENT 150 2 str = Press ENTER to start. 151 152It looks like the message "Press ENTER to start." was printed twice during the 15310 seconds we were tracing. 154 155What about reads? You could trace gets() across the system and print the 156strings input by the user (note how "r" is used instead of "p" to attach a 157probe to the function's return): 158 159# ./argdist -i 10 -n 1 -C 'r:c:gets():char*:(char*)$retval:$retval!=0' 160[02:12:23] 161r:c:gets():char*:$retval:$retval!=0 162 COUNT EVENT 163 1 (char*)$retval = hi there 164 3 (char*)$retval = sasha 165 8 (char*)$retval = hello 166 167Similarly, we could get a histogram of the error codes returned by read(): 168 169# ./argdist -i 10 -c 1 -H 'r:c:read()' 170[02:15:36] 171r:c:read() 172 retval : count distribution 173 0 -> 1 : 29 |****************************************| 174 2 -> 3 : 11 |*************** | 175 4 -> 7 : 0 | | 176 8 -> 15 : 3 |**** | 177 16 -> 31 : 2 |** | 178 32 -> 63 : 22 |****************************** | 179 64 -> 127 : 5 |****** | 180 128 -> 255 : 0 | | 181 256 -> 511 : 1 |* | 182 512 -> 1023 : 1 |* | 183 1024 -> 2047 : 0 | | 184 2048 -> 4095 : 2 |** | 185 186In return probes, you can also trace the latency of the function (unless it is 187recursive) and the parameters it had on entry. For example, we can identify 188which processes are performing slow synchronous filesystem reads -- say, 189longer than 0.1ms (100,000ns): 190 191# ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000' 192[01:08:48] 193r::__vfs_read():u32:$PID:$latency > 100000 194 COUNT EVENT 195 1 $PID = 10457 196 21 $PID = 2780 197[01:08:49] 198r::__vfs_read():u32:$PID:$latency > 100000 199 COUNT EVENT 200 1 $PID = 10457 201 21 $PID = 2780 202^C 203 204It looks like process 2780 performed 21 slow reads. 205 206Occasionally, entry parameter values are also interesting. For example, you 207might be curious how long it takes malloc() to allocate memory -- nanoseconds 208per byte allocated. Let's go: 209 210# ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10 211[01:11:13] 212 ns per byte : count distribution 213 0 -> 1 : 0 | | 214 2 -> 3 : 4 |***************** | 215 4 -> 7 : 3 |************* | 216 8 -> 15 : 2 |******** | 217 16 -> 31 : 1 |**** | 218 32 -> 63 : 0 | | 219 64 -> 127 : 7 |******************************* | 220 128 -> 255 : 1 |**** | 221 256 -> 511 : 0 | | 222 512 -> 1023 : 1 |**** | 223 1024 -> 2047 : 1 |**** | 224 2048 -> 4095 : 9 |****************************************| 225 4096 -> 8191 : 1 |**** | 226 227It looks like a tri-modal distribution. Some allocations are extremely cheap, 228and take 2-15 nanoseconds per byte. Other allocations are slower, and take 22964-127 nanoseconds per byte. And some allocations are slower still, and take 230multiple microseconds per byte. 231 232You could also group results by more than one field. For example, __kmalloc 233takes an additional flags parameter that describes how to allocate memory: 234 235# ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size' 236[03:42:29] 237p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 238 COUNT EVENT 239 1 flags = 16, size = 152 240 2 flags = 131280, size = 8 241 7 flags = 131280, size = 16 242[03:42:30] 243p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 244 COUNT EVENT 245 1 flags = 16, size = 152 246 6 flags = 131280, size = 8 247 19 flags = 131280, size = 16 248[03:42:31] 249p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 250 COUNT EVENT 251 2 flags = 16, size = 152 252 10 flags = 131280, size = 8 253 31 flags = 131280, size = 16 254[03:42:32] 255p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 256 COUNT EVENT 257 2 flags = 16, size = 152 258 14 flags = 131280, size = 8 259 43 flags = 131280, size = 16 260^C 261 262The flags value must be expanded by hand, but it's still helpful to eliminate 263certain kinds of allocations or visually group them together. 264 265argdist also has basic support for kernel tracepoints. It is sometimes more 266convenient to use tracepoints because they are documented and don't vary a lot 267between kernel versions. For example, let's trace the net:net_dev_start_xmit 268tracepoint and print out the protocol field from the tracepoint structure: 269 270# argdist -C 't:net:net_dev_start_xmit():u16:args->protocol' 271[13:01:49] 272t:net:net_dev_start_xmit():u16:args->protocol 273 COUNT EVENT 274 8 args->protocol = 2048 275^C 276 277Note that to discover the format of the net:net_dev_start_xmit tracepoint, you 278use the tplist tool (tplist -v net:net_dev_start_xmit). 279 280 281Occasionally, it is useful to filter certain expressions by string. This is not 282trivially supported by BPF, but argdist provides a STRCMP helper you can use in 283filter expressions. For example, to get a histogram of latencies opening a 284specific file, run this: 285 286# argdist -c -H 'r:c:open(char *file):u64:$latency/1000:STRCMP("test.txt",$entry(file))' 287[02:16:38] 288[02:16:39] 289[02:16:40] 290 $latency/1000 : count distribution 291 0 -> 1 : 0 | | 292 2 -> 3 : 0 | | 293 4 -> 7 : 0 | | 294 8 -> 15 : 0 | | 295 16 -> 31 : 2 |****************************************| 296[02:16:41] 297 $latency/1000 : count distribution 298 0 -> 1 : 0 | | 299 2 -> 3 : 0 | | 300 4 -> 7 : 0 | | 301 8 -> 15 : 1 |********** | 302 16 -> 31 : 4 |****************************************| 303[02:16:42] 304 $latency/1000 : count distribution 305 0 -> 1 : 0 | | 306 2 -> 3 : 0 | | 307 4 -> 7 : 0 | | 308 8 -> 15 : 1 |******** | 309 16 -> 31 : 5 |****************************************| 310[02:16:43] 311 $latency/1000 : count distribution 312 0 -> 1 : 0 | | 313 2 -> 3 : 0 | | 314 4 -> 7 : 0 | | 315 8 -> 15 : 1 |******** | 316 16 -> 31 : 5 |****************************************| 317 318 319Here's a final example that finds how many write() system calls are performed 320by each process on the system: 321 322# argdist -c -C 'p:c:write():int:$PID;write per process' -n 2 323[06:47:18] 324write by process 325 COUNT EVENT 326 3 $PID = 8889 327 7 $PID = 7615 328 7 $PID = 2480 329[06:47:19] 330write by process 331 COUNT EVENT 332 9 $PID = 8889 333 23 $PID = 7615 334 23 $PID = 2480 335 336 337USAGE message: 338 339# argdist -h 340usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v] 341 [-c] [-T TOP] [-H specifier] [-C[specifier] [-I header] 342 343Trace a function and display a summary of its parameter values. 344 345optional arguments: 346 -h, --help show this help message and exit 347 -p PID, --pid PID id of the process to trace (optional) 348 -z STRING_SIZE, --string-size STRING_SIZE 349 maximum string size to read from char* arguments 350 -i INTERVAL, --interval INTERVAL 351 output interval, in seconds (default 1 second) 352 -d DURATION, --duration DURATION 353 total duration of trace, in seconds 354 -n COUNT, --number COUNT 355 number of outputs 356 -v, --verbose print resulting BPF program code before executing 357 -c, --cumulative do not clear histograms and freq counts at each interval 358 -T TOP, --top TOP number of top results to show (not applicable to 359 histograms) 360 -H specifier, --histogram specifier 361 probe specifier to capture histogram of (see examples 362 below) 363 -C specifier, --count specifier 364 probe specifier to capture count of (see examples 365 below) 366 -I header, --include header 367 additional header files to include in the BPF program 368 as either full path, or relative to current working directory, 369 or relative to default kernel header search path 370 371Probe specifier syntax: 372 {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label] 373Where: 374 p,r,t,u -- probe at function entry, function exit, kernel tracepoint, 375 or USDT probe 376 in exit probes: can use $retval, $entry(param), $latency 377 library -- the library that contains the function 378 (leave empty for kernel functions) 379 category -- the category of the kernel tracepoint (e.g. net, sched) 380 signature -- the function's parameters, as in the C header 381 type -- the type of the expression to collect (supports multiple) 382 expr -- the expression to collect (supports multiple) 383 filter -- the filter that is applied to collected values 384 label -- the label for this probe in the resulting output 385 386EXAMPLES: 387 388argdist -H 'p::__kmalloc(u64 size):u64:size' 389 Print a histogram of allocation sizes passed to kmalloc 390 391argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16' 392 Print a frequency count of how many times process 1005 called malloc 393 with an allocation size of 16 bytes 394 395argdist -C 'r:c:gets():char*:$retval#snooped strings' 396 Snoop on all strings returned by gets() 397 398argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte' 399 Print a histogram of nanoseconds per byte from kmalloc allocations 400 401argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC' 402 Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC 403 404argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5 405 Print frequency counts of how many times writes were issued to a 406 particular file descriptor number, in process 1005, but only show 407 the top 5 busiest fds 408 409argdist -p 1005 -H 'r:c:read()' 410 Print a histogram of error codes returned by read() in process 1005 411 412argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000' 413 Print frequency of reads by process where the latency was >0.1ms 414 415argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000' 416 Print a histogram of read sizes that were longer than 1ms 417 418argdist -H \ 419 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1' 420 Print a histogram of buffer sizes passed to write() across all 421 processes, where the file descriptor was 1 (STDOUT) 422 423argdist -C 'p:c:fork()#fork calls' 424 Count fork() calls in libc across all processes 425 Can also use funccount.py, which is easier and more flexible 426 427argdist -H 't:block:block_rq_complete():u32:args->nr_sector' 428 Print histogram of number of sectors in completing block I/O requests 429 430argdist -C 't:irq:irq_handler_entry():int:args->irq' 431 Aggregate interrupts by interrupt request (IRQ) 432 433argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337 434 Print frequency of function addresses used as a pthread start function, 435 relying on the USDT pthread_start probe in process 1337 436 437argdist -H 'p:c:sleep(u32 seconds):u32:seconds' \ 438 -H 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec' 439 Print histograms of sleep() and nanosleep() parameter values 440 441argdist -p 2780 -z 120 \ 442 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1' 443 Spy on writes to STDOUT performed by process 2780, up to a string size 444 of 120 characters 445 446argdist -I 'kernel/sched/sched.h' \ 447 -C 'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq):s64:cfs_rq->runtime_remaining' 448 Trace on the cfs scheduling runqueue remaining runtime. The struct cfs_rq is defined 449 in kernel/sched/sched.h which is in kernel source tree and not in kernel-devel 450 package. So this command needs to run at the kernel source tree root directory 451 so that the added header file can be found by the compiler. 452