• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1Demonstrations of argdist.
2
3
4argdist probes functions you specify and collects parameter values into a
5histogram or a frequency count. This can be used to understand the distribution
6of values a certain parameter takes, filter and print interesting parameters
7without attaching a debugger, and obtain general execution statistics on
8various functions.
9
10For example, suppose you want to find what allocation sizes are common in
11your application:
12
13# ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size'
14[01:42:29]
15p:c:malloc(size_t size):size_t:size
16        COUNT      EVENT
17[01:42:30]
18p:c:malloc(size_t size):size_t:size
19        COUNT      EVENT
20[01:42:31]
21p:c:malloc(size_t size):size_t:size
22        COUNT      EVENT
23        1          size = 16
24[01:42:32]
25p:c:malloc(size_t size):size_t:size
26        COUNT      EVENT
27        2          size = 16
28[01:42:33]
29p:c:malloc(size_t size):size_t:size
30        COUNT      EVENT
31        3          size = 16
32[01:42:34]
33p:c:malloc(size_t size):size_t:size
34        COUNT      EVENT
35        4          size = 16
36^C
37
38It seems that the application is allocating blocks of size 16. The COUNT
39column contains the number of occurrences of a particular event, and the
40EVENT column describes the event. In this case, the "size" parameter was
41probed and its value was 16, repeatedly.
42
43Now, suppose you wanted a histogram of buffer sizes passed to the write()
44function across the system:
45
46# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
47[01:45:22]
48p:c:write(int fd, void *buf, size_t len):size_t:len
49     len                 : count     distribution
50         0 -> 1          : 0        |                                        |
51         2 -> 3          : 2        |*************                           |
52         4 -> 7          : 0        |                                        |
53         8 -> 15         : 2        |*************                           |
54        16 -> 31         : 0        |                                        |
55        32 -> 63         : 6        |****************************************|
56[01:45:23]
57p:c:write(int fd, void *buf, size_t len):size_t:len
58     len                 : count     distribution
59         0 -> 1          : 0        |                                        |
60         2 -> 3          : 11       |***************                         |
61         4 -> 7          : 0        |                                        |
62         8 -> 15         : 4        |*****                                   |
63        16 -> 31         : 0        |                                        |
64        32 -> 63         : 28       |****************************************|
65        64 -> 127        : 12       |*****************                       |
66[01:45:24]
67p:c:write(int fd, void *buf, size_t len):size_t:len
68     len                 : count     distribution
69         0 -> 1          : 0        |                                        |
70         2 -> 3          : 21       |****************                        |
71         4 -> 7          : 0        |                                        |
72         8 -> 15         : 6        |****                                    |
73        16 -> 31         : 0        |                                        |
74        32 -> 63         : 52       |****************************************|
75        64 -> 127        : 26       |********************                    |
76^C
77
78It seems that most writes fall into three buckets: very small writes of 2-3
79bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes.
80
81But these are writes across the board -- what if you wanted to focus on writes
82to STDOUT?
83
84# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
85[01:47:17]
86p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
87     len                 : count     distribution
88         0 -> 1          : 0        |                                        |
89         2 -> 3          : 0        |                                        |
90         4 -> 7          : 0        |                                        |
91         8 -> 15         : 1        |****************************************|
92        16 -> 31         : 0        |                                        |
93        32 -> 63         : 1        |****************************************|
94[01:47:18]
95p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
96     len                 : count     distribution
97         0 -> 1          : 0        |                                        |
98         2 -> 3          : 0        |                                        |
99         4 -> 7          : 0        |                                        |
100         8 -> 15         : 2        |*************                           |
101        16 -> 31         : 0        |                                        |
102        32 -> 63         : 3        |********************                    |
103        64 -> 127        : 6        |****************************************|
104[01:47:19]
105p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
106     len                 : count     distribution
107         0 -> 1          : 0        |                                        |
108         2 -> 3          : 0        |                                        |
109         4 -> 7          : 0        |                                        |
110         8 -> 15         : 3        |*********                               |
111        16 -> 31         : 0        |                                        |
112        32 -> 63         : 5        |***************                         |
113        64 -> 127        : 13       |****************************************|
114^C
115
116The "fd==1" part is a filter that is applied to every invocation of write().
117Only if the filter condition is true, the value is recorded.
118
119You can also use argdist to trace kernel functions. For example, suppose you
120wanted a histogram of kernel allocation (kmalloc) sizes across the system,
121printed twice with 3 second intervals:
122
123# ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
124[01:50:00]
125p::__kmalloc(size_t size):size_t:size
126     size                : count     distribution
127         0 -> 1          : 0        |                                        |
128         2 -> 3          : 0        |                                        |
129         4 -> 7          : 0        |                                        |
130         8 -> 15         : 6        |****************************************|
131[01:50:03]
132p::__kmalloc(size_t size):size_t:size
133     size                : count     distribution
134         0 -> 1          : 0        |                                        |
135         2 -> 3          : 0        |                                        |
136         4 -> 7          : 0        |                                        |
137         8 -> 15         : 22       |****************************************|
138        16 -> 31         : 0        |                                        |
139        32 -> 63         : 0        |                                        |
140        64 -> 127        : 5        |*********                               |
141       128 -> 255        : 2        |***                                     |
142
143Occasionally, numeric information isn't enough and you want to capture strings.
144What are the strings printed by puts() across the system?
145
146# ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
147[01:53:54]
148p:c:puts(char *str):char*:str
149        COUNT      EVENT
150        2          str = Press ENTER to start.
151
152It looks like the message "Press ENTER to start." was printed twice during the
15310 seconds we were tracing.
154
155What about reads? You could trace gets() across the system and print the
156strings input by the user (note how "r" is used instead of "p" to attach a
157probe to the function's return):
158
159# ./argdist -i 10 -n 1 -C 'r:c:gets():char*:(char*)$retval:$retval!=0'
160[02:12:23]
161r:c:gets():char*:$retval:$retval!=0
162        COUNT      EVENT
163        1          (char*)$retval = hi there
164        3          (char*)$retval = sasha
165        8          (char*)$retval = hello
166
167Similarly, we could get a histogram of the error codes returned by read():
168
169# ./argdist -i 10 -c 1 -H 'r:c:read()'
170[02:15:36]
171r:c:read()
172     retval              : count     distribution
173         0 -> 1          : 29       |****************************************|
174         2 -> 3          : 11       |***************                         |
175         4 -> 7          : 0        |                                        |
176         8 -> 15         : 3        |****                                    |
177        16 -> 31         : 2        |**                                      |
178        32 -> 63         : 22       |******************************          |
179        64 -> 127        : 5        |******                                  |
180       128 -> 255        : 0        |                                        |
181       256 -> 511        : 1        |*                                       |
182       512 -> 1023       : 1        |*                                       |
183      1024 -> 2047       : 0        |                                        |
184      2048 -> 4095       : 2        |**                                      |
185
186In return probes, you can also trace the latency of the function (unless it is
187recursive) and the parameters it had on entry. For example, we can identify
188which processes are performing slow synchronous filesystem reads -- say,
189longer than 0.1ms (100,000ns):
190
191# ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
192[01:08:48]
193r::__vfs_read():u32:$PID:$latency > 100000
194        COUNT      EVENT
195        1          $PID = 10457
196        21         $PID = 2780
197[01:08:49]
198r::__vfs_read():u32:$PID:$latency > 100000
199        COUNT      EVENT
200        1          $PID = 10457
201        21         $PID = 2780
202^C
203
204It looks like process 2780 performed 21 slow reads.
205
206Occasionally, entry parameter values are also interesting. For example, you
207might be curious how long it takes malloc() to allocate memory -- nanoseconds
208per byte allocated. Let's go:
209
210# ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10
211[01:11:13]
212     ns per byte         : count     distribution
213         0 -> 1          : 0        |                                        |
214         2 -> 3          : 4        |*****************                       |
215         4 -> 7          : 3        |*************                           |
216         8 -> 15         : 2        |********                                |
217        16 -> 31         : 1        |****                                    |
218        32 -> 63         : 0        |                                        |
219        64 -> 127        : 7        |*******************************         |
220       128 -> 255        : 1        |****                                    |
221       256 -> 511        : 0        |                                        |
222       512 -> 1023       : 1        |****                                    |
223      1024 -> 2047       : 1        |****                                    |
224      2048 -> 4095       : 9        |****************************************|
225      4096 -> 8191       : 1        |****                                    |
226
227It looks like a tri-modal distribution. Some allocations are extremely cheap,
228and take 2-15 nanoseconds per byte. Other allocations are slower, and take
22964-127 nanoseconds per byte. And some allocations are slower still, and take
230multiple microseconds per byte.
231
232You could also group results by more than one field. For example, __kmalloc
233takes an additional flags parameter that describes how to allocate memory:
234
235# ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size'
236[03:42:29]
237p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
238        COUNT      EVENT
239        1          flags = 16, size = 152
240        2          flags = 131280, size = 8
241        7          flags = 131280, size = 16
242[03:42:30]
243p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
244        COUNT      EVENT
245        1          flags = 16, size = 152
246        6          flags = 131280, size = 8
247        19         flags = 131280, size = 16
248[03:42:31]
249p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
250        COUNT      EVENT
251        2          flags = 16, size = 152
252        10         flags = 131280, size = 8
253        31         flags = 131280, size = 16
254[03:42:32]
255p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
256        COUNT      EVENT
257        2          flags = 16, size = 152
258        14         flags = 131280, size = 8
259        43         flags = 131280, size = 16
260^C
261
262The flags value must be expanded by hand, but it's still helpful to eliminate
263certain kinds of allocations or visually group them together.
264
265argdist also has basic support for kernel tracepoints. It is sometimes more
266convenient to use tracepoints because they are documented and don't vary a lot
267between kernel versions. For example, let's trace the net:net_dev_start_xmit
268tracepoint and print out the protocol field from the tracepoint structure:
269
270# argdist -C 't:net:net_dev_start_xmit():u16:args->protocol'
271[13:01:49]
272t:net:net_dev_start_xmit():u16:args->protocol
273        COUNT      EVENT
274        8          args->protocol = 2048
275^C
276
277Note that to discover the format of the net:net_dev_start_xmit tracepoint, you
278use the tplist tool (tplist -v net:net_dev_start_xmit).
279
280
281Occasionally, it is useful to filter certain expressions by string. This is not
282trivially supported by BPF, but argdist provides a STRCMP helper you can use in
283filter expressions. For example, to get a histogram of latencies opening a
284specific file, run this:
285
286# argdist -c -H 'r:c:open(char *file):u64:$latency/1000:STRCMP("test.txt",$entry(file))'
287[02:16:38]
288[02:16:39]
289[02:16:40]
290     $latency/1000       : count     distribution
291         0 -> 1          : 0        |                                        |
292         2 -> 3          : 0        |                                        |
293         4 -> 7          : 0        |                                        |
294         8 -> 15         : 0        |                                        |
295        16 -> 31         : 2        |****************************************|
296[02:16:41]
297     $latency/1000       : count     distribution
298         0 -> 1          : 0        |                                        |
299         2 -> 3          : 0        |                                        |
300         4 -> 7          : 0        |                                        |
301         8 -> 15         : 1        |**********                              |
302        16 -> 31         : 4        |****************************************|
303[02:16:42]
304     $latency/1000       : count     distribution
305         0 -> 1          : 0        |                                        |
306         2 -> 3          : 0        |                                        |
307         4 -> 7          : 0        |                                        |
308         8 -> 15         : 1        |********                                |
309        16 -> 31         : 5        |****************************************|
310[02:16:43]
311     $latency/1000       : count     distribution
312         0 -> 1          : 0        |                                        |
313         2 -> 3          : 0        |                                        |
314         4 -> 7          : 0        |                                        |
315         8 -> 15         : 1        |********                                |
316        16 -> 31         : 5        |****************************************|
317
318
319Here's a final example that finds how many write() system calls are performed
320by each process on the system:
321
322# argdist -c -C 'p:c:write():int:$PID;write per process' -n 2
323[06:47:18]
324write by process
325        COUNT      EVENT
326        3          $PID = 8889
327        7          $PID = 7615
328        7          $PID = 2480
329[06:47:19]
330write by process
331        COUNT      EVENT
332        9          $PID = 8889
333        23         $PID = 7615
334        23         $PID = 2480
335
336
337USAGE message:
338
339# argdist -h
340usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v]
341               [-c] [-T TOP] [-H specifier] [-C[specifier] [-I header]
342
343Trace a function and display a summary of its parameter values.
344
345optional arguments:
346  -h, --help            show this help message and exit
347  -p PID, --pid PID     id of the process to trace (optional)
348  -z STRING_SIZE, --string-size STRING_SIZE
349                        maximum string size to read from char* arguments
350  -i INTERVAL, --interval INTERVAL
351                        output interval, in seconds (default 1 second)
352  -d DURATION, --duration DURATION
353			total duration of trace, in seconds
354  -n COUNT, --number COUNT
355                        number of outputs
356  -v, --verbose         print resulting BPF program code before executing
357  -c, --cumulative      do not clear histograms and freq counts at each interval
358  -T TOP, --top TOP     number of top results to show (not applicable to
359                        histograms)
360  -H specifier, --histogram specifier
361                        probe specifier to capture histogram of (see examples
362                        below)
363  -C specifier, --count specifier
364                        probe specifier to capture count of (see examples
365                        below)
366  -I header, --include header
367                        additional header files to include in the BPF program
368                        as either full path, or relative to current working directory,
369                        or relative to default kernel header search path
370
371Probe specifier syntax:
372        {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
373Where:
374        p,r,t,u    -- probe at function entry, function exit, kernel tracepoint,
375                      or USDT probe
376                      in exit probes: can use $retval, $entry(param), $latency
377        library    -- the library that contains the function
378                      (leave empty for kernel functions)
379        category   -- the category of the kernel tracepoint (e.g. net, sched)
380        signature  -- the function's parameters, as in the C header
381        type       -- the type of the expression to collect (supports multiple)
382        expr       -- the expression to collect (supports multiple)
383        filter     -- the filter that is applied to collected values
384        label      -- the label for this probe in the resulting output
385
386EXAMPLES:
387
388argdist -H 'p::__kmalloc(u64 size):u64:size'
389        Print a histogram of allocation sizes passed to kmalloc
390
391argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
392        Print a frequency count of how many times process 1005 called malloc
393        with an allocation size of 16 bytes
394
395argdist -C 'r:c:gets():char*:$retval#snooped strings'
396        Snoop on all strings returned by gets()
397
398argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
399        Print a histogram of nanoseconds per byte from kmalloc allocations
400
401argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
402        Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
403
404argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
405        Print frequency counts of how many times writes were issued to a
406        particular file descriptor number, in process 1005, but only show
407        the top 5 busiest fds
408
409argdist -p 1005 -H 'r:c:read()'
410        Print a histogram of error codes returned by read() in process 1005
411
412argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
413        Print frequency of reads by process where the latency was >0.1ms
414
415argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
416        Print a histogram of read sizes that were longer than 1ms
417
418argdist -H \
419        'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
420        Print a histogram of buffer sizes passed to write() across all
421        processes, where the file descriptor was 1 (STDOUT)
422
423argdist -C 'p:c:fork()#fork calls'
424        Count fork() calls in libc across all processes
425        Can also use funccount.py, which is easier and more flexible
426
427argdist -H 't:block:block_rq_complete():u32:args->nr_sector'
428        Print histogram of number of sectors in completing block I/O requests
429
430argdist -C 't:irq:irq_handler_entry():int:args->irq'
431        Aggregate interrupts by interrupt request (IRQ)
432
433argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
434        Print frequency of function addresses used as a pthread start function,
435        relying on the USDT pthread_start probe in process 1337
436
437argdist -H 'p:c:sleep(u32 seconds):u32:seconds' \
438        -H 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
439        Print histograms of sleep() and nanosleep() parameter values
440
441argdist -p 2780 -z 120 \
442        -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
443        Spy on writes to STDOUT performed by process 2780, up to a string size
444        of 120 characters
445
446argdist -I 'kernel/sched/sched.h' \
447        -C 'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq):s64:cfs_rq->runtime_remaining'
448        Trace on the cfs scheduling runqueue remaining runtime. The struct cfs_rq is defined
449        in kernel/sched/sched.h which is in kernel source tree and not in kernel-devel
450        package.  So this command needs to run at the kernel source tree root directory
451        so that the added header file can be found by the compiler.
452