• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1Demonstrations of stackcount, the Linux eBPF/bcc version.
2
3
4This program traces functions and frequency counts them with their entire
5stack trace, summarized in-kernel for efficiency. For example, counting
6stack traces that led to the submit_bio() kernel function, which creates
7block device I/O:
8
9# ./stackcount submit_bio
10Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
11^C
12  submit_bio
13  ext4_writepages
14  do_writepages
15  __filemap_fdatawrite_range
16  filemap_flush
17  ext4_alloc_da_blocks
18  ext4_release_file
19  __fput
20  ____fput
21  task_work_run
22  exit_to_usermode_loop
23  syscall_return_slowpath
24  entry_SYSCALL_64_fastpath
25  [unknown]
26  [unknown]
27    tar [15069]
28    5
29
30  submit_bio
31  ext4_bio_write_page
32  mpage_submit_page
33  mpage_map_and_submit_buffers
34  ext4_writepages
35  do_writepages
36  __filemap_fdatawrite_range
37  filemap_flush
38  ext4_alloc_da_blocks
39  ext4_release_file
40  __fput
41  ____fput
42  task_work_run
43  exit_to_usermode_loop
44  syscall_return_slowpath
45  entry_SYSCALL_64_fastpath
46  [unknown]
47  [unknown]
48    tar [15069]
49    15
50
51  submit_bio
52  ext4_readpages
53  __do_page_cache_readahead
54  ondemand_readahead
55  page_cache_async_readahead
56  generic_file_read_iter
57  __vfs_read
58  vfs_read
59  sys_read
60  entry_SYSCALL_64_fastpath
61  [unknown]
62    tar [15069]
63    113
64
65Detaching...
66
67The output shows unique stack traces, in order from leaf (on-CPU) to root,
68followed by their occurrence count. The last stack trace in the above output
69shows syscall handling, sys_read(), vfs_read(), and then "readahead" functions:
70looks like an application issued file read has triggered read ahead. The
71application can be seen after the stack trace, in this case, "tar [15069]"
72for the "tar" command, PID 15069.
73
74The order of printed stack traces is from least to most frequent. The most
75frequent in this case, the ext4_rename() stack, was taken 113 times during
76tracing.
77
78The "[unknown]" frames are from user-level, since this simple workload is
79the tar command, which apparently has been compiled without frame pointers.
80It's a common compiler optimization, but it breaks frame pointer-based stack
81walkers. Similar broken stacks will be seen by other profilers and debuggers
82that use frame pointers. Hopefully your application preserves them so that
83the user-level stack trace is visible. So how does one get frame pointers, if
84your application doesn't have them to start with? For the current bcc (until
85it supports other stack walkers), you need to be running a application binaries
86that preserves frame pointers, eg, using gcc's -fno-omit-frame-pointer. That's
87about all I'll say here: this is a big topic that is not bcc/BPF specific.
88
89It can be useful to trace the path to submit_bio to explain unusual rates of
90disk IOPS. These could have in-kernel origins (eg, background scrub).
91
92
93Now adding the -d option to delimit kernel and user stacks:
94
95# ./stackcount -d submit_bio
96Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
97^C
98  submit_bio
99  submit_bh
100  journal_submit_commit_record
101  jbd2_journal_commit_transaction
102  kjournald2
103  kthread
104  ret_from_fork
105    --
106    jbd2/xvda1-8 [405]
107    1
108
109  submit_bio
110  submit_bh
111  jbd2_journal_commit_transaction
112  kjournald2
113  kthread
114  ret_from_fork
115    --
116    jbd2/xvda1-8 [405]
117    2
118
119  submit_bio
120  ext4_writepages
121  do_writepages
122  __filemap_fdatawrite_range
123  filemap_flush
124  ext4_alloc_da_blocks
125  ext4_release_file
126  __fput
127  ____fput
128  task_work_run
129  exit_to_usermode_loop
130  syscall_return_slowpath
131  entry_SYSCALL_64_fastpath
132    --
133  [unknown]
134  [unknown]
135    tar [15187]
136    5
137
138  submit_bio
139  ext4_bio_write_page
140  mpage_submit_page
141  mpage_map_and_submit_buffers
142  ext4_writepages
143  do_writepages
144  __filemap_fdatawrite_range
145  filemap_flush
146  ext4_alloc_da_blocks
147  ext4_release_file
148  __fput
149  ____fput
150  task_work_run
151  exit_to_usermode_loop
152  syscall_return_slowpath
153  entry_SYSCALL_64_fastpath
154    --
155  [unknown]
156  [unknown]
157    tar [15187]
158    15
159
160  submit_bio
161  ext4_readpages
162  __do_page_cache_readahead
163  ondemand_readahead
164  page_cache_async_readahead
165  generic_file_read_iter
166  __vfs_read
167  vfs_read
168  sys_read
169  entry_SYSCALL_64_fastpath
170    --
171  [unknown]
172  [unknown]
173  [unknown]
174    tar [15187]
175    171
176
177Detaching...
178
179A "--" is printed between the kernel and user stacks.
180
181
182As a different example, here is the kernel function hrtimer_init_sleeper():
183
184# ./stackcount.py -d hrtimer_init_sleeper
185Tracing 1 functions for "hrtimer_init_sleeper"... Hit Ctrl-C to end.
186^C
187  hrtimer_init_sleeper
188  do_futex
189  SyS_futex
190  entry_SYSCALL_64_fastpath
191    --
192  [unknown]
193    containerd [16020]
194    1
195
196  hrtimer_init_sleeper
197  do_futex
198  SyS_futex
199  entry_SYSCALL_64_fastpath
200    --
201  __pthread_cond_timedwait
202  Monitor::IWait(Thread*, long)
203  Monitor::wait(bool, long, bool)
204  CompileQueue::get()
205  CompileBroker::compiler_thread_loop()
206  JavaThread::thread_main_inner()
207  JavaThread::run()
208  java_start(Thread*)
209  start_thread
210    java [4996]
211    1
212
213  hrtimer_init_sleeper
214  do_futex
215  SyS_futex
216  entry_SYSCALL_64_fastpath
217    --
218  [unknown]
219  [unknown]
220    containerd [16020]
221    1
222
223  hrtimer_init_sleeper
224  do_futex
225  SyS_futex
226  entry_SYSCALL_64_fastpath
227    --
228  __pthread_cond_timedwait
229  VMThread::loop()
230  VMThread::run()
231  java_start(Thread*)
232  start_thread
233    java [4996]
234    3
235
236  hrtimer_init_sleeper
237  do_futex
238  SyS_futex
239  entry_SYSCALL_64_fastpath
240    --
241  [unknown]
242    dockerd [16008]
243    4
244
245  hrtimer_init_sleeper
246  do_futex
247  SyS_futex
248  entry_SYSCALL_64_fastpath
249    --
250  [unknown]
251  [unknown]
252    dockerd [16008]
253    4
254
255  hrtimer_init_sleeper
256  do_futex
257  SyS_futex
258  entry_SYSCALL_64_fastpath
259    --
260  __pthread_cond_timedwait
261  Lio/netty/util/ThreadDeathWatcher$Watcher;::run
262  Interpreter
263  Interpreter
264  call_stub
265  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
266  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)
267  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)
268  thread_entry(JavaThread*, Thread*)
269  JavaThread::thread_main_inner()
270  JavaThread::run()
271  java_start(Thread*)
272  start_thread
273    java [4996]
274    4
275
276  hrtimer_init_sleeper
277  do_futex
278  SyS_futex
279  entry_SYSCALL_64_fastpath
280    --
281  __pthread_cond_timedwait
282  clock_gettime
283  [unknown]
284    java [4996]
285    79
286
287Detaching...
288
289I was just trying to find a more interesting example. This output includes
290some Java stacks where user-level has been walked correctly (even includes a
291JIT symbol translation). dockerd and containerd don't have frame pointers
292(grumble), but Java does (which is running with -XX:+PreserveFramePointer).
293
294
295Here's another kernel function, ip_output():
296
297# ./stackcount.py -d ip_output
298Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
299^C
300  ip_output
301  ip_queue_xmit
302  tcp_transmit_skb
303  tcp_write_xmit
304  __tcp_push_pending_frames
305  tcp_push
306  tcp_sendmsg
307  inet_sendmsg
308  sock_sendmsg
309  sock_write_iter
310  __vfs_write
311  vfs_write
312  SyS_write
313  entry_SYSCALL_64_fastpath
314    --
315  __write_nocancel
316  [unknown]
317    sshd [15015]
318    5
319
320  ip_output
321  ip_queue_xmit
322  tcp_transmit_skb
323  tcp_write_xmit
324  __tcp_push_pending_frames
325  tcp_push
326  tcp_sendmsg
327  inet_sendmsg
328  sock_sendmsg
329  sock_write_iter
330  __vfs_write
331  vfs_write
332  SyS_write
333  entry_SYSCALL_64_fastpath
334    --
335  __write_nocancel
336  [unknown]
337  [unknown]
338    sshd [8234]
339    5
340
341  ip_output
342  ip_queue_xmit
343  tcp_transmit_skb
344  tcp_write_xmit
345  __tcp_push_pending_frames
346  tcp_push
347  tcp_sendmsg
348  inet_sendmsg
349  sock_sendmsg
350  sock_write_iter
351  __vfs_write
352  vfs_write
353  SyS_write
354  entry_SYSCALL_64_fastpath
355    --
356  __write_nocancel
357    sshd [15015]
358    7
359
360Detaching...
361
362This time just sshd is triggering ip_output() calls.
363
364
365Watch what happens if I filter on kernel stacks only (-K) for ip_output():
366
367# ./stackcount.py -K ip_output
368Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
369^C
370  ip_output
371  ip_queue_xmit
372  tcp_transmit_skb
373  tcp_write_xmit
374  __tcp_push_pending_frames
375  tcp_push
376  tcp_sendmsg
377  inet_sendmsg
378  sock_sendmsg
379  sock_write_iter
380  __vfs_write
381  vfs_write
382  SyS_write
383  entry_SYSCALL_64_fastpath
384    13
385
386Detaching...
387
388They have grouped together as a single unique stack, since the kernel part
389was the same.
390
391
392Here is just the user stacks, fetched during the kernel function ip_output():
393
394# ./stackcount.py -U ip_output
395Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
396^C
397  [unknown]
398    snmpd [1645]
399    1
400
401  __write_nocancel
402  [unknown]
403  [unknown]
404    sshd [8234]
405    3
406
407  __write_nocancel
408    sshd [15015]
409    4
410
411I should really run a custom sshd with frame pointers so we can see its
412stack trace...
413
414
415User-space functions can also be traced if a library name is provided. For
416example, to quickly identify code locations that allocate heap memory for
417PID 4902 (using -p), by tracing malloc from libc ("c:malloc"):
418
419# ./stackcount -p 4902 c:malloc
420Tracing 1 functions for "malloc"... Hit Ctrl-C to end.
421^C
422  malloc
423  rbtree_new
424  main
425  [unknown]
426    12
427
428  malloc
429  _rbtree_node_new_internal
430  _rbtree_node_insert
431  rbtree_insert
432  main
433  [unknown]
434    1189
435
436Detaching...
437
438Kernel stacks are absent as this didn't enter kernel code.
439
440Note that user-space uses of stackcount can be somewhat more limited because
441a lot of user-space libraries and binaries are compiled without frame-pointers
442as discussed earlier (the -fomit-frame-pointer compiler default) or are used
443without debuginfo.
444
445
446In addition to kernel and user-space functions, kernel tracepoints and USDT
447tracepoints are also supported.
448
449For example, to determine where threads are being created in a particular
450process, use the pthread_create USDT tracepoint:
451
452# ./stackcount -p $(pidof parprimes) u:pthread:pthread_create
453Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end.
454^C
455
456    parprimes [11923]
457  pthread_create@@GLIBC_2.2.5
458  main
459  __libc_start_main
460  [unknown]
461    7
462
463You can use "readelf -n file" to see if it has USDT tracepoints.
464
465
466Similarly, to determine where context switching is happening in the kernel,
467use the sched:sched_switch kernel tracepoint:
468
469# ./stackcount t:sched:sched_switch
470  __schedule
471  schedule
472  worker_thread
473  kthread
474  ret_from_fork
475    kworker/0:2 [25482]
476    1
477
478  __schedule
479  schedule
480  schedule_hrtimeout_range_clock
481  schedule_hrtimeout_range
482  ep_poll
483  SyS_epoll_wait
484  entry_SYSCALL_64_fastpath
485  epoll_wait
486  Lsun/nio/ch/SelectorImpl;::lockAndDoSelect
487  Lsun/nio/ch/SelectorImpl;::select
488  Lio/netty/channel/nio/NioEventLoop;::select
489  Lio/netty/channel/nio/NioEventLoop;::run
490  Interpreter
491  Interpreter
492  call_stub
493  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
494  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)
495  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)
496  thread_entry(JavaThread*, Thread*)
497  JavaThread::thread_main_inner()
498  JavaThread::run()
499  java_start(Thread*)
500  start_thread
501    java [4996]
502    1
503
504... (omitted for brevity)
505
506  __schedule
507  schedule
508  schedule_preempt_disabled
509  cpu_startup_entry
510  xen_play_dead
511  arch_cpu_idle_dead
512  cpu_startup_entry
513  cpu_bringup_and_idle
514    swapper/1 [0]
515    289
516
517
518A -i option can be used to set an output interval, and -T to include a
519timestamp. For example:
520
521# ./stackcount.py -Tdi 1 submit_bio
522Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
523
52406:05:13
525
52606:05:14
527  submit_bio
528  xfs_do_writepage
529  write_cache_pages
530  xfs_vm_writepages
531  do_writepages
532  __writeback_single_inode
533  writeback_sb_inodes
534  __writeback_inodes_wb
535  wb_writeback
536  wb_workfn
537  process_one_work
538  worker_thread
539  kthread
540  ret_from_fork
541    --
542    kworker/u16:1 [15686]
543    1
544
545  submit_bio
546  process_one_work
547  worker_thread
548  kthread
549  ret_from_fork
550    --
551    kworker/u16:0 [16007]
552    1
553
554  submit_bio
555  xfs_buf_submit
556  xlog_bdstrat
557  xlog_sync
558  xlog_state_release_iclog
559  _xfs_log_force
560  xfs_log_force
561  xfs_fs_sync_fs
562  sync_fs_one_sb
563  iterate_supers
564  sys_sync
565  entry_SYSCALL_64_fastpath
566    --
567  [unknown]
568    sync [16039]
569    1
570
571  submit_bio
572  submit_bh
573  journal_submit_commit_record
574  jbd2_journal_commit_transaction
575  kjournald2
576  kthread
577  ret_from_fork
578    --
579    jbd2/xvda1-8 [405]
580    1
581
582  submit_bio
583  process_one_work
584  worker_thread
585  kthread
586  ret_from_fork
587    --
588    kworker/0:2 [25482]
589    2
590
591  submit_bio
592  ext4_writepages
593  do_writepages
594  __writeback_single_inode
595  writeback_sb_inodes
596  __writeback_inodes_wb
597  wb_writeback
598  wb_workfn
599  process_one_work
600  worker_thread
601  kthread
602  ret_from_fork
603    --
604    kworker/u16:0 [16007]
605    4
606
607  submit_bio
608  xfs_vm_writepages
609  do_writepages
610  __writeback_single_inode
611  writeback_sb_inodes
612  __writeback_inodes_wb
613  wb_writeback
614  wb_workfn
615  process_one_work
616  worker_thread
617  kthread
618  ret_from_fork
619    --
620    kworker/u16:1 [15686]
621    5
622
623  submit_bio
624  __block_write_full_page
625  block_write_full_page
626  blkdev_writepage
627  __writepage
628  write_cache_pages
629  generic_writepages
630  blkdev_writepages
631  do_writepages
632  __filemap_fdatawrite_range
633  filemap_fdatawrite
634  fdatawrite_one_bdev
635  iterate_bdevs
636  sys_sync
637  entry_SYSCALL_64_fastpath
638    --
639  [unknown]
640    sync [16039]
641    7
642
643  submit_bio
644  submit_bh
645  jbd2_journal_commit_transaction
646  kjournald2
647  kthread
648  ret_from_fork
649    --
650    jbd2/xvda1-8 [405]
651    8
652
653  submit_bio
654  ext4_bio_write_page
655  mpage_submit_page
656  mpage_map_and_submit_buffers
657  ext4_writepages
658  do_writepages
659  __writeback_single_inode
660  writeback_sb_inodes
661  __writeback_inodes_wb
662  wb_writeback
663  wb_workfn
664  process_one_work
665  worker_thread
666  kthread
667  ret_from_fork
668    --
669    kworker/u16:0 [16007]
670    8
671
672  submit_bio
673  __block_write_full_page
674  block_write_full_page
675  blkdev_writepage
676  __writepage
677  write_cache_pages
678  generic_writepages
679  blkdev_writepages
680  do_writepages
681  __writeback_single_inode
682  writeback_sb_inodes
683  __writeback_inodes_wb
684  wb_writeback
685  wb_workfn
686  process_one_work
687  worker_thread
688  kthread
689  ret_from_fork
690    --
691    kworker/u16:0 [16007]
692    60
693
694
69506:05:15
696
69706:05:16
698
699Detaching...
700
701This only included output for the 06:05:14 interval. The other internals
702did not span block device I/O.
703
704
705The -s output prints the return instruction offset for each function (aka
706symbol offset). Eg:
707
708# ./stackcount.py -s tcp_sendmsg
709Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
710^C
711  tcp_sendmsg+0x1
712  sock_sendmsg+0x38
713  sock_write_iter+0x85
714  __vfs_write+0xe3
715  vfs_write+0xb8
716  SyS_write+0x55
717  entry_SYSCALL_64_fastpath+0x1e
718  __write_nocancel+0x7
719    sshd [15015]
720    3
721
722  tcp_sendmsg+0x1
723  sock_sendmsg+0x38
724  sock_write_iter+0x85
725  __vfs_write+0xe3
726  vfs_write+0xb8
727  SyS_write+0x55
728  entry_SYSCALL_64_fastpath+0x1e
729  __write_nocancel+0x7
730    sshd [8234]
731    3
732
733Detaching...
734
735If it wasn't clear how one function called another, knowing the instruction
736offset can help you locate the lines of code from a disassembly dump.
737
738
739The -v output is verbose, and shows raw addresses:
740
741./stackcount.py -v tcp_sendmsg
742Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
743^C
744  ffffffff817b05c1 tcp_sendmsg
745  ffffffff8173ea48 sock_sendmsg
746  ffffffff8173eae5 sock_write_iter
747  ffffffff81232b33 __vfs_write
748  ffffffff812331b8 vfs_write
749  ffffffff81234625 SyS_write
750  ffffffff818739bb entry_SYSCALL_64_fastpath
751  7f120511e6e0     __write_nocancel
752    sshd [8234]
753    3
754
755  ffffffff817b05c1 tcp_sendmsg
756  ffffffff8173ea48 sock_sendmsg
757  ffffffff8173eae5 sock_write_iter
758  ffffffff81232b33 __vfs_write
759  ffffffff812331b8 vfs_write
760  ffffffff81234625 SyS_write
761  ffffffff818739bb entry_SYSCALL_64_fastpath
762  7f919c5a26e0     __write_nocancel
763    sshd [15015]
764    11
765
766Detaching...
767
768
769A wildcard can also be used. Eg, all functions beginning with "tcp_send",
770kernel stacks only (-K) with offsets (-s):
771
772# ./stackcount -Ks 'tcp_send*'
773Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end.
774^C
775  tcp_send_delayed_ack0x1
776  tcp_rcv_established0x3b1
777  tcp_v4_do_rcv0x130
778  tcp_v4_rcv0x8e0
779  ip_local_deliver_finish0x9f
780  ip_local_deliver0x51
781  ip_rcv_finish0x8a
782  ip_rcv0x29d
783  __netif_receive_skb_core0x637
784  __netif_receive_skb0x18
785  netif_receive_skb_internal0x23
786    1
787
788  tcp_send_delayed_ack0x1
789  tcp_rcv_established0x222
790  tcp_v4_do_rcv0x130
791  tcp_v4_rcv0x8e0
792  ip_local_deliver_finish0x9f
793  ip_local_deliver0x51
794  ip_rcv_finish0x8a
795  ip_rcv0x29d
796  __netif_receive_skb_core0x637
797  __netif_receive_skb0x18
798  netif_receive_skb_internal0x23
799    4
800
801  tcp_send_mss0x1
802  inet_sendmsg0x67
803  sock_sendmsg0x38
804  sock_write_iter0x78
805  __vfs_write0xaa
806  vfs_write0xa9
807  sys_write0x46
808  entry_SYSCALL_64_fastpath0x16
809    7
810
811  tcp_sendmsg0x1
812  sock_sendmsg0x38
813  sock_write_iter0x78
814  __vfs_write0xaa
815  vfs_write0xa9
816  sys_write0x46
817  entry_SYSCALL_64_fastpath0x16
818    7
819
820Detaching...
821
822Use -r to allow regular expressions.
823
824
825The -f option will emit folded output, which can be used as input to other
826tools including flame graphs. For example, with delimiters as well:
827
828# ./stackcount.py -df t:sched:sched_switch
829^Csnmp-pass;[unknown];[unknown];[unknown];[unknown];[unknown];-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
830kworker/7:0;-;ret_from_fork;kthread;worker_thread;schedule;__schedule 1
831watchdog/0;-;ret_from_fork;kthread;smpboot_thread_fn;schedule;__schedule 1
832snmp-pass;[unknown];[unknown];[unknown];[unknown];[unknown];-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
833svscan;[unknown];-;entry_SYSCALL_64_fastpath;SyS_nanosleep;hrtimer_nanosleep;do_nanosleep;schedule;__schedule 1
834python;[unknown];__select_nocancel;-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
835kworker/2:0;-;ret_from_fork;kthread;worker_thread;schedule;__schedule 1
836[...]
837
838Flame graphs visualize stack traces. For information about them and links
839to open source software, see http://www.brendangregg.com/flamegraphs.html .
840This folded output can be piped directly into flamegraph.pl (the Perl version).
841
842
843USAGE message:
844
845# ./stackcount -h
846usage: stackcount [-h] [-p PID] [-i INTERVAL] [-D DURATION] [-T] [-r] [-s]
847                  [-P] [-K] [-U] [-v] [-d] [-f] [--debug]
848                  pattern
849
850Count events and their stack traces
851
852positional arguments:
853  pattern               search expression for events
854
855optional arguments:
856  -h, --help            show this help message and exit
857  -p PID, --pid PID     trace this PID only
858  -i INTERVAL, --interval INTERVAL
859                        summary interval, seconds
860  -D DURATION, --duration DURATION
861                        total duration of trace, seconds
862  -T, --timestamp       include timestamp on output
863  -r, --regexp          use regular expressions. Default is "*" wildcards
864                        only.
865  -s, --offset          show address offsets
866  -P, --perpid          display stacks separately for each process
867  -K, --kernel-stacks-only
868                        kernel stack only
869  -U, --user-stacks-only
870                        user stack only
871  -v, --verbose         show raw addresses
872  -d, --delimited       insert delimiter between kernel/user stacks
873  -f, --folded          output folded format
874  --debug               print BPF program before starting (for debugging
875                        purposes)
876
877examples:
878    ./stackcount submit_bio         # count kernel stack traces for submit_bio
879    ./stackcount -d ip_output       # include a user/kernel stack delimiter
880    ./stackcount -s ip_output       # show symbol offsets
881    ./stackcount -sv ip_output      # show offsets and raw addresses (verbose)
882    ./stackcount 'tcp_send*'        # count stacks for funcs matching tcp_send*
883    ./stackcount -r '^tcp_send.*'   # same as above, using regular expressions
884    ./stackcount -Ti 5 ip_output    # output every 5 seconds, with timestamps
885    ./stackcount -p 185 ip_output   # count ip_output stacks for PID 185 only
886    ./stackcount -p 185 c:malloc    # count stacks for malloc in PID 185
887    ./stackcount t:sched:sched_fork # count stacks for sched_fork tracepoint
888    ./stackcount -p 185 u:node:*    # count stacks for all USDT probes in node
889    ./stackcount -K t:sched:sched_switch   # kernel stacks only
890    ./stackcount -U t:sched:sched_switch   # user stacks only
891