1# bcc Python Developer Tutorial 2 3This tutorial is about developing [bcc](https://github.com/iovisor/bcc) tools and programs using the Python interface. There are two parts: observability then networking. Snippets are taken from various programs in bcc: see their files for licences. 4 5Also see the bcc developer's [reference_guide.md](reference_guide.md), and a tutorial for end-users of tools: [tutorial.md](tutorial.md). There is also a lua interface for bcc. 6 7## Observability 8 9This observability tutorial contains 17 lessons, and 46 enumerated things to learn. 10 11### Lesson 1. Hello World 12 13Start by running [examples/hello_world.py](../examples/hello_world.py), while running some commands (eg, "ls") in another session. It should print "Hello, World!" for new processes. If not, start by fixing bcc: see [INSTALL.md](../INSTALL.md). 14 15``` 16# ./examples/hello_world.py 17 bash-13364 [002] d... 24573433.052937: : Hello, World! 18 bash-13364 [003] d... 24573436.642808: : Hello, World! 19[...] 20``` 21 22Here's the code for hello_world.py: 23 24```Python 25from bcc import BPF 26BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print() 27``` 28 29There are six things to learn from this: 30 311. ```text='...'```: This defines a BPF program inline. The program is written in C. 32 331. ```kprobe__sys_clone()```: This is a short-cut for kernel dynamic tracing via kprobes. If the C function begins with ``kprobe__``, the rest is treated as a kernel function name to instrument, in this case, ```sys_clone()```. 34 351. ```void *ctx```: ctx has arguments, but since we aren't using them here, we'll just cast it to ```void *```. 36 371. ```bpf_trace_printk()```: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(), covered later. 38 391. ```return 0;```: Necessary formality (if you want to know why, see [#139](https://github.com/iovisor/bcc/issues/139)). 40 411. ```.trace_print()```: A bcc routine that reads trace_pipe and prints the output. 42 43### Lesson 2. sys_sync() 44 45Write a program that traces the sys_sync() kernel function. Print "sys_sync() called" when it runs. Test by running ```sync``` in another session while tracing. The hello_world.py program has everything you need for this. 46 47Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python. 48 49### Lesson 3. hello_fields.py 50 51This program is in [examples/tracing/hello_fields.py](../examples/tracing/hello_fields.py). Sample output (run commands in another session): 52 53``` 54# ./examples/tracing/hello_fields.py 55TIME(s) COMM PID MESSAGE 5624585001.174885999 sshd 1432 Hello, World! 5724585001.195710000 sshd 15780 Hello, World! 5824585001.991976000 systemd-udevd 484 Hello, World! 5924585002.276147000 bash 15787 Hello, World! 60``` 61 62Code: 63 64```Python 65from bcc import BPF 66 67# define BPF program 68prog = """ 69int hello(void *ctx) { 70 bpf_trace_printk("Hello, World!\\n"); 71 return 0; 72} 73""" 74 75# load BPF program 76b = BPF(text=prog) 77b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello") 78 79# header 80print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE")) 81 82# format output 83while 1: 84 try: 85 (task, pid, cpu, flags, ts, msg) = b.trace_fields() 86 except ValueError: 87 continue 88 print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg)) 89``` 90 91This is similar to hello_world.py, and traces new processes via sys_clone() again, but has a few more things to learn: 92 931. ```prog =```: This time we declare the C program as a variable, and later refer to it. This is useful if you want to add some string substitutions based on command line arguments. 94 951. ```hello()```: Now we're just declaring a C function, instead of the ```kprobe__``` shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a ```pt_reg* ctx``` as first argument. If you need to define some helper function that will not be executed on a probe, they need to be defined as ```static inline``` in order to be inlined by the compiler. Sometimes you would also need to add ```_always_inline``` function attribute to it. 96 971. ```b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")```: Creates a kprobe for the kernel clone system call function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions. 98 991. ```b.trace_fields()```: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT(). 100 101### Lesson 4. sync_timing.py 102 103Remember the days of sysadmins typing ```sync``` three times on a slow console before ```reboot```, to give the first asynchronous sync time to complete? Then someone thought ```sync;sync;sync``` was clever, to run them all on one line, which became industry practice despite defeating the original purpose! And then sync became synchronous, so more reasons it was silly. Anyway. 104 105The following example times how quickly the ```do_sync``` function is called, and prints output if it has been called more recently than one second ago. A ```sync;sync;sync``` will print output for the 2nd and 3rd sync's: 106 107``` 108# ./examples/tracing/sync_timing.py 109Tracing for quick sync's... Ctrl-C to end 110At time 0.00 s: multiple syncs detected, last 95 ms ago 111At time 0.10 s: multiple syncs detected, last 96 ms ago 112``` 113 114This program is [examples/tracing/sync_timing.py](../examples/tracing/sync_timing.py): 115 116```Python 117from __future__ import print_function 118from bcc import BPF 119 120# load BPF program 121b = BPF(text=""" 122#include <uapi/linux/ptrace.h> 123 124BPF_HASH(last); 125 126int do_trace(struct pt_regs *ctx) { 127 u64 ts, *tsp, delta, key = 0; 128 129 // attempt to read stored timestamp 130 tsp = last.lookup(&key); 131 if (tsp != NULL) { 132 delta = bpf_ktime_get_ns() - *tsp; 133 if (delta < 1000000000) { 134 // output if time is less than 1 second 135 bpf_trace_printk("%d\\n", delta / 1000000); 136 } 137 last.delete(&key); 138 } 139 140 // update stored timestamp 141 ts = bpf_ktime_get_ns(); 142 last.update(&key, &ts); 143 return 0; 144} 145""") 146 147b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace") 148print("Tracing for quick sync's... Ctrl-C to end") 149 150# format output 151start = 0 152while 1: 153 (task, pid, cpu, flags, ts, ms) = b.trace_fields() 154 if start == 0: 155 start = ts 156 ts = ts - start 157 print("At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms)) 158``` 159 160Things to learn: 161 1621. ```bpf_ktime_get_ns()```: Returns the time as nanoseconds. 1631. ```BPF_HASH(last)```: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64. 1641. ```key = 0```: We'll only store one key/value pair in this hash, where the key is hardwired to zero. 1651. ```last.lookup(&key)```: Lookup the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key in as an address to a pointer. 1661. ```if (tsp != NULL) {```: The verifier requires that pointer values derived from a map lookup must be checked for a null value before they can be dereferenced and used. 1671. ```last.delete(&key)```: Delete the key from the hash. This is currently required because of [a kernel bug in `.update()`](https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02) (fixed in 4.8.10). 1681. ```last.update(&key, &ts)```: Associate the value in the 2nd argument to the key, overwriting any previous value. This records the timestamp. 169 170### Lesson 5. sync_count.py 171 172Modify the sync_timing.py program (prior lesson) to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash. 173 174### Lesson 6. disksnoop.py 175 176Browse the [examples/tracing/disksnoop.py](../examples/tracing/disksnoop.py) program to see what is new. Here is some sample output: 177 178``` 179# ./disksnoop.py 180TIME(s) T BYTES LAT(ms) 18116458043.436012 W 4096 3.13 18216458043.437326 W 4096 4.44 18316458044.126545 R 4096 42.82 18416458044.129872 R 4096 3.24 185[...] 186``` 187 188And a code snippet: 189 190```Python 191[...] 192REQ_WRITE = 1 # from include/linux/blk_types.h 193 194# load BPF program 195b = BPF(text=""" 196#include <uapi/linux/ptrace.h> 197#include <linux/blkdev.h> 198 199BPF_HASH(start, struct request *); 200 201void trace_start(struct pt_regs *ctx, struct request *req) { 202 // stash start timestamp by request ptr 203 u64 ts = bpf_ktime_get_ns(); 204 205 start.update(&req, &ts); 206} 207 208void trace_completion(struct pt_regs *ctx, struct request *req) { 209 u64 *tsp, delta; 210 211 tsp = start.lookup(&req); 212 if (tsp != 0) { 213 delta = bpf_ktime_get_ns() - *tsp; 214 bpf_trace_printk("%d %x %d\\n", req->__data_len, 215 req->cmd_flags, delta / 1000); 216 start.delete(&req); 217 } 218} 219""") 220 221b.attach_kprobe(event="blk_start_request", fn_name="trace_start") 222b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start") 223b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion") 224[...] 225``` 226 227Things to learn: 228 2291. ```REQ_WRITE```: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate #includes. 2301. ```trace_start(struct pt_regs *ctx, struct request *req)```: This function will later be attached to kprobes. The arguments to kprobe functions are ```struct pt_regs *ctx```, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is ```struct request *```. 2311. ```start.update(&req, &ts)```: We're using the pointer to the request struct as a key in our hash. What? This is commonplace in tracing. Pointers to structs turn out to be great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when it gets free'd and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There's two common keys used for storing timestamps: pointers to structs, and, thread IDs (for timing function entry to return). 2321. ```req->__data_len```: We're dereferencing members of ```struct request```. See its definition in the kernel source for what members are there. bcc actually rewrites these expressions to be a series of ```bpf_probe_read_kernel()``` calls. Sometimes bcc can't handle a complex dereference, and you need to call ```bpf_probe_read_kernel()``` directly. 233 234This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the bpf_trace_printk() hack, so let's fix that next. 235 236### Lesson 7. hello_perf_output.py 237 238Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This will also mean we stop getting the free trace_field() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session: 239 240``` 241# ./hello_perf_output.py 242TIME(s) COMM PID MESSAGE 2430.000000000 bash 22986 Hello, perf_output! 2440.021080275 systemd-udevd 484 Hello, perf_output! 2450.021359520 systemd-udevd 484 Hello, perf_output! 2460.021590610 systemd-udevd 484 Hello, perf_output! 247[...] 248``` 249 250Code is [examples/tracing/hello_perf_output.py](../examples/tracing/hello_perf_output.py): 251 252```Python 253from bcc import BPF 254 255# define BPF program 256prog = """ 257#include <linux/sched.h> 258 259// define output data structure in C 260struct data_t { 261 u32 pid; 262 u64 ts; 263 char comm[TASK_COMM_LEN]; 264}; 265BPF_PERF_OUTPUT(events); 266 267int hello(struct pt_regs *ctx) { 268 struct data_t data = {}; 269 270 data.pid = bpf_get_current_pid_tgid(); 271 data.ts = bpf_ktime_get_ns(); 272 bpf_get_current_comm(&data.comm, sizeof(data.comm)); 273 274 events.perf_submit(ctx, &data, sizeof(data)); 275 276 return 0; 277} 278""" 279 280# load BPF program 281b = BPF(text=prog) 282b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello") 283 284# header 285print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE")) 286 287# process event 288start = 0 289def print_event(cpu, data, size): 290 global start 291 event = b["events"].event(data) 292 if start == 0: 293 start = event.ts 294 time_s = (float(event.ts - start)) / 1000000000 295 print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid, 296 "Hello, perf_output!")) 297 298# loop with callback to print_event 299b["events"].open_perf_buffer(print_event) 300while 1: 301 b.perf_buffer_poll() 302``` 303 304Things to learn: 305 3061. ```struct data_t```: This defines the C struct we'll use to pass data from kernel to user space. 3071. ```BPF_PERF_OUTPUT(events)```: This names our output channel "events". 3081. ```struct data_t data = {};```: Create an empty data_t struct that we'll then populate. 3091. ```bpf_get_current_pid_tgid()```: Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits. Should you be presenting the PID or the TGID? For a multi-threaded app, the TGID will be the same, so you need the PID to differentiate them, if that's what you want. It's also a question of expectations for the end user. 3101. ```bpf_get_current_comm()```: Populates the first argument address with the current process name. 3111. ```events.perf_submit()```: Submit the event for user space to read via a perf ring buffer. 3121. ```def print_event()```: Define a Python function that will handle reading events from the ```events``` stream. 3131. ```b["events"].event(data)```: Now get the event as a Python object, auto-generated from the C declaration. 3141. ```b["events"].open_perf_buffer(print_event)```: Associate the Python ```print_event``` function with the ```events``` stream. 3151. ```while 1: b.perf_buffer_poll()```: Block waiting for events. 316 317### Lesson 8. sync_perf_output.py 318 319Rewrite sync_timing.py, from a prior lesson, to use ```BPF_PERF_OUTPUT```. 320 321### Lesson 9. bitehist.py 322 323The following tool records a histogram of disk I/O sizes. Sample output: 324 325``` 326# ./bitehist.py 327Tracing... Hit Ctrl-C to end. 328^C 329 kbytes : count distribution 330 0 -> 1 : 3 | | 331 2 -> 3 : 0 | | 332 4 -> 7 : 211 |********** | 333 8 -> 15 : 0 | | 334 16 -> 31 : 0 | | 335 32 -> 63 : 0 | | 336 64 -> 127 : 1 | | 337 128 -> 255 : 800 |**************************************| 338``` 339 340Code is [examples/tracing/bitehist.py](../examples/tracing/bitehist.py): 341 342```Python 343from __future__ import print_function 344from bcc import BPF 345from time import sleep 346 347# load BPF program 348b = BPF(text=""" 349#include <uapi/linux/ptrace.h> 350#include <linux/blkdev.h> 351 352BPF_HISTOGRAM(dist); 353 354int kprobe__blk_account_io_done(struct pt_regs *ctx, struct request *req) 355{ 356 dist.increment(bpf_log2l(req->__data_len / 1024)); 357 return 0; 358} 359""") 360 361# header 362print("Tracing... Hit Ctrl-C to end.") 363 364# trace until Ctrl-C 365try: 366 sleep(99999999) 367except KeyboardInterrupt: 368 print() 369 370# output 371b["dist"].print_log2_hist("kbytes") 372``` 373 374A recap from earlier lessons: 375 376- ```kprobe__```: This prefix means the rest will be treated as a kernel function name that will be instrumented using kprobe. 377- ```struct pt_regs *ctx, struct request *req```: Arguments to kprobe. The ```ctx``` is registers and BPF context, the ```req``` is the first argument to the instrumented function: ```blk_account_io_done()```. 378- ```req->__data_len```: Dereferencing that member. 379 380New things to learn: 381 3821. ```BPF_HISTOGRAM(dist)```: Defines a BPF map object that is a histogram, and names it "dist". 3831. ```dist.increment()```: Increments the histogram bucket index provided as first argument by one by default. Optionally, custom increments can be passed as the second argument. 3841. ```bpf_log2l()```: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram. 3851. ```b["dist"].print_log2_hist("kbytes")```: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient. 386 387### Lesson 10. disklatency.py 388 389Write a program that times disk I/O, and prints a histogram of their latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from a prior lesson, and histogram code can be found in bitehist.py from a prior lesson. 390 391### Lesson 11. vfsreadlat.py 392 393This example is split into separate Python and C files. Example output: 394 395``` 396# ./vfsreadlat.py 1 397Tracing... Hit Ctrl-C to end. 398 usecs : count distribution 399 0 -> 1 : 0 | | 400 2 -> 3 : 2 |*********** | 401 4 -> 7 : 7 |****************************************| 402 8 -> 15 : 4 |********************** | 403 404 usecs : count distribution 405 0 -> 1 : 29 |****************************************| 406 2 -> 3 : 28 |************************************** | 407 4 -> 7 : 4 |***** | 408 8 -> 15 : 8 |*********** | 409 16 -> 31 : 0 | | 410 32 -> 63 : 0 | | 411 64 -> 127 : 0 | | 412 128 -> 255 : 0 | | 413 256 -> 511 : 2 |** | 414 512 -> 1023 : 0 | | 415 1024 -> 2047 : 0 | | 416 2048 -> 4095 : 0 | | 417 4096 -> 8191 : 4 |***** | 418 8192 -> 16383 : 6 |******** | 419 16384 -> 32767 : 9 |************ | 420 32768 -> 65535 : 6 |******** | 421 65536 -> 131071 : 2 |** | 422 423 usecs : count distribution 424 0 -> 1 : 11 |****************************************| 425 2 -> 3 : 2 |******* | 426 4 -> 7 : 10 |************************************ | 427 8 -> 15 : 8 |***************************** | 428 16 -> 31 : 1 |*** | 429 32 -> 63 : 2 |******* | 430[...] 431``` 432 433Browse the code in [examples/tracing/vfsreadlat.py](../examples/tracing/vfsreadlat.py) and [examples/tracing/vfsreadlat.c](../examples/tracing/vfsreadlat.c). Things to learn: 434 4351. ```b = BPF(src_file = "vfsreadlat.c")```: Read the BPF C program from a separate source file. 4361. ```b.attach_kretprobe(event="vfs_read", fn_name="do_return")```: Attaches the BPF C function ```do_return()``` to the return of the kernel function ```vfs_read()```. This is a kretprobe: instrumenting the return from a function, rather than its entry. 4371. ```b["dist"].clear()```: Clears the histogram. 438 439### Lesson 12. urandomread.py 440 441Tracing while a ```dd if=/dev/urandom of=/dev/null bs=8k count=5``` is run: 442 443``` 444# ./urandomread.py 445TIME(s) COMM PID GOTBITS 44624652832.956994001 smtp 24690 384 44724652837.726500999 dd 24692 65536 44824652837.727111001 dd 24692 65536 44924652837.727703001 dd 24692 65536 45024652837.728294998 dd 24692 65536 45124652837.728888001 dd 24692 65536 452``` 453 454Hah! I caught smtp by accident. Code is [examples/tracing/urandomread.py](../examples/tracing/urandomread.py): 455 456```Python 457from __future__ import print_function 458from bcc import BPF 459 460# load BPF program 461b = BPF(text=""" 462TRACEPOINT_PROBE(random, urandom_read) { 463 // args is from /sys/kernel/debug/tracing/events/random/urandom_read/format 464 bpf_trace_printk("%d\\n", args->got_bits); 465 return 0; 466} 467""") 468 469# header 470print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "GOTBITS")) 471 472# format output 473while 1: 474 try: 475 (task, pid, cpu, flags, ts, msg) = b.trace_fields() 476 except ValueError: 477 continue 478 print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg)) 479``` 480 481Things to learn: 482 4831. ```TRACEPOINT_PROBE(random, urandom_read)```: Instrument the kernel tracepoint ```random:urandom_read```. These have a stable API, and thus are recommend to use instead of kprobes, wherever possible. You can run ```perf list``` for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints. 4841. ```args->got_bits```: ```args``` is auto-populated to be a structure of the tracepoint arguments. The comment above says where you can see that structure. Eg: 485 486``` 487# cat /sys/kernel/debug/tracing/events/random/urandom_read/format 488name: urandom_read 489ID: 972 490format: 491 field:unsigned short common_type; offset:0; size:2; signed:0; 492 field:unsigned char common_flags; offset:2; size:1; signed:0; 493 field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 494 field:int common_pid; offset:4; size:4; signed:1; 495 496 field:int got_bits; offset:8; size:4; signed:1; 497 field:int pool_left; offset:12; size:4; signed:1; 498 field:int input_left; offset:16; size:4; signed:1; 499 500print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left 501``` 502 503In this case, we were printing the ```got_bits``` member. 504 505### Lesson 13. disksnoop.py fixed 506 507Convert disksnoop.py from a previous lesson to use the ```block:block_rq_issue``` and ```block:block_rq_complete``` tracepoints. 508 509### Lesson 14. strlen_count.py 510 511This program instruments a user-level function, the ```strlen()``` library function, and frequency counts its string argument. Example output: 512 513``` 514# ./strlen_count.py 515Tracing strlen()... Hit Ctrl-C to end. 516^C COUNT STRING 517 1 " " 518 1 "/bin/ls" 519 1 "." 520 1 "cpudist.py.1" 521 1 ".bashrc" 522 1 "ls --color=auto" 523 1 "key_t" 524[...] 525 10 "a7:~# " 526 10 "/root" 527 12 "LC_ALL" 528 12 "en_US.UTF-8" 529 13 "en_US.UTF-8" 530 20 "~" 531 70 "#%^,~:-=?+/}" 532 340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# " 533``` 534 535These are various strings that are being processed by this library function while tracing, along with their frequency counts. ```strlen()``` was called on "LC_ALL" 12 times, for example. 536 537Code is [examples/tracing/strlen_count.py](../examples/tracing/strlen_count.py): 538 539```Python 540from __future__ import print_function 541from bcc import BPF 542from time import sleep 543 544# load BPF program 545b = BPF(text=""" 546#include <uapi/linux/ptrace.h> 547 548struct key_t { 549 char c[80]; 550}; 551BPF_HASH(counts, struct key_t); 552 553int count(struct pt_regs *ctx) { 554 if (!PT_REGS_PARM1(ctx)) 555 return 0; 556 557 struct key_t key = {}; 558 u64 zero = 0, *val; 559 560 bpf_probe_read_user(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx)); 561 // could also use `counts.increment(key)` 562 val = counts.lookup_or_try_init(&key, &zero); 563 if (val) { 564 (*val)++; 565 } 566 return 0; 567}; 568""") 569b.attach_uprobe(name="c", sym="strlen", fn_name="count") 570 571# header 572print("Tracing strlen()... Hit Ctrl-C to end.") 573 574# sleep until Ctrl-C 575try: 576 sleep(99999999) 577except KeyboardInterrupt: 578 pass 579 580# print output 581print("%10s %s" % ("COUNT", "STRING")) 582counts = b.get_table("counts") 583for k, v in sorted(counts.items(), key=lambda counts: counts[1].value): 584 print("%10d \"%s\"" % (v.value, k.c.encode('string-escape'))) 585``` 586 587Things to learn: 588 5891. ```PT_REGS_PARM1(ctx)```: This fetches the first argument to ```strlen()```, which is the string. 5901. ```b.attach_uprobe(name="c", sym="strlen", fn_name="count")```: Attach to library "c" (if this is the main program, use its pathname), instrument the user-level function ```strlen()```, and on execution call our C function ```count()```. 591 592### Lesson 15. nodejs_http_server.py 593 594This program instruments a user statically-defined tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output: 595 596``` 597# ./nodejs_http_server.py 24728 598TIME(s) COMM PID ARGS 59924653324.561322998 node 24728 path:/index.html 60024653335.343401998 node 24728 path:/images/welcome.png 60124653340.510164998 node 24728 path:/images/favicon.png 602``` 603 604Relevant code from [examples/tracing/nodejs_http_server.py](../examples/tracing/nodejs_http_server.py): 605 606```Python 607from __future__ import print_function 608from bcc import BPF, USDT 609import sys 610 611if len(sys.argv) < 2: 612 print("USAGE: nodejs_http_server PID") 613 exit() 614pid = sys.argv[1] 615debug = 0 616 617# load BPF program 618bpf_text = """ 619#include <uapi/linux/ptrace.h> 620int do_trace(struct pt_regs *ctx) { 621 uint64_t addr; 622 char path[128]={0}; 623 bpf_usdt_readarg(6, ctx, &addr); 624 bpf_probe_read_user(&path, sizeof(path), (void *)addr); 625 bpf_trace_printk("path:%s\\n", path); 626 return 0; 627}; 628""" 629 630# enable USDT probe from given PID 631u = USDT(pid=int(pid)) 632u.enable_probe(probe="http__server__request", fn_name="do_trace") 633if debug: 634 print(u.get_text()) 635 print(bpf_text) 636 637# initialize BPF 638b = BPF(text=bpf_text, usdt_contexts=[u]) 639``` 640 641Things to learn: 642 6431. ```bpf_usdt_readarg(6, ctx, &addr)```: Read the address of argument 6 from the USDT probe into ```addr```. 6441. ```bpf_probe_read_user(&path, sizeof(path), (void *)addr)```: Now the string ```addr``` points to into our ```path``` variable. 6451. ```u = USDT(pid=int(pid))```: Initialize USDT tracing for the given PID. 6461. ```u.enable_probe(probe="http__server__request", fn_name="do_trace")```: Attach our ```do_trace()``` BPF C function to the Node.js ```http__server__request``` USDT probe. 6471. ```b = BPF(text=bpf_text, usdt_contexts=[u])```: Need to pass in our USDT object, ```u```, to BPF object creation. 648 649### Lesson 16. task_switch.c 650 651This is an older tutorial included as a bonus lesson. Use this for recap and to reinforce what you've already learned. 652 653This is a slightly more complex tracing example than Hello World. This program 654will be invoked for every task change in the kernel, and record in a BPF map 655the new and old pids. 656 657The C program below introduces a new concept: the prev argument. This 658argument is treated specially by the BCC frontend, such that accesses 659to this variable are read from the saved context that is passed by the 660kprobe infrastructure. The prototype of the args starting from 661position 1 should match the prototype of the kernel function being 662kprobed. If done so, the program will have seamless access to the 663function parameters. 664 665```c 666#include <uapi/linux/ptrace.h> 667#include <linux/sched.h> 668 669struct key_t { 670 u32 prev_pid; 671 u32 curr_pid; 672}; 673 674BPF_HASH(stats, struct key_t, u64, 1024); 675int count_sched(struct pt_regs *ctx, struct task_struct *prev) { 676 struct key_t key = {}; 677 u64 zero = 0, *val; 678 679 key.curr_pid = bpf_get_current_pid_tgid(); 680 key.prev_pid = prev->pid; 681 682 // could also use `stats.increment(key);` 683 val = stats.lookup_or_try_init(&key, &zero); 684 if (val) { 685 (*val)++; 686 } 687 return 0; 688} 689``` 690 691The userspace component loads the file shown above, and attaches it to the 692`finish_task_switch` kernel function. 693The `[]` operator of the BPF object gives access to each BPF_HASH in the 694program, allowing pass-through access to the values residing in the kernel. Use 695the object as you would any other python dict object: read, update, and deletes 696are all allowed. 697```python 698from bcc import BPF 699from time import sleep 700 701b = BPF(src_file="task_switch.c") 702b.attach_kprobe(event="finish_task_switch", fn_name="count_sched") 703 704# generate many schedule events 705for i in range(0, 100): sleep(0.01) 706 707for k, v in b["stats"].items(): 708 print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value)) 709``` 710 711These programs can be found in the files [examples/tracing/task_switch.c](../examples/tracing/task_switch.c) and [examples/tracing/task_switch.py](../examples/tracing/task_switch.py) respectively. 712 713### Lesson 17. Further Study 714 715For further study, see Sasha Goldshtein's [linux-tracing-workshop](https://github.com/goldshtn/linux-tracing-workshop), which contains additional labs. There are also many tools in bcc /tools to study. 716 717Please read [CONTRIBUTING-SCRIPTS.md](../CONTRIBUTING-SCRIPTS.md) if you wish to contribute tools to bcc. At the bottom of the main [README.md](../README.md), you'll also find methods for contacting us. Good luck, and happy tracing! 718 719## Networking 720 721To do. 722