• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# bcc Python Developer Tutorial
2
3This tutorial is about developing [bcc](https://github.com/iovisor/bcc) tools and programs using the Python interface. There are two parts: observability then networking. Snippets are taken from various programs in bcc: see their files for licences.
4
5Also see the bcc developer's [reference_guide.md](reference_guide.md), and a tutorial for end-users of tools: [tutorial.md](tutorial.md). There is also a lua interface for bcc.
6
7## Observability
8
9This observability tutorial contains 17 lessons, and 46 enumerated things to learn.
10
11### Lesson 1. Hello World
12
13Start by running [examples/hello_world.py](../examples/hello_world.py), while running some commands (eg, "ls") in another session. It should print "Hello, World!" for new processes. If not, start by fixing bcc: see [INSTALL.md](../INSTALL.md).
14
15```
16# ./examples/hello_world.py
17            bash-13364 [002] d... 24573433.052937: : Hello, World!
18            bash-13364 [003] d... 24573436.642808: : Hello, World!
19[...]
20```
21
22Here's the code for hello_world.py:
23
24```Python
25from bcc import BPF
26BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
27```
28
29There are six things to learn from this:
30
311. ```text='...'```: This defines a BPF program inline. The program is written in C.
32
331. ```kprobe__sys_clone()```: This is a short-cut for kernel dynamic tracing via kprobes. If the C function begins with ``kprobe__``, the rest is treated as a kernel function name to instrument, in this case, ```sys_clone()```.
34
351. ```void *ctx```: ctx has arguments, but since we aren't using them here, we'll just cast it to ```void *```.
36
371. ```bpf_trace_printk()```: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(), covered later.
38
391. ```return 0;```: Necessary formality (if you want to know why, see [#139](https://github.com/iovisor/bcc/issues/139)).
40
411. ```.trace_print()```: A bcc routine that reads trace_pipe and prints the output.
42
43### Lesson 2. sys_sync()
44
45Write a program that traces the sys_sync() kernel function. Print "sys_sync() called" when it runs. Test by running ```sync``` in another session while tracing. The hello_world.py program has everything you need for this.
46
47Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python.
48
49### Lesson 3. hello_fields.py
50
51This program is in [examples/tracing/hello_fields.py](../examples/tracing/hello_fields.py). Sample output (run commands in another session):
52
53```
54# ./examples/tracing/hello_fields.py
55TIME(s)            COMM             PID    MESSAGE
5624585001.174885999 sshd             1432   Hello, World!
5724585001.195710000 sshd             15780  Hello, World!
5824585001.991976000 systemd-udevd    484    Hello, World!
5924585002.276147000 bash             15787  Hello, World!
60```
61
62Code:
63
64```Python
65from bcc import BPF
66
67# define BPF program
68prog = """
69int hello(void *ctx) {
70    bpf_trace_printk("Hello, World!\\n");
71    return 0;
72}
73"""
74
75# load BPF program
76b = BPF(text=prog)
77b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
78
79# header
80print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
81
82# format output
83while 1:
84    try:
85        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
86    except ValueError:
87        continue
88    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
89```
90
91This is similar to hello_world.py, and traces new processes via sys_clone() again, but has a few more things to learn:
92
931. ```prog =```: This time we declare the C program as a variable, and later refer to it. This is useful if you want to add some string substitutions based on command line arguments.
94
951. ```hello()```: Now we're just declaring a C function, instead of the ```kprobe__``` shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a ```pt_reg* ctx``` as first argument. If you need to define some helper function that will not be executed on a probe, they need to be defined as ```static inline``` in order to be inlined by the compiler. Sometimes you would also need to add ```_always_inline``` function attribute to it.
96
971. ```b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")```: Creates a kprobe for the kernel clone system call function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions.
98
991. ```b.trace_fields()```: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT().
100
101### Lesson 4. sync_timing.py
102
103Remember the days of sysadmins typing ```sync``` three times on a slow console before ```reboot```, to give the first asynchronous sync time to complete? Then someone thought ```sync;sync;sync``` was clever, to run them all on one line, which became industry practice despite defeating the original purpose! And then sync became synchronous, so more reasons it was silly. Anyway.
104
105The following example times how quickly the ```do_sync``` function is called, and prints output if it has been called more recently than one second ago. A ```sync;sync;sync``` will print output for the 2nd and 3rd sync's:
106
107```
108# ./examples/tracing/sync_timing.py
109Tracing for quick sync's... Ctrl-C to end
110At time 0.00 s: multiple syncs detected, last 95 ms ago
111At time 0.10 s: multiple syncs detected, last 96 ms ago
112```
113
114This program is [examples/tracing/sync_timing.py](../examples/tracing/sync_timing.py):
115
116```Python
117from __future__ import print_function
118from bcc import BPF
119
120# load BPF program
121b = BPF(text="""
122#include <uapi/linux/ptrace.h>
123
124BPF_HASH(last);
125
126int do_trace(struct pt_regs *ctx) {
127    u64 ts, *tsp, delta, key = 0;
128
129    // attempt to read stored timestamp
130    tsp = last.lookup(&key);
131    if (tsp != NULL) {
132        delta = bpf_ktime_get_ns() - *tsp;
133        if (delta < 1000000000) {
134            // output if time is less than 1 second
135            bpf_trace_printk("%d\\n", delta / 1000000);
136        }
137        last.delete(&key);
138    }
139
140    // update stored timestamp
141    ts = bpf_ktime_get_ns();
142    last.update(&key, &ts);
143    return 0;
144}
145""")
146
147b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace")
148print("Tracing for quick sync's... Ctrl-C to end")
149
150# format output
151start = 0
152while 1:
153    (task, pid, cpu, flags, ts, ms) = b.trace_fields()
154    if start == 0:
155        start = ts
156    ts = ts - start
157    print("At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms))
158```
159
160Things to learn:
161
1621. ```bpf_ktime_get_ns()```: Returns the time as nanoseconds.
1631. ```BPF_HASH(last)```: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64.
1641. ```key = 0```: We'll only store one key/value pair in this hash, where the key is hardwired to zero.
1651. ```last.lookup(&key)```: Lookup the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key in as an address to a pointer.
1661. ```if (tsp != NULL) {```: The verifier requires that pointer values derived from a map lookup must be checked for a null value before they can be dereferenced and used.
1671. ```last.delete(&key)```: Delete the key from the hash. This is currently required because of [a kernel bug in `.update()`](https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02) (fixed in 4.8.10).
1681. ```last.update(&key, &ts)```: Associate the value in the 2nd argument to the key, overwriting any previous value. This records the timestamp.
169
170### Lesson 5. sync_count.py
171
172Modify the sync_timing.py program (prior lesson) to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash.
173
174### Lesson 6. disksnoop.py
175
176Browse the [examples/tracing/disksnoop.py](../examples/tracing/disksnoop.py) program to see what is new. Here is some sample output:
177
178```
179# ./disksnoop.py
180TIME(s)            T  BYTES    LAT(ms)
18116458043.436012    W  4096        3.13
18216458043.437326    W  4096        4.44
18316458044.126545    R  4096       42.82
18416458044.129872    R  4096        3.24
185[...]
186```
187
188And a code snippet:
189
190```Python
191[...]
192REQ_WRITE = 1		# from include/linux/blk_types.h
193
194# load BPF program
195b = BPF(text="""
196#include <uapi/linux/ptrace.h>
197#include <linux/blkdev.h>
198
199BPF_HASH(start, struct request *);
200
201void trace_start(struct pt_regs *ctx, struct request *req) {
202	// stash start timestamp by request ptr
203	u64 ts = bpf_ktime_get_ns();
204
205	start.update(&req, &ts);
206}
207
208void trace_completion(struct pt_regs *ctx, struct request *req) {
209	u64 *tsp, delta;
210
211	tsp = start.lookup(&req);
212	if (tsp != 0) {
213		delta = bpf_ktime_get_ns() - *tsp;
214		bpf_trace_printk("%d %x %d\\n", req->__data_len,
215		    req->cmd_flags, delta / 1000);
216		start.delete(&req);
217	}
218}
219""")
220
221b.attach_kprobe(event="blk_start_request", fn_name="trace_start")
222b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start")
223b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion")
224[...]
225```
226
227Things to learn:
228
2291. ```REQ_WRITE```: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate #includes.
2301. ```trace_start(struct pt_regs *ctx, struct request *req)```: This function will later be attached to kprobes. The arguments to kprobe functions are ```struct pt_regs *ctx```, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is ```struct request *```.
2311. ```start.update(&req, &ts)```: We're using the pointer to the request struct as a key in our hash. What? This is commonplace in tracing. Pointers to structs turn out to be great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when it gets free'd and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There's two common keys used for storing timestamps: pointers to structs, and, thread IDs (for timing function entry to return).
2321. ```req->__data_len```: We're dereferencing members of ```struct request```. See its definition in the kernel source for what members are there. bcc actually rewrites these expressions to be a series of ```bpf_probe_read_kernel()``` calls. Sometimes bcc can't handle a complex dereference, and you need to call ```bpf_probe_read_kernel()``` directly.
233
234This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the bpf_trace_printk() hack, so let's fix that next.
235
236### Lesson 7. hello_perf_output.py
237
238Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This will also mean we stop getting the free trace_field() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session:
239
240```
241# ./hello_perf_output.py
242TIME(s)            COMM             PID    MESSAGE
2430.000000000        bash             22986  Hello, perf_output!
2440.021080275        systemd-udevd    484    Hello, perf_output!
2450.021359520        systemd-udevd    484    Hello, perf_output!
2460.021590610        systemd-udevd    484    Hello, perf_output!
247[...]
248```
249
250Code is [examples/tracing/hello_perf_output.py](../examples/tracing/hello_perf_output.py):
251
252```Python
253from bcc import BPF
254
255# define BPF program
256prog = """
257#include <linux/sched.h>
258
259// define output data structure in C
260struct data_t {
261    u32 pid;
262    u64 ts;
263    char comm[TASK_COMM_LEN];
264};
265BPF_PERF_OUTPUT(events);
266
267int hello(struct pt_regs *ctx) {
268    struct data_t data = {};
269
270    data.pid = bpf_get_current_pid_tgid();
271    data.ts = bpf_ktime_get_ns();
272    bpf_get_current_comm(&data.comm, sizeof(data.comm));
273
274    events.perf_submit(ctx, &data, sizeof(data));
275
276    return 0;
277}
278"""
279
280# load BPF program
281b = BPF(text=prog)
282b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
283
284# header
285print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
286
287# process event
288start = 0
289def print_event(cpu, data, size):
290    global start
291    event = b["events"].event(data)
292    if start == 0:
293            start = event.ts
294    time_s = (float(event.ts - start)) / 1000000000
295    print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid,
296        "Hello, perf_output!"))
297
298# loop with callback to print_event
299b["events"].open_perf_buffer(print_event)
300while 1:
301    b.perf_buffer_poll()
302```
303
304Things to learn:
305
3061. ```struct data_t```: This defines the C struct we'll use to pass data from kernel to user space.
3071. ```BPF_PERF_OUTPUT(events)```: This names our output channel "events".
3081. ```struct data_t data = {};```: Create an empty data_t struct that we'll then populate.
3091. ```bpf_get_current_pid_tgid()```: Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits. Should you be presenting the PID or the TGID? For a multi-threaded app, the TGID will be the same, so you need the PID to differentiate them, if that's what you want. It's also a question of expectations for the end user.
3101. ```bpf_get_current_comm()```: Populates the first argument address with the current process name.
3111. ```events.perf_submit()```: Submit the event for user space to read via a perf ring buffer.
3121. ```def print_event()```: Define a Python function that will handle reading events from the ```events``` stream.
3131. ```b["events"].event(data)```: Now get the event as a Python object, auto-generated from the C declaration.
3141. ```b["events"].open_perf_buffer(print_event)```: Associate the Python ```print_event``` function with the ```events``` stream.
3151. ```while 1: b.perf_buffer_poll()```: Block waiting for events.
316
317### Lesson 8. sync_perf_output.py
318
319Rewrite sync_timing.py, from a prior lesson, to use ```BPF_PERF_OUTPUT```.
320
321### Lesson 9. bitehist.py
322
323The following tool records a histogram of disk I/O sizes. Sample output:
324
325```
326# ./bitehist.py
327Tracing... Hit Ctrl-C to end.
328^C
329     kbytes          : count     distribution
330       0 -> 1        : 3        |                                      |
331       2 -> 3        : 0        |                                      |
332       4 -> 7        : 211      |**********                            |
333       8 -> 15       : 0        |                                      |
334      16 -> 31       : 0        |                                      |
335      32 -> 63       : 0        |                                      |
336      64 -> 127      : 1        |                                      |
337     128 -> 255      : 800      |**************************************|
338```
339
340Code is [examples/tracing/bitehist.py](../examples/tracing/bitehist.py):
341
342```Python
343from __future__ import print_function
344from bcc import BPF
345from time import sleep
346
347# load BPF program
348b = BPF(text="""
349#include <uapi/linux/ptrace.h>
350#include <linux/blkdev.h>
351
352BPF_HISTOGRAM(dist);
353
354int kprobe__blk_account_io_done(struct pt_regs *ctx, struct request *req)
355{
356	dist.increment(bpf_log2l(req->__data_len / 1024));
357	return 0;
358}
359""")
360
361# header
362print("Tracing... Hit Ctrl-C to end.")
363
364# trace until Ctrl-C
365try:
366	sleep(99999999)
367except KeyboardInterrupt:
368	print()
369
370# output
371b["dist"].print_log2_hist("kbytes")
372```
373
374A recap from earlier lessons:
375
376- ```kprobe__```: This prefix means the rest will be treated as a kernel function name that will be instrumented using kprobe.
377- ```struct pt_regs *ctx, struct request *req```: Arguments to kprobe. The ```ctx``` is registers and BPF context, the ```req``` is the first argument to the instrumented function: ```blk_account_io_done()```.
378- ```req->__data_len```: Dereferencing that member.
379
380New things to learn:
381
3821. ```BPF_HISTOGRAM(dist)```: Defines a BPF map object that is a histogram, and names it "dist".
3831. ```dist.increment()```: Increments the histogram bucket index provided as first argument by one by default. Optionally, custom increments can be passed as the second argument.
3841. ```bpf_log2l()```: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram.
3851. ```b["dist"].print_log2_hist("kbytes")```: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient.
386
387### Lesson 10. disklatency.py
388
389Write a program that times disk I/O, and prints a histogram of their latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from a prior lesson, and histogram code can be found in bitehist.py from a prior lesson.
390
391### Lesson 11. vfsreadlat.py
392
393This example is split into separate Python and C files. Example output:
394
395```
396# ./vfsreadlat.py 1
397Tracing... Hit Ctrl-C to end.
398     usecs               : count     distribution
399         0 -> 1          : 0        |                                        |
400         2 -> 3          : 2        |***********                             |
401         4 -> 7          : 7        |****************************************|
402         8 -> 15         : 4        |**********************                  |
403
404     usecs               : count     distribution
405         0 -> 1          : 29       |****************************************|
406         2 -> 3          : 28       |**************************************  |
407         4 -> 7          : 4        |*****                                   |
408         8 -> 15         : 8        |***********                             |
409        16 -> 31         : 0        |                                        |
410        32 -> 63         : 0        |                                        |
411        64 -> 127        : 0        |                                        |
412       128 -> 255        : 0        |                                        |
413       256 -> 511        : 2        |**                                      |
414       512 -> 1023       : 0        |                                        |
415      1024 -> 2047       : 0        |                                        |
416      2048 -> 4095       : 0        |                                        |
417      4096 -> 8191       : 4        |*****                                   |
418      8192 -> 16383      : 6        |********                                |
419     16384 -> 32767      : 9        |************                            |
420     32768 -> 65535      : 6        |********                                |
421     65536 -> 131071     : 2        |**                                      |
422
423     usecs               : count     distribution
424         0 -> 1          : 11       |****************************************|
425         2 -> 3          : 2        |*******                                 |
426         4 -> 7          : 10       |************************************    |
427         8 -> 15         : 8        |*****************************           |
428        16 -> 31         : 1        |***                                     |
429        32 -> 63         : 2        |*******                                 |
430[...]
431```
432
433Browse the code in [examples/tracing/vfsreadlat.py](../examples/tracing/vfsreadlat.py) and [examples/tracing/vfsreadlat.c](../examples/tracing/vfsreadlat.c). Things to learn:
434
4351. ```b = BPF(src_file = "vfsreadlat.c")```: Read the BPF C program from a separate source file.
4361. ```b.attach_kretprobe(event="vfs_read", fn_name="do_return")```: Attaches the BPF C function ```do_return()``` to the return of the kernel function ```vfs_read()```. This is a kretprobe: instrumenting the return from a function, rather than its entry.
4371. ```b["dist"].clear()```: Clears the histogram.
438
439### Lesson 12. urandomread.py
440
441Tracing while a ```dd if=/dev/urandom of=/dev/null bs=8k count=5``` is run:
442
443```
444# ./urandomread.py
445TIME(s)            COMM             PID    GOTBITS
44624652832.956994001 smtp             24690  384
44724652837.726500999 dd               24692  65536
44824652837.727111001 dd               24692  65536
44924652837.727703001 dd               24692  65536
45024652837.728294998 dd               24692  65536
45124652837.728888001 dd               24692  65536
452```
453
454Hah! I caught smtp by accident. Code is [examples/tracing/urandomread.py](../examples/tracing/urandomread.py):
455
456```Python
457from __future__ import print_function
458from bcc import BPF
459
460# load BPF program
461b = BPF(text="""
462TRACEPOINT_PROBE(random, urandom_read) {
463    // args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
464    bpf_trace_printk("%d\\n", args->got_bits);
465    return 0;
466}
467""")
468
469# header
470print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "GOTBITS"))
471
472# format output
473while 1:
474    try:
475        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
476    except ValueError:
477        continue
478    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
479```
480
481Things to learn:
482
4831. ```TRACEPOINT_PROBE(random, urandom_read)```: Instrument the kernel tracepoint ```random:urandom_read```. These have a stable API, and thus are recommend to use instead of kprobes, wherever possible. You can run ```perf list``` for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints.
4841. ```args->got_bits```: ```args``` is auto-populated to be a structure of the tracepoint arguments. The comment above says where you can see that structure. Eg:
485
486```
487# cat /sys/kernel/debug/tracing/events/random/urandom_read/format
488name: urandom_read
489ID: 972
490format:
491	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
492	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
493	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
494	field:int common_pid;	offset:4;	size:4;	signed:1;
495
496	field:int got_bits;	offset:8;	size:4;	signed:1;
497	field:int pool_left;	offset:12;	size:4;	signed:1;
498	field:int input_left;	offset:16;	size:4;	signed:1;
499
500print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left
501```
502
503In this case, we were printing the ```got_bits``` member.
504
505### Lesson 13. disksnoop.py fixed
506
507Convert disksnoop.py from a previous lesson to use the ```block:block_rq_issue``` and ```block:block_rq_complete``` tracepoints.
508
509### Lesson 14. strlen_count.py
510
511This program instruments a user-level function, the ```strlen()``` library function, and frequency counts its string argument. Example output:
512
513```
514# ./strlen_count.py
515Tracing strlen()... Hit Ctrl-C to end.
516^C     COUNT STRING
517         1 " "
518         1 "/bin/ls"
519         1 "."
520         1 "cpudist.py.1"
521         1 ".bashrc"
522         1 "ls --color=auto"
523         1 "key_t"
524[...]
525        10 "a7:~# "
526        10 "/root"
527        12 "LC_ALL"
528        12 "en_US.UTF-8"
529        13 "en_US.UTF-8"
530        20 "~"
531        70 "#%^,~:-=?+/}"
532       340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# "
533```
534
535These are various strings that are being processed by this library function while tracing, along with their frequency counts. ```strlen()``` was called on "LC_ALL" 12 times, for example.
536
537Code is [examples/tracing/strlen_count.py](../examples/tracing/strlen_count.py):
538
539```Python
540from __future__ import print_function
541from bcc import BPF
542from time import sleep
543
544# load BPF program
545b = BPF(text="""
546#include <uapi/linux/ptrace.h>
547
548struct key_t {
549    char c[80];
550};
551BPF_HASH(counts, struct key_t);
552
553int count(struct pt_regs *ctx) {
554    if (!PT_REGS_PARM1(ctx))
555        return 0;
556
557    struct key_t key = {};
558    u64 zero = 0, *val;
559
560    bpf_probe_read_user(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
561    // could also use `counts.increment(key)`
562    val = counts.lookup_or_try_init(&key, &zero);
563    if (val) {
564      (*val)++;
565    }
566    return 0;
567};
568""")
569b.attach_uprobe(name="c", sym="strlen", fn_name="count")
570
571# header
572print("Tracing strlen()... Hit Ctrl-C to end.")
573
574# sleep until Ctrl-C
575try:
576    sleep(99999999)
577except KeyboardInterrupt:
578    pass
579
580# print output
581print("%10s %s" % ("COUNT", "STRING"))
582counts = b.get_table("counts")
583for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
584    print("%10d \"%s\"" % (v.value, k.c.encode('string-escape')))
585```
586
587Things to learn:
588
5891. ```PT_REGS_PARM1(ctx)```: This fetches the first argument to ```strlen()```, which is the string.
5901. ```b.attach_uprobe(name="c", sym="strlen", fn_name="count")```: Attach to library "c" (if this is the main program, use its pathname), instrument the user-level function ```strlen()```, and on execution call our C function ```count()```.
591
592### Lesson 15. nodejs_http_server.py
593
594This program instruments a user statically-defined tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output:
595
596```
597# ./nodejs_http_server.py 24728
598TIME(s)            COMM             PID    ARGS
59924653324.561322998 node             24728  path:/index.html
60024653335.343401998 node             24728  path:/images/welcome.png
60124653340.510164998 node             24728  path:/images/favicon.png
602```
603
604Relevant code from [examples/tracing/nodejs_http_server.py](../examples/tracing/nodejs_http_server.py):
605
606```Python
607from __future__ import print_function
608from bcc import BPF, USDT
609import sys
610
611if len(sys.argv) < 2:
612    print("USAGE: nodejs_http_server PID")
613    exit()
614pid = sys.argv[1]
615debug = 0
616
617# load BPF program
618bpf_text = """
619#include <uapi/linux/ptrace.h>
620int do_trace(struct pt_regs *ctx) {
621    uint64_t addr;
622    char path[128]={0};
623    bpf_usdt_readarg(6, ctx, &addr);
624    bpf_probe_read_user(&path, sizeof(path), (void *)addr);
625    bpf_trace_printk("path:%s\\n", path);
626    return 0;
627};
628"""
629
630# enable USDT probe from given PID
631u = USDT(pid=int(pid))
632u.enable_probe(probe="http__server__request", fn_name="do_trace")
633if debug:
634    print(u.get_text())
635    print(bpf_text)
636
637# initialize BPF
638b = BPF(text=bpf_text, usdt_contexts=[u])
639```
640
641Things to learn:
642
6431. ```bpf_usdt_readarg(6, ctx, &addr)```: Read the address of argument 6 from the USDT probe into ```addr```.
6441. ```bpf_probe_read_user(&path, sizeof(path), (void *)addr)```: Now the string ```addr``` points to into our ```path``` variable.
6451. ```u = USDT(pid=int(pid))```: Initialize USDT tracing for the given PID.
6461. ```u.enable_probe(probe="http__server__request", fn_name="do_trace")```: Attach our ```do_trace()``` BPF C function to the Node.js ```http__server__request``` USDT probe.
6471. ```b = BPF(text=bpf_text, usdt_contexts=[u])```: Need to pass in our USDT object, ```u```, to BPF object creation.
648
649### Lesson 16. task_switch.c
650
651This is an older tutorial included as a bonus lesson. Use this for recap and to reinforce what you've already learned.
652
653This is a slightly more complex tracing example than Hello World. This program
654will be invoked for every task change in the kernel, and record in a BPF map
655the new and old pids.
656
657The C program below introduces a new concept: the prev argument. This
658argument is treated specially by the BCC frontend, such that accesses
659to this variable are read from the saved context that is passed by the
660kprobe infrastructure. The prototype of the args starting from
661position 1 should match the prototype of the kernel function being
662kprobed. If done so, the program will have seamless access to the
663function parameters.
664
665```c
666#include <uapi/linux/ptrace.h>
667#include <linux/sched.h>
668
669struct key_t {
670    u32 prev_pid;
671    u32 curr_pid;
672};
673
674BPF_HASH(stats, struct key_t, u64, 1024);
675int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
676    struct key_t key = {};
677    u64 zero = 0, *val;
678
679    key.curr_pid = bpf_get_current_pid_tgid();
680    key.prev_pid = prev->pid;
681
682    // could also use `stats.increment(key);`
683    val = stats.lookup_or_try_init(&key, &zero);
684    if (val) {
685      (*val)++;
686    }
687    return 0;
688}
689```
690
691The userspace component loads the file shown above, and attaches it to the
692`finish_task_switch` kernel function.
693The `[]` operator of the BPF object gives access to each BPF_HASH in the
694program, allowing pass-through access to the values residing in the kernel. Use
695the object as you would any other python dict object: read, update, and deletes
696are all allowed.
697```python
698from bcc import BPF
699from time import sleep
700
701b = BPF(src_file="task_switch.c")
702b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")
703
704# generate many schedule events
705for i in range(0, 100): sleep(0.01)
706
707for k, v in b["stats"].items():
708    print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))
709```
710
711These programs can be found in the files [examples/tracing/task_switch.c](../examples/tracing/task_switch.c) and [examples/tracing/task_switch.py](../examples/tracing/task_switch.py) respectively.
712
713### Lesson 17. Further Study
714
715For further study, see Sasha Goldshtein's [linux-tracing-workshop](https://github.com/goldshtn/linux-tracing-workshop), which contains additional labs. There are also many tools in bcc /tools to study.
716
717Please read [CONTRIBUTING-SCRIPTS.md](../CONTRIBUTING-SCRIPTS.md) if you wish to contribute tools to bcc. At the bottom of the main [README.md](../README.md), you'll also find methods for contacting us. Good luck, and happy tracing!
718
719## Networking
720
721To do.
722