# Executable commands reference

[TOC]

## How simpleperf works

Modern CPUs have a hardware component called the performance monitoring unit (PMU). The PMU has
several hardware counters, counting events like how many CPU cycles have happened, how many
instructions have executed, or how many cache misses have happened.

The Linux kernel wraps these hardware counters into hardware perf events. In addition, the Linux
kernel also provides hardware independent software events and tracepoint events. The Linux kernel
exposes all events to userspace via the perf_event_open system call, which is used by simpleperf.

Simpleperf has three main commands: stat, record and report.

The stat command gives a summary of how many events have happened in the profiled processes in a
time period. Here’s how it works:
1. Given user options, simpleperf enables profiling by making a system call to the kernel.
2. The kernel enables counters while the profiled processes are running.
3. After profiling, simpleperf reads counters from the kernel, and reports a counter summary.

The record command records samples of the profiled processes in a time period. Here’s how it works:
1. Given user options, simpleperf enables profiling by making a system call to the kernel.
2. Simpleperf creates mapped buffers between simpleperf and the kernel.
3. The kernel enables counters while the profiled processes are running.
4. Each time a given number of events happen, the kernel dumps a sample to the mapped buffers.
5. Simpleperf reads samples from the mapped buffers and stores profiling data in a file called
   perf.data.

The report command reads perf.data and any shared libraries used by the profiled processes,
and outputs a report showing where the time was spent.

## Commands

Simpleperf supports several commands, listed below:

```
The debug-unwind command: debug/test dwarf based offline unwinding, used for debugging simpleperf.
The dump command: dumps content in perf.data, used for debugging simpleperf.
The help command: prints help information for other commands.
The kmem command: collects kernel memory allocation information (will be replaced by Python scripts).
The list command: lists all event types supported on the Android device.
The record command: profiles processes and stores profiling data in perf.data.
The report command: reports profiling data in perf.data.
The report-sample command: reports each sample in perf.data, used for supporting integration of
                           simpleperf in Android Studio.
The stat command: profiles processes and prints counter summary.
```

Each command supports different options, which are listed in its help message.

```sh
# List all commands.
$ simpleperf --help

# Print help message for record command.
$ simpleperf record --help
```

The sections below describe the most frequently used commands: list, stat, record and report.

## The list command

The list command lists all events available on the device. Different devices may support different
events because they have different hardware and kernels.

```sh
$ simpleperf list
List of hw-cache events:
  branch-loads
  ...
List of hardware events:
  cpu-cycles
  instructions
  ...
List of software events:
  cpu-clock
  task-clock
  ...
```

On ARM/ARM64, the list command also shows a list of raw events, which are the events supported by
the ARM PMU on the device. The kernel wraps some of them into hardware events and hw-cache
events. For example, raw-cpu-cycles is wrapped into cpu-cycles, and raw-instruction-retired is
wrapped into instructions. The raw events are provided in case we want to use events that the
device supports but the kernel does not wrap.

## The stat command

The stat command is used to get event counter values of the profiled processes. By passing options,
we can select which events to use, which processes/threads to monitor, how long to monitor and the
print interval.

```sh
# Stat using default events (cpu-cycles,instructions,...), and monitor process 7394 for 10 seconds.
$ simpleperf stat -p 7394 --duration 10
Performance counter statistics:

 1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
   510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
     4,692,338  branch-misses      # 468.118 K/sec                    (100%)
886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
           753  context-switches   # 75.121 /sec                      (100%)
           870  page-faults        # 86.793 /sec                      (100%)

Total test time: 10.023829 seconds.
```
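
The comment column in the output above can be reproduced from the raw counts. The sketch below is only a reading of this one example: it assumes the per-second rates are taken over the total test time, which is consistent with the printed figures but is not confirmed by the text. All variable names are illustrative.

```python
# Raw values copied from the example output above.
cpu_cycles = 1_320_496_145
instructions = 510_426_028
branch_misses = 4_692_338
task_clock_ms = 886.008130
total_time_s = 10.023829

# Assumption: rates are computed over the total test time.
ghz = cpu_cycles / total_time_s / 1e9
cycles_per_instruction = cpu_cycles / instructions
branch_misses_k_per_sec = branch_misses / total_time_s / 1e3
# task-clock is CPU time actually used, so this is the average number of busy CPUs.
cpus_used = (task_clock_ms / 1000) / total_time_s

print(f"{ghz:.6f} GHz")
print(f"{cycles_per_instruction:.6f} cycles per instruction")
print(f"{branch_misses_k_per_sec:.3f} K/sec branch misses")
print(f"{cpus_used:.6f} cpus used")
```

Read this way, the process kept less than a tenth of one CPU busy during the 10 second window, which is why the cpu-cycles rate is far below the clock frequency of a typical core.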

### Select events to stat

We can select which events to use via -e.

```sh
# Stat event cpu-cycles.
$ simpleperf stat -e cpu-cycles -p 11904 --duration 10

# Stat event cache-references and cache-misses.
$ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
```

When running the stat command, if the number of hardware events is larger than the number of
hardware counters available in the PMU, the kernel shares hardware counters between events, so each
event is only monitored for part of the total time. In the example below, there is a percentage at
the end of each row, showing the percentage of the total time that each event was actually
monitored.

```sh
# Stat using event cache-references, cache-references:u,....
$ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k \
      -e cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
Performance counter statistics:

4,331,018  cache-references     # 4.861 M/sec    (87%)
3,064,089  cache-references:u   # 3.439 M/sec    (87%)
1,364,959  cache-references:k   # 1.532 M/sec    (87%)
   91,721  cache-misses         # 102.918 K/sec  (87%)
   45,735  cache-misses:u       # 51.327 K/sec   (87%)
   38,447  cache-misses:k       # 43.131 K/sec   (87%)
9,688,515  instructions         # 10.561 M/sec   (89%)

Total test time: 1.026802 seconds.
```

In the example above, each event is monitored for about 87% of the total time. But there is no
guarantee that any pair of events is always monitored at the same time. If we want some events
to be monitored at the same time, we can use --group.

```sh
# Stat the same events, grouped so that each cache-references/cache-misses pair is monitored
# at the same time.
$ simpleperf stat -p 7964 --group cache-references,cache-misses \
      --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k \
      -e instructions --duration 1
Performance counter statistics:

3,638,900  cache-references     # 4.786 M/sec          (74%)
   65,171  cache-misses         # 1.790953% miss rate  (74%)
2,390,433  cache-references:u   # 3.153 M/sec          (74%)
   32,280  cache-misses:u       # 1.350383% miss rate  (74%)
  879,035  cache-references:k   # 1.251 M/sec          (68%)
   30,303  cache-misses:k       # 3.447303% miss rate  (68%)
8,921,161  instructions         # 10.070 M/sec         (86%)

Total test time: 1.029843 seconds.
```
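
Two calculations sit behind the examples above. The first is the usual way a multiplexed counter is extrapolated with perf_event_open: the raw count is scaled by enabled time over running time. Whether the counts printed by simpleperf are already scaled this way is an assumption here. The second is the miss rate, which is only meaningful when both events were counted over the same intervals, which is what --group provides. The function names and the sample inputs of the first call are hypothetical.

```python
def scale_count(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Extrapolate a multiplexed counter to the whole enabled period."""
    if time_running_ns == 0:
        return 0.0  # the event never got a hardware counter
    return raw_count * time_enabled_ns / time_running_ns


def miss_rate_percent(cache_misses: int, cache_references: int) -> float:
    """Miss rate of two events that were counted together."""
    return cache_misses * 100 / cache_references


# Hypothetical event that held a counter for 87% of a one second run.
estimate = scale_count(870_000, 1_000_000_000, 870_000_000)

# Counts taken from the grouped example above.
rate = miss_rate_percent(65_171, 3_638_900)

print(f"estimated count: {estimate:.0f}")
print(f"miss rate: {rate:.6f}%")
```

Scaling assumes the event rate while the counter was not scheduled matches the rate while it was, so a ratio of two ungrouped events can be skewed if they were measured during different slices of the run.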

### Select target to stat

We can select which processes or threads to monitor via -p or -t. Monitoring a
process is the same as monitoring all threads in the process. Simpleperf can also fork a child
process to run the new command and then monitor the child process.

```sh
# Stat process 11904 and 11905.
$ simpleperf stat -p 11904,11905 --duration 10

# Stat thread 11904 and 11905.
$ simpleperf stat -t 11904,11905 --duration 10

# Start a child process running `ls`, and stat it.
$ simpleperf stat ls

# Stat the process of an Android application. This only works for debuggable apps on non-rooted
# devices.
$ simpleperf stat --app simpleperf.example.cpp

# Stat system wide using -a.
$ simpleperf stat -a --duration 10
```

### Decide how long to stat

When monitoring existing threads, we can use --duration to decide how long to monitor. When
monitoring a child process running a new command, simpleperf monitors until the child process ends.
In either case, we can use Ctrl-C to stop monitoring at any time.

```sh
# Stat process 11904 for 10 seconds.
$ simpleperf stat -p 11904 --duration 10

# Stat until the child process running `ls` finishes.
$ simpleperf stat ls

# Stop monitoring using Ctrl-C.
$ simpleperf stat -p 11904 --duration 10
^C
```

If you want to write a script to control how long to monitor, you can send SIGINT, SIGTERM or
SIGHUP to simpleperf to stop monitoring.
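
A controlling script along those lines might look like the sketch below. It is a minimal POSIX-only example, assuming the script runs where the profiler process can be started directly; `run_then_interrupt` is a hypothetical helper, not part of simpleperf.

```python
import signal
import subprocess
import sys
import time


def run_then_interrupt(cmd, seconds):
    """Start cmd, let it run for up to `seconds`, then stop it with SIGINT.

    Returns the exit status of the process once it has finished.
    """
    proc = subprocess.Popen(cmd)
    try:
        # If the command finishes by itself, there is nothing to interrupt.
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)  # SIGTERM or SIGHUP would also work
        proc.wait()  # give the process time to write its results
    return proc.returncode


# Hypothetical usage with the stat command from the examples above:
# run_then_interrupt(["simpleperf", "stat", "-p", "11904"], 5)
```

Waiting after sending the signal matters: the profiler needs a moment to print its summary or finish writing its output before the script moves on.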

### Decide the print interval

When monitoring perf counters, we can also use --interval to decide the print interval.

```sh
# Print stat for process 11904 every 300ms.
$ simpleperf stat -p 11904 --duration 10 --interval 300

# Print system wide stat at interval of 300ms for 10 seconds. Note that system wide profiling needs
# root privilege.
$ su 0 simpleperf stat -a --duration 10 --interval 300
```

### Display counters in systrace

Simpleperf can also work with systrace to dump counters in the collected trace. Below is an example
of a system wide stat.

```sh
# Capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15
# seconds.
$ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
# On host launch systrace to collect trace for 10 seconds.
(HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
# Open the collected new.html in a browser; the perf counters are shown in the trace.
```

### Show event count per thread

By default, stat cmd outputs an event count sum for all monitored targets. But when `--per-thread`
option is used, stat cmd outputs an event count for each thread in monitored targets. It can be
used to find busy threads in a process or system wide. With `--per-thread` option, stat cmd opens
a perf_event_file for each existing thread. By default, event counts for threads created by a
monitored thread are added to the count of the thread that created them. They are omitted if
`--no-inherit` option is also used.

```sh
# Print event counts for each thread in process 11904. Event counts for threads created after
# stat cmd will be added to threads creating them.
$ simpleperf stat --per-thread -p 11904 --duration 1

# Print event counts for all threads running in the system every 1s. Threads not running will not
# be reported.
$ su 0 simpleperf stat --per-thread -a --interval 1000 --interval-only-values

# Print event counts for all threads running in the system every 1s. Event counts for threads
# created after stat cmd will be omitted.
$ su 0 simpleperf stat --per-thread -a --interval 1000 --interval-only-values --no-inherit
```

### Show event count per core

By default, stat cmd outputs an event count sum for all monitored cpu cores. But when `--per-core`
option is used, stat cmd outputs an event count for each core. It can be used to see how events
are distributed on different cores.

When running stat non-system-wide with `--per-core` option, simpleperf creates a perf event for
each monitored thread on each core. When a thread is in running state, perf events on all cores
are enabled, but only the perf event on the core running the thread is in running state. So the
percentage comment shows runtime_on_a_core / runtime_on_all_cores. Note that the percentage is
still affected by hardware counter multiplexing. Check simpleperf log output for ways to
distinguish it.

```sh
# Print event counts for each cpu running threads in process 11904.
# A percentage shows runtime_on_a_cpu / runtime_on_all_cpus.
$ simpleperf stat --per-core -p 11904 --duration 1
Performance counter statistics:

# cpu       count  event_name   # percentage = event_run_time / enabled_time
  7    56,552,838  cpu-cycles   #   (60%)
  3    25,958,605  cpu-cycles   #   (20%)
  0    22,822,698  cpu-cycles   #   (15%)
  1     6,661,495  cpu-cycles   #   (5%)
  4     1,519,093  cpu-cycles   #   (0%)

Total test time: 1.001082 seconds.

# Print event counts for each cpu system wide.
$ su 0 simpleperf stat --per-core -a --duration 1

# Print cpu-cycle event counts for each cpu for each thread running in the system.
$ su 0 simpleperf stat -e cpu-cycles -a --per-thread --per-core --duration 1
```

## The record command

The record command is used to dump samples of the profiled processes. Each sample can contain
information like the time at which the sample was generated, the number of events since the last
sample, the program counter of a thread, and the call chain of a thread.

By passing options, we can select which events to use, which processes/threads to monitor,
the frequency at which to dump samples, how long to monitor, and where to store samples.

```sh
# Record on process 7394 for 10 seconds, using default event (cpu-cycles), using default sample
# frequency (4000 samples per second), writing records to perf.data.
$ simpleperf record -p 7394 --duration 10
simpleperf I cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
```

### Select events to record

By default, the cpu-cycles event is used to evaluate consumed cpu cycles. But we can also use other
events via -e.

```sh
# Record using event instructions.
$ simpleperf record -e instructions -p 11904 --duration 10

# Record using task-clock, which shows the passed CPU time in nanoseconds.
$ simpleperf record -e task-clock -p 11904 --duration 10
```

### Select target to record

The way to select target in record command is similar to that in the stat command.

```sh
# Record process 11904 and 11905.
$ simpleperf record -p 11904,11905 --duration 10

# Record thread 11904 and 11905.
$ simpleperf record -t 11904,11905 --duration 10

# Record a child process running `ls`.
$ simpleperf record ls

# Record the process of an Android application. This only works for debuggable apps on non-rooted
# devices.
$ simpleperf record --app simpleperf.example.cpp

# Record system wide.
$ simpleperf record -a --duration 10
```

### Set the frequency to record

We can set the frequency to dump records via -f or -c. For example, -f 4000 means
dumping approximately 4000 records every second when the monitored thread runs. If a monitored
thread runs 0.2s in one second (it can be preempted or blocked at other times), simpleperf dumps
about 4000 * 0.2 / 1.0 = 800 records every second. Another way is using -c. For example, -c 10000
means dumping one record whenever 10000 events happen.

```sh
# Record with sample frequency 1000: sample 1000 times every second running.
$ simpleperf record -f 1000 -p 11904,11905 --duration 10

# Record with sample period 100000: sample 1 time every 100000 events.
$ simpleperf record -c 100000 -t 11904,11905 --duration 10
```
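
The arithmetic behind -f and -c can be written down as two small functions. This is a back-of-the-envelope model for estimating how many samples to expect, with hypothetical function names; real counts vary with scheduling and with any frequency throttling by the kernel.

```python
def samples_with_frequency(freq_hz: float, running_seconds: float) -> float:
    """Approximate sample count for -f: samples accrue only while the thread runs."""
    return freq_hz * running_seconds


def samples_with_period(event_count: int, period: int) -> int:
    """Sample count for -c: one sample each time `period` events have happened."""
    return event_count // period


# The example from the text: -f 4000 and a thread running 0.2s out of every second.
per_second = samples_with_frequency(4000, 0.2)

# -c 100000 on a thread that generated one million events.
by_period = samples_with_period(1_000_000, 100_000)

print(f"about {per_second:.0f} samples per second with -f 4000")
print(f"{by_period} samples with -c 100000")
```

The difference in practice: -f keeps the sampling rate roughly constant regardless of how fast events occur, while -c ties the number of samples directly to the amount of work done.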

To avoid taking too much time generating samples, kernel >= 3.10 sets the max percent of cpu time
used for generating samples (default is 25%), and decreases the max allowed sample frequency when
hitting that limit. The --cpu-percent option adjusts this limit, but it needs either root
privilege or Android >= Q.

```sh
# Record with sample frequency 1000, with max allowed cpu percent to be 50%.
$ simpleperf record -f 1000 -p 11904,11905 --duration 10 --cpu-percent 50
```

### Decide how long to record

The way to decide how long to monitor in record command is similar to that in the stat command.

```sh
# Record process 11904 for 10 seconds.
$ simpleperf record -p 11904 --duration 10

# Record until the child process running `ls` finishes.
$ simpleperf record ls

# Stop monitoring using Ctrl-C.
$ simpleperf record -p 11904 --duration 10
^C
```

If you want to write a script to control how long to monitor, you can send SIGINT, SIGTERM or
SIGHUP to simpleperf to stop monitoring.

### Set the path to store profiling data

By default, simpleperf stores profiling data in perf.data in the current directory. But the path
can be changed using -o.

```sh
# Write records to data/perf2.data.
$ simpleperf record -p 11904 -o data/perf2.data --duration 10
```

### Record call graphs

A call graph is a tree showing function call relations. Below is an example.

```
main() {
    FunctionOne();
    FunctionTwo();
}
FunctionOne() {
    FunctionTwo();
    FunctionThree();
}
a call graph:
    main-> FunctionOne
       |    |
       |    |-> FunctionTwo
       |    |-> FunctionThree
       |
       |-> FunctionTwo
```

A call graph shows how a function calls other functions, and a reversed call graph shows how
a function is called by other functions. To show a call graph, we need to first record it, then
report it.
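
The relation between a call graph and its reversed form can be illustrated with the example program above. The sketch below is only a toy model built from that example: `CALLS`, `reverse_call_graph` and `format_tree` are hypothetical names, and the tree printer assumes there is no recursion in the graph.

```python
from collections import defaultdict

# Caller -> callees, taken from the example program above.
CALLS = {
    "main": ["FunctionOne", "FunctionTwo"],
    "FunctionOne": ["FunctionTwo", "FunctionThree"],
}


def reverse_call_graph(calls):
    """Turn a caller -> callees map into a callee -> callers map."""
    callers = defaultdict(list)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].append(caller)
    return dict(callers)


def format_tree(graph, root, depth=0):
    """Return the tree under `root` as indented lines (no cycle handling)."""
    lines = ["    " * depth + root]
    for child in graph.get(root, []):
        lines.extend(format_tree(graph, child, depth + 1))
    return lines


REVERSED = reverse_call_graph(CALLS)

print("\n".join(format_tree(CALLS, "main")))             # who main calls
print("\n".join(format_tree(REVERSED, "FunctionTwo")))   # who calls FunctionTwo
```

The first tree answers "where does main spend its time", the second answers "who is responsible for the time spent in FunctionTwo".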

There are two ways to record a call graph: recording a dwarf based call graph, or recording a
stack frame based call graph. Recording dwarf based call graphs needs support of debug
information in native binaries, while recording stack frame based call graphs needs support of
stack frame registers.

```sh
# Record a dwarf based call graph
$ simpleperf record -p 11904 -g --duration 10

# Record a stack frame based call graph
$ simpleperf record -p 11904 --call-graph fp --duration 10
```

[Here](README.md#suggestions-about-recording-call-graphs) are some suggestions about recording call graphs.

### Record both on CPU time and off CPU time

Simpleperf is a CPU profiler, which generates samples for a thread only when it is running on a
CPU. But sometimes we want to know where the thread time is spent off-cpu (like preempted by other
threads, blocked in IO or waiting for some events). To support this, simpleperf added a
--trace-offcpu option to the record command. When --trace-offcpu is used, simpleperf does the
following things:

1) It only allows the cpu-clock or task-clock event to be used with --trace-offcpu. This lets
   simpleperf generate on-cpu samples for the cpu-clock event.
2) Simpleperf also monitors sched:sched_switch event, which will generate a sched_switch sample
   each time the monitored thread is scheduled off cpu.
3) Simpleperf also records context switch records. So it knows when the thread is scheduled back on
   a cpu.

The samples and context switch records collected by simpleperf for a thread are shown below:

![simpleperf_trace_offcpu_sample_mode](simpleperf_trace_offcpu_sample_mode.png)

Here we have two types of samples:

1) on-cpu samples generated for cpu-clock event. The period value in each sample means how many
   nanoseconds are spent on cpu (for the callchain of this sample).
2) off-cpu (sched_switch) samples generated for sched:sched_switch event. The period value is
   calculated as **Timestamp of the next switch on record** minus **Timestamp of the current sample**
   by simpleperf. So the period value in each sample means how many nanoseconds are spent off cpu
   (for the callchain of this sample).

**note**: In reality, switch on records and samples may be lost. To mitigate the loss of accuracy,
we calculate the period of an off-cpu sample as **Timestamp of the next switch on record or sample**
minus **Timestamp of the current sample**.
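
The off-cpu period rule in the note can be sketched as follows. This is a simplified model of one thread's records, not simpleperf's implementation: the `Record` type, its `kind` strings and the timestamps in the usage example are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    timestamp_ns: int
    kind: str  # "on_cpu_sample", "sched_switch_sample" or "switch_on"


def off_cpu_periods(records: List[Record]) -> List[Tuple[int, int]]:
    """Return (timestamp, period) for every sched_switch sample of one thread.

    The period is the timestamp of the next switch on record or sample,
    whichever comes first, minus the timestamp of the current sample.
    """
    ordered = sorted(records, key=lambda r: r.timestamp_ns)
    periods = []
    for current, following in zip(ordered, ordered[1:]):
        if current.kind == "sched_switch_sample":
            periods.append(
                (current.timestamp_ns, following.timestamp_ns - current.timestamp_ns)
            )
    return periods


# Hypothetical timeline; the switch on record after the second switch-off was lost.
timeline = [
    Record(100, "on_cpu_sample"),
    Record(200, "sched_switch_sample"),
    Record(500, "switch_on"),
    Record(600, "on_cpu_sample"),
    Record(700, "sched_switch_sample"),
    Record(1000, "on_cpu_sample"),
]
print(off_cpu_periods(timeline))
```

For the second sched_switch sample the missing switch on record means the next on-cpu sample is used instead, so the off-cpu time is slightly overestimated rather than dropped. A trailing sched_switch sample with nothing after it gets no period in this sketch.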

When reporting via python scripts, simpleperf_report_lib.py provides the SetTraceOffCpuMode()
method to control how to report the samples:

1) on-cpu mode: only report on-cpu samples.
2) off-cpu mode: only report off-cpu samples.
3) on-off-cpu mode: report both on-cpu and off-cpu samples, which can be split by event name.
4) mixed-on-off-cpu mode: report on-cpu and off-cpu samples under the same event name.

If not set, mixed-on-off-cpu mode will be used to report.

When using report_html.py, inferno and report_sample.py, the report mode can be set by
--trace-offcpu option.

Below are some examples of recording and reporting trace offcpu profiles.

```sh
# Check if --trace-offcpu is supported by the kernel (should be available on kernel >= 4.2).
$ simpleperf list --show-features
trace-offcpu
...

# Record with --trace-offcpu.
$ simpleperf record -g -p 11904 --duration 10 --trace-offcpu -e cpu-clock

# Record system wide with --trace-offcpu.
$ simpleperf record -a -g --duration 3 --trace-offcpu -e cpu-clock

# Record with --trace-offcpu using app_profiler.py.
$ ./app_profiler.py -p com.google.samples.apps.sunflower \
    -r "-g -e cpu-clock:u --duration 10 --trace-offcpu"

# Report on-cpu samples.
$ ./report_html.py --trace-offcpu on-cpu
# Report off-cpu samples.
$ ./report_html.py --trace-offcpu off-cpu
# Report on-cpu and off-cpu samples under different event names.
$ ./report_html.py --trace-offcpu on-off-cpu
# Report on-cpu and off-cpu samples under the same event name.
$ ./report_html.py --trace-offcpu mixed-on-off-cpu
```

## The report command

The report command is used to report profiling data generated by the record command. The report
contains a table of sample entries. Each sample entry is a row in the report. The report command
groups samples that belong to the same process, thread, library and function into the same sample
entry. It then sorts the sample entries by their event counts.

By passing options, we can decide how to filter out uninteresting samples, how to group samples
into sample entries, and where to find profiling data and binaries.

Below is an example. Records are grouped into 4 sample entries, and each entry is a row. There are
several columns, and each column shows a piece of information belonging to a sample entry. The
first column is Overhead, which shows the events in the current sample entry as a percentage of
total events. As the perf event is cpu-cycles, the overhead is the percentage of CPU cycles used in
each function.

```sh
# Reports perf.data, using only records sampled in libsudo-game-jni.so, grouping records using
# thread name(comm), process id(pid), thread id(tid), function name(symbol), and showing sample
# count for each row.
$ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so \
      --sort comm,pid,tid,symbol -n
Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
Arch: arm64
Event: cpu-cycles (type 0, config 0)
Samples: 28235
Event count: 546356211

Overhead  Sample  Command    Pid   Tid   Symbol
59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
6.24%     1756    sudogame  7394  7394  @plt
```

### Set the path to read profiling data

By default, the report command reads profiling data from perf.data in the current directory.
But the path can be changed using -i.

```sh
$ simpleperf report -i data/perf2.data
```

### Set the path to find binaries

To report function symbols, simpleperf needs to read executable binaries used by the monitored
processes to get symbol table and debug information. By default, the paths are the executable
binaries used by monitored processes while recording. However, these binaries may not exist at
report time, or may not contain symbol table and debug information. So we can use --symfs to
redirect the paths.

```sh
# In this case, when simpleperf wants to read executable binary /A/b, it reads file in /A/b.
$ simpleperf report

# In this case, when simpleperf wants to read executable binary /A/b, it prefers file in
# /debug_dir/A/b to file in /A/b.
$ simpleperf report --symfs /debug_dir

# Read symbols for system libraries built locally. Note that this is not needed since Android O,
# which ships symbols for system libraries on device.
$ simpleperf report --symfs $ANDROID_PRODUCT_OUT/symbols
```
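
The lookup order described in the comments above can be modelled in a few lines. This is an illustration of the stated preference only (the copy under the --symfs directory first, then the recorded path), not simpleperf's actual lookup code; `resolve_binary` is a hypothetical name.

```python
import os
from typing import Optional


def resolve_binary(recorded_path: str, symfs_dir: Optional[str] = None) -> str:
    """Pick the file to read symbols from for a binary recorded at `recorded_path`."""
    if symfs_dir:
        # /A/b with --symfs /debug_dir is looked up as /debug_dir/A/b.
        candidate = os.path.join(symfs_dir, recorded_path.lstrip("/"))
        if os.path.isfile(candidate):
            return candidate
    # No --symfs, or no copy under it: fall back to the path seen while recording.
    return recorded_path


print(resolve_binary("/A/b"))
print(resolve_binary("/A/b", "/debug_dir"))
```

Mirroring the device's directory layout under the --symfs directory is therefore what makes unstripped binaries on the host line up with the paths stored in perf.data.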

### Filter samples

When reporting, it happens that not all records are of interest. The report command supports four
filters to select samples of interest.

```sh
# Report records in threads having name sudogame.
$ simpleperf report --comms sudogame

# Report records in process 7394 or 7395
$ simpleperf report --pids 7394,7395

# Report records in thread 7394 or 7395.
$ simpleperf report --tids 7394,7395

# Report records in libsudo-game-jni.so.
$ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
```

### Group samples into sample entries

The report command uses --sort to decide how to group sample entries.

```sh
# Group records based on their process id: records having the same process id are in the same
# sample entry.
$ simpleperf report --sort pid

# Group records based on their thread id and thread comm: records having the same thread id and
# thread name are in the same sample entry.
$ simpleperf report --sort tid,comm

# Group records based on their binary and function: records in the same binary and function are in
# the same sample entry.
$ simpleperf report --sort dso,symbol

# Default option: --sort comm,pid,tid,dso,symbol. Group records in the same thread, and belong to
# the same function in the same binary.
$ simpleperf report
```
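
How the sort keys turn samples into sample entries can be sketched as below. This is a conceptual model with made-up samples, assuming each sample carries a `period` (the number of events it stands for) and that an entry's overhead is its share of the total; `group_samples` is a hypothetical helper, not a simpleperf API.

```python
from collections import defaultdict


def group_samples(samples, sort_keys):
    """Group samples by `sort_keys` and return (overhead_percent, key) rows, largest first."""
    totals = defaultdict(int)
    for sample in samples:
        key = tuple(sample[k] for k in sort_keys)
        totals[key] += sample["period"]
    total_events = sum(totals.values())
    rows = [(count * 100 / total_events, key) for key, count in totals.items()]
    rows.sort(key=lambda row: -row[0])
    return rows


# Made-up samples for illustration.
SAMPLES = [
    {"pid": 7394, "tid": 7394, "symbol": "checkValid", "period": 600},
    {"pid": 7394, "tid": 7395, "symbol": "checkValid", "period": 200},
    {"pid": 7394, "tid": 7394, "symbol": "randomBlock_r", "period": 200},
]

for overhead, key in group_samples(SAMPLES, ["symbol"]):
    print(f"{overhead:.2f}%  {key}")
```

Fewer sort keys give fewer, coarser rows: grouping by `symbol` merges the two checkValid samples from different threads, while adding `tid` to the keys keeps them apart.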

### Report call graphs

To report a call graph, please make sure the profiling data is recorded with call graphs,
as [here](#record-call-graphs).

```sh
$ simpleperf report -g
```