# Simpleperf

Simpleperf is a native profiling tool for Android. It can be used to profile
both Android applications and native processes running on Android. It can
profile both Java and C++ code on Android. It can be used on Android L
and above.

Simpleperf is part of the Android Open Source Project. The source code is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/).
The latest document is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md).

## Table of Contents

- [Introduction](#introduction)
- [Tools in simpleperf](#tools-in-simpleperf)
- [Android application profiling](#android-application-profiling)
    - [Prepare an Android application](#prepare-an-android-application)
    - [Record and report profiling data](#record-and-report-profiling-data)
    - [Record and report call graph](#record-and-report-call-graph)
    - [Report in html interface](#report-in-html-interface)
    - [Show flame graph](#show-flame-graph)
    - [Record both on CPU time and off CPU time](#record-both-on-cpu-time-and-off-cpu-time)
    - [Profile from launch](#profile-from-launch)
    - [Parse profiling data manually](#parse-profiling-data-manually)
- [Android platform profiling](#android-platform-profiling)
- [Executable commands reference](#executable-commands-reference)
    - [How simpleperf works](#how-simpleperf-works)
    - [Commands](#commands)
    - [The list command](#the-list-command)
    - [The stat command](#the-stat-command)
        - [Select events to stat](#select-events-to-stat)
        - [Select target to stat](#select-target-to-stat)
        - [Decide how long to stat](#decide-how-long-to-stat)
        - [Decide the print interval](#decide-the-print-interval)
        - [Display counters in systrace](#display-counters-in-systrace)
    - [The record command](#the-record-command)
        - [Select events to record](#select-events-to-record)
        - [Select target to record](#select-target-to-record)
        - [Set the frequency to record](#set-the-frequency-to-record)
        - [Decide how long to record](#decide-how-long-to-record)
        - [Set the path to store profiling data](#set-the-path-to-store-profiling-data)
        - [Record call graphs](#record-call-graphs-in-record-cmd)
        - [Record both on CPU time and off CPU time](#record-both-on-cpu-time-and-off-cpu-time-in-record-cmd)
    - [The report command](#the-report-command)
        - [Set the path to read profiling data](#set-the-path-to-read-profiling-data)
        - [Set the path to find binaries](#set-the-path-to-find-binaries)
        - [Filter samples](#filter-samples)
        - [Group samples into sample entries](#group-samples-into-sample-entries)
        - [Report call graphs](#report-call-graphs-in-report-cmd)
- [Scripts reference](#scripts-reference)
    - [app_profiler.py](#app_profiler-py)
        - [Profile from launch of an application](#profile-from-launch-of-an-application)
    - [run_simpleperf_without_usb_connection.py](#run_simpleperf_without_usb_connection-py)
    - [binary_cache_builder.py](#binary_cache_builder-py)
    - [run_simpleperf_on_device.py](#run_simpleperf_on_device-py)
    - [report.py](#report-py)
    - [report_html.py](#report_html-py)
    - [inferno](#inferno)
    - [pprof_proto_generator.py](#pprof_proto_generator-py)
    - [report_sample.py](#report_sample-py)
    - [simpleperf_report_lib.py](#simpleperf_report_lib-py)
- [Answers to common issues](#answers-to-common-issues)
    - [Why we suggest profiling on android >= N devices](#why-we-suggest-profiling-on-android-n-devices)
    - [Suggestions about recording call graphs](#suggestions-about-recording-call-graphs)
    - [How to solve missing symbols in report](#how-to-solve-missing-symbols-in-report)
- [Bugs and contribution](#bugs-and-contribution)

## Introduction

Simpleperf contains two parts: the simpleperf executable and Python scripts.

The simpleperf executable works similarly to linux-tools-perf, but has some specific features for
the Android profiling environment:

1. It collects more info in profiling data. Since the common workflow is "record on the device, and
   report on the host", simpleperf not only collects samples in profiling data, but also collects
   needed symbols, device info and recording time.

2. It delivers new features for recording.
   a. When recording a dwarf based call graph, simpleperf unwinds the stack before writing a sample
      to file. This saves storage space on the device.
   b. It supports tracing both on CPU time and off CPU time with the --trace-offcpu option.
   c. It supports recording call graphs of JITed and interpreted Java code on Android >= P.

3. It relates closely to the Android platform.
   a. It is aware of the Android environment, e.g. using system properties to enable profiling, and
      using run-as to profile in an application's context.
   b. It supports reading symbols and debug information from the .gnu_debugdata section, because
      system libraries are built with the .gnu_debugdata section starting from Android O.
   c. It supports profiling shared libraries embedded in apk files.
   d. It uses the standard Android stack unwinder, so its results are consistent with all other
      Android tools.

4. It builds executables and shared libraries for different usages.
   a. It builds static executables on the device. Since static executables don't rely on any
      library, simpleperf executables can be pushed to any Android device and used to record
      profiling data.
   b. It builds executables on different hosts: Linux, Mac and Windows. These executables can be
      used to report on hosts.
   c. It builds report shared libraries on different hosts. The report library is used by different
      Python scripts to parse profiling data.

Detailed documentation for the simpleperf executable is [here](#executable-commands-reference).

Python scripts are split into three parts according to their functions:

1. Scripts used for recording, like app_profiler.py and run_simpleperf_without_usb_connection.py.

2. Scripts used for reporting, like report.py, report_html.py and inferno.

3. Scripts used for parsing profiling data, like simpleperf_report_lib.py.

Detailed documentation for the Python scripts is [here](#scripts-reference).

## Tools in simpleperf

The simpleperf executables and Python scripts are located in simpleperf/ in ndk releases, and in
system/extras/simpleperf/scripts/ in AOSP. Their functions are listed below.

bin/: contains executables and shared libraries.

bin/android/${arch}/simpleperf: static simpleperf executables used on the device.

bin/${host}/${arch}/simpleperf: simpleperf executables used on the host, only supports reporting.

bin/${host}/${arch}/libsimpleperf_report.${so/dylib/dll}: report shared libraries used on the host.

[app_profiler.py](#app_profiler-py): recording profiling data.

[run_simpleperf_without_usb_connection.py](#run_simpleperf_without_usb_connection-py):
recording profiling data while the USB cable isn't connected.

[binary_cache_builder.py](#binary_cache_builder-py): building binary cache for profiling data.

[report.py](#report-py): reporting in stdio interface.

[report_html.py](#report_html-py): reporting in html interface.

[inferno.sh](#inferno) (or inferno.bat on Windows): generating flame graphs in html interface.

inferno/: implementation of inferno. Used by inferno.sh.

[pprof_proto_generator.py](#pprof_proto_generator-py): converting profiling data to the format
used by [pprof](https://github.com/google/pprof).

[report_sample.py](#report_sample-py): converting profiling data to the format used by
[FlameGraph](https://github.com/brendangregg/FlameGraph).

[simpleperf_report_lib.py](#simpleperf_report_lib-py): library for parsing profiling data.

## Android application profiling

This section shows how to profile an Android application.
Some examples are [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/README.md).

Profiling an Android application involves three steps:
1. Prepare an Android application.
2. Record profiling data.
3. Report profiling data.

### Prepare an Android application

Based on the profiling situation, we may need to customize the build script to generate an apk file
specifically for profiling. Below are some suggestions.

1. If you want to profile a debug build of an application:

For the debug build type, Android Studio sets android:debuggable="true" in AndroidManifest.xml,
enables JNI checks and may not optimize C/C++ code. The app can be profiled by simpleperf without
any change.

2. If you want to profile a release build of an application:

For the release build type, Android Studio sets android:debuggable="false" in AndroidManifest.xml,
disables JNI checks and optimizes C/C++ code. However, security restrictions mean that only apps
with android:debuggable set to true can be profiled. So simpleperf can only profile a release
build under one of these two circumstances:

If you are on a rooted device, you can profile any app.

If you are on Android >= O, you can use [wrap.sh](https://developer.android.com/ndk/guides/wrap-script.html)
to profile a release build:
Step 1: Add android:debuggable="true" in AndroidManifest.xml to enable profiling.
```
<manifest ...>
    <application android:debuggable="true" ...>
```

Step 2: Add wrap.sh in the lib/`arch` directories. wrap.sh runs the app without passing any debug
flags to ART, so the app runs as a release app. This can be done by adding the script below to
app/build.gradle.
```
android {
    buildTypes {
        release {
            sourceSets {
                release {
                    resources {
                        srcDir {
                            "wrap_sh_lib_dir"
                        }
                    }
                }
            }
        }
    }
}

task createWrapShLibDir {
    for (String abi : ["armeabi", "armeabi-v7a", "arm64-v8a", "x86", "x86_64"]) {
        def dir = new File("app/wrap_sh_lib_dir/lib/" + abi)
        dir.mkdirs()
        def wrapFile = new File(dir, "wrap.sh")
        wrapFile.withWriter { writer ->
            writer.write('#!/system/bin/sh\n\$@\n')
        }
    }
}
```

3. If you want to profile C/C++ code:

Android Studio strips the symbol table and debug info of native libraries in the apk. So the
profiling results may contain unknown symbols or broken call graphs. To fix this, we can pass
app_profiler.py a directory containing unstripped native libraries via the -lib option. Usually
the directory can be the path of your Android Studio project.

4. If you want to profile Java code:

On Android >= P, simpleperf supports profiling Java code, no matter whether it is executed by
the interpreter, JITed, or compiled into native instructions. So you don't need to do anything.

On Android O, simpleperf supports profiling Java code which is compiled into native instructions,
and it also needs wrap.sh to use the compiled Java code. To compile Java code, we can pass
app_profiler.py the --compile_java_code option.

On Android N, simpleperf supports profiling Java code that is compiled into native instructions.
To compile Java code, we can pass app_profiler.py the --compile_java_code option.

On Android <= M, simpleperf doesn't support profiling Java code.

Below we use the application [SimpleperfExampleWithNative](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExampleWithNative).
It builds an app-profiling.apk for profiling.

```sh
$ git clone https://android.googlesource.com/platform/system/extras
$ cd extras/simpleperf/demo
# Open SimpleperfExampleWithNative project with Android Studio, and build this project
# successfully, otherwise the `./gradlew` command below will fail.
$ cd SimpleperfExampleWithNative

# On Windows, use "gradlew" instead.
$ ./gradlew clean assemble
$ adb install -r app/build/outputs/apk/profiling/app-profiling.apk
```

### Record and report profiling data

We can use [app_profiler.py](#app_profiler-py) to profile Android applications.

```sh
# Cd to the directory of simpleperf scripts. Record perf.data.
# -p option selects the profiled app using its package name.
# --compile_java_code option compiles Java code into native instructions, which isn't needed on
# Android >= P.
# -a option selects the Activity to profile.
# -lib option gives the directory to find debug native libraries.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative --compile_java_code \
    -a .MixActivity -lib path_of_SimpleperfExampleWithNative
```

This will collect profiling data in perf.data in the current directory, and related native
binaries in binary_cache/.

Normally we need to use the app when profiling, otherwise we may record no samples. But in this
case, the MixActivity starts a busy thread. So we don't need to use the app while profiling.

```sh
# Report perf.data in stdio interface.
$ python report.py
Cmdline: /data/data/com.example.simpleperf.simpleperfexamplewithnative/simpleperf record ...
Arch: arm64
Event: task-clock:u (type 1, config 1)
Samples: 10023
Event count: 10023000000

Overhead  Command     Pid   Tid   Shared Object            Symbol
27.04%    BusyThread  5703  5729  /system/lib64/libart.so  art::JniMethodStart(art::Thread*)
25.87%    BusyThread  5703  5729  /system/lib64/libc.so    long StrToI<long, ...
...
```

[report.py](#report-py) reports profiling data in stdio interface. If there are a lot of unknown
symbols in the report, check [here](#how-to-solve-missing-symbols-in-report).

```sh
# Report perf.data in html interface.
$ python report_html.py

# Add source code and disassembly. Change the path of source_dirs if it is not correct.
$ python report_html.py --add_source_code --source_dirs path_of_SimpleperfExampleWithNative \
    --add_disassembly
```

[report_html.py](#report_html-py) generates a report in report.html, and pops up a browser tab to
show it.

### Record and report call graph

We can record and report [call graphs](#record-call-graphs-in-record-cmd) as below.

```sh
# Record dwarf based call graphs: add "-g" in the -r option.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative \
    -r "-e task-clock:u -f 1000 --duration 10 -g" -lib path_of_SimpleperfExampleWithNative

# Record stack frame based call graphs: add "--call-graph fp" in the -r option.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative \
    -r "-e task-clock:u -f 1000 --duration 10 --call-graph fp" \
    -lib path_of_SimpleperfExampleWithNative

# Report call graphs in stdio interface.
$ python report.py -g

# Report call graphs in python Tk interface.
$ python report.py -g --gui

# Report call graphs in html interface.
$ python report_html.py

# Report call graphs in flame graphs.
# On Windows, use inferno.bat instead of ./inferno.sh.
$ ./inferno.sh -sc
```

### Report in html interface

We can use [report_html.py](#report_html-py) to show profiling results in a web browser.
report_html.py integrates chart statistics, sample table, flame graphs, source code annotation
and disassembly annotation. It is the recommended way to show reports.

```sh
$ python report_html.py
```

### Show flame graph

To show flame graphs, we need to first record call graphs. Flame graphs are shown by
report_html.py in the "Flamegraph" tab.
We can also use [inferno](#inferno) to show flame graphs directly.

```sh
# On Windows, use inferno.bat instead of ./inferno.sh.
$ ./inferno.sh -sc
```

We can also build flame graphs using https://github.com/brendangregg/FlameGraph.
Please make sure you have perl installed.

```sh
$ git clone https://github.com/brendangregg/FlameGraph.git
$ python report_sample.py --symfs binary_cache >out.perf
$ FlameGraph/stackcollapse-perf.pl out.perf >out.folded
$ FlameGraph/flamegraph.pl out.folded >a.svg
```

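The folding step done by stackcollapse-perf.pl can be illustrated with a small Python sketch. This is a simplified stand-in, not the real script: it assumes each sample is already a list of function names from leaf to root, while the real script parses full `perf script` style output.

```python
from collections import Counter

def collapse(samples):
    """Fold call stacks into FlameGraph's folded format:
    one 'root;...;leaf count' line per unique stack."""
    counts = Counter()
    for stack in samples:
        # Frames arrive leaf-first; FlameGraph wants root-first.
        counts[";".join(reversed(stack))] += 1
    return ["%s %d" % (stack, n) for stack, n in sorted(counts.items())]

# Hypothetical samples taken from one thread, leaf frame first.
samples = [
    ["FunctionTwo", "FunctionOne", "main"],
    ["FunctionTwo", "FunctionOne", "main"],
    ["FunctionThree", "FunctionOne", "main"],
]
for line in collapse(samples):
    print(line)
```

flamegraph.pl then turns such folded lines into the svg, with each stack's width proportional to its count.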
### Record both on CPU time and off CPU time

We can [record both on CPU time and off CPU time](#record-both-on-cpu-time-and-off-cpu-time-in-record-cmd).

First, check whether the trace-offcpu feature is supported on the device.

```sh
$ python run_simpleperf_on_device.py list --show-features
dwarf-based-call-graph
trace-offcpu
```

If trace-offcpu is supported, it will be shown in the feature list. Then we can try it.

```sh
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -a .SleepActivity \
    -r "-g -e task-clock:u -f 1000 --duration 10 --trace-offcpu" \
    -lib path_of_SimpleperfExampleWithNative
$ python report_html.py --add_disassembly --add_source_code \
    --source_dirs path_of_SimpleperfExampleWithNative
```

### Profile from launch

We can [profile from launch of an application](#profile-from-launch-of-an-application).

```sh
# Start simpleperf recording, then start the Activity to profile.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -a .MainActivity

# We can also start the Activity on the device manually.
# 1. Make sure the application isn't running or one of the recent apps.
# 2. Start simpleperf recording.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative
# 3. Start the app manually on the device.
```

### Parse profiling data manually

We can also write python scripts to parse profiling data manually, by using
[simpleperf_report_lib.py](#simpleperf_report_lib-py). Examples are report_sample.py and
report_html.py.

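As a sketch of what such a script looks like, the loop below iterates over samples with the library's ReportLib class. It assumes the simpleperf scripts directory is on PYTHONPATH; the import is guarded so the sketch degrades gracefully where the library isn't available.

```python
# Minimal sketch of parsing perf.data with simpleperf_report_lib.py.

def format_sample(thread_comm, pid, tid, symbol_name):
    """Render one sample as a report-style line."""
    return "%s\t%d/%d\t%s" % (thread_comm, pid, tid, symbol_name)

def dump_samples(record_file="perf.data"):
    try:
        from simpleperf_report_lib import ReportLib
    except ImportError:
        print("simpleperf_report_lib.py not found on PYTHONPATH")
        return
    lib = ReportLib()
    lib.SetRecordFile(record_file)
    while True:
        sample = lib.GetNextSample()
        if sample is None:
            break
        symbol = lib.GetSymbolOfCurrentSample()
        print(format_sample(sample.thread_comm, sample.pid, sample.tid,
                            symbol.symbol_name))
    lib.Close()

if __name__ == "__main__":
    dump_samples()
```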
## Android platform profiling

Here are some tips for Android platform developers, who build and flash system images on rooted
devices:

1. After running `adb root`, simpleperf can be used to profile any process or system wide.

2. It is recommended to use the latest simpleperf available in AOSP master, if you are not working
on the current master branch. Scripts are in `system/extras/simpleperf/scripts`, binaries are in
`system/extras/simpleperf/scripts/bin/android`.

3. It is recommended to use `app_profiler.py` for recording, and `report_html.py` for reporting.
Below is an example.

```sh
# Record the surfaceflinger process for 10 seconds with dwarf based call graph. More examples are
# in the scripts reference in the doc.
$ python app_profiler.py -np surfaceflinger -r "-g --duration 10"

# Generate html report.
$ python report_html.py
```

4. Since Android >= O has symbols for system libraries on device, we don't need to use unstripped
binaries in `$ANDROID_PRODUCT_OUT/symbols` to report call graphs. However, they are needed to add
source code and disassembly (with line numbers) in the report. Below is an example.

```sh
# Record with app_profiler.py or simpleperf on device, generating perf.data on the host.
$ python app_profiler.py -np surfaceflinger -r "--call-graph fp --duration 10"

# Collect unstripped binaries from $ANDROID_PRODUCT_OUT/symbols to binary_cache/.
$ python binary_cache_builder.py -lib $ANDROID_PRODUCT_OUT/symbols

# Report source code and disassembly. Disassembling all binaries is slow, so it's better to add
# the --binary_filter option to only disassemble selected binaries.
$ python report_html.py --add_source_code --source_dirs $ANDROID_BUILD_TOP --add_disassembly \
    --binary_filter surfaceflinger.so
```

## Executable commands reference

### How simpleperf works

Modern CPUs have a hardware component called the performance monitoring unit (PMU). The PMU has
several hardware counters, counting events like how many cpu cycles have happened, how many
instructions have executed, or how many cache misses have happened.

The Linux kernel wraps these hardware counters into hardware perf events. In addition, the Linux
kernel also provides hardware independent software events and tracepoint events. The Linux kernel
exposes all events to userspace via the perf_event_open system call, which is used by simpleperf.

Simpleperf has three main commands: stat, record and report.

The stat command gives a summary of how many events have happened in the profiled processes in a
time period. Here's how it works:
1. Given user options, simpleperf enables profiling by making a system call to the kernel.
2. The kernel enables counters while the profiled processes are running.
3. After profiling, simpleperf reads counters from the kernel, and reports a counter summary.

The record command records samples of the profiled processes in a time period. Here's how it works:
1. Given user options, simpleperf enables profiling by making a system call to the kernel.
2. Simpleperf creates mapped buffers between simpleperf and the kernel.
3. The kernel enables counters while the profiled processes are running.
4. Each time a given number of events happen, the kernel dumps a sample to the mapped buffers.
5. Simpleperf reads samples from the mapped buffers and stores profiling data in a file called
   perf.data.

The report command reads perf.data and any shared libraries used by the profiled processes,
and outputs a report showing where the time was spent.

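The core aggregation the report command performs can be illustrated with a toy Python sketch. The in-memory samples below are hypothetical stand-ins; the real command reads them from perf.data.

```python
from collections import defaultdict

def overhead_by_symbol(samples):
    """Sum event counts per symbol and convert them to the percentages
    shown in the report's Overhead column."""
    totals = defaultdict(int)
    for symbol, event_count in samples:
        totals[symbol] += event_count
    grand_total = sum(totals.values())
    return {sym: 100.0 * count / grand_total for sym, count in totals.items()}

# Each sample carries the number of events counted since the last sample.
samples = [("FunctionOne", 600), ("FunctionTwo", 300), ("FunctionOne", 100)]
print(overhead_by_symbol(samples))  # FunctionOne: 70.0, FunctionTwo: 30.0
```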
### Commands

Simpleperf supports several commands, listed below:

```
The debug-unwind command: debug/test dwarf based offline unwinding, used for debugging simpleperf.
The dump command: dumps content in perf.data, used for debugging simpleperf.
The help command: prints help information for other commands.
The kmem command: collects kernel memory allocation information (will be replaced by Python scripts).
The list command: lists all event types supported on the Android device.
The record command: profiles processes and stores profiling data in perf.data.
The report command: reports profiling data in perf.data.
The report-sample command: reports each sample in perf.data, used for supporting integration of
                           simpleperf in Android Studio.
The stat command: profiles processes and prints a counter summary.
```

Each command supports different options, which can be seen through its help message.

```sh
# List all commands.
$ simpleperf --help

# Print help message for record command.
$ simpleperf record --help
```

Below describes the most frequently used commands, which are list, stat, record and report.

### The list command

The list command lists all events available on the device. Different devices may support different
events because they have different hardware and kernels.

```sh
$ simpleperf list
List of hw-cache events:
  branch-loads
  ...
List of hardware events:
  cpu-cycles
  instructions
  ...
List of software events:
  cpu-clock
  task-clock
  ...
```

On ARM/ARM64, the list command also shows a list of raw events; these are the events supported by
the ARM PMU on the device. The kernel has wrapped some of them into hardware events and hw-cache
events. For example, raw-cpu-cycles is wrapped into cpu-cycles, and raw-instruction-retired is
wrapped into instructions. The raw events are provided in case we want to use some events supported
on the device, but unfortunately not wrapped by the kernel.

### The stat command

The stat command is used to get event counter values of the profiled processes. By passing options,
we can select which events to use, which processes/threads to monitor, how long to monitor and the
print interval.

```sh
# Stat using default events (cpu-cycles,instructions,...), and monitor process 7394 for 10 seconds.
$ simpleperf stat -p 7394 --duration 10
Performance counter statistics:

 1,320,496,145  cpu-cycles        # 0.131736 GHz                     (100%)
   510,426,028  instructions      # 2.587047 cycles per instruction  (100%)
     4,692,338  branch-misses     # 468.118 K/sec                    (100%)
886.008130(ms)  task-clock        # 0.088390 cpus used               (100%)
           753  context-switches  # 75.121 /sec                      (100%)
           870  page-faults       # 86.793 /sec                      (100%)

Total test time: 10.023829 seconds.
```

#### Select events to stat

We can select which events to use via -e.

```sh
# Stat event cpu-cycles.
$ simpleperf stat -e cpu-cycles -p 11904 --duration 10

# Stat event cache-references and cache-misses.
$ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
```

When running the stat command, if the number of hardware events is larger than the number of
hardware counters available in the PMU, the kernel shares hardware counters between events, so each
event is only monitored for part of the total time. In the example below, there is a percentage at
the end of each row, showing the percentage of the total time that each event was actually
monitored.

```sh
# Stat using event cache-references, cache-references:u,....
$ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k \
    -e cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
Performance counter statistics:

4,331,018  cache-references    # 4.861 M/sec    (87%)
3,064,089  cache-references:u  # 3.439 M/sec    (87%)
1,364,959  cache-references:k  # 1.532 M/sec    (87%)
   91,721  cache-misses        # 102.918 K/sec  (87%)
   45,735  cache-misses:u      # 51.327 K/sec   (87%)
   38,447  cache-misses:k      # 43.131 K/sec   (87%)
9,688,515  instructions        # 10.561 M/sec   (89%)

Total test time: 1.026802 seconds.
```

In the example above, each event is monitored about 87% of the total time. But there is no
guarantee that any pair of events are always monitored at the same time. If we want to have some
events monitored at the same time, we can use --group.

```sh
# Stat using groups of events.
$ simpleperf stat -p 7964 --group cache-references,cache-misses \
    --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k \
    -e instructions --duration 1
Performance counter statistics:

3,638,900  cache-references    # 4.786 M/sec          (74%)
   65,171  cache-misses        # 1.790953% miss rate  (74%)
2,390,433  cache-references:u  # 3.153 M/sec          (74%)
   32,280  cache-misses:u      # 1.350383% miss rate  (74%)
  879,035  cache-references:k  # 1.251 M/sec          (68%)
   30,303  cache-misses:k      # 3.447303% miss rate  (68%)
8,921,161  instructions        # 10.070 M/sec         (86%)

Total test time: 1.029843 seconds.
```

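The percentages in the rows above can also be used to scale raw counts into estimates for the whole run, the way perf tools conventionally handle multiplexed counters: divide the raw count by the fraction of time the event was monitored. A sketch with made-up numbers:

```python
def scale_count(raw_count, percent_monitored):
    """Estimate the full-run count of a multiplexed event from its raw
    count and the monitored percentage at the end of a stat row."""
    if percent_monitored == 0:
        return 0
    return round(raw_count * 100.0 / percent_monitored)

# An event counted 4,331,018 times while monitored 87% of the time:
print(scale_count(4331018, 87))
```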
#### Select target to stat

We can select which processes or threads to monitor via -p or -t. Monitoring a
process is the same as monitoring all threads in the process. Simpleperf can also fork a child
process to run a new command and then monitor the child process.

```sh
# Stat process 11904 and 11905.
$ simpleperf stat -p 11904,11905 --duration 10

# Stat thread 11904 and 11905.
$ simpleperf stat -t 11904,11905 --duration 10

# Start a child process running `ls`, and stat it.
$ simpleperf stat ls

# Stat the process of an Android application. This only works for debuggable apps on non-rooted
# devices.
$ simpleperf stat --app com.example.simpleperf.simpleperfexamplewithnative

# Stat system wide using -a.
$ simpleperf stat -a --duration 10
```

#### Decide how long to stat

When monitoring existing threads, we can use --duration to decide how long to monitor. When
monitoring a child process running a new command, simpleperf monitors until the child process ends.
In this case, we can use Ctrl-C to stop monitoring at any time.

```sh
# Stat process 11904 for 10 seconds.
$ simpleperf stat -p 11904 --duration 10

# Stat until the child process running `ls` finishes.
$ simpleperf stat ls

# Stop monitoring using Ctrl-C.
$ simpleperf stat -p 11904 --duration 10
^C
```

If you want to write a script to control how long to monitor, you can send one of the SIGINT,
SIGTERM or SIGHUP signals to simpleperf to stop monitoring.

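Such a script can start simpleperf as a child process and signal it when done. A minimal Python sketch, using `sleep` as a stand-in for the simpleperf command (an assumption for illustration, since the real script would launch simpleperf on the device):

```python
import signal
import subprocess
import time

def run_for(cmd, seconds):
    """Run cmd, send SIGINT after the given duration, and wait for exit,
    mirroring how a script can bound a simpleperf stat/record session."""
    proc = subprocess.Popen(cmd)
    time.sleep(seconds)
    proc.send_signal(signal.SIGINT)  # SIGTERM or SIGHUP also work
    return proc.wait()

# Stand-in for `simpleperf stat -p <pid>`: a long-running command stopped early.
exit_code = run_for(["sleep", "100"], 0.5)
print("child stopped with code", exit_code)
```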
#### Decide the print interval

When monitoring perf counters, we can also use --interval to decide the print interval.

```sh
# Print stat for process 11904 every 300ms.
$ simpleperf stat -p 11904 --duration 10 --interval 300

# Print system wide stat at interval of 300ms for 10 seconds. Note that system wide profiling needs
# root privilege.
$ su 0 simpleperf stat -a --duration 10 --interval 300
```

#### Display counters in systrace

Simpleperf can also work with systrace to dump counters in the collected trace. Below is an example
of doing a system wide stat.

```sh
# Capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15
# seconds.
$ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
# On host launch systrace to collect trace for 10 seconds.
(HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
# Open the collected new.html in a browser and the perf counters will show up.
```

### The record command

The record command is used to dump samples of the profiled processes. Each sample can contain
information like the time at which the sample was generated, the number of events since the last
sample, the program counter of a thread, and the call chain of a thread.

By passing options, we can select which events to use, which processes/threads to monitor,
what frequency to dump samples at, how long to monitor, and where to store samples.

```sh
# Record on process 7394 for 10 seconds, using default event (cpu-cycles), using default sample
# frequency (4000 samples per second), writing records to perf.data.
$ simpleperf record -p 7394 --duration 10
simpleperf I cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
```

#### Select events to record

By default, the cpu-cycles event is used to evaluate consumed cpu cycles. But we can also use other
events via -e.

```sh
# Record using event instructions.
$ simpleperf record -e instructions -p 11904 --duration 10

# Record using task-clock, which shows the elapsed CPU time in nanoseconds.
$ simpleperf record -e task-clock -p 11904 --duration 10
```

#### Select target to record

The way to select the target in the record command is similar to that in the stat command.

```sh
# Record process 11904 and 11905.
$ simpleperf record -p 11904,11905 --duration 10

# Record thread 11904 and 11905.
$ simpleperf record -t 11904,11905 --duration 10

# Record a child process running `ls`.
$ simpleperf record ls

# Record the process of an Android application. This only works for debuggable apps on non-rooted
# devices.
$ simpleperf record --app com.example.simpleperf.simpleperfexamplewithnative

# Record system wide.
$ simpleperf record -a --duration 10
```

#### Set the frequency to record

We can set the frequency to dump records via -f or -c. For example, -f 4000 means
dumping approximately 4000 records every second when the monitored thread runs. If a monitored
thread runs 0.2s in one second (it can be preempted or blocked at other times), simpleperf dumps
about 4000 * 0.2 / 1.0 = 800 records every second. Another way is using -c. For example, -c 10000
means dumping one record whenever 10000 events happen.

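The estimate above, written out as a tiny sketch:

```python
def expected_samples_per_second(freq, running_fraction):
    """With -f, the kernel aims at `freq` samples per second of running
    time, so a thread running only a fraction of each second produces
    proportionally fewer samples."""
    return freq * running_fraction

# A thread running 0.2s out of each second, recorded with -f 4000:
print(expected_samples_per_second(4000, 0.2))  # prints 800.0
```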
```sh
# Record with sample frequency 1000: sample 1000 times every second running.
$ simpleperf record -f 1000 -p 11904,11905 --duration 10

# Record with sample period 100000: sample 1 time every 100000 events.
$ simpleperf record -c 100000 -t 11904,11905 --duration 10
```

To avoid taking too much time generating samples, kernel >= 3.10 sets a max percent of cpu time
used for generating samples (default is 25%), and decreases the max allowed sample frequency when
hitting that limit. Simpleperf uses the --cpu-percent option to adjust it, but it needs either root
privilege or to be on Android >= Q.

```sh
# Record with sample frequency 1000, with max allowed cpu percent set to 50%.
$ simpleperf record -f 1000 -p 11904,11905 --duration 10 --cpu-percent 50
```

#### Decide how long to record

The way to decide how long to monitor in the record command is similar to that in the stat command.

```sh
# Record process 11904 for 10 seconds.
$ simpleperf record -p 11904 --duration 10

# Record until the child process running `ls` finishes.
$ simpleperf record ls

# Stop monitoring using Ctrl-C.
$ simpleperf record -p 11904 --duration 10
^C
```

If you want to write a script to control how long to monitor, you can send one of the SIGINT,
SIGTERM or SIGHUP signals to simpleperf to stop monitoring.
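
For example, a script can start simpleperf in the background and signal it once its own stopping
condition is met. The sketch below uses a fixed sleep as a stand-in for that condition, and the
pid 11904 is just a placeholder.

```sh
# Start recording in the background and remember simpleperf's pid.
$ simpleperf record -p 11904 -o perf.data &
$ RECORD_PID=$!
# ... decide when to stop (here: after 5 seconds) ...
$ sleep 5
# Any of SIGINT, SIGTERM or SIGHUP stops monitoring gracefully.
$ kill -INT $RECORD_PID
# Wait for simpleperf to finish writing perf.data.
$ wait $RECORD_PID
```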

#### Set the path to store profiling data

By default, simpleperf stores profiling data in perf.data in the current directory. But the path
can be changed using -o.

```sh
# Write records to data/perf2.data.
$ simpleperf record -p 11904 -o data/perf2.data --duration 10
```

<a name="record-call-graphs-in-record-cmd"></a>
#### Record call graphs

A call graph is a tree showing function call relations. Below is an example.

```
main() {
    FunctionOne();
    FunctionTwo();
}
FunctionOne() {
    FunctionTwo();
    FunctionThree();
}
a call graph:
    main-> FunctionOne
       |    |
       |    |-> FunctionTwo
       |    |-> FunctionThree
       |
       |-> FunctionTwo
```

A call graph shows how a function calls other functions, and a reversed call graph shows how
a function is called by other functions. To show a call graph, we need to first record it, then
report it.

There are two ways to record a call graph: recording a dwarf based call graph, or recording a
stack frame based call graph. Recording dwarf based call graphs requires debug information in
native binaries, while recording stack frame based call graphs requires support for stack frame
registers.

```sh
# Record a dwarf based call graph.
$ simpleperf record -p 11904 -g --duration 10

# Record a stack frame based call graph.
$ simpleperf record -p 11904 --call-graph fp --duration 10
```

[Here](#suggestions-about-recording-call-graphs) are some suggestions about recording call graphs.

<a name="record-both-on-cpu-time-and-off-cpu-time-in-record-cmd"></a>
#### Record both on CPU time and off CPU time

Simpleperf is a CPU profiler: it generates samples for a thread only when it is running on a CPU.
However, sometimes we want to figure out where the time of a thread is spent, whether it is
running on a CPU, staying in the kernel's ready queue, or waiting for something like I/O events.

To support this, the record command uses --trace-offcpu to trace both on CPU time and off CPU
time. When --trace-offcpu is used, simpleperf generates a sample when a running thread is
scheduled out, so we know the callstack of a thread at the moment it is scheduled out. And when
reporting a perf.data generated with --trace-offcpu, we use the time to the next sample (instead
of event counts from the previous sample) as the weight of the current sample. As a result, we
can get a call graph based on timestamps, including both on CPU time and off CPU time.

--trace-offcpu is implemented using the sched:sched_switch tracepoint event, which may not be
supported on old kernels, but is guaranteed to be supported on devices running Android >= O MR1.
We can check whether trace-offcpu is supported as below.

```sh
$ simpleperf list --show-features
dwarf-based-call-graph
trace-offcpu
```

If trace-offcpu is supported, it will be shown in the feature list. Then we can try it.

```sh
# Record with --trace-offcpu.
$ simpleperf record -g -p 11904 --duration 10 --trace-offcpu

# Record with --trace-offcpu using app_profiler.py.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -a .SleepActivity \
  -r "-g -e task-clock:u -f 1000 --duration 10 --trace-offcpu"
```

Below is an example comparing the profiling results with and without --trace-offcpu.
First we record without --trace-offcpu.

```sh
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -a .SleepActivity

$ python report_html.py --add_disassembly --add_source_code --source_dirs ../demo
```

The result is [here](./without_trace_offcpu.html).
In the result, all time is taken by RunFunction(), and sleep time is ignored.
But if we add --trace-offcpu, the result changes.

```sh
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -a .SleepActivity \
  -r "-g -e task-clock:u --trace-offcpu -f 1000 --duration 10"

$ python report_html.py --add_disassembly --add_source_code --source_dirs ../demo
```

The result is [here](./trace_offcpu.html).
In the result, half of the time is taken by RunFunction(), and the other half is taken by
SleepFunction(). So it traces both on CPU time and off CPU time.

### The report command

The report command is used to report profiling data generated by the record command. The report
contains a table of sample entries, each shown as one row. The report command groups samples
belonging to the same process, thread, library and function into the same sample entry, then
sorts the sample entries based on each entry's event count.

By passing options, we can decide how to filter out uninteresting samples, how to group samples
into sample entries, and where to find profiling data and binaries.

Below is an example. Records are grouped into 4 sample entries, each entry being a row. There are
several columns, each showing a piece of information belonging to a sample entry. The first
column is Overhead, which shows the percentage of events inside the current sample entry out of
total events. As the perf event is cpu-cycles, the overhead is the percentage of CPU cycles used
in each function.

```sh
# Report perf.data, using only records sampled in libsudo-game-jni.so, grouping records using
# thread name(comm), process id(pid), thread id(tid), function name(symbol), and showing sample
# count for each row.
$ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so \
  --sort comm,pid,tid,symbol -n
Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
Arch: arm64
Event: cpu-cycles (type 0, config 0)
Samples: 28235
Event count: 546356211

Overhead  Sample  Command   Pid   Tid   Symbol
59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
6.24%     1756    sudogame  7394  7394  @plt
```

#### Set the path to read profiling data

By default, the report command reads profiling data from perf.data in the current directory.
But the path can be changed using -i.

```sh
$ simpleperf report -i data/perf2.data
```

#### Set the path to find binaries

To report function symbols, simpleperf needs to read the executable binaries used by the
monitored processes to get their symbol tables and debug information. By default, it reads them
from the paths of the binaries used while recording. However, those binaries may not exist when
reporting, or may not contain symbol tables and debug information. In that case, we can use
--symfs to redirect the paths.

```sh
# In this case, when simpleperf wants to read executable binary /A/b, it reads the file /A/b.
$ simpleperf report

# In this case, when simpleperf wants to read executable binary /A/b, it prefers the file
# /debug_dir/A/b to /A/b.
$ simpleperf report --symfs /debug_dir

# Read symbols for system libraries built locally. Note that this is not needed since Android O,
# which ships symbols for system libraries on device.
$ simpleperf report --symfs $ANDROID_PRODUCT_OUT/symbols
```

#### Filter samples

When reporting, not all records are of interest. The report command supports four filters to
select samples of interest.

```sh
# Report records in threads having the name sudogame.
$ simpleperf report --comms sudogame

# Report records in process 7394 or 7395.
$ simpleperf report --pids 7394,7395

# Report records in thread 7394 or 7395.
$ simpleperf report --tids 7394,7395

# Report records in libsudo-game-jni.so.
$ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
```

#### Group samples into sample entries

The report command uses --sort to decide how to group sample entries.

```sh
# Group records based on their process id: records having the same process id are in the same
# sample entry.
$ simpleperf report --sort pid

# Group records based on their thread id and thread comm: records having the same thread id and
# thread name are in the same sample entry.
$ simpleperf report --sort tid,comm

# Group records based on their binary and function: records in the same binary and function are
# in the same sample entry.
$ simpleperf report --sort dso,symbol

# Default option: --sort comm,pid,tid,dso,symbol. Groups records in the same thread, belonging
# to the same function in the same binary.
$ simpleperf report
```

<a name="report-call-graphs-in-report-cmd"></a>
#### Report call graphs

To report a call graph, please make sure the profiling data is recorded with call graphs,
as [here](#record-call-graphs-in-record-cmd).

```sh
$ simpleperf report -g
```

## Scripts reference

<a name="app_profiler-py"></a>
### app_profiler.py

app_profiler.py is used to record profiling data for Android applications and native executables.

```sh
# Record an Android application.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative

# Record an Android application with Java code compiled into native instructions.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative --compile_java_code

# Record the launch of an Activity of an Android application.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -a .SleepActivity

# Record a native process.
$ python app_profiler.py -np surfaceflinger

# Record a native process given its pid.
$ python app_profiler.py --pid 11324

# Record a command.
$ python app_profiler.py -cmd \
  "dex2oat --dex-file=/data/local/tmp/app-profiling.apk --oat-file=/data/local/tmp/a.oat"

# Record an Android application, and use -r to send custom options to the record command.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative \
  -r "-e cpu-clock -g --duration 30"

# Record both on CPU time and off CPU time.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative \
  -r "-e task-clock -g -f 1000 --duration 10 --trace-offcpu"

# Save profiling data in a custom file (like perf_custom.data) instead of perf.data.
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -o perf_custom.data
```

#### Profile from launch of an application

Sometimes we want to profile the launch time of an application. To support this, we added --app
to the record command. The --app option sets the package name of the Android application to
profile. If the app is not already running, the record command polls for the app process in a
loop with an interval of 1ms. So to profile the launch of an application, we can first start the
record command with --app, then start the app. Below is an example.

```sh
$ python run_simpleperf_on_device.py record \
  --app com.example.simpleperf.simpleperfexamplewithnative \
  -g --duration 1 -o /data/local/tmp/perf.data
# Start the app manually or using the `am` command.
```

To make it convenient to use, app_profiler.py supports using the -a option to start an Activity
after recording has started.

```sh
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative -a .MainActivity
```

<a name="run_simpleperf_without_usb_connection-py"></a>
### run_simpleperf_without_usb_connection.py

run_simpleperf_without_usb_connection.py records profiling data while the USB cable isn't
connected. Below is an example.

```sh
$ python run_simpleperf_without_usb_connection.py start \
  -p com.example.simpleperf.simpleperfexamplewithnative
# After the command finishes successfully, unplug the USB cable, and run the
# SimpleperfExampleWithNative app. After a few seconds, plug in the USB cable.
$ python run_simpleperf_without_usb_connection.py stop
# It may take a while to stop recording. After that, the profiling data is collected in perf.data
# on the host.
```

<a name="binary_cache_builder-py"></a>
### binary_cache_builder.py

The binary_cache directory holds the binaries needed by a profiling data file. The binaries are
expected to be unstripped, containing debug information and symbol tables. The binary_cache
directory is used by the report scripts to read symbols of binaries. It is also used by
report_html.py to generate annotated source code and disassembly.

By default, app_profiler.py builds the binary_cache directory after recording. But we can also
build binary_cache for existing profiling data files using binary_cache_builder.py. It is useful
when you record profiling data using `simpleperf record` directly, to do system wide profiling
or to record without the USB cable connected.

binary_cache_builder.py can either pull binaries from an Android device, or find binaries in
directories on the host (via -lib).

```sh
# Generate binary_cache for perf.data, by pulling binaries from the device.
$ python binary_cache_builder.py

# Generate binary_cache, by pulling binaries from the device and finding binaries in
# SimpleperfExampleWithNative.
$ python binary_cache_builder.py -lib path_of_SimpleperfExampleWithNative
```

<a name="run_simpleperf_on_device-py"></a>
### run_simpleperf_on_device.py

This script pushes the simpleperf executable to the device, and runs a simpleperf command on the
device. It is more convenient than running adb commands manually.
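
For example, following the usage shown in the profile-from-launch section above, the script
forwards its arguments to simpleperf on the device. The pid and output path below are
placeholders, and a device connected via adb is assumed.

```sh
# Push simpleperf to the device and run `simpleperf record` there for 10 seconds.
$ python run_simpleperf_on_device.py record -p 11904 --duration 10 -o /data/local/tmp/perf.data
```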

<a name="report-py"></a>
### report.py

report.py is a wrapper of the report command on the host. It accepts all options of the report
command.

```sh
# Report call graph.
$ python report.py -g

# Report call graph in a GUI window implemented by Python Tk.
$ python report.py -g --gui
```

<a name="report_html-py"></a>
### report_html.py

report_html.py generates report.html based on the profiling data. report.html shows the
profiling result without depending on other files, so it can be viewed in a local browser or
passed to other machines. Depending on which command-line options are used, report.html can
include: chart statistics, a sample table, flame graphs, annotated source code for each
function, and annotated disassembly for each function.

```sh
# Generate chart statistics, sample table and flame graphs, based on perf.data.
$ python report_html.py

# Add source code.
$ python report_html.py --add_source_code --source_dirs path_of_SimpleperfExampleWithNative

# Add disassembly.
$ python report_html.py --add_disassembly

# Adding disassembly for all binaries can cost a lot of time. So we can choose to only add
# disassembly for selected binaries.
$ python report_html.py --add_disassembly --binary_filter libgame.so

# report_html.py accepts more than one recording data file.
$ python report_html.py -i perf1.data perf2.data
```

Below is an example of generating html profiling results for SimpleperfExampleWithNative.

```sh
$ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative
$ python report_html.py --add_source_code --source_dirs path_of_SimpleperfExampleWithNative \
  --add_disassembly
```

After opening the generated [report.html](./report_html.html) in a browser, there are several
tabs:

The first tab is "Chart Statistics". You can click the pie chart to show the time consumed by
each process, thread, library and function.

The second tab is "Sample Table". It shows the time taken by each function. By clicking one row
in the table, we can jump to a new tab called "Function".

The third tab is "Flamegraph". It shows the flame graphs generated by [inferno](./inferno.md).

The fourth tab is "Function". It only appears when users click a row in the "Sample Table" tab.
It shows information about a function, including:

1. A flame graph showing functions called by that function.
2. A flame graph showing functions calling that function.
3. Annotated source code of that function. It only appears when source code files are available
   for that function.
4. Annotated disassembly of that function. It only appears when binaries containing that
   function are available.

### inferno

[inferno](./inferno.md) is a tool used to generate flame graphs in an html file.

```sh
# Generate flame graph based on perf.data.
# On Windows, use inferno.bat instead of ./inferno.sh.
$ ./inferno.sh -sc --record_file perf.data

# Record a native program and generate flame graph.
$ ./inferno.sh -np surfaceflinger
```

<a name="pprof_proto_generator-py"></a>
### pprof_proto_generator.py

It converts a profiling data file into pprof.proto, a format used by [pprof](https://github.com/google/pprof).

```sh
# Convert perf.data in the current directory to pprof.proto format.
$ python pprof_proto_generator.py
$ pprof -pdf pprof.profile
```

<a name="report_sample-py"></a>
### report_sample.py

It converts a profiling data file into a format used by [FlameGraph](https://github.com/brendangregg/FlameGraph).

```sh
# Convert perf.data in the current directory to a format used by FlameGraph.
$ python report_sample.py --symfs binary_cache >out.perf
$ git clone https://github.com/brendangregg/FlameGraph.git
$ FlameGraph/stackcollapse-perf.pl out.perf >out.folded
$ FlameGraph/flamegraph.pl out.folded >a.svg
```

1221<a name="simpleperf_report_lib-py"></a>
1222### simpleperf_report_lib.py
1223
1224simpleperf_report_lib.py is a Python library used to parse profiling data files generated by the
1225record command. Internally, it uses libsimpleperf_report.so to do the work. Generally, for each
1226profiling data file, we create an instance of ReportLib, pass it the file path (via SetRecordFile).
1227Then we can read all samples through GetNextSample(). For each sample, we can read its event info
1228(via GetEventOfCurrentSample), symbol info (via GetSymbolOfCurrentSample) and call chain info
1229(via GetCallChainOfCurrentSample). We can also get some global information, like record options
1230(via GetRecordCmd), the arch of the device (via GetArch) and meta strings (via MetaInfo).
1231
1232Examples of using simpleperf_report_lib.py are in report_sample.py, report_html.py,
1233pprof_proto_generator.py and inferno/inferno.py.
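
As a minimal sketch of that flow, the hypothetical helper below (not part of simpleperf) sums
event counts per symbol using the calls named above. It assumes simpleperf's scripts directory is
on PYTHONPATH so simpleperf_report_lib and libsimpleperf_report.so can be found, and uses the
sample.period and symbol.symbol_name fields as report_sample.py does.

```python
# Hypothetical helper: sum event counts per symbol in a recording.
def count_events_per_symbol(record_file):
    # Import lazily; requires simpleperf's scripts directory on PYTHONPATH.
    from simpleperf_report_lib import ReportLib

    lib = ReportLib()
    lib.SetRecordFile(record_file)
    counts = {}
    while True:
        sample = lib.GetNextSample()
        if sample is None:
            break  # all samples have been read
        symbol = lib.GetSymbolOfCurrentSample()
        counts[symbol.symbol_name] = counts.get(symbol.symbol_name, 0) + sample.period
    lib.Close()
    return counts
```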

## Answers to common issues

### Why do we suggest profiling on Android >= N devices?

1. Running on a device reflects a real running situation, so we suggest profiling on real
   devices instead of emulators.
2. To profile Java code, we need ART running in oat mode, which is only available on >= L for
   rooted devices, and on >= N for non-rooted devices.
3. Old Android versions are likely to be shipped with old kernels (< 3.18), which may not
   support profiling features like recording dwarf based call graphs.
4. Old Android versions are likely to be shipped with Arm32 chips. In Arm32 mode, recording
   stack frame based call graphs doesn't work well.

### Suggestions about recording call graphs

Below are our experiences with dwarf based call graphs and stack frame based call graphs.

dwarf based call graphs:
1. Need debug information in binaries.
2. Behave well on both ARM and ARM64, for both fully compiled Java code and C++ code.
3. Can only unwind 64K of stack for each sample, so they usually can't show a complete
   flame graph. But that is probably enough for users to identify hot places.
4. Take more CPU time than stack frame based call graphs, so the suggested sample frequency is
   1000 Hz, thus at most 1000 samples per second.

stack frame based call graphs:
1. Need support for stack frame registers.
2. Don't work well on ARM, because ARM is short of registers, and ARM and THUMB code have
   different stack frame registers, so the kernel can't unwind user stacks containing both ARM
   and THUMB code.
3. Also don't work well on fully compiled Java code on ARM64, because the ART compiler doesn't
   reserve stack frame registers.
4. Work well when profiling native programs on ARM64 (one example is profiling surfaceflinger),
   and usually show a complete flame graph when they work.
5. Take less CPU time than dwarf based call graphs, so the sample frequency can be 4000 Hz or
   higher.

So if you need to profile code on ARM or profile fully compiled Java code, dwarf based call
graphs may be better. If you need to profile C++ code on ARM64, stack frame based call graphs
may be better. In any case, you can always try dwarf based call graphs first, because they
produce reasonable results when given properly unstripped binaries. If that doesn't work well
enough, try stack frame based call graphs instead.

Simpleperf needs unstripped native binaries on the device to generate good dwarf based call
graphs. This can be supported in two ways:
1. Use unstripped native binaries when building the apk, as [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExampleWithNative/app/profiling.gradle).
2. Pass a directory containing unstripped native libraries to app_profiler.py via -lib. It will
   download the unstripped native libraries to the device.

```sh
$ python app_profiler.py -lib NATIVE_LIB_DIR
```

### How to solve missing symbols in report?

The simpleperf record command collects symbols on the device into perf.data. But if the native
libraries used on the device are stripped, the report will contain a lot of unknown symbols. A
solution is to build binary_cache on the host.

```sh
# Collect binaries needed by perf.data in binary_cache/.
$ python binary_cache_builder.py -lib NATIVE_LIB_DIR,...
```

The NATIVE_LIB_DIRs passed in the -lib option are the directories containing unstripped native
libraries on the host. After running it, the native libraries containing symbol tables are
collected in binary_cache/ for use when reporting.

```sh
$ python report.py --symfs binary_cache

# report_html.py searches binary_cache/ automatically, so you don't need to pass it any
# argument.
$ python report_html.py
```

## Bugs and contribution

Bugs and feature requests can be submitted at http://github.com/android-ndk/ndk/issues.
Patches can be uploaded to android-review.googlesource.com as described [here](https://source.android.com/setup/contribute/),
or sent to the email addresses listed [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/OWNERS).

If you want to compile the simpleperf C++ source code, follow the steps below:
1. Download the AOSP master branch as described [here](https://source.android.com/setup/build/requirements).
2. Build simpleperf.
```sh
$ . build/envsetup.sh
$ lunch aosp_arm64-userdebug
$ mmma system/extras/simpleperf -j30
```

If built successfully, out/target/product/generic_arm64/system/bin/simpleperf is for ARM64, and
out/target/product/generic_arm64/system/bin/simpleperf32 is for ARM.