# Simpleperf

Android Studio includes a graphical front end to Simpleperf, documented in
[Inspect CPU activity with CPU Profiler](https://developer.android.com/studio/profile/cpu-profiler).
Most users will prefer to use that instead of using Simpleperf directly.

Simpleperf is a native CPU profiling tool for Android. It can be used to profile
both Android applications and native processes running on Android. It can
profile both Java and C++ code on Android. The simpleperf executable can run on Android >= L,
and Python scripts can be used on Android >= N.

Simpleperf is part of the Android Open Source Project.
The source code is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/).
The latest document is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md).

[TOC]

## Introduction

An introduction slide deck is [here](./introduction.pdf).

Simpleperf contains two parts: the simpleperf executable and Python scripts.

The simpleperf executable works similarly to linux-tools-perf, but has some specific features for
the Android profiling environment:

1. It collects more info in profiling data. Since the common workflow is "record on the device, and
   report on the host", simpleperf not only collects samples in profiling data, but also collects
   needed symbols, device info and recording time.

2. It delivers new features for recording.
   1) When recording DWARF based call graphs, simpleperf unwinds the stack before writing a sample
      to file. This is to save storage space on the device.
   2) Supports tracing both on-CPU time and off-CPU time with the `--trace-offcpu` option.
   3) Supports recording callgraphs of JITed and interpreted Java code on Android >= P.

3. It relates closely to the Android platform.
   1) Is aware of the Android environment, such as using system properties to enable profiling and
      using run-as to profile in an application's context.
   2) Supports reading symbols and debug information from the .gnu_debugdata section, because
      system libraries are built with a .gnu_debugdata section starting from Android O.
   3) Supports profiling shared libraries embedded in apk files.
   4) Uses the standard Android stack unwinder, so its results are consistent with all other
      Android tools.

4. It builds executables and shared libraries for different usages.
   1) Builds static executables for the device. Since static executables don't rely on any library,
      simpleperf executables can be pushed to any Android device and used to record profiling data.
   2) Builds executables for different hosts: Linux, Mac and Windows. These executables can be used
      to report on hosts.
   3) Builds report shared libraries for different hosts. The report library is used by different
      Python scripts to parse profiling data.

Detailed documentation for the simpleperf executable is [here](#executable-commands-reference).

Python scripts are split into three parts according to their functions:

1. Scripts used for recording, like app_profiler.py, run_simpleperf_without_usb_connection.py.

2. Scripts used for reporting, like report.py, report_html.py, inferno.

3. Scripts used for parsing profiling data, like simpleperf_report_lib.py.

The Python scripts are tested on Python >= 3.9. Older versions may not be supported.
Detailed documentation for the Python scripts is [here](#scripts-reference).


## Tools in simpleperf

The simpleperf executables and Python scripts are located in simpleperf/ in NDK releases, and in
system/extras/simpleperf/scripts/ in AOSP. Their functions are listed below.

bin/: contains executables and shared libraries.

bin/android/${arch}/simpleperf: static simpleperf executables used on the device.

bin/${host}/${arch}/simpleperf: simpleperf executables used on the host, only supports reporting.

bin/${host}/${arch}/libsimpleperf_report.${so/dylib/dll}: report shared libraries used on the host.

*.py, inferno, purgatorio: Python scripts used for recording and reporting. Details are in [scripts_reference.md](scripts_reference.md).


## Android application profiling

See [android_application_profiling.md](./android_application_profiling.md).


## Android platform profiling

See [android_platform_profiling.md](./android_platform_profiling.md).


## Executable commands reference

See [executable_commands_reference.md](./executable_commands_reference.md).


## Scripts reference

See [scripts_reference.md](./scripts_reference.md).

## View the profile

See [view_the_profile.md](./view_the_profile.md).

## Answers to common issues

### Support on different Android versions

On Android < N, the kernel may be too old (< 3.18) to support features like recording DWARF
based call graphs.
On Android M - O, we can only profile C++ code and fully compiled Java code.
On Android >= P, the ART interpreter supports DWARF based unwinding. So we can profile Java code.
On Android >= Q, we can use the simpleperf shipped on the device to profile released Android apps,
with `<profileable android:shell="true" />`.
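
As a quick reference, the version notes above can be written as a small lookup. This is only an illustrative sketch: the function name and the wording of the notes are made up here, and the API levels used (L = 21, M = 23, N = 24, O = 26/27, P = 28, Q = 29) are the standard Android API level mapping rather than something simpleperf exposes.

```python
def profiling_support(api_level: int) -> list[str]:
    """Summarize the profiling notes that apply to an Android API level.

    Hypothetical helper for illustration; it is not part of simpleperf.
    """
    if api_level < 21:  # before Android L
        return ["the simpleperf executable is not supported"]
    notes = []
    if api_level < 24:  # before Android N
        notes.append("kernel may be too old (< 3.18) for DWARF based call graphs")
    if 23 <= api_level <= 27:  # Android M - O
        notes.append("only C++ code and fully compiled Java code can be profiled")
    if api_level >= 28:  # Android P and later
        notes.append("JITed and interpreted Java code can be profiled")
    if api_level >= 29:  # Android Q and later
        notes.append("released apps can be profiled if they declare profileable shell")
    return notes


for level in (23, 28, 29):
    print(level, profiling_support(level))
```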


### Comparing DWARF based and stack frame based call graphs

Simpleperf supports two ways of recording call stacks with samples. One is DWARF based call graph,
the other is stack frame based call graph. Below is their comparison:

Recording DWARF based call graph:
1. Needs support of debug information in binaries.
2. Usually works well on both ARM and ARM64, for both Java code and C++ code.
3. Can only unwind 64K of stack for each sample. So it isn't always possible to unwind to the bottom.
   However, this is alleviated in simpleperf, as explained in the next section.
4. Takes more CPU time than stack frame based call graphs. So it has higher overhead, and can't
   sample at very high frequency (usually <= 4000 Hz).

Recording stack frame based call graph:
1. Needs support of stack frame registers.
2. Doesn't work well on ARM, because ARM is short of registers, and ARM and THUMB code have
   different stack frame registers. So the kernel can't unwind a user stack containing both ARM and
   THUMB code.
3. Also doesn't work well on Java code, because the ART compiler doesn't reserve stack frame
   registers. And it can't get frames for interpreted Java code.
4. Works well when profiling native programs on ARM64. One example is profiling surfaceflinger. It
   usually shows a complete flamegraph when it works well.
5. Takes much less CPU time than DWARF based call graphs. So the sample frequency can be 10000 Hz or
   higher.

So if you need to profile code on ARM or profile Java code, DWARF based call graph is better. If you
need to profile C++ code on ARM64, stack frame based call graphs may be better. In any case, you can
first try DWARF based call graph, which is also the default option when `-g` is used, because it
always produces reasonable results. If it doesn't work well enough, then try stack frame based call
graph instead.
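
The rule of thumb above can be sketched as a small helper that picks `simpleperf record` options. The function name and the exact frequencies are illustrative assumptions taken from the figures quoted in the comparison; `-g`, `--call-graph fp` and `-f` are record options described in the executable commands reference.

```python
def call_graph_options(arch: str, profiles_java: bool) -> list[str]:
    """Suggest call graph options for a first `simpleperf record` attempt.

    Hypothetical helper for illustration; it is not part of simpleperf.
    """
    if arch == "arm64" and not profiles_java:
        # Stack frame based unwinding is cheap, so a higher sample
        # frequency is affordable for native code on ARM64.
        return ["--call-graph", "fp", "-f", "10000"]
    # DWARF based unwinding (-g) costs more CPU time per sample,
    # so keep the frequency at or below about 4000 Hz.
    return ["-g", "-f", "4000"]


print(" ".join(["simpleperf", "record", *call_graph_options("arm64", False)]))
print(" ".join(["simpleperf", "record", *call_graph_options("arm", True)]))
```

If the stack frame based result looks incomplete, falling back to `-g` is the safer choice.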


### Fix broken DWARF based call graph

A DWARF-based call graph is generated by unwinding thread stacks. When a sample is recorded, the
kernel dumps up to 64 kilobytes of stack data. By unwinding the stack based on DWARF information,
we can get a call stack.

Two reasons may cause a broken call stack:
1. The kernel can only dump up to 64 kilobytes of stack data for each sample, but a thread can have
   a much larger stack. In this case, we can't unwind to the thread start point.

2. We need binaries containing DWARF call frame information to unwind stack frames. The binary
   should have one of the following sections: .eh_frame, .debug_frame, .ARM.exidx or .gnu_debugdata.
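
One way to check the second condition on the host is to look at a binary's section list. The sketch below assumes the text comes from a tool such as `readelf -S -W <binary>`, where each section row starts with an index in square brackets followed by the section name; the function name is hypothetical and the sample text is shortened for illustration.

```python
import re

# Sections that can carry the call frame information needed for unwinding.
UNWIND_SECTIONS = (".eh_frame", ".debug_frame", ".ARM.exidx", ".gnu_debugdata")

# Matches rows such as "  [12] .eh_frame  PROGBITS ..." and captures the name.
_SECTION_ROW = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)", re.MULTILINE)


def unwind_sections(section_listing: str) -> list[str]:
    """Return the unwind-related sections present in a section listing."""
    names = set(_SECTION_ROW.findall(section_listing))
    return [name for name in UNWIND_SECTIONS if name in names]


sample_listing = """
  [ 1] .note.android.ident NOTE     0000000000000238 000238 000098 00   A  0   0  4
  [11] .text               PROGBITS 0000000000001000 001000 002a30 00  AX  0   0 16
  [12] .eh_frame           PROGBITS 0000000000004000 004000 000410 00   A  0   0  8
  [20] .gnu_debugdata      PROGBITS 0000000000000000 006000 000300 00      0   0  1
"""
print(unwind_sections(sample_listing))
```

An empty result suggests the binary was stripped of everything the unwinder could use.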

Simpleperf offers mitigations for both problems.

For the missing stack data problem:
1. To alleviate it, simpleperf joins callchains (call stacks) after recording. If two callchains of
   a thread have an entry containing the same ip and sp address, then simpleperf tries to join them
   to make the callchains longer. So we can get more complete callchains by recording longer and
   joining more samples. This doesn't guarantee complete call graphs, but it usually works
   well.

2. Simpleperf stores samples in a buffer before unwinding them. If the buffer is low in free space,
   simpleperf may decide to cut stack data for a sample to 1K. Hopefully, this can be recovered by
   the callchain joiner. But when a high percentage of samples are cut, many callchains can be broken.
   We can tell if many samples are cut in the record command output, like:

```sh
$ simpleperf record ...
simpleperf I cmd_record.cpp:809] Samples recorded: 105584 (cut 86291). Samples lost: 6501.
```

   There are two ways to avoid cutting samples. One is increasing the buffer size, like
   `--user-buffer-size 1G`. But `--user-buffer-size` is only available in recent simpleperf versions.
   If that option isn't available, we can use `--no-cut-samples` to disable cutting samples.
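
To judge whether cutting is a problem, the share of cut samples can be computed from the summary line shown above. This is a minimal sketch that assumes the summary keeps exactly the wording of that example; the function name is made up, and other simpleperf versions may format the line differently.

```python
import re

# Pattern for the summary line printed at the end of `simpleperf record`.
_SUMMARY = re.compile(r"Samples recorded: (\d+) \(cut (\d+)\)\. Samples lost: (\d+)\.")


def cut_ratio(log_line: str) -> float:
    """Return the fraction of recorded samples whose stack data was cut."""
    match = _SUMMARY.search(log_line)
    if match is None:
        raise ValueError("no sample summary found in the given line")
    recorded, cut, _lost = (int(value) for value in match.groups())
    return cut / recorded if recorded else 0.0


line = ("simpleperf I cmd_record.cpp:809] "
        "Samples recorded: 105584 (cut 86291). Samples lost: 6501.")
print(f"{cut_ratio(line):.1%} of samples were cut")
```

In the example output more than four fifths of the samples were cut, which is a strong hint to raise the buffer size or disable cutting.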

For the missing DWARF call frame info problem:
1. Most C++ code generates binaries containing call frame info, in .eh_frame or .ARM.exidx sections.
   These sections are not stripped, and are usually enough for stack unwinding.

2. For C code and a small percentage of C++ code that the compiler is sure will not generate
   exceptions, the call frame info is generated in the .debug_frame section. The .debug_frame section
   is usually stripped with other debug sections. One way to fix it is to push unstripped binaries
   to the device, as [here](#fix-broken-callchain-stopped-at-c-functions).

3. The compiler doesn't generate unwind instructions for function prologues and epilogues, because
   they operate on stack frames and will not generate exceptions. But profiling may hit these
   instructions and fail to unwind them. This usually doesn't matter in a flame graph. But in a
   time based Stack Chart (like in Android Studio and Firefox profiler), this causes stack gaps once
   in a while. We can remove stack gaps via `--remove-gaps`, which is already enabled by default.


### Fix broken callchain stopped at C functions

When using DWARF based call graphs, simpleperf generates callchains during recording to save space.
The debug information needed to unwind C functions is in the .debug_frame section, which is usually
stripped in native libraries in apks. To fix this, we can push unstripped versions of the native
libraries to the device, and ask simpleperf to use them when recording.

To use simpleperf directly:

```sh
# Create the native_libs dir on device, and push unstripped libs into it (nested dirs are not supported).
$ adb shell mkdir /data/local/tmp/native_libs
$ adb push <unstripped_dir>/*.so /data/local/tmp/native_libs
# Run simpleperf record with the --symfs option.
$ adb shell simpleperf record xxx --symfs /data/local/tmp/native_libs
```

To use app_profiler.py:

```sh
$ ./app_profiler.py -lib <unstripped_dir>
```
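
Because nested directories are not supported on the device side, a host build tree that keeps libraries in subdirectories has to be flattened before pushing. The sketch below only builds the adb command lines and does not run them; the function names are hypothetical, and refusing duplicate file names is a design choice made here, since two libraries with the same name cannot share one flat directory.

```python
from pathlib import Path


def collect_unstripped_libs(root: str) -> dict[str, Path]:
    """Map each .so file name found under root to its path on the host."""
    libs: dict[str, Path] = {}
    for path in sorted(Path(root).rglob("*.so")):
        if path.name in libs:
            raise ValueError(f"duplicate library name: {path.name}")
        libs[path.name] = path
    return libs


def adb_push_commands(root: str,
                      device_dir: str = "/data/local/tmp/native_libs") -> list[list[str]]:
    """Build adb commands that push every library into one flat device dir."""
    commands = [["adb", "shell", "mkdir", "-p", device_dir]]
    for name, path in sorted(collect_unstripped_libs(root).items()):
        commands.append(["adb", "push", str(path), f"{device_dir}/{name}"])
    return commands
```

Each returned command can be passed to `subprocess.run(command, check=True)` once the list looks right.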


### How to solve missing symbols in report?

The simpleperf record command collects symbols on device in perf.data. But if the native libraries
you use on device are stripped, this will result in a lot of unknown symbols in the report. A
solution is to build binary_cache on the host.

```sh
# Collect binaries needed by perf.data in binary_cache/.
$ ./binary_cache_builder.py -lib NATIVE_LIB_DIR,...
```

The NATIVE_LIB_DIRs passed in the -lib option are the directories containing unstripped native
libraries on the host. After running it, the native libraries containing symbol tables are collected
in binary_cache/ for use when reporting.

```sh
$ ./report.py --symfs binary_cache

# report_html.py searches binary_cache/ automatically, so you don't need to
# pass it any argument.
$ ./report_html.py
```


### Show annotated source code and disassembly

To show hot spots at the source code and instruction level, we need to show source code and
disassembly with event count annotation. Simpleperf supports showing annotated source code and
disassembly for C++ code and fully compiled Java code. Simpleperf supports two ways to do it:

1. Through report_html.py:
   1) Generate perf.data and pull it to the host.
   2) Generate binary_cache, containing elf files with debug information. Use the -lib option to add
     libs with debug info. Do it with
     `binary_cache_builder.py -i perf.data -lib <dir_of_lib_with_debug_info>`.
   3) Use report_html.py to generate report.html with annotated source code and disassembly,
     as described [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/scripts_reference.md#report_html_py).

2. Through pprof.
   1) Generate perf.data and binary_cache as above.
   2) Use pprof_proto_generator.py to generate a pprof proto file: `pprof_proto_generator.py`.
   3) Use pprof to report a function with annotated source code, as described [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/scripts_reference.md#pprof_proto_generator_py).

## Bugs and contribution

Bugs and feature requests can be submitted at https://github.com/android/ndk/issues.
Patches can be uploaded to android-review.googlesource.com as [here](https://source.android.com/setup/contribute/),
or sent to email addresses listed [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/OWNERS).

If you want to compile simpleperf C++ source code, follow the steps below:
1. Download AOSP main branch as [here](https://source.android.com/setup/build/requirements).
2. Build simpleperf.
```sh
$ . build/envsetup.sh
$ lunch aosp_arm64-userdebug
$ mmma system/extras/simpleperf -j30
```

If built successfully, out/target/product/generic_arm64/system/bin/simpleperf is for ARM64, and
out/target/product/generic_arm64/system/bin/simpleperf32 is for ARM.

The source code of simpleperf Python scripts is in [system/extras/simpleperf/scripts](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/scripts/).
Most scripts rely on simpleperf binaries to work. To update binaries for scripts (using a Linux
x86_64 host and an Android arm64 target as an example):
```sh
$ cp out/host/linux-x86/lib64/libsimpleperf_report.so system/extras/simpleperf/scripts/bin/linux/x86_64/libsimpleperf_report.so
$ cp out/target/product/generic_arm64/system/bin/simpleperf_ndk64 system/extras/simpleperf/scripts/bin/android/arm64/simpleperf
```

Then you can try the latest simpleperf scripts and binaries in system/extras/simpleperf/scripts.
