# Collect ETM data for AutoFDO

[TOC]

## Introduction

ETM is a hardware feature available on arm64 devices. It collects the instruction stream running on
each CPU. ARM uses ETM as an alternative to LBR (last branch record) on x86.
Simpleperf supports collecting ETM data and converting it to input files for AutoFDO, which can
then be used for PGO (profile-guided optimization) during compilation.

On ARMv8, ETM is considered an external debug interface (unless the ARMv8.4 Self-hosted Trace
extension is implemented). So it needs to be enabled explicitly in the bootloader, and isn't
available on user devices. For Pixel devices, it's available on EVT and DVT devices of Pixel 4,
Pixel 4a (5G) and Pixel 5. To test whether it's available on other devices, you can follow the
commands in this doc and see if you can record any ETM data.

## Examples

The examples below collect ETM data for AutoFDO in two steps: first record ETM data, then convert
it to AutoFDO input files.

Record ETM data:

```sh
# Preparation: we need to be root to record ETM data.
$ adb root
$ adb shell
redfin:/ # cd data/local/tmp
redfin:/data/local/tmp #

# Do a system-wide collection; it writes output to perf.data.
# To collect ETM data only for the kernel, use `-e cs-etm:k`.
# To collect ETM data only for userspace, use `-e cs-etm:u`.
redfin:/data/local/tmp # simpleperf record -e cs-etm --duration 3 -a

# To reduce file size and the time needed to convert to AutoFDO input files, we recommend
# converting ETM data into an intermediate branch-list format.
redfin:/data/local/tmp # simpleperf inject --output branch-list -o branch_list.data
```

Converting ETM data to AutoFDO input files requires reading the binaries. So userspace libraries
can be converted on the device, while kernel data needs to be converted on the host, with vmlinux
and kernel modules available.

Convert ETM data for userspace libraries:

```sh
# Inject ETM data on the device. It writes output to perf_inject.data.
# perf_inject.data is a text file containing branch counts for each library.
redfin:/data/local/tmp # simpleperf inject -i branch_list.data
```

Convert ETM data for the kernel:

```sh
# Pull ETM data to the host.
host $ adb pull /data/local/tmp/branch_list.data
# Download vmlinux and kernel modules to <binary_dir>.
# The host simpleperf is in <aosp-top>/system/extras/simpleperf/scripts/bin/linux/x86_64/simpleperf,
# or you can build simpleperf with `mmma system/extras/simpleperf`.
host $ simpleperf inject --symdir <binary_dir> -i branch_list.data
```

The generated perf_inject.data may contain branch info for multiple binaries, but AutoFDO only
accepts one binary at a time, so we need to split perf_inject.data.
The format of perf_inject.data is as follows:

```perf_inject.data format

executed range with count info for binary1
branch with count info for binary1
// name for binary1

executed range with count info for binary2
branch with count info for binary2
// name for binary2

...
```

We need to split perf_inject.data so that each file only contains info for one binary.
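The split can be scripted. Below is a minimal sketch, assuming each binary's block ends with exactly one `// <binary path>` line as in the format above, and that blank lines only separate blocks. The function name `split_perf_inject` and the output naming scheme `perf_inject_<N>_<basename>.data` are hypothetical, not part of simpleperf; the counter `<N>` keeps binaries that share a basename from overwriting each other.

```sh
#!/bin/sh
# split_perf_inject INPUT OUTDIR (hypothetical helper)
# Splits a simpleperf perf_inject.data file into one file per binary,
# assuming each binary's block ends with a single "// <binary path>" line.
split_perf_inject() {
  mkdir -p "$2" || return 1
  awk -v outdir="$2" '
    # Blank lines only separate blocks, so drop them.
    NF == 0 { next }
    # Accumulate the current block, including its "// <path>" line.
    { block = block $0 "\n" }
    # A "// <path>" line closes the block: write it to its own file.
    /^\/\/ / {
      name = substr($0, 4)
      sub(/.*\//, "", name)              # keep only the basename
      gsub(/[^A-Za-z0-9._-]/, "", name)  # e.g. [kernel.kallsyms] -> kernel.kallsyms
      n++
      out = outdir "/perf_inject_" n "_" name ".data"
      printf "%s", block > out
      close(out)
      block = ""
    }
  ' "$1"
}
```

For example, `split_perf_inject perf_inject.data split` writes one file per binary under `split/`. Each file keeps its trailing `// <binary path>` comment, just like perf_inject_etm_test_loop.data in the complete example, which is passed to `create_llvm_prof` as-is.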

Then we can use [AutoFDO](https://github.com/google/autofdo) to create a profile. AutoFDO only works
for binaries whose first loadable segment is executable, but binaries built in Android may not
follow this rule. The simpleperf inject command knows how to work around this problem, but AutoFDO
has a check that forces binaries to start with an executable segment. We need to disable this
check in AutoFDO by commenting out L127-L136 in
https://github.com/google/autofdo/commit/188db2834ce74762ed17108ca344916994640708#diff-2d132ecbb5e4f13e0da65419f6d1759dd27d6b696786dd7096c0c34d499b1710R127-R136.
Then we can use `create_llvm_prof` in AutoFDO to create profiles used by clang.

```sh
# perf_inject_binary1.data is split from perf_inject.data, and only contains branch info for binary1.
host $ autofdo/create_llvm_prof -profile perf_inject_binary1.data -profiler text -binary path_of_binary1 -out a.prof -format binary

# perf_inject_kernel.data is split from perf_inject.data, and only contains branch info for [kernel.kallsyms].
host $ autofdo/create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux -out a.prof -format binary
```
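When perf_inject.data covers many binaries, running `create_llvm_prof` by hand for each split file gets tedious. The loop below is a minimal sketch of automating it, not a simpleperf or AutoFDO feature. It assumes each split file ends with its `// <binary path>` line, that an unstripped copy of each binary with the same basename lives somewhere under a local symbol directory, and that `[kernel.kallsyms]` maps to `vmlinux` in that directory. The function name `make_afdo_profiles` and the `CREATE_LLVM_PROF` variable (pointing at the patched `create_llvm_prof`) are hypothetical.

```sh
#!/bin/sh
# make_afdo_profiles SPLIT_DIR SYMDIR OUT_DIR (hypothetical helper)
# Runs create_llvm_prof once per split perf_inject file. Set CREATE_LLVM_PROF
# to the patched create_llvm_prof if it is not on PATH.
make_afdo_profiles() {
  split_dir="$1"; symdir="$2"; out_dir="$3"
  tool="${CREATE_LLVM_PROF:-create_llvm_prof}"
  mkdir -p "$out_dir" || return 1
  for f in "$split_dir"/*.data; do
    [ -f "$f" ] || continue
    # The last "// " line holds the binary path recorded on the device.
    path=$(grep '^// ' "$f" | tail -n 1 | cut -c4-)
    case "$path" in
      "") echo "skip $f: no binary path" >&2; continue ;;
      "[kernel.kallsyms]") binary="$symdir/vmlinux" ;;
      # Assumption: look up the unstripped binary by basename under SYMDIR.
      *) binary=$(find "$symdir" -type f -name "$(basename "$path")" | head -n 1) ;;
    esac
    if [ ! -f "$binary" ]; then
      echo "skip $f: no local binary for $path" >&2
      continue
    fi
    "$tool" -profile "$f" -profiler text -binary "$binary" \
      -out "$out_dir/$(basename "$f" .data).afdo" -format binary
  done
}
```

For example, after `CREATE_LLVM_PROF=autofdo/create_llvm_prof` and `make_afdo_profiles split <binary_dir> profiles`, each split file whose binary was found gets a matching `.afdo` file under `profiles/`, and the rest are reported on stderr. Matching by basename is a simplification; if two binaries share a basename, the first match found under `<binary_dir>` wins.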

Then we can use a.prof for PGO during compilation, via `-fprofile-sample-use=a.prof`.
See [the Clang user manual](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers) for more details.

### A complete example: etm_test_loop.cpp

`etm_test_loop.cpp` is an example that shows the complete process.
The source code is in [etm_test_loop.cpp](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/runtest/etm_test_loop.cpp).
The build script is in [Android.bp](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/runtest/Android.bp).
It builds an executable called `etm_test_loop`, which runs on the device.

Step 1: Build the `etm_test_loop` binary.

```sh
(host) <AOSP>$ . build/envsetup.sh
(host) <AOSP>$ lunch aosp_arm64-userdebug
(host) <AOSP>$ make etm_test_loop
```

Step 2: Run `etm_test_loop` on the device, and collect ETM data while it runs.

```sh
(host) <AOSP>$ adb push out/target/product/generic_arm64/system/bin/etm_test_loop /data/local/tmp
(host) <AOSP>$ adb root
(host) <AOSP>$ adb shell
(device) / # cd /data/local/tmp
(device) /data/local/tmp # chmod a+x etm_test_loop
(device) /data/local/tmp # simpleperf record -e cs-etm:u ./etm_test_loop
simpleperf I cmd_record.cpp:729] Recorded for 0.0370068 seconds. Start post processing.
simpleperf I cmd_record.cpp:799] Aux data traced: 1689136
(device) /data/local/tmp # simpleperf inject -i perf.data --output branch-list -o branch_list.data
simpleperf W dso.cpp:557] failed to read min virtual address of [vdso]: File not found
(device) /data/local/tmp # exit
(host) <AOSP>$ adb pull /data/local/tmp/branch_list.data
```

Step 3: Convert ETM data to AutoFDO data.

```sh
# Build the simpleperf tool on the host.
(host) <AOSP>$ make simpleperf_ndk
(host) <AOSP>$ simpleperf_ndk64 inject -i branch_list.data -o perf_inject_etm_test_loop.data --symdir out/target/product/generic_arm64/symbols/system/bin
simpleperf W cmd_inject.cpp:505] failed to build instr ranges for binary [vdso]: File not found
(host) <AOSP>$ cat perf_inject_etm_test_loop.data
13
1000-1010:1
1014-1050:1
...
112c->0:1
// /data/local/tmp/etm_test_loop

(host) <AOSP>$ create_llvm_prof -profile perf_inject_etm_test_loop.data -profiler text -binary out/target/product/generic_arm64/symbols/system/bin/etm_test_loop -out etm_test_loop.afdo -format binary
(host) <AOSP>$ ls -lh etm_test_loop.afdo
rw-r--r-- 1 user group 241 Aug 29 16:04 etm_test_loop.afdo
```

Step 4: Use AutoFDO data to build an optimized binary.

```sh
(host) <AOSP>$ mkdir toolchain/pgo-profiles/sampling/
(host) <AOSP>$ cp etm_test_loop.afdo toolchain/pgo-profiles/sampling/
(host) <AOSP>$ vi toolchain/pgo-profiles/sampling/Android.bp
# edit Android.bp to add a fdo_profile module
# soong_namespace {}
#
# fdo_profile {
#    name: "etm_test_loop_afdo",
#    profile: ["etm_test_loop.afdo"],
# }
```

`soong_namespace` is added to support fdo_profile modules with the same name.

In a product config mk file, update `PRODUCT_AFDO_PROFILES` with:

```make
PRODUCT_AFDO_PROFILES += etm_test_loop://toolchain/pgo-profiles/sampling:etm_test_loop_afdo
```

```sh
(host) <AOSP>$ vi system/extras/simpleperf/runtest/Android.bp
# edit Android.bp to enable afdo for etm_test_loop.
# cc_binary {
#    name: "etm_test_loop",
#    srcs: ["etm_test_loop.cpp"],
#    afdo: true,
# }
(host) <AOSP>$ make etm_test_loop
```

If we compare the disassembly of `out/target/product/generic_arm64/symbols/system/bin/etm_test_loop`
before and after optimizing with AutoFDO data, we can see that the compiler prefers different
branch directions.

## Collect ETM data with a daemon

Android also has a daemon that collects ETM data periodically. It only runs on userdebug and eng
devices. The source code is in https://android.googlesource.com/platform/system/extras/+/master/profcollectd/.

## Support ETM in the kernel

To let simpleperf use ETM, we need to enable the Coresight driver in the kernel, which lives in
`<linux_kernel>/drivers/hwtracing/coresight`.

The Coresight driver can be enabled with the following kernel configs:

```config
	CONFIG_CORESIGHT=y
	CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
	CONFIG_CORESIGHT_SOURCE_ETM4X=y
```

On kernel 5.10+, we recommend building the Coresight driver as kernel modules, because that works
with the GKI kernel.

```config
	CONFIG_CORESIGHT=m
	CONFIG_CORESIGHT_LINK_AND_SINK_TMC=m
	CONFIG_CORESIGHT_SOURCE_ETM4X=m
```
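To confirm that a kernel build actually has these options, a minimal sketch like the one below can scan a kernel `.config` file, accepting either `=y` or `=m`. The helper name `check_coresight_config` is hypothetical. On a running device, the config may be readable from `/proc/config.gz`, but only if the kernel was built with `CONFIG_IKCONFIG_PROC`.

```sh
#!/bin/sh
# check_coresight_config CONFIG_FILE (hypothetical helper)
# Prints each required Coresight option that is neither built in (=y) nor a
# module (=m), and returns 1 if any is missing.
check_coresight_config() {
  missing=0
  for opt in CONFIG_CORESIGHT CONFIG_CORESIGHT_LINK_AND_SINK_TMC CONFIG_CORESIGHT_SOURCE_ETM4X; do
    # Allow leading whitespace, as in the config snippets above.
    if ! grep -Eq "^[[:space:]]*${opt}=(y|m)\$" "$1"; then
      echo "missing: $opt"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `zcat /proc/config.gz > kconfig && check_coresight_config kconfig` prints nothing and succeeds when all three options are set.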

Android common kernel 5.10+ should have all the Coresight patches needed to collect ETM data.
Android common kernel 5.4 is missing two patches. But by adding the patches in
https://android-review.googlesource.com/q/topic:test_etm_on_hikey960_5.4, we can collect ETM data
on hikey960 with the 5.4 kernel.
For Android common kernel 4.14 and 4.19, we have backported all necessary Coresight patches.

Besides the Coresight driver, we also need to add Coresight devices to the device tree. An example
is in https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/arm/juno-base.dtsi. There
should be a path that carries ETM data from each ETM device through funnels, ETF and replicators,
all the way to the ETR, which writes ETM data to system memory.

One optional flag in the ETM device tree node is "arm,coresight-loses-context-with-cpu". It makes
the driver save ETM registers when a CPU enters a low power state. It may be needed to avoid the
"coresight_disclaim_device_unlocked" warning when doing system-wide collection.

One optional flag in the ETR device tree node is "arm,scatter-gather". Simpleperf requests 4M of
system memory for the ETR to store ETM data. Without an IOMMU, the memory needs to be contiguous. If
the kernel can't fulfill the request, simpleperf reports an out of memory error. Fortunately, we
can use the "arm,scatter-gather" flag to let the ETR run in scatter-gather mode, which uses
non-contiguous memory.

### A possible problem: trace_id mismatch

Each CPU has an ETM device, which has a unique trace_id assigned by the kernel.
The formula is `trace_id = 0x10 + cpu * 2`, as in https://github.com/torvalds/linux/blob/master/include/linux/coresight-pmu.h#L37.
If the formula is modified by local patches, the simpleperf inject command can't parse ETM data
properly and is likely to give empty output.
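To check a device against this formula, compare the trace_id each ETM actually uses with the expected value. Below is a minimal sketch of that comparison. The helper names are hypothetical, and how you read the actual trace_id depends on your kernel (possibly from the ETM's `mgmt/trctraceidr` entry in sysfs, if your kernel exposes it).

```sh
#!/bin/sh
# expected_trace_id CPU (hypothetical helper)
# Prints the trace_id given by the upstream formula: 0x10 + cpu * 2.
expected_trace_id() {
  printf '0x%x\n' $(( 0x10 + $1 * 2 ))
}

# check_trace_id CPU OBSERVED (hypothetical helper)
# OBSERVED may be decimal or 0x-prefixed hex. Returns 1 on a mismatch.
check_trace_id() {
  expected=$(( 0x10 + $1 * 2 ))
  observed=$(( $2 ))
  if [ "$observed" -ne "$expected" ]; then
    printf 'cpu%s: trace_id 0x%x, expected 0x%x\n' "$1" "$observed" "$expected"
    return 1
  fi
}
```

For example, `expected_trace_id 3` prints `0x16`, so if CPU 3's ETM uses any other trace_id, the formula has likely been changed locally.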

## Enable ETM in the bootloader

Unless the ARMv8.4 Self-hosted Trace extension is implemented, ETM is considered an external debug
interface. It may be disabled by fuse (like JTAG). So we need to check whether ETM is disabled, and
whether the bootloader provides a way to re-enable it.

We can tell whether ETM is disabled by checking its TRCAUTHSTATUS register, which is exposed in
sysfs, like /sys/bus/coresight/devices/coresight-etm0/mgmt/trcauthstatus. To re-enable ETM, we need
to enable non-Secure non-invasive debug on the ARM CPU. The method depends on chip vendors (SoCs).
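The check can be scripted. Below is a minimal sketch based on the TRCAUTHSTATUS layout in the ARM ETM Architecture Specification: bits [3:2] (NSNID) report non-Secure non-invasive debug, where `0b11` means implemented and enabled, `0b10` implemented but disabled, and `0b00` not implemented. It assumes sysfs shows the register as a hex value such as `0xcc` and that the shell's arithmetic accepts `0x`-prefixed hex. The helper names and the sysfs glob are assumptions; device directory names may vary between kernels.

```sh
#!/bin/sh
# nsnid_state VALUE (hypothetical helper)
# Decodes TRCAUTHSTATUS bits [3:2] (NSNID, non-Secure non-invasive debug).
nsnid_state() {
  case $(( ($1 >> 2) & 3 )) in
    3) echo "enabled" ;;
    2) echo "disabled" ;;
    0) echo "not implemented" ;;
    *) echo "reserved" ;;
  esac
}

# check_all_etms (hypothetical helper), run on the device.
# Prints the NSNID state of every Coresight device that exposes trcauthstatus.
check_all_etms() {
  for f in /sys/bus/coresight/devices/*/mgmt/trcauthstatus; do
    [ -r "$f" ] || continue
    printf '%s: %s\n' "$f" "$(nsnid_state "$(cat "$f")")"
  done
}
```

For example, `nsnid_state 0xcc` prints `enabled`, while `nsnid_state 0x88` prints `disabled`, which would point to the bootloader or fuses blocking ETM.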

## Related docs

* [Arm Architecture Reference Manual Armv8, D3 AArch64 Self-hosted Trace](https://developer.arm.com/documentation/ddi0487/latest)
* [ARM ETM Architecture Specification](https://developer.arm.com/documentation/ihi0064/latest/)
* [ARM CoreSight Architecture Specification](https://developer.arm.com/documentation/ihi0029/latest)
* [CoreSight Components Technical Reference Manual](https://developer.arm.com/documentation/ddi0314/h/)
* [CoreSight Trace Memory Controller Technical Reference Manual](https://developer.arm.com/documentation/ddi0461/b/)
* [OpenCSD library for decoding ETM data](https://github.com/Linaro/OpenCSD)
* [AutoFDO tool for converting profile data](https://github.com/google/autofdo)