# Collect ETM data for AutoFDO

[TOC]

## Introduction

ETM is a hardware feature available on arm64 devices. It collects the instruction stream running on
each CPU. ARM uses ETM as an alternative to LBR (last branch record) on x86.
Simpleperf supports collecting ETM data and converting it to input files for AutoFDO, which can
then be used for PGO (profile-guided optimization) during compilation.

On ARMv8, ETM is considered an external debug interface (unless the ARMv8.4 Self-hosted Trace
extension is implemented). So it needs to be enabled explicitly in the bootloader, and isn't
available on user devices. For Pixel devices, it's available on EVT and DVT devices on Pixel 4,
Pixel 4a (5G) and Pixel 5. To test if it's available on other devices, you can follow the commands
in this doc and see if you can record any ETM data.

## Examples

Below are examples of collecting ETM data for AutoFDO. There are two steps: first recording ETM
data, then converting ETM data to AutoFDO input files.

Record ETM data:

```sh
# preparation: we need to be root to record ETM data
$ adb root
$ adb shell
redfin:/ \# cd data/local/tmp
redfin:/data/local/tmp \#

# Do a system wide collection. It writes output to perf.data.
# If you only want ETM data for the kernel, use `-e cs-etm:k`.
# If you only want ETM data for userspace, use `-e cs-etm:u`.
redfin:/data/local/tmp \# simpleperf record -e cs-etm --duration 3 -a

# To reduce file size and the time spent converting to AutoFDO input files, we recommend
# converting ETM data into an intermediate branch-list format.
redfin:/data/local/tmp \# simpleperf inject --output branch-list -o branch_list.data
```

Converting ETM data to AutoFDO input files requires reading the binaries.
So ETM data for userspace libraries can be converted on device. ETM data for the kernel needs
to be converted on host, with vmlinux and kernel modules available.


Convert ETM data for userspace libraries:

```sh
# Inject ETM data on device. It writes output to perf_inject.data.
# perf_inject.data is a text file, containing branch counts for each library.
redfin:/data/local/tmp \# simpleperf inject -i branch_list.data
```

Convert ETM data for kernel:

```sh
# pull ETM data to host.
host $ adb pull /data/local/tmp/branch_list.data
# download vmlinux and kernel modules to <binary_dir>
# host simpleperf is in <aosp-top>/system/extras/simpleperf/scripts/bin/linux/x86_64/simpleperf,
# or you can build simpleperf by `mmma system/extras/simpleperf`.
host $ simpleperf inject --symdir <binary_dir> -i branch_list.data
```

The generated perf_inject.data may contain branch info for multiple binaries, but AutoFDO only
accepts one binary at a time. So we need to split perf_inject.data.
The format of perf_inject.data is below:

```perf_inject.data format

executed range with count info for binary1
branch with count info for binary1
// name for binary1

executed range with count info for binary2
branch with count info for binary2
// name for binary2

...
```

When splitting perf_inject.data, make sure each resulting file only contains info for one binary.

Then we can use [AutoFDO](https://github.com/google/autofdo) to create a profile like below:

```sh
# perf_inject_kernel.data is split from perf_inject.data, and only contains branch info for [kernel.kallsyms].
host $ autofdo/create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux -out a.prof -format binary
```

Then we can use a.prof for PGO during compilation, via `-fprofile-sample-use=a.prof`.
[Here](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers) are more details.

## Collect ETM data with a daemon

Android also has a daemon collecting ETM data periodically. It only runs on userdebug and eng
devices.
The source code is in https://android.googlesource.com/platform/system/extras/+/master/profcollectd/.

## Support ETM in the kernel

To let simpleperf use ETM, we need to enable the Coresight driver in the kernel, which lives in
`<linux_kernel>/drivers/hwtracing/coresight`.

The Coresight driver can be enabled with the kernel configs below:

```config
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_SOURCE_ETM4X=y
```

On kernel 5.10+, we recommend building the Coresight driver as kernel modules, because that works
with the GKI kernel.

```config
CONFIG_CORESIGHT=m
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=m
CONFIG_CORESIGHT_SOURCE_ETM4X=m
```

Android common kernel 5.10+ should have all the Coresight patches needed to collect ETM data.
Android common kernel 5.4 is missing two patches. But by adding the patches in
https://android-review.googlesource.com/q/topic:test_etm_on_hikey960_5.4, we can collect ETM data
on hikey960 with the 5.4 kernel.
For Android common kernel 4.14 and 4.19, we have backported all necessary Coresight patches.

Besides the Coresight driver, we also need to add Coresight devices to the device tree. An example
is in https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/arm/juno-base.dtsi. There
should be a path that carries ETM data from the ETM device through funnels, ETF and replicators,
all the way to ETR, which writes ETM data to system memory.

One optional flag in the ETM device tree is "arm,coresight-loses-context-with-cpu". It saves ETM
registers when a CPU enters a low power state. It may be needed to avoid the
"coresight_disclaim_device_unlocked" warning when doing system wide collection.

One optional flag in the ETR device tree is "arm,scatter-gather". Simpleperf requests 4M system
memory for ETR to store ETM data. Without IOMMU, the memory needs to be contiguous.
If the kernel can't fulfill the request, simpleperf will report an out of memory error.
Fortunately, we can use the "arm,scatter-gather" flag to let ETR run in scatter gather mode, which
uses non-contiguous memory.

## Enable ETM in the bootloader

Unless the ARMv8.4 Self-hosted Trace extension is implemented, ETM is considered an external debug
interface. It may be disabled by fuse (like JTAG). So we need to check if ETM is disabled, and
if the bootloader provides a way to reenable it.

We can tell if ETM is disabled by checking its TRCAUTHSTATUS register, which is exposed in sysfs,
like /sys/bus/coresight/devices/coresight-etm0/mgmt/trcauthstatus. To reenable ETM, we need to
enable non-Secure non-invasive debug on the ARM CPU. The method depends on the chip vendor (SoC).

## Related docs

* [Arm Architecture Reference Manual Armv8, D3 AArch64 Self-hosted Trace](https://developer.arm.com/documentation/ddi0487/latest)
* [ARM ETM Architecture Specification](https://developer.arm.com/documentation/ihi0064/latest/)
* [ARM CoreSight Architecture Specification](https://developer.arm.com/documentation/ihi0029/latest)
* [CoreSight Components Technical Reference Manual](https://developer.arm.com/documentation/ddi0314/h/)
* [CoreSight Trace Memory Controller Technical Reference Manual](https://developer.arm.com/documentation/ddi0461/b/)
* [OpenCSD library for decoding ETM data](https://github.com/Linaro/OpenCSD)
* [AutoFDO tool for converting profile data](https://github.com/google/autofdo)
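
The Examples section says perf_inject.data has to be split so that each file covers one binary, but does not show how. Below is a minimal sketch of one way to do it on the host with awk. It is not part of simpleperf. The function name `split_perf_inject` and the output file naming are hypothetical, and the sketch assumes that blocks are separated by blank lines and that the last line starting with `// ` in each block names the binary, as the format description suggests.

```sh
# split_perf_inject FILE OUTDIR
# Hypothetical helper: writes one file per binary block of FILE into OUTDIR.
# Assumption: blocks are separated by blank lines, and the last "// " line
# of a block is the binary name.
split_perf_inject() {
  input=$1
  outdir=$2
  mkdir -p "$outdir"
  # An empty RS puts awk in paragraph mode: each blank-line-separated block
  # is one record. FS='\n' makes each line of the block one field.
  awk -v RS= -v FS='\n' -v outdir="$outdir" '
    {
      name = ""
      # Search from the end of the block for the name comment.
      for (i = NF; i >= 1; i--) {
        if ($i ~ /^\/\/ /) { name = substr($i, 4); break }
      }
      # Skip blocks without a name line.
      if (name == "") next
      # Replace characters that are awkward in file names.
      gsub(/[^A-Za-z0-9._-]/, "_", name)
      out = outdir "/perf_inject_" name ".data"
      # Copy the block verbatim, followed by a blank line.
      print $0 "\n" > out
      close(out)
    }
  ' "$input"
}
```

For example, `split_perf_inject perf_inject.data split_out` would write the block named `[kernel.kallsyms]` to `split_out/perf_inject__kernel.kallsyms_.data`, which can then be passed to `create_llvm_prof` in place of perf_inject_kernel.data.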