# Native heap profiler

NOTE: **heapprofd requires Android 10 or higher**

Heapprofd is a tool that tracks native heap allocations & deallocations of an
Android process within a given time period. The resulting profile can be used to
attribute memory usage to particular call-stacks, supporting a mix of both
native and Java code. The tool can be used by Android platform and app
developers to investigate memory issues.

On debug Android builds, you can profile all apps and most system services.
On "user" builds, you can only use it on apps with the debuggable or
profileable manifest flag.

## Quickstart

See the [Memory Guide](/docs/case-studies/memory.md#heapprofd) for getting
started with heapprofd.

## UI

Dumps from heapprofd are shown as flamegraphs in the UI after clicking on the
diamond. Each diamond corresponds to a snapshot of the allocations and
callstacks collected at that point in time.

![heapprofd snapshots in the UI tracks](/docs/images/profile-diamond.png)

![heapprofd flamegraph](/docs/images/native-flamegraph.png)

## SQL

Information about callstacks is written to the following tables:

* [`stack_profile_mapping`](/docs/analysis/sql-tables.autogen#stack_profile_mapping)
* [`stack_profile_frame`](/docs/analysis/sql-tables.autogen#stack_profile_frame)
* [`stack_profile_callsite`](/docs/analysis/sql-tables.autogen#stack_profile_callsite)

The allocations themselves are written to
[`heap_profile_allocation`](/docs/analysis/sql-tables.autogen#heap_profile_allocation).

Offline symbolization data is stored in
[`stack_profile_symbol`](/docs/analysis/sql-tables.autogen#stack_profile_symbol).

See [Example Queries](#heapprofd-example-queries) for example SQL queries.

## Recording

Heapprofd can be configured and started in three ways.

#### Manual configuration

This requires manually setting the
[HeapprofdConfig](/docs/reference/trace-config-proto.autogen#HeapprofdConfig)
section of the trace config. The only benefit of doing so is that in this way
heap profiling can be enabled alongside any other tracing data sources.

#### Using the tools/heap_profile script (recommended)

You can use the `tools/heap_profile` script. If you are having trouble,
make sure you are using the
[latest version](
https://raw.githubusercontent.com/google/perfetto/master/tools/heap_profile).

You can target processes either by name (`-n com.example.myapp`) or by PID
(`-p 1234`). In the first case, the heap profile will be initiated both on
already-running processes that match the package name and on new processes
launched after the profiling session is started.
For the full arguments list see the
[heap_profile cmdline reference page](/docs/reference/heap_profile-cli).

#### Using the Recording page of Perfetto UI

You can also use the [Perfetto UI](https://ui.perfetto.dev/#!/record?p=memory)
to record heapprofd profiles. Tick "Heap profiling" in the trace configuration,
enter the processes you want to target, click "Add Device" to pair your phone,
and record profiles straight from your browser. This is also possible on
Windows.

## Viewing the data

The resulting profile proto contains four views on the data:

* **space**: how many bytes were allocated but not freed at this callstack at
  the moment the dump was created.
* **alloc\_space**: how many bytes were allocated (including ones freed at the
  moment of the dump) at this callstack.
* **objects**: how many allocations without matching frees were done at this
  callstack.
* **alloc\_objects**: how many allocations (including ones with matching frees)
  were done at this callstack.
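
To make the relationship between the four views concrete, here is a minimal
Python sketch. It assumes per-callstack totals of allocated and freed bytes and
of allocation and free counts are already known. The `CallstackSamples` type and
its field names are hypothetical and are not part of heapprofd or of the profile
proto.

```python
from dataclasses import dataclass


@dataclass
class CallstackSamples:
    """Hypothetical per-callstack totals; not a heapprofd type."""
    allocated_bytes: int  # bytes allocated at this callstack
    freed_bytes: int      # bytes of those allocations that were already freed
    alloc_count: int      # number of allocations
    free_count: int       # number of matching frees


def views(s: CallstackSamples) -> dict:
    """Derive the four views from the raw totals."""
    return {
        "space": s.allocated_bytes - s.freed_bytes,
        "alloc_space": s.allocated_bytes,
        "objects": s.alloc_count - s.free_count,
        "alloc_objects": s.alloc_count,
    }


sample = CallstackSamples(allocated_bytes=8192, freed_bytes=4096,
                          alloc_count=4, free_count=2)
print(views(sample))
```

In this sketch, **space** and **objects** describe what is still live at dump
time, while **alloc\_space** and **alloc\_objects** only ever grow over the
profiling session.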

_(Googlers: You can also open the gzipped protos using http://pprof/)_

TIP: you might want to put `libart.so` as a "Hide regex" when profiling apps.

You can use the [Perfetto UI](https://ui.perfetto.dev) to visualize heap dumps.
Upload the `raw-trace` file in your output directory. You will see all heap
dumps as diamonds on the timeline; click any of them to get a flamegraph.

Alternatively [Speedscope](https://speedscope.app) can be used to visualize
the gzipped protos, but will only show the space view.

TIP: Click Left Heavy on the top left for a good visualization.

## Sampling interval

Heapprofd samples heap allocations by hooking calls to malloc/free and C++'s
operator new/delete. Given a sampling interval of n bytes, one allocation is
sampled, on average, every n bytes allocated. This reduces the
performance impact on the target process. The default sampling interval
is 4096 bytes.

The easiest way to reason about this is to imagine the memory allocations as a
stream of one byte allocations. From this stream, every byte has a 1/n
probability of being selected as a sample, and the corresponding callstack
gets attributed the complete n bytes. For more accuracy, allocations larger than
the sampling interval bypass the sampling logic and are recorded with their true
size.

## Startup profiling

When specifying a target process name (as opposed to the PID), new processes
matching that name are profiled from their startup. The resulting profile will
contain all allocations done between the start of the process and the end
of the profiling session.

On Android, Java apps are usually not exec()-ed from scratch, but fork()-ed from
the [zygote], which then specializes into the desired app. If the app's name
matches a name specified in the profiling session, profiling will be enabled as
part of the zygote specialization.
The resulting profile contains all
allocations done between that point in zygote specialization and the end of the
profiling session. Some allocations done early in the specialization process are
not accounted for.

At the trace proto level, the resulting [ProfilePacket] will have the
`from_startup` field set to true in the corresponding `ProcessHeapSamples`
message. This is not surfaced in the converted pprof compatible proto.

[ProfilePacket]: /docs/reference/trace-packet-proto.autogen#ProfilePacket
[zygote]: https://developer.android.com/topic/performance/memory-overview#SharingRAM

## Runtime profiling

When a profiling session is started, all matching processes (by name or PID)
are enumerated and are signalled to request profiling. Profiling isn't actually
enabled until a few hundred milliseconds after the next allocation that is
done by the application. If the application is idle when profiling is
requested, and then does a burst of allocations, these may be missed.

The resulting profile will contain all allocations done between when profiling
is enabled, and the end of the profiling session.

The resulting [ProfilePacket] will have `from_startup` set to false in the
corresponding `ProcessHeapSamples` message. This does not get surfaced in the
converted pprof compatible proto.

## Concurrent profiling sessions

If multiple sessions name the same target process (either by name or PID),
only the first relevant session will profile the process. The other sessions
will report that the process had already been profiled when converting to
the pprof compatible proto.

If you see this message but do not expect any other sessions, run

```shell
adb shell killall perfetto
```

to stop any concurrent sessions that may be running.

The resulting [ProfilePacket] will have `rejected_concurrent` set to true in
the otherwise empty corresponding `ProcessHeapSamples` message. This does not
get surfaced in the converted pprof compatible proto.

## {#heapprofd-targets} Target processes

Depending on the build of Android that heapprofd is run on, some processes
are not eligible to be profiled.

On _user_ (i.e. production, non-rootable) builds, only Java applications with
either the profileable or the debuggable manifest flag set can be profiled.
Profiling requests for non-profileable/debuggable processes will result in an
empty profile.

On userdebug builds, all processes except for a small set of critical
services can be profiled (to find the set of disallowed targets, look for
`never_profile_heap` in [heapprofd.te](
https://cs.android.com/android/platform/superproject/+/master:system/sepolicy/private/heapprofd.te?q=never_profile_heap)).
This restriction can be lifted by disabling SELinux by running
`adb shell su root setenforce 0` or by passing `--disable-selinux` to the
`heap_profile` script.

<center>

|                         | userdebug setenforce 0 | userdebug | user |
|-------------------------|:----------------------:|:---------:|:----:|
| critical native service |           Y            |     N     |  N   |
| native service          |           Y            |     Y     |  N   |
| app                     |           Y            |     Y     |  N   |
| profileable app         |           Y            |     Y     |  Y   |
| debuggable app          |           Y            |     Y     |  Y   |

</center>

To mark an app as profileable, put `<profileable android:shell="true"/>` into
the `<application>` section of the app manifest.

```xml
<manifest ...>
    <application>
        <profileable android:shell="true"/>
        ...
    </application>
</manifest>
```

## DEDUPED frames

If the name of a Java method includes `[DEDUPED]`, this means that multiple
methods share the same code. ART only stores the name of a single one in its
metadata, which is displayed here.
This is not necessarily the one that was
called.

## Triggering heap snapshots on demand

Heap snapshots are recorded into the trace either at regular time intervals, if
using the `continuous_dump_config` field, or at the end of the session.

You can also trigger a snapshot of all currently profiled processes by running
`adb shell killall -USR1 heapprofd`. This can be useful in lab tests for
recording the current memory usage of the target in a specific state.

This dump will show up in addition to the dump at the end of the profile that is
always produced. You can create multiple of these dumps, and they will be
enumerated in the output directory.

## Symbolization

NOTE: Symbolization is currently only available on Linux and macOS.

### Set up llvm-symbolizer

You only need to do this once.

To use symbolization, your system must have llvm-symbolizer installed and
accessible from `$PATH` as `llvm-symbolizer`. On Debian, you can install it
using `sudo apt install llvm-9`.
This will create `/usr/bin/llvm-symbolizer-9`. Symlink that to somewhere in
your `$PATH` as `llvm-symbolizer`.

For instance, `ln -s /usr/bin/llvm-symbolizer-9 ~/bin/llvm-symbolizer`, and
add `~/bin` to your path (or run the commands below with `PATH=~/bin:$PATH`
prefixed).

### Symbolize your profile

If the profiled binary or libraries do not have symbol names, you can
symbolize profiles offline. Even if they do, you might want to symbolize in
order to get inlined function and line number information. All tools
(traceconv, trace_processor_shell, the heap_profile script) support specifying
the `PERFETTO_BINARY_PATH` as an environment variable.

```
PERFETTO_BINARY_PATH=somedir tools/heap_profile --name ${NAME}
```

You can persist symbols for a trace by running
`PERFETTO_BINARY_PATH=somedir tools/traceconv symbolize raw-trace > symbols`.
You can then concatenate the symbols to the trace (
`cat raw-trace symbols > symbolized-trace`) and the symbols will be part of
`symbolized-trace`. The `tools/heap_profile` script will also generate this
file in your output directory, if `PERFETTO_BINARY_PATH` is used.

The symbol file is the first with matching Build ID in the following order:

1. absolute path of library file relative to binary path.
2. absolute path of library file relative to binary path, but with base.apk!
   removed from filename.
3. basename of library file relative to binary path.
4. basename of library file relative to binary path, but with base.apk!
   removed from filename.
5. in the subdirectory .build-id: the first two hex digits of the build-id
   as subdirectory, then the rest of the hex digits, with ".debug" appended.
   See
   https://fedoraproject.org/wiki/RolandMcGrath/BuildID#Find_files_by_build_ID

For example, "/system/lib/base.apk!foo.so" with build id abcd1234,
is looked for at:

1. $PERFETTO_BINARY_PATH/system/lib/base.apk!foo.so
2. $PERFETTO_BINARY_PATH/system/lib/foo.so
3. $PERFETTO_BINARY_PATH/base.apk!foo.so
4. $PERFETTO_BINARY_PATH/foo.so
5. $PERFETTO_BINARY_PATH/.build-id/ab/cd1234.debug

Alternatively, you can set the `PERFETTO_SYMBOLIZER_MODE` environment variable
to `index`, and the symbolizer will recursively search the given directory for
an ELF file with the given build id. This way, you will not have to worry
about correct filenames.

## Deobfuscation

If your profile contains obfuscated Java methods (like `fsd.a`), you can
provide a deobfuscation map to turn them back into human-readable names.
To do so, use the `PERFETTO_PROGUARD_MAP` environment variable, using the
format `packagename=filename[:packagename=filename...]`, e.g.
`PERFETTO_PROGUARD_MAP=com.example.pkg1=foo.txt:com.example.pkg2=bar.txt`.
All tools
(traceconv, trace_processor_shell, the heap_profile script) support specifying
the `PERFETTO_PROGUARD_MAP` as an environment variable.

You can get a deobfuscation map for your trace using
`tools/traceconv deobfuscate`. Then concatenate the resulting file to your
trace to get a deobfuscated version of it.

```
PERFETTO_PROGUARD_MAP=com.example.pkg tools/traceconv deobfuscate ${TRACE} > deobfuscation_map
cat ${TRACE} deobfuscation_map > deobfuscated_trace
```

## Troubleshooting

### Buffer overrun

If the rate of allocations is too high for heapprofd to keep up, the profiling
session will end early due to a buffer overrun. If the buffer overrun is
caused by a transient spike in allocations, increasing the shared memory buffer
size (passing `--shmem-size` to `tools/heap_profile`) can resolve the issue.
Otherwise the sampling interval can be increased (at the expense of lower
accuracy in the resulting profile) by passing `--interval=16000` or higher.

### Profile is empty

Check whether your target process is eligible to be profiled by consulting
[Target processes](#heapprofd-targets) above.

Also check the [Known Issues](#known-issues).

### Implausible callstacks

If you see a callstack that seems impossible from looking at the code, make
sure no [DEDUPED frames](#deduped-frames) are involved.

Also, if your code is linked using _Identical Code Folding_
(ICF), i.e. passing `-Wl,--icf=...` to the linker, most trivial functions, often
constructors and destructors, can be aliased to binary-equivalent operators
of completely unrelated classes.

### Symbolization: Could not find library

When symbolizing a profile, you might come across messages like this:

```bash
Could not find /data/app/invalid.app-wFgo3GRaod02wSvPZQ==/lib/arm64/somelib.so
(Build ID: 44b7138abd5957b8d0a56ce86216d478).
```

Check whether your library (in this example somelib.so) exists in
`PERFETTO_BINARY_PATH`. Then compare the Build ID to the one in your
symbol file, which you can get by running
`readelf -n /path/in/binary/path/somelib.so`. If it does not match, the
symbol file is a different version than the one on device, and cannot
be used for symbolization.
If it does, try moving somelib.so to the root of `PERFETTO_BINARY_PATH` and
try again.

### Only one frame shown

If you only see a single frame for functions in a specific library, make sure
that the library has unwind information. We need one of

* `.gnu_debugdata`
* `.eh_frame` (+ preferably `.eh_frame_hdr`)
* `.debug_frame`.

Frame-pointer unwinding is *not supported*.

To check if an ELF file has any of those, run

```console
$ readelf -S file.so | grep "gnu_debugdata\|eh_frame\|debug_frame"
  [12] .eh_frame_hdr     PROGBITS         000000000000c2b0  0000c2b0
  [13] .eh_frame         PROGBITS         0000000000011000  00011000
  [24] .gnu_debugdata    PROGBITS         0000000000000000  000f7292
```

If this does not show one or more of the sections, change your build system
to not strip them.

## (non-Android) Linux support

NOTE: Do not use this for production purposes.

You can use a standalone library to profile memory allocations on Linux.
First [build Perfetto](/docs/contributing/build-instructions.md). You only need
to do this once.

```
tools/build_all_configs.py
ninja -C out/linux_clang_release
```

Then, run traced

```
out/linux_clang_release/traced
```

Start the profile (e.g. targeting trace_processor_shell)

```
tools/heap_profile -n trace_processor_shell --print-config  | \
out/linux_clang_release/perfetto \
  -c - --txt \
  -o ~/heapprofd-trace
```

Finally, run your target (e.g.
trace_processor_shell) with LD_PRELOAD

```
LD_PRELOAD=out/linux_clang_release/libheapprofd_glibc_preload.so out/linux_clang_release/trace_processor_shell <trace>
```

Then, Ctrl-C the Perfetto invocation and upload ~/heapprofd-trace to the
[Perfetto UI](https://ui.perfetto.dev).

## Known Issues

### {#known-issues-android11} Android 11

* 32-bit programs cannot be targeted on 64-bit devices.
* Setting `sampling_interval_bytes` to 0 crashes the target process.
  This is an invalid config that should be rejected instead.
* For startup profiles, some frame names might be missing. This will be
  resolved in Android 12.
* `Failed to send control socket byte.` is displayed in logcat at the end of
  every profile. This is benign.
* The object count may be incorrect in `dump_at_max` profiles.

### {#known-issues-android10} Android 10

* Function names in libraries with load bias might be incorrect. Use
  [offline symbolization](#symbolization) to resolve this issue.
* For startup profiles, some frame names might be missing. This will be
  resolved in Android 12.
* 32-bit programs cannot be targeted on 64-bit devices.
* x86 / x86_64 platforms are not supported. This includes the Android
  _Cuttlefish_ emulator.
* On ARM32, the bottom-most frame is always `ERROR 2`. This is harmless and
  the callstacks are still complete.
* If heapprofd is run standalone (by running `heapprofd` in a root shell, rather
  than through init), `/dev/socket/heapprofd` gets assigned an incorrect SELinux
  domain. You will not be able to profile any processes unless you disable
  SELinux enforcement.
  Run `restorecon /dev/socket/heapprofd` in a root shell to resolve.
* Using `vfork(2)` or `clone(2)` with `CLONE_VM` and allocating / freeing
  memory in the child process will prematurely end the profile.
  `java.lang.Runtime.exec` does this, so calling it will prematurely end
  the profile.
  Note that this is in violation of the POSIX standard.
* Setting `sampling_interval_bytes` to 0 crashes the target process.
  This is an invalid config that should be rejected instead.
* `Failed to send control socket byte.` is displayed in logcat at the end of
  every profile. This is benign.
* The object count may be incorrect in `dump_at_max` profiles.

## Heapprofd vs malloc_info() vs RSS

When using heapprofd and interpreting results, it is important to know the
precise meaning of the different memory metrics that can be obtained from the
operating system.

**heapprofd** gives you the number of bytes the target program
requested from the default C/C++ allocator. If you are profiling a Java app from
startup, allocations that happen early in the application's initialization will
not be visible to heapprofd. Native services that do not fork from the Zygote
are not affected by this.

**malloc\_info** is a libc function that gives you information about the
allocator. This can be triggered on userdebug builds by using
`am dumpheap -m <PID> /data/local/tmp/heap.txt`. This will in general be more
than the memory seen by heapprofd because, depending on the allocator, not all
memory is immediately freed. In particular, jemalloc retains some freed memory
in thread caches.

**Heap RSS** is the amount of memory requested from the operating system by the
allocator. This is larger than the previous two numbers because memory can only
be obtained in page size chunks, and fragmentation causes some of that memory to
be wasted. This can be obtained by running `adb shell dumpsys meminfo <PID>` and
looking at the "Private Dirty" column.
RSS can also end up being smaller than the other two if the device kernel uses
memory compression (ZRAM, enabled by default on recent versions of Android) and
the memory of the process gets swapped out onto ZRAM.

|                     | heapprofd         | malloc\_info | RSS |
|---------------------|:-----------------:|:------------:|:---:|
| from native startup |          x        |      x       |  x  |
| after zygote init   |          x        |      x       |  x  |
| before zygote init  |                   |      x       |  x  |
| thread caches       |                   |      x       |  x  |
| fragmentation       |                   |              |  x  |

If you observe high RSS or malloc\_info metrics but heapprofd does not match,
you might be hitting some pathological fragmentation problem in the allocator.

## Convert to pprof

You can use [traceconv](/docs/quickstart/traceconv.md) to convert the heap dumps
in a trace into the [pprof](https://github.com/google/pprof) format. These can
then be viewed using the pprof CLI or a UI (e.g. Speedscope, or Google-internal
pprof/).

```bash
tools/traceconv profile /tmp/profile
```

This will create a directory in `/tmp/` containing the heap dumps. Run:

```bash
gzip /tmp/heap_profile-XXXXXX/*.pb
```

to get gzipped protos, which tools handling pprof profile protos expect.

## {#heapprofd-example-queries} Example SQL Queries

We can get the callstacks that allocated using an SQL Query in the
Trace Processor. For each frame, we get one row for the number of allocated
bytes, where `count` and `size` are positive, and, if any of them were already
freed, another row with negative `count` and `size`. The sum of those gets us
the `space` view.

```sql
select a.callsite_id, a.ts, a.upid, f.name, f.rel_pc, m.build_id, m.name as mapping_name,
        sum(a.size) as space_size, sum(a.count) as space_count
      from heap_profile_allocation a join
           stack_profile_callsite c ON (a.callsite_id = c.id) join
           stack_profile_frame f ON (c.frame_id = f.id) join
           stack_profile_mapping m ON (f.mapping = m.id)
      group by 1, 2, 3, 4, 5, 6, 7 order by space_size desc;
```

| callsite_id | ts | upid | name | rel_pc | build_id | mapping_name | space_size | space_count |
|-------------|----|------|-------|-----------|------|--------|----------|------|
|6660|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |106496|4|
|192 |5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|1421|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|1537|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|8843|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26424 |1|
|8618|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |24576 |4|
|3750|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |12288 |1|
|2820|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192 |2|
|3788|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192 |2|

We can see all the functions are "malloc" and "realloc", which is not terribly
informative. Usually we are interested in the _cumulative_ bytes allocated in
a function (otherwise, we will always only see malloc / realloc). Chasing the
parent_id of a callsite (not shown in this table) recursively is very hard in
SQL.

There is an **experimental** table that surfaces this information. The **API is
subject to change**.

```sql
select name, map_name, cumulative_size
       from experimental_flamegraph(8300973884377,1,'native')
       order by abs(cumulative_size) desc;
```

| name | map_name | cumulative_size |
|------|----------|----------------|
|__start_thread|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
|_ZL15__pthread_startPv|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
|_ZN13thread_data_t10trampolineEPKS|/system/lib64/libutils.so|199496|
|_ZN7android14AndroidRuntime15javaThreadShellEPv|/system/lib64/libandroid_runtime.so|199496|
|_ZN7android6Thread11_threadLoopEPv|/system/lib64/libutils.so|199496|
|_ZN3art6Thread14CreateCallbackEPv|/apex/com.android.art/lib64/libart.so|193112|
|_ZN3art35InvokeVirtualOrInterface...|/apex/com.android.art/lib64/libart.so|193112|
|_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc|/apex/com.android.art/lib64/libart.so|193112|
|art_quick_invoke_stub|/apex/com.android.art/lib64/libart.so|193112|
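
Since chasing `parent_id` recursively is awkward in SQL, the cumulative size
can also be computed outside the Trace Processor once the rows have been
exported. The following Python sketch assumes the `id` / `parent_id` pairs from
`stack_profile_callsite` and the per-callsite sums of
`heap_profile_allocation.size` are already available as plain dictionaries.
The sample data and the helper name `cumulative_sizes` are made up for
illustration and are not part of any Perfetto API.

```python
from collections import defaultdict

# Hypothetical rows from stack_profile_callsite: id -> parent_id
# (None marks a root callsite).
PARENTS = {1: None, 2: 1, 3: 2, 4: 1}

# Hypothetical per-callsite sums of heap_profile_allocation.size.
SELF_SIZE = {3: 4096, 4: 1024}


def cumulative_sizes(parents, self_size):
    """Attribute each callsite's bytes to itself and to every ancestor."""
    totals = defaultdict(int)
    for callsite_id, size in self_size.items():
        node = callsite_id
        while node is not None:
            totals[node] += size
            node = parents[node]
    return dict(totals)


print(cumulative_sizes(PARENTS, SELF_SIZE))
```

Here callsite 1 is the common root, so it accumulates the bytes of both leaf
callsites, which mirrors what the `cumulative_size` column reports per frame.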