# Native heap profiler

NOTE: **heapprofd requires Android 10 or higher**

Heapprofd is a tool that tracks native heap allocations and deallocations of an
Android process within a given time period. The resulting profile can be used to
attribute memory usage to particular call-stacks, supporting a mix of both
native and Java code. The tool can be used by Android platform and app
developers to investigate memory issues.

On debug Android builds, you can profile all apps and most system services.
On "user" builds, you can only use it on apps with the debuggable or
profileable manifest flag.

## Quickstart

See the [Memory Guide](/docs/case-studies/memory.md#heapprofd) for getting
started with heapprofd.

## UI

Dumps from heapprofd are shown as flamegraphs in the UI after clicking on the
diamond. Each diamond corresponds to a snapshot of the allocations and
callstacks collected at that point in time.

![heapprofd snapshots in the UI tracks](/docs/images/profile-diamond.png)

![heapprofd flamegraph](/docs/images/native-flamegraph.png)

## SQL

Information about callstacks is written to the following tables:

* [`stack_profile_mapping`](/docs/analysis/sql-tables.autogen#stack_profile_mapping)
* [`stack_profile_frame`](/docs/analysis/sql-tables.autogen#stack_profile_frame)
* [`stack_profile_callsite`](/docs/analysis/sql-tables.autogen#stack_profile_callsite)

The allocations themselves are written to
[`heap_profile_allocation`](/docs/analysis/sql-tables.autogen#heap_profile_allocation).

Offline symbolization data is stored in
[`stack_profile_symbol`](/docs/analysis/sql-tables.autogen#stack_profile_symbol).

See [Example Queries](#heapprofd-example-queries) for example SQL queries.

## Recording

Heapprofd can be configured and started in three ways.

#### Manual configuration

This requires manually setting the
[HeapprofdConfig](/docs/reference/trace-config-proto.autogen#HeapprofdConfig)
section of the trace config. The only benefit of doing so is that in this way
heap profiling can be enabled alongside any other tracing data sources.

#### Using the tools/heap_profile script (recommended)

On Linux / macOS, use the `tools/heap_profile` script. If you are having trouble,
make sure you are using the
[latest version](
https://raw.githubusercontent.com/google/perfetto/master/tools/heap_profile).

You can target processes either by name (`-n com.example.myapp`) or by PID
(`-p 1234`). In the first case, the heap profile will be initiated on both
already-running processes that match the package name and new processes launched
after the profiling session is started.
For the full arguments list see the
[heap_profile cmdline reference page](/docs/reference/heap_profile-cli).

#### Using the Recording page of Perfetto UI

You can also use the [Perfetto UI](https://ui.perfetto.dev/#!/record?p=memory)
to record heapprofd profiles. Tick "Heap profiling" in the trace configuration,
enter the processes you want to target, click "Add Device" to pair your phone,
and record profiles straight from your browser. This is also possible on
Windows.

## Viewing the data

The resulting profile proto contains four views on the data:

* **space**: how many bytes were allocated but not freed at this callstack at
  the moment the dump was created.
* **alloc\_space**: how many bytes were allocated (including ones freed at the
  moment of the dump) at this callstack.
* **objects**: how many allocations without matching frees were done at this
  callstack.
* **alloc\_objects**: how many allocations (including ones with matching frees)
  were done at this callstack.
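
To make the difference between the four views concrete, here is a minimal
Python sketch that derives them from a list of allocation events. The event
list, the callstack names and the `summarize` helper are hypothetical and only
serve as an illustration; this is not how heapprofd computes the views
internally.

```python
from collections import defaultdict

# Hypothetical events: (callstack, size in bytes, freed before the dump?).
EVENTS = [
    ("main;parse", 4096, False),
    ("main;parse", 4096, True),
    ("main;render", 8192, False),
    ("main;render", 1024, True),
    ("main;render", 1024, True),
]


def summarize(events):
    """Return {callstack: {view name: value}} for the four views."""
    views = defaultdict(
        lambda: {"space": 0, "alloc_space": 0, "objects": 0, "alloc_objects": 0}
    )
    for stack, size, freed in events:
        entry = views[stack]
        # alloc_* views count every allocation, freed or not.
        entry["alloc_space"] += size
        entry["alloc_objects"] += 1
        if not freed:
            # space / objects only count allocations still live at dump time.
            entry["space"] += size
            entry["objects"] += 1
    return dict(views)


VIEWS = summarize(EVENTS)
for stack, entry in VIEWS.items():
    print(stack, entry)
```

In this toy input, `main;render` has a larger `alloc_space` than `space`
because two of its three allocations were already freed when the dump was
taken.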

_(Googlers: You can also open the gzipped protos using http://pprof/)_

TIP: you might want to put `libart.so` as a "Hide regex" when profiling apps.

You can use the [Perfetto UI](https://ui.perfetto.dev) to visualize heap dumps.
Upload the `raw-trace` file in your output directory. You will see all heap
dumps as diamonds on the timeline; click any of them to get a flamegraph.

Alternatively [Speedscope](https://speedscope.app) can be used to visualize
the gzipped protos, but will only show the space view.

TIP: Click Left Heavy on the top left for a good visualization.

## Sampling interval

Heapprofd samples heap allocations by hooking calls to malloc/free and C++'s
operator new/delete. Given a sampling interval of n bytes, one allocation is
sampled, on average, every n bytes allocated. This reduces the
performance impact on the target process. The default sampling interval
is 4096 bytes.

The easiest way to reason about this is to imagine the memory allocations as a
stream of one byte allocations. From this stream, every byte has a 1/n
probability of being selected as a sample, and the corresponding callstack
gets attributed the complete n bytes. For more accuracy, allocations larger than
the sampling interval bypass the sampling logic and are recorded with their true
size.

## Startup profiling

When specifying a target process name (as opposed to the PID), new processes
matching that name are profiled from their startup. The resulting profile will
contain all allocations done between the start of the process and the end
of the profiling session.

On Android, Java apps are usually not exec()-ed from scratch, but fork()-ed from
the [zygote], which then specializes into the desired app. If the app's name
matches a name specified in the profiling session, profiling will be enabled as
part of the zygote specialization.
The resulting profile contains all
allocations done between that point in zygote specialization and the end of the
profiling session. Some allocations done early in the specialization process are
not accounted for.

At the trace proto level, the resulting [ProfilePacket] will have the
`from_startup` field set to true in the corresponding `ProcessHeapSamples`
message. This is not surfaced in the converted pprof compatible proto.

[ProfilePacket]: /docs/reference/trace-packet-proto.autogen#ProfilePacket
[zygote]: https://developer.android.com/topic/performance/memory-overview#SharingRAM

## Runtime profiling

When a profiling session is started, all matching processes (by name or PID)
are enumerated and profiling is enabled. The resulting profile will contain
all allocations done between the beginning and the end of the profiling
session.

The resulting [ProfilePacket] will have `from_startup` set to false in the
corresponding `ProcessHeapSamples` message. This does not get surfaced in the
converted pprof compatible proto.

## Concurrent profiling sessions

If multiple sessions name the same target process (either by name or PID),
only the first relevant session will profile the process. The other sessions
will report that the process had already been profiled when converting to
the pprof compatible proto.

If you see this message but do not expect any other sessions, run

```shell
adb shell killall perfetto
```

to stop any concurrent sessions that may be running.

The resulting [ProfilePacket] will have `rejected_concurrent` set to true in
the otherwise empty corresponding `ProcessHeapSamples` message. This does not
get surfaced in the converted pprof compatible proto.

## {#heapprofd-targets} Target processes

Depending on the build of Android that heapprofd is run on, some processes
are not eligible to be profiled.

On _user_ (i.e. production, non-rootable) builds, only Java applications with
either the profileable or the debuggable manifest flag set can be profiled.
Profiling requests for non-profileable/debuggable processes will result in an
empty profile.

On userdebug builds, all processes except for a small blacklist of critical
services can be profiled (to find the blacklist, look for
`never_profile_heap` in [heapprofd.te](
https://cs.android.com/android/platform/superproject/+/master:system/sepolicy/private/heapprofd.te?q=never_profile_heap)).
This restriction can be lifted by disabling SELinux by running
`adb shell su root setenforce 0` or by passing `--disable-selinux` to the
`heap_profile` script.

<center>

|                         | userdebug setenforce 0 | userdebug | user |
|-------------------------|:----------------------:|:---------:|:----:|
| critical native service |           Y            |     N     |  N   |
| native service          |           Y            |     Y     |  N   |
| app                     |           Y            |     Y     |  N   |
| profileable app         |           Y            |     Y     |  Y   |
| debuggable app          |           Y            |     Y     |  Y   |

</center>

To mark an app as profileable, put `<profileable android:shell="true"/>` into
the `<application>` section of the app manifest.

```xml
<manifest ...>
    <application>
        <profileable android:shell="true"/>
        ...
    </application>
</manifest>
```

## DEDUPED frames

If the name of a Java method includes `[DEDUPED]`, this means that multiple
methods share the same code. ART only stores the name of a single one in its
metadata, which is displayed here. This is not necessarily the one that was
called.

## Triggering heap snapshots on demand

Heap snapshots are recorded into the trace either at regular time intervals, if
using the `continuous_dump_config` field, or at the end of the session.

You can also trigger a snapshot of all currently profiled processes by running
`adb shell killall -USR1 heapprofd`.
This can be useful in lab tests for
recording the current memory usage of the target in a specific state.

This dump will show up in addition to the dump at the end of the profile that is
always produced. You can create multiple of these dumps, and they will be
enumerated in the output directory.

## Symbolization

NOTE: Symbolization is currently only available on Linux

### Set up llvm-symbolizer

You only need to do this once.

To use symbolization, your system must have llvm-symbolizer installed and
accessible from `$PATH` as `llvm-symbolizer`. On Debian, you can install it
using `sudo apt install llvm-9`.
This will create `/usr/bin/llvm-symbolizer-9`. Symlink that to somewhere in
your `$PATH` as `llvm-symbolizer`.

For instance, `ln -s /usr/bin/llvm-symbolizer-9 ~/bin/llvm-symbolizer`, and
add `~/bin` to your path (or run the commands below with `PATH=~/bin:$PATH`
prefixed).

### Symbolize your profile

If the profiled binary or libraries do not have symbol names, you can
symbolize profiles offline. Even if they do, you might want to symbolize in
order to get inlined function and line number information. All tools
(traceconv, trace_processor_shell, the heap_profile script) support specifying
the `PERFETTO_BINARY_PATH` as an environment variable.

```
PERFETTO_BINARY_PATH=somedir tools/heap_profile --name ${NAME}
```

You can persist symbols for a trace by running
`PERFETTO_BINARY_PATH=somedir tools/traceconv symbolize raw-trace > symbols`.
You can then concatenate the symbols to the trace (
`cat raw-trace symbols > symbolized-trace`) and the symbols will be part of
`symbolized-trace`. The `tools/heap_profile` script will also generate this
file in your output directory, if `PERFETTO_BINARY_PATH` is used.

The symbol file is the first with matching Build ID in the following order:

1. absolute path of library file relative to binary path.
2. absolute path of library file relative to binary path, but with base.apk!
   removed from filename.
3. basename of library file relative to binary path.
4. basename of library file relative to binary path, but with base.apk!
   removed from filename.
5. in the subdirectory .build-id: the first two hex digits of the build-id
   as subdirectory, then the rest of the hex digits, with ".debug" appended.
   See
   https://fedoraproject.org/wiki/RolandMcGrath/BuildID#Find_files_by_build_ID

For example, "/system/lib/base.apk!foo.so" with build id abcd1234,
is looked for at:

1. $PERFETTO_BINARY_PATH/system/lib/base.apk!foo.so
2. $PERFETTO_BINARY_PATH/system/lib/foo.so
3. $PERFETTO_BINARY_PATH/base.apk!foo.so
4. $PERFETTO_BINARY_PATH/foo.so
5. $PERFETTO_BINARY_PATH/.build-id/ab/cd1234.debug

## Troubleshooting

### Buffer overrun

If the rate of allocations is too high for heapprofd to keep up, the profiling
session will end early due to a buffer overrun. If the buffer overrun is
caused by a transient spike in allocations, increasing the shared memory buffer
size (passing `--shmem-size` to `tools/heap_profile`) can resolve the issue.
Otherwise the sampling interval can be increased (at the expense of lower
accuracy in the resulting profile) by passing `--interval=16000` or higher.

### Profile is empty

Check whether your target process is eligible to be profiled by consulting
[Target processes](#heapprofd-targets) above.

Also check the [Known Issues](#known-issues).

### Implausible callstacks

If you see a callstack that seems impossible from looking at the code, make
sure no [DEDUPED frames](#deduped-frames) are involved.

Also, if your code is linked using _Identical Code Folding_
(ICF), i.e.
passing `-Wl,--icf=...` to the linker, most trivial functions, often
constructors and destructors, can be aliased to binary-equivalent operators
of completely unrelated classes.

### Symbolization: Could not find library

When symbolizing a profile, you might come across messages like this:

```bash
Could not find /data/app/invalid.app-wFgo3GRaod02wSvPZQ==/lib/arm64/somelib.so
(Build ID: 44b7138abd5957b8d0a56ce86216d478).
```

Check whether your library (in this example somelib.so) exists in
`PERFETTO_BINARY_PATH`. Then compare the Build ID to the one in your
symbol file, which you can get by running
`readelf -n /path/in/binary/path/somelib.so`. If it does not match, the
symbol file has a different version than the one on device, and cannot
be used for symbolization.
If it does, try moving somelib.so to the root of `PERFETTO_BINARY_PATH` and
try again.

### Only one frame shown

If you only see a single frame for functions in a specific library, make sure
that the library has unwind information. We need one of:

* `.gnu_debugdata`
* `.eh_frame` (+ preferably `.eh_frame_hdr`)
* `.debug_frame`.

Frame-pointer unwinding is *not supported*.

To check if an ELF file has any of those, run

```console
$ readelf -S file.so | grep "gnu_debugdata\|eh_frame\|debug_frame"
  [12] .eh_frame_hdr     PROGBITS         000000000000c2b0  0000c2b0
  [13] .eh_frame         PROGBITS         0000000000011000  00011000
  [24] .gnu_debugdata    PROGBITS         0000000000000000  000f7292
```

If this does not show at least one of the sections, change your build system
to not strip them.

## Known Issues

### Android 10

* On ARM32, the bottom-most frame is always `ERROR 2`. This is harmless and
  the callstacks are still complete.
* x86 platforms are not supported. This includes the Android _Cuttlefish_
  emulator.
* If heapprofd is run standalone (by running `heapprofd` in a root shell, rather
  than through init), `/dev/socket/heapprofd` gets assigned an incorrect SELinux
  domain. You will not be able to profile any processes unless you disable
  SELinux enforcement.
  Run `restorecon /dev/socket/heapprofd` in a root shell to resolve.

## Heapprofd vs malloc_info() vs RSS

When using heapprofd and interpreting results, it is important to know the
precise meaning of the different memory metrics that can be obtained from the
operating system.

**heapprofd** gives you the number of bytes the target program
requested from the default C/C++ allocator. If you are profiling a Java app from
startup, allocations that happen early in the application's initialization will
not be visible to heapprofd. Native services that do not fork from the Zygote
are not affected by this.

**malloc\_info** is a libc function that gives you information about the
allocator. This can be triggered on userdebug builds by using
`am dumpheap -m <PID> /data/local/tmp/heap.txt`. This will in general be more
than the memory seen by heapprofd because, depending on the allocator, not all
memory is immediately freed. In particular, jemalloc retains some freed memory
in thread caches.

**Heap RSS** is the amount of memory requested from the operating system by the
allocator. This is larger than the previous two numbers because memory can only
be obtained in page size chunks, and fragmentation causes some of that memory to
be wasted. This can be obtained by running `adb shell dumpsys meminfo <PID>` and
looking at the "Private Dirty" column.
RSS can also end up being smaller than the other two if the device kernel uses
memory compression (ZRAM, enabled by default on recent versions of Android) and
the memory of the process gets swapped out onto ZRAM.

|                     |     heapprofd     | malloc\_info | RSS |
|---------------------|:-----------------:|:------------:|:---:|
| from native startup |         x         |      x       |  x  |
| after zygote init   |         x         |      x       |  x  |
| before zygote init  |                   |      x       |  x  |
| thread caches       |                   |      x       |  x  |
| fragmentation       |                   |              |  x  |

If you observe high RSS or malloc\_info metrics but heapprofd does not match,
you might be hitting some pathological fragmentation problem in the allocator.

## Convert to pprof

You can use [traceconv](/docs/quickstart/traceconv.md) to convert the heap dumps
in a trace into the [pprof](https://github.com/google/pprof) format. These can
then be viewed using the pprof CLI or a UI (e.g. Speedscope, or Google-internal
pprof/).

```bash
tools/traceconv profile /tmp/profile
```

This will create a directory in `/tmp/` containing the heap dumps. Run:

```bash
gzip /tmp/heap_profile-XXXXXX/*.pb
```

to get gzipped protos, which tools handling pprof profile protos expect.

## {#heapprofd-example-queries} Example SQL Queries

We can get the callstacks that allocated using an SQL Query in the
Trace Processor. For each frame, we get one row for the number of allocated
bytes, where `count` and `size` are positive, and, if any of them were already
freed, another line with negative `count` and `size`. The sum of those gets us
the `space` view.

```sql
-- Summing the positive (allocation) and negative (free) rows per callsite
-- yields the space view.
select a.callsite_id, a.ts, a.upid, f.name, f.rel_pc, m.build_id, m.name as mapping_name,
       sum(a.size) as space_size, sum(a.count) as space_count
      from heap_profile_allocation a join
           stack_profile_callsite c ON (a.callsite_id = c.id) join
           stack_profile_frame f ON (c.frame_id = f.id) join
           stack_profile_mapping m ON (f.mapping = m.id)
      group by 1, 2, 3, 4, 5, 6, 7 order by space_size desc;
```

| callsite_id | ts | upid | name | rel_pc | build_id | mapping_name | space_size | space_count |
|-------------|----|------|-------|-----------|------|--------|----------|------|
|6660|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |106496|4|
|192 |5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|1421|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|1537|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|8843|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26424 |1|
|8618|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |24576 |4|
|3750|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |12288 |1|
|2820|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192 |2|
|3788|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192 |2|

We can see all the functions are "malloc" and "realloc", which is not terribly
informative. Usually we are interested in the _cumulative_ bytes allocated in
a function (otherwise, we will always only see malloc / realloc). Chasing the
parent_id of a callsite (not shown in this table) recursively is very hard in
SQL.

There is an **experimental** table that surfaces this information. The **API is
subject to change**.

```sql
select name, map_name, cumulative_size
       from experimental_flamegraph(8300973884377,1,'native')
       order by abs(cumulative_size) desc;
```

| name | map_name | cumulative_size |
|------|----------|----------------|
|__start_thread|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
|_ZL15__pthread_startPv|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
|_ZN13thread_data_t10trampolineEPKS|/system/lib64/libutils.so|199496|
|_ZN7android14AndroidRuntime15javaThreadShellEPv|/system/lib64/libandroid_runtime.so|199496|
|_ZN7android6Thread11_threadLoopEPv|/system/lib64/libutils.so|199496|
|_ZN3art6Thread14CreateCallbackEPv|/apex/com.android.art/lib64/libart.so|193112|
|_ZN3art35InvokeVirtualOrInterface...|/apex/com.android.art/lib64/libart.so|193112|
|_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc|/apex/com.android.art/lib64/libart.so|193112|
|art_quick_invoke_stub|/apex/com.android.art/lib64/libart.so|193112|
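
To illustrate what a cumulative size means, the following minimal Python sketch
walks the `parent_id` chain of a toy callsite tree and adds each callsite's own
bytes to all of its ancestors. The callsite ids, the frame names `worker_loop`
and `logger_init`, and the sizes are hypothetical and not taken from a real
trace. The sketch only illustrates the idea and makes no claim about how
`experimental_flamegraph` is implemented.

```python
from collections import defaultdict

# Hypothetical toy data, not taken from a real trace.
# callsite id -> (parent callsite id, or None for a root; frame name)
CALLSITES = {
    1: (None, "__start_thread"),
    2: (1, "worker_loop"),
    3: (2, "malloc"),
    4: (1, "logger_init"),
    5: (4, "malloc"),
}
# callsite id -> unfreed bytes attributed directly to that callsite
SELF_SIZE = {3: 4096, 5: 1024}


def cumulative_sizes(callsites, self_size):
    """Add each callsite's own bytes to itself and to every ancestor."""
    totals = defaultdict(int)
    for callsite_id, size in self_size.items():
        node = callsite_id
        while node is not None:
            totals[node] += size
            node = callsites[node][0]  # step up to the parent callsite
    return dict(totals)


TOTALS = cumulative_sizes(CALLSITES, SELF_SIZE)
for callsite_id, total in sorted(TOTALS.items(), key=lambda kv: -kv[1]):
    print(CALLSITES[callsite_id][1], total)
```

In this toy tree the root frame accumulates the bytes of both `malloc` leaves,
which is why frames near the thread entry point dominate a table sorted by
cumulative size.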