# Native heap profiler

NOTE: **heapprofd requires Android 10 or higher**

Heapprofd is a tool that tracks native heap allocations & deallocations of an
Android process within a given time period. The resulting profile can be used to
attribute memory usage to particular call-stacks, supporting a mix of both
native and Java code. The tool can be used by Android platform and app
developers to investigate memory issues.

On debug Android builds, you can profile all apps and most system services.
On "user" builds, you can only use it on apps with the debuggable or
profileable manifest flag.

## Quickstart

See the [Memory Guide](/docs/case-studies/memory.md#heapprofd) for getting
started with heapprofd.

## UI

Dumps from heapprofd are shown as flamegraphs in the UI after clicking on the
diamond. Each diamond corresponds to a snapshot of the allocations and
callstacks collected at that point in time.

![heapprofd snapshots in the UI tracks](/docs/images/profile-diamond.png)

![heapprofd flamegraph](/docs/images/native-flamegraph.png)

## SQL

Information about callstacks is written to the following tables:

* [`stack_profile_mapping`](/docs/analysis/sql-tables.autogen#stack_profile_mapping)
* [`stack_profile_frame`](/docs/analysis/sql-tables.autogen#stack_profile_frame)
* [`stack_profile_callsite`](/docs/analysis/sql-tables.autogen#stack_profile_callsite)

The allocations themselves are written to
[`heap_profile_allocation`](/docs/analysis/sql-tables.autogen#heap_profile_allocation).

Offline symbolization data is stored in
[`stack_profile_symbol`](/docs/analysis/sql-tables.autogen#stack_profile_symbol).

See [Example Queries](#heapprofd-example-queries) for example SQL queries.

## Recording

Heapprofd can be configured and started in three ways.

#### Manual configuration

This requires manually setting the
[HeapprofdConfig](/docs/reference/trace-config-proto.autogen#HeapprofdConfig)
section of the trace config. The only benefit of doing so is that heap
profiling can be enabled alongside any other tracing data sources.

#### Using the tools/heap_profile script (recommended)

On Linux / macOS, use the `tools/heap_profile` script. If you are having
trouble, make sure you are using the
[latest version](
https://raw.githubusercontent.com/google/perfetto/master/tools/heap_profile).

You can target processes either by name (`-n com.example.myapp`) or by PID
(`-p 1234`). In the first case, the heap profile will be initiated on both
already-running processes that match the package name and new processes launched
after the profiling session is started.
For the full arguments list see the
[heap_profile cmdline reference page](/docs/reference/heap_profile-cli).

#### Using the Recording page of Perfetto UI

You can also use the [Perfetto UI](https://ui.perfetto.dev/#!/record?p=memory)
to record heapprofd profiles. Tick "Heap profiling" in the trace configuration,
enter the processes you want to target, click "Add Device" to pair your phone,
and record profiles straight from your browser. This is also possible on
Windows.

## Viewing the data

The resulting profile proto contains four views on the data:

* **space**: how many bytes were allocated but not freed at this callstack at
  the moment the dump was created.
* **alloc\_space**: how many bytes were allocated (including ones freed at the
  moment of the dump) at this callstack.
* **objects**: how many allocations without matching frees were done at this
  callstack.
* **alloc\_objects**: how many allocations (including ones with matching frees)
  were done at this callstack.
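
As a rough illustration of how the four views relate, the sketch below aggregates a toy list of allocation and free events per callstack. The event tuples, the callstack strings and the `compute_views` helper are hypothetical and only model the definitions above; they are not heapprofd's data format or implementation.

```python
from collections import defaultdict


def compute_views(events):
    """Aggregate (kind, callstack, size) events into the four views.

    A "free" event is attributed to the callstack that made the matching
    allocation, which is what the definitions above imply.
    """
    views = defaultdict(
        lambda: {"space": 0, "alloc_space": 0, "objects": 0, "alloc_objects": 0}
    )
    for kind, callstack, size in events:
        view = views[callstack]
        if kind == "alloc":
            # Every allocation counts towards the alloc_* views forever...
            view["alloc_space"] += size
            view["alloc_objects"] += 1
            # ...and towards space/objects until it is freed.
            view["space"] += size
            view["objects"] += 1
        elif kind == "free":
            view["space"] -= size
            view["objects"] -= 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return dict(views)


# Hypothetical events: two allocations in parse (one freed), one in render.
events = [
    ("alloc", "main;parse", 100),
    ("alloc", "main;parse", 50),
    ("free", "main;parse", 100),
    ("alloc", "main;render", 30),
]
views = compute_views(events)
for callstack, view in views.items():
    print(callstack, view)
```

Here `main;parse` ends up with 50 bytes in **space** but 150 bytes in **alloc\_space**, because the freed 100-byte allocation still counts towards the cumulative view.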

_(Googlers: You can also open the gzipped protos using http://pprof/)_

TIP: you might want to put `libart.so` as a "Hide regex" when profiling apps.

You can use the [Perfetto UI](https://ui.perfetto.dev) to visualize heap dumps.
Upload the `raw-trace` file in your output directory. You will see all heap
dumps as diamonds on the timeline; click any of them to get a flamegraph.

Alternatively, [Speedscope](https://speedscope.app) can be used to visualize
the gzipped protos, but it will only show the space view.

TIP: Click Left Heavy on the top left for a good visualization.

## Sampling interval

Heapprofd samples heap allocations by hooking calls to malloc/free and C++'s
operator new/delete. Given a sampling interval of n bytes, one allocation is
sampled, on average, every n bytes allocated. This reduces the performance
impact on the target process. The default sampling interval is 4096 bytes.

The easiest way to reason about this is to imagine the memory allocations as a
stream of one byte allocations. From this stream, every byte has a 1/n
probability of being selected as a sample, and the corresponding callstack
gets attributed the complete n bytes. For more accuracy, allocations larger than
the sampling interval bypass the sampling logic and are recorded with their true
size.
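
The mental model above can be simulated in a few lines. This is a minimal sketch of the described model only, not heapprofd's actual sampler: the `attributed_bytes` helper is hypothetical, and it approximates "each byte is selected with probability 1/n" by drawing exponentially distributed gaps with mean n between sampled bytes.

```python
import random


def attributed_bytes(sizes, interval, seed=0):
    """Simulate the sampling model for a sequence of allocation sizes."""
    rng = random.Random(seed)
    total = 0
    # Distance (in bytes of the conceptual one-byte stream) to the next
    # sampled byte. Gaps are exponential with mean `interval`.
    until_next = rng.expovariate(1.0 / interval)
    for size in sizes:
        if size > interval:
            # Large allocations bypass sampling and keep their true size.
            total += size
            continue
        remaining = size
        while remaining >= until_next:
            # A sampled byte falls inside this allocation: attribute a
            # full interval to it and draw the gap to the next sample.
            remaining -= until_next
            total += interval
            until_next = rng.expovariate(1.0 / interval)
        until_next -= remaining
    return total


small = [64] * 100_000  # 6,400,000 bytes requested in total
print(attributed_bytes(small, 4096))     # close to, but not exactly, 6400000
print(attributed_bytes([10_000], 4096))  # exactly 10000: bypasses sampling
```

Over many small allocations the attributed total converges on the number of bytes actually requested, which is why a coarser interval trades accuracy for lower overhead rather than introducing a systematic bias.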

## Startup profiling

When specifying a target process name (as opposed to a PID), new processes
matching that name are profiled from their startup. The resulting profile will
contain all allocations done between the start of the process and the end
of the profiling session.

On Android, Java apps are usually not exec()-ed from scratch, but fork()-ed from
the [zygote], which then specializes into the desired app. If the app's name
matches a name specified in the profiling session, profiling will be enabled as
part of the zygote specialization. The resulting profile contains all
allocations done between that point in zygote specialization and the end of the
profiling session. Some allocations done early in the specialization process are
not accounted for.

At the trace proto level, the resulting [ProfilePacket] will have the
`from_startup` field set to true in the corresponding `ProcessHeapSamples`
message. This is not surfaced in the converted pprof compatible proto.

[ProfilePacket]: /docs/reference/trace-packet-proto.autogen#ProfilePacket
[zygote]: https://developer.android.com/topic/performance/memory-overview#SharingRAM

## Runtime profiling

When a profiling session is started, all matching processes (by name or PID)
are enumerated and profiling is enabled. The resulting profile will contain
all allocations done between the beginning and the end of the profiling
session.

The resulting [ProfilePacket] will have `from_startup` set to false in the
corresponding `ProcessHeapSamples` message. This does not get surfaced in the
converted pprof compatible proto.

## Concurrent profiling sessions

If multiple sessions name the same target process (either by name or PID),
only the first relevant session will profile the process. The other sessions
will report that the process had already been profiled when converting to
the pprof compatible proto.

If you see this message but do not expect any other sessions, run

```shell
adb shell killall perfetto
```

to stop any concurrent sessions that may be running.

The resulting [ProfilePacket] will have `rejected_concurrent` set to true in
an otherwise empty corresponding `ProcessHeapSamples` message. This does not get
surfaced in the converted pprof compatible proto.

## {#heapprofd-targets} Target processes

Depending on the build of Android that heapprofd is run on, some processes
are not eligible to be profiled.

On _user_ (i.e. production, non-rootable) builds, only Java applications with
either the profileable or the debuggable manifest flag set can be profiled.
Profiling requests for non-profileable/debuggable processes will result in an
empty profile.

On userdebug builds, all processes except for a small blacklist of critical
services can be profiled (to find the blacklist, look for
`never_profile_heap` in [heapprofd.te](
https://cs.android.com/android/platform/superproject/+/master:system/sepolicy/private/heapprofd.te?q=never_profile_heap)).
This restriction can be lifted by disabling SELinux by running
`adb shell su root setenforce 0` or by passing `--disable-selinux` to the
`heap_profile` script.

<center>

|                         | userdebug setenforce 0 | userdebug | user |
|-------------------------|:----------------------:|:---------:|:----:|
| critical native service |            Y           |     N     |  N   |
| native service          |            Y           |     Y     |  N   |
| app                     |            Y           |     Y     |  N   |
| profileable app         |            Y           |     Y     |  Y   |
| debuggable app          |            Y           |     Y     |  Y   |

</center>

To mark an app as profileable, put `<profileable android:shell="true"/>` into
the `<application>` section of the app manifest.

```xml
<manifest ...>
    <application>
        <profileable android:shell="true"/>
        ...
    </application>
</manifest>
```

## DEDUPED frames

If the name of a Java method includes `[DEDUPED]`, this means that multiple
methods share the same code. ART only stores the name of a single one in its
metadata, which is displayed here. This is not necessarily the one that was
called.

## Triggering heap snapshots on demand

Heap snapshots are recorded into the trace either at regular time intervals, if
using the `continuous_dump_config` field, or at the end of the session.

You can also trigger a snapshot of all currently profiled processes by running
`adb shell killall -USR1 heapprofd`. This can be useful in lab tests for
recording the current memory usage of the target in a specific state.

This dump will show up in addition to the dump at the end of the profile that is
always produced. You can create multiple of these dumps, and they will be
enumerated in the output directory.

## Symbolization

NOTE: Symbolization is currently only available on Linux

### Set up llvm-symbolizer

You only need to do this once.

To use symbolization, your system must have llvm-symbolizer installed and
accessible from `$PATH` as `llvm-symbolizer`. On Debian, you can install it
using `sudo apt install llvm-9`.
This will create `/usr/bin/llvm-symbolizer-9`. Symlink that to somewhere in
your `$PATH` as `llvm-symbolizer`.

For instance, `ln -s /usr/bin/llvm-symbolizer-9 ~/bin/llvm-symbolizer`, and
add `~/bin` to your path (or run the commands below with `PATH=~/bin:$PATH`
prefixed).

### Symbolize your profile

If the profiled binary or libraries do not have symbol names, you can
symbolize profiles offline. Even if they do, you might want to symbolize in
order to get inlined function and line number information. All tools
(traceconv, trace_processor_shell, the heap_profile script) support specifying
the `PERFETTO_BINARY_PATH` as an environment variable.

```
PERFETTO_BINARY_PATH=somedir tools/heap_profile --name ${NAME}
```

You can persist symbols for a trace by running
`PERFETTO_BINARY_PATH=somedir tools/traceconv symbolize raw-trace > symbols`.
You can then concatenate the symbols to the trace (
`cat raw-trace symbols > symbolized-trace`) and the symbols will be part of
`symbolized-trace`. The `tools/heap_profile` script will also generate this
file in your output directory, if `PERFETTO_BINARY_PATH` is used.

The symbol file used is the first one with a matching Build ID, searched in the
following order:

1. absolute path of library file relative to binary path.
2. absolute path of library file relative to binary path, but with base.apk!
    removed from filename.
3. basename of library file relative to binary path.
4. basename of library file relative to binary path, but with base.apk!
    removed from filename.
5. in the subdirectory .build-id: the first two hex digits of the build-id
    as subdirectory, then the rest of the hex digits, with ".debug" appended.
    See
    https://fedoraproject.org/wiki/RolandMcGrath/BuildID#Find_files_by_build_ID

For example, "/system/lib/base.apk!foo.so" with build id abcd1234,
is looked for at:

1. `$PERFETTO_BINARY_PATH/system/lib/base.apk!foo.so`
2. `$PERFETTO_BINARY_PATH/system/lib/foo.so`
3. `$PERFETTO_BINARY_PATH/base.apk!foo.so`
4. `$PERFETTO_BINARY_PATH/foo.so`
5. `$PERFETTO_BINARY_PATH/.build-id/ab/cd1234.debug`
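
The lookup order can be written down as a small helper, which is handy when checking by hand where a symbol file should be placed. This is a sketch of the rules listed above, not the code the Perfetto tools run: `candidate_symbol_paths` is a hypothetical name, and it only lists candidate locations without checking that the file exists or that its Build ID matches.

```python
import posixpath


def candidate_symbol_paths(binary_path, library_path, build_id_hex):
    """Return the candidate symbol file locations, in lookup order."""
    # Treat the on-device absolute path as relative to the binary path.
    rel = library_path.lstrip("/")
    rel_no_apk = rel.replace("base.apk!", "")
    return [
        posixpath.join(binary_path, rel),
        posixpath.join(binary_path, rel_no_apk),
        posixpath.join(binary_path, posixpath.basename(rel)),
        posixpath.join(binary_path, posixpath.basename(rel_no_apk)),
        # .build-id/<first two hex digits>/<remaining digits>.debug
        posixpath.join(
            binary_path, ".build-id", build_id_hex[:2], build_id_hex[2:] + ".debug"
        ),
    ]


# "/symbols" stands in for the value of PERFETTO_BINARY_PATH.
paths = candidate_symbol_paths("/symbols", "/system/lib/base.apk!foo.so", "abcd1234")
for path in paths:
    print(path)
```

With `/symbols` as the binary path, this prints the same five locations as the example above.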

## Troubleshooting

### Buffer overrun

If the rate of allocations is too high for heapprofd to keep up, the profiling
session will end early due to a buffer overrun. If the buffer overrun is
caused by a transient spike in allocations, increasing the shared memory buffer
size (passing `--shmem-size` to `tools/heap_profile`) can resolve the issue.
Otherwise the sampling interval can be increased (at the expense of lower
accuracy in the resulting profile) by passing `--interval=16000` or higher.

### Profile is empty

Check whether your target process is eligible to be profiled by consulting
[Target processes](#target-processes) above.

Also check the [Known Issues](#known-issues).

### Implausible callstacks

If you see a callstack that seems impossible from looking at the code, make
sure no [DEDUPED frames](#deduped-frames) are involved.

Also, if your code is linked using _Identical Code Folding_
(ICF), i.e. passing `-Wl,--icf=...` to the linker, most trivial functions, often
constructors and destructors, can be aliased to binary-equivalent operators
of completely unrelated classes.

### Symbolization: Could not find library

When symbolizing a profile, you might come across messages like this:

```bash
Could not find /data/app/invalid.app-wFgo3GRaod02wSvPZQ==/lib/arm64/somelib.so
(Build ID: 44b7138abd5957b8d0a56ce86216d478).
```

Check whether your library (in this example somelib.so) exists in
`PERFETTO_BINARY_PATH`. Then compare the Build ID to the one in your
symbol file, which you can get by running
`readelf -n /path/in/binary/path/somelib.so`. If it does not match, the
symbol file is a different version than the one on the device, and cannot
be used for symbolization.
If it does, try moving somelib.so to the root of `PERFETTO_BINARY_PATH` and
try again.
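
The Build ID comparison can be scripted. The sketch below assumes that the `readelf -n` output contains a line of the form `Build ID: <hex>`, as GNU readelf typically prints; the helper names and the sample text are illustrative, so adapt them if your toolchain formats notes differently.

```python
import re


def parse_build_id(readelf_notes_output):
    """Extract the hex Build ID from `readelf -n` output, or None if absent."""
    match = re.search(r"Build ID:\s*([0-9a-fA-F]+)", readelf_notes_output)
    return match.group(1).lower() if match else None


def matches_device_build_id(readelf_notes_output, device_build_id):
    """True if the local symbol file has the Build ID reported in the error."""
    local = parse_build_id(readelf_notes_output)
    return local is not None and local == device_build_id.lower()


# Assumed shape of the relevant part of `readelf -n somelib.so` output.
sample_output = """
Displaying notes found in: .note.gnu.build-id
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 44b7138abd5957b8d0a56ce86216d478
"""

print(parse_build_id(sample_output))
print(matches_device_build_id(sample_output, "44b7138abd5957b8d0a56ce86216d478"))
```

The second argument is the Build ID copied from the "Could not find" message. A `False` result means the local file was built from a different version than the library on the device.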

### Only one frame shown

If you only see a single frame for functions in a specific library, make sure
that the library has unwind information. We need one of

* `.gnu_debugdata`
* `.eh_frame` (+ preferably `.eh_frame_hdr`)
* `.debug_frame`.

Frame-pointer unwinding is *not supported*.

To check if an ELF file has any of those, run

```console
$ readelf -S file.so | grep "gnu_debugdata\|eh_frame\|debug_frame"
  [12] .eh_frame_hdr     PROGBITS         000000000000c2b0  0000c2b0
  [13] .eh_frame         PROGBITS         0000000000011000  00011000
  [24] .gnu_debugdata    PROGBITS         0000000000000000  000f7292
```

If this does not show one or more of the sections, change your build system
to not strip them.
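
When checking many libraries, the same test can be automated by parsing the `readelf -S` output. This is a minimal sketch: `unwind_sections_present` is a hypothetical helper, and it assumes section names appear after the bracketed index, as in the console output above.

```python
import re

# Sections that provide unwind information, as listed above.
UNWIND_SECTIONS = (".gnu_debugdata", ".eh_frame", ".debug_frame")


def unwind_sections_present(readelf_sections_output):
    """Return the unwind sections found in `readelf -S` output."""
    # Collect section names such as ".eh_frame" that follow "[NN]".
    names = set(re.findall(r"\]\s+(\.\S+)", readelf_sections_output))
    # Exact-name match, so ".eh_frame_hdr" alone does not count as ".eh_frame".
    return [section for section in UNWIND_SECTIONS if section in names]


sample_output = """
  [12] .eh_frame_hdr     PROGBITS         000000000000c2b0  0000c2b0
  [13] .eh_frame         PROGBITS         0000000000011000  00011000
  [24] .gnu_debugdata    PROGBITS         0000000000000000  000f7292
"""

found = unwind_sections_present(sample_output)
print(found)
```

An empty list means the library carries none of the three sections, so only a single frame can be shown for it.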

## Known Issues

### Android 10

* On ARM32, the bottom-most frame is always `ERROR 2`. This is harmless and
  the callstacks are still complete.
* x86 platforms are not supported. This includes the Android _Cuttlefish_
  emulator.
* If heapprofd is run standalone (by running `heapprofd` in a root shell, rather
  than through init), `/dev/socket/heapprofd` gets assigned an incorrect SELinux
  domain. You will not be able to profile any processes unless you disable
  SELinux enforcement.
  Run `restorecon /dev/socket/heapprofd` in a root shell to resolve.

## Heapprofd vs malloc_info() vs RSS

When using heapprofd and interpreting results, it is important to know the
precise meaning of the different memory metrics that can be obtained from the
operating system.

**heapprofd** gives you the number of bytes the target program
requested from the default C/C++ allocator. If you are profiling a Java app from
startup, allocations that happen early in the application's initialization will
not be visible to heapprofd. Native services that do not fork from the Zygote
are not affected by this.

**malloc\_info** is a libc function that gives you information about the
allocator. This can be triggered on userdebug builds by using
`am dumpheap -m <PID> /data/local/tmp/heap.txt`. This will in general be more
than the memory seen by heapprofd because, depending on the allocator, not all
memory is immediately freed. In particular, jemalloc retains some freed memory
in thread caches.

**Heap RSS** is the amount of memory requested from the operating system by the
allocator. This is larger than the previous two numbers because memory can only
be obtained in page size chunks, and fragmentation causes some of that memory to
be wasted. This can be obtained by running `adb shell dumpsys meminfo <PID>` and
looking at the "Private Dirty" column.
RSS can also end up being smaller than the other two if the device kernel uses
memory compression (ZRAM, enabled by default on recent versions of Android) and
the memory of the process gets swapped out onto ZRAM.

|                     | heapprofd         | malloc\_info | RSS |
|---------------------|:-----------------:|:------------:|:---:|
| from native startup |          x        |      x       |  x  |
| after zygote init   |          x        |      x       |  x  |
| before zygote init  |                   |      x       |  x  |
| thread caches       |                   |      x       |  x  |
| fragmentation       |                   |              |  x  |

If you observe high RSS or malloc\_info metrics but heapprofd does not match,
you might be hitting some pathological fragmentation problem in the allocator.

## Convert to pprof

You can use [traceconv](/docs/quickstart/traceconv.md) to convert the heap dumps
in a trace into the [pprof](https://github.com/google/pprof) format. These can
then be viewed using the pprof CLI or a UI (e.g. Speedscope, or Google-internal
pprof/).

```bash
tools/traceconv profile /tmp/profile
```

This will create a directory in `/tmp/` containing the heap dumps. Run:

```bash
gzip /tmp/heap_profile-XXXXXX/*.pb
```

to get gzipped protos, which tools handling pprof profile protos expect.
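
If the `gzip` command is not available (for example on Windows), the same step can be done with Python's standard library. This is a sketch under the assumption that the dumps are the `*.pb` files in the generated directory; `gzip_protos` and the file name in the demo are hypothetical.

```python
import glob
import gzip
import os
import shutil
import tempfile


def gzip_protos(directory):
    """Compress every *.pb file in `directory` to *.pb.gz, like `gzip *.pb`."""
    written = []
    for path in sorted(glob.glob(os.path.join(directory, "*.pb"))):
        gz_path = path + ".gz"
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Mirror gzip's default behaviour of replacing the input file.
        os.remove(path)
        written.append(gz_path)
    return written


# Demo on a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "example.pb")  # hypothetical file name
    with open(sample, "wb") as f:
        f.write(b"\x0a\x03abc")
    outputs = gzip_protos(tmp)
    with gzip.open(outputs[0], "rb") as f:
        roundtrip = f.read()
    original_removed = not os.path.exists(sample)

print([os.path.basename(p) for p in outputs], original_removed)
```

In real use, pass the directory created by `traceconv` instead of a temporary one.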

## {#heapprofd-example-queries} Example SQL Queries

We can get the callstacks that allocated using an SQL Query in the
Trace Processor. For each frame, we get one row for the number of allocated
bytes, where `count` and `size` are positive, and, if any of them were already
freed, another row with negative `count` and `size`. The sum of those gets us
the `space` view.

```sql
select a.callsite_id, a.ts, a.upid, f.name, f.rel_pc, m.build_id, m.name as mapping_name,
        sum(a.size) as space_size, sum(a.count) as space_count
      from heap_profile_allocation a join
           stack_profile_callsite c ON (a.callsite_id = c.id) join
           stack_profile_frame f ON (c.frame_id = f.id) join
           stack_profile_mapping m ON (f.mapping = m.id)
      group by 1, 2, 3, 4, 5, 6, 7 order by space_size desc;
```

| callsite_id | ts | upid | name | rel_pc | build_id | mapping_name | space_size | space_count |
|-------------|----|------|-------|-----------|------|--------|----------|------|
|6660|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |106496|4|
|192 |5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|1421|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|1537|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
|8843|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26424 |1|
|8618|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |24576 |4|
|3750|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |12288 |1|
|2820|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192  |2|
|3788|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192  |2|

We can see all the functions are "malloc" and "realloc", which is not terribly
informative. Usually we are interested in the _cumulative_ bytes allocated in
a function (otherwise, we will always only see malloc / realloc). Chasing the
parent_id of a callsite (not shown in this table) recursively is very hard in
SQL.
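
Outside of SQL, the parent chase is a short loop. The sketch below shows the idea on hypothetical in-memory data: a dict mapping each callsite id to its parent id (`None` for a root) and a dict of self sizes per leaf callsite. It illustrates what "cumulative" means and is not how the Trace Processor computes it.

```python
from collections import defaultdict


def cumulative_sizes(parent_of, self_sizes):
    """Propagate each callsite's self size to all of its ancestors.

    parent_of:  {callsite_id: parent callsite_id, or None for a root}
    self_sizes: {callsite_id: bytes attributed directly to that callsite}
    """
    totals = defaultdict(int)
    for callsite_id, size in self_sizes.items():
        node = callsite_id
        # Walk up the parent chain, crediting every frame on the way.
        while node is not None:
            totals[node] += size
            node = parent_of[node]
    return dict(totals)


# Hypothetical callsite tree: 1 is the root, 2 and 4 are its children,
# and 3 is a child of 2. Only the leaves (3 and 4) allocate directly.
parent_of = {1: None, 2: 1, 3: 2, 4: 1}
self_sizes = {3: 100, 4: 50}

totals = cumulative_sizes(parent_of, self_sizes)
print(totals)
```

In this toy tree the root accumulates 150 bytes even though it never allocates itself, which is the number a flamegraph shows for it.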

There is an **experimental** table that surfaces this information. The **API is
subject to change**.

```sql
select name, map_name, cumulative_size
       from experimental_flamegraph(8300973884377,1,'native')
       order by abs(cumulative_size) desc;
```

| name | map_name | cumulative_size |
|------|----------|----------------|
|__start_thread|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
|_ZL15__pthread_startPv|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
|_ZN13thread_data_t10trampolineEPKS|/system/lib64/libutils.so|199496|
|_ZN7android14AndroidRuntime15javaThreadShellEPv|/system/lib64/libandroid_runtime.so|199496|
|_ZN7android6Thread11_threadLoopEPv|/system/lib64/libutils.so|199496|
|_ZN3art6Thread14CreateCallbackEPv|/apex/com.android.art/lib64/libart.so|193112|
|_ZN3art35InvokeVirtualOrInterface...|/apex/com.android.art/lib64/libart.so|193112|
|_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc|/apex/com.android.art/lib64/libart.so|193112|
|art_quick_invoke_stub|/apex/com.android.art/lib64/libart.so|193112|