# Memory counters and events

Perfetto can gather a number of memory events and counters on
Android and Linux. These events come from kernel interfaces, both ftrace and
/proc interfaces, and are of two types: polled counters and events pushed by
the kernel in the ftrace buffer.

## Per-process polled counters

The process stats data source polls `/proc/<pid>/status` and
`/proc/<pid>/oom_score_adj` at user-defined intervals.

See [`man 5 proc`][man-proc] for their semantics.

### UI

![](/docs/images/proc_stat.png "UI showing trace data collected by process stats pollers")

### SQL

```sql
select c.ts, t.name as counter_name, c.value / 1024 as value_kb, p.name as proc_name, p.pid
from counter as c left join process_counter_track as t on c.track_id = t.id
left join process as p using (upid)
where t.name like 'mem.%'
```

ts | counter_name | value_kb | proc_name | pid
---|--------------|----------|-----------|----
261187015027350 | mem.virt | 1326464 | com.android.vending | 28815
261187015027350 | mem.rss | 85592 | com.android.vending | 28815
261187015027350 | mem.rss.anon | 36948 | com.android.vending | 28815
261187015027350 | mem.rss.file | 46560 | com.android.vending | 28815
261187015027350 | mem.swap | 6908 | com.android.vending | 28815
261187015027350 | mem.rss.watermark | 102856 | com.android.vending | 28815
261187090251420 | mem.virt | 1326464 | com.android.vending | 28815

### TraceConfig

To collect process stat counters every X ms set `proc_stats_poll_ms = X` in
your process stats config. X must be greater than 100ms to avoid excessive CPU
usage. Details about the specific counters being collected can be found in the
[ProcessStats reference](/docs/reference/trace-packet-proto.autogen#ProcessStats).

```protobuf
data_sources: {
  config {
    name: "linux.process_stats"
    process_stats_config {
      scan_all_processes_on_start: true
      proc_stats_poll_ms: 1000
    }
  }
}
```

## Per-process memory events (ftrace)

### rss_stat

Recent versions of the Linux kernel can emit ftrace events when the
Resident Set Size (RSS) mm counters change. This is the same counter available
in `/proc/pid/status` as `VmRSS`. The main advantage of this event is that,
being pushed by the kernel on every change, it can detect very short memory
usage bursts that would otherwise go undetected when polling /proc counters.

Memory usage peaks of hundreds of MB can have a dramatically negative impact on
Android, even if they last only a few ms, as they can cause mass low memory
kills to reclaim memory.

The kernel feature that supports this was introduced in the Linux kernel
in [b3d1411b6] and later improved by [e4dcad20]. Both are available upstream
since Linux v5.5-rc1. These patches have been backported to several Google Pixel
kernels running Android 10 (Q).

[b3d1411b6]: https://github.com/torvalds/linux/commit/b3d1411b6726ea6930222f8f12587d89762477c6
[e4dcad20]: https://github.com/torvalds/linux/commit/e4dcad204d3a281be6f8573e0a82648a4ad84e69

### mm_event

`mm_event` is an ftrace event that captures statistics about key memory events
(a subset of the ones exposed by `/proc/vmstat`). Unlike RSS-stat counter
updates, mm events are extremely high volume and tracing them individually would
be infeasible. `mm_event` instead reports only periodic histograms in the trace,
significantly reducing the overhead.

`mm_event` is available only on some Google Pixel kernels running Android 10 (Q)
and beyond.

When `mm_event` is enabled, the following mm event types are recorded:

* mem.mm.min_flt: Minor page faults
* mem.mm.maj_flt: Major page faults
* mem.mm.swp_flt: Page faults served by swapcache
* mem.mm.read_io: Read page faults backed by I/O
* mem.mm.compaction: Memory compaction events
* mem.mm.reclaim: Memory reclaim events

For each event type, the event records:

* count: how many times the event happened since the previous event.
* avg_lat: the average latency (the duration of the mm event) recorded since
  the previous event.
* max_lat: the highest latency recorded since the previous event.

### UI

![rss_stat and mm_event](/docs/images/rss_stat_and_mm_event.png)

### SQL

At the SQL level, these events are imported and exposed in the same way as
the corresponding polled events. This makes it possible to collect both types
of events (pushed and polled) and treat them uniformly in queries and scripts.

```sql
select c.ts, c.value, t.name as counter_name, p.name as proc_name, p.pid
from counter as c left join process_counter_track as t on c.track_id = t.id
left join process as p using (upid)
where t.name like 'mem.%'
```

ts | value | counter_name | proc_name | pid
---|-------|--------------|-----------|----
777227867975055 | 18358272 | mem.rss.anon | com.google.android.apps.safetyhub | 31386
777227865995315 | 5 | mem.mm.min_flt.count | com.google.android.apps.safetyhub | 31386
777227865995315 | 8 | mem.mm.min_flt.max_lat | com.google.android.apps.safetyhub | 31386
777227865995315 | 4 | mem.mm.min_flt.avg_lat | com.google.android.apps.safetyhub | 31386
777227865998023 | 3 | mem.mm.swp_flt.count | com.google.android.apps.safetyhub | 31386

### TraceConfig

```protobuf
data_sources: {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "kmem/rss_stat"
      ftrace_events: "mm_event/mm_event_record"
    }
  }
}

# This is for getting Thread<>Process associations and full process names.
data_sources: {
  config {
    name: "linux.process_stats"
  }
}
```

## System-wide polled counters

This data source allows periodic polling of system data from:

- `/proc/stat`
- `/proc/vmstat`
- `/proc/meminfo`

See [`man 5 proc`][man-proc] for their semantics.

### UI

![System Memory Counters](/docs/images/sys_stat_counters.png "Example of system memory counters in the UI")

The polling period and specific counters to include in the trace can be set in the trace config.

### SQL

```sql
select c.ts, t.name, c.value / 1024 as value_kb
from counter as c left join counter_track as t on c.track_id = t.id
```

ts | name | value_kb
---|------|---------
775177736769834 | MemAvailable | 1708956
775177736769834 | Buffers | 6208
775177736769834 | Cached | 1352960
775177736769834 | SwapCached | 8232
775177736769834 | Active | 1021108
775177736769834 | Inactive(file) | 351496

### TraceConfig

The set of supported counters is available in the
[TraceConfig reference](/docs/reference/trace-config-proto.autogen#SysStatsConfig).

```protobuf
data_sources: {
  config {
    name: "linux.sys_stats"
    sys_stats_config {
      meminfo_period_ms: 1000
      meminfo_counters: MEMINFO_MEM_TOTAL
      meminfo_counters: MEMINFO_MEM_FREE
      meminfo_counters: MEMINFO_MEM_AVAILABLE

      vmstat_period_ms: 1000
      vmstat_counters: VMSTAT_NR_FREE_PAGES
      vmstat_counters: VMSTAT_NR_ALLOC_BATCH
      vmstat_counters: VMSTAT_NR_INACTIVE_ANON
      vmstat_counters: VMSTAT_NR_ACTIVE_ANON

      stat_period_ms: 2500
      stat_counters: STAT_CPU_TIMES
      stat_counters: STAT_FORK_COUNT
    }
  }
}
```

## Low-memory Kills (LMK)

### Background

The Android framework kills apps and services, especially background ones, to
make room for newly opened apps when memory is needed. These are known as low
memory kills (LMK).

Note that LMKs are not always a symptom of a performance problem. The rule of
thumb is that the severity (as in: user perceived impact) is proportional to the
state of the app being killed. The app state can be derived in a trace from the
OOM adjustment score.

An LMK of a foreground app or service is typically a big concern. This happens
when the app that the user was using disappears under their fingers, or their
favorite music player service suddenly stops playing music.

An LMK of a cached app or service, instead, is frequently business-as-usual and
in most cases won't be noticed by the end user until they try to go back to
the app, which will then cold-start.

The situation in between these extremes is more nuanced. LMKs of cached
apps/services can still be problematic if they happen in storms (i.e. most
processes get LMK-ed in a short time frame), and such storms are often the
symptom of some component of the system causing memory spikes.

### lowmemorykiller vs lmkd

#### In-kernel lowmemorykiller driver

In Android, LMK used to be handled by an ad-hoc kernel driver,
Linux's [drivers/staging/android/lowmemorykiller.c](https://github.com/torvalds/linux/blob/v3.8/drivers/staging/android/lowmemorykiller.c).
This driver used to emit the ftrace event `lowmemorykiller/lowmemory_kill`
in the trace.

#### Userspace lmkd

Android 9 introduced a userspace native daemon that took over the LMK
responsibility: `lmkd`. Not all devices running Android 9 will
necessarily use `lmkd`, as the ultimate choice of in-kernel vs userspace is
up to the phone manufacturer, their kernel version and kernel config.

On Google Pixel phones, `lmkd`-side killing has been used since Pixel 2 running
Android 9.

See https://source.android.com/devices/tech/perf/lmkd for details.

`lmkd` emits a userspace atrace counter event called `kill_one_process`.

#### Android LMK vs Linux oomkiller

LMKs on Android, whether the old in-kernel `lowmemorykiller` or the newer `lmkd`,
use a completely different mechanism than the standard
[Linux kernel's OOM Killer](https://linux-mm.org/OOM_Killer).
Perfetto at the moment supports only Android LMK events (both in-kernel and
user-space) and does not support tracing of Linux kernel OOM Killer events.
Linux OOM Killer events are still theoretically possible on Android but extremely
unlikely to happen. If they happen, they are more likely the symptom of a
mis-configured BSP.

### UI

Newer userspace LMKs are available in the UI under the `lmkd` track
in the form of a counter. The counter value is the PID of the killed process
(in the example below, PID=27985).

![Userspace lmkd](/docs/images/lmk_lmkd.png "Example of a LMK caused by lmkd")

TODO: we are working on better UI support for LMKs.

### SQL

Both newer lmkd and legacy kernel-driven lowmemorykiller events are normalized
at import time and available under the `mem.lmk` key in the `instants` table.

```sql
select ts, process.name, process.pid
from instants left join process on instants.ref = process.upid
where instants.name = 'mem.lmk'
```

| ts | name | pid |
|----|------|-----|
| 442206415875043 | roid.apps.turbo | 27324 |
| 442206446142234 | android.process.acore | 27683 |
| 442206462090204 | com.google.process.gapps | 28198 |

### TraceConfig

To enable tracing of low memory kills, add the following options to the trace config:

```protobuf
data_sources: {
  config {
    name: "linux.ftrace"
    ftrace_config {
      # For old in-kernel events.
      ftrace_events: "lowmemorykiller/lowmemory_kill"

      # For new userspace lmkds.
      atrace_apps: "lmkd"

      # This is not strictly required but is useful to know the state
      # of the process (FG, cached, ...) before it got killed.
      ftrace_events: "oom/oom_score_adj_update"
    }
  }
}
```

## {#oom-adj} App states and OOM adjustment score

The Android app state can be inferred in a trace from the process
`oom_score_adj`. The mapping is not 1:1: there are more states than
`oom_score_adj` value groups, and the `oom_score_adj` range for cached processes
spans from 900 (`CACHED_APP_MIN_ADJ`) to 999 (`CACHED_APP_MAX_ADJ`).

The mapping can be inferred from the
[ActivityManager's ProcessList sources](https://cs.android.com/android/platform/superproject/+/android10-release:frameworks/base/services/core/java/com/android/server/am/ProcessList.java;l=126):

```java
// This is a process only hosting activities that are not visible,
// so it can be killed without any disruption.
static final int CACHED_APP_MAX_ADJ = 999;
static final int CACHED_APP_MIN_ADJ = 900;

// This is the oom_adj level that we allow to die first. This cannot be equal to
// CACHED_APP_MAX_ADJ unless processes are actively being assigned an oom_score_adj of
// CACHED_APP_MAX_ADJ.
static final int CACHED_APP_LMK_FIRST_ADJ = 950;

// The B list of SERVICE_ADJ -- these are the old and decrepit
// services that aren't as shiny and interesting as the ones in the A list.
static final int SERVICE_B_ADJ = 800;

// This is the process of the previous application that the user was in.
// This process is kept above other things, because it is very common to
// switch back to the previous app. This is important both for recent
// task switch (toggling between the two top recent apps) as well as normal
// UI flow such as clicking on a URI in the e-mail app to view in the browser,
// and then pressing back to return to e-mail.
static final int PREVIOUS_APP_ADJ = 700;

// This is a process holding the home application -- we want to try
// avoiding killing it, even if it would normally be in the background,
// because the user interacts with it so much.
static final int HOME_APP_ADJ = 600;

// This is a process holding an application service -- killing it will not
// have much of an impact as far as the user is concerned.
static final int SERVICE_ADJ = 500;

// This is a process with a heavy-weight application. It is in the
// background, but we want to try to avoid killing it. Value set in
// system/rootdir/init.rc on startup.
static final int HEAVY_WEIGHT_APP_ADJ = 400;

// This is a process currently hosting a backup operation. Killing it
// is not entirely fatal but is generally a bad idea.
static final int BACKUP_APP_ADJ = 300;

// This is a process bound by the system (or other app) that's more important than services but
// not so perceptible that it affects the user immediately if killed.
static final int PERCEPTIBLE_LOW_APP_ADJ = 250;

// This is a process only hosting components that are perceptible to the
// user, and we really want to avoid killing them, but they are not
// immediately visible. An example is background music playback.
static final int PERCEPTIBLE_APP_ADJ = 200;

// This is a process only hosting activities that are visible to the
// user, so we'd prefer they don't disappear.
static final int VISIBLE_APP_ADJ = 100;

// This is a process that was recently TOP and moved to FGS. Continue to treat it almost
// like a foreground app for a while.
// @see TOP_TO_FGS_GRACE_PERIOD
static final int PERCEPTIBLE_RECENT_FOREGROUND_APP_ADJ = 50;

// This is the process running the current foreground app. We'd really
// rather not kill it!
static final int FOREGROUND_APP_ADJ = 0;

// This is a process that the system or a persistent process has bound to,
// and indicated it is important.
static final int PERSISTENT_SERVICE_ADJ = -700;

// This is a system persistent process, such as telephony. Definitely
// don't want to kill it, but doing so is not completely fatal.
static final int PERSISTENT_PROC_ADJ = -800;

// The system process runs at the default adjustment.
static final int SYSTEM_ADJ = -900;

// Special code for native processes that are not being managed by the system (so
// don't have an oom adj assigned by the system).
static final int NATIVE_ADJ = -1000;
```

[man-proc]: https://manpages.debian.org/stretch/manpages/proc.5.en.html
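
When post-processing a trace in a script, it can be handy to turn a raw `oom_score_adj` value into a coarse app-state label. The following is a minimal Python sketch, not part of Perfetto or Android. The thresholds are copied from the Android 10 `ProcessList` constants listed in this section. The function name `app_state_for_adj`, the label strings, and the bucketing rule (pick the largest constant that does not exceed the value) are assumptions made for illustration.

```python
import bisect

# (threshold, label) pairs, sorted by threshold. The thresholds mirror the
# Android 10 ProcessList constants; the labels are illustrative only.
_ADJ_LEVELS = [
    (-1000, "NATIVE"),
    (-900, "SYSTEM"),
    (-800, "PERSISTENT_PROC"),
    (-700, "PERSISTENT_SERVICE"),
    (0, "FOREGROUND_APP"),
    (50, "PERCEPTIBLE_RECENT_FOREGROUND_APP"),
    (100, "VISIBLE_APP"),
    (200, "PERCEPTIBLE_APP"),
    (250, "PERCEPTIBLE_LOW_APP"),
    (300, "BACKUP_APP"),
    (400, "HEAVY_WEIGHT_APP"),
    (500, "SERVICE"),
    (600, "HOME_APP"),
    (700, "PREVIOUS_APP"),
    (800, "SERVICE_B"),
    (900, "CACHED_APP"),
]
_THRESHOLDS = [adj for adj, _ in _ADJ_LEVELS]


def app_state_for_adj(oom_score_adj):
    """Return the label of the largest threshold <= oom_score_adj.

    This bucketing rule is an assumption of this sketch: Android itself
    does not define a state for values that fall between two constants.
    """
    if oom_score_adj < _THRESHOLDS[0]:
        raise ValueError("oom_score_adj below NATIVE_ADJ: %d" % oom_score_adj)
    # bisect_right returns the insertion point after any equal threshold,
    # so subtracting one gives the last threshold that is <= the value.
    index = bisect.bisect_right(_THRESHOLDS, oom_score_adj) - 1
    return _ADJ_LEVELS[index][1]


if __name__ == "__main__":
    for adj in (-1000, 0, 150, 905):
        print(adj, app_state_for_adj(adj))
```

Because the mapping is not 1:1, a label produced this way is only a hint about how important the process was to the user shortly before it was killed.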