1# CPU Scheduling events 2 3On Android and Linux Perfetto can gather scheduler traces via the Linux Kernel 4[ftrace](https://www.kernel.org/doc/Documentation/trace/ftrace.txt) 5infrastructure. 6 7This allows to get fine grained scheduling events such as: 8 9* Which threads were scheduling on which CPU cores at any point in time, with 10 nanosecond accuracy. 11* The reason why a running thread got descheduled (e.g. pre-emption, blocked on 12 a mutex, blocking syscall or any other wait queue). 13* The point in time when a thread became eligible to be executed, even if it was 14 not put immediately on any CPU run queue, together with the source thread that 15 made it executable. 16 17## UI 18 19When zoomed out, the UI shows a quantized view of CPU usage, which collapses the 20scheduling information: 21 22![](/docs/images/cpu-bar-graphs.png "Quantized view of CPU run queues") 23 24However, by zooming in, the individual scheduling events become visible: 25 26![](/docs/images/cpu-zoomed.png "Detailed view of CPU run queues") 27 28Clicking on a CPU slice shows the relevant information in the details panel: 29 30![](/docs/images/cpu-sched-details.png "CPU scheduling details") 31 32Scrolling down, when expanding individual processes, the scheduling events also 33create one track for each thread, which allows to follow the evolution of the 34state of individual threads: 35 36![](/docs/images/thread-states.png "States of individual threads") 37 38 39```protobuf 40data_sources { 41 config { 42 name: "linux.ftrace" 43 ftrace_config { 44 ftrace_events: "sched/sched_switch" 45 ftrace_events: "sched/sched_waking" 46 } 47 } 48} 49``` 50 51## SQL 52 53At the SQL level, the scheduling data is exposed in the 54[`sched_slice`](/docs/analysis/sql-tables.autogen#sched_slice) table. 55 56```sql 57select ts, dur, cpu, end_state, priority, process.name, thread.name 58from sched_slice left join thread using(utid) left join process using(upid) 59``` 60 61ts | dur | cpu | end_state | priority | process.name, | thread.name 62---|-----|-----|-----------|----------|---------------|------------ 63261187012170995 | 247188 | 2 | S | 130 | /system/bin/logd | logd.klogd 64261187012418183 | 12812 | 2 | D | 120 | /system/bin/traced_probes | traced_probes0 65261187012421099 | 220000 | 4 | D | 120 | kthreadd | kworker/u16:2 66261187012430995 | 72396 | 2 | D | 120 | /system/bin/traced_probes | traced_probes1 67261187012454537 | 13958 | 0 | D | 120 | /system/bin/traced_probes | traced_probes0 68261187012460318 | 46354 | 3 | S | 120 | /system/bin/traced_probes | traced_probes2 69261187012468495 | 10625 | 0 | R | 120 | [NULL] | swapper/0 70261187012479120 | 6459 | 0 | D | 120 | /system/bin/traced_probes | traced_probes0 71261187012485579 | 7760 | 0 | R | 120 | [NULL] | swapper/0 72261187012493339 | 34896 | 0 | D | 120 | /system/bin/traced_probes | traced_probes0 73 74## TraceConfig 75 76```protobuf 77data_sources: { 78 config { 79 name: "linux.ftrace" 80 ftrace_config { 81 ftrace_events: "sched/sched_switch" 82 ftrace_events: "sched/sched_process_exit" 83 ftrace_events: "sched/sched_process_free" 84 ftrace_events: "task/task_newtask" 85 ftrace_events: "task/task_rename" 86 } 87 } 88} 89 90# This is to get full process name and thread<>process relationships. 91data_sources: { 92 config { 93 name: "linux.process_stats" 94 } 95} 96``` 97 98## Scheduling wakeups and latency analysis 99 100By further enabling the following in the TraceConfig, the ftrace data source 101will record also scheduling wake up events: 102 103```protobuf 104 ftrace_events: "sched/sched_wakeup_new" 105 ftrace_events: "sched/sched_waking" 106``` 107 108While `sched_switch` events are emitted only when a thread is in the 109`R(unnable)` state AND is running on a CPU run queue, `sched_waking` events are 110emitted when any event causes a thread state to change. 111 112Consider the following example: 113 114``` 115Thread A 116condition_variable.wait() 117 Thread B 118 condition_variable.notify() 119``` 120 121When Thread A suspends on the wait() it will enter the state `S(sleeping)` and 122get removed from the CPU run queue. When Thread B notifies the variable, the 123kernel will transition Thread A into the `R(unnable)` state. Thread A at that 124point is eligible to be put back on a run queue. However this might not happen 125for some time because, for instance: 126 127* All CPUs might be busy running some other thread, and Thread A needs to wait 128 to get a run queue slot assigned (or the other threads have higher priority). 129* Some other CPUs other than the current one, but the scheduler load balancer 130 might take some time to move the thread on another CPU. 131 132Unless using real-time thread priorities, most Linux Kernel scheduler 133configurations are not strictly work-conserving. For instance the scheduler 134might prefer to wait some time in the hope that the thread running on the 135current CPU goes to idle, avoiding a cross-cpu migration which might be more 136costly both in terms of overhead and power. 137 138NOTE: `sched_waking` and `sched_wakeup` provide nearly the same information. The 139 difference lies in wakeup events across CPUs, which involve 140 inter-processor interrupts. The former is emitted on the source (wakee) 141 CPU, the latter on the destination (waked) CPU. `sched_waking` is usually 142 sufficient for latency analysis, unless you are looking into breaking down 143 latency due to inter-processor signaling. 144 145When enabling `sched_waking` events, the following will appear in the UI when 146selecting a CPU slice: 147 148![](/docs/images/latency.png "Scheduling wake-up events in the UI") 149 150