1# Trace-based metrics 2 3_The metrics subsystem is a part of the 4[trace processor](/docs/analysis/trace-processor.md) which uses traces to 5compute reproducible metrics. It can be used in a wide range of situations; 6examples include benchmarks, lab tests and on large corpuses of traces._ 7 8![Block diagram of metrics](/docs/images/metrics-summary.png) 9 10## Quickstart 11 12The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview 13on how to compute trace-based metrics traces using trace processor. 14 15## Introduction 16 17### Motivation 18 19Performance metrics are useful to monitor for the health of a system and ensure 20that a system does not regress over time as new features are added. 21 22However, metrics retrieved directly from the system have a downside: if there is 23a regression, it is difficult to root-cause the issue. Often, the problem may 24not be reproducible or may rely on a particular setup. 25 26Trace-based metrics are one possible solution to this problem. Instead of 27collecting metrics directly on the system, a trace is collected and metrics are 28computed from the trace. If a regression in the metric is spotted, the developer 29can look directly at the trace to understand why the regression has occurred 30instead of having to reproduce the issue. 31 32### Metric subsystem 33 34The metric subsystem is a part of the 35[trace processor](/docs/analysis/trace-processor.md) which executes SQL queries 36against traces and produces a metric which summarizes some performance attribute 37(e.g. CPU, memory, startup latency etc.). 38 39For example, generating the Android CPU metrics on a trace is as simple as: 40 41```python 42> ./trace_processor --run-metrics android_cpu <trace> 43android_cpu { 44 process_info { 45 name: "/system/bin/init" 46 threads { 47 name: "init" 48 core { 49 id: 1 50 metrics { 51 mcycles: 1 52 runtime_ns: 570365 53 min_freq_khz: 1900800 54 max_freq_khz: 1900800 55 avg_freq_khz: 1902017 56 } 57 } 58 ... 59 } 60 ... 61 } 62 ... 63} 64``` 65 66### Case for upstreaming 67 68NOTE: Googlers: for internal usage of metrics in Google3 (i.e. metrics which are 69confidential), please see [this internal page](https://goto.google.com/viecd). 70 71Authors are strongly encouraged to add all metrics derived on Perfetto traces to 72the Perfetto repo unless there is a clear usecase (e.g. confidentiality) why 73these metrics should not be publicly available. 74 75In return for upstreaming metrics, authors will have first class support for 76running metrics locally and the confidence that their metrics will remain stable 77as trace processor is developed. 78 79As well as scaling upwards while developing from running on a single trace 80locally to running on a large set of traces, the reverse is also very useful. 81When an anomaly is observed in the metrics of a lab benchmark, a representative 82trace can be downloaded and the same metric can be run locally in trace 83processor. 84 85Since the same code is running locally and remotely, developers can be confident 86in reproducing the issue and use the trace processor and/or the Perfetto UI to 87identify the problem. 88 89## Walkthrough: prototyping a metric 90 91TIP: To see how to add a new metric to trace processor, see the checklist 92[here](/docs/contributing/common-tasks.md#new-metric) 93 94This walkthrough will outline how to prototype a metric locally without needing 95to compile trace processor. This metric will compute the CPU time for every 96process in the trace and list the names of the top 5 processes (by CPU time) and 97the number of threads created by the process. 98 99NOTE: See this [GitHub gist][gist] to see how the code should look at the end of 100 the walkthrough. The prerequisites and Step 4 below give instructions on 101 how to get trace processor and run the metrics code. 102 103[gist]: https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080 104 105### Prerequisites 106 107As a setup step, create a folder to act as a scratch workspace; this folder will be referred to using the env variable `$WORKSPACE` in Step 4. 108 109The other requirement is trace processor. This can downloaded from [here](https://get.perfetto.dev/trace_processor) or can be built from source 110using the instructions [here](trace-processor.md). Whichever method is chosen, $TRACE_PROCESSOR env variable will be used to refer to the location of the binary in Step 4. 111 112### Step 1 113 114As all metrics in the metrics platform are defined using protos, the metric needs to be structured as a proto. For this metric, there needs to be some notion of a process name along with its CPU time and number of threads. 115 116Starting off, in a file named `top_five_processes.proto` in our workspace, create a basic proto message called ProcessInfo with those three fields: 117 118```protobuf 119message ProcessInfo { 120 optional string process_name = 1; 121 optional uint64 cpu_time_ms = 2; 122 optional uint32 num_threads = 3; 123} 124``` 125 126Next , create a wrapping message which will hold the repeated field containing the top 5 processes. 127 128```protobuf 129message TopProcesses { 130 repeated ProcessInfo process_info = 1; 131} 132``` 133 134Finally, define an extension to the root proto for all metrics (the [TraceMetrics](https://android.googlesource.com/platform/external/perfetto/+/HEAD/protos/perfetto/metrics/metrics.proto#39) proto). 135 136```protobuf 137extend TraceMetrics { 138 optional TopProcesses top_five_processes = 450; 139} 140``` 141 142Adding this extension field allows trace processor to link the newly defined 143metric to the `TraceMetrics` proto. 144 145_Notes:_ 146 147- The field ids 450-500 are reserved for local development so any of them can be used as the field id for the extension field. 148- The choice of field name here is important as the SQL file and the final table generated in SQL will be based on this name. 149 150Putting everything together, along with some boilerplate preamble gives: 151 152```protobuf 153syntax = "proto2"; 154 155package perfetto.protos; 156 157import "protos/perfetto/metrics/metrics.proto"; 158 159message ProcessInfo { 160 optional string process_name = 1; 161 optional int64 cpu_time_ms = 2; 162 optional uint32 num_threads = 3; 163} 164 165message TopProcesses { 166 repeated ProcessInfo process_info = 1; 167} 168 169extend TraceMetrics { 170 optional TopProcesses top_five_processes = 450; 171} 172``` 173 174### Step 2 175 176Next, write the SQL to generate the table of the top 5 processes ordered by the 177sum of the CPU time they ran for and the number of threads which were associated 178with the process. 179 180The following SQL should be added to a file called `top_five_processes.sql` in 181the workspace: 182 183```sql 184CREATE VIEW top_five_processes_by_cpu AS 185SELECT 186 process.name as process_name, 187 CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms, 188 COUNT(DISTINCT utid) as num_threads 189FROM sched 190INNER JOIN thread USING(utid) 191INNER JOIN process USING(upid) 192GROUP BY process.name 193ORDER BY cpu_time_ms DESC 194LIMIT 5; 195``` 196 197Let's break this query down: 198 1991. The first table used is the `sched` table. This contains all the scheduling 200 data available in the trace. Each scheduling "slice" is associated with a 201 thread which is uniquely identified in Perfetto traces using its `utid`. The 202 two pieces of information needed from the sched table are the `dur` - 203 short for duration, this is the amount of time the slice lasted - and the 204 `utid` which will be used to join with the thread table. 2052. The next table is the thread table. This gives us a lot of information which 206 is not particularly interesting (including its thread name) but it does give 207 us the `upid`. Similar to `utid`, `upid` is the unique identifier for a 208 process in a Perfetto trace. In this case, `upid` will refer to the process 209 which hosts the thread given by `utid`. 2103. The final table is the process table. This gives the name of the process 211 associated with the original sched slice. 2124. With the process, thread and duration for each sched slice, all the slices 213 for a single processes are collected and their durations summed to get the 214 CPU time (dividing by 1e6 as sched's duration is in nanoseconds) and 215 the number of distinct threads. 2165. Finally, we order by the cpu time and limit to the top 5 results. 217 218### Step 3 219 220Now that the result of the metric has been expressed as an SQL table, it needs 221to be converted to a proto. The metrics platform has built-in support for emitting 222protos using SQL functions; something which is used extensively in this step. 223 224Let's look at how it works for our table above. 225 226```sql 227CREATE VIEW top_five_processes_output AS 228SELECT TopProcesses( 229 'process_info', ( 230 SELECT RepeatedField( 231 ProcessInfo( 232 'process_name', process_name, 233 'cpu_time_ms', cpu_time_ms, 234 'num_threads', num_threads 235 ) 236 ) 237 FROM top_five_processes_by_cpu 238 ) 239); 240``` 241 242Breaking this down again: 243 2441. Starting from the inner-most SELECT statement, there is what looks like 245 a function call to the ProcessInfo function; in fact this is no coincidence. 246 For each proto that the metrics platform knows about, an SQL function is 247 generated with the same name as the proto. This function takes key value 248 pairs with the key as the name of the proto field to fill and the value being 249 the data to store in the field. The output is the proto created by writing 250 the fields described in the function. (\*) 251 252 In this case, this function is called once for each row in the 253 `top_five_processes_by_cpu` table. The output will be the fully filled 254 ProcessInfo proto. 255 256 The call to the `RepeatedField` function is the most interesting part and 257 also the most important. In technical terms, `RepeatedField` is an aggregate 258 function. Practically, this means that it takes a full table of values and 259 generates a single array which contains all the values passed to it. 260 261 Therefore, the output of this whole SELECT statement is an array of 5 262 ProcessInfo protos. 263 2642. Next is creation of the `TopProcesses` proto. By now, the syntax should 265 already feel somewhat familiar; the proto builder function is called to fill 266 in the `process_info` field with the array of protos from the inner function. 267 268 The output of this SELECT is a single `TopProcesses` proto containing the 269 ProcessInfos as a repeated field. 270 2713. Finally, the view is created. This view is specially named to allow the 272 metrics platform to query it to obtain the root proto for each metric 273 (in this case `TopProcesses`). See the note below as to the pattern behind 274 this view's name. 275 276(\*) _This is not strictly true. To type-check the protos, some metadata 277is returned about the type of the proto but this is unimportant for metric 278authors._ 279 280NOTE: It is important that the views be named {name of TraceMetrics extension 281 field}\_output. This is the pattern used and expected by the metrics 282 platform for all metrics. 283 284The final file should look like so: 285 286```sql 287CREATE VIEW top_five_processes_by_cpu AS 288SELECT 289 process.name as process_name, 290 CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms, 291 COUNT(DISTINCT utid) as num_threads 292FROM sched 293INNER JOIN thread USING(utid) 294INNER JOIN process USING(upid) 295GROUP BY process.name 296ORDER BY cpu_time_ms DESC 297LIMIT 5; 298 299CREATE top_five_processes_output AS 300SELECT TopProcesses( 301 'process_info', ( 302 SELECT RepeatedField( 303 ProcessInfo( 304 'process_name', process_name, 305 'cpu_time_ms', cpu_time_ms, 306 'num_threads', num_threads 307 ) 308 ) 309 FROM top_five_processes_by_cpu 310 ) 311); 312``` 313 314NOTE: The name of the SQL file should be the same as the name of TraceMetrics 315 extension field. This is to allow the metrics platform to associated the 316 proto extension field with the SQL which needs to be run to generate it. 317 318### Step 4 319 320For this step, invoke trace processor shell to run the metrics (see the 321[Quickstart](/docs/quickstart/trace-analysis.md) for downloading instructions): 322 323```shell 324$TRACE_PROCESSOR --run-metrics $WORKSPACE/top_five_processes.sql $TRACE 2> /dev/null 325``` 326 327(For an example trace to test this on, see the Notes section below.) 328 329By passing the SQL file for the metric to be computed, trace processor uses the name of this file to find the proto and to figure out the name of the output table for the proto and the name of the extension field for `TraceMetrics`; this is the reason it was important to choose the names of these other objects carefully. 330 331_Notes:_ 332 333- If something doesn't work as intended, check that the workspace looks the same as the contents of this [GitHub gist](https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080). 334- A good example trace for this metric is the Android example trace used by the Perfetto UI found [here](https://storage.googleapis.com/perfetto-misc/example_android_trace_30s_1). 335- stderr is redirected to remove any noise from parsing the trace that trace processor generates. 336 337If everything went successfully, the following output should be visible (specifically this is the output for the Android example trace linked above): 338 339``` 340[perfetto.protos.top_five_processes] { 341 process_info { 342 process_name: "com.google.android.GoogleCamera" 343 cpu_time_ms: 15154 344 num_threads: 125 345 } 346 process_info { 347 process_name: "sugov:4" 348 cpu_time_ms: 6846 349 num_threads: 1 350 } 351 process_info { 352 process_name: "system_server" 353 cpu_time_ms: 6809 354 num_threads: 66 355 } 356 process_info { 357 process_name: "cds_ol_rx_threa" 358 cpu_time_ms: 6684 359 num_threads: 1 360 } 361 process_info { 362 process_name: "com.android.chrome" 363 cpu_time_ms: 5125 364 num_threads: 49 365 } 366} 367``` 368 369### Next steps 370 371* The [common tasks](/docs/contributing/common-tasks.md) page gives a list of 372 steps on how new metrics can be added to the trace processor. 373