# Trace-based metrics

_The metrics subsystem is a part of the
[trace processor](/docs/analysis/trace-processor.md) which uses traces to
compute reproducible metrics. It can be used in a wide range of situations;
examples include benchmarks, lab tests and analysis of large corpora of traces._

![Block diagram of metrics](/docs/images/metrics-summary.png)

## Quickstart

The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview
of how to compute trace-based metrics using trace processor.

## Introduction

### Motivation

Performance metrics are useful for monitoring the health of a system and for
ensuring that it does not regress over time as new features are added.

However, metrics retrieved directly from the system have a downside: if there is
a regression, it is difficult to root-cause the issue. Often, the problem may
not be reproducible or may rely on a particular setup.

Trace-based metrics are one possible solution to this problem. Instead of
collecting metrics directly on the system, a trace is collected and metrics are
computed from the trace. If a regression in the metric is spotted, the developer
can look directly at the trace to understand why the regression has occurred
instead of having to reproduce the issue.

### Metric subsystem

The metric subsystem is a part of the
[trace processor](/docs/analysis/trace-processor.md) which executes SQL queries
against traces and produces a metric which summarizes some performance attribute
(e.g. CPU, memory, startup latency etc.).

For example, generating the Android CPU metrics on a trace is as simple as:

```python
> ./trace_processor --run-metrics android_cpu <trace>
android_cpu {
  process_info {
    name: "/system/bin/init"
    threads {
      name: "init"
      core {
        id: 1
        metrics {
          mcycles: 1
          runtime_ns: 570365
          min_freq_khz: 1900800
          max_freq_khz: 1900800
          avg_freq_khz: 1902017
        }
      }
      ...
    }
    ...
  }
  ...
}
```

### Case for upstreaming

Authors are strongly encouraged to add all metrics derived from Perfetto traces
to the Perfetto repo unless there is a clear reason (e.g. confidentiality) why
these metrics should not be publicly available.

In return for upstreaming metrics, authors will have first-class support for
running metrics locally and the confidence that their metrics will remain stable
as trace processor is developed.

As well as scaling up during development, from running on a single trace locally
to running on a large set of traces, the reverse is also very useful. When an
anomaly is observed in the metrics of a lab benchmark, a representative trace
can be downloaded and the same metric can be run locally in trace processor.

Since the same code is running locally and remotely, developers can be confident
in reproducing the issue and can use the trace processor and/or the Perfetto UI
to identify the problem.

## Walkthrough: prototyping a metric

TIP: To see how to add a new metric to trace processor, see the checklist
[here](/docs/contributing/common-tasks.md#new-metric).

This walkthrough will outline how to prototype a metric locally without needing
to compile trace processor. This metric will compute the CPU time for every
process in the trace and list the names of the top 5 processes (by CPU time)
along with the number of threads created by each process.

NOTE: See this [GitHub gist][gist] for how the code should look at the end of
      the walkthrough. The prerequisites and Step 4 below give instructions on
      how to get trace processor and run the metrics code.

[gist]: https://gist.github.com/tilal6991/c221cf0cae17e298dfa82b118edf9080

### Prerequisites

As a setup step, create a folder to act as a scratch workspace; this folder will be referred to using the env variable `$WORKSPACE` in Step 4.

The other requirement is trace processor. This can be downloaded from [here](https://get.perfetto.dev/trace_processor) or built from source
using the instructions [here](trace-processor.md). Whichever method is chosen, the `$TRACE_PROCESSOR` env variable will be used to refer to the location of the binary in Step 4.

### Step 1

As all metrics in the metrics platform are defined using protos, the metric needs to be structured as a proto. For this metric, there needs to be some notion of a process name along with its CPU time and number of threads.

Starting off, in a file named `top_five_processes.proto` in our workspace, create a basic proto message called `ProcessInfo` with those three fields:

```protobuf
message ProcessInfo {
  optional string process_name = 1;
  optional int64 cpu_time_ms = 2;
  optional uint32 num_threads = 3;
}
```

Next, create a wrapping message which will hold the repeated field containing the top 5 processes.

```protobuf
message TopProcesses {
  repeated ProcessInfo process_info = 1;
}
```

Finally, define an extension to the root proto for all metrics (the [TraceMetrics](https://android.googlesource.com/platform/external/perfetto/+/HEAD/protos/perfetto/metrics/metrics.proto#39) proto).

```protobuf
extend TraceMetrics {
  optional TopProcesses top_processes = 450;
}
```

Adding this extension field allows trace processor to link the newly defined
metric to the `TraceMetrics` proto.

_Notes:_

- The field ids 450-500 are reserved for local development so any of them can be used as the field id for the extension field.
- The choice of field name here is important as the SQL file and the final table generated in SQL will be based on this name.

Putting everything together, along with some boilerplate preamble, gives:

```protobuf
syntax = "proto2";

package perfetto.protos;

import "protos/perfetto/metrics/metrics.proto";

message ProcessInfo {
  optional string process_name = 1;
  optional int64 cpu_time_ms = 2;
  optional uint32 num_threads = 3;
}

message TopProcesses {
  repeated ProcessInfo process_info = 1;
}

extend TraceMetrics {
  optional TopProcesses top_processes = 450;
}
```

### Step 2

Next, write the SQL to generate the table of the top 5 processes ordered by the
sum of the CPU time they ran for, along with the number of threads associated
with each process.

The following SQL should be added to a file called `top_five_processes.sql` in
the workspace:

```sql
CREATE VIEW top_five_processes_by_cpu AS
SELECT
  process.name as process_name,
  CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms,
  COUNT(DISTINCT utid) as num_threads
FROM sched
INNER JOIN thread USING(utid)
INNER JOIN process USING(upid)
GROUP BY process.name
ORDER BY cpu_time_ms DESC
LIMIT 5;
```

Let's break this query down:

1. The first table used is the `sched` table. This contains all the scheduling
   data available in the trace. Each scheduling "slice" is associated with a
   thread which is uniquely identified in Perfetto traces using its `utid`. The
   two pieces of information needed from the sched table are the `dur` (short
   for duration, the amount of time the slice lasted) and the `utid` which will
   be used to join with the thread table.
2. The next table is the thread table. This gives us a lot of information which
   is not particularly interesting (including its thread name) but it does give
   us the `upid`. Similar to `utid`, `upid` is the unique identifier for a
   process in a Perfetto trace. In this case, `upid` will refer to the process
   which hosts the thread given by `utid`.
3. The final table is the process table. This gives the name of the process
   associated with the original sched slice.
4. With the process, thread and duration for each sched slice, all the slices
   for a single process are collected and their durations summed to get the
   CPU time (dividing by 1e6 as sched's duration is in nanoseconds) and
   the number of distinct threads.
5. Finally, we order by the CPU time and limit to the top 5 results.
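
Before wiring this view into a proto, it can be handy to eyeball the joined data
directly. The following query is not part of the metric; it is just a minimal
sanity-check sketch (assuming the standard `sched`, `thread` and `process`
tables described above) that can be run interactively in trace processor shell:

```sql
-- Inspect a few joined rows to confirm that sched slices resolve to thread
-- and process names before adding the aggregation.
SELECT
  process.name AS process_name,
  thread.name AS thread_name,
  sched.dur AS dur_ns
FROM sched
INNER JOIN thread USING(utid)
INNER JOIN process USING(upid)
LIMIT 10;
```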

### Step 3

Now that the result of the metric has been expressed as an SQL table, it needs
to be converted to a proto. The metrics platform has built-in support for
emitting protos using SQL functions, something which is used extensively in
this step.

Let's look at how it works for our table above.

```sql
CREATE VIEW top_processes_output AS
SELECT TopProcesses(
  'process_info', (
    SELECT RepeatedField(
      ProcessInfo(
        'process_name', process_name,
        'cpu_time_ms', cpu_time_ms,
        'num_threads', num_threads
      )
    )
    FROM top_five_processes_by_cpu
  )
);
```

Breaking this down again:

1. Starting from the inner-most SELECT statement, there is what looks like a
   function call to the `ProcessInfo` function; in fact this is no coincidence.
   For each proto that the metrics platform knows about, an SQL function is
   generated with the same name as the proto. This function takes key-value
   pairs, with the key being the name of the proto field to fill and the value
   being the data to store in the field. The output is the proto created by
   writing the fields described in the function. (\*)

   In this case, this function is called once for each row in the
   `top_five_processes_by_cpu` table. The output will be the fully-filled
   `ProcessInfo` proto.

   The call to the `RepeatedField` function is the most interesting part and
   also the most important. In technical terms, `RepeatedField` is an aggregate
   function. Practically, this means that it takes a full table of values and
   generates a single array which contains all the values passed to it.

   Therefore, the output of this whole SELECT statement is an array of 5
   `ProcessInfo` protos.

2. Next is the creation of the `TopProcesses` proto. By now, the syntax should
   already feel somewhat familiar; the proto builder function is called to fill
   in the `process_info` field with the array of protos from the inner function.

   The output of this SELECT is a single `TopProcesses` proto containing the
   `ProcessInfo` protos as a repeated field.

3. Finally, the view is created. This view is specially named to allow the
   metrics platform to query it to obtain the root proto for each metric
   (in this case `TopProcesses`). See the note below as to the pattern behind
   this view's name.

(\*) _This is not strictly true. To type-check the protos, some metadata
is returned about the type of the proto but this is unimportant for metric
authors._
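
To get a feel for the builder functions, the sketch below calls the generated
`ProcessInfo` function directly on literal values. This is only a hypothetical
debugging aid, assuming the metric's protos have already been registered (for
example when the file is run via `--run-metrics`); the result is an intermediate
proto blob rather than readable text.

```sql
-- Hedged sketch: build a single ProcessInfo proto by hand. Useful only for
-- checking that the field names are spelled correctly; the literal values
-- here are made up for illustration.
SELECT ProcessInfo(
  'process_name', 'com.example.app',
  'cpu_time_ms', 1234,
  'num_threads', 7
) AS example_proto;
```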

NOTE: It is important that the views be named {name of TraceMetrics extension
      field}\_output. This is the pattern used and expected by the metrics
      platform for all metrics.

The final file should look like so:

```sql
CREATE VIEW top_five_processes_by_cpu AS
SELECT
  process.name as process_name,
  CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms,
  COUNT(DISTINCT utid) as num_threads
FROM sched
INNER JOIN thread USING(utid)
INNER JOIN process USING(upid)
GROUP BY process.name
ORDER BY cpu_time_ms DESC
LIMIT 5;

CREATE VIEW top_processes_output AS
SELECT TopProcesses(
  'process_info', (
    SELECT RepeatedField(
      ProcessInfo(
        'process_name', process_name,
        'cpu_time_ms', cpu_time_ms,
        'num_threads', num_threads
      )
    )
    FROM top_five_processes_by_cpu
  )
);
```

NOTE: The name of the SQL file should be the same as the name of the
      TraceMetrics extension field. This is to allow the metrics platform to
      associate the proto extension field with the SQL which needs to be run to
      generate it.

### Step 4

For this step, invoke trace processor shell to run the metrics (see the
[Quickstart](/docs/quickstart/trace-analysis.md) for downloading instructions):

```shell
$TRACE_PROCESSOR --run-metrics $WORKSPACE/top_five_processes.sql $TRACE 2> /dev/null
```

(For an example trace to test this on, see the Notes section below.)

Trace processor uses the name of the SQL file passed on the command line to find the proto, the name of the output table for that proto and the name of the extension field for `TraceMetrics`; this is why it was important to choose these names carefully.

_Notes:_

- If something doesn't work as intended, check that the workspace looks the same as the contents of this [GitHub gist](https://gist.github.com/tilal6991/c221cf0cae17e298dfa82b118edf9080).
- A good example trace for this metric is the Android example trace used by the Perfetto UI found [here](https://storage.googleapis.com/perfetto-misc/example_android_trace_30s_1).
- stderr is redirected to suppress the noise that trace processor generates while parsing the trace.

If everything worked, the following output should be visible (specifically, this is the output for the Android example trace linked above):

```
[perfetto.protos.top_five_processes] {
  process_info {
    process_name: "com.google.android.GoogleCamera"
    cpu_time_ms: 15154
    num_threads: 125
  }
  process_info {
    process_name: "sugov:4"
    cpu_time_ms: 6846
    num_threads: 1
  }
  process_info {
    process_name: "system_server"
    cpu_time_ms: 6809
    num_threads: 66
  }
  process_info {
    process_name: "cds_ol_rx_threa"
    cpu_time_ms: 6684
    num_threads: 1
  }
  process_info {
    process_name: "com.android.chrome"
    cpu_time_ms: 5125
    num_threads: 49
  }
}
```

### Next steps

* The [common tasks](/docs/contributing/common-tasks.md) page gives a list of
  steps on how new metrics can be added to the trace processor.