# Trace Processor (C++)

_The Trace Processor is a C++ library
([/src/trace_processor](/src/trace_processor)) that ingests traces encoded in a
wide variety of formats and exposes an SQL interface for querying trace events
contained in a consistent set of tables. It also has other features including
computation of summary metrics, annotating the trace with user-friendly
descriptions and deriving new events from the contents of the trace._

![Trace processor block diagram](/docs/images/trace-processor.png)

## Quickstart

The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview
of how to run SQL queries against traces using trace processor.

## Introduction

Events in a trace are optimized for fast, low-overhead recording. Therefore
traces need significant data processing to extract meaningful information from
them. This is compounded by the number of legacy formats which are still in
use and need to be supported in trace analysis tools.

The trace processor abstracts this complexity by parsing traces, extracting
the data inside, and exposing it in a set of database tables which can be
queried with SQL.

Features of the trace processor include:

* Execution of SQL queries on a custom, in-memory, columnar database backed by
  the SQLite query engine.
* A metrics subsystem which allows computation of summarized views of the
  trace (e.g. CPU or memory usage of a process, time taken for app startup
  etc.).
* Annotating events in the trace with user-friendly descriptions, providing
  context and explanation of events to newer users.
* Creation of new events derived from the contents of the trace.

The formats supported by trace processor include:

* Perfetto native protobuf format
* Linux ftrace
* Android systrace
* Chrome JSON (including JSON embedding Android systrace text)
* Fuchsia binary format
* [Ninja](https://ninja-build.org/) logs (the build system)

The trace processor is embedded in a wide variety of trace analysis tools,
including:

* [trace_processor](/docs/analysis/trace-processor.md), a standalone binary
  providing a shell interface (and the reference embedder).
* [Perfetto UI](https://ui.perfetto.dev), in the form of a WebAssembly module.
* [Android GPU Inspector](https://gpuinspector.dev/).
* [Android Studio](https://developer.android.com/studio/).

## Concepts

The trace processor has some foundational terminology and concepts which are
used in the rest of the documentation.

### Events

In the most general sense, a trace is simply a collection of timestamped
"events". Events can have associated metadata and context which allows them to
be interpreted and analyzed. Timestamps are in nanoseconds; the values
themselves depend on the [clock][primary_trace_clock] selected in TraceConfig.

Events form the foundation of trace processor and are one of two types: slices
and counters.

[primary_trace_clock]: https://cs.android.com/android/platform/superproject/main/+/main:external/perfetto/protos/perfetto/config/trace_config.proto;l=114;drc=c74c8cf69e20d7b3261fb8c5ab4d057e8badce3e
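
As a sketch of how these timestamps are typically used, the query below
rebases slice timestamps so that zero is the start of the trace; it assumes
the built-in `trace_bounds` table (with a `start_ts` column) and the `slice`
table described later in this page:

```sql
-- Rebase slice timestamps relative to the start of the trace.
-- Assumes trace_bounds exposes a start_ts column.
SELECT
  ts - (SELECT start_ts FROM trace_bounds) AS ts_from_start,
  dur,
  name
FROM slice;
```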

#### Slices

![Examples of slices](/docs/images/slices.png)

A slice refers to an interval of time with some data describing what was
happening in that interval. Some examples of slices include:

* Scheduling slices for each CPU
* Atrace slices on Android
* Userspace slices from Chrome

#### Counters

![Examples of counters](/docs/images/counters.png)

A counter is a continuous value which varies over time. Some examples of
counters include:

* CPU frequency for each CPU core
* RSS memory events - both from the kernel and polled from /proc/stats
* Atrace counter events from Android
* Chrome counter events

### Tracks

A track is a named partition of events of the same type and the same associated
context. For example:

* Scheduling slices have one track for each CPU
* Synchronous userspace slices have one track for each thread which emitted an
  event
* Async userspace slices have one track for each “cookie” linking a set of
  async events

The most intuitive way to think of a track is to imagine how it would be drawn
in a UI; if all the events are in a single row, they belong to the same track.
For example, all the scheduling events for CPU 5 are on the same track:

![CPU slices track](/docs/images/cpu-slice-track.png)

Tracks can be split into various types based on the type of event they contain
and the context they are associated with. Examples include:

* Global tracks are not associated to any context and contain slices
* Thread tracks are associated to a single thread and contain slices
* Counter tracks are not associated to any context and contain counters
* CPU counter tracks are associated to a single CPU and contain counters

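Each of these track types has a corresponding table which carries its context
as a column. As a sketch (treat the exact names as illustrative; the table
reference documents the full schemas), listing thread tracks together with the
thread they belong to looks like:

```sql
-- thread_track rows carry the utid of their owning thread.
SELECT id, name, utid
FROM thread_track;
```
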
### Thread and process identifiers

The handling of threads and processes needs special care when considered in the
context of tracing; identifiers for threads and processes (e.g. `pid`/`tgid` and
`tid` in Android/macOS/Linux) can be reused by the operating system over the
course of a trace. This means they cannot be relied upon as a unique identifier
when querying tables in trace processor.

To solve this problem, the trace processor uses `utid` (_unique_ tid) for
threads and `upid` (_unique_ pid) for processes. All references to threads and
processes (e.g. in CPU scheduling data, thread tracks) use `utid` and `upid`
instead of the system identifiers.
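
For example, scheduling data refers to threads by `utid`; a sketch of
recovering the OS-level `tid` for each scheduling slice (using the `sched` and
`thread` tables described below) looks like:

```sql
-- Resolve the utid on each scheduling slice back to its tid and name.
SELECT sched.ts, sched.dur, thread.tid, thread.name
FROM sched
JOIN thread USING(utid);
```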

## Writing Queries

### Context using tracks

A common question when querying tables in trace processor is: "how do I obtain
the process or thread for a slice?". Phrased more generally, the question is
"how do I get the context for an event?".

In trace processor, any context associated with all events on a track is found
on the associated `track` tables.

For example, to obtain the `utid` of any thread which emitted a `measure`
slice:

```sql
SELECT utid
FROM slice
JOIN thread_track ON thread_track.id = slice.track_id
WHERE slice.name = 'measure'
```

Similarly, to obtain the `upid`s of any process which has a `mem.swap` counter
greater than 1000:

```sql
SELECT upid
FROM counter
JOIN process_counter_track ON process_counter_track.id = counter.track_id
WHERE process_counter_track.name = 'mem.swap' AND value > 1000
```

### Thread and process tables

While obtaining `utid`s and `upid`s is a step in the right direction, generally
users want the original `tid`, `pid`, and process/thread names.

The `thread` and `process` tables map `utid`s and `upid`s to threads and
processes respectively. For example, to look up the thread with `utid` 10:

```sql
SELECT tid, name
FROM thread
WHERE utid = 10
```

The `thread` and `process` tables can also be joined directly with the
associated track tables to jump from the slice or counter to the information
about processes and threads.

For example, to get a list of all the threads which emitted a `measure` slice:

```sql
SELECT thread.name AS thread_name
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING(utid)
WHERE slice.name = 'measure'
GROUP BY thread_name
```

## Helper functions

Helper functions are implemented in C++ and reduce the amount of boilerplate
which needs to be written in SQL.

### Extract args

`EXTRACT_ARG` is a helper function which retrieves a property of an
event (e.g. slice or counter) from the `args` table.

It takes an `arg_set_id` and `key` as input and returns the value looked
up in the `args` table.

For example, to retrieve the `prev_comm` field for `sched_switch` events in
the `ftrace_event` table:

```sql
SELECT EXTRACT_ARG(arg_set_id, 'prev_comm')
FROM ftrace_event
WHERE name = 'sched_switch'
```

Behind the scenes, the above query desugars to the following:

```sql
SELECT
  (
    SELECT string_value
    FROM args
    WHERE key = 'prev_comm' AND args.arg_set_id = ftrace_event.arg_set_id
  )
FROM ftrace_event
WHERE name = 'sched_switch'
```

NOTE: while convenient, `EXTRACT_ARG` can be inefficient compared to a `JOIN`
when working with very large tables; a function call is required for every
row, which will be slower than the batch filters/sorts used by `JOIN`.
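
Where performance matters, the same lookup can be phrased as an explicit
`JOIN` against the `args` table; the sketch below is equivalent to the
`EXTRACT_ARG` query above:

```sql
-- Equivalent join-based lookup: the filter on key and the join on
-- arg_set_id run as batch operations rather than per-row function calls.
SELECT args.string_value AS prev_comm
FROM ftrace_event
JOIN args USING(arg_set_id)
WHERE ftrace_event.name = 'sched_switch' AND args.key = 'prev_comm'
```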

## Operator tables

SQL queries are usually sufficient to retrieve data from trace processor.
Sometimes though, certain constructs can be difficult to express in pure SQL.

In these situations, trace processor has special "operator tables" which solve
a particular problem in C++ but expose an SQL interface for queries to take
advantage of.

### Span join

Span join is a custom operator table which computes the intersection of
spans of time from two tables or views. A span in this context is a row in a
table/view which contains "ts" (timestamp) and "dur" (duration) columns.

A column (called the *partition*) can optionally be specified which divides the
rows from each table into partitions before computing the intersection.

![Span join block diagram](/docs/images/span-join.png)

```sql
-- Get all the scheduling slices.
CREATE VIEW sp_sched AS
SELECT ts, dur, cpu, utid
FROM sched;

-- Get all the cpu frequency slices.
CREATE VIEW sp_frequency AS
SELECT
  ts,
  lead(ts) OVER (PARTITION BY track_id ORDER BY ts) - ts AS dur,
  cpu,
  value AS freq
FROM counter
JOIN cpu_counter_track ON counter.track_id = cpu_counter_track.id
WHERE cpu_counter_track.name = 'cpufreq';

-- Create the span joined table which combines cpu frequency with
-- scheduling slices.
CREATE VIRTUAL TABLE sched_with_frequency
USING SPAN_JOIN(sp_sched PARTITIONED cpu, sp_frequency PARTITIONED cpu);

-- This span joined table can be queried as normal and has the columns from
-- both tables.
SELECT ts, dur, cpu, utid, freq
FROM sched_with_frequency;
```

NOTE: A partition can be specified on neither, either, or both tables. If
specified on both, the same column name has to be used in each table.

WARNING: An important restriction on span joined tables is that spans from
the same table in the same partition *cannot* overlap. For performance
reasons, span join does not attempt to detect and error out in this situation;
instead, incorrect rows will silently be produced.

WARNING: Partitions must be integers. Importantly, string partitions are *not*
supported; note that strings *can* be converted to integers by
applying the `HASH` function to the string column.
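
As a sketch, a view which wants to partition on a string column (here the
slice `name` column, chosen purely for illustration) can hash it first:

```sql
-- HASH converts the string name into an integer usable as a partition.
CREATE VIEW sp_named AS
SELECT ts, dur, HASH(name) AS name_hash
FROM slice;
```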

Left and outer span joins are also supported; both function analogously to
the left and outer joins from SQL.

```sql
-- Left table partitioned + right table unpartitioned.
CREATE VIRTUAL TABLE left_join
USING SPAN_LEFT_JOIN(table_a PARTITIONED a, table_b);

-- Both tables unpartitioned.
CREATE VIRTUAL TABLE outer_join
USING SPAN_OUTER_JOIN(table_x, table_y);
```

NOTE: there is a subtlety if the partitioned table is empty and is
either a) part of an outer join or b) on the right side of a left join.
In this case, *no* slices will be emitted even if the other table is
non-empty. This approach was decided as being the most natural
after considering how span joins are used in practice.

### Ancestor slice

`ancestor_slice` is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and computes
all slices on the same track that are direct parents above that id (i.e. given
a slice id, it will return as rows all slices that can be found by following
the `parent_id` column up to the top-level slice (depth = 0)).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice given a bunch of slices
of interest:

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_slices LEFT JOIN
  ancestor_slice(interesting_slices.id) AS ancestor ON ancestor.depth = 0
```

### Ancestor slice by stack

`ancestor_slice_by_stack` is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id, then, for each id, computes
all the ancestor slices similarly to
[ancestor_slice](/docs/analysis/trace-processor#ancestor-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice of all slices with the
given name:

```sql
CREATE VIEW interesting_stack_ids AS
SELECT stack_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_stack_ids LEFT JOIN
  ancestor_slice_by_stack(interesting_stack_ids.stack_id) AS ancestor
  ON ancestor.depth = 0
```

### Descendant slice

`descendant_slice` is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and
computes all slices on the same track that are nested under that id (i.e.
all slices that are on the same track in the same time frame with a depth
greater than the given slice's depth).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the number of slices under each slice of
interest:

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *,
  (
    SELECT
      COUNT(*) AS total_descendants
    FROM descendant_slice(interesting_slices.id)
  )
FROM interesting_slices
```

### Descendant slice by stack

`descendant_slice_by_stack` is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id, then, for each id, computes
all the descendant slices similarly to
[descendant_slice](/docs/analysis/trace-processor#descendant-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the next level descendant of all slices with
the given name:

```sql
CREATE VIEW interesting_stacks AS
SELECT stack_id, depth
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_stacks LEFT JOIN
  descendant_slice_by_stack(interesting_stacks.stack_id) AS descendant
  ON descendant.depth = interesting_stacks.depth + 1
```

### Connected/Following/Preceding flows

`DIRECTLY_CONNECTED_FLOW`, `FOLLOWING_FLOW` and `PRECEDING_FLOW` are custom
operator tables that take a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and collect
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that
are directly or indirectly connected to the given starting slice.

`DIRECTLY_CONNECTED_FLOW(start_slice_id)` - contains all entries of the
[flow table](/docs/analysis/sql-tables.autogen#flow) that are present in any
chain of the kind `flow[0] -> flow[1] -> ... -> flow[n]`, where
`flow[i].slice_out = flow[i+1].slice_in` and `flow[0].slice_out = start_slice_id
OR start_slice_id = flow[n].slice_in`.

NOTE: Unlike the following/preceding flow functions, this function will not
include flows connected to ancestors or descendants while searching for flows
from a slice. It only includes the slices in the directly connected chain.

`FOLLOWING_FLOW(start_slice_id)` - contains all flows which can be reached from
a given slice by recursively following from a flow's outgoing slice to its
incoming one and from a reached slice to its child. The returned table contains
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that
are present in any chain of the kind `flow[0] -> flow[1] -> ... -> flow[n]`,
where `flow[i+1].slice_out IN DESCENDANT_SLICE(flow[i].slice_in) OR
flow[i+1].slice_out = flow[i].slice_in` and `flow[0].slice_out IN
DESCENDANT_SLICE(start_slice_id) OR flow[0].slice_out = start_slice_id`.

`PRECEDING_FLOW(start_slice_id)` - contains all flows which can be reached from
a given slice by recursively following from a flow's incoming slice to its
outgoing one and from a reached slice to its parent. The returned table
contains all entries of the
[flow table](/docs/analysis/sql-tables.autogen#flow) that are present in any
chain of the kind `flow[n] -> flow[n-1] -> ... -> flow[0]`, where
`flow[i].slice_in IN ANCESTOR_SLICE(flow[i+1].slice_out) OR flow[i].slice_in =
flow[i+1].slice_out` and `flow[0].slice_in IN ANCESTOR_SLICE(start_slice_id) OR
flow[0].slice_in = start_slice_id`.

```sql
-- Number of following flows for each slice.
SELECT (SELECT COUNT(*) FROM FOLLOWING_FLOW(slice.id)) AS following FROM slice;
```

## Metrics

TIP: To see how to add a new metric to trace processor, see the checklist
[here](/docs/contributing/common-tasks.md#new-metric).

The metrics subsystem is a significant part of trace processor and thus is
documented on its own [page](/docs/analysis/metrics.md).

## Python API

The trace processor's C++ library is also exposed through Python. This
is documented on a [separate page](/docs/analysis/trace-processor-python.md).

## Testing

Trace processor is mainly tested in two ways:

1. Unit tests of low-level building blocks
2. "Diff" tests which parse traces and check the output of queries

### Unit tests

Unit testing trace processor is the same as in other parts of Perfetto and
other C++ projects. However, unlike the rest of Perfetto, unit testing is
relatively light in trace processor.

We have discovered over time that unit tests are generally too brittle
when dealing with code which parses traces, leading to painful, mechanical
changes being needed when refactorings happen.

Because of this, we choose to focus on diff tests for most areas (e.g.
parsing events, testing the schema of tables, testing metrics etc.) and only
use unit testing for the low-level building blocks on which the rest of
trace processor is built.

### Diff tests

Diff tests are essentially integration tests for trace processor and the
main way trace processor is tested.

Each diff test takes as input a) a trace file and b) a query file *or* a
metric name. It runs `trace_processor_shell` to parse the trace and then
executes the query/metric. The result is then compared to a 'golden' file and
any difference is highlighted.

All diff tests are organized under [test/trace_processor](/test/trace_processor)
in `tests{_category name}.py` files as methods of a class in each file
and are run by the script
[`tools/diff_test_trace_processor.py`](/tools/diff_test_trace_processor.py).
To add a new test, it's enough to add a new method starting with `test_` in a
suitable Python tests file.

Methods can't take arguments and have to return a `DiffTestBlueprint`:

```python
class DiffTestBlueprint:
  trace: Union[Path, Json, Systrace, TextProto]
  query: Union[str, Path, Metric]
  out: Union[Path, Json, Csv, TextProto]
```

*Trace* and *Out*: for every type apart from `Path`, the contents of the
object are treated as file contents, so they have to follow the same rules.

*Query*: for metric tests it is enough to provide the metric name. For query
tests, provide either a raw SQL statement (for example `"SELECT * FROM SLICE"`)
or a path to an `.sql` file.

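Putting this together, a new test method looks like the sketch below. To keep
the sketch self-contained, it defines minimal stand-ins for
`DiffTestBlueprint` and `Csv` mirroring the shape documented above; the real
classes live in Perfetto's Python test infrastructure, and the trace path,
query and golden output here are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Csv:
  # Stand-in for the real Csv wrapper: contents are treated as file contents.
  contents: str


@dataclass
class DiffTestBlueprint:
  # Stand-in mirroring the documented blueprint shape.
  trace: str  # a path here; Json/Systrace/TextProto contents also allowed
  query: str  # raw SQL, a path to an .sql file, or a metric name
  out: Csv    # the 'golden' output the result is diffed against


# Test methods take no arguments, start with `test_` and return a blueprint.
def test_slice_names():
  return DiffTestBlueprint(
      trace='path/to/example_trace.pftrace',  # hypothetical trace file
      query='SELECT name FROM slice ORDER BY name LIMIT 1;',
      out=Csv('"name"\n"measure"\n'))
```
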
NOTE: `trace_processor_shell` and the associated proto descriptors need to be
built before running `tools/diff_test_trace_processor.py`. The easiest way
to do this is to run `tools/ninja -C <out directory>` both initially and on
every change to trace processor code or builtin metrics.

#### Choosing where to add diff tests

The `diff_tests/` folder contains four directories corresponding to different
areas of trace processor:

1. __stdlib__: Tests focusing on testing the Perfetto Standard Library, both
   the prelude and the regular modules. The subdirectories in this folder
   should generally correspond to directories in `perfetto_sql/stdlib`.
2. __parser__: Tests focusing on ensuring that different trace files are
   parsed correctly and the corresponding built-in tables are populated.
3. __metrics__: Tests focusing on testing metrics located in
   `trace_processor/metrics/sql`. This organisation is mostly historical
   and the code (and corresponding tests) is expected to move to `stdlib`
   over time.
4. __syntax__: Tests focusing on testing the core syntax of PerfettoSQL
   (i.e. `CREATE PERFETTO TABLE` or `CREATE PERFETTO FUNCTION`).

__Scenario__: A new stdlib module `foo/bar.sql` is being added.

_Answer_: Add the test to the `stdlib/foo/bar_tests.py` file.

__Scenario__: A new event is being parsed and the focus of the test is to
ensure the event is parsed correctly.

_Answer_: Add the test in one of the `parser` subdirectories. Prefer adding a
test to an existing related directory (i.e. `sched`, `power`) if one exists.

__Scenario__: A new metric is being added and the focus of the test is to
ensure the metric is being correctly computed.

_Answer_: Add the test in one of the `metrics` subdirectories. Prefer adding a
test to an existing related directory if one exists. Also consider adding the
code in question to stdlib.

__Scenario__: A new dynamic table is being added and the focus of the test is
to ensure the dynamic table is being correctly computed.

_Answer_: Add the test to the `stdlib/dynamic_tables` folder.

__Scenario__: The internals of trace processor are being modified and the test
is to ensure the trace processor is correctly filtering/sorting important
built-in tables.

_Answer_: Add the test to the `parser/core_tables` folder.
557