# Trace Processor (C++)

_The Trace Processor is a C++ library
([/src/trace_processor](/src/trace_processor)) that ingests traces encoded in a
wide variety of formats and exposes an SQL interface for querying trace events
contained in a consistent set of tables. It also has other features including
computation of summary metrics, annotating the trace with user-friendly
descriptions and deriving new events from the contents of the trace._

## Quickstart

The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview
on how to run SQL queries against traces using trace processor.

## Introduction

Events in a trace are optimized for fast, low-overhead recording. Therefore
traces need significant data processing to extract meaningful information from
them. This is compounded by the number of legacy formats which are still in use
and need to be supported in trace analysis tools.

The trace processor abstracts this complexity by parsing traces, extracting the
data inside, and exposing it in a set of database tables which can be queried
with SQL.

Features of the trace processor include:

* Execution of SQL queries on a custom, in-memory, columnar database backed by
  the SQLite query engine.
* Metrics subsystem which allows computation of summarized views of the trace
  (e.g. CPU or memory usage of a process, time taken for app startup etc.).
* Annotating events in the trace with user-friendly descriptions, providing
  context and explanation of events to newer users.
* Creation of new events derived from the contents of the trace.

The formats supported by trace processor include:

* Perfetto native protobuf format
* Linux ftrace
* Android systrace
* Chrome JSON (including JSON embedding Android systrace text)
* Fuchsia binary format
* [Ninja](https://ninja-build.org/) logs (the build system)

The trace processor is embedded in a wide variety of trace analysis tools,
including:

* [trace_processor](/docs/analysis/trace-processor.md), a standalone binary
  providing a shell interface (and the reference embedder).
* [Perfetto UI](https://ui.perfetto.dev), in the form of a WebAssembly module.
* [Android GPU Inspector](https://gpuinspector.dev/).
* [Android Studio](https://developer.android.com/studio/).

## Concepts

The trace processor has some foundational terminology and concepts which are
used in the rest of the documentation.

### Events

In the most general sense, a trace is simply a collection of timestamped
"events". Events can have associated metadata and context which allows them to
be interpreted and analyzed. Timestamps are in nanoseconds; the values
themselves depend on the [clock][primary_trace_clock] selected in TraceConfig.

Events form the foundation of trace processor and are one of two types: slices
and counters.

[primary_trace_clock]: https://cs.android.com/android/platform/superproject/main/+/main:external/perfetto/protos/perfetto/config/trace_config.proto;l=114;drc=c74c8cf69e20d7b3261fb8c5ab4d057e8badce3e

#### Slices

A slice refers to an interval of time with some data describing what was
happening in that interval. Some examples of slices include:

* Scheduling slices for each CPU
* Atrace slices on Android
* Userspace slices from Chrome

#### Counters

A counter is a continuous value which varies over time. Some examples of
counters include:

* CPU frequency for each CPU core
* RSS memory events - both from the kernel and polled from /proc/stats
* atrace counter events from Android
* Chrome counter events

### Tracks

A track is a named partition of events of the same type and the same associated
context. For example:

* Scheduling slices have one track for each CPU
* Synchronous userspace slices have one track for each thread which emitted an
  event
* Async userspace slices have one track for each “cookie” linking a set of
  async events

The most intuitive way to think of a track is to imagine how they would be
drawn in a UI; if all the events are in a single row, they belong to the same
track. For example, all the scheduling events for CPU 5 are on the same track.

Tracks can be split into various types based on the type of event they contain
and the context they are associated with. Examples include:

* Global tracks are not associated to any context and contain slices
* Thread tracks are associated to a single thread and contain slices
* Counter tracks are not associated to any context and contain counters
* CPU counter tracks are associated to a single CPU and contain counters

### Thread and process identifiers

The handling of threads and processes needs special care when considered in the
context of tracing; identifiers for threads and processes (e.g. `pid`/`tgid`
and `tid` in Android/macOS/Linux) can be reused by the operating system over
the course of a trace. This means they cannot be relied upon as a unique
identifier when querying tables in trace processor.

To solve this problem, the trace processor uses `utid` (_unique_ tid) for
threads and `upid` (_unique_ pid) for processes. All references to threads and
processes (e.g. in CPU scheduling data, thread tracks) use `utid` and `upid`
instead of the system identifiers.
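As a minimal sketch of why this matters, the following query lists `tid`s which
map to more than one `utid`, i.e. thread identifiers the OS reused during the
trace (the `thread` table is described in the next section):

```sql
-- Sketch: tids which were reused during the trace and therefore map to
-- more than one unique thread (utid).
SELECT tid, COUNT(*) AS utid_count
FROM thread
GROUP BY tid
HAVING COUNT(*) > 1
```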
## Writing Queries

### Context using tracks

A common question when querying tables in trace processor is: "how do I obtain
the process or thread for a slice?". Phrased more generally, the question is
"how do I get the context for an event?".

In trace processor, any context associated with all events on a track is found
on the associated `track` tables.

For example, to obtain the `utid` of any thread which emitted a `measure` slice

```sql
SELECT utid
FROM slice
JOIN thread_track ON thread_track.id = slice.track_id
WHERE slice.name = 'measure'
```

Similarly, to obtain the `upid`s of any process which has a `mem.swap` counter
greater than 1000

```sql
SELECT upid
FROM counter
JOIN process_counter_track ON process_counter_track.id = counter.track_id
WHERE process_counter_track.name = 'mem.swap' AND value > 1000
```

### Thread and process tables

While obtaining `utid`s and `upid`s is a step in the right direction, generally
users want the original `tid`, `pid`, and process/thread names.

The `thread` and `process` tables map `utid`s and `upid`s to threads and
processes respectively. For example, to look up the thread with `utid` 10

```sql
SELECT tid, name
FROM thread
WHERE utid = 10
```
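The `process` table works the same way; a minimal sketch looking up the
process with `upid` 10:

```sql
-- Sketch: look up the pid and name of the process with upid 10.
SELECT pid, name
FROM process
WHERE upid = 10
```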
The `thread` and `process` tables can also be joined with the associated track
tables directly to jump from the slice or counter to the information about
processes and threads.

For example, to get a list of all the threads which emitted a `measure` slice

```sql
SELECT thread.name AS thread_name
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING(utid)
WHERE slice.name = 'measure'
GROUP BY thread_name
```

## Helper functions
Helper functions are functions built into C++ which reduce the amount of
boilerplate which needs to be written in SQL.

### Extract args
`EXTRACT_ARG` is a helper function which retrieves a property of an
event (e.g. slice or counter) from the `args` table.

It takes an `arg_set_id` and `key` as input and returns the value looked
up in the `args` table.

For example, to retrieve the `prev_comm` field for `sched_switch` events in
the `ftrace_event` table.
```sql
SELECT EXTRACT_ARG(arg_set_id, 'prev_comm')
FROM ftrace_event
WHERE name = 'sched_switch'
```

Behind the scenes, the above query would desugar to the following:
```sql
SELECT
  (
    SELECT string_value
    FROM args
    WHERE key = 'prev_comm' AND args.arg_set_id = ftrace_event.arg_set_id
  )
FROM ftrace_event
WHERE name = 'sched_switch'
```

NOTE: while convenient, `EXTRACT_ARG` can be inefficient compared to a `JOIN`
when working with very large tables; a function call is required for every
row, which will be slower than the batch filters/sorts used by `JOIN`.

## Operator tables
SQL queries are usually sufficient to retrieve data from trace processor.
Sometimes though, certain constructs can be difficult to express in pure SQL.

In these situations, trace processor has special "operator tables" which solve
a particular problem in C++ but expose an SQL interface for queries to take
advantage of.

### Span join
Span join is a custom operator table which computes the intersection of
spans of time from two tables or views. A span in this concept is a row in a
table/view which contains "ts" (timestamp) and "dur" (duration) columns.

A column (called the *partition*) can optionally be specified which divides the
rows from each table into partitions before computing the intersection.

```sql
-- Get all the scheduling slices
CREATE VIEW sp_sched AS
SELECT ts, dur, cpu, utid
FROM sched;

-- Get all the cpu frequency slices
CREATE VIEW sp_frequency AS
SELECT
  ts,
  lead(ts) OVER (PARTITION BY track_id ORDER BY ts) - ts as dur,
  cpu,
  value as freq
FROM counter
JOIN cpu_counter_track ON counter.track_id = cpu_counter_track.id
WHERE cpu_counter_track.name = 'cpufreq';

-- Create the span joined table which combines cpu frequency with
-- scheduling slices.
CREATE VIRTUAL TABLE sched_with_frequency
USING SPAN_JOIN(sp_sched PARTITIONED cpu, sp_frequency PARTITIONED cpu);

-- This span joined table can be queried as normal and has the columns from
-- both tables.
SELECT ts, dur, cpu, utid, freq
FROM sched_with_frequency;
```

NOTE: A partition can be specified on neither, either or both tables. If
specified on both, the same column name has to be specified on each table.

WARNING: An important restriction on span joined tables is that spans from
the same table in the same partition *cannot* overlap. For performance
reasons, span join does not attempt to detect and error out in this situation;
instead, incorrect rows will silently be produced.

WARNING: Partitions must be integers. Importantly, string partitions are *not*
supported; note that strings *can* be converted to integers by
applying the `HASH` function to the string column.
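As a minimal sketch of that workaround (the view, table and column names here
are illustrative, not part of the trace processor schema):

```sql
-- Sketch: derive an integer partition key from a string column so the
-- view can be used with SPAN_JOIN. 'my_events' and 'bucket_name' are
-- hypothetical names.
CREATE VIEW sp_bucketed AS
SELECT ts, dur, HASH(bucket_name) AS bucket_hash
FROM my_events;
```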
Left and outer span joins are also supported; both function analogously to
the left and outer joins from SQL.
```sql
-- Left table partitioned + right table unpartitioned.
CREATE VIRTUAL TABLE left_join
USING SPAN_LEFT_JOIN(table_a PARTITIONED a, table_b);

-- Both tables unpartitioned.
CREATE VIRTUAL TABLE outer_join
USING SPAN_OUTER_JOIN(table_x, table_y);
```

NOTE: there is a subtlety if the partitioned table is empty and is
either a) part of an outer join or b) on the right side of a left join.
In this case, *no* slices will be emitted even if the other table is
non-empty. This approach was decided as being the most natural
after considering how span joins are used in practice.

### Ancestor slice
ancestor_slice is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and computes
all slices on the same track that are direct parents above that id (i.e. given
a slice id it will return as rows all slices that can be found by following
the parent_id column to the top slice (depth = 0)).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice given a bunch of slices of
interest.

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_slices LEFT JOIN
  ancestor_slice(interesting_slices.id) AS ancestor ON ancestor.depth = 0
```

### Ancestor slice by stack
ancestor_slice_by_stack is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id, then, for each id, it computes
all the ancestor slices similarly to
[ancestor_slice](/docs/analysis/trace-processor#ancestor-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice of all slices with the
given name.

```sql
CREATE VIEW interesting_stack_ids AS
SELECT stack_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_stack_ids LEFT JOIN
  ancestor_slice_by_stack(interesting_stack_ids.stack_id) AS ancestor
  ON ancestor.depth = 0
```
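Since both ancestor operators return full slice rows, the join condition can
target any depth, not just the root. A sketch using `ancestor_slice` to find
each slice's direct parent (this assumes the view also selects the slice's own
`depth`; the view name is illustrative):

```sql
-- Sketch: the direct parent of a slice is the ancestor whose depth is
-- one less than the slice's own depth.
CREATE VIEW interesting_slices_with_depth AS
SELECT id, depth
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT s.id, parent.id AS parent_id, parent.name AS parent_name
FROM
  interesting_slices_with_depth s LEFT JOIN
  ancestor_slice(s.id) AS parent ON parent.depth = s.depth - 1
```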
### Descendant slice
descendant_slice is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and
computes all slices on the same track that are nested under that id (i.e.
all slices that are on the same track at the same time frame with a depth
greater than the given slice's depth).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the number of slices under each slice of
interest.

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *,
  (
    SELECT COUNT(*)
    FROM descendant_slice(interesting_slices.id)
  ) AS total_descendants
FROM interesting_slices
```

### Descendant slice by stack
descendant_slice_by_stack is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id, then, for each id, it computes
all the descendant slices similarly to
[descendant_slice](/docs/analysis/trace-processor#descendant-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the next level descendant of all slices with
the given name.

```sql
CREATE VIEW interesting_stacks AS
SELECT stack_id, depth
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_stacks LEFT JOIN
  descendant_slice_by_stack(interesting_stacks.stack_id) AS descendant
  ON descendant.depth = interesting_stacks.depth + 1
```

### Connected/Following/Preceding flows

DIRECTLY_CONNECTED_FLOW, FOLLOWING_FLOW and PRECEDING_FLOW are custom operator
tables that take a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and collect
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that
are directly or indirectly connected to the given starting slice.

`DIRECTLY_CONNECTED_FLOW(start_slice_id)` - contains all entries of the
[flow table](/docs/analysis/sql-tables.autogen#flow) that are present in any
chain of kind: `flow[0] -> flow[1] -> ... -> flow[n]`, where
`flow[i].slice_out = flow[i+1].slice_in` and `flow[0].slice_out = start_slice_id
OR start_slice_id = flow[n].slice_in`.

NOTE: Unlike the following/preceding flow functions, this function will not
include flows connected to ancestors or descendants while searching for flows
from a slice. It only includes the slices in the directly connected chain.

`FOLLOWING_FLOW(start_slice_id)` - contains all flows which can be reached from
a given slice via recursively following from flow's outgoing slice to its
incoming one and from a reached slice to its child. The return table contains
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that
are present in any chain of kind: `flow[0] -> flow[1] -> ... -> flow[n]`, where
`flow[i+1].slice_out IN DESCENDANT_SLICE(flow[i].slice_in) OR
flow[i+1].slice_out = flow[i].slice_in` and `flow[0].slice_out IN
DESCENDANT_SLICE(start_slice_id) OR flow[0].slice_out = start_slice_id`.

`PRECEDING_FLOW(start_slice_id)` - contains all flows which can be reached from
a given slice via recursively following from flow's incoming slice to its
outgoing one and from a reached slice to its parent. The return table contains
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that
are present in any chain of kind: `flow[n] -> flow[n-1] -> ... -> flow[0]`,
where `flow[i].slice_in IN ANCESTOR_SLICE(flow[i+1].slice_out) OR
flow[i].slice_in = flow[i+1].slice_out` and `flow[0].slice_in IN
ANCESTOR_SLICE(start_slice_id) OR flow[0].slice_in = start_slice_id`.
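For example, a sketch counting, for each slice, the flow entries reachable
backwards via `PRECEDING_FLOW` (the query after it shows the forward
direction):

```sql
-- Number of preceding flows for each slice.
SELECT (SELECT COUNT(*) FROM PRECEDING_FLOW(slice.id)) AS preceding FROM slice;
```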
```sql
-- Number of following flows for each slice.
SELECT (SELECT COUNT(*) FROM FOLLOWING_FLOW(slice.id)) AS following FROM slice;
```

## Metrics

TIP: To see how to add a new metric to trace processor, see the checklist
[here](/docs/contributing/common-tasks.md#new-metric).

The metrics subsystem is a significant part of trace processor and thus is
documented on its own [page](/docs/analysis/metrics.md).

## Python API
The trace processor's C++ library is also exposed through Python. This
is documented on a [separate page](/docs/analysis/trace-processor-python.md).

## Testing

Trace processor is mainly tested in two ways:
1. Unit tests of low-level building blocks
2. "Diff" tests which parse traces and check the output of queries

### Unit tests
Unit testing trace processor is the same as in other parts of Perfetto and
other C++ projects. However, unlike the rest of Perfetto, unit testing is
relatively light in trace processor.

We have discovered over time that unit tests are generally too brittle
when dealing with code which parses traces, leading to painful, mechanical
changes being needed when refactorings happen.

Because of this, we choose to focus on diff tests for most areas (e.g.
parsing events, testing schema of tables, testing metrics etc.) and only
use unit testing for the low-level building blocks on which the rest of
trace processor is built.

### Diff tests
Diff tests are essentially integration tests for trace processor and the
main way trace processor is tested.

Each diff test takes as input a) a trace file and b) a query file *or* a
metric name. It runs `trace_processor_shell` to parse the trace and then
executes the query/metric. The result is then compared to a 'golden' file and
any difference is highlighted.

All diff tests are organized under [test/trace_processor](/test/trace_processor)
in `tests{_category name}.py` files as methods of a class in each file
and are run by the script
[`tools/diff_test_trace_processor.py`](/tools/diff_test_trace_processor.py).
To add a new test, it's enough to add a new method starting with `test_` in a
suitable Python tests file.

Methods can't take arguments and have to return a `DiffTestBlueprint`:
```python
class DiffTestBlueprint:
  trace: Union[Path, Json, Systrace, TextProto]
  query: Union[str, Path, Metric]
  out: Union[Path, Json, Csv, TextProto]
```
*Trace* and *Out*: for every type apart from `Path`, the contents of the
object will be treated as file contents, so they have to follow the same
rules.

*Query*: for metric tests it is enough to provide the metric name. For query
tests, this can be a raw SQL statement, for example `"SELECT * FROM SLICE"`,
or a path to an `.sql` file.
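For illustration, a minimal sketch of what such a method could look like. The
suite name, trace file and golden output are hypothetical, and the imports are
assumed to mirror those used by the existing test files under
[test/trace_processor](/test/trace_processor):

```python
# Hypothetical diff test; all names and contents are illustrative only.
from python.generators.diff_tests.testing import Csv, DiffTestBlueprint, Path, TestSuite


class ExampleTests(TestSuite):

  def test_slice_count(self):
    return DiffTestBlueprint(
        # Trace file assumed to live next to this test file.
        trace=Path('example_trace.textproto'),
        # Raw SQL; a Path to a .sql file or a Metric name also works.
        query="SELECT COUNT(*) AS slice_count FROM slice;",
        # Golden output the query results are diffed against.
        out=Csv("""
        "slice_count"
        42
        """))
```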
NOTE: `trace_processor_shell` and the associated proto descriptors need to be
built before running `tools/diff_test_trace_processor.py`. The easiest way
to do this is to run `tools/ninja -C <out directory>` both initially and on
every change to trace processor code or builtin metrics.

#### Choosing where to add diff tests
The `diff_tests/` folder contains four directories corresponding to different
areas of trace processor.
1. __stdlib__: Tests focusing on testing the Perfetto Standard Library, both
   the prelude and the regular modules. The subdirectories in this folder
   should generally correspond to directories in `perfetto_sql/stdlib`.
2. __parser__: Tests focusing on ensuring that different trace files are
   parsed correctly and the corresponding built-in tables are populated.
3. __metrics__: Tests focusing on testing metrics located in
   `trace_processor/metrics/sql`. This organisation is mostly historical
   and code (and corresponding tests) is expected to move to `stdlib` over
   time.
4. __syntax__: Tests focusing on testing the core syntax of PerfettoSQL
   (e.g. `CREATE PERFETTO TABLE` or `CREATE PERFETTO FUNCTION`).

__Scenario__: A new stdlib module `foo/bar.sql` is being added.

_Answer_: Add the test to the `stdlib/foo/bar_tests.py` file.

__Scenario__: A new event is being parsed, and the focus of the test is to
ensure the event is being parsed correctly.

_Answer_: Add the test in one of the `parser` subdirectories. Prefer adding a
test to an existing related directory (e.g. `sched`, `power`) if one exists.

__Scenario__: A new metric is being added and the focus of the test is to
ensure the metric is being correctly computed.

_Answer_: Add the test in one of the `metrics` subdirectories. Prefer adding a
test to an existing related directory if one exists. Also consider adding the
code in question to stdlib.

__Scenario__: A new dynamic table is being added and the focus of the test is
to ensure the dynamic table is being correctly computed.

_Answer_: Add the test to the `stdlib/dynamic_tables` folder.

__Scenario__: The internals of trace processor are being modified and the test
is to ensure the trace processor is correctly filtering/sorting important
built-in tables.

_Answer_: Add the test to the `parser/core_tables` folder.