# Tracing 101
*This page provides a bird's-eye view of performance analysis.
The aim is to orient people who have no idea what "tracing" is.*

## Introduction to...
### Performance
Performance analysis is concerned with making software run *better*.
The definition of *better* varies widely and depends on the situation.
Examples include:
* performing the same work using fewer resources (CPU, memory,
  network, battery, etc.)
* increasing utilization of available resources
* identifying and eliminating unnecessary work altogether

Much of the difficulty in improving performance comes from
identifying the root cause of performance issues. Modern software systems are
complicated, with many components and a web of cross-interactions.
Techniques which help engineers understand the execution of a system
and pinpoint issues are critical.

**Tracing** and **profiling** are two such widely-used techniques for
performance analysis. **Perfetto** is an open-source suite of tools, combining
tracing and profiling to give users powerful insights into their system.

### Tracing
**Tracing** involves collecting highly detailed data about the execution
of a system. A single continuous session of recording is called a trace file
or **trace** for short.

Traces contain enough detail to fully reconstruct the timeline of events.
They often include low-level kernel events like scheduler context switches,
thread wakeups, syscalls, etc. With the "right" trace, reproduction of a
performance bug is not needed as the trace provides all necessary context.

Application code is also **instrumented** in areas of the program which are
considered to be *important*. This instrumentation keeps track of what the
program was doing over time (e.g. which functions were being run, or how long
each call took) and context about the execution (e.g. what were the parameters
to a function call, or why was a function run).
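
To make this concrete, here is a toy sketch of what such instrumentation can
boil down to. This is not any particular framework's API (the
`ScopedTraceEvent` and `LoadFile` names are invented for illustration): a
scoped object records when an interesting region starts and ends, together
with some context about why it ran.

```cpp
// Toy sketch of instrumentation, not a real tracing framework: an RAII helper
// records the duration of an instrumented region plus some context about it.
#include <chrono>
#include <cstdio>
#include <string>

class ScopedTraceEvent {
 public:
  ScopedTraceEvent(std::string name, std::string context)
      : name_(std::move(name)),
        context_(std::move(context)),
        start_(std::chrono::steady_clock::now()) {}
  ~ScopedTraceEvent() {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    // A real framework would write a compact record into an in-memory buffer
    // rather than printing to stdout.
    std::printf("%s (%s) took %lld us\n", name_.c_str(), context_.c_str(),
                static_cast<long long>(us));
  }

 private:
  std::string name_;
  std::string context_;
  std::chrono::steady_clock::time_point start_;
};

void LoadFile(const std::string& path) {
  ScopedTraceEvent event("LoadFile", "path=" + path);  // what ran + context
  // ... the actual work being measured ...
}

int main() {
  LoadFile("/tmp/example.txt");
  return 0;
}
```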

The level of detail in traces makes it impractical to read traces directly
like a log file in all but the simplest cases. Instead, a combination of
**trace analysis** libraries and **trace viewers** is used. Trace analysis
libraries provide a way for users to extract and summarize trace events in
a programmatic manner. Trace viewers visualize the events in a trace on a
timeline, which gives users a graphical view of what their system was doing
over time.

#### Logging vs tracing
A good intuition is that logging is to functional testing what
tracing is to performance analysis. Tracing is, in a sense, "structured"
logging: instead of having arbitrary strings emitted from parts of the system,
tracing reflects the detailed state of a system in a structured way to allow
reconstruction of the timeline of events.

Moreover, tracing frameworks (like Perfetto) place heavy emphasis
on having minimal overhead. This is essential so that the framework
does not significantly disrupt whatever is being measured: modern frameworks
are fast enough that they can measure execution at the nanosecond level
without significantly impacting the execution speed of the program.

*Small aside: theoretically, tracing frameworks are powerful enough to act as
a logging system as well. However, the utilization of each in practice is
different enough that the two tend to be separate.*

#### Metrics vs tracing
Metrics are numerical values which track the performance of a system over time.
Usually metrics map to high-level concepts. Examples of metrics include: CPU
usage, memory usage, network bandwidth, etc. Metrics are collected directly from
the app or operating system while the program is running.

After glimpsing the power of tracing, a natural question arises: why bother
with high-level metrics at all? Why not instead just use tracing and
compute metrics on the resulting traces? In some settings, this may indeed be
the right approach. In local and lab situations, using **trace-based metrics**,
where metrics are computed from traces instead of being collected directly,
is a powerful approach. If a metric regresses, it's easy to open the trace
to root-cause why that happened.

However, trace-based metrics are not a universal solution. When running in
production, the heavyweight nature of traces can make it impractical to collect
them 24/7. Computing a metric with a trace can take megabytes of data vs bytes
for direct metric collection.

Using metrics is the right choice when you want to understand the performance
of a system over time but do not want to or cannot pay the cost of collecting
traces. In these situations, traces should be used as a **root-causing** tool.
When your metrics show there is a problem, targeted tracing can be rolled out
to understand why the regression may have happened.

### Profiling
**Profiling** involves sampling some usage of a resource by
a program. A single continuous session of recording is known as a **profile**.

Each sample collects the function callstack (i.e. the line of code along with
all calling functions). Generally this information is aggregated across the
profile. For each callstack seen, the aggregation gives the percentage of usage
of the resource by that callstack. By far the most common types of profiling
are **memory profiling** and **CPU profiling**.

Memory profiling is used to understand which parts of a program are allocating
memory on the heap. The profiler generally hooks into `malloc` (and `free`)
calls of a native (C/C++/Rust/etc.) program to sample the callstacks
calling `malloc`. Information about how many bytes were allocated is also
retained. CPU profiling is used for understanding where the program is
spending CPU time. The profiler captures the callstack running on a CPU
over time. Generally this is done periodically (e.g. every 50ms), but it can
also be done when certain events happen in the operating system.
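
To make the sampling idea concrete, here is a toy sketch of the memory side.
This is not how Perfetto's heap profiler is implemented (the `SampledAlloc`
function and the callsite string are invented for illustration): allocations
are counted, and roughly one allocation per fixed number of allocated bytes is
recorded and attributed to its caller.

```cpp
// Toy sketch of sampled memory profiling: instead of recording every
// allocation, "sample" roughly one allocation per fixed number of allocated
// bytes and attribute its size to the calling site.
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

constexpr size_t kSamplingIntervalBytes = 32 * 1024;  // sample ~every 32 KiB

std::map<std::string, size_t> g_sampled_bytes_by_site;  // callsite -> bytes
size_t g_bytes_until_next_sample = kSamplingIntervalBytes;

void* SampledAlloc(size_t size, const std::string& callsite) {
  // A real profiler would unwind the actual callstack here instead of taking
  // a callsite string, and would typically randomize the sampling interval.
  if (size >= g_bytes_until_next_sample) {
    g_sampled_bytes_by_site[callsite] += size;
    g_bytes_until_next_sample = kSamplingIntervalBytes;
  } else {
    g_bytes_until_next_sample -= size;
  }
  return std::malloc(size);
}

int main() {
  for (int i = 0; i < 1000; i++) std::free(SampledAlloc(1024, "DecodeImage"));
  for (const auto& [site, bytes] : g_sampled_bytes_by_site)
    std::printf("%s: ~%zu bytes attributed\n", site.c_str(), bytes);
  return 0;
}
```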

#### Profiling vs tracing
There are two main questions for comparing profiling and tracing:
1. Why profile my program statistically when I can just trace *everything*?
2. Why use tracing to reconstruct the timeline of events when profiling gives me
   the exact line of code using the most resources?

##### When to use profiling over tracing
Traces cannot feasibly capture execution of extremely high-frequency
events, e.g. every function call. Profiling tools fill this niche: by
sampling, they can significantly cut down on how much information they store.
The statistical nature of profilers is rarely a problem; the sampling
algorithms for profilers are specifically designed to capture data which is
highly representative of the real resource use.

*Aside: a handful of very specialized tracing tools exist which
can capture every function call (e.g.
[magic-trace](https://github.com/janestreet/magic-trace)) but they output
*gigabytes* of data every second, which makes them impractical for anything
beyond investigating tiny snippets of code. They also generally have higher
overhead than general-purpose tracing tools.*

##### When to use tracing over profiling
While profilers give callstacks where resources are being used, they lack
information about *why* that happened. For example, why was `malloc` being
called by function *foo()* so many times? All they say is *foo()* allocated X
bytes over Y calls to `malloc`. Traces are excellent at providing this exact
context: application instrumentation and low-level kernel events together
provide deep insight into why code was run in the first place.

NOTE: Perfetto supports collecting, analyzing and visualizing both profiles
and traces at the same time so you can have the best of both worlds!

## Perfetto
Perfetto is a suite of tools for performance analysis of software. Its purpose
is to empower engineers to understand where resources are being used by their
systems. It helps identify the changes they can make to improve performance
and verify the impact of those changes.

NOTE: In Perfetto, since profiles and traces can be collected simultaneously,
we call everything a "trace" even if it may contain (only) profiling data
inside.

### Recording traces
Perfetto is highly configurable when it comes to recording traces. There are
literally hundreds of knobs which can be tweaked to control what data is
collected, how it should be collected, how much information a trace should
contain, etc.
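
To give a flavour of what a config can look like, here is a minimal sketch
based on the C++ SDK quickstart. It builds a config in code (a 1 MB buffer and
the in-process `track_event` data source) and runs a tracing session; category
registration and event emission are omitted for brevity.

```cpp
// Minimal sketch modelled on the Perfetto SDK quickstart: build a trace
// config, then start and stop an in-process tracing session. Assumes the
// amalgamated SDK (perfetto.h / perfetto.cc) is part of the build.
#include <perfetto.h>

#include <vector>

int main() {
  perfetto::TracingInitArgs args;
  args.backends = perfetto::kInProcessBackend;  // trace only this process
  perfetto::Tracing::Initialize(args);

  perfetto::TraceConfig cfg;
  cfg.add_buffers()->set_size_kb(1024);  // how much trace data to retain
  auto* ds_cfg = cfg.add_data_sources()->mutable_config();
  ds_cfg->set_name("track_event");       // which data source to enable

  auto tracing_session = perfetto::Tracing::NewTrace();
  tracing_session->Setup(cfg);
  tracing_session->StartBlocking();
  // ... run the code being traced (with track-event instrumentation) ...
  tracing_session->StopBlocking();

  // The serialized trace, ready to be written to a file and opened in the UI.
  std::vector<char> trace_data(tracing_session->ReadTraceBlocking());
  return 0;
}
```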

[Record traces on Linux quickstart](/docs/quickstart/linux-tracing.md) is
a good place to start if you're unfamiliar with Perfetto. For Android
developers,
[Record traces on Android quickstart](/docs/quickstart/android-tracing.md) will
be more applicable. The [trace configuration](/docs/concepts/config.md) page
is also useful to consult as a reference.

The following sub-sections give an overview of various points worth considering
when recording Perfetto traces.

#### Kernel tracing
Perfetto integrates closely with the Linux kernel's
[ftrace](https://www.kernel.org/doc/Documentation/trace/ftrace.txt) tracing
system to record kernel events (e.g. scheduling, syscalls, wakeups). The
[scheduling](/docs/data-sources/cpu-scheduling.md),
[syscall](/docs/data-sources/syscalls.md) and
[CPU frequency](/docs/data-sources/cpu-freq.md) data source pages give
examples of configuring ftrace collection.
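
As a rough sketch of what enabling kernel tracing involves, the snippet below
builds a config which turns on a couple of scheduler ftrace events.
`linux.ftrace` is the data source name and `sched/sched_switch` /
`sched/sched_waking` are real ftrace events, but the generated `FtraceConfig`
class and the `ftrace_config_raw` setter used here are assumptions to verify
against your SDK version; in practice kernel tracing is most commonly
configured via a text-format config passed to the `perfetto` command-line
tool, as shown in the quickstarts linked above.

```cpp
// Rough sketch only: enable scheduler ftrace events via a Perfetto trace
// config built with the C++ SDK's generated config classes.
#include <perfetto.h>

perfetto::TraceConfig BuildSchedTracingConfig() {
  perfetto::TraceConfig cfg;
  cfg.add_buffers()->set_size_kb(4096);

  auto* ds_cfg = cfg.add_data_sources()->mutable_config();
  ds_cfg->set_name("linux.ftrace");  // the kernel (ftrace) data source

  // Assumed available in the SDK's generated protos: list the ftrace events
  // to enable, then embed the serialized config into the data source config.
  perfetto::protos::gen::FtraceConfig ftrace_cfg;
  ftrace_cfg.add_ftrace_events("sched/sched_switch");  // context switches
  ftrace_cfg.add_ftrace_events("sched/sched_waking");  // thread wakeups
  ds_cfg->set_ftrace_config_raw(ftrace_cfg.SerializeAsString());
  return cfg;
}
```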

Natively supported ftrace events can be found in the fields of
[this proto message](/docs/reference/trace-packet-proto.autogen#FtraceEvent).
Perfetto also supports collecting ftrace events it does not natively understand
(i.e. ones it does not have a protobuf message for) as
["generic"](/docs/reference/trace-packet-proto.autogen#GenericFtraceEvent)
events. These events are encoded as key-value pairs, similar to a JSON
dictionary.

It is strongly discouraged to rely on generic events for production use cases:
the inefficient encoding causes trace size bloat and the
[trace processor](/docs/analysis/trace-processor.md) cannot parse them
meaningfully. Instead, support should be added to Perfetto for parsing
important ftrace events:
[here](/docs/contributing/common-tasks.md#add-a-new-ftrace-event) is a simple
set of steps to follow.

#### Instrumentation with Perfetto SDK
Perfetto has a [C++ SDK](https://perfetto.dev/docs/instrumentation/tracing-sdk)
which can be used to instrument programs to emit tracing events. The SDK is
designed to be very low-overhead and is distributed in an "amalgamated" form
of one `.cc` and one `.h` file, making it easy to integrate into any build
system.
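
For a flavour of what instrumentation with the SDK looks like, the snippet
below is condensed from the SDK documentation (the "rendering" category and
`DrawPlayer` function are illustrative; tracing initialization and
`perfetto::TrackEvent::Register()` are omitted):

```cpp
// Condensed from the Perfetto SDK documentation: define trace categories once,
// then annotate interesting functions with TRACE_EVENT. Each event records the
// enclosing scope's duration plus optional debug arguments.
#include <perfetto.h>

PERFETTO_DEFINE_CATEGORIES(
    perfetto::Category("rendering")
        .SetDescription("Rendering and graphics events"));

PERFETTO_TRACK_EVENT_STATIC_STORAGE();

void DrawPlayer(int player_number) {
  // Emits a slice covering this function's execution, with a debug argument.
  TRACE_EVENT("rendering", "DrawPlayer", "player_number", player_number);
  // ... actual drawing code ...
}
```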

A C SDK is under active development and should be available for general
usage by Q2 2023. See [this doc](https://bit.ly/perfetto-c) for details (note
that viewing this doc requires being a member of
[this group](https://groups.google.com/forum/#!forum/perfetto-dev)).

A Java/Kotlin SDK for Android (as a
[JetPack library](https://developer.android.com/jetpack/androidx)) is also
under development, but there is no set timescale for when an official release
will happen.

##### android.os.Trace (atrace) vs Perfetto SDK
NOTE: This section is only relevant for Android platform developers or Android
app developers with tracing experience. Other readers can safely skip this
section.

Perfetto has significant advantages over atrace. Some of the biggest advantages
include:
* performance: tracing to Perfetto from system/app code requires just a memory
  write, which is far faster than the syscall latency imposed by atrace. This
  generally makes Perfetto anywhere from 3-4x faster than atrace.
* features: atrace's API is extremely limited, lacking support for debug
  arguments, custom clocks, and flow events. Perfetto has a far richer API,
  allowing natural representation of data-flow.
* trace size: Perfetto supports various features (delta-encoded timestamps,
  interned strings, protobuf encoding) which vastly reduce the size of trace
  files.

Unfortunately, there are also some downsides:
* dedicated thread: a thread dedicated to Perfetto is necessary for every
  process which wants to trace to Perfetto.
* wakeups on tracing start: currently, when tracing starts, every process
  registered for tracing is woken up, which significantly limits how many
  processes can be traced. This limitation should be removed in coming quarters.

For now, the recommendation from the Perfetto team is to continue utilizing
atrace for most use cases: if you think you have a use case which would benefit
from the SDK, please reach out to the team directly. By mid-2023, significant
progress should be made addressing the limitations of the current SDK, allowing
more widespread adoption of the SDK.

<!--
TODO(lalitm): write the remainder of the doc using the following template

#### Native heap profiling

#### Java heap graphs

#### Callstack sampling


#### Flight recorder tracing
TODO(lalitm): write this.

##### Field tracing
TODO(lalitm): write this.

#### Clock sync
TODO(lalitm): write this.


#### Analysis
TODO(lalitm): write this.
* Trace processing
* UI
* httpd mode
* metrics
* Python


The remainder of this
page will focus on the applications of Perfetto to solve various performance
related problems.

## Solving problems with Perfetto
TODO(lalitm): write this.
* When to look into callstack sampling
* When to use memory profiling
* When to look at scheduling latency


TODO(lalitm): write this.

-->