# Tracing API and ABI: surfaces and stability

This document describes the API and ABI surface of the
[Perfetto Client Library][cli_lib], what can be expected to be stable long-term
and what cannot.

#### In summary

* The public C++ API in `include/perfetto/tracing/` is mostly stable but can
  occasionally break at compile-time throughout 2020.
* The C++ API within `include/perfetto/ext/` is internal-only and exposed only
  for Chromium.
* The tracing protocol ABI is based on protobuf-over-UNIX-socket and shared
  memory. It is long-term stable and maintains compatibility in both directions
  (old service + newer client and vice-versa).
* The [DataSourceDescriptor][data_source_descriptor.proto],
  [DataSourceConfig][data_source_config.proto] and
  [TracePacket][trace-packet-ref] protos are updated maintaining backwards
  compatibility unless a message is marked as experimental. Trace Processor
  deals with importing older trace formats.
* There is no version number in either the trace file or the tracing
  protocol and there will never be one. Feature flags are used when necessary.

## C++ API

The Client Library C++ API allows an app to contribute to the trace with custom
trace events. Its headers live under [`include/perfetto/`](/include/perfetto).

There are three different tiers of this API, offering increasingly higher
expressive power, at the cost of increased complexity. The three tiers are built
on top of each other. (Googlers, for more details see also
[go/perfetto-client-api](http://go/perfetto-client-api)).

![C++ API](/docs/images/api-and-abi.png)

### Track Event (public)

This mainly consists of the `TRACE_EVENT*` macros defined in
[`track_event.h`](/include/perfetto/tracing/track_event.h).
Those macros provide apps with a quick and easy way to add common types of
instrumentation points (slices, counters, instant events).
For details and instructions see the [Client Library doc][cli_lib].

### Custom Data Sources (public)

This consists of the `perfetto::DataSource` base class and the
`perfetto::Tracing` controller class defined in
[`tracing.h`](/include/perfetto/tracing.h).
These classes allow an app to create custom data sources which can get
notifications about the tracing session lifecycle and emit custom protos in the
trace (e.g. memory snapshots, compositor layers, etc).

For details and instructions see the [Client Library doc][cli_lib].

Both the Track Event API and the custom data source API are meant to be public
APIs.

WARNING: The team is still iterating on this API surface. While we try to avoid
         deliberate breakages, some occasional compile-time breakages might be
         encountered when updating the library. The interface is expected to
         stabilize by the end of 2020.

### Producer / Consumer API (internal)

This consists of all the interfaces defined in the
[`include/perfetto/ext`](/include/perfetto/ext) directory. These provide access
to the lowest levels of the Perfetto internals (manually registering producers
and data sources, handling all IPCs).

These interfaces will always be highly unstable. We strongly discourage
any project from depending on this API because it is too complex and extremely
hard to get right.
This API surface exists only for the Chromium project, which has unique
challenges (e.g., its own IPC system, complex sandboxing models) and has dozens
of subtle use cases accumulated over more than ten years of chrome://tracing
legacy. The team is continuously reshaping this surface to gradually
migrate all Chrome Tracing use cases over to Perfetto.

## Tracing Protocol ABI

The Tracing Protocol ABI consists of the following binary interfaces that allow
various processes in the operating system to contribute to tracing sessions and
inject tracing data into the tracing service:

 * [Socket protocol](#socket-protocol)
 * [Shared memory layout](#shmem-abi)
 * [Protobuf messages](#protos)

The whole tracing protocol ABI is binary stable across platforms and is updated
maintaining both backwards and forward compatibility. No breaking changes
have been introduced since its first revision in Android 9 (Pie, 2018).
See also the [ABI Stability](#abi-stability) section below.

![Tracing protocol](/docs/images/tracing-protocol.png)

### {#socket-protocol} Socket protocol

At the lowest level, the tracing protocol is initiated with a UNIX socket of
type `SOCK_STREAM` to the tracing service.
The tracing service listens on two distinct sockets: producer and consumer.

![Socket protocol](/docs/images/socket-protocol.png)

Both sockets use the same wire protocol, the `IPCFrame` message defined in
[wire_protocol.proto](/protos/perfetto/ipc/wire_protocol.proto). The wire
protocol is simply based on a sequence of length-prefixed messages of the form:
```
< 4 bytes len little-endian > < proto-encoded IPCFrame >

04 00 00 00 A0 A1 A2 A3   05 00 00 00 B0 B1 B2 B3 B4  ...
{ len: 4  } [ Frame 1   ] { len: 5  } [ Frame 2      ]
```

The `IPCFrame` proto message defines a request/response protocol that is
compatible with the [protobuf services syntax][proto_rpc]. `IPCFrame` defines
the following frame types:

1. `BindService {producer, consumer} -> service`<br>
   Binds to one of the two service ports (either `producer_port` or
   `consumer_port`).

2. `BindServiceReply service -> {producer, consumer}`<br>
   Replies to the bind request, listing all the RPC methods available, together
   with their method ID.

3. `InvokeMethod {producer, consumer} -> service`<br>
   Invokes an RPC method, identified by the ID returned by `BindServiceReply`.
   The invocation takes a single proto sub-message as its argument. Each method
   defines a pair of _request_ and _response_ message types.<br>
   For instance the `RegisterDataSource` defined in [producer_port.proto] takes
   a `perfetto.protos.RegisterDataSourceRequest` and returns a
   `perfetto.protos.RegisterDataSourceResponse`.

4. `InvokeMethodReply service -> {producer, consumer}`<br>
   Returns the result of the corresponding invocation or an error flag.
   If a method return signature is marked as `stream` (e.g.
   `returns (stream GetAsyncCommandResponse)`), the method invocation can be
   followed by more than one `InvokeMethodReply`, all with the same
   `request_id`. All replies in the stream but the last one will have
   `has_more: true`, to notify the client that more responses for the same
   invocation will follow.

Here is what the traffic over the IPC socket looks like:

```
# [Prd > Svc] Bind request for the remote service named "producer_port"
request_id: 1
msg_bind_service { service_name: "producer_port" }

# [Svc > Prd] Service reply.
request_id: 1
msg_bind_service_reply: {
  success:    true
  service_id: 42
  methods:    {id: 2; name: "InitializeConnection" }
  methods:    {id: 5; name: "RegisterDataSource" }
  methods:    {id: 3; name: "UnregisterDataSource" }
  ...
}

# [Prd > Svc] Method invocation (RegisterDataSource)
request_id: 2
msg_invoke_method: {
  service_id: 42  # "producer_port"
  method_id:  5   # "RegisterDataSource"

  # Proto-encoded bytes for the RegisterDataSourceRequest message.
  args_proto: [XX XX XX XX]
}

# [Svc > Prd] Result of RegisterDataSource method invocation.
request_id: 2
msg_invoke_method_reply: {
  success:  true
  has_more: false  # EOF for this request

  # Proto-encoded bytes for the RegisterDataSourceResponse message.
  reply_proto: [XX XX XX XX]
}
```

#### Producer socket

The producer socket exposes the RPC interface defined in [producer_port.proto].
It allows processes to advertise data sources and their capabilities, receive
notifications about the tracing session lifecycle (trace being started, stopped)
and signal trace data commits and flush requests.

This socket is also used by the producer and the service to exchange a
tmpfs file descriptor during initialization for setting up the
[shared memory buffer](/docs/concepts/buffers.md) where tracing data will be
written (asynchronously).

On Android this socket is linked at `/dev/socket/traced_producer`. On all
platforms it is overridable via the `PERFETTO_PRODUCER_SOCK_NAME` env var.

On Android all apps and most system processes can connect to it
(see [`perfetto_producer` in SELinux policies][selinux_producer]).

In the Perfetto codebase, the [`traced_probes`](/src/traced/probes/) and
[`heapprofd`](/src/profiling/memory) processes use the producer socket for
injecting system-wide tracing / profiling data.

#### Consumer socket

The consumer socket exposes the RPC interface defined in [consumer_port.proto].
The consumer socket allows processes to control tracing sessions (start / stop
tracing) and read back trace data.

On Android this socket is linked at `/dev/socket/traced_consumer`. On all
platforms it is overridable via the `PERFETTO_CONSUMER_SOCK_NAME` env var.

Trace data contains sensitive information that discloses the activity of the
system (e.g., which processes / threads are running) and can allow side-channel
attacks. For this reason the consumer socket is intended to be exposed only to
a few privileged processes.
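
Both the producer and the consumer socket carry the length-prefixed framing
described under the socket protocol above. As a rough illustration only, the
sketch below shows how a client might add and strip the 4-byte little-endian
length header around an already proto-encoded `IPCFrame`. `EncodeFrame` and
`DecodeFrame` are hypothetical helper names, not Perfetto APIs, and the proto
encoding itself is out of scope here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Perfetto API): prepends the 4-byte little-endian
// length header to the bytes of an already proto-encoded IPCFrame.
std::vector<uint8_t> EncodeFrame(const std::vector<uint8_t>& payload) {
  assert(payload.size() <= UINT32_MAX);
  const uint32_t len = static_cast<uint32_t>(payload.size());
  std::vector<uint8_t> out;
  out.reserve(4 + payload.size());
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<uint8_t>((len >> shift) & 0xFF));
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

// Hypothetical helper: if `rx` starts with one complete frame, moves its
// payload into `payload`, removes the frame from `rx` and returns true.
// Returns false when more bytes are needed, which is a normal condition on a
// SOCK_STREAM socket because reads can return partial frames.
bool DecodeFrame(std::vector<uint8_t>* rx, std::vector<uint8_t>* payload) {
  constexpr size_t kHeaderSize = 4;
  if (rx->size() < kHeaderSize)
    return false;
  uint32_t len = 0;
  for (size_t i = 0; i < kHeaderSize; ++i)
    len |= static_cast<uint32_t>((*rx)[i]) << (8 * i);
  if (rx->size() - kHeaderSize < len)
    return false;
  const auto begin = rx->begin() + static_cast<std::ptrdiff_t>(kHeaderSize);
  const auto end = begin + static_cast<std::ptrdiff_t>(len);
  payload->assign(begin, end);
  rx->erase(rx->begin(), end);
  return true;
}
```

A receiver would typically append whatever `recv()` returns to `rx` and call
`DecodeFrame` in a loop until it returns false.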

On Android, only the `adb shell` domain (used by various UI tools like
[Perfetto UI](https://ui.perfetto.dev/),
[Android Studio](https://developer.android.com/studio) or the
[Android GPU Inspector](https://github.com/google/agi))
and a few other trusted system services are allowed to access the consumer
socket (see [traced_consumer in SELinux][selinux_consumer]).

In the Perfetto codebase, the [`perfetto`](/docs/reference/perfetto-cli)
binary (`/system/bin/perfetto` on Android) provides a consumer implementation
and exposes it through a command line interface.

#### Socket protocol FAQs

_Why SOCK_STREAM and not DGRAM/SEQPACKET?_

1. To allow direct passthrough of the consumer socket on Android through
   `adb forward localabstract` and allow host tools to directly talk to the
   on-device tracing service. Today both the Perfetto UI and Android GPU
   Inspector do this.
2. To allow directly controlling a remote service over TCP or SSH tunneling in
   the future.
3. Because the socket buffer for `SOCK_DGRAM` is extremely limited and
   `SOCK_SEQPACKET` is not supported on macOS.

_Why not gRPC?_

The team evaluated gRPC in late 2017 as an alternative but ruled it out
due to: (i) binary size and memory footprint; (ii) the complexity and overhead
of running a full HTTP/2 stack over a UNIX socket; (iii) the lack of
fine-grained control on back-pressure.

_Is the UNIX socket protocol used within Chrome processes?_

No. Within Chrome processes (the browser app, not CrOS) Perfetto doesn't use
any UNIX socket. Instead it uses the functionally equivalent
Mojo endpoints [`Producer{Client,Host}` and `Consumer{Client,Host}`][mojom].

### Shared memory

This section describes the binary interface of the memory buffer shared between
a producer process and the tracing service (SMB).

The SMB is a staging area that decouples data sources living in the Producer
from the Service and allows them to do non-blocking async writes. An SMB is
small-ish, typically hundreds of KB. Its size is configurable by the producer
when connecting.
For more architectural details about the SMB see also the
[buffers and dataflow doc](/docs/concepts/buffers.md) and the
[shared_memory_abi.h] sources.

#### Obtaining the SMB

The SMB is obtained by passing a tmpfs file descriptor over the producer socket
and memory-mapping it both from the producer and service.
The producer specifies the desired SMB size and memory layout when sending the
[`InitializeConnectionRequest`][producer_port.proto] request to the
service, which is the very first IPC sent after connection.
By default, the service creates the SMB and passes back its file descriptor to
the producer with the [`InitializeConnectionResponse`][producer_port.proto]
IPC reply. Recent versions of the service (Android R / 11) allow the FD to be
created by the producer and passed down to the service in the request. When the
service supports this, it acks the request setting
`InitializeConnectionResponse.using_shmem_provided_by_producer = true`. At the
time of writing this feature is used only by Chrome for dealing with lazy
Mojo initialization during startup tracing.

#### SMB memory layout: pages, chunks, fragments and packets

The SMB is partitioned into fixed-size pages. An SMB page must be an integer
multiple of 4KB. The only valid sizes are: 4KB, 8KB, 16KB, 32KB.

The size of an SMB page is determined by each Producer at connection time, via
the `shared_memory_page_size_hint_bytes` field of `InitializeConnectionRequest`
and cannot be changed afterwards. All pages in the SMB have the same size,
constant throughout the lifetime of the producer process.
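
The page-size rule above can be checked with a few lines of code. The following
is a minimal sketch, assuming only what this section states (valid sizes are
4KB, 8KB, 16KB and 32KB); `IsValidSmbPageSize` and `NumSmbPages` are
hypothetical names used for illustration and are not part of the Perfetto API.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kMinSmbPageSize = 4 * 1024;
constexpr size_t kMaxSmbPageSize = 32 * 1024;

// Hypothetical helper: returns true only for 4KB, 8KB, 16KB and 32KB, i.e.
// powers of two in the [4KB, 32KB] range. Note that 12KB is a multiple of 4KB
// but is not in the list of valid sizes.
bool IsValidSmbPageSize(size_t size) {
  return size >= kMinSmbPageSize && size <= kMaxSmbPageSize &&
         (size & (size - 1)) == 0;
}

// Hypothetical helper: number of whole pages that fit in an SMB of
// `smb_size` bytes when every page has the same `page_size`.
size_t NumSmbPages(size_t smb_size, size_t page_size) {
  assert(IsValidSmbPageSize(page_size));
  return smb_size / page_size;
}
```

For instance, under these assumptions a 256KB SMB with 4KB pages holds 64
pages, while the same SMB with 32KB pages holds 8.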

![Shared Memory ABI Overview](/docs/images/shmem-abi-overview.png)

**A page** is a fixed-sized partition of the shared memory buffer and is just a
container of chunks.
The Producer can partition each SMB page using a limited number of predetermined
layouts (1 page : 1 chunk; 1 page : 2 chunks and so on).
The page layout is stored in a 32-bit atomic word in the page header. The same
32-bit word also contains the state of each chunk (2 bits per chunk).

Having fixed the total SMB size (hence the total memory overhead), the page
size is a three-way trade-off between:

1. IPC traffic: smaller pages -> more IPCs.
2. Producer lock freedom: larger pages -> larger chunks -> data sources can
   write more data without needing to swap chunks and synchronize.
3. Risk of write-starving the SMB: larger pages -> higher chance that the
   Service won't manage to drain them and the SMB remains full.

The page size, on the other hand, has no implications on memory wasted due to
fragmentation (see Chunk below).

**A chunk** is a portion of a Page and contains a linear sequence of
[`TracePacket(s)`][trace-packet-ref] (the root trace proto).

A Chunk defines the granularity of the interaction between the Producer and
tracing Service. When a producer fills a chunk it sends a `CommitData` IPC to
the service, asking it to copy its contents into the central non-shared buffers.

A chunk can be in one of the following four states:

* `Free` : The Chunk is free. The Service shall never touch it, the Producer
  can acquire it when writing and transition it into the `BeingWritten` state.

* `BeingWritten`: The Chunk is being written by the Producer and is not
  complete yet (i.e. there is still room to write other trace packets).
  The Service never alters the state of chunks in the `BeingWritten` state
  (but will still read them when flushing even if incomplete).

* `Complete`: The Producer is done writing the chunk and won't touch it
  again. The Service can move it to its non-shared ring buffer and mark the
  chunk as `BeingRead` -> `Free` when done.

* `BeingRead`: The Service is moving the page into its non-shared ring
  buffer. Producers never touch chunks in this state.
  _Note: this state ended up being never used as the service directly
   transitions chunks from `Complete` back to `Free`_.

A chunk is owned exclusively by one thread of one data source of the producer.

Chunks are essentially single-writer single-thread lock-free arenas. Locking
happens only when a Chunk is full and a new one needs to be acquired.

Locking happens only within the scope of a Producer process.
Inter-process locking is not generally allowed. The Producer cannot lock the
Service and vice versa. In the worst case, either of the two can starve the SMB,
by marking all chunks as either being read or written. The only side effect of
that is losing trace data.
The only case when stalling on the writer-side (the Producer) can occur is when
a data source in a producer opts into using the
[`BufferExhaustedPolicy.kStall`](/docs/concepts/buffers.md) policy and the SMB
is full.

**[TracePacket][trace-packet-ref]** is the atom of tracing. Putting aside
pages and chunks a trace is conceptually just a concatenation of TracePacket(s).
A TracePacket can be big (up to 64 MB) and can span across several chunks, hence
across several pages.
A TracePacket can therefore be >> chunk size, >> page size and even >> SMB size.
The Chunk header carries metadata to deal with the TracePacket splitting.
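
To make the chunk state machine described above more concrete, here is a
minimal sketch of lock-free state transitions on a 32-bit atomic word that
stores 2 bits per chunk. The bit positions, the numeric values of the states
and the names `GetChunkState` and `TryTransition` are assumptions made for
illustration; the layout bits that share the same word are omitted. The
authoritative definition lives in [shared_memory_abi.h].

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed numeric values, chosen for this sketch only.
enum ChunkState : uint32_t {
  kFree = 0,
  kBeingWritten = 1,
  kBeingRead = 2,
  kComplete = 3,
};

constexpr uint32_t kStateBits = 2;  // 2 bits per chunk, as described above.
constexpr uint32_t kStateMask = 0x3;

// Extracts the state of chunk `chunk_idx` from a snapshot of the header word.
ChunkState GetChunkState(uint32_t word, uint32_t chunk_idx) {
  return static_cast<ChunkState>((word >> (chunk_idx * kStateBits)) &
                                 kStateMask);
}

// Atomically moves one chunk from `expected` to `desired` without touching the
// bits of the other chunks. Returns false if the chunk is not in `expected`,
// e.g. because another thread acquired it first.
bool TryTransition(std::atomic<uint32_t>* header,
                   uint32_t chunk_idx,
                   ChunkState expected,
                   ChunkState desired) {
  assert(chunk_idx * kStateBits < 32);
  const uint32_t shift = chunk_idx * kStateBits;
  uint32_t cur = header->load(std::memory_order_relaxed);
  while (GetChunkState(cur, chunk_idx) == expected) {
    const uint32_t next = (cur & ~(kStateMask << shift)) |
                          (static_cast<uint32_t>(desired) << shift);
    // On failure `cur` is refreshed with the latest value and the loop
    // re-checks whether the chunk is still in the expected state.
    if (header->compare_exchange_weak(cur, next, std::memory_order_acq_rel,
                                      std::memory_order_relaxed)) {
      return true;
    }
  }
  return false;
}
```

In this sketch a producer thread would acquire a chunk with
`TryTransition(&header, i, kFree, kBeingWritten)` and release it with
`kBeingWritten -> kComplete`, while the service would perform
`kComplete -> kFree` after copying the chunk.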

Overview of the Page, Chunk, Fragment and Packet concepts:<br>
![Shared Memory ABI concepts](/docs/images/shmem-abi-concepts.png)

Memory layout of a Page:<br>
![SMB Page layout](/docs/images/shmem-abi-page.png)

Because a packet can be larger than a page, the first and the last packets in
a chunk can be fragments.

![TracePacket spanning across SMB chunks](/docs/images/shmem-abi-spans.png)

#### Post-facto patching through IPC

If a TracePacket is particularly large, it is very likely that the chunk that
contains its initial fragments is committed into the central buffers and removed
from the SMB by the time the last fragment of the same packet is written.

Nested messages in protobuf are prefixed by their length. In a zero-copy
direct-serialization scenario like tracing, the length is known only when the
last field of a submessage is written and cannot be known upfront.

Because of this, it is possible that when the last fragment of a packet is
written, the writer needs to backfill the size prefix in an earlier fragment,
which now might have disappeared from the SMB.

In order to do this, the tracing protocol allows patching the contents of a
chunk through the `CommitData` IPC (see
[`CommitDataRequest.ChunkToPatch`][commit_data_request.proto]) after the tracing
service copied it into the central buffer. There is no guarantee that the
fragment will still be there (e.g., it can be over-written in ring-buffer mode).
The service will patch the chunk only if it's still in the buffer and only if
the producer ID that wrote it matches the Producer ID of the patch request over
IPC (the Producer ID is not spoofable and is tied to the IPC socket file
descriptor).
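
The backfilling problem described above can be illustrated locally, without
any IPC. The sketch below reserves a fixed-width placeholder for the length of
a nested message and fills it in once the payload size is known, using a
varint padded with continuation bits so that it always occupies the same
number of bytes. The 4-byte width and the helper names are assumptions of this
sketch; when the placeholder lives in a chunk that has already left the SMB,
the same bytes would have to be delivered as a patch through `CommitData`
instead of being written directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed width of the reserved length field, chosen for this sketch.
constexpr size_t kLenFieldSize = 4;

// Appends a zeroed placeholder for the length and returns its offset.
size_t ReserveLength(std::vector<uint8_t>* buf) {
  const size_t offset = buf->size();
  buf->resize(buf->size() + kLenFieldSize, 0);
  return offset;
}

// Overwrites the placeholder with `len` encoded as a varint that is padded to
// exactly kLenFieldSize bytes: every byte except the last one has the
// continuation bit (0x80) set, even when its 7 payload bits are zero.
void BackfillLength(std::vector<uint8_t>* buf, size_t offset, uint32_t len) {
  assert(len < (1u << 28));  // 4 bytes x 7 payload bits each.
  for (size_t i = 0; i < kLenFieldSize; ++i) {
    uint8_t byte = static_cast<uint8_t>((len >> (7 * i)) & 0x7F);
    if (i + 1 < kLenFieldSize)
      byte |= 0x80;
    (*buf)[offset + i] = byte;
  }
}

// Plain varint decoder, used to show that the padded encoding is still
// readable by a standard protobuf parser.
uint32_t DecodeVarint(const uint8_t* p, size_t* consumed) {
  uint32_t value = 0;
  size_t i = 0;
  for (; i < 5; ++i) {
    value |= static_cast<uint32_t>(p[i] & 0x7F) << (7 * i);
    if ((p[i] & 0x80) == 0) {
      ++i;
      break;
    }
  }
  *consumed = i;
  return value;
}
```

The design choice worth noting is the fixed width: because the placeholder
never changes size, the bytes written after it do not need to be moved when
the length finally becomes known.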

### Proto definitions

The following protobuf messages are part of the overall trace protocol ABI and
are updated maintaining backward-compatibility, unless marked as experimental
in the comments.

TIP: See also the _Updating A Message Type_ section of the
     [Protobuf Language Guide][proto-updating] for valid ABI-compatible changes
     when updating the schema of a protobuf message.

#### DataSourceDescriptor

Defined in [data_source_descriptor.proto]. This message is sent
Producer -> Service through IPC on the Producer socket during the Producer
initialization, before any tracing session is started. This message is used
to advertise a data source and its capabilities (e.g., which GPU HW
counters are supported, their possible sampling rates).

#### DataSourceConfig

Defined in [data_source_config.proto]. This message is sent:

* Consumer -> Service through IPC on the Consumer socket, as part of the
  [TraceConfig](/docs/concepts/config.md) when a Consumer starts a new tracing
  session.

* Service -> Producer through IPC on the Producer socket, as a reaction to the
  above. The service passes through each `DataSourceConfig` section defined in
  the `TraceConfig` to the corresponding Producer(s) that advertise that data
  source.

#### TracePacket

Defined in [trace_packet.proto]. This is the root object written by any data
source into the SMB when producing any form of trace event.
See the [TracePacket reference][trace-packet-ref] for the full details.

## {#abi-stability} ABI Stability

All the layers of the tracing protocol ABI are long-term stable and can only
be changed maintaining backwards compatibility.

This is due to the fact that on every Android release the `traced` service
gets frozen in the system image while unbundled apps (e.g. Chrome) and host
tools (e.g. Perfetto UI) can be updated at a more frequent cadence.

Both the following scenarios are possible:

#### Producer/Consumer client older than tracing service

This happens typically during Android development. At some point some newer code
is dropped in the Android platform and shipped to users, while client software
and host tools will lag behind (or simply the user has not updated their app /
tools).

The tracing service needs to support clients talking an older version of the
Producer or Consumer tracing protocol.

* Don't remove IPC methods from the service.
* Assume that fields added later to existing methods might be absent.
* For newer Producer/Consumer behaviors, advertise those behaviors through
  feature flags when connecting to the service. Good examples of this are the
  `will_notify_on_stop` or `handles_incremental_state_clear` flags in
  [data_source_descriptor.proto].

#### Producer/Consumer client newer than tracing service

This is the most likely scenario. At some point in 2022 a large number of phones
will still run Android P or Q, hence running a snapshot of the tracing service
from ~2018-2020, but will run a recent version of Google Chrome.
Chrome, when configured in system-tracing mode (i.e. system-wide + in-app
tracing), connects to Android's `traced` producer socket and talks the
latest version of the tracing protocol.

The producer/consumer client code needs to be able to talk with an older version
of the service, which might not support some newer features.

* Newer IPC methods defined in [producer_port.proto] won't exist in the older
  service. When connecting on the socket the service lists its RPC methods
  and the client is able to detect if a method is available or not.
  At the C++ IPC layer, invoking a method that doesn't exist on the service
  causes the `Deferred<>` promise to be rejected.

* Newer fields in existing IPC methods will just be ignored by the older version
  of the service.

* If the producer/consumer client depends on a new behavior of the service, and
  that behavior cannot be inferred by the presence of a method, a new feature
  flag must be exposed through the `QueryCapabilities` method.

## Static linking vs shared library

The Perfetto Client Library is only available in the form of a static library
and a single-source amalgamated SDK (which is effectively a static library).
The library implements the Tracing Protocol ABI so, once statically linked,
depends only on the socket and shared memory protocol ABI, which are guaranteed
to be stable.

No shared library distributions are available. We strongly discourage teams from
attempting to build the tracing library as a shared library and use it from a
different linker unit. It is fine to link AND use the client library within
the same shared library, as long as none of the perfetto C++ API is exported.

The `PERFETTO_EXPORT` annotations are only used when building the third tier of
the client library in Chromium component builds and cannot be easily repurposed
for delineating shared library boundaries for the other two API tiers.

This is because the first two tiers of the Client Library C++ API make
extensive use of inline headers and C++ templates, in order to allow the
compiler to see through most of the layers of abstraction.

Maintaining the C++ ABI across hundreds of inlined functions and a shared
library is prohibitively expensive and too prone to break in extremely subtle
ways. For this reason the team has ruled out shared library distributions for
the time being.

[cli_lib]: /docs/instrumentation/tracing-sdk.md
[selinux_producer]: https://cs.android.com/search?q=perfetto_producer%20f:sepolicy.*%5C.te&sq=
[selinux_consumer]: https://cs.android.com/search?q=f:sepolicy%2F.*%5C.te%20traced_consumer&sq=
[mojom]: https://source.chromium.org/chromium/chromium/src/+/master:services/tracing/public/mojom/perfetto_service.mojom?q=producer%20f:%5C.mojom$%20perfetto&ss=chromium&originalUrl=https:%2F%2Fcs.chromium.org%2F
[proto_rpc]: https://developers.google.com/protocol-buffers/docs/proto#services
[producer_port.proto]: /protos/perfetto/ipc/producer_port.proto
[consumer_port.proto]: /protos/perfetto/ipc/consumer_port.proto
[trace_packet.proto]: /protos/perfetto/trace/trace_packet.proto
[data_source_descriptor.proto]: /protos/perfetto/common/data_source_descriptor.proto
[data_source_config.proto]: /protos/perfetto/config/data_source_config.proto
[trace-packet-ref]: /docs/reference/trace-packet-proto.autogen
[shared_memory_abi.h]: /include/perfetto/ext/tracing/core/shared_memory_abi.h
[commit_data_request.proto]: /protos/perfetto/common/commit_data_request.proto
[proto-updating]: https://developers.google.com/protocol-buffers/docs/proto#updating