<!--*
# Document freshness: For more information, see go/fresh-source.
freshness: { owner: 'haberman' reviewed: '2023-02-24' }
*-->

# upb vs. C++ Protobuf Design

[upb](https://github.com/protocolbuffers/protobuf/tree/main/upb) is a small C
protobuf library. While some of the design follows in the footsteps of the C++
Protobuf Library, upb departs from C++'s design in several key ways. This
document compares and contrasts the two libraries on several design points.

## Design Goals

Before we begin, it is worth calling out that upb and C++ have different design
goals, and this motivates some of the differences we will see.

C++ protobuf is a user-level library: it is designed to be used directly by C++
applications. These applications will expect a full-featured C++ API surface
that uses C++ idioms. The C++ library is also willing to add features to
increase server performance, even if these features would add size or complexity
to the library. Because C++ protobuf is a user-level library, API stability is
of utmost importance: breaking API changes are rare and carefully managed when
they do occur. The focus on C++ also means that ABI compatibility with C is not
a priority.

upb, on the other hand, is designed primarily to be wrapped by other languages.
It is a C protobuf kernel that forms the basis on which a user-level protobuf
library can be built. This means we prefer to keep the API surface as small and
orthogonal as possible. While upb supports all protobuf features required for
full conformance, upb prioritizes simplicity and small code size, and avoids
adding features like lazy fields that can accelerate some use cases but at great
cost in terms of complexity. As upb is not aimed directly at users, there is
much more freedom to make API-breaking changes when necessary, which helps the
core to stay small and simple.
We want to be compatible with all FFI
interfaces, so C ABI compatibility is a must.

Despite these differences, C++ protos and upb offer [roughly the same core set
of features](https://github.com/protocolbuffers/protobuf/tree/main/upb#features).

## Arenas

upb and C++ protos both offer arena allocation, but there are some key
differences.

### C++

As a matter of history, when C++ protos were open-sourced in 2008, they did not
support arenas. Originally there was only unique ownership, whereby each
message uniquely owns all child messages and will free them when the parent is
freed.

Arena allocation was added as a feature in 2014 as a way of dramatically
reducing allocation and (especially) deallocation costs. But the library was
not at liberty to remove the unique ownership model, because it would break far
too many users. As a result, C++ has supported a **hybrid allocation model**
ever since, allowing users to allocate messages either directly from the
stack/heap or from an arena. The library attempts to ensure that there are
no dangling pointers by performing automatic copies in some cases (for example
`a->set_allocated_b(b)`, where `a` and `b` are on different arenas).

C++'s arena object itself, `google::protobuf::Arena`, is **thread-safe** by
design, which allows users to allocate from multiple threads simultaneously
without external synchronization. The user can supply an initial block of
memory to the arena, and can choose some parameters to control the arena block
size. The user can also supply block alloc/dealloc functions, but the alloc
function is expected to always return some memory. The C++ library in general
does not attempt to handle out-of-memory conditions.

### upb

upb uses **arena allocation exclusively**. All messages must be allocated from
an arena, and can only be freed by freeing the arena.
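
To make the arena-only lifetime model concrete, here is a minimal sketch of a toy bump arena in C. It is not upb's implementation or API: `ToyArena`, `toy_arena_malloc`, `toy_arena_free`, and `ToyMsg` are hypothetical names invented for illustration, and the 4096-byte block size and 16-byte alignment are arbitrary assumptions. The sketch only shows the general shape of the model: objects are carved out of arena-owned blocks, there is no per-object free, and everything is released at once.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy bump arena for illustration only; NOT upb's implementation. */
#define TOY_ALIGN 16
#define TOY_BLOCK_SIZE 4096
#define TOY_ALIGN_UP(n) \
  (((n) + (size_t)TOY_ALIGN - 1) & ~((size_t)TOY_ALIGN - 1))

typedef struct ToyBlock {
  struct ToyBlock* next; /* Next block owned by the same arena. */
} ToyBlock;

typedef struct {
  ToyBlock* blocks; /* Every block the arena has obtained from malloc(). */
  char* ptr;        /* Next free byte in the current block. */
  size_t avail;     /* Bytes remaining in the current block. */
} ToyArena;

/* Returns arena-owned memory, or NULL if malloc() fails. */
void* toy_arena_malloc(ToyArena* a, size_t size) {
  if (size > SIZE_MAX - TOY_BLOCK_SIZE) return NULL; /* Avoid overflow. */
  size = TOY_ALIGN_UP(size);
  if (a->avail < size) {
    /* Current block is exhausted: start a new one (leftover space in the
     * old block is simply wasted in this toy version). */
    size_t header = TOY_ALIGN_UP(sizeof(ToyBlock));
    size_t payload = size > TOY_BLOCK_SIZE ? size : TOY_BLOCK_SIZE;
    ToyBlock* block = malloc(header + payload);
    if (!block) return NULL;
    block->next = a->blocks;
    a->blocks = block;
    a->ptr = (char*)block + header;
    a->avail = payload;
  }
  void* ret = a->ptr;
  a->ptr += size;
  a->avail -= size;
  return ret;
}

/* Frees every object ever allocated from the arena in one pass. */
void toy_arena_free(ToyArena* a) {
  ToyBlock* b = a->blocks;
  while (b) {
    ToyBlock* next = b->next;
    free(b);
    b = next;
  }
  a->blocks = NULL;
  a->ptr = NULL;
  a->avail = 0;
}

/* Hypothetical message type with one scalar field and one child message. */
typedef struct ToyMsg {
  int value;
  struct ToyMsg* child;
} ToyMsg;

/* Builds a parent/child pair, reads a field, then frees both at once. */
int toy_arena_demo(void) {
  ToyArena arena = {NULL, NULL, 0};
  ToyMsg* parent = toy_arena_malloc(&arena, sizeof(*parent));
  ToyMsg* child = toy_arena_malloc(&arena, sizeof(*child));
  if (!parent || !child) {
    toy_arena_free(&arena);
    return -1;
  }
  child->value = 42;
  child->child = NULL;
  parent->value = 1;
  parent->child = child; /* A plain pointer store; nothing is copied. */
  int result = parent->child->value;
  toy_arena_free(&arena); /* parent and child are both invalid after this. */
  return result;
}
```

In this sketch, `toy_arena_demo()` returns the child's value read through the parent. Note that the only way to release `parent` or `child` is `toy_arena_free()`, which is why any pointer into the arena becomes dangling the moment the arena is freed.
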
It is entirely the user's
responsibility to ensure that there are no dangling pointers: when a user sets a
message field, this will always trivially overwrite the pointer and will never
perform an implicit copy.

upb's `upb::Arena` is **thread-compatible**, which means it cannot be used
concurrently without synchronization. The arena can be seeded with an initial
block of memory, but it does not explicitly support any parameters for choosing
block size. It supports a custom alloc/dealloc function, and this function is
allowed to return `NULL` if no dynamic memory is available. This allows upb
arenas to have a max/fixed size, and makes it possible in theory to write code
that is tolerant to out-of-memory errors.

upb's arena also supports a novel operation known as **fuse**, which joins two
arenas together into a single lifetime. Though both arenas must still be freed
separately, none of the memory will actually be freed until *both* arenas have
been freed. This is useful for avoiding dangling pointers when reparenting a
message with one that may be on a different arena.

### Comparison

**hybrid allocation vs. arena-only**

* The C++ hybrid allocation model introduces a great deal of complexity and
  unpredictability into the library. upb benefits from having a much simpler
  and more predictable design.
* Some of the complexity in C++'s hybrid model arises from the fact that arenas
  were added after the fact. Designing for a hybrid model from the outset
  would likely yield a simpler result.
* Unique ownership does support some usage patterns that arenas cannot directly
  accommodate. For example, you can reparent a message and the child will precisely
  follow the lifetime of its new parent. An arena would require you to either
  perform a deep copy or extend the lifetime.

**thread-compatible vs. thread-safe arena**

* A thread-safe arena (as in C++) is safer and easier to use.
  A thread-compatible
  arena requires that the user prove that the arena cannot be used concurrently.
* [Thread Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
  is far more accessible than it was in 2014 (when C++ introduced a thread-safe
  arena). We now have more tools at our disposal to ensure that we do not trigger
  data races in a thread-compatible arena like upb's.
* Thread-compatible arenas are more performant.
* Thread-compatible arenas have a far simpler implementation. The C++ thread-safe
  arena relies on thread-local variables, which introduce complications on some
  platforms. It also requires far more subtle reasoning for correctness and
  performance.

**fuse vs. no fuse**

* The `upb_Arena_Fuse()` operation is a key part of how upb supports reparenting
  of messages when the parent may be on a different arena. Without this, upb has
  no way of supporting `foo.bar = bar` in dynamic languages without performing a
  deep copy.
* A downside of `upb_Arena_Fuse()` is that passing an arena to a function can allow
  that function to extend the lifetime of the arena in potentially
  unpredictable ways. This can be prevented if necessary, as fuse can fail, e.g. if
  one arena has an initial block. But this adds some complexity by requiring callers
  to handle the case where fuse fails.

## Code Generation vs. Tables

The C++ protobuf library has always been built around code generation, while upb
generates only tables. In other words, `foo.pb.cc` files contain functions,
whereas `foo.upb.c` files emit only data structures.

### C++

C++ generated code emits a large number of functions into `foo.pb.cc` files.
An incomplete list:

* `FooMsg::FooMsg()` (constructor): initializes all fields to their default value.
* `FooMsg::~FooMsg()` (destructor): frees any present child messages.
* `FooMsg::Clear()`: clears all fields back to their default/empty value.
* `FooMsg::_InternalParse()`: generated code for parsing a message.
* `FooMsg::_InternalSerialize()`: generated code for serializing a message.
* `FooMsg::ByteSizeLong()`: calculates serialized size, as a first pass before serializing.
* `FooMsg::MergeFrom()`: copies/appends present fields from another message.
* `FooMsg::IsInitialized()`: checks whether required fields are set.

This code lives in the `.text` section and contains function calls to the generated
classes for child messages.

### upb

upb does not generate any code into `foo.upb.c` files, only data structures. upb uses a
compact data table known as a *mini table* to represent the schema and all fields.

upb uses mini tables to perform all of the operations that would traditionally be done
with generated code. Revisiting the list from the previous section:

* `FooMsg::FooMsg()` (constructor): upb instead initializes all messages with `memset(msg, 0, size)`.
  Non-zero defaults are injected in the accessors.
* `FooMsg::~FooMsg()` (destructor): upb messages are freed by freeing the arena.
* `FooMsg::Clear()`: can be performed with `memset(msg, 0, size)`.
* `FooMsg::_InternalParse()`: upb's parser uses mini tables as data, instead of generating code.
* `FooMsg::_InternalSerialize()`: upb's serializer also uses mini tables instead of generated code.
* `FooMsg::ByteSizeLong()`: upb performs serialization in reverse so that an initial pass is not required.
* `FooMsg::MergeFrom()`: upb supports this via serialize+parse from the other message.
* `FooMsg::IsInitialized()`: upb's encoder and decoder have special flags to check for required fields.
  A util library `upb/util/required_fields.h` handles the corner cases.

### Comparison

If we compare compiled code size, upb is far smaller.
Here is a comparison of the code
size of a trivial binary that does nothing but a parse and serialize of `descriptor.proto`.
This means we are seeing both the overhead of the core library itself as well as the
generated code (or table) for `descriptor.proto`. (For extra clarity we should break this
down by generated code vs core library in the future).

| Library         | `.text` | `.data` | `.bss` |
|-----------------|---------|---------|--------|
| upb             | 26Ki    | 0.6Ki   | 0.01Ki |
| C++ (lite)      | 187Ki   | 2.8Ki   | 1.25Ki |
| C++ (code size) | 904Ki   | 6.1Ki   | 1.88Ki |
| C++ (full)      | 983Ki   | 6.1Ki   | 1.88Ki |

"C++ (code size)" refers to protos compiled with `optimize_for = CODE_SIZE`, a mode
in which generated code contains reflection only, in an attempt to make the
generated code size smaller (although it requires the full runtime instead
of the lite runtime).

## Bifurcated vs. Optional Reflection

upb and C++ protos both offer reflection without making it mandatory. However,
the models for enabling/disabling reflection are very different.

### C++

C++ messages offer full reflection by default. Messages in C++ generally
derive from `Message`, and the base class provides a member function
`const Reflection* Message::GetReflection() const` which returns the reflection object.

It follows that any message deriving from `Message` will always have reflection
linked into the binary, whether or not the reflection object is ever used.
Because `GetReflection()` is a function on the base class, it is not possible
to statically determine if a given message's reflection is used:

```c++
#include "google/protobuf/message.h"

using google::protobuf::Message;
using google::protobuf::Reflection;

const Reflection* GetReflection(const Message& message) {
  // Can refer to any message in the whole binary.
  return message.GetReflection();
}
```

The C++ library does provide a way of omitting reflection: `MessageLite`.
We can
cause a message to be lite in two different ways:

* `optimize_for = LITE_RUNTIME` in a `.proto` file will cause all messages in that
  file to be lite.
* `lite` as a codegen param: this will force all messages to lite, even if the
  `.proto` file does not have `optimize_for = LITE_RUNTIME`.

A lite message will derive from `MessageLite` instead of `Message`. Since
`MessageLite` has no `GetReflection()` function, this means no reflection is
available, so we can avoid taking the code size hit.

### upb

upb does not have the `Message` vs. `MessageLite` bifurcation. There is only one
kind of message type, `upb_Message`, which means there is no need to configure in
a `.proto` file which messages will need reflection and which will not.
Every message has the *option* to link in reflection from a separate `foo.upbdefs.o`
file, without needing to change the message itself in any way.

upb does not provide the equivalent of `Message::GetReflection()`: there is no
facility for retrieving the reflection of a message whose type is not known statically.
It would be possible to layer such a facility on top of the upb core, though this
would probably require some kind of code generation.

### Comparison

* Most messages in C++ will not bother to declare themselves as "lite". This means
  that many C++ messages will link in reflection even when it is never used, bloating
  binaries unnecessarily.
* `optimize_for = LITE_RUNTIME` is difficult to use in practice, because it prevents
  any non-lite protos from `import`ing that file.
* Forcing all protos to lite via a codegen parameter (for example, when building for
  mobile) is more practical than `optimize_for = LITE_RUNTIME`. But this will break
  the compile for any code that tries to upcast to `Message`, or tries to use a
  non-lite method.
* The one major advantage of the C++ model is that it can support `msg.DebugString()`
  on a type-erased proto. For upb you have to explicitly pass the `upb_MessageDef*`
  separately if you want to perform an operation like printing a proto to text format.

## Explicit Registration vs. Globals

TODO