# How To Implement Field Presence for Proto3

Protobuf release 3.12 adds experimental support for `optional` fields in
proto3. Proto3 optional fields track presence like in proto2. For background
information about what presence tracking means, please see
[docs/field_presence](field_presence.md).

## Document Summary

This document is targeted at developers who own or maintain protobuf code
generators. All code generators will need to be updated to support proto3
optional fields. First-party code generators developed by Google are being
updated already. However, third-party code generators will need to be updated
independently by their authors. This includes:

- implementations of Protocol Buffers for other languages.
- alternate implementations of Protocol Buffers that target specialized use
  cases.
- RPC code generators that create generated APIs for service calls.
- code generators that implement some utility code on top of protobuf generated
  classes.

While this document speaks in terms of "code generators", these same principles
apply to implementations that dynamically generate a protocol buffer API "on the
fly", directly from a descriptor, in languages that support this kind of usage.

## Background

Presence tracking was added to proto3 in response to user feedback, both from
inside Google and
[from open-source users](https://github.com/protocolbuffers/protobuf/issues/1606).
The
[proto3 wrapper types](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto)
were previously the only supported presence mechanism for proto3. Users have
pointed to both efficiency and usability issues with the wrapper types.

Presence in proto3 uses exactly the same syntax and semantics as in proto2.
Proto3 fields marked `optional` will track presence like proto2, while fields
without any label (known as "singular fields") will continue to omit presence
information.
The `optional` keyword was chosen to minimize differences with proto2.

Unfortunately, for the current descriptor protos and `Descriptor` API (as of
3.11.4), it is not possible to use the same representation as proto2. Proto3
descriptors already use `LABEL_OPTIONAL` for proto3 singular fields, which do
not track presence. There is a lot of existing code that reflects over proto3
protos and assumes that `LABEL_OPTIONAL` in proto3 means "no presence." Changing
the semantics now would be risky, since old software would likely drop proto3
presence information, which would be a data loss bug.

To minimize this risk, we chose a descriptor representation that is
semantically compatible with existing proto3 reflection. Every proto3 optional
field is placed into a one-field `oneof`. We call this a "synthetic" oneof, as
it was not present in the source `.proto` file.

Since oneof fields in proto3 already track presence, existing proto3
reflection-based algorithms should correctly preserve presence for proto3
optional fields with no code changes. For example, the JSON and TextFormat
parsers/serializers in C++ and Java did not require any changes to support
proto3 presence. This is the major benefit of synthetic oneofs.

This design does leave some cruft in descriptors. Synthetic oneofs are a
compatibility measure that we can hopefully clean up in the future. For now,
though, it is important to preserve them across different descriptor formats and
APIs. It is never safe to drop synthetic oneofs from a proto schema. Code
generators can (and should) skip synthetic oneofs when generating a user-facing
API or user-facing documentation. But for any schema representation that is
consumed programmatically, it is important to keep the synthetic oneofs around.

In APIs it can be helpful to offer separate accessors that refer to "real"
oneofs (see [API Changes](#api-changes) below).
This is a convenient way to omit
synthetic oneofs in code generators.

## Updating a Code Generator

When a user adds an `optional` field to proto3, this is internally rewritten as
a one-field oneof, for backward compatibility with reflection-based algorithms:

```protobuf
syntax = "proto3";

message Foo {
  // Experimental feature, not generally supported yet!
  optional int32 foo = 1;

  // Internally rewritten to:
  // oneof _foo {
  //   int32 foo = 1 [proto3_optional=true];
  // }
  //
  // We call _foo a "synthetic" oneof, since it was not created by the user.
}
```

As a result, the two main goals when updating a code generator are:

1. Give `optional` fields like `foo` normal field presence, as described in
   [docs/field_presence](field_presence.md). If your implementation already
   supports proto2, a proto3 `optional` field should use exactly the same API
   and internal implementation as proto2 `optional`.
2. Avoid generating any oneof-based accessors for the synthetic oneof. Its only
   purpose is to make reflection-based algorithms work properly if they are
   not aware of proto3 presence. The synthetic oneof should not appear anywhere
   in the generated API.

### Satisfying the Experimental Check

If you try to run `protoc` on a file with proto3 `optional` fields, you will get
an error because the feature is still experimental:

```
$ cat test.proto
syntax = "proto3";

message Foo {
  // Experimental feature, not generally supported yet!
  optional int32 a = 1;
}
$ protoc --cpp_out=. test.proto
test.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.
```

There are two options for getting around this error:

1. Pass `--experimental_allow_proto3_optional` to protoc.
2. Make your filename (or a directory name) contain the string
   `test_proto3_optional`.
   This indicates that the proto file is specifically
   for testing proto3 optional support, so the check is suppressed.

These options are demonstrated below:

```
# One option:
$ ./src/protoc test.proto --cpp_out=. --experimental_allow_proto3_optional

# Another option:
$ cp test.proto test_proto3_optional.proto
$ ./src/protoc test_proto3_optional.proto --cpp_out=.
$
```

The experimental check will be removed in a future release, once we are ready
to make this feature generally available. Ideally this will happen for the 3.13
release of protobuf, sometime in mid-2020, but there is not a specific date set
for this yet. Some of the timing will depend on feedback we get from the
community, so if you have questions or concerns, please get in touch via a
GitHub issue.

### Signaling That Your Code Generator Supports Proto3 Optional

If you now try to invoke your own code generator with the test proto, you will
run into a different error:

```
$ ./src/protoc test_proto3_optional.proto --my_codegen_out=.
test_proto3_optional.proto: is a proto3 file that contains optional fields, but
code generator --my_codegen_out hasn't been updated to support optional fields in
proto3. Please ask the owner of this code generator to support proto3 optional.
```

This check exists to make sure that code generators get a chance to update
before they are used with proto3 `optional` fields. Without this check, an old
code generator might emit obsolete generated APIs (like accessors for a
synthetic oneof), and users could start depending on these. That would create
a legacy migration burden once a code generator actually implements the feature.

To signal that your code generator supports `optional` fields in proto3, you
need to tell `protoc` what features you support.
The method for doing this
depends on whether you are using the C++
`google::protobuf::compiler::CodeGenerator` framework or not.

If you are using the CodeGenerator framework:

```c++
class MyCodeGenerator : public google::protobuf::compiler::CodeGenerator {
 public:
  // ... Generate() and your other existing members ...

  // Add this method.
  uint64_t GetSupportedFeatures() const override {
    // Indicate that this code generator supports proto3 optional fields.
    // (Note: don't release your code generator with this flag set until you
    // have actually added and tested your proto3 support!)
    return FEATURE_PROTO3_OPTIONAL;
  }
};
```

If you are generating code using raw `CodeGeneratorRequest` and
`CodeGeneratorResponse` messages from `plugin.proto`, the change will be very
similar:

```c++
void GenerateResponse() {
  CodeGeneratorResponse response;
  response.set_supported_features(CodeGeneratorResponse::FEATURE_PROTO3_OPTIONAL);

  // Generate code...
}
```

Once you have added this, you should be able to successfully use your code
generator to generate a file containing proto3 optional fields:

```
$ ./src/protoc test_proto3_optional.proto --my_codegen_out=.
```

### Updating Your Code Generator

Now it is time to actually add support for proto3 optional to your code
generator. The goal is to recognize proto3 optional fields as optional and
suppress any output from synthetic oneofs.

If your code generator does not currently support proto2, you will need to
design an API and implementation for supporting presence in scalar fields.
Generally this means:

- allocating a bit inside the generated class to represent whether a given field
  is present or not.
- exposing a `has_foo()` method for each field to return the value of this bit.
- making the parser set this bit when a value is parsed from the wire.
- making the serializer test this bit to decide whether to serialize.
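
The four steps above can be sketched as hand-written C++ for a message with a
single proto3 `optional int32 foo = 1;` field. This is a minimal illustration
under stated assumptions, not what any real generator emits: the class `Foo`,
the `has_bits_` member, `kFooBit`, and the toy `Serialize()`/`Parse()` methods
(which only understand field 1 encoded as a varint) are all hypothetical.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hand-written equivalent of generated code for:
//   syntax = "proto3";
//   message Foo { optional int32 foo = 1; }
// All names are illustrative; real generators lay this out differently.
class Foo {
 public:
  // Presence accessor: reads the hasbit, not the stored value.
  bool has_foo() const { return (has_bits_ & kFooBit) != 0; }
  int32_t foo() const { return foo_; }

  void set_foo(int32_t value) {
    foo_ = value;
    has_bits_ |= kFooBit;  // Even set_foo(0) marks the field as present.
  }
  void clear_foo() {
    foo_ = 0;
    has_bits_ &= ~kFooBit;
  }

  // Serializer: emits field 1 if and only if its hasbit is set.
  std::vector<uint8_t> Serialize() const {
    std::vector<uint8_t> out;
    if (has_foo()) {
      out.push_back(0x08);  // Tag: field number 1, wire type 0 (varint).
      // Negative int32 values are sign-extended to 64 bits before encoding.
      uint64_t v = static_cast<uint64_t>(static_cast<int64_t>(foo_));
      while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>(v | 0x80));
        v >>= 7;
      }
      out.push_back(static_cast<uint8_t>(v));
    }
    return out;
  }

  // Parser: storing a value read from the wire also sets the hasbit.
  // Toy scope: accepts only field 1 as a varint and does not clear existing
  // state first; returns false on anything else.
  bool Parse(const std::vector<uint8_t>& in) {
    std::size_t i = 0;
    while (i < in.size()) {
      if (in[i++] != 0x08) return false;
      uint64_t v = 0;
      for (int shift = 0;; shift += 7) {
        if (i >= in.size() || shift > 63) return false;  // Truncated/overlong.
        const uint8_t byte = in[i++];
        v |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
      }
      set_foo(static_cast<int32_t>(v));  // Keeps the low 32 bits.
    }
    return true;
  }

 private:
  static constexpr uint32_t kFooBit = 1u << 0;  // Bit 0 of has_bits_.
  uint32_t has_bits_ = 0;
  int32_t foo_ = 0;
};
```

The behavior that matters is that `set_foo(0)` makes `has_foo()` true and puts
the field on the wire, which a proto3 singular field without presence cannot
express: it would skip the default value when serializing.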

If your code generator already supports proto2, then most of your work is
already done. All you need to do is make sure that proto3 optional fields have
exactly the same API and behave in exactly the same way as proto2 optional
fields.

From experience updating several of Google's code generators, most of the
updates that are required fall into one of several patterns. Here we will show
the patterns in terms of the C++ CodeGenerator framework. If you are using
`CodeGeneratorRequest` and `CodeGeneratorResponse` directly, you can translate
the C++ examples to your own language, referencing the C++ implementation of
these methods where required.

#### To test whether a field should have presence

Old:

```c++
bool MessageHasPresence(const google::protobuf::Descriptor* message) {
  return message->file()->syntax() ==
         google::protobuf::FileDescriptor::SYNTAX_PROTO2;
}
```

New:

```c++
// Presence is no longer a property of a message; it's a property of individual
// fields.
bool FieldHasPresence(const google::protobuf::FieldDescriptor* field) {
  return field->has_presence();
  // Note: the above will return true for fields in a oneof.
  // If you want to filter out oneof fields, write this instead:
  //   return field->has_presence() && !field->real_containing_oneof();
}
```

#### To test whether a field is a member of a oneof

Old:

```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
  return field->containing_oneof() != nullptr;
}
```

New:

```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
  // real_containing_oneof() returns nullptr for synthetic oneofs.
  return field->real_containing_oneof() != nullptr;
}
```

#### To iterate over all oneofs

Old:

```c++
void IterateOverOneofs(const google::protobuf::Descriptor* message) {
  for (int i = 0; i < message->oneof_decl_count(); i++) {
    const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i);
    // ...
  }
}
```

New:

```c++
void IterateOverOneofs(const google::protobuf::Descriptor* message) {
  // Real oneofs are always first, and real_oneof_decl_count() will return the
  // total number of oneofs, excluding synthetic oneofs.
  for (int i = 0; i < message->real_oneof_decl_count(); i++) {
    const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i);
    // ...
  }
}
```

## Updating Reflection

If your implementation offers reflection, there are a few other changes to make:

### API Changes

The API for reflecting over fields and oneofs should make the following changes.
These match the changes implemented in C++ reflection.

1. Add a `FieldDescriptor::has_presence()` method returning `bool`
   (adjusted to your language's naming convention). This should return true
   for all fields that have explicit presence, as documented in
   [docs/field_presence](field_presence.md). In particular, this includes
   fields in a oneof, proto2 scalar fields, and proto3 `optional` fields.
   This accessor will allow users to query what fields have presence without
   thinking about the difference between proto2 and proto3.
2. As a corollary of (1), please do *not* expose an accessor for the
   `FieldDescriptorProto.proto3_optional` field. We want to avoid having
   users implement any proto2/proto3-specific logic. Users should use the
   `has_presence()` function instead.
3. You may also wish to add a `FieldDescriptor::has_optional_keyword()` method
   returning `bool`, which indicates whether the `optional` keyword is present.
   Singular message fields always return `true` for `has_presence()`, whether
   or not `optional` was written, so this method lets a user tell whether the
   `optional` keyword was used. It can occasionally be useful to have this
   information, even though it does not change the presence semantics of the
   field.
4. If your reflection API may be used for a code generator, you may wish to
   implement methods to help users tell the difference between real and
   synthetic oneofs. In particular:
   - `OneofDescriptor::is_synthetic()`: returns true if this is a synthetic
     oneof.
   - `FieldDescriptor::real_containing_oneof()`: like `containing_oneof()`,
     but returns `nullptr` if the oneof is synthetic.
   - `Descriptor::real_oneof_decl_count()`: like `oneof_decl_count()`, but
     returns the number of real oneofs only.

### Implementation Changes

Proto3 `optional` fields and synthetic oneofs must work correctly when
reflected on. Specifically:

1. Reflection for synthetic oneofs should work properly. Even though synthetic
   oneofs do not really exist in the message, you can still make reflection work
   as if they did. In particular, you can make a method like
   `Reflection::HasOneof()` or `Reflection::GetOneofFieldDescriptor()` look at
   the hasbit to determine if the oneof is present or not.
2. Reflection for proto3 optional fields should work properly. For example, a
   method like `Reflection::HasField()` should know to look for the hasbit for a
   proto3 `optional` field. It should not be fooled by the synthetic oneof into
   thinking that there is a `case` member for the oneof.

Once you have updated reflection to work properly with proto3 `optional` and
synthetic oneofs, any code that *uses* your reflection interface should work
properly with no changes. This is the benefit of using synthetic oneofs.
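
To make the two implementation requirements concrete, here is a toy reflection
model in C++. Everything in it (`FieldDesc`, `OneofDesc`, `Msg`, and the free
functions `HasField()`, `GetOneofFieldDescriptor()`, and `HasOneof()`) is
hypothetical and far simpler than protobuf's `Reflection` class; it only shows
where the hasbit check has to come before any oneof-case lookup, and how a
synthetic oneof's "case" can be derived from its single member's hasbit.

```c++
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified reflection model (not protobuf's API).
struct OneofDesc;

struct FieldDesc {
  std::string name;
  int number;
  int hasbit_index;        // Index into Msg::has_bits, or -1 if no hasbit.
  const OneofDesc* oneof;  // Containing oneof (real or synthetic), or nullptr.
};

struct OneofDesc {
  std::string name;
  bool is_synthetic;
  int case_index;  // Slot in Msg::oneof_case; -1 for synthetic oneofs.
  std::vector<const FieldDesc*> fields;
};

// Runtime state: hasbits, plus one "case" slot per *real* oneof holding the
// number of the field that is set (0 if none). Field values are omitted.
struct Msg {
  uint32_t has_bits = 0;
  std::vector<int> oneof_case;
};

// Covers only fields with explicit presence; implicit-presence proto3 fields
// (which are compared against their default value) are out of scope here.
bool HasField(const Msg& msg, const FieldDesc& field) {
  // Check the hasbit first: a proto3 optional field sits in a synthetic oneof
  // but has no case slot, so it must not fall through to the lookup below.
  if (field.hasbit_index >= 0) {
    return ((msg.has_bits >> field.hasbit_index) & 1u) != 0;
  }
  if (field.oneof != nullptr && !field.oneof->is_synthetic) {
    return msg.oneof_case[field.oneof->case_index] == field.number;
  }
  return false;
}

// A synthetic oneof always has exactly one member, so its "case" is derived
// from that member's hasbit instead of from a stored case slot.
const FieldDesc* GetOneofFieldDescriptor(const Msg& msg, const OneofDesc& oneof) {
  if (oneof.is_synthetic) {
    const FieldDesc* only = oneof.fields.front();
    return HasField(msg, *only) ? only : nullptr;
  }
  const int set_number = msg.oneof_case[oneof.case_index];
  for (const FieldDesc* field : oneof.fields) {
    if (field->number == set_number) return field;
  }
  return nullptr;
}

bool HasOneof(const Msg& msg, const OneofDesc& oneof) {
  return GetOneofFieldDescriptor(msg, oneof) != nullptr;
}
```

If `HasField()` checked the containing oneof before the hasbit, the synthetic
`_foo` oneof would send it to a case slot that does not exist (index `-1` here),
which is the "fooled by the synthetic oneof" failure described in item 2.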

In particular, if you have a reflection-based implementation of protobuf text
format or JSON, it should properly support proto3 optional fields without any
changes to the code. The fields will look like they all belong to a one-field
oneof, and existing proto3 reflection code should know how to test presence for
fields in a oneof.

So the best way to test your reflection changes is to try round-tripping a
message through text format, JSON, or some other reflection-based parser and
serializer, if you have one.

### Validating Descriptors

If your reflection implementation supports loading descriptors at runtime,
you must verify that all synthetic oneofs are ordered after all "real" oneofs.

Here is the code that implements this validation step in C++, for inspiration:

```c++
  // Validation that runs for each message.
  // `message` is the Descriptor being built; `proto` is its DescriptorProto.
  // Synthetic oneofs must be last.
  int first_synthetic = -1;
  for (int i = 0; i < message->oneof_decl_count(); i++) {
    const OneofDescriptor* oneof = message->oneof_decl(i);
    if (oneof->is_synthetic()) {
      if (first_synthetic == -1) {
        first_synthetic = i;
      }
    } else {
      if (first_synthetic != -1) {
        AddError(message->full_name(), proto.oneof_decl(i),
                 DescriptorPool::ErrorCollector::OTHER,
                 "Synthetic oneofs must be after all other oneofs");
      }
    }
  }

  if (first_synthetic == -1) {
    message->real_oneof_decl_count_ = message->oneof_decl_count_;
  } else {
    message->real_oneof_decl_count_ = first_synthetic;
  }
```
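
For an implementation that does not share the C++ descriptor internals, the same
check can be written as a standalone function. The sketch below is a
hypothetical adaptation, not protobuf code: `OneofInfo`, `OneofLayout`, and
`ValidateOneofOrder()` are invented names, how each oneof's synthetic flag is
computed is left to the caller, and unlike the C++ excerpt, which reports an
error for every real oneof that follows a synthetic one, it stops at the first.

```c++
#include <string>
#include <vector>

// Hypothetical input: each oneof's name and whether it is synthetic, in
// declaration order.
struct OneofInfo {
  std::string name;
  bool is_synthetic;
};

// Hypothetical result type for this sketch.
struct OneofLayout {
  bool ok;                    // False if a real oneof follows a synthetic one.
  int real_oneof_decl_count;  // Number of leading real oneofs (valid if ok).
  std::string error;          // Empty if ok.
};

OneofLayout ValidateOneofOrder(const std::vector<OneofInfo>& oneofs) {
  int first_synthetic = -1;
  for (int i = 0; i < static_cast<int>(oneofs.size()); i++) {
    if (oneofs[i].is_synthetic) {
      if (first_synthetic == -1) first_synthetic = i;
    } else if (first_synthetic != -1) {
      // A real oneof after a synthetic one breaks the "real oneofs first"
      // invariant that real_oneof_decl_count relies on.
      return {false, 0,
              "Synthetic oneofs must be after all other oneofs (saw real "
              "oneof '" + oneofs[i].name + "')"};
    }
  }
  // With no synthetic oneofs, every oneof is real.
  const int count =
      first_synthetic == -1 ? static_cast<int>(oneofs.size()) : first_synthetic;
  return {true, count, ""};
}
```

For example, a message that declares `oneof choice` followed by two proto3
`optional` fields (synthetic oneofs `_foo` and `_bar`) passes with a
`real_oneof_decl_count` of 1, so iterating up to that count visits only
`choice`.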