# Life of an Edition

**Author:** [@mcy](https://github.com/mcy)

How to use Protobuf Editions to construct a large-scale change that modifies the
semantics of Protobuf in some way.

## Overview

This document describes how to use the Protobuf Editions mechanism (both
editions themselves and [features](protobuf-editions-design-features.md)) for
designing migrations and large-scale changes intended to solve a particular kind
of defect in the language.

This document describes:

*   How features are added to the language.
*   How editions are defined and "proclaimed".
*   How to build different kinds of large-scale changes.
*   Tooling in `protoc` to support large-scale changes.
*   An OSS strategy.

## Defining Features

There are two kinds of features:

*   Global features, which are the fields of `proto.Features`. In this document,
    we refer to them as `features.<name>`, e.g. `features.enum`.
*   Language-scoped features, which are defined in a typed extension field for
    that language. In this document, we refer to them as
    `features.(<lang>).name`, e.g. `features.(proto.cpp).legacy_string`.

Global features require a `descriptor.h` change, and are relatively heavy
weight, since defining one will also require providing helpers in `Descriptor`
wrapper classes to avoid the need for users to resolve inheritance. Because they
are not specific to a language, they need to be carefully and visibly
documented.

Language-scoped features require only a change in a backend's feature extension,
which has a smaller blast radius (except in C++ and Java). Often these are
relevant only for codegen and do not require reflective introspection.

Adding a feature is never a breaking change.

### Feature Lifetime

In general, features should have an *original default* and a *desired default*:
features are intended to gradually flip from one value to another throughout the
ecosystem as migrations progress.
This is not always true, but this means most
features will be bools or enums.

Any migration that introduces a feature should plan to eventually deprecate and
remove that feature from both our internal codebase and open source, generally
with a multi-year horizon. Features are *transient*.

Removing a feature is a breaking change, but it does not need to be tied to an
edition. Feature removal in OSS must thus be batched into a breaking release.
Deletion of a feature should generally be announced to OSS a year in advance.

### Do's and Don'ts

Here are some things that we could use features for, very broadly:

*   Changing the generated API of any syntax production (name, behavior,
    signature, whether it is generated at all). E.g.
    `features.(proto.cpp).legacy_string`.
*   Changing the serialization encoding of a field (so long as it does not break
    readers). E.g., `features.packed`, eventually `features.group_encoding`.
*   Changing the deserialization semantics of a field. E.g., `features.enum`,
    `features.utf8`.

Although almost any semantic change can be feature-controlled, some things would
be a bit tricky to use a feature for:

*   Changing syntax. If we introduce a new syntax production, gating it doesn't
    do people much good and is just noise. We should avoid changing how things
    are spelled. In Protobuf's history, it has been incredibly rare that we have
    needed to do this.
*   Shape of a descriptor. Features should generally not cause fields, message,
    or enum descriptors to appear or disappear.
*   Names and field numbers. Features should not change the names or field
    numbers of syntax entities as seen in a descriptor. This is separate from
    using features to change generated API names.
*   Changing the wire encoding in an incompatible way. Using features to change
    the wire format has some long horizons and caveats described below.

## Proclaiming an Edition

An *edition* is a set of default values for all features that `protoc`'s
frontend, and its backends, understand. Edition numbers are announced by
protobuf-team, but not necessarily defined by us. `protoc` only defines the
edition defaults for global features, and each backend defines the edition
defaults for its features.

### Total Ordering of Editions

The `FileDescriptorProto.edition` field is a string, so that we can avoid nasty
surprises around needing to mint multiple editions per year: even if we mint
`edition = "2022";`, we can mint `edition = "2022.1";` in a pinch.

However, protobuf-team does not define editions, it only proclaims them.
Third-party backends are responsible for changing defaults across editions. To
minimize the amount of synchronization, we introduce a *total order* on
editions.

This means that a backend can pick the default not by looking at the edition,
but by asking "is this proto older than this edition, where I introduced this
default?"

The total order is thus: the edition string is split on `'.'`, and the resulting
components are compared left to right. Two components are ordered by
`a.len < b.len || (a.len == b.len && a < b)`: a shorter component sorts first,
and components of equal length compare as strings. This ensures that `9 < 10`,
for example. An edition whose components are a strict prefix of another's sorts
first, so `2022 < 2022.0`.

By convention, we will make the edition be either the year, like `2022`, or the
year followed by a revision, like `2022.1`. Thus, we have the following total
ordering on editions:

```
2022 < 2022.0 < 2022.1 < ... < 2022.9 < 2022.10 < ... < 2023 < ... < 2024 < ...
```

(**Note:** The above edition ordering is updated in
[Edition Naming](edition-naming.md).)

Thus, if an imaginary Haskell backend defines a feature
`features.(haskell).more_monads`, which becomes true in 2023, the backend can ask
`file.EditionIsLaterThan("2023")`. If it becomes false in 2023.1, a future
version would ask `file.EditionIsBetween("2023", "2023.1")`.
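
The ordering and the two queries above can be sketched in a few lines of Python.
This is only an illustration of the order as described in this section, not
`protoc`'s implementation. The function names are hypothetical stand-ins for
`EditionIsLaterThan` and `EditionIsBetween`. The sketch also assumes that
"later than" includes the named edition and that "between" is a half-open range,
which is what the Haskell example seems to require.

```python
def edition_key(edition):
    """Build a sort key for an edition string such as "2022" or "2022.10"."""
    # Each component is keyed by (length, text), so "9" sorts before "10".
    # Tuple comparison then goes left to right, and a strict prefix
    # ("2022") sorts before its extensions ("2022.0").
    return tuple((len(part), part) for part in edition.split("."))


def edition_is_later_than(file_edition, since):
    """Assumed inclusive: true if file_edition is `since` or any later edition."""
    return edition_key(file_edition) >= edition_key(since)


def edition_is_between(file_edition, start, end):
    """Assumed half-open: true if start <= file_edition < end."""
    return edition_key(start) <= edition_key(file_edition) < edition_key(end)


def more_monads_default(file_edition):
    """Hypothetical backend default: true from 2023, false again from 2023.1."""
    return edition_is_between(file_edition, "2023", "2023.1")


editions = ["2023", "2022.10", "2022", "2022.9", "2022.0", "2022.1"]
print(sorted(editions, key=edition_key))
```

A backend written this way only needs to change when it changes a default,
because every comparison is against an edition the backend itself chose.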

This means that backends only need to change when they make a change to
defaults. However, backends cannot add things to editions willy-nilly. A backend
can only start observing an edition after protobuf-team proclaims the next
edition number, and may not use edition numbers we do not proclaim.

### Proclamation

"Proclamation" is done via a two-step process: first, we announce an upcoming
edition some months ahead of time to OSS, and give an approximate date on which
we plan to release a non-breaking version that causes protoc to accept the new
edition. Around the time of that release, backends should make a release adding
support for that edition, if they want to change a default. It is a faux pas,
though one with no enforcement mechanism, for the meaning of an edition to
change long (> 1 month) after it has been released.

We promise to proclaim an edition once per calendar year, even if first-party
backends will not use it. In the event of an emergency (whatever that means), we
can proclaim a `Y.1`, `Y.2`, and so on. Because of the total order, only
backends that desperately need a new edition need to pay attention to the
announcement. As we gain experience, we should define guidelines for third
parties to request an unscheduled edition bump, but for the time being we will
deal with things case-by-case.

We may want to have a canonical way for finding out what the latest edition is.
It should be included in large print on our landing page, and
`protoc --latest-edition` should print the newest edition known to `protoc`. The
intent is that tooling that generates `.proto` templates externally can choose
to use the latest edition for new messages.

## Large-scale Change Templates

The following are sketches of large-scale change designs for feature changes we
would like to execute, presented as example use-cases.

### Large-scale Changes with No Functional Changes: Edition Zero

We need to get the ecosystem into the `"editions"` syntax. This migration is
probably unique because we are not changing any behavior, just the spelling of a
bunch of things.

We also need to track down and upgrade (by hand) any code that is using the
value of `syntax`. This will likely be a manual large-scale change performed
either by Busy Beavers or a handful of protobuf-team members furnished with
appropriate stimulants (coffee, diet mountain dew, etc). Once we have migrated
95% of callers of `syntax`, we will mark all accessors of that field in various
languages as deprecated.

Because the value of `syntax` becomes unreliable at this point, this will be a
breaking change.

Next, we will introduce the features defined in
[Edition Zero Features](edition-zero-features.md). We will then implement
tooling that can take a `proto2` or `proto3` file and add `edition = "2023";`
and `option features.* = ...;` as appropriate, so that each file retains its
original behavior.

This second large-scale change can be fully automated, and does not require
breaking changes.

### Large-scale Changes with Features Only: Immolation of `required`

We can use features to move fields off of
`features.field_presence = LEGACY_REQUIRED` (the edition’s spelling of
`required`) and onto `features.field_presence = EXPLICIT_PRESENCE`.

To do this, we introduce a new value for `features.field_presence`,
`ALWAYS_SERIALIZE`, which behaves like `EXPLICIT_PRESENCE`, but, if the has-bit
is not set, the default is serialized. (This is sort of like a cross between
`required` and `proto3` no-label.)

It is always safe to turn a proto from `LEGACY_REQUIRED` to `ALWAYS_SERIALIZE`,
because `required` is a constraint on initialization checking, i.e., that the
value was present.
This means the only requirement is that old readers not
break, which is accomplished by always providing *a* value. Because writers of
`required` fields already set the value anyway, this is not a behavioral change,
but it now permits writers to veer off of actually setting the value.

After an appropriate build horizon, we can assume that all readers are tolerant
of a potentially missing value (even though no writer would actually be omitting
it). At this point we can migrate from `ALWAYS_SERIALIZE` to
`EXPLICIT_PRESENCE`. If a reader does not see a record for the field, attempting
to access it will produce the default value; it is not likely that callers are
actually checking for presence of `required` fields, even though that is
technically a thing you can do.

Once all required fields have gone through both steps, `LEGACY_REQUIRED` and
`ALWAYS_SERIALIZE` can be removed as variants (breaking change).

### Large-scale Changes with Editions: absl::string_view Accessors

In C++, a `string` or `bytes` typed field has accessors that produce
`const std::string&`s. The missed optimizations of doing this are
well-understood, so we won't rehash that discussion.

We would like to migrate all of them to return `absl::string_view`, à la
`ctype = STRING_PIECE`.

To do this, we introduce `features.(proto.cpp).legacy_string`[^1], a boolean
feature that is true by default. When false on a field of appropriate type, it
does the needful and causes accessors to become representationally opaque.

The feature can be set at file or field scope; tooling (see below) can be used
to minimize the diff impact of these changes. Changing a field may also require
changing code that previously assumed it could write
`std::string x = proto.string_field();`. This has the usual "unspooling string"
migration caveats.
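
As a rough illustration of file-versus-field scoping, the sketch below resolves
a boolean feature such as `legacy_string` for a single field. It assumes that a
field-level setting overrides a file-level one, which in turn overrides the
edition default. That precedence and the function names are assumptions made for
illustration, not the resolution algorithm `protoc` implements.

```python
def resolve_feature(edition_default, file_setting=None, field_setting=None):
    """Return the effective value: field scope, else file scope, else edition default.

    `None` means "not set at this scope" (an assumption of this sketch).
    """
    if field_setting is not None:
        return field_setting
    if file_setting is not None:
        return file_setting
    return edition_default


def field_setting_is_redundant(edition_default, file_setting, field_setting):
    """True if deleting the field-level setting leaves the effective value unchanged."""
    if field_setting is None:
        return False
    inherited = resolve_feature(edition_default, file_setting, None)
    return inherited == field_setting


# legacy_string defaults to true; one field opts out explicitly.
print(resolve_feature(True, field_setting=False))
# After the default flips to false, that explicit opt-out is redundant.
print(field_setting_is_redundant(False, None, False))
```

The redundancy check is the part that matters for migrations: once an edition
changes the default, explicit settings that match it can be deleted without
changing behavior.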

Once we have applied 95% of internal changes, we will upgrade the C++ backend at
the next edition to default `legacy_string` to false in the new edition. Tooling
(again, below) can be used to automatically delete explicit settings of the
feature throughout our internal codebase, as a second large-scale change. This
can happen in parallel to closing the loop on the last 5% of required internal
changes.

Once we have eliminated all the legacy accessors, we will remove the feature
(breaking change).

### Large-scale Changes with Wire Format Break: Group-Encoded Messages

It turns out that encoding and decoding groups (end-marker-delimited
submessages) is cheaper than handling length-prefixed messages. There are
likely CPU and RAM savings in switching messages to use the group encoding.
Unfortunately, that would be a wire-breaking change, causing old readers to be
unable to parse new messages.

We can do what we did for `packed`. First, we modify parsers to accept message
fields that are encoded as either groups or messages (i.e., `TYPE_MESSAGE` and
`TYPE_GROUP` become synonyms in the deserializer). We will let this soak for
three years[^2] and bide our time.

After those three years, we can begin a large-scale change to add
`features.group_encoded` to message fields throughout our internal codebase
(note that groups don't actually exist in editions; they are just messages with
`features.group_encoded`). Because of our long waiting period, it is (hopefully)
unlikely that old readers will be caught by surprise.

Once we are 95% done, we will upgrade protoc to set `features.group_encoded` to
true by default in new editions. Tooling can be used to clean up features as
before.

We will probably never completely eliminate length-prefixed messages, so this
is a rare case where the feature lives on forever.
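
To make the first step concrete, here is a minimal Python sketch of a reader
that treats the two encodings of a submessage field as synonyms. It works
directly on wire-format bytes (wire type 2 for length-prefixed, wire types 3
and 4 for start-group and end-group) and returns the raw submessage payload
either way. The function names are hypothetical, there is no handling of
truncated input, and this is not how any Protobuf runtime is actually
structured.

```python
VARINT, FIXED64, LEN, START_GROUP, END_GROUP, FIXED32 = range(6)


def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def find_group_end(buf, pos, field_number):
    """Scan from just after a start-group tag; return (payload_end, next_pos)."""
    while True:
        tag_start = pos
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == END_GROUP:
            if number != field_number:
                raise ValueError("mismatched end-group tag")
            return tag_start, pos
        pos = skip_payload(buf, pos, wire_type, number)


def skip_payload(buf, pos, wire_type, field_number):
    """Return the position just past one field's payload."""
    if wire_type == VARINT:
        return read_varint(buf, pos)[1]
    if wire_type == FIXED64:
        return pos + 8
    if wire_type == LEN:
        length, pos = read_varint(buf, pos)
        return pos + length
    if wire_type == START_GROUP:
        return find_group_end(buf, pos, field_number)[1]
    if wire_type == FIXED32:
        return pos + 4
    raise ValueError(f"unexpected wire type {wire_type}")


def read_submessages(buf, field_number):
    """Return the payload of every occurrence of the field, in either encoding."""
    payloads = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if number == field_number and wire_type == LEN:
            length, pos = read_varint(buf, pos)
            payloads.append(buf[pos:pos + length])
            pos += length
        elif number == field_number and wire_type == START_GROUP:
            end, next_pos = find_group_end(buf, pos, number)
            payloads.append(buf[pos:end])
            pos = next_pos
        else:
            pos = skip_payload(buf, pos, wire_type, number)
    return payloads


inner = bytes([0x08, 0x96, 0x01])                # field 1 = 150 (varint)
as_message = bytes([0x1A, len(inner)]) + inner   # field 3, length-prefixed
as_group = bytes([0x1B]) + inner + bytes([0x1C]) # field 3, group-encoded
print(read_submessages(as_message, 3) == read_submessages(as_group, 3))
```

Note that the group form needs no length prefix, so a writer can stream the
submessage without first computing its size; the reader pays for this by
scanning for the end marker.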

## Large-scale Change Tooling

We will need a few different tools for minimizing migration toil, all of which
will be released in OSS. These are:

*   The features GC. Running `protoc --gc-features foo.proto` on a file in
    editions mode will compute the minimal (or a heuristically minimal, if this
    proves expensive) set of features to set on things, given the edition
    specified in the file. This will produce a Protochangifier `ProtoChangeSpec`
    that describes how to clean up the file.

*   The editions "adopter". Running `protoc --upgrade-edition -I... file.proto`
    will figure out how to update `file.proto` from `proto2` or `proto3` to the
    latest edition, adding features as necessary. It will emit this information
    as a `ProtoChangeSpec`, implicitly running features GC.

*   The editions "upgrader". Running `protoc --upgrade-edition` as above on a
    file that is already in editions mode will bump it up to the latest edition
    known to `protoc` and add features as necessary. Again, this emits a
    features GC'd `ProtoChangeSpec`.

This is by no means all the tooling we need, but it will simplify the work of
robots and beavers, along with any bespoke, internal-codebase-specific tooling
we build.

## The OSS Story

We need to export our large-scale changes into open source to have any hope of
editions not splitting the ecosystem. It is impossible to do this the way we do
large-scale changes in our internal codebase, where we have global approvals and
a finite but nonzero supply of bureaucratic sticks to motivate reluctant users.

In OSS, we have neither of these things. The only stick we have is breaking
changes, and the only carrots we can offer are new features. There is no "global
approval" or "TAP" for OSS.

Our strategy must be a mixture of:

*   Convincing users this is a good thing that will help us make Protobuf easier
    to use, cheaper to deploy, and faster in production.
*   Gently steering users to the new edition in new Protobuf definitions,
    through protoc diagnostics (when an old edition is going or has gone out of
    date) and developer tooling (editor integration, new-file-boilerplate
    templates).
*   Convincing third-party backend vendors (such as Apple, for Swift) that they
    can leverage editions to fix mistakes. We should go out of our way to design
    attractive migrations for them to execute.
*   Providing Google-class tooling for migrations. This includes the large-scale
    change tooling above, and, where possible, specialized tooling. When it is
    not possible to provide tooling, we should provide detailed migration guides
    that highlight the benefits.
*   Being clear that we have a breaking changes policy and that we will
    regularly remove old features after a pre-announced horizon, locking new
    improvements behind completing migrations. This is a risky proposition,
    because users may react by digging in their heels. Comms planning is
    critical.

The common theme is comms and making it clear that these are improvements
everyone can benefit from, and that there is no "I" in "ecosystem": using
Protobuf, just like using Abseil, means accepting upgrades as a fact of life,
not something to be avoided.

We should lean in on lessons learned by Go (see: their `go fix` tool) and Rust
(see: their `rustfix` tool); Rust in particular has an editions/epoch mechanism
like we do; they also have feature gates, but those are not the same concept as
*our* features. We should also lean on the Carbon team's public messaging about
upgrading being a fact of life, to provide a unified Google front on the matter
from the view of observers.

### Prior Art: Rust Editions

The design of [Protobuf Editions](what-are-protobuf-editions.md) is directly
inspired by Rust's own
[edition system](https://doc.rust-lang.org/edition-guide/editions/index.html)[^3].

Rust defines and ships a new edition every three years, and focuses on changes
to the surface language that do not inhibit interop: crates of different
editions can always be linked together, and "edition" is a parallel ratchet to
the language/compiler version.

For example, keywords (like `async`) have been introduced using editions.
Editions have also been used to change the semantics of the borrow checker to
allow new programs, and to change name resolution rules to be more intuitive.
For Rust, an edition may require changes to existing code to be able to compile
again, but *only* at the point that the crate opts into the new edition, to
obtain some benefit from doing so.

Unlike Protobuf, Rust commits to supporting *all* past editions in perpetuity:
there is no ratcheting forward of the whole ecosystem. However, Rust does ship
with `rustfix` (runnable on Cargo projects via `cargo fix`), a tool that can
upgrade crates to a new edition. Edition changes are *required* to come with a
migration plan to enable `rustfix`.

Crates therefore have limited pressure to upgrade to the latest edition. It
provides better features, but because there is no EOL horizon, crates tend to
stay on old editions to support old compilers. For users, this is a great story,
and allows old code to work indefinitely. However, there is a maintenance burden
on the compiler that old editions and new language features (mostly) work
correctly together.

In Rust, macros present a challenge: rich support for interpreted, declarative
macros and compiled, fully procedural macros means that macros written for older
editions may not work well in crates written on newer editions, or vice versa.
There are mitigations for this in the compiler, but such fixes cannot be
perfect, so this is a source of difficulties in getting total conversion.
Protobuf does not have macros, but it does have rich descriptors that mirror
input files, and this is a potential source of problems to watch out for.

Overall, Rust's migration story is poor: they have accepted they need to support
old editions indefinitely, but only produce an edition every three years.
Protobuf plans to be much more aggressive, and we should study where Rust's
leniency to old versions is unavoidable and where it is an explicit design
choice.

## Notes

[^1]: `ctype` has baggage and I am going to ignore it for the purposes of
    discussion. The feature is spelled `legacy_string` because adding string
    view accessors is not likely the only thing to do, given we probably want
    to change the mutators as well.
[^2]: The correct size of the horizon is arbitrary, due to the "budget phones in
    India" problem. Realistically we would need to pick one, start the
    migration, and halt it if we encounter problems. It is quite difficult to
    do better than "hope" as our strategy, but `packed` is an existence proof
    that this is not insurmountable, merely very expensive.
[^3]: Rust also has feature gates, used mostly so that people may start trying
    out experimental unstable features. These are largely orthogonal to
    editions, and tied to compiler versions. Rust's feature gates generally do
    not change the semantics of existing programs, they just cause new
    programs to be valid. When a feature is "stabilized", the feature flag is
    removed.
    Feature flags do not participate in Rust's stability promises.