# Editions Tooling

**Authors:** [@mcy](https://github.com/mcy)

**Approved:** 2022-08-09

## Overview

[Protobuf Editions](../editions/what-are-protobuf-editions.md) aims to introduce
new semantics for Protobuf, but with a major emphasis on mechanical, incremental
upgradability, to avoid the two systems problem of proto2 and proto3. The first
edition (likely "2023") will introduce *converged semantics* for Protobuf that
permit everything that proto2 and proto3 permitted: any non-editions file can
become an editions file with minimal human intervention.

We plan to achieve this with a strong tooling story. These tools are intended to
fully automate major steps in editions-related upgrade operations, for both
large-scale changes and open source software strategic reasons. In particular:

*   Non-automated large-scale change work in the editions space can be
    constrained to fixing uses of generated code and flipping features on
    specific fields (or other declarations).
*   We can give our external users the most painless migration possible, which
    consists of "run this tool and commit the results".

This document describes the detailed design of the tools we need. This document
presupposes *Protochangifier Backend Design Doc* (not available externally)
integrated into protoc as a prerequisite, so we can ship the tooling as part of
protoc. Because the tooling must know the full definition of an edition to work
(see below), it seems to more-or-less place a hard requirement of being linked
to protoc.

There are three tools we will build.

1.  The "features janitor". This is a mode of `protoc` which consumes a `.proto`
    file and produces a `ProtoChangeSpec` that describes how to add and remove
    features such that the resulting janitor'ed file has fewer explicit
    features, but is not semantically different.
2.  The "editions adopter".
    This is another mode of `protoc`, which produces a
    `ProtoChangeSpec` that describes how to bring a `proto2` or `proto3` file
    into editions mode, starting at a specific edition.
3.  The "editions upgrader". This is a generalization of the adopter, which
    takes an editions file and produces a `ProtoChangeSpec` that brings it into
    a newer edition.

These tools will fundamentally speak `ProtoChangeSpec`, but we should also
provide in-place versions, since those will likely be more useful to OSS users
that just want to run the tool atomically on their entire project.

## The Janitor

The features janitor is intended to be used as part of migrations to
periodically clean up any messes made by flipping lots of features.
Conceptually, it turns this proto file

```
edition = "2023";
message Foo {
  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
  optional string b = 2 [features.(pb.cpp).string_type = VIEW];
  optional string c = 3 [features.(pb.cpp).string_type = VIEW];
  optional string d = 4 [features.(pb.cpp).string_type = VIEW];
  optional string e = 5 [features.(pb.cpp).string_type = VIEW];
}
message Bar {
  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
  optional string b = 2;
  optional string c = 3;
  optional string d = 4;
  optional string e = 5;
}
```

into this one:

```
edition = "2023";
message Foo {
  option features.(pb.cpp).string_type = VIEW;
  optional string a = 1;
  optional string b = 2;
  optional string c = 3;
  optional string d = 4;
  optional string e = 5;
}
message Bar {
  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
  optional string b = 2;
  optional string c = 3;
  optional string d = 4;
  optional string e = 5;
}
```

Specifically, the janitor tries to minimize the number of explicit features on
the Protobuf schema.
Actually doing this minimally feels like it's nonlinear, so
we should invent a heuristic. A sketch of what this could look like:

1.  Each feature that can appear explicitly on an AST node is either *critical*
    for that node or only for grouping. For example, `string_type` is critical
    for fields but not for messages.
2.  Propagate features explicitly to every node, including edition defaults.
3.  For each feature `f`, for each node `n` that `f` is non-critical for that
    contains (recursively) nodes that it is critical for (in DFS order):
    1.  Set `f` for `n` to the value for `f` that the plurality of its direct
        children have, and remove the explicit `f` from those. If tied, choose
        the edition default if it is among the plurals, or else choose randomly.
4.  Once repeated up to the root, delete all explicit features that are
    reachable from the root without crossing another explicit feature that isn't
    the edition default. I.e., those features which are implied by the edition
    defaults.

It is easy to construct cases where this is not optimal, but that is not
important. This merely exists to make files prettier while keeping them
equivalent. It is easy to see that, by construction, this algorithm satisfies
the "semantic no-op" requirement.

## The Adopter and the Updater

The adopter is merely a special case of the updater where `proto2` and `proto3`
are viewed as editions (in the sense that an edition is a set of defaults), so
we will only describe the updater.

To update one edition ("old") to another ("new", although not necessarily a
newer edition):

1.  Features that are not already explicitly set at the top level are set to the
    default given by "old"; they are only set on the outermost scope that does
    not have an explicit feature. For example, for file-level features, this
    means making all features explicit at the file level.
    For message-level
    features that are not file-level, this means placing an explicit feature on
    all top-level messages. This is a no-op, because `edition = "old";` implies
    this.
2.  The file's edition is set from "old" to "new". Because every feature that
    could be explicit is explicit, this is a no-op.
3.  Feature janitor runs. This explicitly propagates all features (all of which
    are set explicitly at the top level), and then cleans them up with respect
    to the "new" edition; note that feature janitor gives preference to editions
    defaults. This is a no-op, because feature janitor is a no-op.

## UX Concerns

Bundling the editions tooling with `protoc` ensures that it is easy to find. The
following will be the pattern for all Protochangifier tooling bundled into
`protoc`:

*   There is a flag `--change_spec=changespec.pb` which will cause protoc to
    apply a changespec to the passed-in `.proto` file, e.g.
    `protoc --change_spec=spec.pb --change_out=foo-changed.proto foo.proto`.
    This writes the change to `foo-changed.proto`. This may be the same file as
    `foo.proto` for in-place updates; it may be left out to have the change
    printed to stdout. This is the core entry-point for Protochangifier.
*   There is a flag `--my_analysis` for the given analysis, e.g. `--janitor`.
    This flag can have an optional argument: if set, it will output the change
    spec to that path, e.g. `--janitor=spec.pb`. If it is not passed in, the
    change is applied in place without the need to use `protoc --change_spec`.

Alternatively, we could provide these as standalone tools. However, it seems
useful from a distribution perspective and user education perspective to say
"this is just part of the compiler". We expect to produce new migration tooling
with Protochangifier on an ongoing basis, so teaching users that every analysis
looks the same is important.
Compare `rustfix`, the tool that Rust uses for
things like upgrading editions. Although it is a separate binary, it is
accessible through `cargo fix`, and in a lot of ways `cargo` is the user-facing
interface to Rust; having it be part of the "swiss army knife" helps put it in
front of users.
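
To make the janitor heuristic from "The Janitor" more concrete, here is a minimal Python sketch of it for a single feature. It is illustrative only and is not the design's implementation: the `Node` class, the function names, and the `"STRING"` edition default are all assumptions made for the example. Critical nodes are modeled as leaves, and ties that do not involve the edition default are broken by first occurrence rather than randomly, so that the result is reproducible.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical AST node tracking one feature (e.g. string_type)."""
    name: str
    explicit: Optional[str] = None  # explicitly written value, if any
    critical: bool = False          # True where the feature actually applies
    children: list[Node] = field(default_factory=list)


def resolved(node, default, prefix=""):
    """Map each critical node's path to the value it effectively has."""
    value = node.explicit if node.explicit is not None else default
    path = f"{prefix}/{node.name}"
    out = {path: value} if node.critical else {}
    for child in node.children:
        out.update(resolved(child, value, path))
    return out


def _propagate(node, inherited):
    # Step 2: make the inherited value explicit on every node.
    if node.explicit is None:
        node.explicit = inherited
    for child in node.children:
        _propagate(child, node.explicit)


def _hoist(node, default):
    # Step 3, children first. Returns True if the subtree has a critical node.
    if node.critical:
        return True
    voters = []
    for child in node.children:
        if _hoist(child, default):
            voters.append(child)
        else:
            child.explicit = None  # nothing beneath it reads the feature
    if not voters:
        node.explicit = None
        return False
    counts = Counter(child.explicit for child in voters)
    best = max(counts.values())
    tied = [value for value in counts if counts[value] == best]
    # Prefer the edition default on a tie; otherwise take the first seen
    # (the design says "randomly"; this sketch is deterministic instead).
    winner = default if default in tied else tied[0]
    node.explicit = winner
    for child in voters:
        if child.explicit == winner:
            child.explicit = None
    return True


def _prune(node, default):
    # Step 4: drop explicit values already implied by the edition default.
    if node.explicit is not None and node.explicit != default:
        return  # a non-default value shields everything beneath it
    node.explicit = None
    for child in node.children:
        _prune(child, default)


def janitor(root, default):
    _propagate(root, default)
    _hoist(root, default)
    _prune(root, default)


def build_example():
    """The Foo/Bar file from "The Janitor", reduced to one feature."""
    foo = Node("Foo", children=[Node(n, "VIEW", True) for n in "abcde"])
    bar = Node("Bar", children=[Node("a", "VIEW", True)]
               + [Node(n, None, True) for n in "bcde"])
    return Node("file", children=[foo, bar])


root = build_example()
before = resolved(root, "STRING")
janitor(root, "STRING")
assert resolved(root, "STRING") == before  # the rewrite is a semantic no-op
```

On this input the sketch leaves one explicit value on `Foo` and one on `Bar.a`, which matches the shape of the cleaned-up file shown in "The Janitor". The final assertion checks the "semantic no-op" requirement by comparing the effective value of every field before and after the rewrite.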