# Editions Tooling

**Authors:** [@mcy](https://github.com/mcy)

**Approved:** 2022-08-09

## Overview

[Protobuf Editions](../editions/what-are-protobuf-editions.md) aims to introduce
new semantics for Protobuf, but with a major emphasis on mechanical, incremental
upgradability, to avoid the two-systems problem of proto2 and proto3. The first
edition (likely "2023") will introduce *converged semantics* for Protobuf that
permit everything that proto2 and proto3 permitted: any non-editions file can
become an editions file with minimal human intervention.

We plan to achieve this with a strong tooling story. These tools are intended to
fully automate the major steps of editions-related upgrades, both to support
large-scale changes and for strategic reasons around open source software. In
particular:

*   Non-automated large-scale change work in the editions space can be
    constrained to fixing uses of generated code and flipping features on
    specific fields (or other declarations).
*   We can give our external users the most painless migration possible, which
    consists of "run this tool and commit the results".

This document describes the detailed design of the tools we need. It
presupposes, as a prerequisite, that the backend described in *Protochangifier
Backend Design Doc* (not available externally) is integrated into protoc, so
that we can ship the tooling as part of protoc. Because the tooling must know
the full definition of an edition to work (see below), it is more or less a
hard requirement that it be linked into protoc.

There are three tools we will build.

1.  The "features janitor". This is a mode of `protoc` which consumes a `.proto`
    file and produces a `ProtoChangeSpec` that describes how to add and remove
    features such that the resulting janitor'ed file has fewer explicit
    features, but is not semantically different.
2.  The "editions adopter". This is another mode of `protoc`, which produces a
    `ProtoChangeSpec` that describes how to bring a `proto2` or `proto3` file
    into editions mode, starting at a specific edition.
3.  The "editions upgrader". This is a generalization of the adopter, which
    takes an editions file and produces a `ProtoChangeSpec` that brings it into
    a newer edition.

These tools will fundamentally speak `ProtoChangeSpec`, but we should also
provide in-place versions, since those will likely be more useful to OSS users
that just want to run the tool atomically on their entire project.

## The Janitor

The features janitor is intended to be used as part of migrations to
periodically clean up any messes made by flipping lots of features.
Conceptually, it turns this proto file

```
edition = "2023";
message Foo {
  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
  optional string b = 2 [features.(pb.cpp).string_type = VIEW];
  optional string c = 3 [features.(pb.cpp).string_type = VIEW];
  optional string d = 4 [features.(pb.cpp).string_type = VIEW];
  optional string e = 5 [features.(pb.cpp).string_type = VIEW];
}
message Bar {
  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
  optional string b = 2;
  optional string c = 3;
  optional string d = 4;
  optional string e = 5;
}
```

into this one:

```
edition = "2023";
message Foo {
  option features.(pb.cpp).string_type = VIEW;
  optional string a = 1;
  optional string b = 2;
  optional string c = 3;
  optional string d = 4;
  optional string e = 5;
}
message Bar {
  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
  optional string b = 2;
  optional string c = 3;
  optional string d = 4;
  optional string e = 5;
}
```
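
To see why the two files above are equivalent, it can help to model feature inheritance directly. The following Python sketch is illustrative only: the `Node` class, the `resolve` helper, and the `STRING` edition default are assumptions made for this example, not part of protoc. It resolves `string_type` for every field in both versions and compares the results.

```python
from dataclasses import dataclass, field

# Assumed edition default, for illustration only.
EDITION_DEFAULTS = {"string_type": "STRING"}


@dataclass
class Node:
    """Hypothetical schema node: a file, a message, or a field."""
    name: str
    explicit: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def resolve(node, inherited=None, out=None, path=""):
    """Return {leaf path: resolved features} for every leaf under node."""
    inherited = dict(EDITION_DEFAULTS if inherited is None else inherited)
    inherited.update(node.explicit)  # explicit settings override inherited ones
    out = {} if out is None else out
    full = f"{path}.{node.name}" if path else node.name
    if not node.children:
        out[full] = inherited
    for child in node.children:
        resolve(child, inherited, out, full)
    return out


view = {"string_type": "VIEW"}
bar = Node("Bar", {}, [Node("a", dict(view))] + [Node(n) for n in "bcde"])

# The file before the janitor runs: every field of Foo carries the feature.
before = Node("file", {}, [
    Node("Foo", {}, [Node(n, dict(view)) for n in "abcde"]),
    bar,
])
# The file after the janitor runs: the feature is hoisted onto Foo.
after = Node("file", {}, [
    Node("Foo", dict(view), [Node(n) for n in "abcde"]),
    bar,
])

same = resolve(before) == resolve(after)
```

Under these assumptions `same` is true: every field resolves to the same value in both files, which is the sense in which the janitor's output is "not semantically different".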

Specifically, the janitor tries to minimize the number of explicit features on
the Protobuf schema. Finding a truly minimal assignment appears to be nonlinear
in cost, so we should use a heuristic instead. A sketch of what this could look
like:

1.  Each feature that can appear explicitly on an AST node is either *critical*
    for that node or present only for grouping. For example, `string_type` is
    critical for fields but not for messages.
2.  Propagate features explicitly to every node, including edition defaults.
3.  For each feature `f`, visit in DFS order each node `n` for which `f` is
    non-critical but which contains (recursively) nodes for which `f` is
    critical:
    1.  Set `f` for `n` to the value for `f` that the plurality of its direct
        children have, and remove the explicit `f` from those children. If
        tied, choose the edition default if it is among the tied values, or
        else choose randomly.
4.  Once this has been repeated up to the root, delete all explicit features
    that are reachable from the root without crossing another explicit feature
    that isn't the edition default, i.e., those features which are implied by
    the edition defaults.

It is easy to construct cases where this is not optimal, but that is not
important. This merely exists to make files prettier while keeping them
equivalent. By construction, this algorithm satisfies the "semantic no-op"
requirement: each step changes only where a feature is written, never the value
resolved by any node for which the feature is critical.
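
The heuristic above can be sketched in Python for a single feature. This is a minimal illustration under stated assumptions, not the protoc implementation: the `Node` class, the `CRITICAL` table, the `STRING` default, and all function names are hypothetical. It reads "DFS order" as a bottom-up (post-order) walk, and it breaks ties that do not involve the edition default deterministically rather than randomly so that the result is reproducible.

```python
from collections import Counter
from dataclasses import dataclass, field

# Node kinds for which each feature is "critical" (assumed table, for illustration).
CRITICAL = {"string_type": {"field"}}


@dataclass
class Node:
    name: str
    kind: str  # "file", "message", or "field"
    explicit: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def propagate(node, f, inherited):
    """Step 2: make f explicit on every node."""
    node.explicit[f] = node.explicit.get(f, inherited)
    for child in node.children:
        propagate(child, f, node.explicit[f])


def hoist(node, f, default):
    """Step 3: bottom-up, move the children's plurality value onto the parent."""
    for child in node.children:
        hoist(child, f, default)
    if node.kind in CRITICAL[f]:
        return  # critical nodes keep their own value for now
    voters = [c for c in node.children if f in c.explicit]
    if not voters:
        node.explicit.pop(f, None)  # nothing below this node cares about f
        return
    counts = Counter(c.explicit[f] for c in voters)
    best = max(counts.values())
    tied = sorted(v for v, n in counts.items() if n == best)
    # Prefer the edition default on ties; otherwise pick deterministically
    # (the design says "randomly"; sorted order is used here instead).
    winner = default if default in tied else tied[0]
    node.explicit[f] = winner
    for c in voters:
        if c.explicit[f] == winner:
            del c.explicit[f]


def prune(node, f, default):
    """Step 4: drop settings that the edition default already implies."""
    if node.explicit.get(f, default) != default:
        return  # a non-default value shadows everything below it
    node.explicit.pop(f, None)
    for child in node.children:
        prune(child, f, default)


def resolved(node, f, inherited, path=()):
    """Effective value of f on every critical node, keyed by path."""
    value = node.explicit.get(f, inherited)
    here = path + (node.name,)
    out = {here: value} if node.kind in CRITICAL[f] else {}
    for child in node.children:
        out.update(resolved(child, f, value, here))
    return out


def janitor(root, f, default):
    propagate(root, f, default)
    hoist(root, f, default)
    prune(root, f, default)


def fields(names, value=None):
    return [Node(n, "field", {"string_type": value} if value else {}) for n in names]


# The Foo/Bar example from this section.
root = Node("file", "file", children=[
    Node("Foo", "message", children=fields("abcde", "VIEW")),
    Node("Bar", "message", children=fields("a", "VIEW") + fields("bcde")),
])
before = resolved(root, "string_type", "STRING")
janitor(root, "string_type", "STRING")
after = resolved(root, "string_type", "STRING")
```

On the Foo/Bar example, this sketch leaves a single explicit `string_type` on `Foo` and one on `Bar.a`, and `before == after` holds, which is the semantic no-op property checked on critical nodes.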

## The Adopter and the Updater

The adopter is merely a special case of the updater (the "editions upgrader"
from the overview) in which `proto2` and `proto3` are viewed as editions (in
the sense that an edition is a set of defaults), so we will only describe the
updater.

To update a file from one edition ("old") to another ("new", although not
necessarily a newer edition):

1.  Features that are not already explicitly set at the top level are set to the
    default given by "old"; each is set only on the outermost scope that does
    not already have it explicitly. For example, for file-level features, this
    means making all features explicit at the file level. For message-level
    features that are not file-level, this means placing an explicit feature on
    all top-level messages. This is a no-op, because `edition = "old";` implies
    these values.
2.  The file's edition is changed from "old" to "new". Because every feature
    that could be explicit is explicit, this is a no-op.
3.  The features janitor runs. This explicitly propagates all features (all of
    which are set explicitly at the top level), and then cleans them up with
    respect to the "new" edition; note that the features janitor gives
    preference to edition defaults. This is a no-op, because the features
    janitor is a no-op.
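
The three steps can be illustrated with a deliberately simplified Python sketch that tracks only file-level features. Everything here is an assumption made for the example: the `File` class, the `EDITIONS` table and its feature values (which are illustrative, not authoritative edition definitions), and the reduction of the janitor to "drop whatever the new edition already implies". It also assumes both editions define the same set of features.

```python
from dataclasses import dataclass, field

# Hypothetical edition definitions: an edition is just a set of feature defaults.
EDITIONS = {
    "proto3": {"field_presence": "IMPLICIT", "enum_type": "OPEN"},
    "2023": {"field_presence": "EXPLICIT", "enum_type": "OPEN"},
}


@dataclass
class File:
    edition: str
    explicit: dict = field(default_factory=dict)  # file-level features only


def resolve(f):
    """Effective features: edition defaults overridden by explicit settings."""
    return {**EDITIONS[f.edition], **f.explicit}


def update(f, new):
    want = resolve(f)
    # Step 1: pin every feature explicitly to its value under the old edition.
    step1 = File(f.edition, dict(want))
    # Step 2: switch the edition; the explicit settings still win.
    step2 = File(new, dict(step1.explicit))
    # Step 3: simplified janitor, removing what the new edition already implies.
    kept = {k: v for k, v in step2.explicit.items() if EDITIONS[new].get(k) != v}
    step3 = File(new, kept)
    # Each step is a semantic no-op.
    assert resolve(step1) == resolve(step2) == resolve(step3) == want
    return step3


adopted = update(File("proto3"), "2023")
```

With these illustrative defaults, `adopted` is a "2023" file whose only explicit feature is `field_presence = IMPLICIT`, which mirrors how the adopter treats `proto3` as just another set of defaults.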

## UX Concerns

Bundling the editions tooling with `protoc` ensures that it is easy to find. The
following will be the pattern for all Protochangifier tooling bundled into
`protoc`:

*   There is a flag `--change_spec=changespec.pb` which will cause protoc to
    apply a changespec to the passed-in `.proto` file, e.g. `protoc
    --change_spec=spec.pb --change_out=foo-changed.proto foo.proto`. This writes
    the change to `foo-changed.proto`. The output path may be the same file as
    `foo.proto` for in-place updates; `--change_out` may be left out to have the
    change printed to stdout. This is the core entry point for Protochangifier.
*   There is a flag `--my_analysis` for the given analysis, e.g. `--janitor`.
    This flag can take an optional argument: if set, it will output the change
    spec to that path, e.g. `--janitor=spec.pb`. If no argument is passed, the
    change is applied in place without the need to use `protoc --change_spec`.
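
The flag pattern above amounts to a small decision table, sketched here in Python as helpers that only build argument lists. The flags are the ones proposed in this document and should not be assumed to exist in any released `protoc`; the helper names `analysis_args` and `apply_args` are hypothetical.

```python
def analysis_args(proto, analysis, spec_out=None):
    """Run an analysis such as "janitor".

    With spec_out, the change spec is written to that path; without it, the
    change is applied to the .proto file in place.
    """
    flag = f"--{analysis}" if spec_out is None else f"--{analysis}={spec_out}"
    return ["protoc", flag, proto]


def apply_args(proto, spec, out=None):
    """Apply an existing change spec.

    With out, the changed file is written there (possibly the input path);
    without it, the change is printed to stdout.
    """
    args = ["protoc", f"--change_spec={spec}"]
    if out is not None:
        args.append(f"--change_out={out}")
    return args + [proto]


in_place = analysis_args("foo.proto", "janitor")
two_step = [
    analysis_args("foo.proto", "janitor", "spec.pb"),
    apply_args("foo.proto", "spec.pb", "foo.proto"),
]
```

Under the proposed design, `in_place` and the two commands in `two_step` would leave `foo.proto` in the same state; the two-step form additionally leaves the change spec on disk for inspection.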

Alternatively, we could provide these as standalone tools. However, it seems
useful from a distribution perspective and user education perspective to say
"this is just part of the compiler". We expect to produce new migration tooling
with Protochangifier on an ongoing basis, so teaching users that every analysis
looks the same is important. Compare `rustfix`, the tool that Rust uses for
things like upgrading editions. Although it is a separate binary, it is
accessible through `cargo fix`, and in a lot of ways `cargo` is the user-facing
interface to Rust; having it be part of the "swiss army knife" helps put it in
front of users.