# Edition Zero: Converged Semantics

**Authors:** [@perezd](https://github.com/perezd),
[@haberman](https://github.com/haberman)

**Approved:** 2021-10-07

## Background

The Protobuf Team has been exploring potential facilities for introducing
breaking API and semantic changes. This document is an attempt to use these
facilities to unify proto semantics from this point forward, while giving
customers the ability to manage their projects' specific needs more granularly.

## Objective

We want to reduce the complications of API semantics that are coarsely managed
through the `syntax` keyword, and instead default to converged proto2/proto3
semantics when a file opts into editions. Where needed, customers will be able
to opt out of specific semantics that are incompatible with their existing
usages, at a fine-grained level, using the new capabilities provided by
editions + features.

### Why Now?

As we introduce new facilities for managing breaking changes, we have an
additional opportunity to cut over to, and realize, the long-standing vision of
converging proto2/3 semantics as a natural extension of this work.

Doing this in lockstep with the introduction of editions provides the protobuf
team with a few valuable outcomes:

*   Editions provide us with a more granular specification of intent than the
    existing coarse knob of "proto2" or "proto3." By opting into our first
    edition, customers are upgrading to what we've referred to in the past as
    "converged semantics," and if needed can reversibly downgrade back to proto2
    or proto3 semantics, respectively, by opting out of the specific features
    that are incompatible with their existing needs.

*   The protobuf team can avoid the n^2 complexity of considering how each
    edition/feature will interact with an explicit syntax designation of
    "proto2" vs. "proto3" for all impacted runtimes. This allows us to
    transition our thinking/support model to be explicitly feature-centric.

*   The introduction of editions will almost certainly cause a major version
    bump, which gives us ample justification to make breaking changes as we
    transition to this granular specification.

## Introduction of the `edition` keyword to proto IDL

The `edition` keyword defines which semantic version a particular file and all
of its contents adhere to as a baseline. Whenever a proto file declares an
`edition`, the file automatically defaults to converged proto2/3 semantics.

An edition's value is represented as a string, encoded by convention as a year.

## Introduction of `features` option to `descriptor.proto`

This option will be uniformly defined as a repeated set of strings that can
encode either an opt-out of a specific feature (e.g., `"-string_view"`) or,
potentially, an opt-in to a future/experimental feature (e.g.,
`"string_view"`). The `features` option will be added to `descriptor.proto`
for the following descriptor options:

*   File
*   Message
*   Field
*   Enum
*   Enum Value
*   Oneof
*   Service
*   Method
*   Stream (internal repositories only)

Features are only respected when used in conjunction with the `edition` keyword.
They are not validated for correctness, so that they remain forward and backward
compatible across releases.
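
As a minimal sketch of the rules above, the Python below treats a `features` value as a list of strings in which a leading `-` marks an opt-out and a bare name marks an opt-in. The helper `apply_features`, the baseline set, and the `"2023"` edition value are hypothetical; keeping unrecognized names rather than rejecting them is one assumed way to honor the no-validation rule.

```python
from typing import FrozenSet, Iterable, Optional


def apply_features(
    edition: Optional[str],
    baseline: FrozenSet[str],
    features: Iterable[str],
) -> Optional[FrozenSet[str]]:
    """Hypothetical helper: layer `features` strings over an edition's baseline.

    Returns None when the file declares no edition, because features are only
    respected in conjunction with the `edition` keyword.
    """
    if edition is None:
        return None  # no `edition` keyword: the `features` option is ignored
    enabled = set(baseline)
    for name in features:
        if name.startswith("-"):
            enabled.discard(name[1:])  # "-string_view" opts out of string_view
        else:
            # "string_view" opts in. Names are not validated, so an unknown
            # name is kept rather than rejected (an assumption of this sketch).
            enabled.add(name)
    return frozenset(enabled)


baseline = frozenset({"string_view", "open_enums"})  # hypothetical baseline
print(sorted(apply_features("2023", baseline, ["-string_view"])))  # ['open_enums']
print(apply_features(None, baseline, ["-string_view"]))  # None
```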

Features may be declared at any descriptor level; however, whether a feature
influences descendant types is at the discretion of the protobuf team. For
example, a file-level feature opt-out could apply to all fields within the
file, if that were desired.
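
If the protobuf team chose to let a feature propagate, one plausible rule is that each descriptor starts from its parent's resolved set and then applies its own `features` strings. The sketch below assumes that rule for a file → message → field chain; the `Node` class and `resolve` function are hypothetical stand-ins, not descriptor APIs.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a descriptor (file, message, field, ...)."""
    name: str
    features: List[str] = field(default_factory=list)
    parent: Optional["Node"] = None


def resolve(node: Node, baseline: FrozenSet[str]) -> FrozenSet[str]:
    """Resolve a node's features by applying its ancestors' settings first."""
    # Start from the parent's resolved set, or from the edition baseline at the root.
    inherited = resolve(node.parent, baseline) if node.parent else baseline
    enabled = set(inherited)
    for name in node.features:
        if name.startswith("-"):
            enabled.discard(name[1:])
        else:
            enabled.add(name)
    return frozenset(enabled)


baseline = frozenset({"string_view"})
f = Node("foo.proto", features=["-string_view"])  # file-level opt-out
msg = Node("Foo", parent=f)
fld = Node("Foo.bar", parent=msg)  # inherits the file-level opt-out
fld2 = Node("Foo.baz", features=["string_view"], parent=msg)  # re-enables locally
print(resolve(fld, baseline))   # frozenset()
print(resolve(fld2, baseline))  # frozenset({'string_view'})
```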

## A taxonomy of features

Features can be broken down into two main categories: language-specific and
semantic.

### Language-specific features

Language-specific features pertain to the generated API for a given language.
The protobuf breaking-changes backlog offers some examples:

*   (C++) Changing string fields to return `string_view`.
*   (Java) Removing the confusing `Enum#valueOf(int)` API.
*   (Java) Renaming oneof enums to use appropriate camel casing.

Language-specific features have no meaning for any other language, which can
ignore them entirely. They are, in essence, a private (tunneled) interface
between the protobuf IDL and the respective code generator. Each language's
code generator can independently decide what the "base" set of features is for
any given edition, and each language defines its own migration path between
editions.
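
A hypothetical C++ generator might handle its private features as sketched below: it owns its own per-edition base set and skips every other language's features. The `cpp_` prefix convention is an assumption borrowed from names such as `cpp_string_view` in the migration table later in this document; the edition value and table contents are illustrative.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical table owned by the C++ generator alone: the base set of
# C++-specific features for each edition. Names and editions are illustrative.
CPP_BASE_FEATURES: Dict[str, FrozenSet[str]] = {
    "2023": frozenset({"cpp_string_view"}),
}


def cpp_generator_features(edition: str, declared: Iterable[str]) -> FrozenSet[str]:
    """Resolve the C++-specific features a hypothetical C++ generator acts on."""
    enabled = set(CPP_BASE_FEATURES.get(edition, frozenset()))
    for name in declared:
        bare = name[1:] if name.startswith("-") else name
        if not bare.startswith("cpp_"):
            # Another language's feature (or a semantic feature handled
            # elsewhere): this generator ignores it entirely.
            continue
        if name.startswith("-"):
            enabled.discard(bare)
        else:
            enabled.add(bare)
    return frozenset(enabled)


print(cpp_generator_features("2023", ["-cpp_string_view", "java_enum_no_value_of"]))
# frozenset()
```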

### Semantic features

Semantic features define behavior changes that apply to the protobuf data model,
independent of language. These can also have API implications, but their meaning
goes deeper than a surface-level API. Some examples of semantic features:

*   Open enums (unknown enum values are stored directly in the field instead of
    in the `UnknownFieldSet`).
*   Packed (whether repeated primitive fields are packed on the wire).
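
To make the `Packed` feature concrete: an unpacked repeated field writes one tag per element, while a packed field writes a single length-delimited record containing all elements back to back. The Python sketch below encodes non-negative varint values both ways, following the protobuf wire format; the function names are illustrative, and negative values (which `int32` encodes as 10-byte varints) are left out for brevity.

```python
from typing import List


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_repeated_varints(field_number: int, values: List[int], packed: bool) -> bytes:
    """Encode a repeated varint-typed field either packed or unpacked."""
    if not values:
        return b""  # an empty repeated field writes nothing in either form
    if packed:
        # One tag with wire type 2 (length-delimited), the payload length, then the values.
        payload = b"".join(encode_varint(v) for v in values)
        tag = encode_varint((field_number << 3) | 2)
        return tag + encode_varint(len(payload)) + payload
    # Unpacked: a tag with wire type 0 (varint) precedes every element.
    tag = encode_varint((field_number << 3) | 0)
    return b"".join(tag + encode_varint(v) for v in values)


print(encode_repeated_varints(6, [3, 270, 86942], packed=True).hex())   # 3206038e029ea705
print(encode_repeated_varints(6, [3, 270, 86942], packed=False).hex())  # 3003308e02309ea705
```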

Semantic features have significantly broader scope, since they must be respected
across languages, and each language must implement the semantics correctly. This
also implies that either (1) every language must know the canonical set of "base
features" for each edition, or (2) the set of "default" features for each
edition must be resolved in protoc itself and propagated explicitly into the
descriptor.
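
Option (2) can be sketched as follows: protoc alone holds the table of edition defaults and writes a fully resolved, explicit value for every semantic feature into the descriptor, so runtimes never consult an edition table. `SEMANTIC_DEFAULTS`, `ResolvedFieldFeatures`, and `protoc_resolve` are hypothetical names, and only opt-outs are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical canonical defaults that, under option (2), only protoc needs
# to know. The edition name and default values are illustrative assumptions.
SEMANTIC_DEFAULTS: Dict[str, Dict[str, bool]] = {
    "2023": {"open_enums": True, "packed": True},
}


@dataclass(frozen=True)
class ResolvedFieldFeatures:
    """Hypothetical descriptor payload: every semantic feature spelled out."""
    open_enums: bool
    packed: bool


def protoc_resolve(edition: str, opted_out: Set[str]) -> ResolvedFieldFeatures:
    """Combine the edition's defaults with a field's opt-outs into explicit values."""
    defaults = SEMANTIC_DEFAULTS[edition]
    values = {name: on and name not in opted_out for name, on in defaults.items()}
    return ResolvedFieldFeatures(**values)


# A runtime in any language reads these explicit values and never needs
# its own copy of SEMANTIC_DEFAULTS.
print(protoc_resolve("2023", {"packed"}))
# ResolvedFieldFeatures(open_enums=True, packed=False)
```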

## Rev'ing the protobuf IDL vs. descriptor.proto

Changing `descriptor.proto` to reflect editions is a much more intrusive change
than changing just the protobuf IDL. The protobuf IDL is parsed and resolved in
protoc, and we have only a single implementation of that parser. Any change that
can be resolved in the parser alone is relatively unintrusive (though there are
build-horizon issues, since GCL parses protos in prod).

Rev'ing `descriptor.proto` is a far more intrusive change that affects many
downstream systems. Many systems access descriptors either through a descriptor
API (for example, `google::protobuf::Descriptor` in C++) or by directly
accessing a proto from `descriptor.proto` (e.g., `google.protobuf.DescriptorProto`).
Any changes here need to be managed much more delicately.

## Deprecation of the `syntax` keyword from proto IDL

The `syntax` keyword shall no longer be required or observed when an `edition`
keyword is present, as it is then considered redundant. If `edition` and
`syntax` are both present, `edition` takes precedence and `syntax` is ignored.
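
A minimal sketch of this precedence rule, assuming simple regular-expression matching of the top-level declarations (protoc's real parser tokenizes the file instead). `file_baseline` is a hypothetical helper; the fallback to proto2 when neither keyword is present reflects protoc's existing behavior, not something this document specifies.

```python
import re

# Illustrative patterns for the two top-level declarations. protoc itself
# tokenizes the file; regular expressions are a simplification for this sketch.
_EDITION_RE = re.compile(r'^\s*edition\s*=\s*"([^"]+)"\s*;', re.MULTILINE)
_SYNTAX_RE = re.compile(r'^\s*syntax\s*=\s*"([^"]+)"\s*;', re.MULTILINE)


def file_baseline(source: str) -> str:
    """Report which baseline a .proto file adheres to under the proposed rules."""
    edition = _EDITION_RE.search(source)
    if edition:
        # `edition` takes precedence; any `syntax` declaration is ignored.
        return f"edition {edition.group(1)} (converged semantics)"
    syntax = _SYNTAX_RE.search(source)
    # With neither keyword, protoc today falls back to proto2.
    return syntax.group(1) if syntax else "proto2"


print(file_baseline('syntax = "proto3";\nedition = "2023";\n'))  # edition 2023 (converged semantics)
print(file_baseline('syntax = "proto3";\n'))                     # proto3
```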

## Migrating from `proto2` and `proto3` to Editions + Features

Today, the `syntax` keyword opaquely bundles a collection of implied feature
flags that are set based on whether a file declares `proto2` or `proto3`. This
is often a source of confusion for customers (e.g., what am I gaining by moving
to proto3? What am I losing?).

Defining editions/features in terms of converged proto2/3 semantics lets
customers decide for themselves which features are important to their usage of
protos.

Migrating existing users of proto2 and proto3 to editions with converged
semantics would require a large-scale change to make their implicit/implied
behavior explicit. Here are examples of implied behavior today:

<table>
  <tr>
   <td><strong>Feature</strong>
   </td>
   <td><strong><code>proto2</code> implied behavior</strong>
   </td>
   <td><strong><code>proto3</code> implied behavior</strong>
   </td>
  </tr>
  <tr>
   <td>packed_repeated_primitives
   </td>
   <td>🚫
   </td>
   <td>✅
   </td>
  </tr>
  <tr>
   <td>extensions
   </td>
   <td>✅
   </td>
   <td>🚫
   </td>
  </tr>
  <tr>
   <td>required
   </td>
   <td>✅
   </td>
   <td>🚫
   </td>
  </tr>
  <tr>
   <td>groups
   </td>
   <td>✅
   </td>
   <td>🚫
   </td>
  </tr>
  <tr>
   <td>cpp_string_view
   </td>
   <td>🚫
   </td>
   <td>🚫
   </td>
  </tr>
  <tr>
   <td>java_enum_no_value_of
   </td>
   <td>🚫
   </td>
   <td>🚫
   </td>
  </tr>
  <tr>
   <td>open_enums
   </td>
   <td>🚫
   </td>
   <td>✅
   </td>
  </tr>
  <tr>
   <td>MORE STUFF ...
   </td>
   <td>
   </td>
   <td>
   </td>
  </tr>
</table>
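
Making implied behavior explicit amounts to a per-file set difference between what a `syntax` implies and what the first edition enables by default. The sketch below transcribes the table's rows into sets and emits `features` strings in the `-name` / `name` form used by the `features` option. `EDITION_DEFAULTS` is a hypothetical placeholder, since this document does not enumerate the converged defaults, and expressing proto2's `required` and `groups` as opt-ins is likewise an assumption.

```python
from typing import Dict, FrozenSet, List

# Features each syntax implies today, transcribed from the table above
# (the table itself is explicitly incomplete).
IMPLIED: Dict[str, FrozenSet[str]] = {
    "proto2": frozenset({"extensions", "required", "groups"}),
    "proto3": frozenset({"packed_repeated_primitives", "open_enums"}),
}

# Hypothetical converged baseline for the first edition: an assumption for
# illustration only, not a decided set of defaults.
EDITION_DEFAULTS: FrozenSet[str] = frozenset(
    {"packed_repeated_primitives", "open_enums", "extensions", "cpp_string_view"}
)


def features_to_preserve(syntax: str) -> List[str]:
    """Explicit `features` strings that keep a file's current behavior after migration."""
    implied = IMPLIED[syntax]
    # Defaults the file never had today must be opted out of...
    opt_outs = ["-" + name for name in sorted(EDITION_DEFAULTS - implied)]
    # ...and behavior the syntax provided but the baseline lacks must be opted into.
    opt_ins = sorted(implied - EDITION_DEFAULTS)
    return opt_outs + opt_ins


print(features_to_preserve("proto3"))  # ['-cpp_string_view', '-extensions']
print(features_to_preserve("proto2"))
# ['-cpp_string_view', '-open_enums', '-packed_repeated_primitives', 'groups', 'required']
```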

### Managing Complexity of `features` for Large Deployments

A separate concept has been established to help mitigate the complexity of
editions, progressive feature rollouts, and synchronization for larger proto
projects.

This facility could be used, for example, to migrate existing usages of the
`syntax` keyword across google3 to Editions + Features.

## Prior Work

*   proto{2,3} Convergence Vision (not available externally)

*   Epochs for descriptor.proto (not available externally)

*   Rust editions: https://doc.rust-lang.org/edition-guide/editions/index.html
