# Prototiller Requirements for Edition Zero

**Authors:** [@mkruskal-google](https://github.com/mkruskal-google)

**Approved:** 2023-07-07

## Background

[Edition Zero Features](edition-zero-features.md) lays out the design for our
first edition, which will unify proto2 and proto3 via features. In order to
migrate internal Google repositories (and aid OSS migrations), we will be
leveraging Prototiller to upgrade from legacy syntax to the new editions model.
We will also be using Prototiller for every edition bump, but that's out of
scope for this document (more general requirements are laid out in
[Prototiller Requirements for Editions](prototiller-reqs-for-editions.md)).

## Overview

The way the edition zero features were derived, there will always exist a no-op
transformation from proto2/proto3. However, the details of this transformation
can be fairly complex and depend on a lot of different factors.

A temporary script has been created as a placeholder, which manages to get
fairly good coverage of these rules. Notably, it can't handle groups inside
extensions or oneofs. This, along with its golden tests, can serve as a useful
benchmark for Prototiller.

### Feature Optimization

One important piece of Prototiller that will ease the friction of the Edition
Zero large-scale change is a feature optimization phase. If certain features
aren't necessary to make the upgrade a no-op, we shouldn't add them and should
instead rely on the edition defaults for future changes. Similarly, we should
try to minimize the total size of the change by collapsing a feature
specification to a higher level (e.g. file-level defaults of a field feature).

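As a rough illustration of the optimization phase, the sketch below picks a
file-level value for a single feature so that the fewest fields need an explicit
override, and writes no file-level feature at all when the edition default
already wins. The function name `optimize_feature`, the dictionary-based field
representation, and the tie-breaking rule (prefer the edition default) are
assumptions made for this example, not Prototiller's actual design.

```python
from collections import Counter


def optimize_feature(field_values, edition_default):
    """Choose a file-level default and the per-field overrides that remain.

    field_values maps a field name to the feature value that field needs in
    order to keep its proto2/proto3 behavior. Returns (file_level, overrides),
    where file_level is None when the edition default can be relied upon.
    """
    counts = Counter(field_values.values())
    # Prefer the most common value; on a tie, prefer the edition default so
    # that no file-level feature has to be written.
    best = max(
        counts,
        key=lambda value: (counts[value], value == edition_default),
        default=edition_default,
    )
    file_level = None if best == edition_default else best
    overrides = {
        name: value for name, value in field_values.items() if value != best
    }
    return file_level, overrides


# A proto3 file with one `optional` field and two implicit-presence fields.
fields = {"bar": "EXPLICIT", "baz": "IMPLICIT", "bam": "IMPLICIT"}
print(optimize_feature(fields, "EXPLICIT"))
# ('IMPLICIT', {'bar': 'EXPLICIT'})
```

A real implementation would likely also weigh message-level defaults and the
cost of each emitted line; this sketch only considers the file level.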
## Frontend Feature Transformations

This section details all of our frontend features and how to transform from
proto2/proto3 to edition zero.

### Field Presence

The `field_presence` feature defaults to `EXPLICIT`, which matches proto2/proto3
optional behavior. The `LEGACY_REQUIRED` value corresponds to proto2 required
fields, and `IMPLICIT` corresponds to non-optional proto3 fields. In order to
minimize changes, file-level defaults should be utilized.

Example transformations:

<table>
  <tr>
   <td>
<pre>syntax = "proto2";
message Foo {
  optional string bar = 1;
  required string baz = 2;
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  string bar = 1;
  string baz = 2 [
    features.field_presence = LEGACY_REQUIRED];
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto3";

message Foo {
  optional string bar = 1;
  string baz = 2;
  string bam = 3;
}</pre>
   </td>
   <td>
<pre>edition = "2023";
features.field_presence = IMPLICIT;

message Foo {
  string bar = 1 [features.field_presence = EXPLICIT];
  string baz = 2;
  string bam = 3;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto3";

message Foo {
  optional string bar = 1;
  optional string baz = 2;
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
   </td>
  </tr>
</table>

### Enum Type

The `enum_type` feature defaults to `OPEN`, which matches proto3 behavior. The
`CLOSED` value corresponds to typical proto2 behavior. In order to minimize
changes, file-level defaults should be utilized.

Example transformations:

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
   </td>
   <td>
<pre>edition = "2023";
features.enum_type = CLOSED;

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto3";

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
   </td>
   <td>
<pre>edition = "2023";

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
   </td>
  </tr>
</table>

### Repeated Field Encoding

The `repeated_field_encoding` feature defaults to `PACKED`, which matches proto3
behavior. The `EXPANDED` value corresponds to the default proto2 behavior. Both
proto2 and proto3 can have the default behavior overridden by using the `packed`
field option. All of these should be replaced in the migration to edition zero.
Minimization of changes will be a little more complicated here, since there
could exist files where the majority of repeated fields have been overridden.

Example transformations:

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [packed = true];
  repeated int32 bam = 3;
}</pre>
   </td>
   <td>
<pre>edition = "2023";
features.repeated_field_encoding = EXPANDED;

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [
    features.repeated_field_encoding = PACKED];
  repeated int32 bam = 3;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto3";

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [packed = false];
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [
    features.repeated_field_encoding = EXPANDED];
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  repeated int32 x = 1 [packed = true];
  // Strings are never packed.
  repeated string z = 2;
  repeated string w = 3;
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  repeated int32 x = 1;
  repeated string z = 2;
  repeated string w = 3;
}</pre>
   </td>
  </tr>
</table>

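The per-field rule behind these examples can be sketched as follows. This is a
minimal illustration, assuming a hypothetical helper
`repeated_field_encoding` and a simplified string-based representation of
syntax and field types; it is not taken from Prototiller or the placeholder
script.

```python
# Types that can never use packed encoding. Treating them as "no feature
# needed" is an assumption of this sketch.
UNPACKABLE_TYPES = {"string", "bytes", "message", "group"}


def repeated_field_encoding(syntax, field_type, packed_option=None):
    """Return "PACKED", "EXPANDED", or None for a repeated field.

    packed_option is True/False when `[packed = ...]` is present and None
    otherwise. None is returned for types that cannot be packed.
    """
    if field_type in UNPACKABLE_TYPES:
        return None
    if packed_option is None:
        # Fall back to the syntax default: proto3 packs, proto2 does not.
        packed_option = syntax == "proto3"
    return "PACKED" if packed_option else "EXPANDED"


print(repeated_field_encoding("proto2", "int32", packed_option=True))
# PACKED
```

The values this returns for each field are what a later minimization step
would compare against the edition default of `PACKED`.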
### Message Encoding

The `message_encoding` feature is designed to replace the proto2-only `group`
syntax (with value `DELIMITED`), with a default that will always be
`LENGTH_PREFIXED`. This is a somewhat awkward transformation in the general
case, since group definitions are allowed anywhere fields can exist, even where
message definitions can't. The basic transformation is to create a new message
type in the nearest enclosing scope with the same name as the group, then give
the field the lowercased name and that type.

Example transformations:

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  optional group Bar = 1 {
    optional int32 x = 1;
  }
  optional Bar baz = 2;
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  message Bar {
    int32 x = 1;
  }
  Bar bar = 1 [features.message_encoding = DELIMITED];
  Bar baz = 2;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  oneof foo {
    group Bar = 1 {
      optional int32 x = 1;
    }
  }
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  message Bar {
    int32 x = 1;
  }
  oneof foo {
    Bar bar = 1 [
      features.message_encoding = DELIMITED];
  }
}</pre>
   </td>
  </tr>
</table>

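The naming part of this transformation is small enough to sketch directly. The
helper below is hypothetical (the name `render_group_field` is an assumption of
this example) and only renders the replacement field declaration; hoisting the
message definition into the nearest enclosing scope is not shown.

```python
def render_group_field(group_name, number):
    """Render the field that replaces `group <group_name> = <number> {...}`.

    The message type keeps the group's name and the field takes the
    lowercased form of that name.
    """
    field_name = group_name.lower()
    return (
        f"{group_name} {field_name} = {number} "
        "[features.message_encoding = DELIMITED];"
    )


print(render_group_field("Bar", 1))
# Bar bar = 1 [features.message_encoding = DELIMITED];
```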
### JSON Format

The `json_format` feature is a bit of an outlier, because (at least for edition
zero) it only affects the frontend build of the proto file. The `ALLOW` value
(proto3 behavior) enables all JSON mapping conflict checks on field names,
unless `deprecated_legacy_json_field_conflicts` is set. The `LEGACY_BEST_EFFORT`
value (proto2 behavior) disables these checks. The ideal minimal transformation
would be to switch to `ALLOW` in all cases except where
`deprecated_legacy_json_field_conflicts` is set or there exist JSON mapping
conflicts. In those cases we can fall back to `LEGACY_BEST_EFFORT`.

Alternatively, if it's difficult for Prototiller to handle, we could do a
follow-up large-scale change to remove all `LEGACY_BEST_EFFORT` instances that
pass build.

Example transformations:

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  optional string bar = 1;
  optional string baz = 2;
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto3";

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
   </td>
   <td>
<pre>edition = "2023";
features.field_presence = IMPLICIT;

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  // Warning only
  optional string bar = 1;
  optional string bar_ = 2;
}</pre>
   </td>
   <td>
<pre>edition = "2023";
features.json_format = LEGACY_BEST_EFFORT;

message Foo {
  string bar = 1;
  string bar_ = 2;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto3";

message Foo {
  option
  deprecated_legacy_json_field_conflicts = true;
  string bar = 1;
  string baz = 2 [json_name = "bar"];
}</pre>
   </td>
   <td>
<pre>edition = "2023";
features.field_presence = IMPLICIT;
features.json_format = LEGACY_BEST_EFFORT;

message Foo {
  string bar = 1;
  string baz = 2 [json_name = "bar"];
}</pre>
   </td>
  </tr>
</table>

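One way to picture the decision between `ALLOW` and `LEGACY_BEST_EFFORT` is the
sketch below. Both helper names are hypothetical, and the default JSON name
rule used here (drop each underscore and capitalize the character after it) is
an approximation of protoc's behavior, so the real conflict checks may be
stricter than what this example detects.

```python
def default_json_name(field_name):
    """Approximate the default JSON name: drop underscores and capitalize
    the character that follows each one."""
    result = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
    return "".join(result)


def choose_json_format(fields, legacy_conflicts_option=False):
    """Pick a json_format value for one message.

    fields is a list of (name, explicit_json_name_or_None) pairs, and
    legacy_conflicts_option mirrors deprecated_legacy_json_field_conflicts.
    """
    if legacy_conflicts_option:
        return "LEGACY_BEST_EFFORT"
    seen = set()
    for name, json_name in fields:
        resolved = json_name or default_json_name(name)
        if resolved in seen:
            # Two fields map to the same JSON name.
            return "LEGACY_BEST_EFFORT"
        seen.add(resolved)
    return "ALLOW"


print(choose_json_format([("bar", None), ("bar_", None)]))
# LEGACY_BEST_EFFORT
```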
## Backend Feature Transformations

This section details our backend-specific features and how to transform from
proto2/proto3 to edition zero.

In order to limit bloat, it would be ideal if we could check whether or not a
proto file is ever used to generate code in the target language. If it's not,
there's no reason to add backend-specific features.

### Legacy Closed Enum

Java and C++ by default treat proto3 enums as closed if they're used in a proto2
message. The internal `cc_open_enum` field option can override this, but it has
**very** limited use and may not be worth considering. While the enum type
behavior should still be determined by Enum Type, we'll need to add this feature
to proto2 files using proto3 enums (the reverse is disallowed).

Example transformations:

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

import "some_proto3_file.proto";

enum Proto2Enum {
  BAR = 0;
}

message Foo {
  optional Proto3Enum bar = 1;
  optional Proto2Enum baz = 2;
}</pre>
   </td>
   <td>
<pre>edition = "2023";
import "third_party/protobuf/cpp_features.proto";
import "third_party/protobuf/java_features.proto";
import "some_proto3_file.proto";

features.enum_type = CLOSED;

enum Proto2Enum {
  BAR = 0;
}

message Foo {
  Proto3Enum bar = 1 [
    features.(pb.cpp).legacy_closed_enum = true,
    features.(pb.java).legacy_closed_enum = true];
  Proto2Enum baz = 2;
}</pre>
   </td>
  </tr>
</table>

### UTF8 Validation

This feature is pending approval of *Editions Zero Feature: utf8_validation*
(not available externally).

## Other Transformations

In addition to features, there are some other changes we've made for edition
zero.

### Reserved Identifier Syntax

With *Protobuf Change Proposal: Reserved Identifiers* (not available
externally), we've decided to switch from strings to identifiers for reserved
fields. This *should* be a trivial change, but if the proto file contains
strings that aren't valid identifiers there's some ambiguity. These strings are
ignored today, but they could be typos that we wouldn't want to blindly delete,
so we'll leave behind a comment instead.

Example transformations:

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  reserved "bar", "baz";
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  reserved bar, baz;
}</pre>
   </td>
  </tr>
</table>

<table>
  <tr>
   <td>
<pre>syntax = "proto2";

message Foo {
  reserved "bar", "1";
}</pre>
   </td>
   <td>
<pre>edition = "2023";

message Foo {
  reserved bar;
  /*reserved "1";*/
}</pre>
   </td>
  </tr>
</table>

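The split between convertible names and names that are left behind as a comment
could look roughly like this. The helper name `convert_reserved` and the
identifier pattern (a letter or underscore followed by letters, digits, or
underscores) are assumptions of this sketch rather than a statement of how
Prototiller validates identifiers.

```python
import re

# Assumed identifier grammar for this example.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def convert_reserved(names):
    """Convert the strings of one `reserved "a", "b";` statement.

    Valid identifiers are emitted unquoted; anything else is preserved in a
    comment so that possible typos are not silently dropped.
    """
    valid = [n for n in names if _IDENTIFIER.fullmatch(n)]
    invalid = [n for n in names if not _IDENTIFIER.fullmatch(n)]
    lines = []
    if valid:
        lines.append("reserved " + ", ".join(valid) + ";")
    if invalid:
        quoted = ", ".join(f'"{n}"' for n in invalid)
        lines.append(f"/*reserved {quoted};*/")
    return lines


print(convert_reserved(["bar", "1"]))
# ['reserved bar;', '/*reserved "1";*/']
```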