# Prototiller Requirements for Edition Zero

**Authors:** [@mkruskal-google](https://github.com/mkruskal-google)

**Approved:** 2023-07-07

## Background

[Edition Zero Features](edition-zero-features.md) lays out the design for our
first edition, which will unify proto2 and proto3 via features. In order to
migrate internal Google repositories (and aid OSS migrations), we will be
leveraging Prototiller to upgrade from legacy syntax to the new editions model.
We will also be using Prototiller for every edition bump, but that's out of
scope for this document (more general requirements are laid out in
[Prototiller Requirements for Editions](prototiller-reqs-for-editions.md)).

## Overview

Because of the way the edition zero features were derived, there will always
exist a no-op transformation from proto2/proto3. However, the details of this
transformation can be fairly complex and depend on a lot of different factors.

A temporary script has been created as a placeholder, and it already covers
these rules fairly well. Notably, it can't handle groups inside extensions or
oneofs. This script, along with its golden tests, can serve as a useful
benchmark for Prototiller.

### Feature Optimization

One important piece of Prototiller that will ease the friction of the Edition
Zero large-scale change is a feature optimization phase. If certain features
aren't necessary to make the upgrade a no-op, we shouldn't add them and should
instead rely on the edition defaults for future changes. Similarly, we should
try to minimize the total size of the change by collapsing a feature
specification to a higher level (e.g. a file-level default for a field
feature).

## Frontend Feature Transformations

This section details all of our frontend features and how to transform from
proto2/proto3 to edition zero.
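Every frontend transformation in this section leans on the feature optimization phase described earlier. As a rough illustration only (not Prototiller's actual implementation), the minimal Python sketch below takes the value each element needs for a no-op upgrade and picks the file-level value that leaves the fewest per-element overrides, breaking ties toward the edition default so that no file-level line is emitted. The function `choose_file_default` and its `disallowed` parameter (for values that shouldn't be used as file-level defaults) are hypothetical.

```python
from collections import Counter
from typing import Optional


def choose_file_default(
    required_values: list[str],
    edition_default: str,
    disallowed: frozenset[str] = frozenset(),
) -> tuple[Optional[str], list[int]]:
    """Hypothetical helper: choose a file-level value for one feature.

    required_values[i] is the value element i needs for a no-op upgrade.
    Returns (file_value, override_indices). file_value is None when the
    edition default is the best choice, so no file-level feature is written.
    """
    counts = Counter(required_values)
    # Candidates: every allowed value that some element needs, plus the
    # edition default, which is always a valid fallback.
    candidates = [
        v for v in counts if v not in disallowed and v != edition_default
    ] + [edition_default]
    # Cover as many elements as possible; break ties toward the default.
    best = max(candidates, key=lambda v: (counts[v], v == edition_default))
    overrides = [i for i, v in enumerate(required_values) if v != best]
    return (None if best == edition_default else best), overrides
```

For the second field presence example (one `optional` proto3 field and two plain ones), `choose_file_default(["EXPLICIT", "IMPLICIT", "IMPLICIT"], "EXPLICIT")` yields `("IMPLICIT", [0])`: a file-level `IMPLICIT` plus a single `EXPLICIT` override, matching the expected output.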
### Field Presence

The `field_presence` feature defaults to `EXPLICIT`, which matches proto2/proto3
optional behavior. The `LEGACY_REQUIRED` value corresponds to proto2 required
fields, and `IMPLICIT` corresponds to non-optional proto3 fields. In order to
minimize changes, file-level defaults should be utilized.

Example transformations:

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  optional string bar = 1;
  required string baz = 2;
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  string bar = 1;
  string baz = 2 [
      features.field_presence = LEGACY_REQUIRED];
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto3";

message Foo {
  optional string bar = 1;
  string baz = 2;
  string bam = 3;
}</pre>
    </td>
    <td>
<pre>edition = "2023";
features.field_presence = IMPLICIT;

message Foo {
  string bar = 1 [features.field_presence = EXPLICIT];
  string baz = 2;
  string bam = 3;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto3";

message Foo {
  optional string bar = 1;
  optional string baz = 2;
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
    </td>
  </tr>
</table>

### Enum Type

The `enum_type` feature defaults to `OPEN`, which matches proto3 behavior. The
`CLOSED` value corresponds to typical proto2 behavior. In order to minimize
changes, file-level defaults should be utilized.
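The per-element mappings for field presence and enum type are simple enough to state directly as code. This hypothetical Python sketch returns the value a single element needs; it deliberately covers only singular scalar fields outside of oneofs (message-typed, repeated, and oneof fields are out of scope), and its results would then be collapsed to file-level defaults by the feature optimization phase.

```python
from typing import Optional


def field_presence_for(syntax: str, label: Optional[str]) -> str:
    """field_presence a singular scalar field (outside a oneof) needs."""
    if syntax == "proto2":
        # proto2 singular fields are either `optional` or `required`.
        return "LEGACY_REQUIRED" if label == "required" else "EXPLICIT"
    if syntax == "proto3":
        # Only proto3 scalar fields marked `optional` track presence.
        return "EXPLICIT" if label == "optional" else "IMPLICIT"
    raise ValueError(f"unsupported syntax: {syntax!r}")


def enum_type_for(syntax: str) -> str:
    """enum_type an enum declared with the given syntax needs."""
    if syntax == "proto2":
        return "CLOSED"
    if syntax == "proto3":
        return "OPEN"
    raise ValueError(f"unsupported syntax: {syntax!r}")
```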
Example transformations:

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
    </td>
    <td>
<pre>edition = "2023";
features.enum_type = CLOSED;

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto3";

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
    </td>
    <td>
<pre>edition = "2023";

enum Foo {
  VALUE1 = 0;
  VALUE2 = 1;
}</pre>
    </td>
  </tr>
</table>

### Repeated Field Encoding

The `repeated_field_encoding` feature defaults to `PACKED`, which matches proto3
behavior. The `EXPANDED` value corresponds to the default proto2 behavior. Both
proto2 and proto3 can have the default behavior overridden by using the `packed`
field option. All of these options should be replaced in the migration to
edition zero. Minimizing changes will be a little more complicated here, since
some files may have overridden the default on the majority of their repeated
fields; for those, flipping the file-level default produces a smaller change.
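Because the edition default for `repeated_field_encoding` is `PACKED` while proto2's default is `EXPANDED`, the file-level choice depends on how many fields were overridden. The purely illustrative Python sketch below computes the encoding each packable repeated field uses today and picks the file-level value that needs the fewest overrides, with ties going to `PACKED`. The `(name, type, packed)` tuple representation, the `"enum"` type token, and both function names are assumptions made for this example.

```python
from collections import Counter
from typing import Optional

# Repeated fields of these types can be packed; string, bytes, and message
# fields never are, so they don't influence the file-level choice.
PACKABLE_TYPES = frozenset({
    "double", "float", "int32", "int64", "uint32", "uint64", "sint32",
    "sint64", "fixed32", "fixed64", "sfixed32", "sfixed64", "bool", "enum",
})


def effective_encoding(syntax: str, packed: Optional[bool]) -> str:
    """Encoding a repeated field uses today, given its `packed` option."""
    if packed is None:
        return "PACKED" if syntax == "proto3" else "EXPANDED"
    return "PACKED" if packed else "EXPANDED"


def plan_repeated_encoding(
    syntax: str, fields: list[tuple[str, str, Optional[bool]]]
) -> tuple[Optional[str], dict[str, str]]:
    """Return (file_value, per-field overrides); None means keep PACKED."""
    needed = {
        name: effective_encoding(syntax, packed)
        for name, type_name, packed in fields
        if type_name in PACKABLE_TYPES
    }
    counts = Counter(needed.values())
    # PACKED is the edition zero default, so it wins ties.
    best = max(("PACKED", "EXPANDED"), key=lambda v: (counts[v], v == "PACKED"))
    overrides = {name: v for name, v in needed.items() if v != best}
    return (None if best == "PACKED" else best), overrides
```

Applied to the first proto2 example (`bar`, `baz [packed = true]`, `bam`), this chooses a file-level `EXPANDED` with a single `PACKED` override on `baz`; for the string-only fields in the last example it adds nothing, since strings can't be packed.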
Example transformations:

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [packed = true];
  repeated int32 bam = 3;
}</pre>
    </td>
    <td>
<pre>edition = "2023";
features.repeated_field_encoding = EXPANDED;

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [
      features.repeated_field_encoding = PACKED];
  repeated int32 bam = 3;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto3";

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [packed = false];
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  repeated int32 bar = 1;
  repeated int32 baz = 2 [
      features.repeated_field_encoding = EXPANDED];
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  repeated int32 x = 1 [packed = true];
  // Strings are never packed.
  repeated string z = 2;
  repeated string w = 3;
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  repeated int32 x = 1;
  repeated string z = 2;
  repeated string w = 3;
}</pre>
    </td>
  </tr>
</table>

### Message Encoding

The `message_encoding` feature is designed to replace the proto2-only `group`
syntax (with value `DELIMITED`), with a default that will always be
`LENGTH_PREFIXED`. This is a somewhat awkward transformation in the general
case, since groups can be defined anywhere fields can, including places where
message definitions can't (such as oneofs and extensions). The basic
transformation is to create a new message type in the nearest enclosing scope
with the same name as the group, then give the field that type and the
lowercased group name.
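A minimal Python sketch of the basic group rewrite, assuming the group body has already been converted to edition syntax and that the caller is responsible for placing the new message in the nearest enclosing scope. `convert_group` is a hypothetical helper; it relies on the proto2 rule that a group's field name is the group name lowercased, and it handles the `required` label by also applying `LEGACY_REQUIRED` presence as in the Field Presence section.

```python
from typing import Optional


def convert_group(
    name: str, number: int, body: list[str], label: Optional[str] = None
) -> tuple[list[str], str]:
    """Hypothetical helper: rewrite `<label> group <name> = <number> {...}`.

    `body` holds the group's already-converted field lines. Returns the lines
    of the new message definition and the replacement field line. `label` is
    "optional", "required", "repeated", or None (e.g. inside a oneof).
    """
    message = [f"message {name} {{"] + [f"  {line}" for line in body] + ["}"]
    options = ["features.message_encoding = DELIMITED"]
    if label == "required":
        # Required groups also need LEGACY_REQUIRED presence.
        options.append("features.field_presence = LEGACY_REQUIRED")
    prefix = "repeated " if label == "repeated" else ""
    # proto2 names a group's field after the group, lowercased.
    field = f"{prefix}{name} {name.lower()} = {number} [{', '.join(options)}];"
    return message, field
```

For `optional group Bar = 1 { optional int32 x = 1; }`, calling `convert_group("Bar", 1, ["int32 x = 1;"], "optional")` yields a `message Bar` definition and the field line `Bar bar = 1 [features.message_encoding = DELIMITED];`, as in the first example.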
Example transformations:

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  optional group Bar = 1 {
    optional int32 x = 1;
  }
  optional Bar baz = 2;
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  message Bar {
    int32 x = 1;
  }
  Bar bar = 1 [features.message_encoding = DELIMITED];
  Bar baz = 2;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  oneof foo {
    group Bar = 1 {
      optional int32 x = 1;
    }
  }
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  message Bar {
    int32 x = 1;
  }
  oneof foo {
    Bar bar = 1 [
        features.message_encoding = DELIMITED];
  }
}</pre>
    </td>
  </tr>
</table>

### JSON Format

The `json_format` feature is a bit of an outlier, because (at least for edition
zero) it only affects the frontend build of the proto file. The `ALLOW` value
(proto3 behavior) enables all JSON mapping conflict checks on field names,
unless `deprecated_legacy_json_field_conflicts` is set. The `LEGACY_BEST_EFFORT`
value (proto2 behavior) disables these checks. The ideal minimal transformation
would be to switch to `ALLOW` in all cases except where
`deprecated_legacy_json_field_conflicts` is set or where JSON mapping conflicts
exist. In those cases we can fall back to `LEGACY_BEST_EFFORT`.

Alternatively, if this is difficult for Prototiller to handle, we could do a
follow-up large-scale change to remove every `LEGACY_BEST_EFFORT` instance whose
removal still passes the build.
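The `json_format` decision needs each field's default JSON name. The Python sketch below mirrors protoc's default mapping (underscores are dropped and the following character is capitalized) and then applies a simplified conflict check that compares each field's effective JSON name (its custom `json_name` if present, otherwise the default) within one message; protoc's actual checks are more detailed. `to_json_name` and `json_format_for` are illustrative names, and a return value of `None` means the edition default `ALLOW` needs no feature.

```python
from typing import Optional


def to_json_name(field_name: str) -> str:
    """Default JSON name: drop underscores, capitalize the next character."""
    result = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
    return "".join(result)


def json_format_for(
    fields: list[tuple[str, Optional[str]]],
    legacy_conflicts_allowed: bool = False,
) -> Optional[str]:
    """fields: (name, custom json_name or None) for one message.

    `legacy_conflicts_allowed` models deprecated_legacy_json_field_conflicts.
    """
    if legacy_conflicts_allowed:
        return "LEGACY_BEST_EFFORT"
    seen: set[str] = set()
    for name, custom in fields:
        json_name = custom if custom is not None else to_json_name(name)
        if json_name in seen:
            # Simplified: any duplicate effective JSON name is a conflict.
            return "LEGACY_BEST_EFFORT"
        seen.add(json_name)
    return None
```

With this check, `bar` and `bar_` both map to `"bar"` and force `LEGACY_BEST_EFFORT`, while `bar` and `baz` keep the `ALLOW` default.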
Example transformations:

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  optional string bar = 1;
  optional string baz = 2;
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto3";

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
    </td>
    <td>
<pre>edition = "2023";
features.field_presence = IMPLICIT;

message Foo {
  string bar = 1;
  string baz = 2;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  // Warning only
  optional string bar = 1;
  optional string bar_ = 2;
}</pre>
    </td>
    <td>
<pre>edition = "2023";
features.json_format = LEGACY_BEST_EFFORT;

message Foo {
  string bar = 1;
  string bar_ = 2;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto3";

message Foo {
  option
      deprecated_legacy_json_field_conflicts = true;
  string bar = 1;
  string baz = 2 [json_name = "bar"];
}</pre>
    </td>
    <td>
<pre>edition = "2023";
features.field_presence = IMPLICIT;
features.json_format = LEGACY_BEST_EFFORT;

message Foo {
  string bar = 1;
  string baz = 2 [json_name = "bar"];
}</pre>
    </td>
  </tr>
</table>

## Backend Feature Transformations

This section details our backend-specific features and how to transform from
proto2/proto3 to edition zero.

In order to limit bloat, it would be ideal if we could check whether or not a
proto file is ever used to generate code in the target language. If it's not,
there's no reason to add backend-specific features.

### Legacy Closed Enum

Java and C++ by default treat proto3 enums as closed if they're used in a proto2
message.
The internal `cc_open_enum` field option can override this, but it has
**very** limited use and may not be worth considering. While the enum type
behavior should still be determined by `enum_type`, we'll need to add this
feature to fields in proto2 files that use proto3 enums (the reverse, proto3
files using proto2 enums, is disallowed).

Example transformations:

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

import "some_proto3_file.proto";

enum Proto2Enum {
  BAR = 0;
}

message Foo {
  optional Proto3Enum bar = 1;
  optional Proto2Enum baz = 2;
}</pre>
    </td>
    <td>
<pre>edition = "2023";

import "third_party/protobuf/cpp_features.proto";
import "third_party/protobuf/java_features.proto";
import "some_proto3_file.proto";

features.enum_type = CLOSED;

enum Proto2Enum {
  BAR = 0;
}

message Foo {
  Proto3Enum bar = 1 [
      features.(pb.cpp).legacy_closed_enum = true,
      features.(pb.java).legacy_closed_enum = true];
  Proto2Enum baz = 2;
}</pre>
    </td>
  </tr>
</table>

### UTF8 Validation

This feature is pending approval of *Editions Zero Feature: utf8_validation*
(not available externally).

## Other Transformations

In addition to features, there are some other changes we've made for edition
zero.

### Reserved Identifier Syntax

With *Protobuf Change Proposal: Reserved Identifiers* (not available
externally), we've decided to switch from strings to identifiers for reserved
names. This *should* be a trivial change, but if the proto file contains
strings that aren't valid identifiers, there's some ambiguity. Such strings are
silently ignored today, but they could be typos that we wouldn't want to
blindly delete. So instead, we'll leave them behind in a comment.
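A minimal Python sketch of this rewrite for a single `reserved` statement, assuming the names have already been extracted from the string literals: valid identifiers (a letter or underscore followed by letters, digits, or underscores) move to the new syntax, and anything else is preserved in a comment. `convert_reserved_names` is a hypothetical helper, and the sketch doesn't handle strings that themselves contain `"` or `*/`.

```python
import re

# Protobuf identifiers: a letter or underscore, then letters, digits, or
# underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def convert_reserved_names(names: list[str]) -> list[str]:
    """Return edition zero lines for a `reserved "a", "b";` statement."""
    valid = [n for n in names if IDENTIFIER.fullmatch(n)]
    invalid = [n for n in names if not IDENTIFIER.fullmatch(n)]
    lines = []
    if valid:
        lines.append(f"reserved {', '.join(valid)};")
    if invalid:
        # Keep possible typos visible instead of silently deleting them.
        quoted = ", ".join(f'"{n}"' for n in invalid)
        lines.append(f"/*reserved {quoted};*/")
    return lines
```

For `reserved "bar", "1";` this produces `reserved bar;` followed by `/*reserved "1";*/`, matching the second example.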
Example transformations:

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  reserved "bar", "baz";
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  reserved bar, baz;
}</pre>
    </td>
  </tr>
</table>

<table>
  <tr>
    <td>
<pre>syntax = "proto2";

message Foo {
  reserved "bar", "1";
}</pre>
    </td>
    <td>
<pre>edition = "2023";

message Foo {
  reserved bar;
  /*reserved "1";*/
}</pre>
    </td>
  </tr>
</table>