1# Edition Zero Feature: Enum Field Closedness 2 3**Author:** [@mcy](https://github.com/mcy) 4 5**Approved:** 2023-02-13 6 7## Background 8 9On 2023-02-10, a CL [@mcy](https://github.com/mcy) submitted to delete 10`google::protobuf::Reflection::SupportsUnknownEnumValue()`. Oddly, this function used the 11containing message's `syntax`, rather than the enum field's, to determine 12whether the enum was open. 13 14It turns out we misunderstood a critical corner-case of proto3 enums. Consider 15the following proto files: 16 17``` 18// enum.proto 19syntax = "proto3"; 20package oh.no; 21 22enum Enum { 23 A = 0; 24 B = 1; 25} 26 27// message.proto 28syntax = "proto2"; 29package oh.no; 30import "enum.proto"; 31 32message Msg { 33 optional Enum enum = 1; 34} 35``` 36 37If we parse the [Protoscope](https://github.com/protocolbuffers/protoscope) 38value `1: 2` as an `oh.no.Msg`, and look at the value of `oh.no.Msg.enum`, we 39will find that it is not present, and that there is a VARINT of value 2 in the 40`UnknownFieldSet`. 41 42This is because Protobuf sometimes implements the openness of an enum by its 43usage, *not* its definition. 44 45This case is actually quite difficult to observe, because the converse doesn't 46work: the proto compiler rejects proto2-enum-valued fields in proto3 messages, 47because such enums can have nonzero defaults, which proto3 does not support due 48to implicit presence. 49 50### Languages 51 52<table> 53 <tr> 54 <td style="background-color: #cccccc">Language 55 </td> 56 <td style="background-color: #cccccc">Open/Closed handling 57 </td> 58 </tr> 59 <tr> 60 <td>C++ 61 </td> 62 <td>Determined by the field using the enum's file 63 </td> 64 </tr> 65 <tr> 66 <td>Java 67 </td> 68 <td>Determined by the field using the enum's file 69 </td> 70 </tr> 71 <tr> 72 <td>UPB (non-ruby) 73 </td> 74 <td>Determined by the enum's definition file 75 </td> 76 </tr> 77 <tr> 78 <td>UPB (ruby) 79 </td> 80 <td>All enums treated as open 81 </td> 82 </tr> 83 <tr> 84 <td>C# 85 </td> 86 <td><strong>All</strong> enums treated as open 87 </td> 88 </tr> 89 <tr> 90 <td>Obj-C 91 </td> 92 <td>by usage (< 22.x) 93<p> 94by definition (>= 22.x) 95<p> 96It looks like this was handled by the field's usage, but in Nov as part of the syntax cleanup, we stopped looking at syntax and captured things on the enum definition, so it's now defined by the enum. 97 </td> 98 </tr> 99 <tr> 100 <td>Swift 101 </td> 102 <td>Determined by the enum's definition file 103<p> 104Swift uses the ability for enums to have associated values, so an enum defined in a proto3 syntax file gets a value that holds all unknown values. So a proto2 syntax defined message will still end up with the enum using that to hold unknown values. 105 </td> 106 </tr> 107 <tr> 108 <td>Go 109 </td> 110 <td><strong>All</strong> enums treated as open 111 </td> 112 </tr> 113 <tr> 114 <td>Apps JSPB 115 </td> 116 <td><strong>All</strong> enums treated as open 117 </td> 118 </tr> 119 <tr> 120 <td>ImmutableJs 121 </td> 122 <td><strong>All</strong> enums treated as open 123 </td> 124 </tr> 125 <tr> 126 <td>JsProto 127 </td> 128 <td><strong>All</strong> enums treated as open 129 </td> 130 </tr> 131</table> 132 133### Impact 134 135Approximately 2.99% of enum fields import enums across syntaxes and 1.77% of 136enums are imported across syntaxes. 137 1386.14% of fields being enum fields, meaning 0.18% of fields are affected when 139used by affected languages. 140 141## Overview 142 143This document proposes adding an additional feature to 144[Edition Zero Features](edition-zero-features.md), specified as the following 145.proto fragment: 146 147``` 148message Features { 149 // ... 150 optional bool legacy_treat_enum_as_closed = ??? [ 151 retention = RUNTIME, 152 target = FILE, 153 target = FIELD 154 ]; 155} 156``` 157 158The name of this field captures the desired intent: this is a bad legacy 159behavior that we believe is rare and want to stamp out. Edition 2023 would set 160this to false by default, and `proto2` would treat it as implicitly true. It 161also does not permit the converse: you cannot force a field to be open, because 162that is currently not possible and we don't want to add more special cases. 163 164Additionally, we would like to make special dispensation in migration tooling 165for this field: it should not be set unconditionally when migrating from proto2 166-> editions, but *only* on proto2 fields that are of proto3 enum type. We should 167also want to build an allowlist for this, like we do for `required`. 168 169This option can also help in migrating enums from closed to open, since we can 170use it to migrate individual use-sites by marking the enum as open and all of 171its uses as treat-as-closed in one CL, and then deleting the treat-as-closed 172annotations one by one. 173 174An open (lol) question is whether we should move `is_closed` from 175`EnumDescriptor` to `FieldDescriptor`. 176 177## Recommendation 178 179Use the "define official behavior" alternative below. Given the wide variety of 180behavior in different languages, a singular global setting will always leave 181some of our languages in the lurch. As such, we will use per language features 182to allow each language to control its own evolution while we define the 183"correct" behavior. 184 185For example, in C++ we will define: 186 187``` 188// Determines if the given enum field is treated as closed based on legacy 189// non-conformant behavior. 190// 191// Conformant behavior determines closedness based on the enum and 192// can be queried using EnumDescriptor::is_closed(). 193// 194// Some runtimes currently have a quirk where non-closed enums are 195// treated as closed when used as the type of fields defined in a 196// `syntax = proto2;` file. This quirk is not present in all runtimes; as of 197// writing, we know that: 198// 199// - C++, Java, and C++-based Python share this quirk. 200// - UPB and UPB-based Python do not. 201// - PHP and Ruby treat all enums as open regardless of declaration. 202// 203// Care should be taken when using this function to respect the target 204// runtime's enum handling quirks. 205 206bool FieldDescriptor::legacy_enum_field_treated_as_closed() const { 207 return type() == TYPE_ENUM && file().syntax() == FileDescriptor::SYNTAX_PROTO2; 208} 209``` 210 211In Java, `FileDescriptor.supportsUnknownEnumValue()` will need to be deprecated 212and replaced with the above. 213 214## Alternatives 215 216### Define official behavior 217 218Define the official behavior to be "Enums open-ness should be defined by the 219definition of the enum." Add a conformance test for this behavior. Use per 220language features to eventually converge implementations that are out of 221conformance. We choose to define this as "enum openness is defined by the 222definition" because that matches the model for almost all other proto3/proto2 223properties. 224 225#### Pros 226 227* Clarifies desired behavior 228* Existing implementations can change incrementally using editions 229* Avoids complicating global features for something that is a per-language 230 issue 231 232#### Cons 233 234* When migrating from syntax to edition zero, Prototiller will need to know 235 all used languages to make the upgrade a trivial change (this is already the 236 case for other edition upgrades). 237 238### Make `Features.enum` a field-level feature 239 240Here, we don't add `legacy_treat_enum_as_closed` and instead make closeness a 241bona fide property of fields, not enums. 242 243#### Pros 244 245* Reflects the current behavior of Protobuf for our largest languages 246 (C++/Java). 247* Removes the possibility of making a mistake in reflective code that checks 248 `is_closed()` on `EnumDescriptor` rather than `FieldDescriptor`. 249 250#### Cons 251 252* Doesn't handle the case for languages other than C++/Java 253* Harder to migrate individual enums to open, since the property is not in 254 control of the owner of the type. 255* Conceptually unpleasant, since it gives locality to the meaning of 256 `IsValid`, unless we want to believe that `IsValid` merely states whether 257 the value has a name we know of. 258 259### Allow `Features.enum` on both enums and fields 260 261This allows enum owners some more control without needing to introduce a 262strictly "legacy do not use" feature. 263 264#### Pros 265 266* We don't introduce a "legacy do not use" option, and don't need to play the 267 allowlist game. 268 269#### Cons 270 271* We need to support closed-enums-treated-as-open, which is a functionality 272 Protobuf does not offer today. 273 274### Name the feature `Features.treat_as_closed_for_migration` 275 276This is an aesthetic choice if we feel this is a useful knob for migration, that 277still highlights its temporary nature. 278 279#### Pros 280 281* We don't introduce a "legacy do not use" option, and don't need to play the 282 allowlist game. 283* Clearly underscores that this is for migration in a specific desirable 284 direction (closed -> open). 285 286#### Cons 287 288* People may use it because they like closed enums for some reason and don't 289 fully appreciate the ramifications. 290 291### Do Nothing 292 293We can simply keep the current editions enum semantics. 294 295#### Pros 296 297* No extra work. 298 299#### Cons 300 301* proto2 -> editions is not a no-op in some cases. This breaks a lot of the 302 draw of moving to editions, even though it is possible to detect the no-ops 303 in advance. 304* This would immediately add a blocker to our syntax reflection large-scale 305 change 306