# Editions Feature Visibility

**Authors:** [@mkruskal-google](https://github.com/mkruskal-google)

**Approved:** 2023-09-08

## Background

While [Editions: Life of a FeatureSet](editions-life-of-a-featureset.md) handles
how we propagate features *to* runtimes, what's left under-specified is how the
runtimes should expose features to their users. *Exposing Editions Feature Sets*
(not available externally) was an initial attempt to cover both these topics
(specifically the C++ API section), but much of it has been redesigned since.
This is a much more targeted document laying out how features should be treated
by runtimes.

## Problem Description

There are two main concerns from a runtime's perspective:

1.  **Direct access to resolved features protos** - While runtime decisions
    *should* be made based on the data in these protos, their struct-like nature
    makes them very rigid. Once users start to depend on the proto API, it
    becomes very difficult for us to do internal refactoring. These protos are
    also naturally structured based on how feature *specification* is done in
    proto files, rather than the actual behaviors they represent. This makes it
    difficult to guarantee that complex relationships between features and other
    conditions are being uniformly handled.

2.  **Accidental use of unresolved features** - Unresolved features represent a
    clear foot-gun for users that could also cause issues for us. Since they
    share the same type as resolved features, it's not always easy to tell the
    two apart. If runtime decisions are made using unresolved features, it's
    very plausible that everything will work as expected in a given edition by
    coincidence. However, when the proto's edition is bumped, it will very
    likely break this code unexpectedly.
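
The second concern can be illustrated with a small, self-contained model. Everything in it is hypothetical: the types, the two editions, and their defaults are invented for this sketch and are not the real protobuf API. It also assumes that an edition upgrade may drop an override that has become redundant under the new default. In that case the resolved value is unchanged, but code that reads the unresolved value with a hard-coded fallback silently flips.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified model for illustration; not the real protobuf API.
enum class Presence { kImplicit, kExplicit };

// A feature set as written in a .proto file: unset means "inherit".
struct UnresolvedFeatures {
  std::optional<Presence> field_presence;
};

// Invented edition defaults: edition 1 defaults to implicit presence,
// edition 2 defaults to explicit presence.
Presence EditionDefault(int edition) {
  assert(edition == 1 || edition == 2);
  return edition == 1 ? Presence::kImplicit : Presence::kExplicit;
}

// Resolution fills an unset feature from the edition default.
Presence ResolvePresence(int edition, const UnresolvedFeatures& features) {
  return features.field_presence.value_or(EditionDefault(edition));
}

// Fragile: reads the unresolved value and hard-codes a fallback. It is only
// correct while the fallback happens to match the edition default.
bool HasPresenceFragile(const UnresolvedFeatures& features) {
  return features.field_presence.value_or(Presence::kImplicit) ==
         Presence::kExplicit;
}

// Robust: decides from the resolved value only.
bool HasPresence(int edition, const UnresolvedFeatures& features) {
  return ResolvePresence(edition, features) == Presence::kExplicit;
}
```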

Some concrete examples to help illustrate these concerns:

*   **Remodeling features** - We've bounced back and forth on how UTF8
    validation should be modeled as a feature. None of the proposals resulted in
    any functional changes, since edition zero preserves all proto2/proto3
    behavior; the question was just about which features should be used to
    control it. While the large-scale change to bump `.proto` files to the
    next edition containing these changes is unavoidable, we'd like to avoid
    having to update any code simultaneously. If everyone is directly inspecting
    the `utf8_validation` feature, we would need to do both.

*   **Incomplete features** - Looking at a feature like `packed`, it's really
    more of a contextual *suggestion* than a strict rule. If it's set at the
    file level, **all** fields will have the feature even though only packable
    ones will actually respect it. Giving users direct access to this feature
    would be problematic, because they would *also* need to check whether the
    field is packable before making decisions based on it. Field presence is an
    even more complicated example, where the logic we want people to base
    runtime decisions on is distinct from what's specified in the proto file.

*   **Optimizations** - One of the major considerations in *Exposing Editions
    Feature Sets* (not available externally) was whether or not it would be
    possible to reduce the cost of editions later. Every descriptor is going to
    contain two separate features protos, and it's likely this will end up
    getting expensive as we roll out edition zero. We could decide to optimize
    this by storing them as a custom class with a much more compact memory
    layout. This is similar to other optimizations we've done to descriptor
    classes, where we have that freedom *because* we don't generally expose
    them as protos.

*   **Bumpy Edition Large-scale Change** - The proto team is going to be
    responsible for rolling out the next edition to internal Google repositories
    every year (at least 80% of it per our churn policy). We *expect* that
    people are only making decisions based on resolved features, and therefore
    that Prototiller transformations are behavior-preserving (despite changing
    the unresolved features). If people have easy access to unresolved features
    though, we can expect a lot of Hyrum's law issues slowing down these
    large-scale changes.
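
The `packed` example can be sketched in C++ as follows. This is a minimal, self-contained illustration rather than the real descriptor API: `Field`, `FieldType`, `IsPackable`, and `IsPacked` are hypothetical names, and only a few field types are modeled. It shows why a helper that combines the resolved feature with packability is safer than handing out the raw feature.

```cpp
// Hypothetical, simplified field model; not the real protobuf descriptor API.
enum class FieldType { kInt32, kBool, kString, kMessage };

struct Field {
  FieldType type;
  bool repeated;
  // Resolved `packed` feature. When set at the file level, every field
  // inherits it, whether or not the field can actually be packed.
  bool packed_feature;
};

// Only repeated fields of primitive numeric types can use the packed
// encoding; strings and messages cannot.
bool IsPackable(const Field& field) {
  return field.repeated &&
         (field.type == FieldType::kInt32 || field.type == FieldType::kBool);
}

// The behavior callers actually care about: the feature is only a suggestion,
// so it is honored for packable fields and ignored everywhere else.
bool IsPacked(const Field& field) {
  return IsPackable(field) && field.packed_feature;
}
```

With a file-level `packed` feature, a repeated string field such as `Field{FieldType::kString, true, true}` still reports `IsPacked(...) == false`, so callers never have to remember the packability check themselves.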

## Recommended Solution

We recommend a conservative approach of hiding all `FeatureSet` protos from
public APIs whenever possible. This means that there should be no public
`features()` getter, and that features should be stripped from any descriptor
options. All `options()` getters should have an unset features field. Instead,
helper methods should be provided on the relevant descriptors to encapsulate the
behaviors users care about. This has already been done for edition zero features
(e.g. `has_presence`, `requires_utf8_validation`, etc.), and we should continue
this model.

The one notable place where we *can't* completely hide features is in
reflection. Most of our runtimes provide APIs for converting descriptors back to
their original state at runtime (e.g. `CopyTo` and `DebugString` in C++). In
order to give a faithful representation of the original proto file in these
cases, we should include the **unresolved** features here. Given how inefficient
these methods are and how hard the resulting protos are to work with, we expect
misuse of these unresolved features to be rare.
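
A minimal sketch of this shape, under simplifying assumptions: the types below are hypothetical stand-ins (the real descriptor classes and feature protos look different), feature resolution is assumed to happen elsewhere, and only a single presence feature is modeled. The descriptor strips features from `options()`, exposes a behavior-level helper in place of a `features()` getter, and restores the unresolved features only in its reflection-style `CopyTo`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, heavily simplified types for illustration only.
struct FeatureSet {
  std::optional<bool> explicit_presence;  // unset means "inherit"
};

struct FieldOptions {
  bool deprecated = false;
  std::optional<FeatureSet> features;
};

// Stand-in for a field's descriptor proto as written in the .proto file.
struct FieldProto {
  std::string name;
  FieldOptions options;
};

class FieldDescriptor {
 public:
  // `resolved_presence` is the outcome of feature resolution, which this
  // sketch assumes was computed by the caller.
  FieldDescriptor(const FieldProto& proto, bool resolved_presence)
      : name_(proto.name),
        options_(proto.options),
        unresolved_features_(proto.options.features),
        resolved_presence_(resolved_presence) {
    options_.features.reset();  // options() never exposes features
  }

  const std::string& name() const { return name_; }
  const FieldOptions& options() const { return options_; }

  // Behavior-level helper; there is deliberately no public features() getter.
  bool has_presence() const { return resolved_presence_; }

  // Reflection: reproduce the field as written, including the *unresolved*
  // features, so the output is faithful to the original file.
  void CopyTo(FieldProto* proto) const {
    assert(proto != nullptr);
    proto->name = name_;
    proto->options = options_;
    proto->options.features = unresolved_features_;
  }

 private:
  std::string name_;
  FieldOptions options_;
  std::optional<FeatureSet> unresolved_features_;
  bool resolved_presence_;
};
```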

**Note:** While we may need to adjust this approach in the future, this is the
one that gives us the most flexibility to do so. Adding a new API when we have
solid use-cases for it is easy to do. Removing an existing one when we decide we
don't want it has proven to be very difficult.

### Enforcement

While we make the recommendation above, ultimately this decision should be up to
the runtime owners. Outside of Google we can't enforce it, and the cost would be
a worse experience for *their* users (not the entire protobuf ecosystem). Inside
of Google, we should be more diligent about this, since the cost mostly falls on
us.

### μpb

One notable standout here is μpb, which is a runtime *implementation*, but not a
full runtime. Since μpb only provides APIs to the wrapping runtime in a target
language, it's free to expose features anywhere it wants. The wrapping language
should be responsible for stripping them out where appropriate.

#### Pros

*   Prevents any direct access to resolved feature protos

    *   Gives us freedom to do internal refactoring
    *   Allows us to encapsulate more complex relationships
    *   Users don't have to distinguish between resolved/unresolved features

*   Limits access to unresolved features

    *   Accidental usage of these is less likely (especially considering the
        above)

*   This should be easy to loosen in the future if we find a real use-case for
    `features()` getters.

*   More in line with our descriptor APIs, which wrap descriptor protos but
    aren't strictly 1:1 with them. Options are more of an exception here, mostly
    due to the need to expose extensions.

#### Cons

*   There's no precedent for modifying `options()` like this. Up until now it
    represented a faithful clone of what was specified in the proto file.

*   Deciding to loosen this in the future would be a bit awkward for
    `options()`. If we stop stripping it, people will suddenly start seeing a
    new field and Hyrum's law might result in breakages.

*   Requires duplicating high-level feature behaviors across every language. For
    example, `has_presence` will need to be implemented identically in every
    language. We will likely need some kind of conformance test to make sure
    these all agree.

## Considered Alternatives

### Expose Features

This is the simplest implementation, and was the initial approach taken in
prototypes. We would just have public `features()` getters in our descriptor
APIs, and keep the unresolved features in `options()`.

#### Pros

*   Very easy to implement

#### Cons

*   Doesn't solve any of the problems laid out above

*   Difficult to reverse later

### Hide Features in Generated Options

This is a tweak of the recommended solution where we add a hack to the generated
options messages. Instead of just stripping the features out and leaving an
empty field, we could give the `features` fields "package-scoped" visibility
(e.g. access tokens in C++). We would still strip them, but nobody outside of
our runtimes could even access them to see that they're empty. This eliminates
the Hyrum's law concern above.
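
The access-token approach is commonly known in C++ as the passkey idiom. A minimal sketch, assuming hypothetical names throughout (`FeatureAccessToken`, `DescriptorInternals`, and a hand-written `FieldOptions` standing in for generated code); it is not what the protobuf code generator emits:

```cpp
#include <type_traits>

class DescriptorInternals;  // hypothetical stand-in for runtime-internal code

// A token that only the runtime internals are able to create.
class FeatureAccessToken {
 private:
  // User-provided (not defaulted) so that `FeatureAccessToken{}` cannot
  // bypass the private constructor through aggregate initialization.
  FeatureAccessToken() {}
  friend class DescriptorInternals;
};

// Code outside the runtime cannot construct a token.
static_assert(!std::is_default_constructible<FeatureAccessToken>::value,
              "only runtime internals may create tokens");

struct FeatureSet {
  bool packed = false;
};

// Stand-in for a generated options message.
class FieldOptions {
 public:
  // These accessors are public, but they are only callable by code that can
  // produce a token.
  const FeatureSet& features(FeatureAccessToken) const { return features_; }
  FeatureSet& mutable_features(FeatureAccessToken) { return features_; }

 private:
  FeatureSet features_;
};

class DescriptorInternals {
 public:
  static bool PackedFeature(const FieldOptions& options) {
    return options.features(FeatureAccessToken()).packed;
  }
  static void SetPackedFeature(FieldOptions& options, bool packed) {
    options.mutable_features(FeatureAccessToken()).packed = packed;
  }
};
```

User code can still name `FieldOptions::features`, but any call fails to compile because it has no way to construct the token argument. Every code generator would need its own equivalent of this mechanism.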

#### Pros

*   Resolves one of the cons in the recommended approach.

#### Cons

*   We'd have to do this separately for each runtime, meaning specific hacks in
    *every* code generator

*   No clear benefit. This only helps **if** we decide to expose features and
    **if** a bunch of people start depending on the fact that `features` is
    always empty.

### ClangTidy warning Options Features

Similar to the above alternative, but leverages ClangTidy to warn users against
checking `options().features()`.

#### Pros

*   Resolves one of the cons in the recommended approach.

#### Cons

*   Doesn't work in every language

*   Doesn't work in OSS