# What are Protobuf Editions?

**Authors**: [@mcy](https://github.com/mcy), [@fowles](https://github.com/fowles)

## Summary

This document is an introduction to the Protobuf Editions project, an ambitious
re-imagining of how we migrate Protobuf users into the future.

## Goal

Enable incremental evolution of Protobuf across the entire ecosystem **without**
introducing permanent forks in the Protobuf language.

## TL;DR

1.  We are replacing
    [`syntax`](https://protobuf.dev/reference/protobuf/proto3-spec/#syntax)
    `= ...` with `edition = ...`.
    *   We plan to produce a new "edition" roughly once a year.
    *   We plan to regularly deprecate and remove old editions after a long
        window.
    *   This gradual churn is enabled by the
        [Protobuf Breaking Changes policy](https://protobuf.dev/news/2022-07-06/#library-breaking-change-policy).
2.  "Features" are a special kind of option that can be set on files, messages,
    fields, enums, and other syntax entities.
    *   Features control the individual codegen and runtime behavior of fields,
        messages, enums, etc.
    *   Features cannot introduce changes that would directly break existing
        binaries.
    *   We expect heavy churn of features in `.proto` files, so their design is
        optimized to minimize diffs to `.proto` files while permitting
        fine-grained control.
    *   Features are **usually** attached to the field/message/enum they apply
        to.
        *   Features can be specified at a higher-level entity, such as a file,
            to apply to all definitions inside of that entity. This is called
            **feature inheritance**.
        *   Inheritance is intended to allow us to factor out frequently
            occurring feature declarations, minimizing clutter during
            migrations.
3.  Editions change only the defaults of features and do not otherwise introduce
    new behavior.
    *   New behavior is fundamentally controlled by features (explicitly set or
        implicit from an edition).
    *   Editions allow us to ratchet the ecosystem forward.
        *   Editions can be incremented one `.proto` file at a time, so
            projects can upgrade incrementally.
4.  Messages with any combination of features are always interoperable: their
    files can import each other freely and use each other's messages.
    *   Editions do not split the ecosystem, and migration is largely automated.
    *   Directly inspired by
        [Rust editions](https://doc.rust-lang.org/edition-guide/editions/index.html).
    *   Carbon has a similar philosophy.
5.  The `proto2`/`proto3` distinction is going away.
    *   Editions will support everything from both and allow mixed semantics
        even within the same message or field.
    *   Undesirable features will be removed through large-scale changes
        (LSCs), using the same template as any other feature/edition migration.

## Motivation

Arguably the biggest hard-earned lesson among Software Foundations is that
successful migrations are incremental. Most of our experience with these has
been for internal migrations. Externally, progress has often ossified because of
a lack of established evolution mechanisms. More recently, large projects have
started planning incremental evolution into their structure. For example, Carbon
is heavily focused on evolution as a core precept, and Rust has built language
evolution via editions into its core design.

Protobuf is one of Google's oldest and most successful toolchain projects.
However, it was designed before we learned and internalized this lesson, making
modernization difficult and haphazard. We still have `required` and `group`,
`packed` is not everywhere, and string accessors in C++ still return
`const std::string&`. The last radical change to Protobuf
(`syntax = "proto3";`) split the ecosystem.

*Editions* and *features* are new language features that will allow us to
incrementally evolve Protobuf into the future. This will be done by introducing
a new `syntax`, hopefully the last syntax addition we will ever need.

This high-level document is intended as an introduction to Protobuf Editions for
engineers not familiar with the background and the set of tradeoffs that led us
here. Low-level technical details are skipped in favor of describing the core
of our proposed design. This document reflects the approximate consensus of the
protobuf-team members who have been developing Protobuf Editions, but please
beware: many open questions remain.

## What is a feature?

A *feature*, in the narrow context of Protobuf Editions, is an `option` on any
syntax entity of a `.proto` file that has the following properties:

*   It is a field or extension of a top-level option named `features`, which is
    present on every syntax entity (file, message, enum, field, etc.). It can be
    of any type, but `bool` and `enum` are the most common.
*   If a syntax entity's lexical parent has a particular value for a feature,
    then the child has the same value, unless a new value is explicitly
    specified on the child. This is called **feature inheritance**, and it
    applies recursively. A feature can be given a new value on any entity where
    it can be set.
*   It explicitly specifies which syntax entities it can be set on, similar to
    Java annotations (although this does not preclude inheritance to or through
    an entity that it *cannot* be set on).
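
The inheritance and targeting rules above can be sketched as a minimal Python model. Everything in it is an assumption for illustration. The `Entity` class, the `FEATURE_TARGETS` table, and the `set_feature`/`resolve` helpers are hypothetical and are not the API of `protoc` or of any runtime. The example sets a feature on a whole message and lets one field opt back out. It also shows a nested enum inheriting a file-level value through a message, even though the `enum` feature cannot itself be set on a message.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Entity:
    """A syntax entity (file, message, enum, field, ...) in a .proto file."""
    kind: str
    parent: Optional["Entity"] = None
    explicit: Dict[str, Any] = field(default_factory=dict)  # features set here

# Hypothetical declarations: the entity kinds on which each feature may be set.
FEATURE_TARGETS = {
    "enum": {"file", "enum"},
    "(pb.cpp).string_type": {"file", "message", "field"},
}

def set_feature(entity: Entity, name: str, value: Any) -> None:
    """Explicitly set a feature, rejecting entities it does not target."""
    if entity.kind not in FEATURE_TARGETS[name]:
        raise ValueError(f"{name} cannot be set on a {entity.kind}")
    entity.explicit[name] = value

def resolve(entity: Entity, name: str, edition_default: Any) -> Any:
    """Return the nearest explicit value on the entity or its lexical
    ancestors, or the edition default if none is set."""
    node: Optional[Entity] = entity
    while node is not None:
        if name in node.explicit:
            return node.explicit[name]
        node = node.parent
    return edition_default

# A file with one message containing two fields and a nested enum.
proto_file = Entity("file")
msg = Entity("message", parent=proto_file)
name_field = Entity("field", parent=msg)
blob_field = Entity("field", parent=msg)
nested_enum = Entity("enum", parent=msg)

set_feature(proto_file, "enum", "CLOSED")
set_feature(msg, "(pb.cpp).string_type", "CORD")          # whole message
set_feature(name_field, "(pb.cpp).string_type", "STRING")  # one field opts out

print(resolve(blob_field, "(pb.cpp).string_type", "STRING"))  # CORD (from msg)
print(resolve(name_field, "(pb.cpp).string_type", "STRING"))  # STRING
print(resolve(nested_enum, "enum", "OPEN"))  # CLOSED, inherited through msg
```

Calling `set_feature(msg, "enum", "OPEN")` raises `ValueError` in this model. Restricting where a feature may be *set* does not restrict where it is *inherited*.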

Features allow us to control the behavior of `protoc`, its backends, and the
Protobuf runtimes at arbitrary granularity. This is critical for large-scale
changes: if a message has few usages, features can be changed at a larger
scope, minimizing diff churn; but if it has heavy usage and the CL to migrate a
single field is large, cleanups can happen at the field level, as necessary.

Features won't change a message's serialization formats (binary, text, or JSON)
in incompatible ways except in extreme circumstances, which will always be
managed directly by protobuf-team. It is critical for migrations that any
behavioral change coming from a feature is the result of a textual change to a
`.proto` file (either an edition bump or a feature change).

`ctype` is an existing field option that looks exactly like a feature: it
controls the behavior of the codegen backend, although it does not have the nice
ratcheting properties of editions.

Because features can be extensions, language backends can specify
**language-scoped** features. For example, `[ctype = CORD]` could instead be
phrased as `[features.(pb.cpp).string_type = CORD]`. Codegen backends own the
definitions of their features.

## What is an Edition?

An *edition* is a collection of defaults for features understood by `protoc` and
its backends. Editions are year-numbered, although we have defined a breakout in
case we need multiple editions in a particular year.
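
An edition only records defaults, so resolving a default amounts to replaying the default changes of every edition up to the one a file declares. The sketch below uses a hypothetical table keyed by edition name. It also assumes a hypothetical `"YYYY.N"` spelling for the same-year breakout; neither is the real format. The feature is `features.(pb.cpp).opaque_repeated_fields`, the example from the lifecycle in the next section: introduced in "2025" with `false`, flipped to `true` in "2027".

```python
def edition_key(edition: str) -> tuple:
    """Order editions by year, then by a hypothetical ".N" breakout suffix
    used for a second edition in the same year (e.g. "2025.1")."""
    year, _, breakout = edition.partition(".")
    return (int(year), int(breakout or 0))

# Hypothetical table: each edition lists only the defaults it adds or changes.
DEFAULT_CHANGES = {
    "2025": {"(pb.cpp).opaque_repeated_fields": False},
    "2027": {"(pb.cpp).opaque_repeated_fields": True},
}

def defaults_for(edition: str) -> dict:
    """Compute the full set of feature defaults for `edition` by applying
    every change from editions up to and including it, oldest first."""
    result = {}
    for ed in sorted(DEFAULT_CHANGES, key=edition_key):
        if edition_key(ed) <= edition_key(edition):
            result.update(DEFAULT_CHANGES[ed])
    return result

print(defaults_for("2024"))    # {}: the feature does not exist yet
print(defaults_for("2026"))    # {'(pb.cpp).opaque_repeated_fields': False}
print(defaults_for("2027.1"))  # {'(pb.cpp).opaque_repeated_fields': True}
```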

Instead of writing `syntax = "...";`, a Protobuf Editions-enabled `.proto` file
begins with `edition = "2022";` or similar. `edition` implies
`syntax = "editions";`, and the `syntax` keyword itself becomes deprecated. This
ensures that old tools not owned by protobuf-team, which only work for old
Protobuf syntaxes, crash or fail quickly and noticeably instead of wandering
into a descriptor that they cannot understand (we will attempt to migrate what
we can, of course).

`protoc` specifies which editions it understands and will reject `.proto` files
"from the future", since it cannot meaningfully parse them. `protoc` backends,
which can specify their own set of language-scoped features, must advertise
their defaults for each edition they understand (and reject editions they
don't). Runtimes must be able to handle descriptors "from the future"; this
means only that upon encountering a descriptor with an edition or feature it
does not understand, the runtime must fall back to some reasonable behavior.
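
The split of responsibilities between `protoc`, its backends, and runtimes can be sketched as follows. Several pieces are hypothetical: the edition numbers, the `CppBackend` class and its default table, and the fallback value chosen by `runtime_string_type`. The sketch only illustrates the contrast. `protoc` and backends reject unknown editions outright, while a runtime degrades gracefully.

```python
from typing import Dict

# Hypothetical: the newest edition this build of protoc can parse.
PROTOC_MAX_EDITION = 2024

def protoc_check(edition: str) -> None:
    """Reject .proto files "from the future", which protoc cannot parse."""
    if int(edition) > PROTOC_MAX_EDITION:
        raise ValueError(f"edition {edition} is newer than this protoc supports")

class CppBackend:
    """Hypothetical backend advertising defaults for its language-scoped
    features, one entry per edition it understands."""
    DEFAULTS: Dict[int, Dict[str, str]] = {
        2023: {"(pb.cpp).string_type": "STRING"},
        2024: {"(pb.cpp).string_type": "VIEW"},
    }

    def defaults(self, edition: str) -> Dict[str, str]:
        if int(edition) not in self.DEFAULTS:
            raise ValueError(f"backend does not understand edition {edition}")
        return self.DEFAULTS[int(edition)]

KNOWN_STRING_TYPES = {"STRING", "VIEW", "CORD"}

def runtime_string_type(resolved: Dict[str, str]) -> str:
    """Runtimes must tolerate descriptors from the future: a missing or
    unrecognized value falls back to a behavior this runtime supports
    (STRING is an arbitrary choice for this sketch)."""
    value = resolved.get("(pb.cpp).string_type", "STRING")
    return value if value in KNOWN_STRING_TYPES else "STRING"

protoc_check("2024")  # accepted; protoc_check("2026") would raise ValueError
print(CppBackend().defaults("2024"))  # {'(pb.cpp).string_type': 'VIEW'}
print(runtime_string_type({"(pb.cpp).string_type": "SOME_FUTURE_VALUE"}))  # STRING
```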

### What is an Edition used for?

Editions provide the fundamental increments for the lifecycle of a feature. At
this point it is important to reiterate that most features will be specific to
particular code generators. What follows is an example lifecycle for a
hypothetical feature, `features.(pb.cpp).opaque_repeated_fields`.

1.  Edition "2025" creates `features.(pb.cpp).opaque_repeated_fields` with a
    default value of `false`. This value is equivalent to the behavior of
    editions before "2025".

    a. The migration to edition "2025" across Google will move very fast, as it
    is a no-op.

2.  Migration begins for `features.(pb.cpp).opaque_repeated_fields` (each change
    in this migration will add `features.(pb.cpp).opaque_repeated_fields = true`
    and be paired with the required changes to C++ code). We do not anticipate
    that protos shared between repos will undergo field-by-field migrations like
    this, as that would cause a large stream of breaking changes; see
    [Protobuf Editions for schema producers](protobuf-editions-for-schema-producers.md)
    for more details.

3.  Edition "2027" switches the default of
    `features.(pb.cpp).opaque_repeated_fields` to `true`.

    a. The migration to "2027" will remove explicit uses of
    `features.(pb.cpp).opaque_repeated_fields = true` and add explicit uses of
    `features.(pb.cpp).opaque_repeated_fields = false` where the value was
    implicit before. As above, this migration will be a no-op, so it will move
    very fast.

    b. Externally, we will release tools and migration guides for OSS customers.
    The tools will not be fully turnkey, but should provide a strong starting
    point for user migrations.

4.  Migration continues for `features.(pb.cpp).opaque_repeated_fields` (each
    change in this migration will remove
    `features.(pb.cpp).opaque_repeated_fields = false` and be paired with the
    required changes to C++ code).

5.  At some point, usage will be officially roped off, both internally and
    externally.

    a. Internally, `features.(pb.cpp).opaque_repeated_fields` usage will be
    blocked with allowlists while we remove the hardest-to-migrate cases.

    b. Externally, `features.(pb.cpp).opaque_repeated_fields` will be marked
    deprecated in a public edition and removed in a later one. When a feature is
    removed, the code generators for that behavior and the runtime libraries
    that support it may also be removed. In this hypothetical, the feature might
    be deprecated in "2029" and removed in "2031". Any release that removes
    support for a feature would be a major version bump.

The key point to note here is that any `.proto` file that does not use
deprecated features has a no-op upgrade from one edition to the next, and we
will provide tools to effect that upgrade. Internal users will be migrated
centrally before a feature is deprecated. External users will have the full
window of the Google migration as well as the deprecation window to upgrade
their own code.
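
The no-op upgrade described above (step 3a of the lifecycle) can be sketched for a single feature. The sketch makes a simplifying assumption: each field either sets the feature explicitly or inherits the edition default directly, with no file- or message-level settings in between. `bump_edition` and `effective` are hypothetical helpers, not part of any Protobuf tooling.

```python
from typing import Dict, Optional

Settings = Dict[str, Optional[bool]]  # field name -> explicit value, or None if unset

def effective(value: Optional[bool], default: bool) -> bool:
    """The behavior a field gets: its explicit setting, else the edition default."""
    return default if value is None else value

def bump_edition(settings: Settings, old_default: bool, new_default: bool) -> Settings:
    """Rewrite one feature's per-field settings so an edition bump is a no-op.

    Explicit values that now match the new default are dropped; fields that
    relied on the old default get it pinned explicitly."""
    upgraded: Settings = {}
    for name, value in settings.items():
        behavior = effective(value, old_default)
        upgraded[name] = None if behavior == new_default else behavior
    return upgraded

# "2025" -> "2027" for opaque_repeated_fields (default False -> True).
before = {"ids": True, "tags": None, "names": None}  # "ids" already migrated
after = bump_edition(before, old_default=False, new_default=True)
print(after)  # {'ids': None, 'tags': False, 'names': False}
assert all(effective(before[f], False) == effective(after[f], True) for f in before)
```

The final assertion states the property that makes the migration safe to automate: every field resolves to the same behavior before and after the bump.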

It is also important to note that external users will not receive compiler
warnings until the feature is actually deprecated, so the deprecation period
ensures they have time to update their code before an edition update forces
them to.

Separately from feature evolution, `protoc` itself may remove support for old
editions entirely after a suitably long window (e.g., 10 years).

## Edition Zero

The first edition of Protobuf Editions, the so-called "edition zero", will
effectively be a "`proto4`" that introduces the new syntax and merges the
semantics of `proto2` and `proto3`. In editions mode, everything that was
possible in `proto2` and `proto3` will be possible, and the handful of
irreconcilable differences will be expressed as features.

For example, whether an enum value not declared in the `.proto` file goes into
unknown fields (a so-called closed enum) or is kept as an enum value outside the
declared range (an open enum) will be controlled by `features.enum = CLOSED` or
`features.enum = OPEN`.
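
The parsing difference between open and closed enums can be sketched as below. The `Color` enum, field number 3, the `Parsed` class, and `store_color` are all hypothetical. Real runtimes also deal with wire types, repeated fields, and field ordering, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical schema: enum Color { COLOR_UNSPECIFIED = 0; RED = 1; GREEN = 2; }
# used by field number 3 of some message.
DECLARED_COLORS = {0, 1, 2}
COLOR_FIELD = 3

@dataclass
class Parsed:
    color: Optional[int] = None
    unknown_fields: List[Tuple[int, int]] = field(default_factory=list)

def store_color(msg: Parsed, wire_value: int, enum_feature: str) -> None:
    """Store a decoded enum value according to the open/closed enum feature."""
    if wire_value in DECLARED_COLORS or enum_feature == "OPEN":
        msg.color = wire_value  # open enums keep undeclared values as-is
    else:
        # Closed enums preserve undeclared values as unknown fields instead.
        msg.unknown_fields.append((COLOR_FIELD, wire_value))

open_msg, closed_msg = Parsed(), Parsed()
store_color(open_msg, 7, "OPEN")
store_color(closed_msg, 7, "CLOSED")
print(open_msg)    # Parsed(color=7, unknown_fields=[])
print(closed_msg)  # Parsed(color=None, unknown_fields=[(3, 7)])
```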

Edition Zero should be viewed as the "completion" of the union of `proto2` and
`proto3`: it contains both syntaxes as subsets (although with different
spellings to disambiguate things), as well as new behavior that was previously
inexpressible but is an obvious consequence of allowing everything from both.
For example, `proto3`-style non-optional singular fields could allow non-zero
defaults.

Edition Zero is designed so that we can mechanically migrate an arbitrary
`.proto` file from either `proto2` or `proto3` with no behavioral changes by
replacing `syntax` with `edition` and adding features in the appropriate
locations.
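
The mechanical migration could look roughly like this for the file header and for singular field labels. The following are assumptions made for illustration:

*   the feature name `field_presence` and its values (`EXPLICIT`, `IMPLICIT`, `LEGACY_REQUIRED`);
*   the Edition Zero defaults;
*   both helper functions.

The sketch also ignores repeated and message-typed fields, `group`s, and every other difference between the two syntaxes.

```python
from typing import List, Optional

# Hypothetical values of the features on which proto2 and proto3 disagree.
SYNTAX_BEHAVIOR = {
    "proto2": {"enum": "CLOSED", "field_presence": "EXPLICIT"},
    "proto3": {"enum": "OPEN", "field_presence": "IMPLICIT"},
}
# Hypothetical Edition Zero defaults.
EDITION_ZERO_DEFAULTS = {"enum": "OPEN", "field_presence": "EXPLICIT"}

def migrate_header(syntax: str, edition: str) -> List[str]:
    """Replace `syntax = ...;` with `edition = ...;` plus the file-level
    features needed to preserve the old syntax's behavior."""
    lines = [f'edition = "{edition}";']
    for name, value in sorted(SYNTAX_BEHAVIOR[syntax].items()):
        if value != EDITION_ZERO_DEFAULTS[name]:
            lines.append(f"option features.{name} = {value};")
    return lines

def migrate_label(syntax: str, label: Optional[str]) -> Optional[str]:
    """Return the field-level `field_presence` override (if any) that replaces
    a singular field's label; the label itself is dropped."""
    if syntax == "proto2" and label == "required":
        return "LEGACY_REQUIRED"
    if syntax == "proto3" and label == "optional":
        return "EXPLICIT"  # overrides the file-level IMPLICIT setting
    return None  # inherits the file-level behavior

print(migrate_header("proto2", "2022"))
# ['edition = "2022";', 'option features.enum = CLOSED;']
print(migrate_header("proto3", "2022"))
# ['edition = "2022";', 'option features.field_presence = IMPLICIT;']
```

Only features whose old behavior differs from the new default are written out, which keeps the migration diff small.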

This will form the foundation of Protobuf Editions and the torrent of parallel
migrations that will follow.

## FAQ

### I only interact with protos by moving them around and editing schemata. How does this affect me?

This will manifest as a handful of new `option`s appearing at the top of your
files. Going forward, expect new `option`s to appear and disappear from your
`.proto` files as LSCs march across the codebase. We intend to minimize
disruption, and you should be able to safely ignore them.

In general, you should not need to add `option`s yourself unless our
documentation tells you to. We will try to make sure tooling recommends the
latest edition when creating new files.

### Are you taking away <thing>?

Everything expressible today will remain so in Edition Zero. Some syntax will
change: there will be only one way of spelling a singular field (with `optional`
vs. the `proto3` behavior vs. `required` controlled by a feature), and `group`s
will turn into submessage fields with a special encoding.

### I think <thing> from proto{2,3} is bad. Why are you letting people use it in my files?

Long-term bifurcation of the language has done significant damage to the
ecosystem and to engineers' mental model of Protobuf. There are features we
think are questionable, too, and we want to remove them. But we need to break
some eggs to make an omelet.

As stewards of the Protobuf language, we believe this is the best way to get rid
of features that seemed like a good idea at the time but that history has shown
to have poor outcomes.

### I manipulate protos reflectively, or have some other complicated use-case

We plan to upgrade reflection to be feature-aware in a way that minimizes the
code that needs to change. We do not expect anyone to implement
feature-inheritance logic themselves; feature inheritance should be fully
transparent to users, behaving as if features had been placed explicitly
everywhere. (Owners of code generators should be the only ones who need to know
how to correctly propagate features.)

We will be partnering with use-cases that are known risks for migration, such as
storage providers, to minimize toil and disruption on all sides.

### I want to use features to fix a defect in Protobuf

Generally, the owner of the relevant component that ingests a particular feature
(`protoc` or the appropriate language backend) will own it. We will try to make
adding a language-scoped feature as straightforward as we can, but getting it
into an edition may require some coordination with us.

Even if it's about one of protobuf-team's backends, we'd love to hear what you
think we can fix, within the constraints of editions.

### What's your OSS strategy?

We want to share a variant of this document with the OSS community. We plan to
publish migration guides and, where feasible, any migration tooling, such as the
`proto2`/`proto3` -> `edition` migrator.

As stated above, we want to minimize friction for backends not owned by
protobuf-team, and this ties into helping third-party code generators minimize
their pain.

### I like Protobuf as it is. Can I keep my old files?

Yes, but you get to keep both pieces. Failing to migrate off of old use-cases
and onto newer versions that fix known defects is a risk for the entire
ecosystem: C++'s disastrous standardization process is a solemn warning of the
cost of failing to do so.

Files that stay on `proto2` or `proto3` will eventually cease to be supported,
and old editions (e.g., after 5 years) will also cease to be supported.
Evolution is at the heart of Protobuf, and we want to make it as easy as
possible for users to keep up with our progress towards a better Protobuf.

### What do you hope to use editions to change in the short/mid term?

An incomplete list of *ideas*, which should be taken as non-committal:

*   Eliminate `required` completely by making a particular field be optional but
    serialized unconditionally.
*   Make all uses of `string` require UTF-8 checking, and make all uses that
    don't want/need it `bytes`, fulfilling the original `proto3` vision.
*   Make every `string` and `bytes` accessor in C++ return `absl::string_view`,
    unlocking performance optimizations.
*   Make all scalar `repeated` fields `packed`, improving throughput.
*   Make `enum` enumerators in C++ use `kName` instead of `NAME`.
*   Make `enum` declarations in C++ into scoped `enum class`.
*   Make `ctype` into a language-scoped feature.
*   Replace per-language, file-level options with language-scoped features.
*   Make reflection opt-in for some languages (C++).