Writing a schema    {#flatbuffers_guide_writing_schema}
================

The syntax of the schema language (aka IDL, [Interface Definition Language][])
should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` & `Pickup` not defined as part of this example).

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to `0` /
`NULL`).

Each field is optional: it does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forward
and backwards compatibility. Note that:

-   You can add new fields in the schema ONLY at the end of a table
    definition. Older data will still
    read correctly, and give you the default value when read. Older code
    will simply ignore the new field.
    If you want to have flexibility to use any order for fields in your
    schema, you can manually assign ids (much like Protocol Buffers),
    see the `id` attribute below.

-   You cannot delete fields you don't use anymore from the schema,
    but you can simply
    stop writing them into your data for almost the same effect.
    Additionally you can mark them as `deprecated` as in the example
    above, which will prevent the generation of accessors in the
    generated C++, as a way to enforce the field not being used any more
    (careful: this may break code!).

-   You may change field names and table names, if you're ok with your
    code breaking until you've renamed them there too.

See "Schema evolution examples" below for more on this
topic.

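The compatibility rules above can be illustrated with a toy model. The following is a minimal Python sketch, not the FlatBuffers wire format or API: serialized data is modeled as a dict from field index to value, a schema as an ordered list of `(name, default)` pairs, and `read_field`, `SCHEMA_V1`, `SCHEMA_V2` and the `armor` field are hypothetical names introduced only for this illustration.

```python
# Toy model of table field lookup (NOT the real FlatBuffers encoding).
# A schema is an ordered list of (name, default); a field's position is its id.
# Serialized data is a dict {field_id: value}; absent ids were never written.

SCHEMA_V1 = [("hp", 100), ("mana", 150)]
SCHEMA_V2 = SCHEMA_V1 + [("armor", 0)]  # new field appended at the end


def read_field(schema, data, name):
    """Return the stored value, or the schema default if the field is absent."""
    for field_id, (field_name, default) in enumerate(schema):
        if field_name == name:
            return data.get(field_id, default)
    raise KeyError(name)


old_data = {0: 80}          # written with SCHEMA_V1; mana was left out
new_data = {0: 80, 2: 25}   # written with SCHEMA_V2

# New code reading old data falls back to the default for the new field.
armor_from_old_data = read_field(SCHEMA_V2, old_data, "armor")
# Old code reading new data never looks at id 2, so the new field is ignored.
hp_from_new_data = read_field(SCHEMA_V1, new_data, "hp")
```

Because lookups are by position, appending is the only change that keeps every existing position stable, which is why new fields go at the end unless explicit ids are used.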
### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear in the example `Vec3`). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).

### Types

Built-in scalar types are:

-   8 bit: `byte` (`int8`), `ubyte` (`uint8`), `bool`

-   16 bit: `short` (`int16`), `ushort` (`uint16`)

-   32 bit: `int` (`int32`), `uint` (`uint32`), `float` (`float32`)

-   64 bit: `long` (`int64`), `ulong` (`uint64`), `double` (`float64`)

The type names in parentheses are alias names such that for example
`uint8` can be used in place of `ubyte`, and `int32` can be used in
place of `int` without affecting code generation.

Built-in non-scalar types:

-   Vector of any other type (denoted with `[type]`). Nesting vectors
    is not supported; instead you can wrap the inner vector in a table.

-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
    or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

-   References to other tables or structs, enums or unions (see
    below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

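The `uint` to `int` case can be checked with a short Python sketch. It uses the standard `struct` module to reinterpret the same four bytes, which stands in for the `reinterpret_cast` mentioned above; the helper name `reinterpret_u32_as_i32` is made up for this example.

```python
import struct


def reinterpret_u32_as_i32(value):
    """Reinterpret the 4 bytes of an unsigned 32-bit value as a signed one."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


# Values that do not use the high bit read back unchanged ...
small = reinterpret_u32_as_i32(12345)
# ... but once the high bit is set, the signed reading becomes negative.
large = reinterpret_u32_as_i32(0x80000000)
```

So the type change is only safe while every stored value stays below `0x80000000`.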
### (Default) Values

Values are a sequence of digits. Values may be optionally followed by a decimal
point (`.`) and more digits, for float constants, or optionally prefixed by
a `-`. Floats may also be in scientific notation, optionally ending with an `e`
or `E`, followed by a `+` or `-` and more digits.

Only scalar values can have defaults; non-scalar (string/vector/table) fields
default to `NULL` when not present.

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data (see also Gotchas below) but are generated in code,
so when you change the default, you'd
now get a different value than from code generated from an older version of
the schema. There are situations, however, where this may be
desirable, especially if you can ensure a simultaneous rebuild of
all code.

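Why a changed default alters old data can be seen in a toy model. This Python sketch is not the real encoding: `write` and `read` are hypothetical helpers, and the later default of `120` is an invented edit used only to show the effect.

```python
# Toy model (NOT the real encoding): fields equal to the writer's default are
# skipped on write and filled in from the reader's schema on read.

def write(defaults, values):
    """Keep only the values that differ from the writer's defaults."""
    return {name: v for name, v in values.items() if v != defaults[name]}


def read(defaults, stored, name):
    """Fall back to the reader's default when the field was not stored."""
    return stored.get(name, defaults[name])


old_defaults = {"hp": 100}
new_defaults = {"hp": 120}   # hypothetical later edit of the schema

stored = write(old_defaults, {"hp": 100})        # nothing is stored for hp
seen_by_old_code = read(old_defaults, stored, "hp")
seen_by_new_code = read(new_defaults, stored, "hp")
```

The same bytes yield `100` for code built against the old schema and `120` for code built against the new one, which is the mismatch the paragraph above warns about.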
### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

Only integer types are allowed, i.e. `byte`, `ubyte`, `short`, `ushort`, `int`,
`uint`, `long` and `ulong`.

Typically, enum values should only ever be added, never removed (there is no
deprecation for enums). This requires code to handle forwards compatibility
itself, by handling unknown enum values.

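The numbering rule and the need to tolerate unknown values can be sketched in Python. This is an illustration of the rule as described above, not code from `flatc`; `assign_enum_values` and `color_name` are hypothetical helpers.

```python
def assign_enum_values(entries):
    """entries: list of (name, explicit_value_or_None) in declaration order."""
    values = {}
    next_value = 0  # the default first value
    for name, explicit in entries:
        if explicit is not None:
            next_value = explicit
        values[name] = next_value
        next_value += 1
    return values


# enum Color : byte { Red = 1, Green, Blue }
COLOR = assign_enum_values([("Red", 1), ("Green", None), ("Blue", None)])


def color_name(value):
    """Map a stored value to a name, tolerating values added by newer schemas."""
    for name, known in COLOR.items():
        if known == value:
            return name
    return "Unknown(%d)" % value
```

Reading code that was built before a value was added has to take the fallback branch rather than assume every stored value is one it knows.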
### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field, which can hold a reference to any of those types, and
additionally a field with the suffix `_type` is generated that holds
the corresponding enum value, allowing you to know which type to cast
to at runtime.

It's possible to give an alias name to a type in a union. This way a type can
even be used to mean different things depending on the name used:

    table PointPosition { x:uint; y:uint; }
    table MarkerPosition {}
    union Position {
      Start:MarkerPosition,
      Point:PointPosition,
      Finish:MarkerPosition
    }

Unions contain a special `NONE` marker to denote that no value is stored, so
that name cannot be used as an alias.

Unions are a good way to be able to send multiple message types as a FlatBuffer.
Note that because a union field is really two fields, it must always be
part of a table; it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more
open-ended way, for example for use as files, see the file identification
feature below.

There is experimental support, in C++ only, for a vector of unions
(and types). In the example IDL file above, use `[Any]` to add a
vector of `Any` to the `Monster` table.

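How the companion `_type` value drives interpretation of a union field can be sketched for the `Position` union above. This Python model is only an illustration: the concrete tag numbers (0 for `NONE`, then counting up in declaration order) are an assumption of the sketch, and `describe` is a hypothetical helper rather than generated code.

```python
# Toy model of the `Position` union. Tag values are assumed for illustration:
# 0 is reserved for NONE, then the aliases count up in declaration order.
POSITION_TAGS = {"NONE": 0, "Start": 1, "Point": 2, "Finish": 3}


def describe(position_type, position):
    """Use the companion `_type` value to decide how to interpret `position`."""
    if position_type == POSITION_TAGS["NONE"]:
        return "no position stored"
    if position_type == POSITION_TAGS["Point"]:
        return "point at (%d, %d)" % (position["x"], position["y"])
    if position_type == POSITION_TAGS["Start"]:
        return "start marker"
    if position_type == POSITION_TAGS["Finish"]:
        return "finish marker"
    return "unknown position type %d" % position_type
```

`Start` and `Finish` both refer to a `MarkerPosition` table, so only the tag tells them apart.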
### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (those you still generate separately).

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to be able to have a "magic number" in there, like most file formats
have, to be able to do a sanity check to see if you're reading the
kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.

Note that this is best for open-ended uses such as files. If you simply wanted
to send one of a set of possible messages over a network for example, you'd
be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

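The identifier check described above amounts to comparing four bytes. The following Python sketch shows the idea on a hand-built byte string; it is not the generated `MonsterBufferHasIdentifier` call, and `has_identifier` and `fake_buffer` are hypothetical names for this example.

```python
import struct


def has_identifier(buf, identifier):
    """Check the 4 identifier bytes at offsets 4-7 of a buffer."""
    if len(identifier) != 4:
        raise ValueError("identifiers must be exactly 4 characters long")
    return len(buf) >= 8 and buf[4:8] == identifier.encode("ascii")


# Hand-built stand-in for a buffer: a 4-byte little-endian value in bytes 0-3
# (a placeholder here), then the identifier, then padding for the remaining data.
fake_buffer = struct.pack("<I", 12) + b"MYFI" + b"\x00" * 8

matches = has_identifier(fake_buffer, "MYFI")
```

A buffer written without an identifier simply has other data at those offsets, so the check is a sanity test rather than proof of the schema.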
### RPC interface declarations

You can declare RPC calls in a schema, that define a set of functions
that take a FlatBuffer as an argument (the request) and return a FlatBuffer
as the response (both of which must be table types):

    rpc_service MonsterStorage {
      Store(Monster):StoreResponse;
      Retrieve(MonsterId):Monster;
    }

What code this produces and how it is used depends on the language and RPC
system used. There is preliminary support for GRPC through the `--grpc` code
generator; see `grpc/tests` for an example.

### Comments & documentation

Comments may be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler;
user defined ones need to be declared with the attribute declaration
(like `priority` in the example above), and are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

-   `id: n` (on a table field): manually set the field identifier to `n`.
    If you use this attribute, you must use it on ALL fields of this table,
    and the numbers must be a contiguous range from 0 onwards.
    Additionally, since a union type effectively adds two fields, its
    id must be that of the second field (the first field is the type
    field and not explicitly declared in the schema).
    For example, if the last field before the union field had id 6,
    the union field should have id 8, and the union's type field will
    implicitly be 7.
    IDs allow the fields to be placed in any order in the schema.
    When a new field is added to the schema it must use the next available ID.
-   `deprecated` (on a field): do not generate accessors for this field
    anymore, code should stop using this data. Old data may still contain this
    field, but it won't be accessible anymore by newer code. Note that if you
    deprecate a field that was previously required, old code may fail to
    validate new data (when using the optional verifier).
-   `required` (on a non-scalar table field): this field must always be set.
    By default, all fields are optional, i.e. may be left out. This is
    desirable, as it helps with forwards/backwards compatibility, and
    flexibility of data structures. It is also a burden on the reading code,
    since for non-scalar fields it requires you to check against NULL and
    take appropriate action. By specifying this attribute, you force code that
    constructs FlatBuffers to ensure this field is initialized, so the reading
    code may access it directly, without checking for NULL. If the constructing
    code does not initialize this field, it will trigger an assert, and also
    the verifier will fail on buffers that have missing required fields. Note
    that if you add this attribute to an existing field, this will only be
    valid if existing data always contains this field / existing code always
    writes this field.
-   `force_align: size` (on a struct): force the alignment of this struct
    to be something higher than what it is naturally aligned to. Causes
    these structs to be aligned to that amount inside a buffer, IF that
    buffer is allocated with that alignment (which is not necessarily
    the case for buffers accessed directly inside a `FlatBufferBuilder`).
    Note: currently not guaranteed to have an effect when used with
    `--object-api`, since that may allocate objects at alignments less than
    what you specify with `force_align`.
-   `bit_flags` (on an unsigned enum): the values of this field indicate bits,
    meaning that any unsigned value N specified in the schema will end up
    representing 1<<N, or if you don't specify values at all, you'll get
    the sequence 1, 2, 4, 8, ...
-   `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flatbuffer data, for which the
    root type is given by `table_name`. The generated code will then produce
    a convenient accessor for the nested FlatBuffer.
-   `flexbuffer` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flexbuffer data. The generated
    code will then produce a convenient accessor for the FlexBuffer root.
-   `key` (on a field): this field is meant to be used as a key when sorting
    a vector of the type of table it sits in. Can be used for in-place
    binary search.
-   `hash` (on a field): this is an (un)signed 32/64 bit integer field, whose
    value during JSON parsing is allowed to be a string, which will then be
    stored as its hash. The value of the attribute is the hashing algorithm to
    use, one of `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`.
-   `original_order` (on a table): since elements in a table do not need
    to be stored in any particular order, they are often optimized for
    space by sorting them by size. This attribute stops that from happening.
    There should generally not be any reason to use this flag.
-   `native_*`: several attributes have been added to support the [C++ object
    Based API](@ref flatbuffers_cpp_object_based_api). All such attributes
    are prefixed with the term "native_".


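Three of the attribute rules above are simple enough to sketch in Python. These helpers are illustrations, not `flatc` code: `bit_flag_values`, `fnv1a_32` and `check_field_ids` are hypothetical names. The hash function is the standard 32-bit FNV-1a algorithm; that the `fnv1a_32` attribute produces exactly these values for a given JSON string is assumed here rather than verified.

```python
def bit_flag_values(entries):
    """entries: list of (name, N_or_None). With `bit_flags`, N stands for 1 << N."""
    values = {}
    next_bit = 0
    for name, bit in entries:
        if bit is not None:
            next_bit = bit
        values[name] = 1 << next_bit
        next_bit += 1
    return values


def fnv1a_32(text):
    """Standard 32-bit FNV-1a over the UTF-8 bytes of `text`."""
    h = 0x811C9DC5
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h


def check_field_ids(fields):
    """fields: list of (name, id, is_union). Returns the set of all used ids.

    A union field with id n also claims n - 1 for its implicit type field.
    Raises ValueError unless the ids form a contiguous range from 0.
    """
    used = []
    for _name, field_id, is_union in fields:
        if is_union:
            used.append(field_id - 1)
        used.append(field_id)
    if sorted(used) != list(range(len(used))):
        raise ValueError("ids must be a contiguous range from 0: %r" % sorted(used))
    return set(used)


# Three flags without explicit values give the sequence 1, 2, 4.
flags = bit_flag_values([("A", None), ("B", None), ("C", None)])
# A scalar with id 0 followed by a union must give the union id 2 (type is 1).
ids = check_field_ids([("hp", 0, False), ("test", 2, True)])
```

Giving the union in the last line id 1 instead would make its implicit type field collide with `hp`, and `check_field_ids` would raise.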
## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

-   It accepts field names with and without quotes, like many JSON parsers
    already do. It outputs them without quotes as well, though can be made
    to output them using the `strict_json` flag.
-   If a field has an enum type, the parser will recognize symbolic enum
    values (with or without quotes) instead of numbers, e.g.
    `field: EnumVal`. If a field is of integral type, you can still use
    symbolic names, but values need to be prefixed with their type and
    need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
    representing flags, you may place multiple inside a string
    separated by spaces to OR them, e.g.
    `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.
-   Similarly, unions need to be specified with two fields, much like
    you do when serializing from code. E.g. for a field `foo`, you must
    add a field `foo_type: FooOne` right before the `foo` field, where
    `FooOne` would be the table out of the union you want to use.
-   A field that has the value `null` (e.g. `field: null`) is intended to
    have the default value for that field (thus has the same effect as if
    that field wasn't specified at all).
-   It has some built in conversion functions, so you can write for example
    `rad(180)` wherever you'd normally write `3.14159`.
    Currently supports the following functions: `rad`, `deg`, `cos`, `sin`,
    `tan`, `acos`, `asin`, `atan`.

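The `rad(180)` example from the list above can be reproduced with a small Python sketch. It covers only `rad` and `deg`, assumes `deg` is the inverse of `rad`, and leaves out the trigonometric functions; `CONVERSIONS` and `apply_conversion` are hypothetical names, not part of the parser.

```python
import math

# Hypothetical lookup table for two of the listed function names.
CONVERSIONS = {
    "rad": math.radians,   # degrees -> radians, as in rad(180) ~ 3.14159
    "deg": math.degrees,   # assumed to be the inverse of rad
}


def apply_conversion(expr):
    """Evaluate a single call such as "rad(180)"."""
    name, paren, rest = expr.partition("(")
    if name not in CONVERSIONS or not paren or not rest.endswith(")"):
        raise ValueError("unsupported expression: %r" % expr)
    return CONVERSIONS[name](float(rest[:-1]))


half_turn = apply_conversion("rad(180)")
```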
When parsing JSON, it recognizes the following escape codes in strings:

-   `\n` - linefeed.
-   `\t` - tab.
-   `\r` - carriage return.
-   `\b` - backspace.
-   `\f` - form feed.
-   `\"` - double quote.
-   `\\` - backslash.
-   `\/` - forward slash.
-   `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8
    representation.
-   `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
     not in the JSON spec (see http://json.org/), but is needed to be able to
     encode arbitrary binary in strings to text and back without losing
     information (e.g. the byte 0xFF can't be represented in standard JSON).

It also generates these escape codes back again when generating JSON from a
binary representation.

When parsing numbers, the parser is more flexible than JSON.
The format of numeric literals is closer to that of C/C++.
According to the [grammar](@ref flatbuffers_grammar), it accepts the following
numerical literals:

-   An integer literal can have any number of leading zero `0` digits.
    Unlike C/C++, the parser ignores a leading zero, not interpreting it as the
    beginning of an octal number.
    The numbers `[081, -00094]` are equal to the decimal integers `[81, -94]`.
-   The parser accepts unsigned and signed hexadecimal integer numbers.
    For example: `[0x123, +0x45, -0x67]` are equal to `[291, 69, -103]` decimals.
-   The format of floating-point numbers is fully compatible with the C/C++
    format. If a modern C++ compiler is used, the parser accepts hexadecimal and
    special floating-point literals as well:
    `[-1.0, 2., .3e0, 3.e4, 0x21.34p-5, -inf, nan]`.
    The exponent suffix of a hexadecimal floating-point number is mandatory.

    Extended floating-point support was tested with:
    - x64 Windows: `MSVC2015` and higher.
    - x64 Linux: `LLVM 6.0`, `GCC 4.9` and higher.

-   For compatibility with JSON lint tools, all numeric literals of scalar
    fields can be wrapped in quoted strings:
    `"1", "2.0", "0x48A", "0x0C.0Ep-1", "-inf", "true"`.

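The literal rules above can be mirrored in a few lines of Python. This is a sketch of the described behaviour, not the parser's implementation; `parse_int_literal` and `parse_float_literal` are hypothetical helpers built on Python's own `int`, `float` and `float.fromhex`.

```python
def parse_int_literal(text):
    """Parse a decimal or hexadecimal integer literal with an optional sign.

    Leading zeros are ignored rather than treated as an octal prefix.
    """
    sign = 1
    body = text
    if body and body[0] in "+-":
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if body[:2].lower() == "0x":
        return sign * int(body[2:], 16)
    return sign * int(body, 10)


def parse_float_literal(text):
    """Parse decimal, hexadecimal (with exponent) and special float literals."""
    if "0x" in text.lower():
        return float.fromhex(text)
    return float(text)


decimals = [parse_int_literal(s) for s in ["081", "-00094"]]
hexes = [parse_int_literal(s) for s in ["0x123", "+0x45", "-0x67"]]
```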
## Guidelines

### Efficiency

FlatBuffers is all about efficiency, but to realize that efficiency you
require an efficient schema. There are usually multiple choices on
how to represent data that have vastly different size characteristics.

It is very common nowadays to represent any kind of data as dictionaries
(as in e.g. JSON), because of their flexibility and extensibility. While
it is possible to emulate this in FlatBuffers (as a vector
of tables with key and value(s)), this is a bad match for a strongly
typed system like FlatBuffers, leading to relatively large binaries.
FlatBuffer tables are more flexible than classes/structs in most systems,
since having a large number of fields, only a few of which are actually
used, is still efficient. You should thus try to organize your data
as much as possible such that you can use tables where you might be
tempted to use a dictionary.

Similarly, strings as values should only be used when they are
truly open-ended. If you can, always use an enum instead.

FlatBuffers doesn't have inheritance, so the way to represent a set
of related data structures is a union. Unions do have a cost however,
so an alternative to a union is to have a single table that has
all the fields of all the data structures you are trying to
represent, if they are relatively similar / share many fields.
Again, this is efficient because optional fields are cheap.

FlatBuffers supports the full range of integer sizes, so try to pick
the smallest size needed, rather than defaulting to int/long.

Remember that you can share data (refer to the same string/table
within a buffer), so factoring out repeating data into its own
data structure may be worth it.

### Style guide

Identifiers in a schema are meant to translate to many different programming
languages, so using the style of your "main" language is generally a bad idea.

For this reason, below is a suggested style guide to adhere to, to keep schemas
consistent for interoperation regardless of the target language.

Where possible, the code generators for specific languages will generate
identifiers that adhere to the language style, based on the schema identifiers.

- Table, struct, enum and rpc names (types): UpperCamelCase.
- Table and struct field names: snake_case. This is translated to lowerCamelCase
  automatically for some languages, e.g. Java.
- Enum values: UpperCamelCase.
- Namespaces: UpperCamelCase.

Formatting (this is less important, but still worth adhering to):

- Opening brace: on the same line as the start of the declaration.
- Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.

For an example, see the schema at the top of this file.

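The snake_case to lowerCamelCase translation mentioned in the list above can be sketched as follows. This is a simplified Python illustration, not the code generators' actual logic, and `to_lower_camel` is a hypothetical helper.

```python
def to_lower_camel(snake):
    """Convert a snake_case field name to lowerCamelCase."""
    head, *rest = snake.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


java_style_name = to_lower_camel("inventory_size")
```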
## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and earlier
declarations to not be removed, but be marked deprecated when needed. We think
this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic however is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.

### Schema evolution examples

Some examples to clarify what happens as you change a schema:

If we have the following original schema:

    table { a:int; b:int; }

And we extend it:

    table { a:int; b:int; c:int; }

This is ok. Code compiled with the old schema reading data generated with the
new one will simply ignore the presence of the new field. Code compiled with the
new schema reading old data will get the default value for `c` (which is 0
in this case, since it is not specified).

    table { a:int (deprecated); b:int; }

This is also ok. Code compiled with the old schema reading newer data will now
always get the default value for `a` since it is not present. Code compiled
with the new schema now cannot read nor write `a` anymore (any existing code
that tries to do so will result in compile errors), but can still read
old data (it will ignore the field).

    table { c:int; a:int; b:int; }

This is NOT ok, as this makes the schemas incompatible. Old code reading newer
data will interpret `c` as if it was `a`, and new code reading old data
accessing `a` will instead receive `b`.

    table { c:int (id: 2); a:int (id: 0); b:int (id: 1); }

This is ok. If your intent was to order/group fields in a way that makes sense
semantically, you can do so using explicit id assignment. Now we are compatible
with the original schema, and the fields can be ordered in any way, as long as
we keep the sequence of ids.

    table { b:int; }

NOT ok. We can only remove a field by deprecation, regardless of whether we use
explicit ids or not.

    table { a:uint; b:uint; }

This is MAYBE ok, and only in the case where the type change is the same size,
like here. If old data never contained any negative numbers, this will be
safe to do.

    table { a:int = 1; b:int = 2; }

Generally NOT ok. Any older data written that had 0 values were not written to
the buffer, and rely on the default value to be recreated. These will now have
those values appear to be `1` and `2` instead. There may be cases in which this
is ok, but care must be taken.

    table { aa:int; bb:int; }

Occasionally ok. You've renamed fields, which will break all code (and JSON
files!) that use this schema, but as long as the change is obvious, this is not
incompatible with the actual binary buffers, since those only ever address
fields by id/offset.
<br>

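The reordering cases above can be replayed in a toy model. This Python sketch is not the real encoding: data is a dict from field id to value, a schema is a list of `(name, explicit_id_or_None)`, and `field_ids` and `read` are hypothetical helpers. Unlike the real schema language, the sketch does not enforce that ids are given on all fields or none.

```python
# Toy model (NOT the real encoding). Without explicit ids, a field's id is its
# position in the declaration; missing fields read as the default 0.

def field_ids(schema):
    """Map each field name to its id."""
    return {name: (pos if explicit is None else explicit)
            for pos, (name, explicit) in enumerate(schema)}


def read(schema, data, name):
    """Look a field up by id in the stored data."""
    return data.get(field_ids(schema)[name], 0)


original = [("a", None), ("b", None)]
reordered = [("c", None), ("a", None), ("b", None)]   # NOT ok
with_ids = [("c", 2), ("a", 0), ("b", 1)]             # ok

old_data = {0: 10, 1: 20}   # a = 10, b = 20, written with the original schema

# Inserting `c` at the front shifts positions: reading `a` now returns b's value.
a_after_reorder = read(reordered, old_data, "a")
# Explicit ids keep `a` at id 0 regardless of declaration order.
a_with_ids = read(with_ids, old_data, "a")
```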
### Testing whether a field is present in a table

Most serialization formats (e.g. JSON or Protocol Buffers) make it very
explicit in the format whether a field is present in an object or not,
allowing you to use this as "extra" information.

In FlatBuffers, this also holds for everything except scalar values.

FlatBuffers by default will not write fields that are equal to the default
value (for scalars), sometimes resulting in a significant space savings.

However, this also means testing whether a field is "present" is somewhat
meaningless, since it does not tell you if the field was actually written by
calling `add_field` style calls, unless you're only interested in this
information for non-default values.

Some `FlatBufferBuilder` implementations have an option called `force_defaults`
that circumvents this behavior, and writes fields even if they are equal to
the default. You can then use `IsFieldPresent` to query this.

Another option that works in all languages is to wrap a scalar field in a
struct. This way it will return null if it is not present. The cool thing
is that structs don't take up any more space than the scalar they represent.

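The effect of `force_defaults` on presence checks can be shown with a toy builder. This Python sketch only models the behaviour described above; `build` and `is_field_present` are hypothetical stand-ins, not the `FlatBufferBuilder` or `IsFieldPresent` APIs.

```python
# Toy model (NOT the real builder): a scalar equal to its default is skipped
# unless `force_defaults` is set, so "present" only means "was stored".

def build(defaults, values, force_defaults=False):
    """Return the fields that would actually be written."""
    return {name: value for name, value in values.items()
            if force_defaults or value != defaults[name]}


def is_field_present(stored, name):
    """Report whether a field was written into the stored data."""
    return name in stored


defaults = {"hp": 100}
compact = build(defaults, {"hp": 100})
forced = build(defaults, {"hp": 100}, force_defaults=True)
```

Both calls set `hp` to the same value, yet only `forced` reports the field as present.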
[Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language