Writing a schema    {#flatbuffers_guide_writing_schema}
================

The syntax of the schema language (aka IDL, [Interface Definition Language][]) should look quite familiar to users of any of the C family of languages, and also to users of other IDLs. Let's look at an example first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` & `Pickup` are not defined as part of this example).

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist of a name (here `Monster`) and a list of fields. Each field has a name, a type, and optionally a default value. If the default value is not specified in the schema, it will be `0` for scalar types, or `null` for other types. Some languages support setting a scalar's default to `null`. This makes the scalar optional.

Fields do not have to appear in the wire representation, and you can choose to omit fields when constructing an object. You have the flexibility to add fields without fear of bloating your data. This design is also FlatBuffers' mechanism for forward and backwards compatibility. Note that:

-   You can add new fields in the schema ONLY at the end of a table definition. Older data will still read correctly, and give you the default value when read. Older code will simply ignore the new field. If you want to have flexibility to use any order for fields in your schema, you can manually assign ids (much like Protocol Buffers), see the `id` attribute below.

-   You cannot delete fields you don't use anymore from the schema, but you can simply stop writing them into your data for almost the same effect.
    Additionally you can mark them as `deprecated` as in the example above, which will prevent the generation of accessors in the generated C++, as a way to enforce the field not being used any more. (Careful: this may break code!)

-   You may change field names and table names, if you're ok with your code breaking until you've renamed them there too.

See "Schema evolution examples" below for more on this topic.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults either), and fields may not be added or be deprecated. Structs may only contain scalars or other structs. Use this for simple objects where you are very sure no changes will ever be made (as is quite clear in the `Vec3` example). Structs use less memory than tables and are even faster to access (they are always stored in-line in their parent object, and use no virtual table).

### Types

Built-in scalar types are:

-   8 bit: `byte` (`int8`), `ubyte` (`uint8`), `bool`

-   16 bit: `short` (`int16`), `ushort` (`uint16`)

-   32 bit: `int` (`int32`), `uint` (`uint32`), `float` (`float32`)

-   64 bit: `long` (`int64`), `ulong` (`uint64`), `double` (`float64`)

The type names in parentheses are aliases, so that for example `uint8` can be used in place of `ubyte`, and `int32` can be used in place of `int`, without affecting code generation.

Built-in non-scalar types:

-   Vector of any other type (denoted with `[type]`). Nesting vectors is not supported; instead you can wrap the inner vector in a table.

-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

-   References to other tables or structs, enums or unions (see below).

You can't change types of fields once they're used, with the exception of same-size data where a `reinterpret_cast` would give you a desirable result, e.g. you could change a `uint` to an `int` if no values in current data use the high bit yet.
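
To make the `uint` to `int` rule above concrete, here is a minimal, self-contained C++ sketch. It does not use the FlatBuffers API; `ReadAsInt32`, `SafeToReadAsInt` and `MigrateUintToInt` are hypothetical helper names introduced only for illustration. It reinterprets the four bytes of a stored `uint` as an `int` and shows that the value is preserved only while the high bit is clear.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the 4 stored bytes of a `uint` field as an `int`.
// std::memcpy is used instead of reinterpret_cast to stay clear of aliasing issues.
inline int32_t ReadAsInt32(uint32_t stored) {
  int32_t result;
  static_assert(sizeof(result) == sizeof(stored), "same-size types only");
  std::memcpy(&result, &stored, sizeof(result));
  return result;
}

// True when the stored value reads back unchanged after the uint -> int change.
inline bool SafeToReadAsInt(uint32_t stored) {
  return (stored & 0x80000000u) == 0;  // the high bit must be clear
}

// Hypothetical migration step: convert, asserting that no data uses the high bit.
inline int32_t MigrateUintToInt(uint32_t stored) {
  assert(SafeToReadAsInt(stored) && "value uses the high bit");
  return ReadAsInt32(stored);
}
```

A value such as `100` survives the change, while `0xFFFFFFFF` would suddenly read as `-1`, which is why the type change is only safe if existing data never sets the high bit.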

### Arrays

Arrays are a convenience short-hand for a fixed-length collection of elements. Arrays can be used to replace the following schema:

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

with the following schema:

    struct Vec3 {
      v:[float:3];
    }

Both representations are binary equivalent.

Arrays are currently only supported in a `struct`.

### Default, Optional and Required Values

There are three mutually exclusive reactions to the non-presence of a table's field in the binary data:

1.   Default valued fields will return the default value (as defined in the schema).

2.   Optional valued fields will return some form of `null` depending on the local language. (In a sense, `null` is the default value).

3.   Required fields will cause an error. FlatBuffers verifiers would consider the whole buffer invalid. See the `required` tag below.

When writing a schema, values are a sequence of digits. Values may be optionally followed by a decimal point (`.`) and more digits, for float constants, or optionally prefixed by a `-`. Floats may also be in scientific notation, optionally ending with an `e` or `E`, followed by a `+` or `-` and more digits. Values can also be the keyword `null`.

Only scalar values can have defaults; non-scalar (string/vector/table) fields default to `null` when not present.

You generally do not want to change default values after they're initially defined. Fields that have the default value are not actually stored in the serialized data (see also Gotchas below). Values explicitly written by code generated by the old version of the schema, if they happen to be the default, will be read as a different value by code generated with the new schema. This is slightly less bad when converting an optional scalar into a default valued scalar, since non-presence would not be overloaded with a previous default value. There are situations, however, where this may be desirable, especially if you can ensure a simultaneous rebuild of all code.
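
The hazard of changing a default can be sketched with a toy model in C++17. This is not the FlatBuffers API: `StoredField`, `WriteField` and `ReadField` are hypothetical names, and a `std::optional` stands in for "the field is or is not stored in the buffer". The writer drops values equal to its schema default, and the reader substitutes its own schema default for anything missing.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model of one scalar table field: empty means "not stored in the buffer".
using StoredField = std::optional<int16_t>;

// Writer side: a value equal to the writer's schema default is not stored.
inline StoredField WriteField(int16_t value, int16_t schema_default) {
  if (value == schema_default) return std::nullopt;
  return value;
}

// Reader side: an absent field reads back as the reader's schema default.
inline int16_t ReadField(const StoredField& stored, int16_t schema_default) {
  return stored.value_or(schema_default);
}
```

With an old default of `100` and a new default of `150`, `ReadField(WriteField(100, 100), 150)` yields `150`: the value the old code explicitly wrote is silently replaced by the new default, while a non-default value such as `120` round-trips unchanged.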

### Enums

Define a sequence of named constants, each with a given value, or increasing by one from the previous one. The default first value is `0`. As you can see in the enum declaration, you specify the underlying integral type of the enum with `:` (in this case `byte`), which then determines the type of any fields declared with this enum type. Only integer types are allowed, i.e. `byte`, `ubyte`, `short`, `ushort`, `int`, `uint`, `long` and `ulong`.

Typically, enum values should only ever be added, never removed (there is no deprecation for enums). This requires code to handle forwards compatibility itself, by handling unknown enum values.

### Unions

Unions share a lot of properties with enums, but instead of new names for constants, you use names of tables. You can then declare a union field, which can hold a reference to any of those types, and additionally a field with the suffix `_type` is generated that holds the corresponding enum value, allowing you to know which type to cast to at runtime.

It's possible to give an alias name to a type in a union. This way a type can even be used to mean different things depending on the name used:

    table PointPosition { x:uint; y:uint; }
    table MarkerPosition {}
    union Position {
      Start:MarkerPosition,
      Point:PointPosition,
      Finish:MarkerPosition
    }

Unions contain a special `NONE` marker to denote that no value is stored, so that name cannot be used as an alias.

Unions are a good way to be able to send multiple message types as a FlatBuffer. Note that because a union field is really two fields, it must always be part of a table; it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more open-ended way, for example for use as files, see the file identification feature below.

There is experimental support, in C++ only, for a vector of unions (and types). In the example IDL file above, use `[Any]` to add a vector of `Any` to the `Monster` table.
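
Returning to the `_type` field described earlier in this section, the following self-contained C++ sketch shows one way a reader might dispatch on it while tolerating values added by a newer schema. The `AnyType` enum is a hand-written stand-in for the generated union enum of `union Any` (assuming the usual numbering with `NONE` as 0 and the members following in declaration order), and `DescribeAny` is a hypothetical helper, not generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hand-written stand-in for the enum behind the `test_type` field of `Monster`.
// Assumption: NONE is 0 and union members are numbered in declaration order.
enum class AnyType : uint8_t { NONE = 0, Monster = 1, Weapon = 2, Pickup = 3 };

// Hypothetical helper: map the raw `_type` value to a description.
// Values outside the known range fall through to "unknown" instead of being
// treated as an error, which keeps older readers forwards compatible.
inline std::string DescribeAny(uint8_t raw_type) {
  switch (static_cast<AnyType>(raw_type)) {
    case AnyType::NONE:    return "none";
    case AnyType::Monster: return "monster";
    case AnyType::Weapon:  return "weapon";
    case AnyType::Pickup:  return "pickup";
  }
  return "unknown";  // a member added by a newer schema
}
```

The explicit fallback is the same pattern the Enums section asks for: since values are only ever added, older code has to decide what to do with a value it does not recognize.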

There is also experimental support for other types besides tables in unions, in particular structs and strings. There's no direct support for scalars in unions, but they can be wrapped in a struct at no space cost.

### Namespaces

These will generate the corresponding namespace in C++ for all helper code, and packages in Java. You can use `.` to specify nested namespaces / packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include` automatically ensures each file is parsed just once, even when referred to more than once.

When using the `flatc` compiler to generate code for schema definitions, only definitions in the current file will be generated, not those from the included files (those you still generate separately).

### Root type

This declares what you consider to be the root table (or struct) of the serialized data. This is particularly important for parsing JSON data, which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it needs you to know its schema to parse it correctly. But if you want to use a FlatBuffer as a file format, it would be convenient to be able to have a "magic number" in there, like most file formats have, to be able to do a sanity check to see if you're reading the kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header, but FlatBuffers has a built-in way to add an identifier to a FlatBuffer that takes up minimal space, and keeps the buffer compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters will end up as bytes at offsets 4-7 (inclusive) in the buffer.
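
As an illustration of the byte layout just described, the following self-contained C++ sketch compares bytes 4-7 of a raw buffer against a 4-character identifier. `HasFileIdentifier` is a hypothetical, hand-rolled helper written for this example; it is not the generated `MonsterBufferHasIdentifier` call and it does not verify anything else about the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: checks bytes 4..7 (inclusive) of `buf` against a
// 4-character identifier such as "MYFI".
inline bool HasFileIdentifier(const uint8_t* buf, std::size_t size,
                              const char* identifier) {
  constexpr std::size_t kOffset = 4;  // identifier starts at byte offset 4
  constexpr std::size_t kLength = 4;  // identifiers are exactly 4 characters
  assert(identifier != nullptr && std::strlen(identifier) == kLength);
  // A buffer too short to hold the identifier cannot contain it.
  if (buf == nullptr || size < kOffset + kLength) return false;
  return std::memcmp(buf + kOffset, identifier, kLength) == 0;
}
```

A caller would typically run such a check right after loading a file and before handing the bytes to any schema-specific code.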

For any schema that has such an identifier, `flatc` will automatically add the identifier to any binaries it generates (with `-b`), and generated calls like `FinishMonsterBuffer` also add the identifier. If you have specified an identifier and wish to generate a buffer without one, you can always still do so by calling `FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like `MonsterBufferHasIdentifier` to check if the identifier is present.

Note that this is best for open-ended uses such as files. If you simply wanted to send one of a set of possible messages over a network for example, you'd be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`. This declaration in the schema will change that to whatever you want:

    file_extension "ext";

### RPC interface declarations

You can declare RPC calls in a schema, that define a set of functions that take a FlatBuffer as an argument (the request) and return a FlatBuffer as the response (both of which must be table types):

    rpc_service MonsterStorage {
      Store(Monster):StoreResponse;
      Retrieve(MonsterId):Monster;
    }

What code this produces and how it is used depends on the language and RPC system used. There is preliminary support for GRPC through the `--grpc` code generator; see `grpc/tests` for an example.

### Comments & documentation

May be written as in most C-based languages. Additionally, a triple comment (`///`) on a line by itself signals that a comment is documentation for whatever is declared on the line after it (table/struct/field/enum/union/element), and the comment is output in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after the name of a table/struct/enum/union. These may either have a value or not.
Some attributes like `deprecated` are understood by the compiler; user defined ones need to be declared with the attribute declaration (like `priority` in the example above), and are available to query if you parse the schema at runtime. This is useful if you write your own code generators/editors etc., and you wish to add additional information specific to your tool (such as a help text).

Currently understood attributes:

-   `id: n` (on a table field): manually set the field identifier to `n`. If you use this attribute, you must use it on ALL fields of this table, and the numbers must be a contiguous range from 0 onwards. Additionally, since a union type effectively adds two fields, its id must be that of the second field (the first field is the type field and not explicitly declared in the schema). For example, if the last field before the union field had id 6, the union field should have id 8, and the union's type field will implicitly be 7. IDs allow the fields to be placed in any order in the schema. When a new field is added to the schema it must use the next available ID.

-   `deprecated` (on a field): do not generate accessors for this field anymore, code should stop using this data. Old data may still contain this field, but it won't be accessible anymore by newer code. Note that if you deprecate a field that was previously required, old code may fail to validate new data (when using the optional verifier).

-   `required` (on a non-scalar table field): this field must always be set. By default, fields do not need to be present in the binary. This is desirable, as it helps with forwards/backwards compatibility, and flexibility of data structures. By specifying this attribute, you make non-presence an error for both reader and writer. The reading code may access the field directly, without checking for null. If the constructing code does not initialize this field, it will get an assert, and the verifier will also fail on buffers that have missing required fields.
    Both adding and removing this attribute may be forwards/backwards incompatible, as readers will be unable to read old or new data, respectively, unless the data happens to always have the field set.

-   `force_align: size` (on a struct): force the alignment of this struct to be something higher than what it is naturally aligned to. Causes these structs to be aligned to that amount inside a buffer, IF that buffer is allocated with that alignment (which is not necessarily the case for buffers accessed directly inside a `FlatBufferBuilder`). Note: currently not guaranteed to have an effect when used with `--object-api`, since that may allocate objects at alignments less than what you specify with `force_align`.

-   `force_align: size` (on a vector): force the alignment of this vector to be something different than what the element size would normally dictate. Note: this currently only works for generated C++ code.

-   `bit_flags` (on an unsigned enum): the values of this field indicate bits, meaning that any unsigned value N specified in the schema will end up representing `1<<N`.

### Testing whether a field is present in a table

Most serialization formats (e.g. JSON or Protocol Buffers) make it very explicit in the format whether a field is present in an object or not, allowing you to use this as "extra" information. FlatBuffers will not write fields that are equal to their default value, sometimes resulting in significant space savings. However, this also means we cannot disambiguate the meaning of non-presence as "written default value" or "not written at all". This only applies to scalar fields since only they support default values. Unless otherwise specified, their default is 0.

If you care about the presence of scalars, most languages support "optional scalars." You can set `null` as the default value in the schema. `null` is a value that's outside of all types, so the field is always written when `add_field` is called.
The generated field accessor should use the local language's canonical optional type.

Some `FlatBufferBuilder` implementations have an option called `force_defaults` that circumvents this "not writing defaults" behavior; you can then use `IsFieldPresent` to query presence.

Another option that works in all languages is to wrap a scalar field in a struct. This way it will return null if it is not present. This will be slightly less ergonomic, but structs don't take up any more space than the scalar they represent.

   [Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language
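
The struct-wrapping approach from the section above can be sketched in plain C++17. Nothing here is generated FlatBuffers code: `OptionalShort` is a hypothetical stand-in for a schema struct such as `struct OptionalShort { value:short; }`, and `ReadMana` is an illustrative accessor in which a null pointer models "the field is not present in the table".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for a one-field schema struct wrapping a `short`.
struct OptionalShort {
  int16_t value;
};

// The wrapper is exactly as large as the scalar it holds.
static_assert(sizeof(OptionalShort) == sizeof(int16_t),
              "wrapping a scalar in a struct adds no space");

// Toy accessor: a null pointer models a struct field that was never written.
inline std::optional<int16_t> ReadMana(const OptionalShort* field) {
  if (field == nullptr) return std::nullopt;  // not present at all
  return field->value;                        // present, even if the value is 0
}
```

Unlike a plain scalar with a default of 0, this model lets a reader distinguish an explicitly written `0` from a field that was never written.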