• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1FlatBuffer Internals    {#flatbuffers_internals}
2====================
3
4This section is entirely optional for the use of FlatBuffers. In normal
5usage, you should never need the information contained herein. If you're
6interested however, it should give you more of an appreciation of why
7FlatBuffers is both efficient and convenient.
8
9### Format components
10
11A FlatBuffer is a binary file and in-memory format consisting mostly of
12scalars of various sizes, all aligned to their own size. Each scalar is
13also always represented in little-endian format, as this corresponds to
14all commonly used CPUs today. FlatBuffers will also work on big-endian
15machines, but will be slightly slower because of additional
16byte-swap intrinsics.
17
18It is assumed that the following conditions are met, to ensure
19cross-platform interoperability:
20- The binary `IEEE-754` format is used for floating-point numbers.
21- The `two's complemented` representation is used for signed integers.
22- The endianness is the same for floating-point numbers as for integers.
23
24On purpose, the format leaves a lot of details about where exactly
25things live in memory undefined, e.g. fields in a table can have any
26order, and objects to some extent can be stored in many orders. This is
27because the format doesn't need this information to be efficient, and it
28leaves room for optimization and extension (for example, fields can be
29packed in a way that is most compact). Instead, the format is defined in
30terms of offsets and adjacency only. This may mean two different
31implementations may produce different binaries given the same input
32values, and this is perfectly valid.
33
34### Format identification
35
36The format also doesn't contain information for format identification
37and versioning, which is also by design. FlatBuffers is a statically typed
38system, meaning the user of a buffer needs to know what kind of buffer
39it is. FlatBuffers can of course be wrapped inside other containers
40where needed, or you can use its union feature to dynamically identify
41multiple possible sub-objects stored. Additionally, it can be used
42together with the schema parser if full reflective capabilities are
43desired.
44
45Versioning is something that is intrinsically part of the format (the
46optionality / extensibility of fields), so the format itself does not
47need a version number (it's a meta-format, in a sense). We're hoping
48that this format can accommodate all data needed. If format breaking
49changes are ever necessary, it would become a new kind of format rather
50than just a variation.
51
52### Offsets
53
54The most important and generic offset type (see `flatbuffers.h`) is
55`uoffset_t`, which is currently always a `uint32_t`, and is used to
56refer to all tables/unions/strings/vectors (these are never stored
57in-line). 32bit is
58intentional, since we want to keep the format binary compatible between
5932 and 64bit systems, and a 64bit offset would bloat the size for almost
60all uses. A version of this format with 64bit (or 16bit) offsets is easy to set
61when needed. Unsigned means they can only point in one direction, which
62typically is forward (towards a higher memory location). Any backwards
63offsets will be explicitly marked as such.
64
65The format starts with an `uoffset_t` to the root object in the buffer.
66
67We have two kinds of objects, structs and tables.
68
69### Structs
70
71These are the simplest, and as mentioned, intended for simple data that
72benefits from being extra efficient and doesn't need versioning /
73extensibility. They are always stored inline in their parent (a struct,
74table, or vector) for maximum compactness. Structs define a consistent
75memory layout where all components are aligned to their size, and
76structs aligned to their largest scalar member. This is done independent
77of the alignment rules of the underlying compiler to guarantee a cross
78platform compatible layout. This layout is then enforced in the generated
79code.
80
81### Tables
82
83Unlike structs, these are not stored in inline in their parent, but are
84referred to by offset.
85
86They start with an `soffset_t` to a vtable. This is a signed version of
87`uoffset_t`, since vtables may be stored anywhere relative to the object.
88This offset is substracted (not added) from the object start to arrive at
89the vtable start. This offset is followed by all the
90fields as aligned scalars (or offsets). Unlike structs, not all fields
91need to be present. There is no set order and layout.
92
93To be able to access fields regardless of these uncertainties, we go
94through a vtable of offsets. Vtables are shared between any objects that
95happen to have the same vtable values.
96
97The elements of a vtable are all of type `voffset_t`, which is
98a `uint16_t`. The first element is the size of the vtable in bytes,
99including the size element. The second one is the size of the object, in bytes
100(including the vtable offset). This size could be used for streaming, to know
101how many bytes to read to be able to access all *inline* fields of the object.
102The remaining elements are the N offsets, where N is the amount of fields
103declared in the schema when the code that constructed this buffer was
104compiled (thus, the size of the table is N + 2).
105
106All accessor functions in the generated code for tables contain the
107offset into this table as a constant. This offset is checked against the
108first field (the number of elements), to protect against newer code
109reading older data. If this offset is out of range, or the vtable entry
110is 0, that means the field is not present in this object, and the
111default value is return. Otherwise, the entry is used as offset to the
112field to be read.
113
114### Strings and Vectors
115
116Strings are simply a vector of bytes, and are always
117null-terminated. Vectors are stored as contiguous aligned scalar
118elements prefixed by a 32bit element count (not including any
119null termination). Neither is stored inline in their parent, but are referred to
120by offset.
121
122### Construction
123
124The current implementation constructs these buffers backwards (starting
125at the highest memory address of the buffer), since
126that significantly reduces the amount of bookkeeping and simplifies the
127construction API.
128
129### Code example
130
131Here's an example of the code that gets generated for the `samples/monster.fbs`.
132What follows is the entire file, broken up by comments:
133
134    // automatically generated, do not modify
135
136    #include "flatbuffers/flatbuffers.h"
137
138    namespace MyGame {
139    namespace Sample {
140
141Nested namespace support.
142
143    enum {
144      Color_Red = 0,
145      Color_Green = 1,
146      Color_Blue = 2,
147    };
148
149    inline const char **EnumNamesColor() {
150      static const char *names[] = { "Red", "Green", "Blue", nullptr };
151      return names;
152    }
153
154    inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }
155
156Enums and convenient reverse lookup.
157
158    enum {
159      Any_NONE = 0,
160      Any_Monster = 1,
161    };
162
163    inline const char **EnumNamesAny() {
164      static const char *names[] = { "NONE", "Monster", nullptr };
165      return names;
166    }
167
168    inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }
169
170Unions share a lot with enums.
171
172    struct Vec3;
173    struct Monster;
174
175Predeclare all data types since circular references between types are allowed
176(circular references between object are not, though).
177
178    FLATBUFFERS_MANUALLY_ALIGNED_STRUCT(4) Vec3 {
179     private:
180      float x_;
181      float y_;
182      float z_;
183
184     public:
185      Vec3(float x, float y, float z)
186        : x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}
187
188      float x() const { return flatbuffers::EndianScalar(x_); }
189      float y() const { return flatbuffers::EndianScalar(y_); }
190      float z() const { return flatbuffers::EndianScalar(z_); }
191    };
192    FLATBUFFERS_STRUCT_END(Vec3, 12);
193
194These ugly macros do a couple of things: they turn off any padding the compiler
195might normally do, since we add padding manually (though none in this example),
196and they enforce alignment chosen by FlatBuffers. This ensures the layout of
197this struct will look the same regardless of compiler and platform. Note that
198the fields are private: this is because these store little endian scalars
199regardless of platform (since this is part of the serialized data).
200`EndianScalar` then converts back and forth, which is a no-op on all current
201mobile and desktop platforms, and a single machine instruction on the few
202remaining big endian platforms.
203
204    struct Monster : private flatbuffers::Table {
205      const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); }
206      int16_t mana() const { return GetField<int16_t>(6, 150); }
207      int16_t hp() const { return GetField<int16_t>(8, 100); }
208      const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
209      const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
210      int8_t color() const { return GetField<int8_t>(16, 2); }
211    };
212
213Tables are a bit more complicated. A table accessor struct is used to point at
214the serialized data for a table, which always starts with an offset to its
215vtable. It derives from `Table`, which contains the `GetField` helper functions.
216GetField takes a vtable offset, and a default value. It will look in the vtable
217at that offset. If the offset is out of bounds (data from an older version) or
218the vtable entry is 0, the field is not present and the default is returned.
219Otherwise, it uses the entry as an offset into the table to locate the field.
220
221    struct MonsterBuilder {
222      flatbuffers::FlatBufferBuilder &fbb_;
223      flatbuffers::uoffset_t start_;
224      void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
225      void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); }
226      void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); }
227      void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); }
228      void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); }
229      void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); }
230      MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
231      flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); }
232    };
233
234`MonsterBuilder` is the base helper struct to construct a table using a
235`FlatBufferBuilder`. You can add the fields in any order, and the `Finish`
236call will ensure the correct vtable gets generated.
237
238    inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb,
239                                                      const Vec3 *pos, int16_t mana,
240                                                      int16_t hp,
241                                                      flatbuffers::Offset<flatbuffers::String> name,
242                                                      flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory,
243                                                      int8_t color) {
244      MonsterBuilder builder_(_fbb);
245      builder_.add_inventory(inventory);
246      builder_.add_name(name);
247      builder_.add_pos(pos);
248      builder_.add_hp(hp);
249      builder_.add_mana(mana);
250      builder_.add_color(color);
251      return builder_.Finish();
252    }
253
254`CreateMonster` is a convenience function that calls all functions in
255`MonsterBuilder` above for you. Note that if you pass values which are
256defaults as arguments, it will not actually construct that field, so
257you can probably use this function instead of the builder class in
258almost all cases.
259
260    inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); }
261
262This function is only generated for the root table type, to be able to
263start traversing a FlatBuffer from a raw buffer pointer.
264
265    }; // namespace MyGame
266    }; // namespace Sample
267
268### Encoding example.
269
270Below is a sample encoding for the following JSON corresponding to the above
271schema:
272
273    { pos: { x: 1, y: 2, z: 3 }, name: "fred", hp: 50 }
274
275Resulting in this binary buffer:
276
277    // Start of the buffer:
278    uint32_t 20  // Offset to the root table.
279
280    // Start of the vtable. Not shared in this example, but could be:
281    uint16_t 16 // Size of table, starting from here.
282    uint16_t 22 // Size of object inline data.
283    uint16_t 4, 0, 20, 16, 0, 0  // Offsets to fields from start of (root) table, 0 for not present.
284
285    // Start of the root table:
286    int32_t 16     // Offset to vtable used (default negative direction)
287    float 1, 2, 3  // the Vec3 struct, inline.
288    uint32_t 8     // Offset to the name string.
289    int16_t 50     // hp field.
290    int16_t 0      // Padding for alignment.
291
292    // Start of name string:
293    uint32_t 4  // Length of string.
294    int8_t 'f', 'r', 'e', 'd', 0, 0, 0, 0  // Text + 0 termination + padding.
295
296Note that this not the only possible encoding, since the writer has some
297flexibility in which of the children of root object to write first (though in
298this case there's only one string), and what order to write the fields in.
299Different orders may also cause different alignments to happen.
300
301### Additional reading.
302
303The author of the C language implementation has made a similar
304[document](https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#flatbuffers-binary-format)
305that may further help clarify the format.
306
307# FlexBuffers
308
309The [schema-less](@ref flexbuffers) version of FlatBuffers have their
310own encoding, detailed here.
311
312It shares many properties mentioned above, in that all data is accessed
313over offsets, all scalars are aligned to their own size, and
314all data is always stored in little endian format.
315
316One difference is that FlexBuffers are built front to back, so children are
317stored before parents, and the root of the data starts at the last byte.
318
319Another difference is that scalar data is stored with a variable number of bits
320(8/16/32/64). The current width is always determined by the *parent*, i.e. if
321the scalar sits in a vector, the vector determines the bit width for all
322elements at once. Selecting the minimum bit width for a particular vector is
323something the encoder does automatically and thus is typically of no concern
324to the user, though being aware of this feature (and not sticking a double in
325the same vector as a bunch of byte sized elements) is helpful for efficiency.
326
327Unlike FlatBuffers there is only one kind of offset, and that is an unsigned
328integer indicating the number of bytes in a negative direction from the address
329of itself (where the offset is stored).
330
331### Vectors
332
333The representation of the vector is at the core of how FlexBuffers works (since
334maps are really just a combination of 2 vectors), so it is worth starting there.
335
336As mentioned, a vector is governed by a single bit width (supplied by its
337parent). This includes the size field. For example, a vector that stores the
338integer values `1, 2, 3` is encoded as follows:
339
340    uint8_t 3, 1, 2, 3, 4, 4, 4
341
342The first `3` is the size field, and is placed before the vector (an offset
343from the parent to this vector points to the first element, not the size
344field, so the size field is effectively at index -1).
345Since this is an untyped vector `SL_VECTOR`, it is followed by 3 type
346bytes (one per element of the vector), which are always following the vector,
347and are always a uint8_t even if the vector is made up of bigger scalars.
348
349### Types
350
351A type byte is made up of 2 components (see flexbuffers.h for exact values):
352
353* 2 lower bits representing the bit-width of the child (8, 16, 32, 64).
354  This is only used if the child is accessed over an offset, such as a child
355  vector. It is ignored for inline types.
356* 6 bits representing the actual type (see flexbuffers.h).
357
358Thus, in this example `4` means 8 bit child (value 0, unused, since the value is
359in-line), type `SL_INT` (value 1).
360
361### Typed Vectors
362
363These are like the Vectors above, but omit the type bytes. The type is instead
364determined by the vector type supplied by the parent. Typed vectors are only
365available for a subset of types for which these savings can be significant,
366namely inline signed/unsigned integers (`TYPE_VECTOR_INT` / `TYPE_VECTOR_UINT`),
367floats (`TYPE_VECTOR_FLOAT`), and keys (`TYPE_VECTOR_KEY`, see below).
368
369Additionally, for scalars, there are fixed length vectors of sizes 2 / 3 / 4
370that don't store the size (`TYPE_VECTOR_INT2` etc.), for an additional savings
371in space when storing common vector or color data.
372
373### Scalars
374
375FlexBuffers supports integers (`TYPE_INT` and `TYPE_UINT`) and floats
376(`TYPE_FLOAT`), available in the bit-widths mentioned above. They can be stored
377both inline and over an offset (`TYPE_INDIRECT_*`).
378
379The offset version is useful to encode costly 64bit (or even 32bit) quantities
380into vectors / maps of smaller sizes, and to share / repeat a value multiple
381times.
382
383### Booleans and Nulls
384
385Booleans (`TYPE_BOOL`) and nulls (`TYPE_NULL`) are encoded as inlined unsigned integers.
386
387### Blobs, Strings and Keys.
388
389A blob (`TYPE_BLOB`) is encoded similar to a vector, with one difference: the
390elements are always `uint8_t`. The parent bit width only determines the width of
391the size field, allowing blobs to be large without the elements being large.
392
393Strings (`TYPE_STRING`) are similar to blobs, except they have an additional 0
394termination byte for convenience, and they MUST be UTF-8 encoded (since an
395accessor in a language that does not support pointers to UTF-8 data may have to
396convert them to a native string type).
397
398A "Key" (`TYPE_KEY`) is similar to a string, but doesn't store the size
399field. They're so named because they are used with maps, which don't care
400for the size, and can thus be even more compact. Unlike strings, keys cannot
401contain bytes of value 0 as part of their data (size can only be determined by
402`strlen`), so while you can use them outside the context of maps if you so
403desire, you're usually better off with strings.
404
405### Maps
406
407A map (`TYPE_MAP`) is like an (untyped) vector, but with 2 prefixes before the
408size field:
409
410| index | field                                                        |
411| ----: | :----------------------------------------------------------- |
412| -3    | An offset to the keys vector (may be shared between tables). |
413| -2    | Byte width of the keys vector.                               |
414| -1    | Size (from here on it is compatible with `TYPE_VECTOR`)      |
415| 0     | Elements.                                                    |
416| Size  | Types.                                                       |
417
418Since a map is otherwise the same as a vector, it can be iterated like
419a vector (which is probably faster than lookup by key).
420
421The keys vector is a typed vector of keys. Both the keys and corresponding
422values *have* to be stored in sorted order (as determined by `strcmp`), such
423that lookups can be made using binary search.
424
425The reason the key vector is a seperate structure from the value vector is
426such that it can be shared between multiple value vectors, and also to
427allow it to be treated as its own individual vector in code.
428
429An example map { foo: 13, bar: 14 } would be encoded as:
430
431    0 : uint8_t 'b', 'a', 'r', 0
432    4 : uint8_t 'f', 'o', 'o', 0
433    8 : uint8_t 2      // key vector of size 2
434    // key vector offset points here
435    9 : uint8_t 9, 6   // offsets to bar_key and foo_key
436    11: uint8_t 2, 1   // offset to key vector, and its byte width
437    13: uint8_t 2      // value vector of size
438    // value vector offset points here
439    14: uint8_t 14, 13 // values
440    16: uint8_t 4, 4   // types
441
442### The root
443
444As mentioned, the root starts at the end of the buffer.
445The last uint8_t is the width in bytes of the root (normally the parent
446determines the width, but the root has no parent). The uint8_t before this is
447the type of the root, and the bytes before that are the root value (of the
448number of bytes specified by the last byte).
449
450So for example, the integer value `13` as root would be:
451
452    uint8_t 13, 4, 1    // Value, type, root byte width.
453
454
455<br>
456