• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1FlatBuffer Internals    {#flatbuffers_internals}
2====================
3
4This section is entirely optional for the use of FlatBuffers. In normal
5usage, you should never need the information contained herein. If you're
6interested however, it should give you more of an appreciation of why
7FlatBuffers is both efficient and convenient.
8
9### Format components
10
11A FlatBuffer is a binary file and in-memory format consisting mostly of
12scalars of various sizes, all aligned to their own size. Each scalar is
13also always represented in little-endian format, as this corresponds to
14all commonly used CPUs today. FlatBuffers will also work on big-endian
15machines, but will be slightly slower because of additional
16byte-swap intrinsics.
17
18On purpose, the format leaves a lot of details about where exactly
19things live in memory undefined, e.g. fields in a table can have any
20order, and objects to some extent can be stored in many orders. This is
21because the format doesn't need this information to be efficient, and it
22leaves room for optimization and extension (for example, fields can be
23packed in a way that is most compact). Instead, the format is defined in
24terms of offsets and adjacency only. This may mean two different
25implementations may produce different binaries given the same input
26values, and this is perfectly valid.
27
28### Format identification
29
30The format also doesn't contain information for format identification
31and versioning, which is also by design. FlatBuffers is a statically typed
32system, meaning the user of a buffer needs to know what kind of buffer
33it is. FlatBuffers can of course be wrapped inside other containers
34where needed, or you can use its union feature to dynamically identify
35multiple possible sub-objects stored. Additionally, it can be used
36together with the schema parser if full reflective capabilities are
37desired.
38
39Versioning is something that is intrinsically part of the format (the
40optionality / extensibility of fields), so the format itself does not
41need a version number (it's a meta-format, in a sense). We're hoping
42that this format can accommodate all data needed. If format breaking
43changes are ever necessary, it would become a new kind of format rather
44than just a variation.
45
46### Offsets
47
48The most important and generic offset type (see `flatbuffers.h`) is
49`uoffset_t`, which is currently always a `uint32_t`, and is used to
50refer to all tables/unions/strings/vectors (these are never stored
51in-line). 32bit is
52intentional, since we want to keep the format binary compatible between
5332 and 64bit systems, and a 64bit offset would bloat the size for almost
54all uses. A version of this format with 64bit (or 16bit) offsets is easy to set
55when needed. Unsigned means they can only point in one direction, which
56typically is forward (towards a higher memory location). Any backwards
57offsets will be explicitly marked as such.
58
59The format starts with an `uoffset_t` to the root object in the buffer.
60
61We have two kinds of objects, structs and tables.
62
63### Structs
64
65These are the simplest, and as mentioned, intended for simple data that
66benefits from being extra efficient and doesn't need versioning /
67extensibility. They are always stored inline in their parent (a struct,
68table, or vector) for maximum compactness. Structs define a consistent
69memory layout where all components are aligned to their size, and
70structs aligned to their largest scalar member. This is done independent
71of the alignment rules of the underlying compiler to guarantee a cross
72platform compatible layout. This layout is then enforced in the generated
73code.
74
75### Tables
76
77Unlike structs, these are not stored in inline in their parent, but are
78referred to by offset.
79
80They start with an `soffset_t` to a vtable. This is a signed version of
81`uoffset_t`, since vtables may be stored anywhere relative to the object.
82This offset is substracted (not added) from the object start to arrive at
83the vtable start. This offset is followed by all the
84fields as aligned scalars (or offsets). Unlike structs, not all fields
85need to be present. There is no set order and layout.
86
87To be able to access fields regardless of these uncertainties, we go
88through a vtable of offsets. Vtables are shared between any objects that
89happen to have the same vtable values.
90
91The elements of a vtable are all of type `voffset_t`, which is
92a `uint16_t`. The first element is the size of the vtable in bytes,
93including the size element. The second one is the size of the object, in bytes
94(including the vtable offset). This size could be used for streaming, to know
95how many bytes to read to be able to access all *inline* fields of the object.
96The remaining elements are the N offsets, where N is the amount of fields
97declared in the schema when the code that constructed this buffer was
98compiled (thus, the size of the table is N + 2).
99
100All accessor functions in the generated code for tables contain the
101offset into this table as a constant. This offset is checked against the
102first field (the number of elements), to protect against newer code
103reading older data. If this offset is out of range, or the vtable entry
104is 0, that means the field is not present in this object, and the
105default value is return. Otherwise, the entry is used as offset to the
106field to be read.
107
108### Strings and Vectors
109
110Strings are simply a vector of bytes, and are always
111null-terminated. Vectors are stored as contiguous aligned scalar
112elements prefixed by a 32bit element count (not including any
113null termination). Neither is stored inline in their parent, but are referred to
114by offset.
115
116### Construction
117
118The current implementation constructs these buffers backwards (starting
119at the highest memory address of the buffer), since
120that significantly reduces the amount of bookkeeping and simplifies the
121construction API.
122
123### Code example
124
125Here's an example of the code that gets generated for the `samples/monster.fbs`.
126What follows is the entire file, broken up by comments:
127
128    // automatically generated, do not modify
129
130    #include "flatbuffers/flatbuffers.h"
131
132    namespace MyGame {
133    namespace Sample {
134
135Nested namespace support.
136
137    enum {
138      Color_Red = 0,
139      Color_Green = 1,
140      Color_Blue = 2,
141    };
142
143    inline const char **EnumNamesColor() {
144      static const char *names[] = { "Red", "Green", "Blue", nullptr };
145      return names;
146    }
147
148    inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }
149
150Enums and convenient reverse lookup.
151
152    enum {
153      Any_NONE = 0,
154      Any_Monster = 1,
155    };
156
157    inline const char **EnumNamesAny() {
158      static const char *names[] = { "NONE", "Monster", nullptr };
159      return names;
160    }
161
162    inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }
163
164Unions share a lot with enums.
165
166    struct Vec3;
167    struct Monster;
168
169Predeclare all data types since circular references between types are allowed
170(circular references between object are not, though).
171
172    MANUALLY_ALIGNED_STRUCT(4) Vec3 {
173     private:
174      float x_;
175      float y_;
176      float z_;
177
178     public:
179      Vec3(float x, float y, float z)
180        : x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}
181
182      float x() const { return flatbuffers::EndianScalar(x_); }
183      float y() const { return flatbuffers::EndianScalar(y_); }
184      float z() const { return flatbuffers::EndianScalar(z_); }
185    };
186    STRUCT_END(Vec3, 12);
187
188These ugly macros do a couple of things: they turn off any padding the compiler
189might normally do, since we add padding manually (though none in this example),
190and they enforce alignment chosen by FlatBuffers. This ensures the layout of
191this struct will look the same regardless of compiler and platform. Note that
192the fields are private: this is because these store little endian scalars
193regardless of platform (since this is part of the serialized data).
194`EndianScalar` then converts back and forth, which is a no-op on all current
195mobile and desktop platforms, and a single machine instruction on the few
196remaining big endian platforms.
197
198    struct Monster : private flatbuffers::Table {
199      const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); }
200      int16_t mana() const { return GetField<int16_t>(6, 150); }
201      int16_t hp() const { return GetField<int16_t>(8, 100); }
202      const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
203      const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
204      int8_t color() const { return GetField<int8_t>(16, 2); }
205    };
206
207Tables are a bit more complicated. A table accessor struct is used to point at
208the serialized data for a table, which always starts with an offset to its
209vtable. It derives from `Table`, which contains the `GetField` helper functions.
210GetField takes a vtable offset, and a default value. It will look in the vtable
211at that offset. If the offset is out of bounds (data from an older version) or
212the vtable entry is 0, the field is not present and the default is returned.
213Otherwise, it uses the entry as an offset into the table to locate the field.
214
215    struct MonsterBuilder {
216      flatbuffers::FlatBufferBuilder &fbb_;
217      flatbuffers::uoffset_t start_;
218      void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
219      void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); }
220      void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); }
221      void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); }
222      void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); }
223      void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); }
224      MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
225      flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); }
226    };
227
228`MonsterBuilder` is the base helper struct to construct a table using a
229`FlatBufferBuilder`. You can add the fields in any order, and the `Finish`
230call will ensure the correct vtable gets generated.
231
232    inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb,
233                                                      const Vec3 *pos, int16_t mana,
234                                                      int16_t hp,
235                                                      flatbuffers::Offset<flatbuffers::String> name,
236                                                      flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory,
237                                                      int8_t color) {
238      MonsterBuilder builder_(_fbb);
239      builder_.add_inventory(inventory);
240      builder_.add_name(name);
241      builder_.add_pos(pos);
242      builder_.add_hp(hp);
243      builder_.add_mana(mana);
244      builder_.add_color(color);
245      return builder_.Finish();
246    }
247
248`CreateMonster` is a convenience function that calls all functions in
249`MonsterBuilder` above for you. Note that if you pass values which are
250defaults as arguments, it will not actually construct that field, so
251you can probably use this function instead of the builder class in
252almost all cases.
253
254    inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); }
255
256This function is only generated for the root table type, to be able to
257start traversing a FlatBuffer from a raw buffer pointer.
258
259    }; // namespace MyGame
260    }; // namespace Sample
261
262### Encoding example.
263
264Below is a sample encoding for the following JSON corresponding to the above
265schema:
266
267    { pos: { x: 1, y: 2, z: 3 }, name: "fred", hp: 50 }
268
269Resulting in this binary buffer:
270
271    // Start of the buffer:
272    uint32_t 20  // Offset to the root table.
273
274    // Start of the vtable. Not shared in this example, but could be:
275    uint16_t 16 // Size of table, starting from here.
276    uint16_t 22 // Size of object inline data.
277    uint16_t 4, 0, 20, 16, 0, 0  // Offsets to fields from start of (root) table, 0 for not present.
278
279    // Start of the root table:
280    int32_t 16     // Offset to vtable used (default negative direction)
281    float 1, 2, 3  // the Vec3 struct, inline.
282    uint32_t 8     // Offset to the name string.
283    int16_t 50     // hp field.
284    int16_t 0      // Padding for alignment.
285
286    // Start of name string:
287    uint32_t 4  // Length of string.
288    int8_t 'f', 'r', 'e', 'd', 0, 0, 0, 0  // Text + 0 termination + padding.
289
290Note that this not the only possible encoding, since the writer has some
291flexibility in which of the children of root object to write first (though in
292this case there's only one string), and what order to write the fields in.
293Different orders may also cause different alignments to happen.
294
295# FlexBuffers
296
297The [schema-less](@ref flexbuffers) version of FlatBuffers have their
298own encoding, detailed here.
299
300It shares many properties mentioned above, in that all data is accessed
301over offsets, all scalars are aligned to their own size, and
302all data is always stored in little endian format.
303
304One difference is that FlexBuffers are built front to back, so children are
305stored before parents, and the root of the data starts at the last byte.
306
307Another difference is that scalar data is stored with a variable number of bits
308(8/16/32/64). The current width is always determined by the *parent*, i.e. if
309the scalar sits in a vector, the vector determines the bit width for all
310elements at once. Selecting the minimum bit width for a particular vector is
311something the encoder does automatically and thus is typically of no concern
312to the user, though being aware of this feature (and not sticking a double in
313the same vector as a bunch of byte sized elements) is helpful for efficiency.
314
315Unlike FlatBuffers there is only one kind of offset, and that is an unsigned
316integer indicating the number of bytes in a negative direction from the address
317of itself (where the offset is stored).
318
319### Vectors
320
321The representation of the vector is at the core of how FlexBuffers works (since
322maps are really just a combination of 2 vectors), so it is worth starting there.
323
324As mentioned, a vector is governed by a single bit width (supplied by its
325parent). This includes the size field. For example, a vector that stores the
326integer values `1, 2, 3` is encoded as follows:
327
328    uint8_t 3, 1, 2, 3, 4, 4, 4
329
330The first `3` is the size field, and is placed before the vector (an offset
331from the parent to this vector points to the first element, not the size
332field, so the size field is effectively at index -1).
333Since this is an untyped vector `SL_VECTOR`, it is followed by 3 type
334bytes (one per element of the vector), which are always following the vector,
335and are always a uint8_t even if the vector is made up of bigger scalars.
336
337### Types
338
339A type byte is made up of 2 components (see flexbuffers.h for exact values):
340
341* 2 lower bits representing the bit-width of the child (8, 16, 32, 64).
342  This is only used if the child is accessed over an offset, such as a child
343  vector. It is ignored for inline types.
344* 6 bits representing the actual type (see flexbuffers.h).
345
346Thus, in this example `4` means 8 bit child (value 0, unused, since the value is
347in-line), type `SL_INT` (value 1).
348
349### Typed Vectors
350
351These are like the Vectors above, but omit the type bytes. The type is instead
352determined by the vector type supplied by the parent. Typed vectors are only
353available for a subset of types for which these savings can be significant,
354namely inline signed/unsigned integers (`TYPE_VECTOR_INT` / `TYPE_VECTOR_UINT`),
355floats (`TYPE_VECTOR_FLOAT`), and keys (`TYPE_VECTOR_KEY`, see below).
356
357Additionally, for scalars, there are fixed length vectors of sizes 2 / 3 / 4
358that don't store the size (`TYPE_VECTOR_INT2` etc.), for an additional savings
359in space when storing common vector or color data.
360
361### Scalars
362
363FlexBuffers supports integers (`TYPE_INT` and `TYPE_UINT`) and floats
364(`TYPE_FLOAT`), available in the bit-widths mentioned above. They can be stored
365both inline and over an offset (`TYPE_INDIRECT_*`).
366
367The offset version is useful to encode costly 64bit (or even 32bit) quantities
368into vectors / maps of smaller sizes, and to share / repeat a value multiple
369times.
370
371### Blobs, Strings and Keys.
372
373A blob (`TYPE_BLOB`) is encoded similar to a vector, with one difference: the
374elements are always `uint8_t`. The parent bit width only determines the width of
375the size field, allowing blobs to be large without the elements being large.
376
377Strings (`TYPE_STRING`) are similar to blobs, except they have an additional 0
378termination byte for convenience, and they MUST be UTF-8 encoded (since an
379accessor in a language that does not support pointers to UTF-8 data may have to
380convert them to a native string type).
381
382A "Key" (`TYPE_KEY`) is similar to a string, but doesn't store the size
383field. They're so named because they are used with maps, which don't care
384for the size, and can thus be even more compact. Unlike strings, keys cannot
385contain bytes of value 0 as part of their data (size can only be determined by
386`strlen`), so while you can use them outside the context of maps if you so
387desire, you're usually better off with strings.
388
389### Maps
390
391A map (`TYPE_MAP`) is like an (untyped) vector, but with 2 prefixes before the
392size field:
393
394| index | field                                                        |
395| ----: | :----------------------------------------------------------- |
396| -3    | An offset to the keys vector (may be shared between tables). |
397| -2    | Byte width of the keys vector.                               |
398| -1    | Size (from here on it is compatible with `TYPE_VECTOR`)      |
399| 0     | Elements.                                                    |
400| Size  | Types.                                                       |
401
402Since a map is otherwise the same as a vector, it can be iterated like
403a vector (which is probably faster than lookup by key).
404
405The keys vector is a typed vector of keys. Both the keys and corresponding
406values *have* to be stored in sorted order (as determined by `strcmp`), such
407that lookups can be made using binary search.
408
409The reason the key vector is a seperate structure from the value vector is
410such that it can be shared between multiple value vectors, and also to
411allow it to be treated as its own indivual vector in code.
412
413An example map { foo: 13, bar: 14 } would be encoded as:
414
415    0 : uint8_t 'f', 'o', 'o', 0
416    4 : uint8_t 'b', 'a', 'r', 0
417    8 : uint8_t 2      // key vector of size 2
418    // key vector offset points here
419    9 : uint8_t 9, 6   // offsets to foo_key and bar_key
420    11: uint8_t 3, 1   // offset to key vector, and its byte width
421    13: uint8_t 2      // value vector of size
422    // value vector offset points here
423    14: uint8_t 13, 14 // values
424    16: uint8_t 4, 4   // types
425
426### The root
427
428As mentioned, the root starts at the end of the buffer.
429The last uint8_t is the width in bytes of the root (normally the parent
430determines the width, but the root has no parent). The uint8_t before this is
431the type of the root, and the bytes before that are the root value (of the
432number of bytes specified by the last byte).
433
434So for example, the integer value `13` as root would be:
435
436    uint8_t 13, 4, 1    // Value, type, root byte width.
437
438
439<br>
440