1FlatBuffer Internals {#flatbuffers_internals} 2==================== 3 4This section is entirely optional for the use of FlatBuffers. In normal 5usage, you should never need the information contained herein. If you're 6interested however, it should give you more of an appreciation of why 7FlatBuffers is both efficient and convenient. 8 9### Format components 10 11A FlatBuffer is a binary file and in-memory format consisting mostly of 12scalars of various sizes, all aligned to their own size. Each scalar is 13also always represented in little-endian format, as this corresponds to 14all commonly used CPUs today. FlatBuffers will also work on big-endian 15machines, but will be slightly slower because of additional 16byte-swap intrinsics. 17 18It is assumed that the following conditions are met, to ensure 19cross-platform interoperability: 20- The binary `IEEE-754` format is used for floating-point numbers. 21- The `two's complemented` representation is used for signed integers. 22- The endianness is the same for floating-point numbers as for integers. 23 24On purpose, the format leaves a lot of details about where exactly 25things live in memory undefined, e.g. fields in a table can have any 26order, and objects to some extent can be stored in many orders. This is 27because the format doesn't need this information to be efficient, and it 28leaves room for optimization and extension (for example, fields can be 29packed in a way that is most compact). Instead, the format is defined in 30terms of offsets and adjacency only. This may mean two different 31implementations may produce different binaries given the same input 32values, and this is perfectly valid. 33 34### Format identification 35 36The format also doesn't contain information for format identification 37and versioning, which is also by design. FlatBuffers is a statically typed 38system, meaning the user of a buffer needs to know what kind of buffer 39it is. FlatBuffers can of course be wrapped inside other containers 40where needed, or you can use its union feature to dynamically identify 41multiple possible sub-objects stored. Additionally, it can be used 42together with the schema parser if full reflective capabilities are 43desired. 44 45Versioning is something that is intrinsically part of the format (the 46optionality / extensibility of fields), so the format itself does not 47need a version number (it's a meta-format, in a sense). We're hoping 48that this format can accommodate all data needed. If format breaking 49changes are ever necessary, it would become a new kind of format rather 50than just a variation. 51 52### Offsets 53 54The most important and generic offset type (see `flatbuffers.h`) is 55`uoffset_t`, which is currently always a `uint32_t`, and is used to 56refer to all tables/unions/strings/vectors (these are never stored 57in-line). 32bit is 58intentional, since we want to keep the format binary compatible between 5932 and 64bit systems, and a 64bit offset would bloat the size for almost 60all uses. A version of this format with 64bit (or 16bit) offsets is easy to set 61when needed. Unsigned means they can only point in one direction, which 62typically is forward (towards a higher memory location). Any backwards 63offsets will be explicitly marked as such. 64 65The format starts with an `uoffset_t` to the root table in the buffer. 66 67We have two kinds of objects, structs and tables. 68 69### Structs 70 71These are the simplest, and as mentioned, intended for simple data that 72benefits from being extra efficient and doesn't need versioning / 73extensibility. They are always stored inline in their parent (a struct, 74table, or vector) for maximum compactness. Structs define a consistent 75memory layout where all components are aligned to their size, and 76structs aligned to their largest scalar member. This is done independent 77of the alignment rules of the underlying compiler to guarantee a cross 78platform compatible layout. This layout is then enforced in the generated 79code. 80 81### Tables 82 83Unlike structs, these are not stored in inline in their parent, but are 84referred to by offset. 85 86They start with an `soffset_t` to a vtable. This is a signed version of 87`uoffset_t`, since vtables may be stored anywhere relative to the object. 88This offset is substracted (not added) from the object start to arrive at 89the vtable start. This offset is followed by all the 90fields as aligned scalars (or offsets). Unlike structs, not all fields 91need to be present. There is no set order and layout. A table may contain 92field offsets that point to the same value if the user explicitly 93serializes the same offset twice. 94 95To be able to access fields regardless of these uncertainties, we go 96through a vtable of offsets. Vtables are shared between any objects that 97happen to have the same vtable values. 98 99The elements of a vtable are all of type `voffset_t`, which is 100a `uint16_t`. The first element is the size of the vtable in bytes, 101including the size element. The second one is the size of the object, in bytes 102(including the vtable offset). This size could be used for streaming, to know 103how many bytes to read to be able to access all *inline* fields of the object. 104The remaining elements are the N offsets, where N is the amount of fields 105declared in the schema when the code that constructed this buffer was 106compiled (thus, the size of the table is N + 2). 107 108All accessor functions in the generated code for tables contain the 109offset into this table as a constant. This offset is checked against the 110first field (the number of elements), to protect against newer code 111reading older data. If this offset is out of range, or the vtable entry 112is 0, that means the field is not present in this object, and the 113default value is return. Otherwise, the entry is used as offset to the 114field to be read. 115 116### Unions 117 118Unions are encoded as the combination of two fields: an enum representing the 119union choice and the offset to the actual element. FlatBuffers reserves the 120enumeration constant `NONE` (encoded as 0) to mean that the union field is not 121set. 122 123### Strings and Vectors 124 125Strings are simply a vector of bytes, and are always 126null-terminated. Vectors are stored as contiguous aligned scalar 127elements prefixed by a 32bit element count (not including any 128null termination). Neither is stored inline in their parent, but are referred to 129by offset. A vector may consist of more than one offset pointing to the same 130value if the user explicitly serializes the same offset twice. 131 132### Construction 133 134The current implementation constructs these buffers backwards (starting 135at the highest memory address of the buffer), since 136that significantly reduces the amount of bookkeeping and simplifies the 137construction API. 138 139### Code example 140 141Here's an example of the code that gets generated for the `samples/monster.fbs`. 142What follows is the entire file, broken up by comments: 143 144 // automatically generated, do not modify 145 146 #include "flatbuffers/flatbuffers.h" 147 148 namespace MyGame { 149 namespace Sample { 150 151Nested namespace support. 152 153 enum { 154 Color_Red = 0, 155 Color_Green = 1, 156 Color_Blue = 2, 157 }; 158 159 inline const char **EnumNamesColor() { 160 static const char *names[] = { "Red", "Green", "Blue", nullptr }; 161 return names; 162 } 163 164 inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; } 165 166Enums and convenient reverse lookup. 167 168 enum { 169 Any_NONE = 0, 170 Any_Monster = 1, 171 }; 172 173 inline const char **EnumNamesAny() { 174 static const char *names[] = { "NONE", "Monster", nullptr }; 175 return names; 176 } 177 178 inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; } 179 180Unions share a lot with enums. 181 182 struct Vec3; 183 struct Monster; 184 185Predeclare all data types since circular references between types are allowed 186(circular references between object are not, though). 187 188 FLATBUFFERS_MANUALLY_ALIGNED_STRUCT(4) Vec3 { 189 private: 190 float x_; 191 float y_; 192 float z_; 193 194 public: 195 Vec3(float x, float y, float z) 196 : x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {} 197 198 float x() const { return flatbuffers::EndianScalar(x_); } 199 float y() const { return flatbuffers::EndianScalar(y_); } 200 float z() const { return flatbuffers::EndianScalar(z_); } 201 }; 202 FLATBUFFERS_STRUCT_END(Vec3, 12); 203 204These ugly macros do a couple of things: they turn off any padding the compiler 205might normally do, since we add padding manually (though none in this example), 206and they enforce alignment chosen by FlatBuffers. This ensures the layout of 207this struct will look the same regardless of compiler and platform. Note that 208the fields are private: this is because these store little endian scalars 209regardless of platform (since this is part of the serialized data). 210`EndianScalar` then converts back and forth, which is a no-op on all current 211mobile and desktop platforms, and a single machine instruction on the few 212remaining big endian platforms. 213 214 struct Monster : private flatbuffers::Table { 215 const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); } 216 int16_t mana() const { return GetField<int16_t>(6, 150); } 217 int16_t hp() const { return GetField<int16_t>(8, 100); } 218 const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); } 219 const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); } 220 int8_t color() const { return GetField<int8_t>(16, 2); } 221 }; 222 223Tables are a bit more complicated. A table accessor struct is used to point at 224the serialized data for a table, which always starts with an offset to its 225vtable. It derives from `Table`, which contains the `GetField` helper functions. 226GetField takes a vtable offset, and a default value. It will look in the vtable 227at that offset. If the offset is out of bounds (data from an older version) or 228the vtable entry is 0, the field is not present and the default is returned. 229Otherwise, it uses the entry as an offset into the table to locate the field. 230 231 struct MonsterBuilder { 232 flatbuffers::FlatBufferBuilder &fbb_; 233 flatbuffers::uoffset_t start_; 234 void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); } 235 void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); } 236 void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); } 237 void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); } 238 void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); } 239 void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); } 240 MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); } 241 flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); } 242 }; 243 244`MonsterBuilder` is the base helper struct to construct a table using a 245`FlatBufferBuilder`. You can add the fields in any order, and the `Finish` 246call will ensure the correct vtable gets generated. 247 248 inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb, 249 const Vec3 *pos, int16_t mana, 250 int16_t hp, 251 flatbuffers::Offset<flatbuffers::String> name, 252 flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory, 253 int8_t color) { 254 MonsterBuilder builder_(_fbb); 255 builder_.add_inventory(inventory); 256 builder_.add_name(name); 257 builder_.add_pos(pos); 258 builder_.add_hp(hp); 259 builder_.add_mana(mana); 260 builder_.add_color(color); 261 return builder_.Finish(); 262 } 263 264`CreateMonster` is a convenience function that calls all functions in 265`MonsterBuilder` above for you. Note that if you pass values which are 266defaults as arguments, it will not actually construct that field, so 267you can probably use this function instead of the builder class in 268almost all cases. 269 270 inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); } 271 272This function is only generated for the root table type, to be able to 273start traversing a FlatBuffer from a raw buffer pointer. 274 275 }; // namespace MyGame 276 }; // namespace Sample 277 278### Encoding example. 279 280Below is a sample encoding for the following JSON corresponding to the above 281schema: 282 283 { pos: { x: 1, y: 2, z: 3 }, name: "fred", hp: 50 } 284 285Resulting in this binary buffer: 286 287 // Start of the buffer: 288 uint32_t 20 // Offset to the root table. 289 290 // Start of the vtable. Not shared in this example, but could be: 291 uint16_t 16 // Size of table, starting from here. 292 uint16_t 22 // Size of object inline data. 293 uint16_t 4, 0, 20, 16, 0, 0 // Offsets to fields from start of (root) table, 0 for not present. 294 295 // Start of the root table: 296 int32_t 16 // Offset to vtable used (default negative direction) 297 float 1, 2, 3 // the Vec3 struct, inline. 298 uint32_t 8 // Offset to the name string. 299 int16_t 50 // hp field. 300 int16_t 0 // Padding for alignment. 301 302 // Start of name string: 303 uint32_t 4 // Length of string. 304 int8_t 'f', 'r', 'e', 'd', 0, 0, 0, 0 // Text + 0 termination + padding. 305 306Note that this not the only possible encoding, since the writer has some 307flexibility in which of the children of root object to write first (though in 308this case there's only one string), and what order to write the fields in. 309Different orders may also cause different alignments to happen. 310 311### Additional reading. 312 313The author of the C language implementation has made a similar 314[document](https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#flatbuffers-binary-format) 315that may further help clarify the format. 316 317# FlexBuffers 318 319The [schema-less](@ref flexbuffers) version of FlatBuffers have their 320own encoding, detailed here. 321 322It shares many properties mentioned above, in that all data is accessed 323over offsets, all scalars are aligned to their own size, and 324all data is always stored in little endian format. 325 326One difference is that FlexBuffers are built front to back, so children are 327stored before parents, and the root of the data starts at the last byte. 328 329Another difference is that scalar data is stored with a variable number of bits 330(8/16/32/64). The current width is always determined by the *parent*, i.e. if 331the scalar sits in a vector, the vector determines the bit width for all 332elements at once. Selecting the minimum bit width for a particular vector is 333something the encoder does automatically and thus is typically of no concern 334to the user, though being aware of this feature (and not sticking a double in 335the same vector as a bunch of byte sized elements) is helpful for efficiency. 336 337Unlike FlatBuffers there is only one kind of offset, and that is an unsigned 338integer indicating the number of bytes in a negative direction from the address 339of itself (where the offset is stored). 340 341### Vectors 342 343The representation of the vector is at the core of how FlexBuffers works (since 344maps are really just a combination of 2 vectors), so it is worth starting there. 345 346As mentioned, a vector is governed by a single bit width (supplied by its 347parent). This includes the size field. For example, a vector that stores the 348integer values `1, 2, 3` is encoded as follows: 349 350 uint8_t 3, 1, 2, 3, 4, 4, 4 351 352The first `3` is the size field, and is placed before the vector (an offset 353from the parent to this vector points to the first element, not the size 354field, so the size field is effectively at index -1). 355Since this is an untyped vector `SL_VECTOR`, it is followed by 3 type 356bytes (one per element of the vector), which are always following the vector, 357and are always a uint8_t even if the vector is made up of bigger scalars. 358 359A vector may include more than one offset pointing to the same value if the 360user explicitly serializes the same offset twice. 361 362### Types 363 364A type byte is made up of 2 components (see flexbuffers.h for exact values): 365 366* 2 lower bits representing the bit-width of the child (8, 16, 32, 64). 367 This is only used if the child is accessed over an offset, such as a child 368 vector. It is ignored for inline types. 369* 6 bits representing the actual type (see flexbuffers.h). 370 371Thus, in this example `4` means 8 bit child (value 0, unused, since the value is 372in-line), type `SL_INT` (value 1). 373 374### Typed Vectors 375 376These are like the Vectors above, but omit the type bytes. The type is instead 377determined by the vector type supplied by the parent. Typed vectors are only 378available for a subset of types for which these savings can be significant, 379namely inline signed/unsigned integers (`TYPE_VECTOR_INT` / `TYPE_VECTOR_UINT`), 380floats (`TYPE_VECTOR_FLOAT`), and keys (`TYPE_VECTOR_KEY`, see below). 381 382Additionally, for scalars, there are fixed length vectors of sizes 2 / 3 / 4 383that don't store the size (`TYPE_VECTOR_INT2` etc.), for an additional savings 384in space when storing common vector or color data. 385 386### Scalars 387 388FlexBuffers supports integers (`TYPE_INT` and `TYPE_UINT`) and floats 389(`TYPE_FLOAT`), available in the bit-widths mentioned above. They can be stored 390both inline and over an offset (`TYPE_INDIRECT_*`). 391 392The offset version is useful to encode costly 64bit (or even 32bit) quantities 393into vectors / maps of smaller sizes, and to share / repeat a value multiple 394times. 395 396### Booleans and Nulls 397 398Booleans (`TYPE_BOOL`) and nulls (`TYPE_NULL`) are encoded as inlined unsigned integers. 399 400### Blobs, Strings and Keys. 401 402A blob (`TYPE_BLOB`) is encoded similar to a vector, with one difference: the 403elements are always `uint8_t`. The parent bit width only determines the width of 404the size field, allowing blobs to be large without the elements being large. 405 406Strings (`TYPE_STRING`) are similar to blobs, except they have an additional 0 407termination byte for convenience, and they MUST be UTF-8 encoded (since an 408accessor in a language that does not support pointers to UTF-8 data may have to 409convert them to a native string type). 410 411A "Key" (`TYPE_KEY`) is similar to a string, but doesn't store the size 412field. They're so named because they are used with maps, which don't care 413for the size, and can thus be even more compact. Unlike strings, keys cannot 414contain bytes of value 0 as part of their data (size can only be determined by 415`strlen`), so while you can use them outside the context of maps if you so 416desire, you're usually better off with strings. 417 418### Maps 419 420A map (`TYPE_MAP`) is like an (untyped) vector, but with 2 prefixes before the 421size field: 422 423| index | field | 424| ----: | :----------------------------------------------------------- | 425| -3 | An offset to the keys vector (may be shared between tables). | 426| -2 | Byte width of the keys vector. | 427| -1 | Size (from here on it is compatible with `TYPE_VECTOR`) | 428| 0 | Elements. | 429| Size | Types. | 430 431Since a map is otherwise the same as a vector, it can be iterated like 432a vector (which is probably faster than lookup by key). 433 434The keys vector is a typed vector of keys. Both the keys and corresponding 435values *have* to be stored in sorted order (as determined by `strcmp`), such 436that lookups can be made using binary search. 437 438The reason the key vector is a seperate structure from the value vector is 439such that it can be shared between multiple value vectors, and also to 440allow it to be treated as its own individual vector in code. 441 442An example map { foo: 13, bar: 14 } would be encoded as: 443 444 0 : uint8_t 'b', 'a', 'r', 0 445 4 : uint8_t 'f', 'o', 'o', 0 446 8 : uint8_t 2 // key vector of size 2 447 // key vector offset points here 448 9 : uint8_t 9, 6 // offsets to bar_key and foo_key 449 11: uint8_t 2, 1 // offset to key vector, and its byte width 450 13: uint8_t 2 // value vector of size 451 // value vector offset points here 452 14: uint8_t 14, 13 // values 453 16: uint8_t 4, 4 // types 454 455### The root 456 457As mentioned, the root starts at the end of the buffer. 458The last uint8_t is the width in bytes of the root (normally the parent 459determines the width, but the root has no parent). The uint8_t before this is 460the type of the root, and the bytes before that are the root value (of the 461number of bytes specified by the last byte). 462 463So for example, the integer value `13` as root would be: 464 465 uint8_t 13, 4, 1 // Value, type, root byte width. 466 467 468<br> 469