# Binary formats

![conversion between JSON and binary formats](images/binary.png)

Several formats encode JSON values in binary form. Compared to JSON text, they reduce both the size of the encoded value and the effort required to parse it. The library implements three of these formats:

- [CBOR](https://tools.ietf.org/html/rfc7049) (Concise Binary Object Representation)
- [MessagePack](https://msgpack.org)
- [UBJSON](http://ubjson.org) (Universal Binary JSON)

## Interface

### JSON to binary format

For each format, the `to_*` functions (i.e., `to_cbor`, `to_msgpack`, and `to_ubjson`) convert a JSON value into the respective binary format. Taking CBOR as an example, the concrete prototypes are:

```cpp
static std::vector<uint8_t> to_cbor(const basic_json& j);                    // 1
static void to_cbor(const basic_json& j, detail::output_adapter<uint8_t> o); // 2
static void to_cbor(const basic_json& j, detail::output_adapter<char> o);    // 3
```

The first function creates a byte vector from the given JSON value. The second and third functions write to an output adapter of `uint8_t` and `char`, respectively. Output adapters are implemented for strings, output streams, and vectors.

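To illustrate why an encoder is written against an output adapter rather than a concrete container, here is a minimal sketch of the same idea using only the standard library. The names `byte_sink`, `vector_sink`, `string_sink`, `stream_sink`, and `write_cbor_text_a` are hypothetical and are not the library's `detail::output_adapter` API; the example simply writes the two-byte CBOR encoding of the string `"a"` to three different targets.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical byte sink illustrating the output adapter idea.
// This is NOT nlohmann::detail::output_adapter.
struct byte_sink {
    virtual ~byte_sink() = default;
    virtual void write_byte(std::uint8_t b) = 0;
};

// Appends bytes to a std::vector<std::uint8_t>.
struct vector_sink : byte_sink {
    std::vector<std::uint8_t>& out;
    explicit vector_sink(std::vector<std::uint8_t>& v) : out(v) {}
    void write_byte(std::uint8_t b) override { out.push_back(b); }
};

// Appends bytes to a std::string, storing each byte as a char.
struct string_sink : byte_sink {
    std::string& out;
    explicit string_sink(std::string& s) : out(s) {}
    void write_byte(std::uint8_t b) override { out.push_back(static_cast<char>(b)); }
};

// Writes bytes to any std::ostream (file, string stream, ...).
struct stream_sink : byte_sink {
    std::ostream& out;
    explicit stream_sink(std::ostream& os) : out(os) {}
    void write_byte(std::uint8_t b) override { out.put(static_cast<char>(b)); }
};

// The encoder only sees the abstract sink, so one routine serves every target.
// It writes the CBOR encoding of the text string "a".
void write_cbor_text_a(byte_sink& sink) {
    sink.write_byte(0x61);  // major type 3 (text string), length 1
    sink.write_byte(0x61);  // the character 'a'
}
```

Because 0x61 is also the ASCII code of `a`, the string and the stream both end up containing `"aa"`, while the vector holds the bytes `{0x61, 0x61}`.
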
Given a JSON value `j`, the following calls are possible:

```cpp
std::vector<uint8_t> v;
v = json::to_cbor(j);   // 1

json::to_cbor(j, v);    // 2

std::string s;
json::to_cbor(j, s);    // 3

std::ostringstream oss;
json::to_cbor(j, oss);  // 3
```

### Binary format to JSON

Likewise, the `from_*` functions (i.e., `from_cbor`, `from_msgpack`, and `from_ubjson`) convert a binary encoded value into a JSON value. Taking CBOR as an example, the concrete prototypes are:

```cpp
static basic_json from_cbor(detail::input_adapter i, const bool strict = true); // 1
template<typename A1, typename A2>
static basic_json from_cbor(A1 && a1, A2 && a2, const bool strict = true);      // 2
```

Both functions read from an input adapter: the first function takes it directly from argument `i`, whereas the second function creates it from the provided arguments `a1` and `a2`. If the optional parameter `strict` is `true` (the default), the input must be read completely; otherwise, a parse error exception is thrown. If it is `false`, parsing succeeds even if the input is not completely read.

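The effect of `strict` can be sketched with a toy decoder that understands only CBOR unsigned integers (major type 0). The function `decode_cbor_uint` is hypothetical and much simpler than the library's parser; it only mirrors the rule above: in strict mode, trailing bytes after the decoded value are an error, whereas in non-strict mode they are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy decoder for a single CBOR unsigned integer (major type 0), used only to
// illustrate the meaning of `strict`. It is not the library's parser.
std::uint64_t decode_cbor_uint(const std::vector<std::uint8_t>& in, bool strict = true) {
    if (in.empty()) {
        throw std::runtime_error("unexpected end of input");
    }
    const std::uint8_t initial = in[0];
    if ((initial >> 5) != 0) {
        throw std::runtime_error("not an unsigned integer");
    }
    const std::uint8_t info = initial & 0x1F;  // low 5 bits: additional information
    std::size_t extra = 0;                     // number of big-endian bytes that follow
    if (info >= 24 && info <= 27) {
        extra = std::size_t{1} << (info - 24);  // 24->1, 25->2, 26->4, 27->8 bytes
    } else if (info > 27) {
        throw std::runtime_error("unsupported additional information");
    }
    if (in.size() < 1 + extra) {
        throw std::runtime_error("unexpected end of input");
    }
    // Values 0..23 live in the initial byte; larger ones follow it.
    std::uint64_t value = (extra == 0) ? info : 0;
    for (std::size_t i = 1; i <= extra; ++i) {
        value = (value << 8) | in[i];
    }
    // In strict mode, every input byte must belong to the decoded value.
    if (strict && 1 + extra != in.size()) {
        throw std::runtime_error("expected end of input");
    }
    return value;
}
```

For example, the input `{0x05, 0xFF}` decodes to 5 with `strict = false`, but throws with `strict = true` because the byte 0xFF is left unread.
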
Input adapters are implemented for input streams, character buffers, string literals, and iterator ranges.

Given several inputs (which we assume to be filled with a CBOR value), the following calls are possible:

```cpp
std::string s;
json j1 = json::from_cbor(s);                         // 1

std::ifstream is("somefile.cbor", std::ios::binary);
json j2 = json::from_cbor(is);                        // 1

std::vector<uint8_t> v;
json j3 = json::from_cbor(v);                         // 1

const char* buff;
std::size_t buff_size;
json j4 = json::from_cbor(buff, buff_size);           // 2
```

## Details

### CBOR

The mapping from CBOR to JSON is **incomplete** in the sense that not all CBOR types can be converted to a JSON value. The following CBOR types are not supported and will yield parse errors (`parse_error.112`):

- byte strings (0x40..0x5F)
- date/time (0xC0..0xC1)
- bignum (0xC2..0xC3)
- decimal fraction (0xC4)
- bigfloat (0xC5)
- tagged items (0xC6..0xD4, 0xD8..0xDB)
- expected conversions (0xD5..0xD7)
- simple values (0xE0..0xF3, 0xF8)
- undefined (0xF7)

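Each entry in the list above corresponds to a range of CBOR initial bytes, so such items can be recognized by inspecting a single byte. The following sketch turns the list into a predicate; the function name is hypothetical, and this is not how the library is structured internally.

```cpp
#include <cstdint>

// Returns true if a CBOR initial byte falls into one of the ranges listed
// as unsupported by from_cbor. The ranges are copied from the list above;
// the helper itself is illustrative only.
bool is_unsupported_cbor_initial_byte(std::uint8_t b) {
    return (b >= 0x40 && b <= 0x5F)   // byte strings
        || (b >= 0xC0 && b <= 0xC1)   // date/time
        || (b >= 0xC2 && b <= 0xC3)   // bignum
        || b == 0xC4                  // decimal fraction
        || b == 0xC5                  // bigfloat
        || (b >= 0xC6 && b <= 0xD4)   // tagged items
        || (b >= 0xD5 && b <= 0xD7)   // expected conversions
        || (b >= 0xD8 && b <= 0xDB)   // tagged items
        || (b >= 0xE0 && b <= 0xF3)   // simple values
        || b == 0xF8                  // simple value (one-byte extension)
        || b == 0xF7;                 // undefined
}
```

For instance, 0x40 (an empty byte string) and 0xF7 (undefined) are flagged, whereas 0x61 (a one-byte text string) and 0xF5 (`true`) are not.
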
CBOR further allows map keys of any type, whereas JSON only allows strings as keys in object values. Therefore, CBOR maps with keys other than UTF-8 strings are rejected (`parse_error.113`).

The mapping from JSON to CBOR is **complete** in the sense that any JSON value type can be converted to a CBOR value.

If NaN or Infinity is stored inside a JSON number, it is serialized properly. This behavior differs from the `dump()` function, which serializes NaN and Infinity as `null`.

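The list below states that half- and single-precision floats are not used, so a JSON floating-point number ends up as a CBOR double (initial byte 0xFB). NaN and Infinity are ordinary IEEE 754 bit patterns, which is why they can be stored without loss. The following is a minimal sketch of that encoding, assuming a 64-bit IEEE 754 `double`; the function name is hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>  // std::numeric_limits, handy for producing NaN/Infinity inputs
#include <vector>

// Encodes a double as a CBOR double-precision float: initial byte 0xFB
// followed by the 8 bytes of the IEEE 754 binary64 value in big-endian order.
// NaN and Infinity keep their bit patterns. Sketch, not the library's encoder.
std::vector<std::uint8_t> encode_cbor_double(double d) {
    static_assert(sizeof(double) == 8, "expects 64-bit IEEE 754 doubles");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);  // copy the bit pattern without aliasing issues
    std::vector<std::uint8_t> out{0xFB};
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(bits >> shift));  // most significant byte first
    }
    return out;
}
```

`encode_cbor_double(1.5)` yields `FB 3F F8 00 00 00 00 00 00`, and positive infinity yields `FB 7F F0 00 00 00 00 00 00`.
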
The following CBOR types are not used in the conversion:

- byte strings (0x40..0x5F)
- UTF-8 strings terminated by "break" (0x7F)
- arrays terminated by "break" (0x9F)
- maps terminated by "break" (0xBF)
- date/time (0xC0..0xC1)
- bignum (0xC2..0xC3)
- decimal fraction (0xC4)
- bigfloat (0xC5)
- tagged items (0xC6..0xD4, 0xD8..0xDB)
- expected conversions (0xD5..0xD7)
- simple values (0xE0..0xF3, 0xF8)
- undefined (0xF7)
- half and single-precision floats (0xF9..0xFA)
- break (0xFF)

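One reason the "break"-terminated forms are not needed is that a serializer knows the size of a string, array, or map before writing it, so it can always emit a definite-length header. The sketch below builds such a header as described in RFC 7049: the major type occupies the top three bits of the initial byte, and the length is stored either in the low five bits or in a following big-endian field. The function `cbor_head` is a hypothetical helper, not library code.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Builds the head of a CBOR data item with a definite length.
// major_type: 2 = byte string, 3 = text string, 4 = array, 5 = map.
// Lengths 0..23 fit into the low 5 bits; larger lengths follow as a
// 1-, 2-, 4-, or 8-byte big-endian number (additional info 24..27).
std::vector<std::uint8_t> cbor_head(std::uint8_t major_type, std::uint64_t length) {
    if (major_type > 7) {
        throw std::invalid_argument("CBOR major types are 0..7");
    }
    const std::uint8_t base = static_cast<std::uint8_t>(major_type << 5);
    if (length < 24) {
        return {static_cast<std::uint8_t>(base | length)};
    }
    std::uint8_t info = 27;  // default: 8-byte length
    int extra_bytes = 8;
    if (length <= 0xFF) {
        info = 24; extra_bytes = 1;
    } else if (length <= 0xFFFF) {
        info = 25; extra_bytes = 2;
    } else if (length <= 0xFFFFFFFF) {
        info = 26; extra_bytes = 4;
    }
    std::vector<std::uint8_t> out{static_cast<std::uint8_t>(base | info)};
    for (int i = extra_bytes - 1; i >= 0; --i) {
        out.push_back(static_cast<std::uint8_t>(length >> (8 * i)));  // big-endian length
    }
    return out;
}
```

For example, `cbor_head(4, 300)` (an array of 300 elements) produces `99 01 2C`, and `cbor_head(3, 5)` (a five-byte text string) produces `65`.
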
### MessagePack

The mapping from MessagePack to JSON is **incomplete** in the sense that not all MessagePack types can be converted to a JSON value. The following MessagePack types are not supported and will yield parse errors:

- bin 8 - bin 32 (0xC4..0xC6)
- ext 8 - ext 32 (0xC7..0xC9)
- fixext 1 - fixext 16 (0xD4..0xD8)

The mapping from JSON to MessagePack is **complete** in the sense that any JSON value type can be converted to a MessagePack value.

The following values cannot be converted to a MessagePack value:

- strings with more than 4294967295 bytes
- arrays with more than 4294967295 elements
- objects with more than 4294967295 elements

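The 4294967295 limit stems from MessagePack's widest length fields, which are 32 bits wide. The sketch below chooses a string header according to the MessagePack specification (fixstr, str 8, str 16, str 32); arrays and maps have analogous 32-bit forms (array 32 and map 32). The function name is hypothetical, and the library may choose headers for short strings differently.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Chooses the MessagePack header for a string of `len` bytes:
// fixstr (0xA0..0xBF) for up to 31 bytes, then str 8 (0xD9),
// str 16 (0xDA), and str 32 (0xDB) with a big-endian length field.
// No format has a wider length field, so longer strings are rejected.
std::vector<std::uint8_t> msgpack_str_header(std::uint64_t len) {
    if (len <= 31) {
        return {static_cast<std::uint8_t>(0xA0 | len)};
    }
    std::uint8_t marker = 0;
    int extra_bytes = 0;
    if (len <= 0xFF) {
        marker = 0xD9; extra_bytes = 1;
    } else if (len <= 0xFFFF) {
        marker = 0xDA; extra_bytes = 2;
    } else if (len <= 0xFFFFFFFF) {
        marker = 0xDB; extra_bytes = 4;
    } else {
        throw std::length_error("string too long for MessagePack");
    }
    std::vector<std::uint8_t> out{marker};
    for (int i = extra_bytes - 1; i >= 0; --i) {
        out.push_back(static_cast<std::uint8_t>(len >> (8 * i)));  // big-endian length
    }
    return out;
}
```

A 70000-byte string, for instance, gets the header `DB 00 01 11 70`, and any length above 4294967295 makes the function throw `std::length_error`.
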
The following MessagePack types are not used in the conversion:

- bin 8 - bin 32 (0xC4..0xC6)
- ext 8 - ext 32 (0xC7..0xC9)
- float 32 (0xCA)
- fixext 1 - fixext 16 (0xD4..0xD8)

Any MessagePack output created by `to_msgpack` can be successfully parsed by `from_msgpack`.

If NaN or Infinity is stored inside a JSON number, it is serialized properly. This behavior differs from the `dump()` function, which serializes NaN and Infinity as `null`.

### UBJSON

The mapping from UBJSON to JSON is **complete** in the sense that any UBJSON value can be converted to a JSON value.

The mapping from JSON to UBJSON is **complete** in the sense that any JSON value type can be converted to a UBJSON value.

The following values cannot be converted to a UBJSON value:

- strings with more than 9223372036854775807 bytes (theoretical)
- unsigned integer numbers above 9223372036854775807

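The integer limit exists because UBJSON's integer types are `i` (int8), `U` (uint8), `I` (int16), `l` (int32), and `L` (int64); none of them is an unsigned 64-bit type. The following is a minimal sketch that picks the smallest fitting marker for an unsigned value; the helper is hypothetical, and a real encoder may prefer different markers.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Picks the smallest UBJSON integer marker able to hold an unsigned value.
// The largest type is int64 ('L'), so values above INT64_MAX
// (9223372036854775807) have no UBJSON integer representation.
char ubjson_marker_for_unsigned(std::uint64_t n) {
    using u64 = std::uint64_t;
    if (n <= static_cast<u64>(std::numeric_limits<std::int8_t>::max())) {
        return 'i';  // int8
    }
    if (n <= static_cast<u64>(std::numeric_limits<std::uint8_t>::max())) {
        return 'U';  // uint8
    }
    if (n <= static_cast<u64>(std::numeric_limits<std::int16_t>::max())) {
        return 'I';  // int16
    }
    if (n <= static_cast<u64>(std::numeric_limits<std::int32_t>::max())) {
        return 'l';  // int32
    }
    if (n <= static_cast<u64>(std::numeric_limits<std::int64_t>::max())) {
        return 'L';  // int64
    }
    throw std::out_of_range("no UBJSON integer type can hold this value");
}
```

With this helper, 200 maps to `U`, 70000 maps to `l`, and 9223372036854775808 has no marker at all.
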
The following markers are not used in the conversion:

- `Z`: no-op values are not created.
- `C`: single-byte strings are serialized with `S` markers.

Any UBJSON output created by `to_ubjson` can be successfully parsed by `from_ubjson`.

If NaN or Infinity is stored inside a JSON number, it is serialized properly. This behavior differs from the `dump()` function, which serializes NaN and Infinity as `null`.

The optimized formats for containers are supported. Parameter `use_size` adds size information to the beginning of a container and removes the closing marker. Parameter `use_type` further checks whether all elements of a container have the same type and, if so, adds the type marker to the beginning of the container. The `use_type` parameter must only be used together with `use_size = true`. Note that `use_size = true` alone may result in larger representations; the benefit of this parameter is that the receiving side is immediately informed of the number of elements in the container.

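The trade-off described above can be made concrete by counting bytes. The sketch below writes an array of small unsigned integers in the three layouts. To keep it short, it assumes that every element and the element count use the uint8 marker `U`; the enum and function names are hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// The three UBJSON container layouts discussed above.
enum class ubjson_layout { unoptimized, size_optimized, type_optimized };

// Encodes an array of values 0..255. Simplifying assumption: elements and
// the element count are always written with the uint8 marker 'U'.
std::vector<std::uint8_t> ubjson_uint8_array(const std::vector<std::uint8_t>& values,
                                             ubjson_layout layout) {
    if (values.size() > 255) {
        throw std::length_error("sketch only writes counts up to 255");
    }
    std::vector<std::uint8_t> out{'['};
    if (layout == ubjson_layout::type_optimized) {
        out.push_back('$');  // element type, given once up front
        out.push_back('U');
    }
    if (layout != ubjson_layout::unoptimized) {
        out.push_back('#');  // element count, given up front
        out.push_back('U');
        out.push_back(static_cast<std::uint8_t>(values.size()));
    }
    for (std::uint8_t v : values) {
        if (layout != ubjson_layout::type_optimized) {
            out.push_back('U');  // per-element type marker
        }
        out.push_back(v);
    }
    if (layout == ubjson_layout::unoptimized) {
        out.push_back(']');  // closing marker only needed without a count
    }
    return out;
}
```

For a three-element array, the layouts take 8 (unoptimized), 10 (size-optimized), and 9 (type-optimized) bytes, so `use_size` alone makes the output larger. For ten elements, the sizes are 22, 24, and 16 bytes, so the type optimization pays off once the saved per-element markers outweigh the longer header.
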
## Size comparison examples

The following table shows, for several files from the repository, the size of each binary encoding relative to the original JSON value.

| format                  | sample.json | all_unicode.json | floats.json | signed_ints.json | jeopardy.json | canada.json |
| ----------------------- | -----------:| ----------------:| -----------:| ----------------:| -------------:| -----------:|
| JSON                    |    100.00 % |         100.00 % |    100.00 % |         100.00 % |      100.00 % |    100.00 % |
| CBOR                    |     87.21 % |          71.18 % |     48.20 % |          44.16 % |       87.96 % |     50.53 % |
| MessagePack             |     87.16 % |          71.18 % |     48.20 % |          44.16 % |       87.91 % |     50.56 % |
| UBJSON unoptimized      |     88.15 % |         100.00 % |     48.20 % |          44.16 % |       96.58 % |     53.20 % |
| UBJSON size-optimized   |     89.26 % |         100.00 % |     48.20 % |          44.16 % |       97.40 % |     58.56 % |
| UBJSON format-optimized |     89.45 % |         100.00 % |     42.85 % |          39.26 % |       94.96 % |     55.93 % |

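Each entry in the table is a ratio of an encoded size to the size of the JSON text. Assuming the percentages are computed as encoded bytes divided by JSON bytes and rounded to two decimals, a helper could look as follows. The function name is hypothetical, and the byte counts in the usage below are invented for illustration, not measured from the files above.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Relative size of an encoding: encoded bytes / JSON bytes * 100,
// formatted with two decimals and a percent sign, e.g. "50.00 %".
std::string relative_size(std::size_t encoded_bytes, std::size_t json_bytes) {
    if (json_bytes == 0) {
        throw std::invalid_argument("JSON size must be positive");
    }
    const double percent =
        100.0 * static_cast<double>(encoded_bytes) / static_cast<double>(json_bytes);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.2f %%", percent);
    return std::string(buf);
}
```

For example, `relative_size(500, 1000)` returns `"50.00 %"` and `relative_size(1, 3)` returns `"33.33 %"`.
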
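The results show that there is no single "best" encoding. Furthermore, UBJSON's optimizations are not always worthwhile: in this table, the size-optimized variant is never smaller than the unoptimized one, and the format-optimized variant is larger than the unoptimized one for sample.json and canada.json.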
172