# RFC8949 CBOR Stream Parsing and Writing

|||
|---|---|
|cmake| `LWS_WITH_CBOR`, `LWS_WITH_CBOR_FLOAT`|
|Header| ./include/libwebsockets/lws-lecp.h|
|api-test| ./minimal-examples/api-tests/api-test-lecp/|
|test app| ./test-apps/test-lecp.c -> libwebsockets-test-lecp|

LECP is the RFC8949 CBOR stream parsing counterpart to LEJP for JSON.

## Features

 - Completely immune to input fragmentation: give it any size blocks of CBOR as
   they become available; 1 byte or 100K at a time gives identical parsing
   results
 - Input chunks are discarded as they are parsed, whole CBOR never needed in
   memory
 - Nonrecursive, fixed stack usage of a few dozen bytes
 - No heap allocations at all, just requires ~500 byte context usually on
   caller stack
 - Creates callbacks to a user-provided handler as members are parsed out
 - No payload size limit, supports huge / endless strings or blobs bigger than
   system memory
 - Collates utf-8 text and blob payloads into a 250-byte chunk buffer for ease
   of access
 - Write apis don't use any heap allocations or recursion either
 - Write apis use an explicit context with its own lifecycle, and printf style
   varargs including sized blobs, C strings, double, int, unsigned long etc
 - Completely immune to output fragmentation, supports huge strings and blobs
   into small buffers; the api return indicates it is unfinished if it needs to
   be called again to continue; a 1 byte or 100K output buffer gives the same
   results
 - Write apis completely fill the available buffer and, if unfinished, continue
   into the same or a different buffer when called again with the same args; no
   requirement for subsequent calls to be done sequentially or even from the
   same function

## Type limits

CBOR allows negative integers with up to 64 bits of magnitude, as low as -2^64;
these fit neither a `uint64_t` nor, below -2^63, an `int64_t`.
LECP has a union for numbers that includes the types `uint64_t` and `int64_t`,
but it does not separately handle negative integers.  Only -2^63 .. 2^64 - 1 can
be handled by the C types; the oversize negative numbers wrap and should be
avoided.

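To make the limit concrete: RFC8949 encodes a negative integer as an unsigned argument `n` that stands for `-1 - n`. The sketch below is standalone C and is not taken from LECP; `cbor_nint_to_i64()` is a hypothetical helper name. It shows which arguments convert safely to an `int64_t` and which fall in the oversize range.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Convert a CBOR negative-integer argument n (meaning -1 - n) to int64_t.
 * Returns false for the oversize range below INT64_MIN, which cannot be
 * represented in the C type and would otherwise wrap.
 */
bool
cbor_nint_to_i64(uint64_t n, int64_t *out)
{
	if (n > (uint64_t)INT64_MAX)
		return false; /* -1 - n would be below -2^63 */

	*out = -1 - (int64_t)n;

	return true;
}
```

A caller that may receive such values could apply a check like this to the raw argument before trusting a signed result.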
## Floating point support

Floats are handled using the IEEE memory format, which means they can be parsed
from the CBOR without needing any floating point support in the build.  If
floating point is available, you can also enable `LWS_WITH_CBOR_FLOAT`, and
`float` and `double` types are then available in the number item union.
Otherwise these are handled as `ctx->item.u.u32` and `ctx->item.u.u64` union
members.

Half-float (16-bit) is defined in CBOR and always handled as a `uint16_t`
number union member `ctx->item.u.hf`.

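Since the half-float arrives as a raw `uint16_t`, user code that wants a numeric value has to expand it itself. The following is a minimal sketch of that expansion, adapted from the decoding approach in RFC8949 Appendix D; `half_to_double()` is a hypothetical helper name, not an lws api, and it assumes the host has `double` support.

```c
#include <math.h>	/* INFINITY, NAN */
#include <stdint.h>

/* Expand an IEEE 754 binary16 value held in a host-endian uint16_t. */
double
half_to_double(uint16_t hf)
{
	unsigned int exp = (hf >> 10) & 0x1f;	/* 5-bit exponent */
	unsigned int mant = hf & 0x3ff;		/* 10-bit mantissa */
	double val;
	int e;

	if (exp == 31) {
		val = mant ? NAN : INFINITY;
	} else {
		/* subnormals (exp 0) have no implicit leading 1 */
		val = exp ? (double)(mant + 1024) : (double)mant;
		e = exp ? (int)exp - 25 : -24;
		for (; e > 0; e--)
			val *= 2.0;
		for (; e < 0; e++)
			val /= 2.0;
	}

	return (hf & 0x8000) ? -val : val;
}
```

With this, a `LECPCB_VAL_FLOAT16` handler could pass `ctx->item.u.hf` to the helper to obtain a `double`.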
## Callback reasons

The user callback is not required to handle any particular callback reason; it
only needs to process the data for the ones it is interested in.

|Callback reason|CBOR structure|Associated data|
|---|---|---|
|`LECPCB_CONSTRUCTED`|Created the parse context||
|`LECPCB_DESTRUCTED`|Destroyed the parse context||
|`LECPCB_COMPLETE`|The parsing completed OK||
|`LECPCB_FAILED`|The parsing failed||
|`LECPCB_VAL_TRUE`|boolean true||
|`LECPCB_VAL_FALSE`|boolean false||
|`LECPCB_VAL_NULL`|explicit NULL||
|`LECPCB_VAL_NUM_INT`|signed integer|`ctx->item.u.i64`|
|`LECPCB_VAL_STR_START`|A UTF-8 string is starting||
|`LECPCB_VAL_STR_CHUNK`|The next string chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_VAL_STR_END`|The last string chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_ARRAY_START`|An array is starting||
|`LECPCB_ARRAY_END`|An array has ended||
|`LECPCB_OBJECT_START`|A CBOR map is starting||
|`LECPCB_OBJECT_END`|A CBOR map has ended||
|`LECPCB_TAG_START`|The following data has a tag index|`ctx->item.u.u64`|
|`LECPCB_TAG_END`|The end of the data referenced by the last tag||
|`LECPCB_VAL_NUM_UINT`|Unsigned integer|`ctx->item.u.u64`|
|`LECPCB_VAL_UNDEFINED`|CBOR undefined||
|`LECPCB_VAL_FLOAT16`|half-float available as host-endian `uint16_t`|`ctx->item.u.hf`|
|`LECPCB_VAL_FLOAT32`|`float` (`uint32_t` if no float support) available|`ctx->item.u.f`|
|`LECPCB_VAL_FLOAT64`|`double` (`uint64_t` if no float support) available|`ctx->item.u.d`|
|`LECPCB_VAL_SIMPLE`|CBOR simple|`ctx->item.u.u64`|
|`LECPCB_VAL_BLOB_START`|A binary blob is starting||
|`LECPCB_VAL_BLOB_CHUNK`|The next blob chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_VAL_BLOB_END`|The last blob chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_ARRAY_ITEM_START`|A logical item in an array is starting||
|`LECPCB_ARRAY_ITEM_END`|A logical item in an array has completed||

## CBOR indeterminate lengths

Indeterminate lengths are supported, but are concealed in the parser as far as
possible.  The CBOR lengths and their indeterminacy are not exposed in the
callback interface at all, just chunks of data that may be the start, the
middle, or the end.

## Handling CBOR UTF-8 strings and blobs

When a string or blob is parsed, an advisory callback of `LECPCB_VAL_STR_START` or
`LECPCB_VAL_BLOB_START` occurs first.  The `_STR_` callbacks indicate the
content is a CBOR UTF-8 string, `_BLOB_` indicates it is binary data.

Strings or blobs may have indeterminate length, but if so, they are composed
of logical chunks which must have known lengths.  When the `_START` callback
occurs, the logical length either of the whole string, or of the sub-chunk if
indeterminate length, can be found in `ctx->item.u.u64`.

Payload is collated into `ctx->buf[]`, the valid length is in `ctx->npos`.

For short strings or blobs where the length is known, the whole payload is
delivered in a single `LECPCB_VAL_STR_END` or `LECPCB_VAL_BLOB_END` callback.

For payloads larger than the size of `ctx->buf[]`, `LECPCB_VAL_STR_CHUNK` or
`LECPCB_VAL_BLOB_CHUNK` callbacks occur delivering each sequential bufferload.
If the CBOR indicates the total length, the last chunk is delivered in a
`LECPCB_VAL_STR_END` or `LECPCB_VAL_BLOB_END`.

If the CBOR indicates the string end after the chunk, a zero-length `..._END`
callback is provided.

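For reference, this is the wire framing that the parser is concealing: in RFC8949, a text string of unknown total length starts with `0x7f`, continues with any number of known-length text chunks, and ends with the "break" byte `0xff`. The sketch below is standalone C, not LECP code, and `indef_text_collect()` is a hypothetical name. Unlike LECP it needs the whole input in memory, and it only handles chunk lengths below 256 to stay short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Concatenate the chunks of an indefinite-length CBOR text string into out.
 * Returns the total payload length, or -1 on malformed or oversize input.
 */
long
indef_text_collect(const uint8_t *in, size_t len, char *out, size_t out_size)
{
	size_t pos = 1, total = 0, clen;

	if (len < 2 || in[0] != 0x7f) /* text string, unknown length */
		return -1;

	while (pos < len) {
		uint8_t ib = in[pos++];

		if (ib == 0xff) /* "break" ends the string */
			return (long)total;

		if ((ib >> 5) != 3) /* each chunk must itself be a text string */
			return -1;

		clen = ib & 0x1f;
		if (clen == 24) { /* length is in the next byte */
			if (pos >= len)
				return -1;
			clen = in[pos++];
		} else if (clen > 24) {
			return -1; /* longer or nested forms not handled here */
		}

		if (clen > len - pos || clen > out_size - total)
			return -1;

		memcpy(out + total, in + pos, clen);
		pos += clen;
		total += clen;
	}

	return -1; /* input ended before the break */
}
```

For example, the seven bytes `7F 62 61 62 61 63 FF` carry the chunks "ab" and "c", which this helper joins into "abc".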
## Handling CBOR tags

CBOR tags are exposed as `LECPCB_TAG_START` and `LECPCB_TAG_END` pairs; at
the `_START` callback the tag index is available in `ctx->item.u.u64`.

## CBOR maps

You can check if you are on the "key" part of a map "key:value" pair using the
helper api `lecp_parse_map_is_key(ctx)`.

## Parsing paths

LECP maintains a "parsing path" in `ctx->path` that represents the context of
the callback events.  As a convenience, at LECP context creation time, you can
pass in an array of path strings you want to match on, and have any match
checkable in the callback using `ctx->path_match`; it is 0 if there is no
active match, or the match index from your path array, starting from 1 for the
first entry.

|CBOR element|Representation in path|
|---|---|
|CBOR Array|`[]`|
|CBOR Map|`.`|
|CBOR Map entry key string|`keystring`|

## Accessing raw CBOR subtrees

Some CBOR usages like COSE require access to selected raw CBOR from the input
stream.  `lecp_parse_report_raw(ctx, on)` lets you turn on and off buffering of
raw CBOR and reporting it in the parse callback with `LECPCB_LITERAL_CBOR`
callbacks.  The callbacks mean the temp buffer `ctx->cbor[]` has `ctx->cbor_pos`
bytes of raw CBOR available in it.  Callbacks are triggered when the buffer
fills, or when reporting is turned off and the buffer has something in it.

By turning the reporting on and off according to the outer CBOR parsing state,
it's possible to get exactly the raw CBOR subtree that's needed.

Capturing and reporting the raw CBOR does not stop the same CBOR from also
being passed to the parser as usual.

## Comparison with LEJP (JSON parser)

LECP is based on the same principles as LEJP and shares most of the callbacks.
The major differences:

 - LEJP value callbacks all appear in `ctx->buf[]`, ie, floating-point is
   provided to the callback in ascii form like `"1.0"`.  CBOR provides a more
   strict typing system, and the different type values are provided either in
   `ctx->buf[]` for blobs or utf-8 text strings, or the `item.u` union for
   converted types, with additional callback reasons specific to each type.

 - CBOR "maps" use `_OBJECT_START` and `_END` parsing callbacks around the
   key / value pairs.  LEJP has a special callback type `PAIR_NAME` for the
   key string / integer, but in LECP these are provided as generic callbacks
   dependent on type, ie, generic string callbacks or integer ones, and the
   value part is represented according to whatever comes.


# Writing CBOR

CBOR is written into a `lws_lec_pctx_t` object that has been initialized to
point to an output buffer of a specified size, using printf type formatting.

Output is paused if the buffer fills, and the write api may be called again
later with the same context object, to resume emitting to the same or different
buffer.

This allows bufferloads of encoded CBOR to be produced on demand; it's designed
to fit usage in WRITEABLE callbacks and Secure Streams tx() callbacks where the
buffer size for one packet is already fixed.

CBOR array and map lengths are deduced from the format string, as is whether to
use indeterminate length formatting or not.  For indeterminate text or binary
strings, a container of `<t...>` or `<b...>` is used.

|Format|Arg(s)|Meaning|
|---|---|---|
|`123`||unsigned literal number|
|`-123`||signed literal number|
|`%u`|`unsigned int`|number|
|`%lu`|`unsigned long int`|number|
|`%llu`|`unsigned long long int`|number|
|`%d`|`signed int`|number|
|`%ld`|`signed long int`|number|
|`%lld`|`signed long long int`|number|
|`%f`|`double`|floating point number|
|`123(...)`||literal tag and scope|
|`%t(...)`|`unsigned int`|tag and scope|
|`%lt(...)`|`unsigned long int`|tag and scope|
|`%llt(...)`|`unsigned long long int`|tag and scope|
|`[...]`||Array (fixed len if `]` in same format string)|
|`{...}`||Map (fixed len if `}` in same format string)|
|`<t...>`||Container for indeterminate text string frags|
|`<b...>`||Container for indeterminate binary string frags|
|`'string'`||Literal text of known length|
|`%s`|`const char *`|NUL-terminated string|
|`%.*s`|`int`, `const char *`|length-specified string|
|`%.*b`|`int`, `const uint8_t *`|length-specified binary|
|`:`||separator between a Map key and its value (a:b)|
|`,`||separator between Map pairs or array items|

Backslash is used as an escape in `'...'` literal strings, so `'\\'` represents
a string consisting of a single backslash, and `'\''` a string consisting of a
single single-quote.

For integers, various natural C types are available, but in all cases, the
number is represented in CBOR using the smallest valid way based on its value;
the long or long-long modifiers just apply to the expected C type in the args.

For floats, the C argument is always expected to be a `double` type following
C type promotion, but again it is represented in CBOR using the smallest valid
way based on value; half-floats are used for NaN / Infinity and where possible
for values like 0.0 and -1.0.

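What "smallest valid way" means can be sketched in standalone C. This is an illustration of the RFC8949 encoding rules, not the lws implementation: the helper names `cbor_put_head()`, `cbor_put_int()` and `cbor_float_width()` are hypothetical. It assumes `float` and `double` are IEEE 754 binary32 and binary64, and it conservatively reports 4 bytes for values that would only fit a half-float as a subnormal, so its choice may differ from what lws emits in that corner.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a CBOR head: major type plus the shortest encoding of arg. */
size_t
cbor_put_head(uint8_t *out, uint8_t major, uint64_t arg)
{
	uint8_t mt = (uint8_t)(major << 5);
	size_t bytes, i;

	if (arg < 24) { /* argument fits in the initial byte */
		out[0] = (uint8_t)(mt | arg);
		return 1;
	}

	if (arg <= 0xff)
		bytes = 1;
	else if (arg <= 0xffff)
		bytes = 2;
	else if (arg <= 0xffffffff)
		bytes = 4;
	else
		bytes = 8;

	/* additional info 24..27 selects a 1, 2, 4 or 8 byte argument */
	out[0] = (uint8_t)(mt | (bytes == 1 ? 24 : bytes == 2 ? 25 :
				 bytes == 4 ? 26 : 27));
	for (i = 0; i < bytes; i++) /* argument is big-endian */
		out[1 + i] = (uint8_t)(arg >> (8 * (bytes - 1 - i)));

	return 1 + bytes;
}

/* Integers: major type 0 for >= 0, major type 1 with -1 - v otherwise. */
size_t
cbor_put_int(uint8_t *out, int64_t v)
{
	if (v >= 0)
		return cbor_put_head(out, 0, (uint64_t)v);

	return cbor_put_head(out, 1, (uint64_t)(-1 - v));
}

/* Floats: returns 2, 4 or 8, the narrowest width that holds d exactly. */
int
cbor_float_width(double d)
{
	uint32_t bits, mant;
	float f;
	int e;

	if (isnan(d) || isinf(d))
		return 2;
	if (d > FLT_MAX || d < -FLT_MAX)
		return 8; /* out of float range */

	f = (float)d;
	if ((double)f != d)
		return 8; /* precision would be lost as a float */

	memcpy(&bits, &f, sizeof(bits));
	mant = bits & 0x7fffff;
	e = (int)((bits >> 23) & 0xff) - 127;

	if (e == -127 && !mant)
		return 2; /* +0.0 or -0.0 */
	if (mant & 0x1fff)
		return 4; /* needs more than half's 10 mantissa bits */

	/* half normal exponent range; anything else stays a float */
	return (e >= -14 && e <= 15) ? 2 : 4;
}
```

Under these rules `-1` occupies a single byte (`20`) and `1000` three bytes (`19 03 E8`), while `0.0` and `-1.0` fit a half-float and `0.1` needs a full `double`.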
## Examples

### Literal ints

```
	uint8_t buf[128];
	lws_lec_pctx_t cbw;

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(&cbw, "-1");
```

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`cbw.used`|1|
|`buf[]`|20|

### Dynamic ints

```
	uint8_t buf[128];
	lws_lec_pctx_t cbw;
	int n = -1; /* could be long */

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(&cbw, "%d", n); /* use %ld for long */
```

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`cbw.used`|1|
|`buf[]`|20|

### Maps, arrays and dynamic ints

```
	...
	int args[3] = { 1, 2, 3 };

	lws_lec_printf(&cbw, "{'a':%d,'b':[%d,%d]}", args[0], args[1], args[2]);
```

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`cbw.used`|9|
|`buf[]`|A2 61 61 01 61 62 82 02 03|

### String longer than the buffer

Using `%s` and the same string as an arg gives the same results.

```
	uint8_t buf[16];
	lws_lec_pctx_t cbw;

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(&cbw, "'A literal string > one buf'");
	/* not required to be in same function context or same buf,
	 * but the string must remain the same */
	lws_lec_setbuf(&cbw, buf, sizeof(buf));
	lws_lec_printf(&cbw, "'A literal string > one buf'");
```

First call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_AGAIN`|
|`cbw.used`|16|
|`buf[]`|78 1A 41 20 6C 69 74 65 72 61 6C 20 73 74 72 69|

Second call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`cbw.used`|12|
|`buf[]`|6E 67 20 3E 20 6F 6E 65 20 62 75 66|

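The resume behaviour seen in this example can be modelled without lws. The sketch below is a standalone illustration, not the lws implementation: `struct rw_ctx`, `rw_setbuf()`, `rw_text()` and the `RW_*` codes are hypothetical names. It assumes the context only needs to remember how many bytes of the whole encoding have already been emitted, so a repeat call with the same string can continue from there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { RW_ERROR = -1, RW_FINISHED, RW_AGAIN }; /* hypothetical codes */

struct rw_ctx {
	uint8_t *buf;	/* current output buffer */
	size_t size;	/* size of that buffer */
	size_t used;	/* bytes written to buf by the last call */
	size_t done;	/* bytes of the whole encoding emitted so far */
};

/* Point the context at a fresh (or reused) output buffer. */
void
rw_setbuf(struct rw_ctx *c, uint8_t *buf, size_t size)
{
	c->buf = buf;
	c->size = size;
	c->used = 0;
}

/*
 * Emit a CBOR text string of 24..255 bytes (head 0x78, one length byte,
 * then the payload), resuming at c->done.  On RW_AGAIN, call again with
 * the same string after providing buffer space.
 */
int
rw_text(struct rw_ctx *c, const char *s)
{
	size_t slen = strlen(s), total = 2 + slen;

	if (slen < 24 || slen > 255)
		return RW_ERROR; /* other head sizes omitted in this sketch */

	c->used = 0;
	while (c->done < total && c->used < c->size) {
		uint8_t b;

		if (c->done == 0)
			b = 0x78;
		else if (c->done == 1)
			b = (uint8_t)slen;
		else
			b = (uint8_t)s[c->done - 2];

		c->buf[c->used++] = b;
		c->done++;
	}

	if (c->done < total)
		return RW_AGAIN;

	c->done = 0; /* ready for the next item */

	return RW_FINISHED;
}
```

Run against a 16-byte buffer with the 26-character string above, the first call fills all 16 bytes and asks to be called again, and the second call emits the remaining 12.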
### Binary blob longer than the buffer

```
	uint8_t buf[16], blob[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
	lws_lec_pctx_t cbw;

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(&cbw, "%.*b", (int)sizeof(blob), blob);
	/* not required to be in same function context or same buf,
	 * but the length and blob must remain the same */
	lws_lec_setbuf(&cbw, buf, sizeof(buf));
	lws_lec_printf(&cbw, "%.*b", (int)sizeof(blob), blob);
```

First call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_AGAIN`|
|`cbw.used`|16|
|`buf[]`|52 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F|

Second call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`cbw.used`|3|
|`buf[]`|10 11 12|