# llhttp
[CI](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)

Port of [http_parser][0] to [llparse][1].

## Why?

Let's face it, [http_parser][0] is practically unmaintainable. Even the
introduction of a single new method results in significant code churn.

This project aims to:

* Make it maintainable
* Make it verifiable
* Improve benchmarks where possible

More details in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY).

## How?

Over time, different approaches for improving [http_parser][0]'s code base
were tried. However, all of them failed because of the significant performance
degradation they caused.

This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
to generate the output C source file, which can be compiled and
linked with the embedder's program (like [Node.js][7]).

## Performance

So far llhttp outperforms http_parser:

|                 | input size |    bandwidth |           reqs/sec |    time |
|:----------------|-----------:|-------------:|-------------------:|--------:|
| **llhttp**      | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec |  4.61 s |
| **http_parser** | 8192.00 mb |  694.66 mb/s | 1406180.33 req/sec | 11.79 s |

llhttp is faster by approximately **156%**.

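The 156% figure can be reproduced from the bandwidth column of the table. The snippet below is only a small arithmetic sketch; `speedup_percent` and `print_speedup` are illustrative names and not part of llhttp.

```c
#include <stdio.h>

/* Relative speedup of `fast` over `slow`, in percent.
 * Both arguments must use the same unit (mb/s in the table above). */
double speedup_percent(double fast, double slow) {
  return (fast / slow - 1.0) * 100.0;
}

/* Prints the rounded figure derived from the two bandwidth values. */
void print_speedup(void) {
  printf("llhttp is faster by ~%.0f%%\n", speedup_percent(1777.24, 694.66));
}
```

The `time` column gives nearly the same ratio (11.79 s / 4.61 s), as expected for a fixed input size.
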
## Maintenance

The llhttp project has about 1400 lines of TypeScript code describing the parser
itself and around 450 lines of C code and headers providing the helper methods.
The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
436 lines of headers.

All optimizations and multi-character matching in llhttp are generated
automatically, and thus don't add any extra maintenance cost. On the contrary,
most of http_parser's code is hand-optimized and unrolled. Instead of describing
"how" it should parse the HTTP requests/responses, a maintainer has to
implement new features in [http_parser][0] cautiously, considering
possible performance degradation and manually optimizing the new code.

## Verification

The state machine graph is encoded explicitly in llhttp. [llparse][1]
automatically checks the graph for the absence of loops and for correct reporting of the
input ranges (spans) like header names and values. In the future, additional
checks could be performed to get even stricter verification of llhttp.

## Usage

```C
#include <stdio.h>
#include <string.h>
#include "llhttp.h"

int handle_on_message_complete(llhttp_t* parser) {
  fprintf(stdout, "Message completed!\n");
  return 0;
}

int main() {
  llhttp_t parser;
  llhttp_settings_t settings;

  /* Initialize user callbacks and settings */
  llhttp_settings_init(&settings);

  /* Set user callback */
  settings.on_message_complete = handle_on_message_complete;

  /* Initialize the parser in HTTP_BOTH mode, meaning that it will select between
   * HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
   * input.
   */
  llhttp_init(&parser, HTTP_BOTH, &settings);

  /* Parse request! */
  const char* request = "GET / HTTP/1.1\r\n\r\n";
  size_t request_len = strlen(request);

  enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
  if (err == HPE_OK) {
    fprintf(stdout, "Successfully parsed!\n");
  } else {
    fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err), parser.reason);
  }
}
```

For more information on API usage, please refer to [src/native/api.h](https://github.com/nodejs/llhttp/blob/main/src/native/api.h).

## API

### llhttp_settings_t

The settings object contains a list of callbacks that the parser will invoke.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_PAUSED` (pause the parser):

* `on_message_begin`: Invoked when a new request/response starts.
* `on_message_complete`: Invoked when a request/response has been completely parsed.
* `on_url_complete`: Invoked after the URL has been parsed.
* `on_method_complete`: Invoked after the HTTP method has been parsed.
* `on_version_complete`: Invoked after the HTTP version has been parsed.
* `on_status_complete`: Invoked after the status code has been parsed.
* `on_header_field_complete`: Invoked after a header name has been parsed.
* `on_header_value_complete`: Invoked after a header value has been parsed.
* `on_chunk_header`: Invoked after a new chunk is started. The current chunk length is stored in `parser->content_length`.
* `on_chunk_extension_name_complete`: Invoked after a chunk extension name has been parsed.
* `on_chunk_extension_value_complete`: Invoked after a chunk extension value has been parsed.
* `on_chunk_complete`: Invoked after a new chunk is received.
* `on_reset`: Invoked after `on_message_complete` and before `on_message_begin` when a new message
  is received on the same parser. This is not invoked for the first message of the parser.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_USER` (error from the callback):

* `on_url`: Invoked when another character of the URL is received.
* `on_status`: Invoked when another character of the status is received.
* `on_method`: Invoked when another character of the method is received.
  When the parser is created with `HTTP_BOTH` and the input is a response, this is also invoked for the sequence `HTTP/`
  of the first message.
* `on_version`: Invoked when another character of the version is received.
* `on_header_field`: Invoked when another character of a header name is received.
* `on_header_value`: Invoked when another character of a header value is received.
* `on_chunk_extension_name`: Invoked when another character of a chunk extension name is received.
* `on_chunk_extension_value`: Invoked when another character of a chunk extension value is received.

The callback `on_headers_complete`, invoked when headers are completed, can return:

* `0`: Proceed normally.
* `1`: Assume that request/response has no body, and proceed to parsing the next message.
* `2`: Assume absence of body (as above) and make `llhttp_execute()` return `HPE_PAUSED_UPGRADE`.
* `-1`: Error.
* `HPE_PAUSED`: Pause the parser.

### `void llhttp_init(llhttp_t* parser, llhttp_type_t type, const llhttp_settings_t* settings)`

Initialize the parser with specific type and user settings.

### `uint8_t llhttp_get_type(llhttp_t* parser)`

Returns the type of the parser.

### `uint8_t llhttp_get_http_major(llhttp_t* parser)`

Returns the major version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_http_minor(llhttp_t* parser)`

Returns the minor version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_method(llhttp_t* parser)`

Returns the method of the current request.

### `int llhttp_get_status_code(llhttp_t* parser)`

Returns the status code of the current response.

### `uint8_t llhttp_get_upgrade(llhttp_t* parser)`

Returns `1` if request includes the `Connection: upgrade` header.

### `void llhttp_reset(llhttp_t* parser)`

Reset an already initialized parser back to the start state, preserving the
existing parser type, callback settings, user data, and lenient flags.

### `void llhttp_settings_init(llhttp_settings_t* settings)`

Initialize the settings object.

### `llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len)`

Parse full or partial request/response, invoking user callbacks along the way.

If any `llhttp_data_cb` returns an errno not equal to `HPE_OK`, parsing is interrupted
and that errno is returned from `llhttp_execute()`. If `HPE_PAUSED` was used as the errno,
the execution can be resumed with a `llhttp_resume()` call.

In the special case of a CONNECT/Upgrade request/response, `HPE_PAUSED_UPGRADE` is returned
after fully parsing the request/response. If the user wishes to continue parsing,
they need to invoke `llhttp_resume_after_upgrade()`.

**If this function ever returns a non-pause type error, it will continue to return
the same error upon each successive call up until `llhttp_init()` is called.**

### `llhttp_errno_t llhttp_finish(llhttp_t* parser)`

This method should be called when the other side has no further bytes to
send (e.g. shutdown of the readable side of the TCP connection).

Requests without `Content-Length` and other messages might require treating
all incoming bytes as part of the body, up to the last byte of the
connection.

This method will invoke the `on_message_complete()` callback if the
request was terminated safely. Otherwise an error code is returned.

### `int llhttp_message_needs_eof(const llhttp_t* parser)`

Returns `1` if the incoming message is parsed until the last byte, and has to be completed by calling `llhttp_finish()` on EOF.

### `int llhttp_should_keep_alive(const llhttp_t* parser)`

Returns `1` if there might be any other messages following the last that was
successfully parsed.

### `void llhttp_pause(llhttp_t* parser)`

Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set the
appropriate error reason.

**Do not call this from user callbacks! User callbacks must return
`HPE_PAUSED` if pausing is required.**

### `void llhttp_resume(llhttp_t* parser)`

Might be called to resume the execution after the pause in user's callback.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED`.**

### `void llhttp_resume_after_upgrade(llhttp_t* parser)`

Might be called to resume the execution after `llhttp_execute()` paused on a
CONNECT/Upgrade request/response.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`.**

### `llhttp_errno_t llhttp_get_errno(const llhttp_t* parser)`

Returns the latest error.

### `const char* llhttp_get_error_reason(const llhttp_t* parser)`

Returns the verbal explanation of the latest returned error.

**User callbacks should set the error reason when returning an error. See
`llhttp_set_error_reason()` for details.**

### `void llhttp_set_error_reason(llhttp_t* parser, const char* reason)`

Assign a verbal description to the returned error. Must be called in user
callbacks right before returning the errno.

**`HPE_USER` error code might be useful in user callbacks.**

### `const char* llhttp_get_error_pos(const llhttp_t* parser)`

Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to the `data` argument of `llhttp_execute()`.

**This method might be useful for counting the number of parsed bytes.**

### `const char* llhttp_errno_name(llhttp_errno_t err)`

Returns textual name of error code.

### `const char* llhttp_method_name(llhttp_method_t method)`

Returns textual name of HTTP method.

### `const char* llhttp_status_name(llhttp_status_t status)`

Returns textual name of HTTP status.

### `void llhttp_set_lenient_headers(llhttp_t* parser, int enabled)`

Enables/disables lenient header value parsing (disabled by default).
Lenient parsing disables header value token checks, extending llhttp's
protocol support to highly non-compliant clients/servers.

No `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
lenient parsing is "on".

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of conflicting `Transfer-Encoding` and
`Content-Length` headers (disabled by default).

Normally `llhttp` would error when `Transfer-Encoding` is present in
conjunction with `Content-Length`.

This error is important to prevent HTTP request smuggling, but may be less desirable
for a small number of cases involving legacy servers.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Connection: close` and HTTP/1.0
requests/responses.

Normally `llhttp` would error on an HTTP request/response that follows
a request/response with `Connection: close` and `Content-Length`.

This is important to prevent cache poisoning attacks,
but might interact badly with outdated and insecure clients.

With this flag the extra request/response will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_transfer_encoding(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of the `Transfer-Encoding` header.

Normally `llhttp` would error when a `Transfer-Encoding` has a `chunked` value
and another value after it (either in a single header or in multiple
headers whose values are internally joined using `, `).

This is mandated by the spec to reliably determine request body size and thus
avoid request smuggling.

With this flag the extra value will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_version(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of the HTTP version.

Normally `llhttp` would error when the HTTP version in the request or status line
is not `0.9`, `1.0`, `1.1` or `2.0`.
With this flag other version values will be parsed normally.

**Enabling this flag can pose a security issue since you will allow unsupported HTTP versions. USE WITH CAUTION!**

### `void llhttp_set_lenient_data_after_close(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of additional data received after a message ends
and keep-alive is disabled.

Normally `llhttp` would error when additional unexpected data is received if the message
contains the `Connection` header with the `close` value.
With this flag the extra data will be discarded without throwing an error.

**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_lf_after_cr(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of incomplete CRLF sequences.

Normally `llhttp` would error when a CR is not followed by LF when terminating the
request line, the status line, the headers or a chunk header.
With this flag only a CR is required to terminate such sections.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_crlf_after_chunk(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of chunks not separated via CRLF.

Normally `llhttp` would error when a CRLF is missing after chunk data, before
a new chunk starts.
With this flag the new chunk can start immediately after the previous one.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

## Build Instructions

Make sure you have [Node.js](https://nodejs.org/), npm and npx installed. Then, in the project directory, run:

```sh
npm install
make
```

---

### Bindings to other languages

* Lua: [MunifTanjim/llhttp.lua][11]
* Python: [pallas/pyllhttp][8]
* Ruby: [metabahn/llhttp][9]
* Rust: [JackLiar/rust-llhttp][10]

### Using with CMake

If you want to use this library in a CMake project as a shared library, you can use the snippet below.

```
# FetchContent must be included before it is used
include(FetchContent)

FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_shared target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_shared ${PROJECT_NAME})
```

If you want to use this library in a CMake project as a static library, you can set some cache variables first.

```
# FetchContent must be included before it is used
include(FetchContent)

FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

set(BUILD_SHARED_LIBS OFF CACHE INTERNAL "")
set(BUILD_STATIC_LIBS ON CACHE INTERNAL "")
FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_static target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_static ${PROJECT_NAME})
```

_Note that using the git repo directly (e.g., via a git repo url and tag) will not work with FetchContent_Declare because [CMakeLists.txt](./CMakeLists.txt) requires string replacements (e.g., `_RELEASE_`) before it will build._

## Building on Windows

### Installation

* `choco install git`
* `choco install node`
* `choco install llvm` (or install the `C++ Clang tools for Windows` optional package from the Visual Studio 2019 installer)
* `choco install make` (or if you have MinGW, it comes bundled)

1. Ensure that `Clang` and `make` are in your system path.
2. Using Git Bash, clone the repo to your preferred location.
3. `cd` into the cloned directory and run `npm install`.
4. Run `make`.
5. Your `repo/build` directory should now have `libllhttp.a` and `libllhttp.so` static and dynamic libraries.
6. When building your executable, you can link to these libraries. Make sure to set the build folder as an include path when building so you can reference the declarations in `repo/build/llhttp.h`.

### A simple example on linking with the library:

Assuming you have a source file `main.cpp` in your current working directory, you would run: `clang++ -Os -g3 -Wall -Wextra -Wno-unused-parameter -I/path/to/llhttp/build main.cpp /path/to/llhttp/build/libllhttp.a -o main.exe`.

If you are getting `unresolved external symbol` linker errors, you are likely attempting to build `llhttp.c` without linking it with object files from `api.c` and `http.c`.

#### LICENSE

This software is licensed under the MIT License.

Copyright Fedor Indutny, 2018.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.

[0]: https://github.com/nodejs/http-parser
[1]: https://github.com/nodejs/llparse
[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
[3]: https://en.wikipedia.org/wiki/Tail_call
[4]: https://llvm.org/docs/LangRef.html
[5]: https://llvm.org/docs/LangRef.html#call-instruction
[6]: https://clang.llvm.org/
[7]: https://github.com/nodejs/node
[8]: https://github.com/pallas/pyllhttp
[9]: https://github.com/metabahn/llhttp
[10]: https://github.com/JackLiar/rust-llhttp
[11]: https://github.com/MunifTanjim/llhttp.lua
