# llhttp
[![CI](https://github.com/nodejs/llhttp/workflows/CI/badge.svg)](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)

Port of [http_parser][0] to [llparse][1].

## Why?

Let's face it, [http_parser][0] is practically unmaintainable. Even the
introduction of a single new method results in significant code churn.

This project aims to:

* Make it maintainable
* Make it verifiable
* Improve benchmarks where possible

More details are available in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY).

## How?

Over time, different approaches for improving [http_parser][0]'s code base
were tried. However, all of them failed because they resulted in significant
performance degradation.

This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
to generate the output C source file, which can be compiled and
linked with the embedder's program (like [Node.js][7]).

## Performance

So far llhttp outperforms http_parser:

| | input size | bandwidth | reqs/sec | time |
|:----------------|-----------:|-------------:|-----------:|--------:|
| **llhttp** | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec | 4.61 s |
| **http_parser** | 8192.00 mb | 694.66 mb/s | 1406180.33 req/sec | 11.79 s |

llhttp is faster by approximately **156%**.
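
The percentage follows from the bandwidth column: 1777.24 / 694.66 ≈ 2.558, so llhttp processes roughly 2.56 times as many bytes per second, about 156% more. The small helper below reproduces that arithmetic; the name `speedup_percent` is purely illustrative and not part of llhttp.

```C
#include <assert.h>

/* Illustrative helper (not part of llhttp): how much faster, in percent,
 * a throughput `fast` is than a throughput `slow`. Both values must use
 * the same unit, e.g. mb/s. */
double speedup_percent(double fast, double slow) {
  assert(slow > 0.0); /* a zero baseline has no meaningful ratio */
  return (fast / slow - 1.0) * 100.0;
}
```

`speedup_percent(1777.24, 694.66)` evaluates to roughly 155.8, which rounds to the 156% quoted above.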

## Maintenance

The llhttp project has about 1400 lines of TypeScript code describing the parser
itself and around 450 lines of C code and headers providing the helper methods.
The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
436 lines of headers.

All optimizations and multi-character matching in llhttp are generated
automatically, and thus don't add any extra maintenance cost. In contrast,
most of http_parser's code is hand-optimized and unrolled. Rather than simply
describing "how" HTTP requests/responses should be parsed, a maintainer has to
implement new features in [http_parser][0] cautiously, considering
possible performance degradation and manually optimizing the new code.

## Verification

The state machine graph is encoded explicitly in llhttp. [llparse][1]
automatically checks the graph for absence of loops and correct reporting of the
input ranges (spans) like header names and values. In the future, additional
checks could be performed to get even stricter verification of llhttp.

## Usage

```C
#include <stdio.h>
#include <string.h>

#include "llhttp.h"

/* User callback: invoked once a request/response has been completely parsed */
int handle_on_message_complete(llhttp_t* parser) {
  (void)parser; /* unused in this example */
  fprintf(stdout, "Message completed!\n");
  return 0; /* proceed normally */
}

int main(void) {
  llhttp_t parser;
  llhttp_settings_t settings;

  /* Initialize user callbacks and settings */
  llhttp_settings_init(&settings);

  /* Set user callback */
  settings.on_message_complete = handle_on_message_complete;

  /* Initialize the parser in HTTP_BOTH mode, meaning that it will select between
   * HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
   * input.
   */
  llhttp_init(&parser, HTTP_BOTH, &settings);

  /* Parse request! */
  const char* request = "GET / HTTP/1.1\r\n\r\n";
  size_t request_len = strlen(request);

  enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
  if (err == HPE_OK) {
    /* Successfully parsed! */
    fprintf(stdout, "Successfully parsed!\n");
  } else {
    fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err),
            llhttp_get_error_reason(&parser));
    return 1;
  }

  return 0;
}
```

For more information on API usage, please refer to [src/native/api.h](https://github.com/nodejs/llhttp/blob/main/src/native/api.h).

## API

### llhttp_settings_t

The settings object contains a list of callbacks that the parser will invoke.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_PAUSED` (pause the parser):

* `on_message_begin`: Invoked when a new request/response starts.
* `on_message_complete`: Invoked when a request/response has been completely parsed.
* `on_url_complete`: Invoked after the URL has been parsed.
* `on_method_complete`: Invoked after the HTTP method has been parsed.
* `on_version_complete`: Invoked after the HTTP version has been parsed.
* `on_status_complete`: Invoked after the status code has been parsed.
* `on_header_field_complete`: Invoked after a header name has been parsed.
* `on_header_value_complete`: Invoked after a header value has been parsed.
* `on_chunk_header`: Invoked after a new chunk is started. The current chunk length is stored in `parser->content_length`.
* `on_chunk_extension_name_complete`: Invoked after a chunk extension name has been parsed.
* `on_chunk_extension_value_complete`: Invoked after a chunk extension value has been parsed.
* `on_chunk_complete`: Invoked after a new chunk is received.
* `on_reset`: Invoked after `on_message_complete` and before `on_message_begin` when a new message
  is received on the same parser. This is not invoked for the first message of the parser.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_USER` (error from the callback):

* `on_url`: Invoked when another character of the URL is received.
* `on_status`: Invoked when another character of the status is received.
* `on_method`: Invoked when another character of the method is received.
  When the parser is created with `HTTP_BOTH` and the input is a response, this is also invoked for the sequence `HTTP/`
  of the first message.
* `on_version`: Invoked when another character of the version is received.
* `on_header_field`: Invoked when another character of a header name is received.
* `on_header_value`: Invoked when another character of a header value is received.
* `on_chunk_extension_name`: Invoked when another character of a chunk extension name is received.
* `on_chunk_extension_value`: Invoked when another character of a chunk extension value is received.
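
Because these callbacks may be invoked several times for a single URL, header name or header value (for example when the input is split across several `llhttp_execute()` calls), an embedder typically copies each piece into its own buffer and consumes the result in the matching `*_complete` callback. The following is a minimal sketch of such a buffer under that assumption; `frag_buf`, `FRAG_BUF_CAP`, `frag_buf_reset()` and `frag_buf_append()` are hypothetical names, not part of llhttp. `frag_buf_append()` takes the same `(at, length)` pair a data callback receives and returns `-1` on overflow, which the callback could return as its error value.

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity limit for a single span (URL, header name, ...). */
#define FRAG_BUF_CAP 256

/* Hypothetical buffer that reassembles one span from the pieces handed to a
 * data callback. Not part of the llhttp API. */
typedef struct {
  char data[FRAG_BUF_CAP + 1]; /* +1 keeps room for a terminating NUL */
  size_t len;
} frag_buf;

/* Empties the buffer, e.g. after the matching *_complete callback ran. */
void frag_buf_reset(frag_buf* buf) {
  assert(buf != NULL);
  buf->len = 0;
  buf->data[0] = '\0';
}

/* Appends the `length` bytes at `at` (the arguments a data callback gets).
 * Returns 0 on success, or -1 and leaves the buffer unchanged if the span
 * would exceed FRAG_BUF_CAP. */
int frag_buf_append(frag_buf* buf, const char* at, size_t length) {
  assert(buf != NULL);
  if (length == 0) {
    return 0; /* nothing to copy */
  }
  if (length > FRAG_BUF_CAP - buf->len) {
    return -1; /* too long: report an error instead of silently truncating */
  }
  memcpy(buf->data + buf->len, at, length);
  buf->len += length;
  buf->data[buf->len] = '\0';
  return 0;
}
```

In an llhttp program, `on_url` could call `frag_buf_append(buf, at, length)` on a buffer reachable through the parser's user data, and `on_url_complete` could read `buf->data` before calling `frag_buf_reset()`.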

The callback `on_headers_complete`, invoked when the headers are complete, can return:

* `0`: Proceed normally.
* `1`: Assume that the request/response has no body, and proceed to parsing the next message.
* `2`: Assume absence of body (as above) and make `llhttp_execute()` return `HPE_PAUSED_UPGRADE`.
* `-1`: Error.
* `HPE_PAUSED`: Pause the parser.
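
As an illustration of these return values, a client that parses responses might skip the body of a response to a `HEAD` request and stop HTTP parsing after an upgrade. The sketch below encodes only that decision; the function name, the enum names and both flags are hypothetical. In a real `on_headers_complete` callback, the flags would come from the embedder's own request bookkeeping and from `llhttp_get_upgrade()`.

```C
#include <assert.h>

/* Local names for the return values documented for on_headers_complete. */
enum {
  HEADERS_PROCEED = 0,          /* parse the body normally */
  HEADERS_SKIP_BODY = 1,        /* no body, continue with the next message */
  HEADERS_SKIP_BODY_UPGRADE = 2 /* no body, llhttp_execute() returns HPE_PAUSED_UPGRADE */
};

/* Hypothetical policy for a client parsing responses:
 * - after an upgrade, stop HTTP parsing so the caller can take over;
 * - a response to a HEAD request never carries a body. */
int headers_complete_decision(int is_upgrade, int response_to_head) {
  assert(is_upgrade == 0 || is_upgrade == 1);
  assert(response_to_head == 0 || response_to_head == 1);
  if (is_upgrade) {
    return HEADERS_SKIP_BODY_UPGRADE;
  }
  if (response_to_head) {
    return HEADERS_SKIP_BODY;
  }
  return HEADERS_PROCEED;
}
```

The upgrade check comes first because, per the list above, returning `2` already implies that no body is parsed.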

### `void llhttp_init(llhttp_t* parser, llhttp_type_t type, const llhttp_settings_t* settings)`

Initialize the parser with a specific type and user settings.

### `uint8_t llhttp_get_type(llhttp_t* parser)`

Returns the type of the parser.

### `uint8_t llhttp_get_http_major(llhttp_t* parser)`

Returns the major version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_http_minor(llhttp_t* parser)`

Returns the minor version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_method(llhttp_t* parser)`

Returns the method of the current request.

### `int llhttp_get_status_code(llhttp_t* parser)`

Returns the status code of the current response.

### `uint8_t llhttp_get_upgrade(llhttp_t* parser)`

Returns `1` if the request includes the `Connection: upgrade` header.

### `void llhttp_reset(llhttp_t* parser)`

Reset an already initialized parser back to the start state, preserving the
existing parser type, callback settings, user data, and lenient flags.

### `void llhttp_settings_init(llhttp_settings_t* settings)`

Initialize the settings object.

### `llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len)`

Parse full or partial request/response, invoking user callbacks along the way.

If any user callback returns an errno not equal to `HPE_OK`, parsing is interrupted
and that errno is returned from `llhttp_execute()`. If `HPE_PAUSED` was used as the errno,
execution can be resumed with a call to `llhttp_resume()`.

In the special case of a CONNECT/Upgrade request/response, `HPE_PAUSED_UPGRADE` is returned
after fully parsing the request/response. If the user wishes to continue parsing,
they need to invoke `llhttp_resume_after_upgrade()`.

**If this function ever returns a non-pause type error, it will continue to return
the same error upon each successive call until `llhttp_init()` is called.**

### `llhttp_errno_t llhttp_finish(llhttp_t* parser)`

This method should be called when the other side has no further bytes to
send (e.g. shutdown of the readable side of the TCP connection).

Requests without `Content-Length` and other messages might require treating
all incoming bytes as part of the body, up to the last byte of the
connection.

This method will invoke the `on_message_complete()` callback if the
request was terminated safely. Otherwise, an error code will be returned.

### `int llhttp_message_needs_eof(const llhttp_t* parser)`

Returns `1` if the incoming message is parsed until the last byte, and has to be completed by calling `llhttp_finish()` on EOF.

### `int llhttp_should_keep_alive(const llhttp_t* parser)`

Returns `1` if there might be any other messages following the last that was
successfully parsed.

### `void llhttp_pause(llhttp_t* parser)`

Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
an appropriate error reason.

**Do not call this from user callbacks! User callbacks must return
`HPE_PAUSED` if pausing is required.**

### `void llhttp_resume(llhttp_t* parser)`

Can be called to resume execution after a pause in a user callback.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED`.**

### `void llhttp_resume_after_upgrade(llhttp_t* parser)`

Can be called to resume execution after `llhttp_execute()` has returned `HPE_PAUSED_UPGRADE`.
See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`.**

### `llhttp_errno_t llhttp_get_errno(const llhttp_t* parser)`

Returns the latest error.

### `const char* llhttp_get_error_reason(const llhttp_t* parser)`

Returns a textual explanation of the latest returned error.

**User callbacks should set an error reason when returning an error. See
`llhttp_set_error_reason()` for details.**

### `void llhttp_set_error_reason(llhttp_t* parser, const char* reason)`

Assign a textual description to the returned error. Must be called in user
callbacks right before returning the errno.

**The `HPE_USER` error code might be useful in user callbacks.**

### `const char* llhttp_get_error_pos(const llhttp_t* parser)`

Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to the `data` argument of `llhttp_execute()`.

**This method might be useful for counting the number of parsed bytes.**
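
For instance, after `llhttp_execute()` returns `HPE_PAUSED` or `HPE_PAUSED_UPGRADE`, the distance between the returned pointer and `data` tells how much of the buffer was consumed; the rest would either be fed to `llhttp_execute()` again after resuming or, after an upgrade, belong to the new protocol. A minimal sketch of that bookkeeping, assuming `llhttp_get_error_pos(parser) - data` is the count of consumed bytes; `consumed_bytes()` is a hypothetical helper, not part of llhttp:

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the `data`/`len` pair that was passed to
 * llhttp_execute() and the pointer later returned by llhttp_get_error_pos(),
 * return how many bytes the parser consumed. The bytes in
 * [error_pos, data + len) are the unparsed remainder. */
size_t consumed_bytes(const char* data, size_t len, const char* error_pos) {
  (void)len; /* only used by the sanity check below */
  /* The error position must point into, or just past, the parsed buffer. */
  assert(error_pos >= data && error_pos <= data + len);
  return (size_t)(error_pos - data);
}
```

With the real parser, the remainder would be the `len - consumed_bytes(data, len, pos)` bytes starting at `pos`, where `pos` is the value returned by `llhttp_get_error_pos()`.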

### `const char* llhttp_errno_name(llhttp_errno_t err)`

Returns the textual name of an error code.

### `const char* llhttp_method_name(llhttp_method_t method)`

Returns the textual name of an HTTP method.

### `const char* llhttp_status_name(llhttp_status_t status)`

Returns the textual name of an HTTP status.

### `void llhttp_set_lenient_headers(llhttp_t* parser, int enabled)`

Enables/disables lenient header value parsing (disabled by default).
Lenient parsing disables header value token checks, extending llhttp's
protocol support to highly non-compliant clients/servers.

No `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
lenient parsing is "on".

**USE AT YOUR OWN RISK!**

### `void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of conflicting `Transfer-Encoding` and
`Content-Length` headers (disabled by default).

Normally `llhttp` would error when `Transfer-Encoding` is present in
conjunction with `Content-Length`.

This error is important to prevent HTTP request smuggling, but may be less desirable
for a small number of cases involving legacy servers.

**USE AT YOUR OWN RISK!**

### `void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Connection: close` and HTTP/1.0
requests/responses.

Normally `llhttp` would error on (in strict mode) or discard (in loose mode)
the HTTP request/response after the request/response with `Connection: close`
and `Content-Length`.

This is important to prevent cache poisoning attacks,
but might interact badly with outdated and insecure clients.

With this flag enabled, the extra request/response will be parsed normally.

**USE AT YOUR OWN RISK!**

### `void llhttp_set_lenient_transfer_encoding(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of the `Transfer-Encoding` header.

Normally `llhttp` would error when a `Transfer-Encoding` header has the `chunked` value
followed by another value (either in a single header or in multiple
headers whose values are internally joined using `, `).

This is mandated by the spec to reliably determine request body size and thus
avoid request smuggling.

With this flag enabled, the extra value will be parsed normally.

**USE AT YOUR OWN RISK!**

## Build Instructions

Make sure you have [Node.js](https://nodejs.org/), npm and npx installed. Then, in the project directory, run:

```sh
npm install
make
```

---

### Bindings to other languages

* Lua: [MunifTanjim/llhttp.lua][11]
* Python: [pallas/pyllhttp][8]
* Ruby: [metabahn/llhttp][9]
* Rust: [JackLiar/rust-llhttp][10]

### Using with CMake

If you want to use this library in a CMake project, you can use the snippet below.

```cmake
include(FetchContent)

FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/v6.0.5.tar.gz") # Using version 6.0.5

FetchContent_MakeAvailable(llhttp)

target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp ${PROJECT_NAME})
```

## Building on Windows

### Installation

* `choco install git`
* `choco install node`
* `choco install llvm` (or install the `C++ Clang tools for Windows` optional package from the Visual Studio 2019 installer)
* `choco install make` (or if you have MinGW, it comes bundled)

1. Ensure that `Clang` and `make` are in your system path.
2. Using Git Bash, clone the repo to your preferred location.
3. `cd` into the cloned directory and run `npm install`.
4. Run `make`.
5. Your `repo/build` directory should now contain the static (`libllhttp.a`) and dynamic (`libllhttp.so`) libraries.
6. When building your executable, you can link to these libraries. Make sure to set the build folder as an include path when building so you can reference the declarations in `repo/build/llhttp.h`.

### A simple example of linking with the library

Assuming you have a source file `main.cpp` in your current working directory, you would run: `clang++ -Os -g3 -Wall -Wextra -Wno-unused-parameter -I/path/to/llhttp/build main.cpp /path/to/llhttp/build/libllhttp.a -o main.exe`.

If you are getting `unresolved external symbol` linker errors, you are likely attempting to build `llhttp.c` without linking it with object files from `api.c` and `http.c`.

#### LICENSE

This software is licensed under the MIT License.

Copyright Fedor Indutny, 2018.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.

[0]: https://github.com/nodejs/http-parser
[1]: https://github.com/nodejs/llparse
[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
[3]: https://en.wikipedia.org/wiki/Tail_call
[4]: https://llvm.org/docs/LangRef.html
[5]: https://llvm.org/docs/LangRef.html#call-instruction
[6]: https://clang.llvm.org/
[7]: https://github.com/nodejs/node
[8]: https://github.com/pallas/pyllhttp
[9]: https://github.com/metabahn/llhttp
[10]: https://github.com/JackLiar/rust-llhttp
[11]: https://github.com/MunifTanjim/llhttp.lua