# llhttp
[CI](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)

Port of [http_parser][0] to [llparse][1].

## Why?

Let's face it, [http_parser][0] is practically unmaintainable. Even
introducing a single new method results in significant code churn.

This project aims to:

* Make the parser maintainable
* Make it verifiable
* Improve benchmarks where possible

More details in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY)

## How?

Over time, different approaches for improving [http_parser][0]'s code base
were tried. However, all of them failed because they caused significant
performance degradation.

This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
to generate the output C source file, which can be compiled and
linked with the embedder's program (like [Node.js][7]).

## Performance

So far llhttp outperforms http_parser:

| | input size | bandwidth | reqs/sec | time |
|:----------------|-----------:|-------------:|-----------:|--------:|
| **llhttp** | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec | 4.61 s |
| **http_parser** | 8192.00 mb | 694.66 mb/s | 1406180.33 req/sec | 11.79 s |

llhttp is faster by approximately **156%**.

## Maintenance

The llhttp project has about 1400 lines of TypeScript code describing the parser
itself and around 450 lines of C code and headers providing the helper methods.
The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
436 lines of headers.

All optimizations and multi-character matching in llhttp are generated
automatically, and thus don't add any extra maintenance cost. On the contrary,
most of http_parser's code is hand-optimized and unrolled.
Instead of describing "how" it should parse HTTP requests/responses, a
maintainer has to implement new features in [http_parser][0] cautiously,
considering possible performance degradation and manually optimizing the new code.

## Verification

The state machine graph is encoded explicitly in llhttp. [llparse][1]
automatically checks the graph for the absence of loops and for correct reporting
of the input ranges (spans) like header names and values. In the future, additional
checks could be performed to get even stricter verification of llhttp.

## Usage

```C
#include <stdio.h>
#include <string.h>
#include "llhttp.h"

/* Called once a complete request/response has been parsed. */
int handle_on_message_complete(llhttp_t* parser) {
  fprintf(stdout, "Message completed!\n");
  return 0;
}

int main(void) {
  llhttp_t parser;
  llhttp_settings_t settings;

  /* Initialize user callbacks and settings */
  llhttp_settings_init(&settings);

  /* Set user callback */
  settings.on_message_complete = handle_on_message_complete;

  /* Initialize the parser in HTTP_BOTH mode, meaning that it will select between
   * HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
   * input.
   */
  llhttp_init(&parser, HTTP_BOTH, &settings);

  /* Parse request! */
  const char* request = "GET / HTTP/1.1\r\n\r\n";
  size_t request_len = strlen(request);

  enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
  if (err == HPE_OK) {
    fprintf(stdout, "Successfully parsed!\n");
  } else {
    fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err), parser.reason);
  }

  return err == HPE_OK ? 0 : 1;
}
```

For more information on API usage, please refer to [src/native/api.h](https://github.com/nodejs/llhttp/blob/main/src/native/api.h).

## API

### llhttp_settings_t

The settings object contains a list of callbacks that the parser will invoke.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_PAUSED` (pause the parser):

* `on_message_begin`: Invoked when a new request/response starts.
* `on_message_complete`: Invoked when a request/response has been completely parsed.
* `on_url_complete`: Invoked after the URL has been parsed.
* `on_method_complete`: Invoked after the HTTP method has been parsed.
* `on_version_complete`: Invoked after the HTTP version has been parsed.
* `on_status_complete`: Invoked after the status code has been parsed.
* `on_header_field_complete`: Invoked after a header name has been parsed.
* `on_header_value_complete`: Invoked after a header value has been parsed.
* `on_chunk_header`: Invoked after a new chunk is started. The current chunk length is stored in `parser->content_length`.
* `on_chunk_extension_name_complete`: Invoked after a chunk extension name has been parsed.
* `on_chunk_extension_value_complete`: Invoked after a chunk extension value has been parsed.
* `on_chunk_complete`: Invoked after a new chunk is received.
* `on_reset`: Invoked after `on_message_complete` and before `on_message_begin` when a new message
  is received on the same parser. This is not invoked for the first message of the parser.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_USER` (error from the callback):

* `on_url`: Invoked when another character of the URL is received.
* `on_status`: Invoked when another character of the status is received.
* `on_method`: Invoked when another character of the method is received.
  When the parser is created with `HTTP_BOTH` and the input is a response, this is also invoked for the sequence `HTTP/`
  of the first message.
* `on_version`: Invoked when another character of the version is received.
* `on_header_field`: Invoked when another character of a header name is received.
* `on_header_value`: Invoked when another character of a header value is received.
* `on_chunk_extension_name`: Invoked when another character of a chunk extension name is received.
* `on_chunk_extension_value`: Invoked when another character of a chunk extension value is received.

The callback `on_headers_complete`, invoked when headers are completed, can return:

* `0`: Proceed normally.
* `1`: Assume that request/response has no body, and proceed to parsing the next message.
* `2`: Assume absence of body (as above) and make `llhttp_execute()` return `HPE_PAUSED_UPGRADE`.
* `-1`: Error.
* `HPE_PAUSED`: Pause the parser.

### `void llhttp_init(llhttp_t* parser, llhttp_type_t type, const llhttp_settings_t* settings)`

Initialize the parser with specific type and user settings.

### `uint8_t llhttp_get_type(llhttp_t* parser)`

Returns the type of the parser.

### `uint8_t llhttp_get_http_major(llhttp_t* parser)`

Returns the major version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_http_minor(llhttp_t* parser)`

Returns the minor version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_method(llhttp_t* parser)`

Returns the method of the current request.

### `int llhttp_get_status_code(llhttp_t* parser)`

Returns the status code of the current response.

### `uint8_t llhttp_get_upgrade(llhttp_t* parser)`

Returns `1` if the request includes the `Connection: upgrade` header.

### `void llhttp_reset(llhttp_t* parser)`

Reset an already initialized parser back to the start state, preserving the
existing parser type, callback settings, user data, and lenient flags.

### `void llhttp_settings_init(llhttp_settings_t* settings)`

Initialize the settings object.
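
Data callbacks such as `on_url` and `on_header_value` receive a pointer and a length rather than a NUL-terminated string, and they can fire more than once for a single value when the input arrives in pieces. Below is a minimal sketch of the accumulation step such a callback might perform. `span_buf_t` and `span_buf_append` are hypothetical names, not part of llhttp, and the fixed 256-byte capacity is an arbitrary assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for one span (URL, header name, ...). */
typedef struct {
  char data[256];  /* arbitrary capacity chosen for this sketch */
  size_t len;      /* bytes stored so far, excluding the trailing NUL */
} span_buf_t;

/* Append `length` bytes starting at `at`, the same pair a data callback
 * receives. Returns 0 on success and -1 if the fragment does not fit,
 * mirroring the "0 = proceed, -1 = error" callback convention. */
int span_buf_append(span_buf_t* buf, const char* at, size_t length) {
  if (length >= sizeof(buf->data) - buf->len) {
    return -1;
  }
  memcpy(buf->data + buf->len, at, length);
  buf->len += length;
  buf->data[buf->len] = '\0';
  return 0;
}
```

An `on_url` callback could then return the result of `span_buf_append()` directly, with the buffer reached through the parser's user data, and reset `len` to `0` once the matching `on_url_complete` callback fires.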

### `llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len)`

Parse full or partial request/response, invoking user callbacks along the way.

If any `llhttp_data_cb` returns an errno other than `HPE_OK`, parsing is interrupted
and that errno is returned from `llhttp_execute()`. If `HPE_PAUSED` was used as the errno,
execution can be resumed with a `llhttp_resume()` call.

In the special case of a CONNECT/Upgrade request/response, `HPE_PAUSED_UPGRADE` is returned
after fully parsing the request/response. If the user wishes to continue parsing,
they need to invoke `llhttp_resume_after_upgrade()`.

**If this function ever returns a non-pause type error, it will continue to return
the same error upon each successive call up until `llhttp_init()` is called.**

### `llhttp_errno_t llhttp_finish(llhttp_t* parser)`

This method should be called when the other side has no further bytes to
send (e.g. shutdown of the readable side of the TCP connection).

Requests without `Content-Length` and other messages might require treating
all incoming bytes as part of the body, up to the last byte of the
connection.

This method will invoke the `on_message_complete()` callback if the
request was terminated safely. Otherwise an error code is returned.

### `int llhttp_message_needs_eof(const llhttp_t* parser)`

Returns `1` if the incoming message is parsed until the last byte, and has to be completed by calling `llhttp_finish()` on EOF.

### `int llhttp_should_keep_alive(const llhttp_t* parser)`

Returns `1` if there might be any other messages following the last one that was
successfully parsed.

### `void llhttp_pause(llhttp_t* parser)`

Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
the appropriate error reason.

**Do not call this from user callbacks!
User callbacks must return
`HPE_PAUSED` if pausing is required.**

### `void llhttp_resume(llhttp_t* parser)`

Can be called to resume execution after a pause requested by a user callback.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED`.**

### `void llhttp_resume_after_upgrade(llhttp_t* parser)`

Can be called to resume execution after the parser paused for a CONNECT/Upgrade message.
See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`.**

### `llhttp_errno_t llhttp_get_errno(const llhttp_t* parser)`

Returns the latest error.

### `const char* llhttp_get_error_reason(const llhttp_t* parser)`

Returns the verbal explanation of the latest returned error.

**User callbacks should set the error reason when returning an error. See
`llhttp_set_error_reason()` for details.**

### `void llhttp_set_error_reason(llhttp_t* parser, const char* reason)`

Assign a verbal description to the returned error. Must be called in user
callbacks right before returning the errno.

**The `HPE_USER` error code might be useful in user callbacks.**

### `const char* llhttp_get_error_pos(const llhttp_t* parser)`

Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to the `data` argument of `llhttp_execute()`.

**This method might be useful for counting the number of parsed bytes.**

### `const char* llhttp_errno_name(llhttp_errno_t err)`

Returns the textual name of an error code.

### `const char* llhttp_method_name(llhttp_method_t method)`

Returns the textual name of an HTTP method.

### `const char* llhttp_status_name(llhttp_status_t status)`

Returns the textual name of an HTTP status.
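
As noted for `llhttp_get_error_pos()`, the returned pointer lies within the `data` buffer passed to `llhttp_execute()`, so the number of bytes consumed before a stop is a pointer difference. Below is a minimal sketch of that arithmetic. `consumed_bytes` is a hypothetical helper, not part of llhttp, and treating a `NULL` position as "the whole buffer was consumed" is an assumption of this sketch.

```c
#include <stddef.h>

/* `data` and `len` are the arguments that were given to llhttp_execute();
 * `error_pos` stands in for the result of llhttp_get_error_pos(). */
size_t consumed_bytes(const char* data, size_t len, const char* error_pos) {
  if (error_pos == NULL) {
    /* Assumption: no recorded position means the whole buffer was parsed. */
    return len;
  }
  return (size_t)(error_pos - data);
}
```

After a pause, for example, the unparsed remainder would start at `data + consumed_bytes(data, len, pos)` and could be passed to the next `llhttp_execute()` call once the parser has been resumed.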

### `void llhttp_set_lenient_headers(llhttp_t* parser, int enabled)`

Enables/disables lenient header value parsing (disabled by default).
Lenient parsing disables header value token checks, extending llhttp's
protocol support to highly non-compliant clients/servers.

No `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
lenient parsing is "on".

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of conflicting `Transfer-Encoding` and
`Content-Length` headers (disabled by default).

Normally `llhttp` would error when `Transfer-Encoding` is present in
conjunction with `Content-Length`.

This error is important to prevent HTTP request smuggling, but may be less desirable
for a small number of cases involving legacy servers.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Connection: close` and HTTP/1.0
requests/responses.

Normally `llhttp` would error on an HTTP request/response that arrives
after a request/response with `Connection: close` and `Content-Length`.

This is important to prevent cache poisoning attacks,
but might interact badly with outdated and insecure clients.

With this flag the extra request/response will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to cache poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_transfer_encoding(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of the `Transfer-Encoding` header.

Normally `llhttp` would error when a `Transfer-Encoding` has a `chunked` value
and another value after it (either in a single header or in multiple
headers whose values are internally joined using `, `).

This is mandated by the spec to reliably determine request body size and thus
avoid request smuggling.

With this flag the extra value will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_version(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of the HTTP version.

Normally `llhttp` would error when the HTTP version in the request or status line
is not `0.9`, `1.0`, `1.1` or `2.0`.
With this flag other version values will be parsed normally.

**Enabling this flag can pose a security issue since you will allow unsupported HTTP versions. USE WITH CAUTION!**

### `void llhttp_set_lenient_data_after_close(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of additional data received after a message ends
and keep-alive is disabled.

Normally `llhttp` would error when additional unexpected data is received if the message
contains the `Connection` header with a `close` value.
With this flag the extra data will be discarded without throwing an error.

**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_lf_after_cr(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of incomplete CRLF sequences.

Normally `llhttp` would error when a CR is not followed by LF when terminating the
request line, the status line, the headers or a chunk header.
With this flag only a CR is required to terminate such sections.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_cr_before_lf(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of line separators.

Normally `llhttp` would error when a LF is not preceded by CR when terminating the
request line, the status line, the headers, a chunk header or chunk data.
With this flag only a LF is required to terminate such sections.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_crlf_after_chunk(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of chunks not separated via CRLF.

Normally `llhttp` would error when a CRLF is missing after chunk data, before
a new chunk starts.
With this flag the new chunk can start immediately after the previous one.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_spaces_after_chunk_size(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of spaces after chunk size.

Normally `llhttp` would error when a chunk size is followed by one or more spaces instead of a CRLF or `;`.
With this flag this check is disabled.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

## Build Instructions

Make sure you have [Node.js](https://nodejs.org/), npm and npx installed.
Then, under the project directory, run:

```sh
npm ci
make
```

---

### Bindings to other languages

* Lua: [MunifTanjim/llhttp.lua][11]
* Python: [pallas/pyllhttp][8]
* Ruby: [metabahn/llhttp][9]
* Rust: [JackLiar/rust-llhttp][10]

### Using with CMake

If you want to use this library in a CMake project as a shared library, you can use the snippet below.

```
# FetchContent must be included before FetchContent_Declare can be used
include(FetchContent)

FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_shared target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_shared ${PROJECT_NAME})
```

If you want to use this library in a CMake project as a static library, you can set some cache variables first.

```
# FetchContent must be included before FetchContent_Declare can be used
include(FetchContent)

FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

# Build only the static library
set(BUILD_SHARED_LIBS OFF CACHE INTERNAL "")
set(BUILD_STATIC_LIBS ON CACHE INTERNAL "")
FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_static target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_static ${PROJECT_NAME})
```

_Note that using the git repo directly (e.g., via a git repo url and tag) will not work with FetchContent_Declare because [CMakeLists.txt](./CMakeLists.txt) requires string replacements (e.g., `_RELEASE_`) before it will build._

## Building on Windows

### Installation

* `choco install git`
* `choco install node`
* `choco install llvm` (or install the `C++ Clang tools for Windows` optional package from the Visual Studio 2019 installer)
* `choco install make` (or if you have MinGW, it comes bundled)

1. Ensure that `Clang` and `make` are in your system path.
2. Using Git Bash, clone the repo to your preferred location.
3. `cd` into the cloned directory and run `npm ci`.
4. Run `make`.
5. Your `repo/build` directory should now have `libllhttp.a` and `libllhttp.so` static and dynamic libraries.
6. When building your executable, you can link to these libraries. Make sure to set the build folder as an include path when building so you can reference the declarations in `repo/build/llhttp.h`.

### A simple example on linking with the library:

Assuming you have an executable `main.cpp` in your current working directory, you would run: `clang++ -Os -g3 -Wall -Wextra -Wno-unused-parameter -I/path/to/llhttp/build main.cpp /path/to/llhttp/build/libllhttp.a -o main.exe`.

If you are getting `unresolved external symbol` linker errors, you are likely attempting to build `llhttp.c` without linking it with object files from `api.c` and `http.c`.

#### LICENSE

This software is licensed under the MIT License.

Copyright Fedor Indutny, 2018.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.

[0]: https://github.com/nodejs/http-parser
[1]: https://github.com/nodejs/llparse
[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
[3]: https://en.wikipedia.org/wiki/Tail_call
[4]: https://llvm.org/docs/LangRef.html
[5]: https://llvm.org/docs/LangRef.html#call-instruction
[6]: https://clang.llvm.org/
[7]: https://github.com/nodejs/node
[8]: https://github.com/pallas/pyllhttp
[9]: https://github.com/metabahn/llhttp
[10]: https://github.com/JackLiar/rust-llhttp
[11]: https://github.com/MunifTanjim/llhttp.lua