# llhttp
[CI status](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)

Port of [http_parser][0] to [llparse][1].

## Why?

Let's face it, [http_parser][0] is practically unmaintainable. Even the
introduction of a single new method results in significant code churn.

This project aims to:

* Make it maintainable
* Make it verifiable
* Improve benchmarks where possible

More details in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY)

## How?

Over time, different approaches for improving [http_parser][0]'s code base
were tried. However, all of them failed because they caused significant
performance degradation.

This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
to generate the output C source file, which can be compiled and
linked with the embedder's program (like [Node.js][7]).

## Performance

So far llhttp outperforms http_parser:

| | input size | bandwidth | reqs/sec | time |
|:----------------|-----------:|-------------:|-----------:|--------:|
| **llhttp** | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec | 4.61 s |
| **http_parser** | 8192.00 mb | 694.66 mb/s | 1406180.33 req/sec | 11.79 s |

llhttp is faster by approximately **156%**.

## Maintenance

The llhttp project has about 1400 lines of TypeScript code describing the parser
itself and around 450 lines of C code and headers providing the helper methods.
The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
436 lines of headers.

All optimizations and multi-character matching in llhttp are generated
automatically, and thus don't add any extra maintenance cost. On the contrary,
most of http_parser's code is hand-optimized and unrolled.
Instead of describing
"how" it should parse the HTTP requests/responses, a maintainer should
implement the new features in [http_parser][0] cautiously, considering
possible performance degradation and manually optimizing the new code.

## Verification

The state machine graph is encoded explicitly in llhttp. [llparse][1]
automatically checks the graph for the absence of loops and for correct
reporting of input ranges (spans) such as header names and values. In the
future, additional checks could be performed to get even stricter verification
of llhttp.

## Usage

```C
#include <stdio.h>
#include <string.h>
#include "llhttp.h"

int handle_on_message_complete(llhttp_t* parser) {
  (void) parser; /* unused in this example */
  fprintf(stdout, "Message completed!\n");
  return 0;
}

int main(void) {
  llhttp_t parser;
  llhttp_settings_t settings;

  /* Initialize user callbacks and settings */
  llhttp_settings_init(&settings);

  /* Set user callback */
  settings.on_message_complete = handle_on_message_complete;

  /* Initialize the parser in HTTP_BOTH mode, meaning that it will select between
   * HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
   * input.
   */
  llhttp_init(&parser, HTTP_BOTH, &settings);

  /* Parse request! */
  const char* request = "GET / HTTP/1.1\r\n\r\n";
  size_t request_len = strlen(request);

  enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
  if (err == HPE_OK) {
    fprintf(stdout, "Successfully parsed!\n");
  } else {
    fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err), parser.reason);
  }

  return err == HPE_OK ? 0 : 1;
}
```

For more information on API usage, please refer to [src/native/api.h](https://github.com/nodejs/llhttp/blob/main/src/native/api.h).

## API

### llhttp_settings_t

The settings object contains a list of callbacks that the parser will invoke.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_PAUSED` (pause the parser):

* `on_message_begin`: Invoked when a new request/response starts.
* `on_message_complete`: Invoked when a request/response has been completely parsed.
* `on_url_complete`: Invoked after the URL has been parsed.
* `on_method_complete`: Invoked after the HTTP method has been parsed.
* `on_version_complete`: Invoked after the HTTP version has been parsed.
* `on_status_complete`: Invoked after the status code has been parsed.
* `on_header_field_complete`: Invoked after a header name has been parsed.
* `on_header_value_complete`: Invoked after a header value has been parsed.
* `on_chunk_header`: Invoked after a new chunk is started. The current chunk length is stored in `parser->content_length`.
* `on_chunk_extension_name_complete`: Invoked after a chunk extension name has been parsed.
* `on_chunk_extension_value_complete`: Invoked after a chunk extension value has been parsed.
* `on_chunk_complete`: Invoked after a new chunk is received.
* `on_reset`: Invoked after `on_message_complete` and before `on_message_begin` when a new message
  is received on the same parser. This is not invoked for the first message of the parser.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_USER` (error from the callback):

* `on_url`: Invoked when another character of the URL is received.
* `on_status`: Invoked when another character of the status is received.
* `on_method`: Invoked when another character of the method is received.
  When the parser is created with `HTTP_BOTH` and the input is a response, this is also invoked for the sequence `HTTP/`
  of the first message.
* `on_version`: Invoked when another character of the version is received.
* `on_header_field`: Invoked when another character of a header name is received.
* `on_header_value`: Invoked when another character of a header value is received.
* `on_chunk_extension_name`: Invoked when another character of a chunk extension name is received.
* `on_chunk_extension_value`: Invoked when another character of a chunk extension value is received.

The callback `on_headers_complete`, invoked when headers are completed, can return:

* `0`: Proceed normally.
* `1`: Assume that request/response has no body, and proceed to parsing the next message.
* `2`: Assume absence of body (as above) and make `llhttp_execute()` return `HPE_PAUSED_UPGRADE`.
* `-1`: Error.
* `HPE_PAUSED`: Pause the parser.

### `void llhttp_init(llhttp_t* parser, llhttp_type_t type, const llhttp_settings_t* settings)`

Initialize the parser with specific type and user settings.

### `uint8_t llhttp_get_type(llhttp_t* parser)`

Returns the type of the parser.

### `uint8_t llhttp_get_http_major(llhttp_t* parser)`

Returns the major version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_http_minor(llhttp_t* parser)`

Returns the minor version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_method(llhttp_t* parser)`

Returns the method of the current request.

### `int llhttp_get_status_code(llhttp_t* parser)`

Returns the status code of the current response.

### `uint8_t llhttp_get_upgrade(llhttp_t* parser)`

Returns `1` if request includes the `Connection: upgrade` header.

### `void llhttp_reset(llhttp_t* parser)`

Reset an already initialized parser back to the start state, preserving the
existing parser type, callback settings, user data, and lenient flags.

### `void llhttp_settings_init(llhttp_settings_t* settings)`

Initialize the settings object.
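Because the data callbacks listed under `llhttp_settings_t` (such as `on_url`) can fire more than once for a single value, an embedder typically accumulates the fragments. The following is a minimal sketch of such an accumulator, written without `llhttp.h` so it compiles on its own. `url_buf_t`, `url_buf_reset`, `url_buf_append` and `URL_BUF_CAP` are hypothetical names, not part of llhttp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size accumulator; not part of llhttp. */
#define URL_BUF_CAP 256

typedef struct {
  char data[URL_BUF_CAP]; /* always NUL-terminated */
  size_t len;             /* number of bytes stored, excluding the NUL */
} url_buf_t;

/* Clear the accumulator, e.g. when a new message begins. */
void url_buf_reset(url_buf_t* buf) {
  assert(buf != NULL);
  buf->len = 0;
  buf->data[0] = '\0';
}

/*
 * Append one fragment. The (at, length) pair mirrors the arguments that a
 * data callback receives. Returns 0 on success and -1 when the fragment
 * does not fit, matching the "proceed" / "error" return values described
 * for the callbacks above.
 */
int url_buf_append(url_buf_t* buf, const char* at, size_t length) {
  assert(buf != NULL);
  if (length == 0) {
    return 0;
  }
  if (length > URL_BUF_CAP - 1 - buf->len) {
    return -1;
  }
  memcpy(buf->data + buf->len, at, length);
  buf->len += length;
  buf->data[buf->len] = '\0';
  return 0;
}
```

In a real program, an `on_url` callback could forward its `at` and `length` arguments to `url_buf_append()`, with the buffer reachable through the parser's user data pointer, and `on_message_begin` could call `url_buf_reset()`. That wiring is an assumption of this sketch.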

### `llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len)`

Parse full or partial request/response, invoking user callbacks along the way.

If any `llhttp_data_cb` callback returns an errno other than `HPE_OK`, parsing is
interrupted and that errno is returned from `llhttp_execute()`. If `HPE_PAUSED` was
used as the errno, the execution can be resumed with a `llhttp_resume()` call.

In the special case of a CONNECT/Upgrade request/response, `HPE_PAUSED_UPGRADE` is returned
after fully parsing the request/response. If the user wishes to continue parsing,
they need to invoke `llhttp_resume_after_upgrade()`.

**If this function ever returns a non-pause type error, it will continue to return
the same error upon each successive call up until `llhttp_init()` is called.**

### `llhttp_errno_t llhttp_finish(llhttp_t* parser)`

This method should be called when the other side has no further bytes to
send (e.g. shutdown of the readable side of the TCP connection).

Requests without `Content-Length` and other messages might require treating
all incoming bytes as part of the body, up to the last byte of the
connection.

This method will invoke the `on_message_complete()` callback if the
request was terminated safely. Otherwise an error code is returned.

### `int llhttp_message_needs_eof(const llhttp_t* parser)`

Returns `1` if the incoming message is parsed until the last byte, and has to be completed by calling `llhttp_finish()` on EOF.

### `int llhttp_should_keep_alive(const llhttp_t* parser)`

Returns `1` if there might be any other messages following the last that was
successfully parsed.

### `void llhttp_pause(llhttp_t* parser)`

Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
appropriate error reason.

**Do not call this from user callbacks! User callbacks must return
`HPE_PAUSED` if pausing is required.**

### `void llhttp_resume(llhttp_t* parser)`

Might be called to resume the execution after the pause in user's callback.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED`.**

### `void llhttp_resume_after_upgrade(llhttp_t* parser)`

Might be called to resume the execution after `llhttp_execute()` paused on a
CONNECT/Upgrade request/response.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`.**

### `llhttp_errno_t llhttp_get_errno(const llhttp_t* parser)`

Returns the latest error.

### `const char* llhttp_get_error_reason(const llhttp_t* parser)`

Returns the verbal explanation of the latest returned error.

**User callbacks should set the error reason when returning an error. See
`llhttp_set_error_reason()` for details.**

### `void llhttp_set_error_reason(llhttp_t* parser, const char* reason)`

Assign a verbal description to the returned error. Must be called in user
callbacks right before returning the errno.

**`HPE_USER` error code might be useful in user callbacks.**

### `const char* llhttp_get_error_pos(const llhttp_t* parser)`

Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to the `data` argument of `llhttp_execute()`.

**This method might be useful for counting the number of parsed bytes.**

### `const char* llhttp_errno_name(llhttp_errno_t err)`

Returns textual name of error code.

### `const char* llhttp_method_name(llhttp_method_t method)`

Returns textual name of HTTP method.

### `const char* llhttp_status_name(llhttp_status_t status)`

Returns textual name of HTTP status.
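The note under `llhttp_get_error_pos()` mentions counting parsed bytes. A minimal sketch of that pointer arithmetic follows, kept independent of `llhttp.h` so it compiles on its own. `consumed_bytes` is a hypothetical helper, and `error_pos` stands in for the value returned by `llhttp_get_error_pos()`. Treating a `NULL` position as "the whole buffer was consumed" is an assumption of this sketch, not documented behaviour.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of llhttp.
 *
 * data, len  - the buffer that was handed to llhttp_execute()
 * error_pos  - the pointer reported for the error (or pause) position;
 *              NULL is treated here as "no position recorded", which this
 *              sketch interprets as the whole buffer having been consumed
 *
 * Returns the number of bytes of `data` that precede `error_pos`.
 */
size_t consumed_bytes(const char* data, size_t len, const char* error_pos) {
  if (error_pos == NULL) {
    return len;
  }
  /* The position must point into (or one past the end of) the buffer. */
  assert(error_pos >= data && error_pos <= data + len);
  return (size_t) (error_pos - data);
}
```

A caller could use the result to locate the unparsed remainder of the buffer, for example the bytes that follow an upgrade request, at `data + consumed_bytes(data, len, error_pos)`.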

### `void llhttp_set_lenient_headers(llhttp_t* parser, int enabled)`

Enables/disables lenient header value parsing (disabled by default).
Lenient parsing disables header value token checks, extending llhttp's
protocol support to highly non-compliant clients/servers.

No `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
lenient parsing is "on".

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of conflicting `Transfer-Encoding` and
`Content-Length` headers (disabled by default).

Normally `llhttp` would error when `Transfer-Encoding` is present in
conjunction with `Content-Length`.

This error is important to prevent HTTP request smuggling, but may be less desirable
for a small number of cases involving legacy servers.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Connection: close` and HTTP/1.0
requests/responses.

Normally `llhttp` would error on an HTTP request/response that follows
a request/response with `Connection: close` and `Content-Length`.

This is important to prevent cache poisoning attacks,
but might interact badly with outdated and insecure clients.

With this flag the extra request/response will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_transfer_encoding(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Transfer-Encoding` header.

Normally `llhttp` would error when a `Transfer-Encoding` has `chunked` value
and another value after it (either in a single header or in multiple
headers whose values are internally joined using `, `).

This is mandated by the spec to reliably determine request body size and thus
avoid request smuggling.

With this flag the extra value will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_version(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of HTTP version.

Normally `llhttp` would error when the HTTP version in the request or status line
is not `0.9`, `1.0`, `1.1` or `2.0`.
With this flag any other version will be parsed normally.

**Enabling this flag can pose a security issue since you will allow unsupported HTTP versions. USE WITH CAUTION!**

### `void llhttp_set_lenient_data_after_close(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of additional data received after a message ends
and keep-alive is disabled.

Normally `llhttp` would error when additional unexpected data is received if the message
contains the `Connection` header with `close` value.
With this flag the extra data will be discarded without throwing an error.

**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_lf_after_cr(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of incomplete CRLF sequences.

Normally `llhttp` would error when a CR is not followed by LF when terminating the
request line, the status line, the headers or a chunk header.
With this flag only a CR is required to terminate such sections.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_crlf_after_chunk(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of chunks not separated via CRLF.

Normally `llhttp` would error when a CRLF is missing after chunk data, before
a new chunk starts.
With this flag the new chunk can start immediately after the previous one.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

## Build Instructions

Make sure you have [Node.js](https://nodejs.org/), npm and npx installed. Then, in the project directory, run:

```sh
npm install
make
```

---

### Bindings to other languages

* Lua: [MunifTanjim/llhttp.lua][11]
* Python: [pallas/pyllhttp][8]
* Ruby: [metabahn/llhttp][9]
* Rust: [JackLiar/rust-llhttp][10]

### Using with CMake

If you want to use this library in a CMake project as a shared library, you can use the snippet below.

```
FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_shared target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_shared ${PROJECT_NAME})
```

If you want to use this library in a CMake project as a static library, you can set some cache variables first.

```
FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

set(BUILD_SHARED_LIBS OFF CACHE INTERNAL "")
set(BUILD_STATIC_LIBS ON CACHE INTERNAL "")
FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_static target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_static ${PROJECT_NAME})
```

_Note that using the git repo directly (e.g., via a git repo url and tag) will not work with FetchContent_Declare because [CMakeLists.txt](./CMakeLists.txt) requires string replacements (e.g., `_RELEASE_`) before it will build._

## Building on Windows

### Installation

* `choco install git`
* `choco install node`
* `choco install llvm` (or install the `C++ Clang tools for Windows` optional package from the Visual Studio 2019 installer)
* `choco install make` (or if you have MinGW, it comes bundled)

1. Ensure that `Clang` and `make` are in your system path.
2. Using Git Bash, clone the repo to your preferred location.
3. `cd` into the cloned directory and run `npm install`.
4. Run `make`.
5. Your `repo/build` directory should now have `libllhttp.a` and `libllhttp.so` static and dynamic libraries.
6. When building your executable, you can link to these libraries. Make sure to set the build folder as an include path when building so you can reference the declarations in `repo/build/llhttp.h`.

### A simple example on linking with the library:

Assuming you have an executable `main.cpp` in your current working directory, you would run: `clang++ -Os -g3 -Wall -Wextra -Wno-unused-parameter -I/path/to/llhttp/build main.cpp /path/to/llhttp/build/libllhttp.a -o main.exe`.

If you are getting `unresolved external symbol` linker errors, you are likely attempting to build `llhttp.c` without linking it with object files from `api.c` and `http.c`.

#### LICENSE

This software is licensed under the MIT License.

Copyright Fedor Indutny, 2018.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.

[0]: https://github.com/nodejs/http-parser
[1]: https://github.com/nodejs/llparse
[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
[3]: https://en.wikipedia.org/wiki/Tail_call
[4]: https://llvm.org/docs/LangRef.html
[5]: https://llvm.org/docs/LangRef.html#call-instruction
[6]: https://clang.llvm.org/
[7]: https://github.com/nodejs/node
[8]: https://github.com/pallas/pyllhttp
[9]: https://github.com/metabahn/llhttp
[10]: https://github.com/JackLiar/rust-llhttp
[11]: https://github.com/MunifTanjim/llhttp.lua