# llhttp
[![CI](https://github.com/nodejs/llhttp/workflows/CI/badge.svg)](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)

Port of [http_parser][0] to [llparse][1].

## Why?

Let's face it, [http_parser][0] is practically unmaintainable. Even the
introduction of a single new method results in significant code churn.

This project aims to:

* Make the parser maintainable
* Make it verifiable
* Improve benchmarks where possible

More details are in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY).

## How?

Over time, different approaches for improving [http_parser][0]'s code base
were tried. However, all of them failed because they caused significant
performance degradation.

This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
to generate the output C source file, which can then be compiled and
linked with the embedder's program (like [Node.js][7]).

## Performance

So far llhttp outperforms http_parser:

|                 | input size |    bandwidth |           reqs/sec |    time |
|:----------------|-----------:|-------------:|-------------------:|--------:|
| **llhttp**      | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec |  4.61 s |
| **http_parser** | 8192.00 mb |  694.66 mb/s | 1406180.33 req/sec | 11.79 s |

llhttp is faster by approximately **156%**.

## Maintenance

The llhttp project has about 1400 lines of TypeScript code describing the parser
itself and around 450 lines of C code and headers providing the helper methods.
The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
436 lines of headers.

All optimizations and multi-character matching in llhttp are generated
automatically, and thus do not add any extra maintenance cost. On the contrary,
most of http_parser's code is hand-optimized and unrolled.
Instead of simply describing
"how" to parse HTTP requests/responses, a maintainer has to
implement new features in [http_parser][0] cautiously, considering
possible performance degradation and manually optimizing the new code.

## Verification

The state machine graph is encoded explicitly in llhttp. [llparse][1]
automatically checks the graph for absence of loops and correct reporting of the
input ranges (spans) like header names and values. In the future, additional
checks could be performed to get even stricter verification of llhttp.

## Usage

```C
#include <stdio.h>
#include <string.h>

#include "llhttp.h"

/* Example callback: invoked once the whole message has been parsed */
static int handle_on_message_complete(llhttp_t* parser) {
  (void) parser;
  fprintf(stdout, "Message completed!\n");
  return 0;
}

int main(void) {
  llhttp_t parser;
  llhttp_settings_t settings;

  /* Initialize user callbacks and settings */
  llhttp_settings_init(&settings);

  /* Set user callback */
  settings.on_message_complete = handle_on_message_complete;

  /* Initialize the parser in HTTP_BOTH mode, meaning that it will select between
   * HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
   * input.
   */
  llhttp_init(&parser, HTTP_BOTH, &settings);

  /* Parse request! */
  const char* request = "GET / HTTP/1.1\r\n\r\n";
  size_t request_len = strlen(request);

  enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
  if (err == HPE_OK) {
    /* Successfully parsed! */
  } else {
    fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err),
            parser.reason);
    return 1;
  }
  return 0;
}
```

For more information on API usage, please refer to [src/native/api.h](https://github.com/nodejs/llhttp/blob/main/src/native/api.h).

## API

### llhttp_settings_t

The settings object contains a list of callbacks that the parser will invoke.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_PAUSED` (pause the parser):

* `on_message_begin`: Invoked when a new request/response starts.
* `on_message_complete`: Invoked when a request/response has been completely parsed.
* `on_url_complete`: Invoked after the URL has been parsed.
* `on_method_complete`: Invoked after the HTTP method has been parsed.
* `on_version_complete`: Invoked after the HTTP version has been parsed.
* `on_status_complete`: Invoked after the status code has been parsed.
* `on_header_field_complete`: Invoked after a header name has been parsed.
* `on_header_value_complete`: Invoked after a header value has been parsed.
* `on_chunk_header`: Invoked after a new chunk is started. The current chunk length is stored in `parser->content_length`.
* `on_chunk_extension_name_complete`: Invoked after a chunk extension name has been parsed.
* `on_chunk_extension_value_complete`: Invoked after a chunk extension value has been parsed.
* `on_chunk_complete`: Invoked after a chunk has been received.
* `on_reset`: Invoked after `on_message_complete` and before `on_message_begin` when a new message
  is received on the same parser. This is not invoked for the first message of the parser.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_USER` (error from the callback):

* `on_url`: Invoked when another character of the URL is received.
* `on_status`: Invoked when another character of the status is received.
* `on_method`: Invoked when another character of the method is received.
  When the parser is created with `HTTP_BOTH` and the input is a response, this is also invoked for the sequence `HTTP/`
  of the first message.
* `on_version`: Invoked when another character of the version is received.
* `on_header_field`: Invoked when another character of a header name is received.
* `on_header_value`: Invoked when another character of a header value is received.
* `on_chunk_extension_name`: Invoked when another character of a chunk extension name is received.
* `on_chunk_extension_value`: Invoked when another character of a chunk extension value is received.
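
Because the data callbacks listed above can fire several times for one value, a callback body usually appends each fragment to a buffer owned by the embedder. The following is a minimal sketch of such a buffer and deliberately does not depend on llhttp itself. `span_buf_t`, `span_buf_reset`, `span_buf_append` and `SPAN_BUF_CAP` are hypothetical names that are not part of the llhttp API, and the fixed capacity is an arbitrary assumption. The wiring shown in the comment assumes the `(llhttp_t*, const char* at, size_t length)` data callback signature declared in `llhttp.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed capacity, including the terminating NUL byte. */
#define SPAN_BUF_CAP 256

/* Hypothetical accumulator for one value (URL, header name, ...). */
typedef struct {
  char data[SPAN_BUF_CAP];
  size_t len;
} span_buf_t;

/* Empty the buffer; call this when a new value starts. */
static void span_buf_reset(span_buf_t* buf) {
  assert(buf != NULL);
  buf->len = 0;
  buf->data[0] = '\0';
}

/* Append one fragment and keep the buffer NUL-terminated.
 * Returns 0 on success and -1 if the fragment does not fit, which matches
 * the 0 / -1 return convention of the callbacks.
 *
 * Assumed wiring inside a data callback (not compiled here):
 *   static int on_url(llhttp_t* p, const char* at, size_t length) {
 *     return span_buf_append((span_buf_t*) p->data, at, length);
 *   }
 */
static int span_buf_append(span_buf_t* buf, const char* at, size_t length) {
  assert(buf != NULL && at != NULL);
  if (length >= SPAN_BUF_CAP - buf->len) {
    return -1;  /* no room for the fragment plus the NUL byte */
  }
  memcpy(buf->data + buf->len, at, length);
  buf->len += length;
  buf->data[buf->len] = '\0';
  return 0;
}
```

With this approach the matching `*_complete` callback is the natural place to consume the accumulated value and call `span_buf_reset()` before the next one starts.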

The callback `on_headers_complete`, invoked when headers are completed, can return:

* `0`: Proceed normally.
* `1`: Assume that request/response has no body, and proceed to parsing the next message.
* `2`: Assume absence of body (as above) and make `llhttp_execute()` return `HPE_PAUSED_UPGRADE`.
* `-1`: Error.
* `HPE_PAUSED`: Pause the parser.

### `void llhttp_init(llhttp_t* parser, llhttp_type_t type, const llhttp_settings_t* settings)`

Initialize the parser with specific type and user settings.

### `uint8_t llhttp_get_type(llhttp_t* parser)`

Returns the type of the parser.

### `uint8_t llhttp_get_http_major(llhttp_t* parser)`

Returns the major version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_http_minor(llhttp_t* parser)`

Returns the minor version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_method(llhttp_t* parser)`

Returns the method of the current request.

### `int llhttp_get_status_code(llhttp_t* parser)`

Returns the status code of the current response.

### `uint8_t llhttp_get_upgrade(llhttp_t* parser)`

Returns `1` if request includes the `Connection: upgrade` header.

### `void llhttp_reset(llhttp_t* parser)`

Reset an already initialized parser back to the start state, preserving the
existing parser type, callback settings, user data, and lenient flags.

### `void llhttp_settings_init(llhttp_settings_t* settings)`

Initialize the settings object.

### `llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len)`

Parse full or partial request/response, invoking user callbacks along the way.

If any `llhttp_data_cb` returns an errno other than `HPE_OK`, parsing is interrupted
and that errno is returned from `llhttp_execute()`.
If `HPE_PAUSED` was used as the errno,
execution can be resumed with a call to `llhttp_resume()`.

In the special case of a CONNECT/Upgrade request/response, `HPE_PAUSED_UPGRADE` is returned
after fully parsing the request/response. If the user wishes to continue parsing,
they need to invoke `llhttp_resume_after_upgrade()`.

**If this function ever returns a non-pause type error, it will continue to return
the same error upon each successive call up until `llhttp_init()` is called.**

### `llhttp_errno_t llhttp_finish(llhttp_t* parser)`

This method should be called when the other side has no further bytes to
send (e.g. shutdown of the readable side of the TCP connection).

Requests without `Content-Length` and other messages might require treating
all incoming bytes as part of the body, up to the last byte of the
connection.

This method will invoke the `on_message_complete()` callback if the
request was terminated safely. Otherwise an error code is returned.

### `int llhttp_message_needs_eof(const llhttp_t* parser)`

Returns `1` if the incoming message is parsed until the last byte, and has to be completed by calling `llhttp_finish()` on EOF.

### `int llhttp_should_keep_alive(const llhttp_t* parser)`

Returns `1` if there might be any other messages following the last that was
successfully parsed.

### `void llhttp_pause(llhttp_t* parser)`

Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
appropriate error reason.

**Do not call this from user callbacks! User callbacks must return
`HPE_PAUSED` if pausing is required.**

### `void llhttp_resume(llhttp_t* parser)`

Might be called to resume the execution after the pause in user's callback.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED`.**

### `void llhttp_resume_after_upgrade(llhttp_t* parser)`

Might be called to resume the execution after the pause in user's callback.
See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`.**

### `llhttp_errno_t llhttp_get_errno(const llhttp_t* parser)`

Returns the latest error.

### `const char* llhttp_get_error_reason(const llhttp_t* parser)`

Returns the verbal explanation of the latest returned error.

**User callbacks should set the error reason when returning an error. See
`llhttp_set_error_reason()` for details.**

### `void llhttp_set_error_reason(llhttp_t* parser, const char* reason)`

Assign a verbal description to the returned error. Must be called in user
callbacks right before returning the errno.

**`HPE_USER` error code might be useful in user callbacks.**

### `const char* llhttp_get_error_pos(const llhttp_t* parser)`

Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to the `data` argument of `llhttp_execute()`.

**This method might be useful for counting the number of parsed bytes.**

### `const char* llhttp_errno_name(llhttp_errno_t err)`

Returns the textual name of an error code.

### `const char* llhttp_method_name(llhttp_method_t method)`

Returns the textual name of an HTTP method.

### `const char* llhttp_status_name(llhttp_status_t status)`

Returns the textual name of an HTTP status.

### `void llhttp_set_lenient_headers(llhttp_t* parser, int enabled)`

Enables/disables lenient header value parsing (disabled by default).
Lenient parsing disables header value token checks, extending llhttp's
protocol support to highly non-compliant clients/servers.

No `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
lenient parsing is "on".

**USE AT YOUR OWN RISK!**

### `void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of conflicting `Transfer-Encoding` and
`Content-Length` headers (disabled by default).

Normally `llhttp` would error when `Transfer-Encoding` is present in
conjunction with `Content-Length`.

This error is important to prevent HTTP request smuggling, but may be less desirable
in a small number of cases involving legacy servers.

**USE AT YOUR OWN RISK!**

### `void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Connection: close` and HTTP/1.0
requests/responses.

Normally `llhttp` would error on (in strict mode) or discard (in loose mode)
the HTTP request/response after the request/response with `Connection: close`
and `Content-Length`.

This is important to prevent cache poisoning attacks,
but might interact badly with outdated and insecure clients.

With this flag the extra request/response will be parsed normally.

**USE AT YOUR OWN RISK!**

### `void llhttp_set_lenient_transfer_encoding(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of the `Transfer-Encoding` header.

Normally `llhttp` would error when a `Transfer-Encoding` has a `chunked` value
and another value after it (either in a single header or in multiple
headers whose values are internally joined using `, `).

This is mandated by the spec to reliably determine request body size and thus
avoid request smuggling.

With this flag the extra value will be parsed normally.

**USE AT YOUR OWN RISK!**

## Build Instructions

Make sure you have [Node.js](https://nodejs.org/), npm and npx installed.
Then, in the project directory, run:

```sh
npm install
make
```

---

### Bindings to other languages

* Lua: [MunifTanjim/llhttp.lua][11]
* Python: [pallas/pyllhttp][8]
* Ruby: [metabahn/llhttp][9]
* Rust: [JackLiar/rust-llhttp][10]

### Using with CMake

If you want to use this library in a CMake project you can use the snippet below.

```cmake
include(FetchContent)

FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/v6.0.5.tar.gz")  # Using version 6.0.5

FetchContent_MakeAvailable(llhttp)

target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp ${PROJECT_NAME})
```

## Building on Windows

### Installation

* `choco install git`
* `choco install node`
* `choco install llvm` (or install the `C++ Clang tools for Windows` optional package from the Visual Studio 2019 installer)
* `choco install make` (or if you have MinGW, it comes bundled)

1. Ensure that `Clang` and `make` are in your system path.
2. Using Git Bash, clone the repo to your preferred location.
3. `cd` into the cloned directory and run `npm install`.
4. Run `make`.
5. Your `repo/build` directory should now have `libllhttp.a` and `libllhttp.so` static and dynamic libraries.
6. When building your executable, you can link to these libraries. Make sure to set the build folder as an include path when building so you can reference the declarations in `repo/build/llhttp.h`.

### A simple example on linking with the library:

Assuming you have a source file `main.cpp` in your current working directory, you would run: `clang++ -Os -g3 -Wall -Wextra -Wno-unused-parameter -I/path/to/llhttp/build main.cpp /path/to/llhttp/build/libllhttp.a -o main.exe`.

If you are getting `unresolved external symbol` linker errors, you are likely attempting to build `llhttp.c` without linking it with object files from `api.c` and `http.c`.

#### LICENSE

This software is licensed under the MIT License.

Copyright Fedor Indutny, 2018.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.

[0]: https://github.com/nodejs/http-parser
[1]: https://github.com/nodejs/llparse
[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
[3]: https://en.wikipedia.org/wiki/Tail_call
[4]: https://llvm.org/docs/LangRef.html
[5]: https://llvm.org/docs/LangRef.html#call-instruction
[6]: https://clang.llvm.org/
[7]: https://github.com/nodejs/node
[8]: https://github.com/pallas/pyllhttp
[9]: https://github.com/metabahn/llhttp
[10]: https://github.com/JackLiar/rust-llhttp
[11]: https://github.com/MunifTanjim/llhttp.lua