curl internals
==============

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
   - [`Curl_connect`](#Curl_connect)
   - [`multi_do`](#multi_do)
   - [`Curl_readwrite`](#Curl_readwrite)
   - [`multi_done`](#multi_done)
   - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
 - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
   - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [`hostip.c` explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)
   - [Curl_easy](#Curl_easy)
   - [connectdata](#connectdata)
   - [Curl_multi](#Curl_multi)
   - [Curl_handler](#Curl_handler)
   - [conncache](#conncache)
   - [Curl_share](#Curl_share)
   - [CookieInfo](#CookieInfo)

<a name="intro"></a>
Intro
=====

 This project is split in two parts: the library and the client. The client
 uses the library, but the library is designed to allow other applications to
 use it.

 The largest amount of code and complexity is in the library part.


<a name="git"></a>
git
===

 All changes to the sources are committed to the git repository as soon as
 they are somewhat verified to work. Changes shall be committed as
 independently as possible so that individual changes can be easily spotted
 and tracked afterwards.
 Tagging shall be used extensively, and by the time we release new archives we
 should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

 We write curl and libcurl to compile with C89 compilers, on 32-bit and
 larger machines. Most of libcurl assumes more or less POSIX compliance, but
 that is not a requirement.

 We write libcurl to build and work with lots of third party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work but are not what we work hard to maintain):

Dependencies
------------

 - OpenSSL      0.9.7
 - GnuTLS       2.11.3
 - zlib         1.1.4
 - libssh2      0.16
 - c-ares       1.6.0
 - libidn2      2.0.0
 - wolfSSL      2.0.0
 - openldap     2.0
 - MIT Kerberos 1.2.4
 - GSKit        V5R3M0
 - NSS          3.14.x
 - PolarSSL     1.3.0
 - Heimdal      ?
 - nghttp2      1.0.0

Operating Systems
-----------------

 On systems where configure runs, we aim at working on them all - if they
 have a suitable C compiler. On systems that don't run configure, we strive
 to keep curl running correctly on:

 - Windows 98
 - AS/400 V5R3M0
 - Symbian 9.1
 - Windows CE ?
 - TPF ?

Build tools
-----------

 When writing code (mostly for generating stuff included in release
 tarballs) we use a few "build tools" and we make sure that we remain
 functional with these versions:

 - GNU Libtool  1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4       1.4
 - perl         5.004
 - roffit       0.5
 - groff        ? (any version that supports `groff -Tps -man [in] [out]`)
 - ps2pdf (gs)  ?

<a name="winvsunix"></a>
Windows vs Unix
===============

 There are a few differences in how to program curl the Unix way compared to
 the Windows way. Perhaps the four most notable details are:

 1. Different function names for socket operations.
    In curl, this is solved with defines and macros, so that the source looks
    the same in all places except for the header file that defines them. The
    macros in use are `sclose()`, `sread()` and `swrite()`.

 2. Windows requires a couple of init calls for the socket stuff.

    That's taken care of by the `curl_global_init()` call, but if other libs
    also do such initialization there might be reasons for applications to
    alter that behaviour.

 3. The file descriptors for network communication and file operations are
    not as easily interchangeable as in Unix.

    We avoid this by not trying any funny tricks on file descriptors.

 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

    We set stdout to binary mode under Windows.

 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure
 scripts, we maintain a `curl_config-win32.h` file in the lib directory that
 is supposed to look exactly like a `curl_config.h` file would have looked
 on a Windows machine!

 Generally speaking: always remember that this will be compiled on dozens of
 operating systems. Don't walk on the edge!

<a name="Library"></a>
Library
=======

 (See [Structs in libcurl](#structs) for the separate section describing all
 major internal structs and their purposes.)

 There are plenty of entry points to the library, namely each publicly
 defined function that libcurl offers to applications. All of those
 functions are rather small and easy to follow. All the ones prefixed with
 `curl_easy` are put in the `lib/easy.c` file.
 `curl_global_init()` and `curl_global_cleanup()` should be called by the
 application to initialize and clean up global state in the library. As of
 today, this handles the global SSL init if SSL is enabled, and inits the
 socket layer on Windows machines. libcurl itself has no "global" scope.

 All printf()-style functions use the supplied clones in `lib/mprintf.c`.
 This makes sure we stay absolutely platform independent.

 [`curl_easy_init()`][2] allocates an internal struct and makes some
 initializations. The returned handle does not reveal internals. This is the
 `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
 functions. All connections performed will get connect-specific data
 allocated that should be used for things related to particular
 connections/requests.

 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
 be passed in pairs: the parameter-ID and the parameter-value. The list of
 options is documented in the man page. This function mainly sets things in
 the `Curl_easy` struct.

 `curl_easy_perform()` is just a wrapper function that makes use of the multi
 API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
 `curl_multi_wait()` and `curl_multi_perform()` until the transfer is done
 and then returns.

 Some of the most important key functions in `url.c` are called from
 `multi.c` when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

 Analyzes the URL, separates the different components and connects to the
 remote host. This may involve using a proxy and/or using SSL. The
 `Curl_resolv()` function in `lib/hostip.c` is used for looking up host
 names (it then uses the proper underlying method, which may vary
 between platforms and builds).
 When `Curl_connect` is done, we are connected to the remote site. Then it
 is time to tell the server to get a document/file. `Curl_do()` arranges
 this.

 This function makes sure there's an allocated and initiated `connectdata`
 struct that is used for this particular connection only (although there may
 be several requests performed on the same connection). A bunch of things
 are inited/inherited from the `Curl_easy` struct.

<a name="multi_do"></a>
multi_do()
----------

 `multi_do()` makes sure the proper protocol-specific function is called.
 The functions are named after the protocols they handle.

 The protocol-specific functions of course deal with protocol-specific
 negotiations and setup. They have access to the `Curl_sendf()` (from
 `lib/sendf.c`) function to send printf-style formatted data to the remote
 host and when they're ready to make the actual file transfer they call the
 `Curl_setup_transfer()` function (in `lib/transfer.c`) to set up the
 transfer and return.

 If this DO function fails and the connection is being re-used, libcurl will
 then close this connection, set up a new connection and re-issue the DO
 request on that. This is because there is no way to be perfectly sure that
 we have discovered a dead connection before the DO function and thus we
 might wrongly be re-using a connection that was closed by the remote peer.

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

 Called during the transfer of the actual protocol payload.

 During transfer, the progress functions in `lib/progress.c` are called at
 frequent intervals (or at the user's choice, a specified callback might get
 called). The speedcheck functions in `lib/speedcheck.c` are also used to
 verify that the transfer is as fast as required.

<a name="multi_done"></a>
multi_done()
------------

 Called after a transfer is done.
This function takes care of everything
 that has to be done after a transfer. It attempts to leave
 matters in a state so that `multi_do()` should be possible to call again on
 the same connection (in a persistent connection case). It might also soon
 be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

 When doing normal connections and transfers, no one ever tries to close any
 connections, so this is not normally called when `curl_easy_perform()` is
 used. This function is only used when we are certain that no more transfers
 are going to be made on the connection. It can be called to close a
 connection by force, or to make sure that libcurl doesn't keep too many
 connections alive at the same time.

 This function cleans up all resources that are associated with a single
 connection.

<a name="http"></a>
HTTP(S)
=======

 HTTP offers a lot and is the protocol in curl that uses the most lines of
 code. There is a special file `lib/formdata.c` that offers all the
 multipart post functions.

 base64 functions for user+password stuff (and more) are in `lib/base64.c`
 and all functions for parsing and sending cookies are found in
 `lib/cookie.c`.

 HTTPS uses in almost every case the same procedure as HTTP, with only two
 exceptions: the connect procedure is different and the function used to read
 or write from the socket is different, although the latter fact is hidden in
 the source by the use of `Curl_read()` for reading and `Curl_write()` for
 writing data to the remote server.

 `http_chunks.c` contains functions that understand HTTP 1.1 chunked
 transfer encoding.

 An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
 series of functions we use.
They append data to one single buffer, and when
 the building is finished the entire request is sent off in one single
 write. This is done this way to overcome problems with flawed firewalls
 and lame servers.

<a name="ftp"></a>
FTP
===

 The `Curl_if2ip()` function can be used for getting the IP number of a
 specified network interface, and it resides in `lib/if2ip.c`.

 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
 was made a separate function to prevent us programmers from forgetting that
 the commands must be CRLF terminated. They must also be sent in one single
 `write()` to make firewalls and similar happy.

<a name="kerberos"></a>
Kerberos
========

 Kerberos support is mainly in `lib/krb5.c` and `lib/security.c` but also
 `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
 `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

 Telnet is implemented in `lib/telnet.c`.

<a name="file"></a>
FILE
====

 The `file://` protocol is dealt with in `lib/file.c`.

<a name="smb"></a>
SMB
===

 The `smb://` protocol is dealt with in `lib/smb.c`.

<a name="ldap"></a>
LDAP
====

 Everything LDAP is in `lib/ldap.c` and `lib/openldap.c`.

<a name="email"></a>
E-mail
======

 The e-mail related source code is in `lib/imap.c`, `lib/pop3.c` and
 `lib/smtp.c`.

<a name="general"></a>
General
=======

 URL encoding and decoding, called escaping and unescaping in the source
 code, is found in `lib/escape.c`.

 While transferring data in `Transfer()` a few functions might get used.
 `curl_getdate()` in `lib/parsedate.c` is for HTTP date comparisons (and
 more).

 `lib/getenv.c` offers `curl_getenv()` which is for reading environment
 variables in a neat platform independent way.
That's used in the client, but
 also in `lib/url.c` when checking the proxy environment variables. Note that
 contrary to the normal Unix `getenv()`, this returns an allocated buffer
 that must be `free()`ed after use.

 `lib/netrc.c` holds the `.netrc` parser.

 `lib/timeval.c` features replacement functions for systems that don't have
 `gettimeofday()` and a few support functions for timeval conversions.

 A function named `curl_version()` that returns the full curl version string
 is found in `lib/version.c`.

<a name="persistent"></a>
Persistent Connections
======================

 The persistent connection support in libcurl requires some considerations
 on how to do things inside of the library.

 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
   must never hold connection-oriented data. It is meant to hold the root
   data as well as all the options etc. that the library-user may choose.

 - The `Curl_easy` struct holds the "connection cache" (an array of
   pointers to `connectdata` structs).

   - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an
   already existing connection in the cache that we can use. Otherwise it
   creates a new one and adds that to the cache. If the cache is already
   full when a new connection is added, it first closes the oldest unused
   one.

 - When the transfer operation is complete, the connection is left
   open. Particular options may tell libcurl not to, and protocols may
   signal closure on connections, and then they won't be kept open, of
   course.

 - When `curl_easy_cleanup()` is called, we close all still open
   connections, unless of course the multi interface "owns" the connections.

 The curl handle must be re-used for the persistent connections to
 work.
417 418<a name="multi"></a> 419multi interface/non-blocking 420============================ 421 422 The multi interface is a non-blocking interface to the library. To make that 423 interface work as well as possible, no low-level functions within libcurl 424 must be written to work in a blocking manner. (There are still a few spots 425 violating this rule.) 426 427 One of the primary reasons we introduced c-ares support was to allow the name 428 resolve phase to be perfectly non-blocking as well. 429 430 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust 431 the code to allow non-blocking operations even on multi-stage command- 432 response protocols. They are built around state machines that return when 433 they would otherwise block waiting for data. The DICT, LDAP and TELNET 434 protocols are crappy examples and they are subject for rewrite in the future 435 to better fit the libcurl protocol family. 436 437<a name="ssl"></a> 438SSL libraries 439============= 440 441 Originally libcurl supported SSLeay for SSL/TLS transports, but that was then 442 extended to its successor OpenSSL but has since also been extended to several 443 other SSL/TLS libraries and we expect and hope to further extend the support 444 in future libcurl versions. 445 446 To deal with this internally in the best way possible, we have a generic SSL 447 function API as provided by the `vtls/vtls.[ch]` system, and they are the only 448 SSL functions we must use from within libcurl. vtls is then crafted to use 449 the appropriate lower-level function calls to whatever SSL library that is in 450 use. For example `vtls/openssl.[ch]` for the OpenSSL library. 451 452<a name="symbols"></a> 453Library Symbols 454=============== 455 456 All symbols used internally in libcurl must use a `Curl_` prefix if they're 457 used in more than a single file. Single-file symbols must be made static. 458 Public ("exported") symbols must use a `curl_` prefix. 
 (There are exceptions,
 but they are to be changed to follow this pattern in future versions.)
 Public API functions are marked with `CURL_EXTERN` in the public header
 files so that all others can be hidden on platforms where this is possible.

<a name="returncodes"></a>
Return Codes and Informationals
===============================

 I've made things simple. Almost every function in libcurl returns a
 `CURLcode`, which must be `CURLE_OK` if everything is OK or otherwise a
 suitable error code as the `curl/curl.h` include file defines. The very
 spot that detects an error must use the `Curl_failf()` function to set the
 human-readable error description.

 In aiding the user to understand what's happening and to debug curl usage,
 we must supply a fair number of informational messages by using the
 `Curl_infof()` function. Those messages are only displayed when the user
 explicitly asks for them. They are best used when revealing information
 that isn't otherwise obvious.

<a name="abi"></a>
API/ABI
=======

 We make an effort to not export or show internals or how internals work, as
 that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
 for our promise to users.

<a name="client"></a>
Client
======

 `main()` resides in `src/tool_main.c`.

 `src/tool_hugehelp.c` is automatically generated by the `mkhelp.pl` perl
 script to display the complete "manual" and the `src/tool_urlglob.c` file
 holds the functions used for the URL-"globbing" support. Globbing in the
 sense that the `{}` and `[]` expansion stuff is there.

 The client mostly sets up its `config` struct properly, then
 it calls the `curl_easy_*()` functions of the library and when it gets back
 control after the `curl_easy_perform()` it cleans up the library, checks
 status and exits.
 When the operation is done, the `ourWriteOut()` function in `src/writeout.c`
 may be called to report about the operation. That function uses the
 `curl_easy_getinfo()` function to extract useful information from the curl
 session.

 The client may loop and do all this several times if many URLs were
 specified on the command line or in a config file.

<a name="memorydebug"></a>
Memory Debugging
================

 The file `lib/memdebug.c` contains debug-versions of a few functions,
 such as `malloc()`, `free()`, `fopen()` and `fclose()`, that deal with
 resources that might give us problems if we "leak" them. The functions in
 the memdebug system do nothing fancy: they do their normal duty and then
 log information about what they just did. The logged data can then be
 analyzed after a complete session.

 `memanalyze.pl` is the perl script present in `tests/` that analyzes a log
 file generated by the memory tracking system. It detects if resources are
 allocated but never freed, and other kinds of errors related to resource
 management.

 Internally, the preprocessor symbol `DEBUGBUILD` guards code that is only
 compiled for debug-enabled builds, and the symbol `CURLDEBUG` marks code
 that is _only_ used for memory tracking/debugging.

 Use `-DCURLDEBUG` when compiling to enable memory debugging; this is also
 switched on by running configure with `--enable-curldebug`. Use
 `-DDEBUGBUILD` when compiling to enable a debug build, or run configure
 with `--enable-debug`.

 `curl --version` lists the 'Debug' feature for debug-enabled builds, and
 the 'TrackMemory' feature for builds capable of memory tracking. These
 features are independent and can be controlled when running
 the configure script.
When `--enable-debug` is given, both features are
 enabled, unless some restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

 The test suite is placed in its own subdirectory directly off the root in
 the curl archive tree, and it contains a bunch of scripts and a lot of test
 case data.

 The main test script is `runtests.pl`, which invokes test servers like
 `httpserver.pl` and `ftpserver.pl` before all the test cases are run. The
 test suite currently only runs on Unix-like platforms.

 You'll find a description of the test suite in the `tests/README` file and
 of the test case data file format in the `tests/FILEFORMAT` file.

 The test suite automatically detects if curl was built with memory
 debugging enabled, and if it was, it will detect memory leaks too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

 libcurl can be built to do name resolves asynchronously, using either the
 normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
-----------

### Build libcurl to use c-ares

1. ./configure --enable-ares=/path/to/ares/install
2. make

### c-ares on win32

 First I compiled c-ares. I changed the default C runtime library to be the
 single-threaded rather than the multi-threaded (this seems to be required
 to prevent linking errors later on). Then I simply built the areslib
 project (the other projects adig/ahost seem to fail under MSVC).

 Next was libcurl. I opened `lib/config-win32.h` and added:
 `#define USE_ARES 1`

 Next, I added the path for the ares includes to the include path, and
 `libares.lib` to the libraries.

 Lastly, I also changed libcurl to be single-threaded rather than
 multi-threaded; again, this was to prevent some duplicate symbol errors.
I'm
 not sure why I needed to change everything to single-threaded, but when I
 didn't I got redefinition errors for several CRT functions (`malloc()`,
 `stricmp()`, etc.).

<a name="curl_off_t"></a>
`curl_off_t`
============

 `curl_off_t` is a data type provided by the external libcurl include
 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options that end with LARGE. The type is 64 bits large on most modern
 platforms.

<a name="curlx"></a>
curlx
=====

 The libcurl source code offers a few functions by source only. They are not
 part of the official libcurl API, but the source files might be useful for
 others, so apps can optionally compile/build with these sources to gain
 additional functions.

 We provide them through a single header file for easy access for apps:
 `curlx.h`

`curlx_strtoofft()`
-------------------

 A macro that converts a string containing a number to a `curl_off_t` number.
 This might use the `curlx_strtoll()` function which is provided as source
 code in strtoofft.c. Note that the function is only provided if no
 `strtoll()` (or equivalent) function exists on your platform. If
 `curl_off_t` is only a 32-bit number on your platform, this macro uses
 `strtol()`.

Future
------

 Several functions will be removed from the public `curl_` name space in a
 future libcurl release. They will then only become available as `curlx_`
 functions instead. To make the transition easier, we already provide these
 functions with the `curlx_` prefix today, to allow sources to be built
 properly with the new function names.
The concerned functions are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

<a name="contentencoding"></a>
Content Encoding
================

## About content encodings

 [HTTP/1.1][4] specifies that a client may request that a server encode its
 response. This is usually used to compress a response using one (or more)
 encodings from a set of commonly available compression techniques. These
 schemes include `deflate` (the zlib algorithm), `gzip`, `br` (brotli) and
 `compress`. A client requests that the server perform an encoding by
 including an `Accept-Encoding` header in the request document. The value of
 the header should be one of the recognized tokens `deflate`, ... (there's a
 way to register new schemes/tokens, see sec 3.5 of the spec). A server MAY
 honor the client's encoding request. When a response is encoded, the server
 includes a `Content-Encoding` header in the response. The value of the
 `Content-Encoding` header indicates which encodings were used to encode the
 data, in the order in which they were applied.

 It's also possible for a client to attach priorities to different schemes
 so that the server knows which it prefers. See sec 14.3 of RFC 2616 for
 more information on the `Accept-Encoding` header. See sec
 [3.1.2.2 of RFC 7231][15] for more information on the `Content-Encoding`
 header.

## Supported content encodings

 The `deflate`, `gzip` and `br` content encodings are supported by libcurl.
 Both regular and chunked transfers work fine. The zlib library is required
 for the `deflate` and `gzip` encodings, while the brotli decoding library
 is required for the `br` encoding.
## The libcurl interface

 To cause libcurl to request a content encoding use:

 [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)

 where string is the intended value of the `Accept-Encoding` header.

 Currently, libcurl does support multiple encodings but only understands how
 to process responses that use the `deflate`, `gzip` and/or `br` content
 encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5] that will
 work (besides `identity`, which does nothing) are `deflate`, `gzip` and
 `br`. If a response is encoded using `compress` or other unsupported
 methods, libcurl returns an error indicating that the response could not
 be decoded. If `<string>` is NULL no `Accept-Encoding` header is generated.
 If `<string>` is a zero-length string, then an `Accept-Encoding` header
 containing all supported encodings will be generated.

 [`CURLOPT_ACCEPT_ENCODING`][5] must be set to a non-NULL value for content
 to be automatically decoded. If it is not set and the server still sends
 encoded content (despite not having been asked), the data is returned in
 its raw form and the `Content-Encoding` type is not checked.

## The curl interface

 Use the [`--compressed`][6] option with curl to cause it to ask servers to
 compress responses using any format supported by curl.

<a name="hostip"></a>
`hostip.c` explained
====================

 The main compile-time defines to keep in mind when reading the `host*.c`
 source files are these:

## `CURLRES_IPV6`

 this host has `getaddrinfo()` and family, and thus we use that. The host
 may not be able to resolve IPv6, but we don't really have to take that into
 account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.

## `CURLRES_ARES`

 is defined if libcurl is built to use c-ares for asynchronous name
 resolves. This can be Windows or \*nix.
## `CURLRES_THREADED`

 is defined if libcurl is built to use threading for asynchronous name
 resolves. The name resolve will be done in a new thread, and the supported
 asynch API will be the same as for ares-builds. This is the default under
 (native) Windows.

 If either of the two previous defines is set, `CURLRES_ASYNCH` is defined
 too. If libcurl is not built to use an asynchronous resolver,
 `CURLRES_SYNCH` is defined.

## `host*.c` sources

 The `host*.c` source files are split up like this:

 - `hostip.c`      - method-independent resolver functions and utility
                     functions
 - `hostasyn.c`    - functions for asynchronous name resolves
 - `hostsyn.c`     - functions for synchronous name resolves
 - `asyn-ares.c`   - functions for asynchronous name resolves using c-ares
 - `asyn-thread.c` - functions for asynchronous name resolves using threads
 - `hostip4.c`     - IPv4 specific functions
 - `hostip6.c`     - IPv6 specific functions

 `hostip.h` is the single unified header file for all this. It defines the
 `CURLRES_*` defines based on the `config*.h` and `curl_setup.h` defines.

<a name="memoryleak"></a>
Track Down Memory Leaks
=======================

## Single-threaded

 Please note that this memory leak system is not adjusted to work in more
 than one thread. If you want/need to use it in a multi-threaded app, please
 adjust accordingly.

## Build

 Rebuild libcurl with `-DCURLDEBUG` (usually, rerunning configure with
 `--enable-debug` fixes this). `make clean` first, then `make` so that all
 files are actually rebuilt properly. It will also make sense to build
 libcurl with the debug option (usually `-g` to the compiler) so that
 debugging it will be easier if you actually do find a leak in the library.

 This will create a library that has memory debugging enabled.
## Modify Your Application

 Add a line in your application code:

 `curl_dbg_memdebug("dump");`

 This will make the malloc debug system output a full trace of all
 resource-using functions to the given file name. Make sure you rebuild
 your program and that you link with the same libcurl you built for this
 purpose as described above.

## Run Your Application

 Run your program as usual. Watch the specified memory trace file grow.

 Make your program exit and use the proper libcurl cleanup functions, so
 that all non-leaks are returned/freed properly.

## Analyze the Flow

 Use the `tests/memanalyze.pl` perl script to analyze the dump file:

 tests/memanalyze.pl dump

 This now outputs a report on what resources were allocated but never freed
 etc. This report is very useful when posting to the list!

 If this doesn't produce any output, no leak was detected in libcurl. Then
 the leak is most likely to be in your code.

<a name="multi_socket"></a>
`multi_socket`
==============

 Implementation of the `curl_multi_socket` API.

 The main ideas of this API are simply:

 1. The application can use whatever event system it likes as it gets info
    from libcurl about which file descriptors libcurl waits for, and for
    what action, on each of them. (The previous API returns `fd_set`s which
    is very `select()`-centric.)

 2. When the application discovers action on a single socket, it calls
    libcurl and informs it that there was action on this particular socket,
    and libcurl can then act on that socket/transfer only and not care about
    any other transfers. (The previous API always had to scan through all
    the existing transfers.)
 The idea is that [`curl_multi_socket_action()`][7] calls a given callback
 with information about what socket to wait for what action on, and the
 callback only gets called if the status of that socket has changed.

 We also added a timer callback that makes libcurl call the application
 when the timeout value changes, and you set that with
 [`curl_multi_setopt()`][9] and the [`CURLMOPT_TIMERFUNCTION`][10] option.
 To get this to work, there is internally an added struct in each easy
 handle in which we store an "expire time" (if any). The structs are then
 "splay sorted" so that we can add and remove times from the linked list
 and yet fairly swiftly figure out both how long there is until the next
 nearest timer expires and which timer (handle) we should take care of now.
 Of course, the upside of all this is that we get a
 [`curl_multi_timeout()`][8] that should also work with old-style
 applications that use [`curl_multi_perform()`][11].

 We created an internal "socket to easy handles" hash table that, given a
 socket (file descriptor), returns the easy handle that waits for action on
 that socket. This hash is made using the already existing hash code
 (previously only used for the DNS cache).

 To make libcurl able to report plain sockets in the socket callback, we
 had to re-organize the internals of [`curl_multi_fdset()`][12] etc so that
 the conversion from sockets to `fd_sets` for that function is only done in
 the last step before the data is returned. I also had to extend c-ares to
 get a function that can return plain sockets, as that library too returned
 only `fd_sets` and that is no longer good enough. The changes done to
 c-ares are available in c-ares 1.3.1 and later.
<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense
even for older and later versions as things do not change drastically that
often.

<a name="Curl_easy"></a>
## Curl_easy

 The `Curl_easy` struct is the one returned to the outside in the external
 API as a `CURL *`. This is usually known as an easy handle in API
 documentation and examples.

 Information and state related to the actual connection is in the
 `connectdata` struct. When a transfer is about to be made, libcurl either
 creates a new connection or re-uses an existing one. The particular
 connectdata that is used by this handle is pointed out by
 `Curl_easy->easy_conn`.

 Data and information regarding this particular single transfer is put in
 the `SingleRequest` sub-struct.

 When the `Curl_easy` struct is added to a multi handle, as it must be in
 order to do any transfer, the `->multi` member points to the `Curl_multi`
 struct it belongs to. The `->prev` and `->next` members are then used by
 the multi code to keep a linked list of `Curl_easy` structs that are added
 to that same multi handle. libcurl always uses multi, so `->multi` *will*
 point to a `Curl_multi` when a transfer is in progress.

 `->mstate` is the multi state of this particular `Curl_easy`. When
 `multi_runsingle()` is called, it acts on this handle according to which
 state it is in. The mstate is also what tells which sockets to return for
 a specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.

 The libcurl source code generally uses the name `data` for the variable
 that points to the `Curl_easy`.

 When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated
 with an individual stream, sharing the same connectdata struct.
 Multiplexing makes it even more important to keep things associated with
 the right handle.

<a name="connectdata"></a>
## connectdata

 A general idea in libcurl is to keep connections around in a connection
 "cache" after they have been used, in case they will be used again, and
 then re-use an existing one instead of creating a new one, as this gives a
 significant performance boost.

 Each `connectdata` identifies a single physical connection to a server. If
 the connection cannot be kept alive, it is closed after use and this
 struct can then be removed from the cache and freed.

 Thus, the same `Curl_easy` can be used multiple times and each time select
 another `connectdata` struct to use for the connection. Keep this in mind,
 as it is then important to consider whether options or choices are based
 on the connection or the `Curl_easy`.

 Functions in libcurl assume that `connectdata->data` points to the
 `Curl_easy` that uses this connection (for the moment).

 As a special complexity, some protocols supported by libcurl require a
 special disconnect procedure that is more than just shutting down the
 socket. It can involve sending one or more commands to the server before
 doing so. Since connections are kept in the connection cache after use,
 the original `Curl_easy` may no longer be around when the time comes to
 shut down a particular connection. For this purpose, libcurl holds a
 special dummy `closure_handle` `Curl_easy` in the `Curl_multi` struct to
 use when needed.

 FTP uses two TCP connections for a typical transfer, but it keeps both in
 this single struct and can thus be considered a single connection for most
 internal concerns.

 The libcurl source code generally uses the name `conn` for the variable
 that points to the connectdata.
<a name="Curl_multi"></a>
## Curl_multi

 Internally, the easy interface is implemented as a wrapper around multi
 interface functions. This makes everything use the multi interface
 internally.

 `Curl_multi` is the multi handle struct exposed as `CURLM *` in external
 APIs.

 This struct holds a list of `Curl_easy` structs that have been added to
 this handle with [`curl_multi_add_handle()`][13]. The start of the list is
 `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.

 `->msglist` is a linked list of messages to send back when
 [`curl_multi_info_read()`][14] is called. Basically, a node is added to
 that list when an individual `Curl_easy`'s transfer has completed.

 `->hostcache` points to the name cache. It is a hash table for looking up
 names to IP addresses. The nodes have a limited lifetime in there, and
 this cache is meant to reduce the time it takes when the same name is
 wanted again within a short period of time.

 `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining
 time until each should be checked - normally some sort of timeout. Each
 `Curl_easy` has one node in the tree.

 `->sockhash` is a hash table to allow fast lookups of which `Curl_easy`
 uses a given socket descriptor. This is necessary for the `multi_socket`
 API.

 `->conn_cache` points to the connection cache. It keeps track of all
 connections that are kept after use. The cache has a maximum size.

 `->closure_handle` is described in the `connectdata` section.

 The libcurl source code generally uses the name `multi` for the variable
 that points to the `Curl_multi` struct.

<a name="Curl_handler"></a>
## Curl_handler

 Each unique protocol that is supported by libcurl needs to provide at
 least one `Curl_handler` struct. It defines what the protocol is called
 and what functions the main code should call to deal with protocol
 specific issues.
 In general, there is a source file named `[protocol].c` in which there is
 a `struct Curl_handler Curl_handler_[protocol]` declared. In `url.c` there
 is then the main array with all individual `Curl_handler` structs pointed
 to from a single array which is scanned through when a URL is given to
 libcurl to work with.

 `->scheme` is the URL scheme name, usually spelled out in uppercase. That
 is "HTTP" or "FTP" etc. SSL versions of the protocol need their own
 `Curl_handler` setup, so HTTPS is separate from HTTP.

 `->setup_connection` is called to allow the protocol code to allocate
 protocol specific data that then gets associated with that `Curl_easy` for
 the rest of this transfer. It gets freed again at the end of the transfer.
 It is called before the `connectdata` for the transfer has been
 selected/created. Most protocols allocate their private
 `struct [PROTOCOL]` here and assign `Curl_easy->req.protop` to point to
 it.

 `->connect_it` allows a protocol to do some specific actions after the TCP
 connect is done, that can still be considered part of the connection
 phase.

 Some protocols alter the `connectdata->recv[]` and `connectdata->send[]`
 function pointers in this function.

 `->connecting` is similarly a function that keeps getting called as long
 as the protocol considers itself still in the connecting phase.

 `->do_it` is the function called to issue the transfer request. What we
 call the DO action internally. If the DO is not enough and things need to
 keep getting done for the entire DO sequence to complete, `->doing` is
 then usually also provided. Each protocol that needs to do multiple
 commands or similar for do/doing needs to implement its own state machine
 (see SCP, SFTP, FTP). Some protocols (only FTP, and only due to historical
 reasons) have a separate piece of the DO state called `DO_MORE`.
 `->doing` keeps getting called while issuing the transfer request
 command(s).

 `->done` gets called when the transfer is complete and DONE. That is after
 the main data has been transferred.

 `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
 this state when setting up the second connection.

 `->proto_getsock`
 `->doing_getsock`
 `->domore_getsock`
 `->perform_getsock`
 are functions that return socket information: which socket(s) to wait for
 which action(s) during the particular multi state.

 `->disconnect` is called immediately before the TCP connection is shut
 down.

 `->readwrite` gets called during transfer to allow the protocol to do
 extra reads/writes.

 `->defport` is the default TCP or UDP port this protocol uses.

 `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL
 versions have their "base" protocol set and then the SSL variation. Like
 "HTTP|HTTPS".

 `->flags` is a bitmask with additional information about the protocol that
 makes it get treated differently by the generic engine:

 - `PROTOPT_SSL` - makes it connect and negotiate SSL

 - `PROTOPT_DUAL` - this protocol uses two connections

 - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing
   the connection. This flag is no longer used by code, yet it is still set
   for a bunch of protocol handlers.

 - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
   limit which "direction" of socket actions the main engine will concern
   itself with.
 - `PROTOPT_NONETWORK` - a protocol that does not use the network (read:
   `file:`)

 - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a
   default one unless one is provided

 - `PROTOPT_NOURLQUERY` - this protocol cannot handle a query part in the
   URL (?foo=bar)

<a name="conncache"></a>
## conncache

 This is a hash table with connections for later re-use. Each `Curl_easy`
 has a pointer to its connection cache. Each multi handle sets up a
 connection cache that all added `Curl_easy`s share by default.

<a name="Curl_share"></a>
## Curl_share

 The libcurl share API allocates a `Curl_share` struct, exposed to the
 external API as `CURLSH *`.

 The idea is that the struct can have a set of its own versions of caches
 and pools and then, by providing this struct in the `CURLOPT_SHARE`
 option, those specific `Curl_easy`s will use the caches/pools that this
 share handle holds.

 Then individual `Curl_easy` structs can be made to share specific things
 that they otherwise would not, such as cookies.

 The `Curl_share` struct can currently hold cookies, the DNS cache and the
 SSL session cache.

<a name="CookieInfo"></a>
## CookieInfo

 This is the main cookie struct. It holds all known cookies and related
 information. Each `Curl_easy` has its own private `CookieInfo` even when
 they are added to a multi handle. They can be made to share cookies by
 using the share API.
[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2