curl internals
==============

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
   - [`Curl_connect`](#Curl_connect)
   - [`multi_do`](#multi_do)
   - [`Curl_readwrite`](#Curl_readwrite)
   - [`multi_done`](#multi_done)
   - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
 - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
   - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [`hostip.c` explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)
   - [Curl_easy](#Curl_easy)
   - [connectdata](#connectdata)
   - [Curl_multi](#Curl_multi)
   - [Curl_handler](#Curl_handler)
   - [conncache](#conncache)
   - [Curl_share](#Curl_share)
   - [CookieInfo](#CookieInfo)

<a name="intro"></a>
Intro
=====

 This project is split in two: the library and the client. The client part uses the library, but the library is designed to allow other applications to use it.

 The largest amount of code and complexity is in the library part.

<a name="git"></a>
git
===

 All changes to the sources are committed to the git repository as soon as they are somewhat verified to work. Changes shall be committed as independently as possible so that individual changes can be easily spotted and tracked afterwards.

 Tagging shall be used extensively, and by the time we release new archives we should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

 We write curl and libcurl to compile with C89 compilers, on 32-bit and larger machines. Most of libcurl assumes more or less POSIX compliance, but that is not a requirement.

 We write libcurl to build and work with lots of third party tools, and we want it to remain functional and buildable with these and later versions (older versions may still work, but they are not what we work hard to maintain):

Dependencies
------------

 - OpenSSL 0.9.7
 - GnuTLS 3.1.10
 - zlib 1.1.4
 - libssh2 0.16
 - c-ares 1.6.0
 - libidn2 2.0.0
 - wolfSSL 2.0.0
 - openldap 2.0
 - MIT Kerberos 1.2.4
 - GSKit V5R3M0
 - NSS 3.14.x
 - Heimdal ?
 - nghttp2 1.12.0
 - WinSock 2.2 (on Windows 95+ and Windows CE .NET 4.1+)

Operating Systems
-----------------

 On systems where configure runs, we aim at working on them all - if they have a suitable C compiler. On systems that don't run configure, we strive to keep curl running correctly on:

 - Windows 98
 - AS/400 V5R3M0
 - Symbian 9.1
 - Windows CE ?
 - TPF ?
Build tools
-----------

 When writing code (mostly for generating stuff included in release tarballs) we use a few "build tools" and we make sure that we remain functional with these versions:

 - GNU Libtool 1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4 1.4
 - perl 5.004
 - roffit 0.5
 - groff ? (any version that supports `groff -Tps -man [in] [out]`)
 - ps2pdf (gs) ?

<a name="winvsunix"></a>
Windows vs Unix
===============

 There are a few differences in how to program curl the Unix way compared to the Windows way. Perhaps the four most notable details are:

 1. Different function names for socket operations.

    In curl, this is solved with defines and macros, so that the source looks the same in all places except for the header file that defines them. The macros in use are `sclose()`, `sread()` and `swrite()`.

 2. Windows requires a couple of init calls for the socket stuff.

    That's taken care of by the `curl_global_init()` call, but if other libs also do it, there might be reasons for applications to alter that behaviour.

    We require WinSock version 2.2 and load this version during global init.

 3. The file descriptors for network communication and file operations are not as easily interchangeable as in Unix.

    We avoid this by not trying any funny tricks on file descriptors.

 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus destroying binary data, although you do want that conversion if it is text coming through... (sigh)

    We set stdout to binary under Windows.

 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All conditionals that deal with features *should* instead be in the format `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts, we maintain a `curl_config-win32.h` file in the lib directory that is supposed to look exactly like a `curl_config.h` file would have looked on a Windows machine.

 Generally speaking: always remember that this will be compiled on dozens of operating systems. Don't walk on the edge!

<a name="Library"></a>
Library
=======

 (See [Structs in libcurl](#structs) for the separate section describing all major internal structs and their purposes.)

 There are plenty of entry points to the library, namely each publicly defined function that libcurl offers to applications. All of those functions are rather small and easy to follow. All the ones prefixed with `curl_easy` are put in the `lib/easy.c` file.

 `curl_global_init()` and `curl_global_cleanup()` should be called by the application to initialize and clean up global stuff in the library. As of today, it can handle the global SSL initing if SSL is enabled and it can init the socket layer on Windows machines. libcurl itself has no "global" scope.

 All printf()-style functions use the supplied clones in `lib/mprintf.c`. This makes sure we stay absolutely platform independent.

 [`curl_easy_init()`][2] allocates an internal struct and makes some initializations. The returned handle does not reveal internals. This is the `Curl_easy` struct which works as an "anchor" struct for all `curl_easy` functions. All connections performed will get connect-specific data allocated that should be used for things related to particular connections/requests.
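 As a quick illustration of these entry points (a minimal sketch using only the public API, not code taken from the curl sources; the URL is a placeholder), a typical application does something like this, with the `curl_easy_setopt()` and `curl_easy_perform()` calls described in the next paragraphs:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      CURLcode res;
      CURL *handle;                      /* opaquely, this is the Curl_easy struct */

      curl_global_init(CURL_GLOBAL_ALL); /* global init, once per program */

      handle = curl_easy_init();         /* allocate the "anchor" struct */
      if(handle) {
        curl_easy_setopt(handle, CURLOPT_URL, "https://example.com/");
        res = curl_easy_perform(handle); /* run the transfer */
        if(res != CURLE_OK)
          fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(res));
        curl_easy_cleanup(handle);       /* free the easy handle */
      }

      curl_global_cleanup();             /* global cleanup, once per program */
      return 0;
    }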
 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must be passed in pairs: the parameter-ID and the parameter-value. The list of options is documented in the man page. This function mainly sets things in the `Curl_easy` struct.

 `curl_easy_perform()` is just a wrapper function that makes use of the multi API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`, `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done and then returns.

 Some of the most important functions in `url.c` are called from `multi.c` when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

 Analyzes the URL: it separates the different components and connects to the remote host. This may involve using a proxy and/or using SSL. The `Curl_resolv()` function in `lib/hostip.c` is used for looking up host names (it then uses the proper underlying method, which may vary between platforms and builds).

 When `Curl_connect` is done, we are connected to the remote site. Then it is time to tell the server to get a document/file. `multi_do()` arranges this.

 This function makes sure there's an allocated and initialized `connectdata` struct that is used for this particular connection only (although there may be several requests performed on the same connection). A bunch of things are inited/inherited from the `Curl_easy` struct.

<a name="multi_do"></a>
multi_do()
----------

 `multi_do()` makes sure the proper protocol-specific function is called. The functions are named after the protocols they handle.

 The protocol-specific functions of course deal with protocol-specific negotiations and setup. They have access to the `Curl_sendf()` (from `lib/sendf.c`) function to send printf-style formatted data to the remote host and when they're ready to make the actual file transfer they call the `Curl_setup_transfer()` function (in `lib/transfer.c`) to set up the transfer and then return.

 If this DO function fails and the connection is being re-used, libcurl will then close this connection, set up a new connection and re-issue the DO request on that. This is because there is no way to be perfectly sure that we have discovered a dead connection before the DO function and thus we might wrongly be re-using a connection that was closed by the remote peer.

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

 Called during the transfer of the actual protocol payload.

 During transfer, the progress functions in `lib/progress.c` are called at frequent intervals (or, at the user's choice, a specified callback might get called). The speedcheck functions in `lib/speedcheck.c` are also used to verify that the transfer is as fast as required.
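 From the application side, those progress and speed checks surface through public options. A minimal sketch (the option and callback names are public libcurl API; the callback body and the limit values are purely illustrative):

    #include <curl/curl.h>

    /* called at the frequent intervals mentioned above */
    static int xferinfo_cb(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                           curl_off_t ultotal, curl_off_t ulnow)
    {
      (void)clientp; (void)dltotal; (void)dlnow; (void)ultotal; (void)ulnow;
      return 0; /* returning non-zero aborts the transfer */
    }

    static void setup_progress(CURL *handle)
    {
      curl_easy_setopt(handle, CURLOPT_XFERINFOFUNCTION, xferinfo_cb);
      curl_easy_setopt(handle, CURLOPT_NOPROGRESS, 0L); /* enable the callback */

      /* speed checking: give up if slower than 1000 bytes/sec for 30 seconds */
      curl_easy_setopt(handle, CURLOPT_LOW_SPEED_LIMIT, 1000L);
      curl_easy_setopt(handle, CURLOPT_LOW_SPEED_TIME, 30L);
    }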
<a name="multi_done"></a>
multi_done()
------------

 Called after a transfer is done. This function takes care of everything that has to be done after a transfer. It attempts to leave matters in a state so that `multi_do()` can be called again on the same connection (in a persistent connection case). The connection might also soon be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

 When doing normal connections and transfers, no one ever tries to close any connections, so this is not normally called when `curl_easy_perform()` is used. This function is only used when we are certain that no more transfers are going to be made on the connection. It can also be called to close a connection by force, or to make sure that libcurl doesn't keep too many connections alive at the same time.

 This function cleans up all resources that are associated with a single connection.

<a name="http"></a>
HTTP(S)
=======

 HTTP offers a lot and is the protocol in curl that uses the most lines of code. There is a special file, `lib/formdata.c`, that offers all the multipart post functions.

 base64 functions for user+password stuff (and more) are in `lib/base64.c` and all functions for parsing and sending cookies are found in `lib/cookie.c`.

 HTTPS uses in almost every case the same procedure as HTTP, with only two exceptions: the connect procedure is different and the function used to read or write from the socket is different, although the latter fact is hidden in the source by the use of `Curl_read()` for reading and `Curl_write()` for writing data to the remote server.

 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer encoding.

 An interesting detail with the HTTP(S) request is the `Curl_add_buffer()` series of functions we use. They append data to one single buffer, and when the building is finished the entire request is sent off in one single write. This is done this way to overcome problems with flawed firewalls and lame servers.

<a name="ftp"></a>
FTP
===

 The `Curl_if2ip()` function can be used for getting the IP number of a specified network interface, and it resides in `lib/if2ip.c`.

 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It was made a separate function to prevent us programmers from forgetting that they must be CRLF terminated. They must also be sent in one single `write()` to make firewalls and similar happy.

<a name="kerberos"></a>
Kerberos
========

 Kerberos support is mainly in `lib/krb5.c`, but also in `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and in `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

 Telnet is implemented in `lib/telnet.c`.

<a name="file"></a>
FILE
====

 The `file://` protocol is dealt with in `lib/file.c`.

<a name="smb"></a>
SMB
===

 The `smb://` protocol is dealt with in `lib/smb.c`.

<a name="ldap"></a>
LDAP
====

 Everything LDAP is in `lib/ldap.c` and `lib/openldap.c`.

<a name="email"></a>
E-mail
======

 The e-mail related source code is in `lib/imap.c`, `lib/pop3.c` and `lib/smtp.c`.

<a name="general"></a>
General
=======

 URL encoding and decoding, called escaping and unescaping in the source code, is found in `lib/escape.c`.

 While transferring data in `Transfer()` a few functions might get used. `curl_getdate()` in `lib/parsedate.c` is for HTTP date comparisons (and more).

 `lib/getenv.c` offers `curl_getenv()` which is for reading environment variables in a neat platform independent way. That's used in the client, but also in `lib/url.c` when checking the proxy environment variables. Note that contrary to the normal Unix `getenv()`, this returns an allocated buffer that must be `free()`ed after use.
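 A small usage sketch of these two public helpers (not from the curl sources; the environment variable and date string are just examples):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <curl/curl.h>

    int main(void)
    {
      /* curl_getenv() returns an allocated copy that the caller must free */
      char *proxy = curl_getenv("http_proxy");
      if(proxy) {
        printf("proxy from environment: %s\n", proxy);
        free(proxy);
      }

      /* curl_getdate() parses HTTP-style date strings into a time_t */
      time_t t = curl_getdate("Sun, 06 Nov 1994 08:49:37 GMT", NULL);
      printf("parsed date as: %ld\n", (long)t);
      return 0;
    }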
 `lib/netrc.c` holds the `.netrc` parser.

 `lib/timeval.c` features replacement functions for systems that don't have `gettimeofday()` and a few support functions for timeval conversions.

 A function named `curl_version()` that returns the full curl version string is found in `lib/version.c`.

<a name="persistent"></a>
Persistent Connections
======================

 The persistent connection support in libcurl requires some considerations on how to do things inside of the library.

 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call must never hold connection-oriented data. It is meant to hold the root data as well as all the options etc that the library-user may choose.

 - The `Curl_easy` struct holds the "connection cache" (an array of pointers to `connectdata` structs).

 - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an already existing connection in the cache that we can use. Otherwise it creates a new one and adds that to the cache. If the cache is already full when a new connection is added, it will first close the oldest unused one.

 - When the transfer operation is complete, the connection is left open. Particular options may tell libcurl not to, and protocols may signal closure on connections, and then they won't be kept open, of course.

 - When `curl_easy_cleanup()` is called, we close all still opened connections, unless of course the multi interface "owns" the connections.

 The curl handle must be re-used in order for the persistent connections to work.

<a name="multi"></a>
multi interface/non-blocking
============================

 The multi interface is a non-blocking interface to the library. To make that interface work as well as possible, no low-level function within libcurl may be written to work in a blocking manner. (There are still a few spots violating this rule.)

 One of the primary reasons we introduced c-ares support was to allow the name resolve phase to be perfectly non-blocking as well.

 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust the code to allow non-blocking operations even on multi-stage command-response protocols. They are built around state machines that return when they would otherwise block waiting for data. The DICT, LDAP and TELNET protocols are crappy examples and they are subject to rewrite in the future to better fit the libcurl protocol family.
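 To illustrate what the non-blocking interface looks like from the application side, here is a minimal sketch of a multi-interface transfer loop (public API only; a single URL, error handling omitted):

    #include <curl/curl.h>

    static void fetch(const char *url)
    {
      CURLM *multi = curl_multi_init();
      CURL *easy = curl_easy_init();
      int running = 1;

      curl_easy_setopt(easy, CURLOPT_URL, url);
      curl_multi_add_handle(multi, easy);

      /* drive the transfer without ever blocking inside libcurl */
      while(running) {
        int numfds;
        curl_multi_perform(multi, &running);
        /* wait (up to 1000 ms) until libcurl's sockets need attention */
        curl_multi_wait(multi, NULL, 0, 1000, &numfds);
      }

      curl_multi_remove_handle(multi, easy);
      curl_easy_cleanup(easy);
      curl_multi_cleanup(multi);
    }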
<a name="ssl"></a>
SSL libraries
=============

 Originally libcurl supported SSLeay for SSL/TLS transports, which was then extended to its successor OpenSSL, and it has since also been extended to several other SSL/TLS libraries. We expect and hope to further extend the support in future libcurl versions.

 To deal with this internally in the best way possible, we have a generic SSL function API as provided by the `vtls/vtls.[ch]` system, and those are the only SSL functions we may use from within libcurl. vtls is then crafted to use the appropriate lower-level function calls to whatever SSL library is in use. For example, `vtls/openssl.[ch]` for the OpenSSL library.

<a name="symbols"></a>
Library Symbols
===============

 All symbols used internally in libcurl must use a `Curl_` prefix if they're used in more than a single file. Single-file symbols must be made static. Public ("exported") symbols must use a `curl_` prefix. (There are exceptions, but they are to be changed to follow this pattern in future versions.) Public API functions are marked with `CURL_EXTERN` in the public header files so that all others can be hidden on platforms where this is possible.

<a name="returncodes"></a>
Return Codes and Informationals
===============================

 I've made things simple. Almost every function in libcurl returns a CURLcode, which must be `CURLE_OK` if everything is OK or otherwise a suitable error code as the `curl/curl.h` include file defines. The very spot that detects an error must use the `Curl_failf()` function to set the human-readable error description.

 To aid the user in understanding what's happening and to debug curl usage, we must supply a fair number of informational messages by using the `Curl_infof()` function. Those messages are only displayed when the user explicitly asks for them. They are best used when revealing information that isn't otherwise obvious.
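 On the application side these two mechanisms show up as the returned `CURLcode` plus error buffer, and as verbose output. A small sketch of how an application can collect them (public API only; assumes the handle is fully configured elsewhere):

    #include <stdio.h>
    #include <curl/curl.h>

    static CURLcode run(CURL *handle)
    {
      char errbuf[CURL_ERROR_SIZE];
      CURLcode res;

      errbuf[0] = '\0';
      /* the human-readable error description ends up in this buffer */
      curl_easy_setopt(handle, CURLOPT_ERRORBUFFER, errbuf);
      /* verbose mode makes the informational messages visible */
      curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L);

      res = curl_easy_perform(handle);
      if(res != CURLE_OK)
        fprintf(stderr, "error %d: %s\n", (int)res,
                errbuf[0] ? errbuf : curl_easy_strerror(res));
      return res;
    }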
<a name="abi"></a>
API/ABI
=======

 We make an effort to not export or show internals or how internals work, as that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI for our promise to users.

<a name="client"></a>
Client
======

 `main()` resides in `src/tool_main.c`.

 `src/tool_hugehelp.c` is automatically generated by the `mkhelp.pl` perl script to display the complete "manual" and the `src/tool_urlglob.c` file holds the functions used for the URL-"globbing" support. Globbing in the sense that the `{}` and `[]` expansion stuff is there.

 The client mostly sets up its `config` struct properly, then it calls the `curl_easy_*()` functions of the library and when it gets back control after the `curl_easy_perform()` it cleans up the library, checks status and exits.

 When the operation is done, the `ourWriteOut()` function in `src/writeout.c` may be called to report about the operation. That function mostly uses the `curl_easy_getinfo()` function to extract useful information from the curl session.

 It may loop and do all this several times if many URLs were specified on the command line or in a config file.

<a name="memorydebug"></a>
Memory Debugging
================

 The file `lib/memdebug.c` contains debug-versions of a few functions. Functions such as `malloc()`, `free()`, `fopen()` and `fclose()` that somehow deal with resources that might give us problems if we "leak" them. The functions in the memdebug system do nothing fancy, they do their normal function and then log information about what they just did. The logged data can then be analyzed after a complete session.

 `memanalyze.pl` is the perl script present in `tests/` that analyzes a log file generated by the memory tracking system. It detects if resources are allocated but never freed and other kinds of errors related to resource management.

 Internally, the definition of the preprocessor symbol `DEBUGBUILD` restricts code to debug-enabled builds, while the symbol `CURLDEBUG` is used to differentiate code which is _only_ used for memory tracking/debugging.

 Use `-DCURLDEBUG` when compiling to enable memory debugging; this is also switched on by running configure with `--enable-curldebug`. Use `-DDEBUGBUILD` when compiling to enable a debug build, or run configure with `--enable-debug`.

 `curl --version` will list the 'Debug' feature for debug-enabled builds, and will list the 'TrackMemory' feature for curl builds capable of debug memory tracking. These features are independent and can be controlled when running the configure script. When `--enable-debug` is given, both features will be enabled, unless some restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

 The test suite is placed in its own subdirectory directly off the root in the curl archive tree, and it contains a bunch of scripts and a lot of test case data.

 The main test script is `runtests.pl` that will invoke test servers like `httpserver.pl` and `ftpserver.pl` before all the test cases are performed. The test suite currently only runs on Unix-like platforms.

 You'll find a description of the test suite in the `tests/README` file, and the test case data file format in the `tests/FILEFORMAT` file.

 The test suite automatically detects if curl was built with memory debugging enabled, and if it was, it will detect memory leaks, too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

 libcurl can be built to do name resolves asynchronously, using either the normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
-----------

### Build libcurl to use c-ares

1. ./configure --enable-ares=/path/to/ares/install
2. make

### c-ares on win32

 First I compiled c-ares. I changed the default C runtime library to be the single-threaded rather than the multi-threaded (this seems to be required to prevent linking errors later on). Then I simply built the areslib project (the other projects adig/ahost seem to fail under MSVC).

 Next was libcurl. I opened `lib/config-win32.h` and I added a: `#define USE_ARES 1`

 Next, I added the path for the ares includes to the include path, and `libares.lib` to the libraries.

 Lastly, I also changed libcurl to be single-threaded rather than multi-threaded, again to prevent some duplicate symbol errors. I'm not sure why I needed to change everything to single-threaded, but when I didn't I got redefinition errors for several CRT functions (`malloc()`, `stricmp()`, etc.)

<a name="curl_off_t"></a>
`curl_off_t`
============

 `curl_off_t` is a data type provided by the external libcurl include headers. It is the type meant to be used for the [`curl_easy_setopt()`][1] options that end with LARGE. The type is 64 bits large on most modern platforms.
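 As a usage sketch (the option names are public libcurl API, the values are arbitrary): options ending in `_LARGE` must be passed a `curl_off_t` argument, so a cast is needed when handing over a plain constant:

    #include <curl/curl.h>

    static void set_large_options(CURL *handle)
    {
      /* resume a download at the 4 GB mark; _LARGE options take curl_off_t */
      curl_off_t resume_from = (curl_off_t)4 * 1024 * 1024 * 1024;
      curl_easy_setopt(handle, CURLOPT_RESUME_FROM_LARGE, resume_from);

      /* refuse files larger than 8 GB */
      curl_easy_setopt(handle, CURLOPT_MAXFILESIZE_LARGE,
                       (curl_off_t)8 * 1024 * 1024 * 1024);
    }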
<a name="curlx"></a>
curlx
=====

 The libcurl source code offers a few functions by source only. They are not part of the official libcurl API, but the source files might be useful for others so apps can optionally compile/build with these sources to gain additional functions.

 We provide them through a single header file for easy access for apps: `curlx.h`

`curlx_strtoofft()`
-------------------

 A macro that converts a string containing a number to a `curl_off_t` number. This might use the `curlx_strtoll()` function which is provided as source code in `strtoofft.c`. Note that the function is only provided if no `strtoll()` (or equivalent) function exists on your platform. If `curl_off_t` is only a 32-bit number on your platform, this macro uses `strtol()`.

Future
------

 Several functions will be removed from the public `curl_` name space in a future libcurl release. They will then only become available as `curlx_` functions instead. To make the transition easier, we already today provide these functions with the `curlx_` prefix to allow sources to be built properly with the new function names. The functions concerned are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

<a name="contentencoding"></a>
Content Encoding
================

## About content encodings

 [HTTP/1.1][4] specifies that a client may request that a server encode its response. This is usually used to compress a response using one (or more) encodings from a set of commonly available compression techniques. These schemes include `deflate` (the zlib algorithm), `gzip`, `br` (brotli) and `compress`. A client requests that the server perform an encoding by including an `Accept-Encoding` header in the request. The value of the header should be one of the recognized tokens `deflate`, ... (there's a way to register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor the client's encoding request. When a response is encoded, the server includes a `Content-Encoding` header in the response. The value of the `Content-Encoding` header indicates which encodings were used to encode the data, in the order in which they were applied.

 It's also possible for a client to attach priorities to different schemes so that the server knows which it prefers. See sec 14.3 of RFC 2616 for more information on the `Accept-Encoding` header. See sec [3.1.2.2 of RFC 7231][15] for more information on the `Content-Encoding` header.

## Supported content encodings

 The `deflate`, `gzip` and `br` content encodings are supported by libcurl. Both regular and chunked transfers work fine. The zlib library is required for the `deflate` and `gzip` encodings, while the brotli decoding library is required for the `br` encoding.

## The libcurl interface

 To cause libcurl to request a content encoding use:

 [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)

 where string is the intended value of the `Accept-Encoding` header.

 Currently, libcurl does support multiple encodings but only understands how to process responses that use the `deflate`, `gzip` and/or `br` content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5] that will work (besides `identity`, which does nothing) are `deflate`, `gzip` and `br`. If a response is encoded using `compress` or some other unsupported method, libcurl will return an error indicating that the response could not be decoded. If `<string>` is NULL no `Accept-Encoding` header is generated. If `<string>` is a zero-length string, then an `Accept-Encoding` header containing all supported encodings will be generated.

 The [`CURLOPT_ACCEPT_ENCODING`][5] option must be set to a non-NULL value for content to be automatically decoded. If it is not set and the server still sends encoded content (despite not having been asked), the data is returned in its raw form and the `Content-Encoding` type is not checked.
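 For instance, a minimal sketch that asks for every encoding libcurl was built to support and lets libcurl decode the response automatically:

    #include <curl/curl.h>

    static void enable_compression(CURL *handle)
    {
      /* "" (zero-length string) = offer all supported encodings and decode
         the response transparently */
      curl_easy_setopt(handle, CURLOPT_ACCEPT_ENCODING, "");

      /* alternatively, ask for one specific encoding only: */
      /* curl_easy_setopt(handle, CURLOPT_ACCEPT_ENCODING, "gzip"); */
    }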
## The curl interface

 Use the [`--compressed`][6] option with curl to cause it to ask servers to compress responses using any format supported by curl.

<a name="hostip"></a>
`hostip.c` explained
====================

 The main compile-time defines to keep in mind when reading the `host*.c` source files are these:

## `CURLRES_IPV6`

 this host has `getaddrinfo()` and family, and thus we use that. The host may not be able to resolve IPv6, but we don't really have to take that into account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.

## `CURLRES_ARES`

 is defined if libcurl is built to use c-ares for asynchronous name resolves. This can be Windows or \*nix.

## `CURLRES_THREADED`

 is defined if libcurl is built to use threading for asynchronous name resolves. The name resolve will be done in a new thread, and the supported asynch API will be the same as for ares-builds. This is the default under (native) Windows.

 If either of the two previous are defined, `CURLRES_ASYNCH` is defined too. If libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is defined.

## `host*.c` sources

 The `host*.c` source files are split up like this:

 - `hostip.c` - method-independent resolver functions and utility functions
 - `hostasyn.c` - functions for asynchronous name resolves
 - `hostsyn.c` - functions for synchronous name resolves
 - `asyn-ares.c` - functions for asynchronous name resolves using c-ares
 - `asyn-thread.c` - functions for asynchronous name resolves using threads
 - `hostip4.c` - IPv4 specific functions
 - `hostip6.c` - IPv6 specific functions

 The `hostip.h` is the single united header file for all this. It defines the `CURLRES_*` defines based on the `config*.h` and `curl_setup.h` defines.

<a name="memoryleak"></a>
Track Down Memory Leaks
=======================

## Single-threaded

 Please note that this memory leak system is not adjusted to work in more than one thread. If you want/need to use it in a multi-threaded app, please adjust accordingly.

## Build

 Rebuild libcurl with `-DCURLDEBUG` (usually, rerunning configure with `--enable-debug` fixes this). `make clean` first, then `make` so that all files are actually rebuilt properly. It will also make sense to build libcurl with the debug option (usually `-g` to the compiler) so that debugging it will be easier if you actually do find a leak in the library.

 This will create a library that has memory debugging enabled.
## Modify Your Application

 Add a line in your application code:

    curl_dbg_memdebug("dump");

 This will make the malloc debug system output a full trace of all resource-using functions to the given file name. Make sure you rebuild your program and that you link with the same libcurl you built for this purpose as described above.

## Run Your Application

 Run your program as usual. Watch the specified memory trace file grow.

 Make your program exit and use the proper libcurl cleanup functions etc, so that all non-leaks are returned/freed properly.

## Analyze the Flow

 Use the `tests/memanalyze.pl` perl script to analyze the dump file:

    tests/memanalyze.pl dump

 This now outputs a report on what resources were allocated but never freed etc. This report is very fine for posting to the list!

 If this doesn't produce any output, no leak was detected in libcurl. Then the leak is most likely to be in your code.

<a name="multi_socket"></a>
`multi_socket`
==============

 Implementation of the `curl_multi_socket` API

 The main ideas of this API are simply:

 1. The application can use whatever event system it likes as it gets info from libcurl about what file descriptors libcurl waits for what action on. (The previous API returns `fd_sets` which is very `select()`-centric).

 2. When the application discovers action on a single socket, it calls libcurl and informs it that there was action on this particular socket and libcurl can then act on that socket/transfer only and not care about any other transfers. (The previous API always had to scan through all the existing transfers.)

 The idea is that [`curl_multi_socket_action()`][7] calls a given callback with information about what socket to wait for what action on, and the callback only gets called if the status of that socket has changed.

 We also added a timer callback that makes libcurl call the application when the timeout value changes, and you set that with [`curl_multi_setopt()`][9] and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work, there's an added struct internally in each easy handle in which we store an "expire time" (if any). The structs are then "splay sorted" so that we can add and remove times from the linked list and yet somewhat swiftly figure out both how long there is until the next nearest timer expires and which timer (handle) we should take care of now. Of course, the upside of all this is that we get a [`curl_multi_timeout()`][8] that should also work with old-style applications that use [`curl_multi_perform()`][11].

 We created an internal "socket to easy handles" hash table that given a socket (file descriptor) returns the easy handle that waits for action on that socket. This hash is made using the already existing hash code (previously only used for the DNS cache).

 To make libcurl able to report plain sockets in the socket callback, we had to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that the conversion from sockets to `fd_sets` for that function is only done in the last step before the data is returned. I also had to extend c-ares to get a function that can return plain sockets, as that library too returned only `fd_sets` and that is no longer good enough. The changes done to c-ares are available in c-ares 1.3.1 and later.
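 To make the callback flow concrete, here is a heavily trimmed sketch of the application side (public API only; a real program would plug `socket_cb` and `timer_cb` into its own event library, which is omitted here):

    #include <curl/curl.h>

    /* told by libcurl which socket to watch for which action */
    static int socket_cb(CURL *easy, curl_socket_t s, int what,
                         void *userp, void *socketp)
    {
      (void)easy; (void)userp; (void)socketp;
      /* register/unregister 's' with the event system depending on 'what':
         CURL_POLL_IN, CURL_POLL_OUT, CURL_POLL_INOUT or CURL_POLL_REMOVE */
      (void)s; (void)what;
      return 0;
    }

    /* told by libcurl when the timeout value changes */
    static int timer_cb(CURLM *multi, long timeout_ms, void *userp)
    {
      (void)multi; (void)userp;
      /* (re)arm a single timer to fire after timeout_ms; -1 means delete it */
      (void)timeout_ms;
      return 0;
    }

    static CURLM *setup_multi(void)
    {
      CURLM *multi = curl_multi_init();
      curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, socket_cb);
      curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);
      return multi;
    }

    /* when the event system reports activity on a socket (or the timer fires,
       then with s == CURL_SOCKET_TIMEOUT), tell libcurl about it */
    static void on_event(CURLM *multi, curl_socket_t s, int ev_bitmask)
    {
      int still_running;
      curl_multi_socket_action(multi, s, ev_bitmask, &still_running);
    }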
<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even for older and later versions as things don't change drastically that often.

<a name="Curl_easy"></a>
## Curl_easy

 The `Curl_easy` struct is the one returned to the outside in the external API as a `CURL *`. This is usually known as an easy handle in API documentation and examples.

 Information and state that is related to the actual connection is in the `connectdata` struct. When a transfer is about to be made, libcurl will either create a new connection or re-use an existing one. The particular connectdata that is used by this handle is pointed out by `Curl_easy->easy_conn`.

 Data and information regarding this particular single transfer is put in the `SingleRequest` sub-struct.

 When the `Curl_easy` struct is added to a multi handle, as it must be in order to do any transfer, the `->multi` member will point to the `Curl_multi` struct it belongs to. The `->prev` and `->next` members will then be used by the multi code to keep a linked list of `Curl_easy` structs that are added to that same multi handle. libcurl always uses multi so `->multi` *will* point to a `Curl_multi` when a transfer is in progress.

 `->mstate` is the multi state of this particular `Curl_easy`. When `multi_runsingle()` is called, it will act on this handle according to which state it is in. The mstate is also what tells which sockets to return for a specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.

 The libcurl source code generally uses the name `data` for the variable that points to the `Curl_easy`.

 When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with an individual stream, sharing the same connectdata struct. Multiplexing makes it even more important to keep things associated with the right thing!
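 As a rough mental model only (this is not the actual definition; the real struct lives in `lib/urldata.h` with many more members, and names and types differ between versions), the members mentioned above relate roughly like this:

    /* conceptual sketch, not the actual libcurl declaration */
    struct Curl_easy {
      struct Curl_easy *next;        /* linked-list links used by the multi
                                        handle this easy handle was added to */
      struct Curl_easy *prev;
      struct connectdata *easy_conn; /* the connection currently used */
      struct Curl_multi *multi;      /* the multi handle we belong to */
      CURLMstate mstate;             /* current multi state of this transfer */
      struct SingleRequest req;      /* state for this single transfer */
      /* ... plus options, buffers, progress data and much more ... */
    };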
<a name="connectdata"></a>
## connectdata

 A general idea in libcurl is to keep connections around in a connection "cache" after they have been used in case they will be used again, and then re-use an existing one instead of creating a new one, as that gives a significant performance boost.

 Each `connectdata` identifies a single physical connection to a server. If the connection can't be kept alive, the connection will be closed after use and then this struct can be removed from the cache and freed.

 Thus, the same `Curl_easy` can be used multiple times and each time select another `connectdata` struct to use for the connection. Keep this in mind, as it is then important to consider whether options or choices are based on the connection or the `Curl_easy`.

 Functions in libcurl will assume that `connectdata->data` points to the `Curl_easy` that uses this connection (for the moment).

 As a special complexity, some protocols supported by libcurl require a special disconnect procedure that is more than just shutting down the socket. It can involve sending one or more commands to the server before doing so. Since connections are kept in the connection cache after use, the original `Curl_easy` may no longer be around when the time comes to shut down a particular connection. For this purpose, libcurl holds a special dummy `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.

 FTP uses two TCP connections for a typical transfer, but it keeps both in this single struct and thus can be considered a single connection for most internal concerns.

 The libcurl source code generally uses the name `conn` for the variable that points to the connectdata.

<a name="Curl_multi"></a>
## Curl_multi

 Internally, the easy interface is implemented as a wrapper around multi interface functions. This makes everything multi interface.

 `Curl_multi` is the multi handle struct exposed as `CURLM *` in external APIs.

 This struct holds a list of `Curl_easy` structs that have been added to this handle with [`curl_multi_add_handle()`][13]. The start of the list is `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.

 `->msglist` is a linked list of messages to send back when [`curl_multi_info_read()`][14] is called. Basically a node is added to that list when an individual `Curl_easy`'s transfer has completed.

 `->hostcache` points to the name cache. It is a hash table for looking up names to IP addresses. The nodes have a limited life time in there and this cache is meant to reduce the resolution time when the same name is wanted again within a short period of time.

 `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time until they should be checked - normally some sort of timeout. Each `Curl_easy` has one node in the tree.

 `->sockhash` is a hash table that allows fast lookups from a socket descriptor to the `Curl_easy` that uses that descriptor. This is necessary for the `multi_socket` API.

 `->conn_cache` points to the connection cache. It keeps track of all connections that are kept after use. The cache has a maximum size.

 `->closure_handle` is described in the `connectdata` section.

 The libcurl source code generally uses the name `multi` for the variable that points to the `Curl_multi` struct.
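 The `->msglist` messages are what an application drains with `curl_multi_info_read()`. A short sketch of that loop (public API only; typically run after each `curl_multi_perform()` or `curl_multi_socket_action()` call):

    #include <stdio.h>
    #include <curl/curl.h>

    static void check_finished(CURLM *multi)
    {
      CURLMsg *msg;
      int msgs_left;

      /* drain the message list; one CURLMSG_DONE per completed transfer */
      while((msg = curl_multi_info_read(multi, &msgs_left))) {
        if(msg->msg == CURLMSG_DONE) {
          CURL *easy = msg->easy_handle;
          CURLcode result = msg->data.result;
          fprintf(stderr, "transfer done: %s\n", curl_easy_strerror(result));
          curl_multi_remove_handle(multi, easy);
          curl_easy_cleanup(easy);
        }
      }
    }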
<a name="Curl_handler"></a>
## Curl_handler

 Each unique protocol that is supported by libcurl needs to provide at least one `Curl_handler` struct. It defines what the protocol is called and what functions the main code should call to deal with protocol specific issues. In general, there's a source file named `[protocol].c` in which there's a `struct Curl_handler Curl_handler_[protocol]` declared. In `url.c` there's then the main array with all individual `Curl_handler` structs pointed to from a single array which is scanned through when a URL is given to libcurl to work with.

 `->scheme` is the URL scheme name, usually spelled out in uppercase. That's "HTTP" or "FTP" etc. SSL versions of the protocol need their own `Curl_handler` setup, so HTTPS is separate from HTTP.

 `->setup_connection` is called to allow the protocol code to allocate protocol specific data that then gets associated with that `Curl_easy` for the rest of this transfer. It gets freed again at the end of the transfer. It will be called before the `connectdata` for the transfer has been selected/created. Most protocols will allocate their private `struct [PROTOCOL]` here and assign `Curl_easy->req.protop` to point to it.

 `->connect_it` allows a protocol to do some specific actions after the TCP connect is done, that can still be considered part of the connection phase.

 Some protocols will alter the `connectdata->recv[]` and `connectdata->send[]` function pointers in this function.

 `->connecting` is similarly a function that keeps getting called as long as the protocol considers itself still in the connecting phase.

 `->do_it` is the function called to issue the transfer request. What we call the DO action internally. If the DO is not enough and things need to keep getting done for the entire DO sequence to complete, `->doing` is then usually also provided. Each protocol that needs to do multiple commands or similar for do/doing needs to implement its own state machine (see SCP, SFTP, FTP). Some protocols (only FTP, and only due to historical reasons) have a separate piece of the DO state called `DO_MORE`.

 `->doing` keeps getting called while issuing the transfer request command(s).

 `->done` gets called when the transfer is complete and DONE. That's after the main data has been transferred.

 `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses this state when setting up the second connection.

 `->proto_getsock`, `->doing_getsock`, `->domore_getsock` and `->perform_getsock` are functions that return socket information: which socket(s) to wait for which action(s) during the particular multi state.

 `->disconnect` is called immediately before the TCP connection is shut down.

 `->readwrite` gets called during transfer to allow the protocol to do extra reads/writes.

 `->defport` is the default TCP or UDP port this protocol uses.

 `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions have their "base" protocol set and then the SSL variation. Like "HTTP|HTTPS".

 `->flags` is a bitmask with additional information about the protocol that will make it get treated differently by the generic engine:

 - `PROTOPT_SSL` - will make it connect and negotiate SSL

 - `PROTOPT_DUAL` - this protocol uses two connections

 - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the connection. This flag is no longer used by code, yet still set for a bunch of protocol handlers.

 - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to limit which "direction" of socket actions that the main engine will concern itself with.

 - `PROTOPT_NONETWORK` - a protocol that doesn't use the network (read: `file:`)

 - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default one unless one is provided

 - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part in the URL (?foo=bar)

<a name="conncache"></a>
## conncache

 A hash table with connections for later re-use. Each `Curl_easy` has a pointer to its connection cache. Each multi handle sets up a connection cache that all added `Curl_easy`s share by default.
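 From the application side, the size of this cache can be capped with public options. A small sketch (option names are public libcurl API; the values are arbitrary):

    #include <curl/curl.h>

    static void limit_connection_cache(CURLM *multi, CURL *easy)
    {
      /* cap the multi handle's shared connection cache at 10 connections */
      curl_multi_setopt(multi, CURLMOPT_MAXCONNECTS, 10L);

      /* per-easy-handle cap, used with the plain easy interface */
      curl_easy_setopt(easy, CURLOPT_MAXCONNECTS, 5L);
    }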
<a name="Curl_share"></a>
## Curl_share

 The libcurl share API allocates a `Curl_share` struct, exposed to the external API as `CURLSH *`.

 The idea is that the struct can have a set of its own versions of caches and pools and then by providing this struct in the `CURLOPT_SHARE` option, those specific `Curl_easy`s will use the caches/pools that this share handle holds.

 Then individual `Curl_easy` structs can be made to share specific things that they otherwise wouldn't, such as cookies.

 The `Curl_share` struct can currently hold cookies, DNS cache and the SSL session cache.

<a name="CookieInfo"></a>
## CookieInfo

 This is the main cookie struct. It holds all known cookies and related information. Each `Curl_easy` has its own private `CookieInfo` even when they are added to a multi handle. They can be made to share cookies by using the share API.


[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2