curl internals
==============

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
 - [`Curl_connect`](#Curl_connect)
 - [`multi_do`](#multi_do)
 - [`Curl_readwrite`](#Curl_readwrite)
 - [`multi_done`](#multi_done)
 - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
 - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
 - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [hostip.c explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)

<a name="intro"></a>
Intro
=====

 This project is split in two: the library and the client. The client part
 uses the library, but the library is designed to allow other applications to
 use it.

 The largest amount of code and complexity is in the library part.

<a name="git"></a>
git
===

 All changes to the sources are committed to the git repository as soon as
 they're somewhat verified to work. Changes shall be committed as independently
 as possible so that individual changes can be easily spotted and tracked
 afterwards.

 Tagging shall be used extensively, and by the time we release new archives we
 should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

 We write curl and libcurl to compile with C89 compilers, on 32-bit and up
 machines. Most of libcurl assumes more or less POSIX compliance but that's
 not a requirement.

 We write libcurl to build and work with lots of third party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work but are not what we work hard to maintain):

Dependencies
------------

 - OpenSSL 0.9.7
 - GnuTLS 2.11.3
 - zlib 1.1.4
 - libssh2 0.16
 - c-ares 1.6.0
 - libidn2 2.0.0
 - cyassl 2.0.0
 - openldap 2.0
 - MIT Kerberos 1.2.4
 - GSKit V5R3M0
 - NSS 3.14.x
 - PolarSSL 1.3.0
 - Heimdal ?
 - nghttp2 1.0.0

Operating Systems
-----------------

 On systems where configure runs, we aim at working on them all - if they have
 a suitable C compiler. On systems that don't run configure, we strive to keep
 curl running correctly on:

 - Windows 98
 - AS/400 V5R3M0
 - Symbian 9.1
 - Windows CE ?
 - TPF ?

Build tools
-----------

 When writing code (mostly for generating stuff included in release tarballs)
 we use a few "build tools" and we make sure that we remain functional with
 these versions:

 - GNU Libtool 1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4 1.4
 - perl 5.004
 - roffit 0.5
 - groff ? (any version that supports "groff -Tps -man [in] [out]")
 - ps2pdf (gs) ?

<a name="winvsunix"></a>
Windows vs Unix
===============

 There are a few differences in how to program curl the Unix way compared to
 the Windows way. Perhaps the four most notable details are:

 1. Different function names for socket operations.

    In curl, this is solved with defines and macros, so that the source looks
    the same in all places except for the header file that defines them. The
    macros in use are sclose(), sread() and swrite().

 2. Windows requires a couple of init calls for the socket stuff.

    That's taken care of by the `curl_global_init()` call, but if other libs
    also do it there might be reasons for applications to alter that
    behaviour.

 3. The file descriptors for network communication and file operations are
    not as easily interchangeable as in Unix.

    We avoid this by not trying any funny tricks on file descriptors.

 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

    We set stdout to binary mode under Windows.

 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
 we maintain a `curl_config-win32.h` file in the lib directory that is
 supposed to look exactly like a `curl_config.h` file would have looked on a
 Windows machine!

 Generally speaking: always remember that this will be compiled on dozens of
 operating systems. Don't walk on the edge!

<a name="Library"></a>
Library
=======

 (See [Structs in libcurl](#structs) for the separate section describing all
 major internal structs and their purposes.)

 There are plenty of entry points to the library, namely each publicly defined
 function that libcurl offers to applications. All of those functions are
 rather small and easy to follow. All the ones prefixed with `curl_easy` are
 put in the lib/easy.c file.

 `curl_global_init()` and `curl_global_cleanup()` should be called by the
 application to initialize and clean up global stuff in the library. As of
 today, it can handle the global SSL initing if SSL is enabled and it can init
 the socket layer on Windows machines. libcurl itself has no "global" scope.

 All printf()-style functions use the supplied clones in lib/mprintf.c. This
 makes sure we stay absolutely platform independent.

 [`curl_easy_init()`][2] allocates an internal struct and makes some
 initializations. The returned handle does not reveal internals. This is the
 `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
 functions. All connections performed will get connect-specific data allocated
 that should be used for things related to particular connections/requests.

 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
 be passed in pairs: the parameter-ID and the parameter-value. The list of
 options is documented in the man page. This function mainly sets things in
 the `Curl_easy` struct.

 `curl_easy_perform()` is just a wrapper function that makes use of the multi
 API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
 `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
 and then returns.
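
 As a rough application-level sketch of what that wrapper amounts to
 (simplified, error checking omitted - an illustration, not the actual
 internal code):

    #include <curl/curl.h>

    /* roughly what curl_easy_perform() does, expressed with the public
       multi API */
    static CURLcode easy_perform_sketch(CURL *easy)
    {
      CURLM *multi = curl_multi_init();
      int still_running = 1;

      curl_multi_add_handle(multi, easy);
      while(still_running) {
        curl_multi_perform(multi, &still_running);
        if(still_running)
          curl_multi_wait(multi, NULL, 0, 1000, NULL);
      }
      curl_multi_remove_handle(multi, easy);
      curl_multi_cleanup(multi);
      return CURLE_OK;
    }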

 Some of the most important key functions in url.c are called from multi.c
 when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

 Analyzes the URL, separates the different components and connects to the
 remote host. This may involve using a proxy and/or using SSL. The
 `Curl_resolv()` function in lib/hostip.c is used for looking up host names
 (it does then use the proper underlying method, which may vary between
 platforms and builds).

 When `Curl_connect` is done, we are connected to the remote site. Then it
 is time to tell the server to get a document/file. `Curl_do()` arranges
 this.

 This function makes sure there's an allocated and initialized 'connectdata'
 struct that is used for this particular connection only (although there may
 be several requests performed on the same connect). A bunch of things are
 inited/inherited from the `Curl_easy` struct.

<a name="multi_do"></a>
multi_do()
----------

 `multi_do()` makes sure the proper protocol-specific function is called. The
 functions are named after the protocols they handle.

 The protocol-specific functions of course deal with protocol-specific
 negotiations and setup. They have access to the `Curl_sendf()` (from
 lib/sendf.c) function to send printf-style formatted data to the remote
 host and when they're ready to make the actual file transfer they call the
 `Curl_setup_transfer()` function (in lib/transfer.c) to set up the transfer
 and return.

 If this DO function fails and the connection is being re-used, libcurl will
 then close this connection, set up a new connection and re-issue the DO
 request on that. This is because there is no way to be perfectly sure that
 we have discovered a dead connection before the DO function and thus we
 might wrongly be re-using a connection that was closed by the remote peer.

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

 Called during the transfer of the actual protocol payload.

 During transfer, the progress functions in lib/progress.c are called at
 frequent intervals (or at the user's choice, a specified callback might get
 called). The speedcheck functions in lib/speedcheck.c are also used to
 verify that the transfer is as fast as required.
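
 The user-specified callback mentioned above is the application's progress
 callback, registered through the public API. A minimal sketch of what an
 application does to get it called (not internal code):

    #include <curl/curl.h>

    /* gets called by libcurl's progress machinery at frequent intervals */
    static int xferinfo_cb(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                           curl_off_t ultotal, curl_off_t ulnow)
    {
      (void)clientp; (void)dltotal; (void)dlnow; (void)ultotal; (void)ulnow;
      return 0; /* returning non-zero aborts the transfer */
    }

    static void setup_progress(CURL *curl)
    {
      curl_easy_setopt(curl, CURLOPT_XFERINFOFUNCTION, xferinfo_cb);
      curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L); /* enable callbacks */
    }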

<a name="multi_done"></a>
multi_done()
------------

 Called after a transfer is done. This function takes care of everything
 that has to be done after a transfer. This function attempts to leave
 matters in a state so that `multi_do()` should be possible to call again on
 the same connection (in a persistent connection case). It might also soon
 be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

 When doing normal connections and transfers, no one ever tries to close any
 connections so this is not normally called when `curl_easy_perform()` is
 used. This function is only used when we are certain that no more transfers
 are going to be made on the connection. It can also be closed by force, or
 it can be called to make sure that libcurl doesn't keep too many
 connections alive at the same time.

 This function cleans up all resources that are associated with a single
 connection.

<a name="http"></a>
HTTP(S)
=======

 HTTP offers a lot and is the protocol in curl that uses the most lines of
 code. There is a special file (lib/formdata.c) that offers all the multipart
 post functions.

 The base64 functions for user+password stuff (and more) are in lib/base64.c
 and all functions for parsing and sending cookies are found in lib/cookie.c.

 HTTPS uses in almost every case the same procedure as HTTP, with only two
 exceptions: the connect procedure is different and the function used to read
 or write from the socket is different, although the latter fact is hidden in
 the source by the use of `Curl_read()` for reading and `Curl_write()` for
 writing data to the remote server.

 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
 encoding.

 An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
 series of functions we use. They append data to one single buffer, and when
 the building is finished the entire request is sent off in one single write.
 This is done this way to overcome problems with flawed firewalls and lame
 servers.

<a name="ftp"></a>
FTP
===

 The `Curl_if2ip()` function can be used for getting the IP number of a
 specified network interface, and it resides in lib/if2ip.c.

 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
 was made a separate function to prevent us programmers from forgetting that
 they must be CRLF terminated. They must also be sent in one single write() to
 make firewalls and similar happy.

<a name="kerberos"></a>
Kerberos
========

 Kerberos support is mainly in lib/krb5.c and lib/security.c, but also in
 `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
 `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

 Telnet is implemented in lib/telnet.c.

<a name="file"></a>
FILE
====

 The file:// protocol is dealt with in lib/file.c.

<a name="smb"></a>
SMB
===

 The smb:// protocol is dealt with in lib/smb.c.

<a name="ldap"></a>
LDAP
====

 Everything LDAP is in lib/ldap.c and lib/openldap.c.

<a name="email"></a>
E-mail
======

 The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.

<a name="general"></a>
General
=======

 URL encoding and decoding, called escaping and unescaping in the source code,
 is found in lib/escape.c.

 While transferring data in Transfer() a few functions might get used.
 `curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).

 lib/getenv.c offers `curl_getenv()` which is for reading environment
 variables in a neat platform independent way. That's used in the client, but
 also in lib/url.c when checking the proxy environment variables. Note that
 contrary to the normal unix getenv(), this returns an allocated buffer that
 must be free()ed after use.

 lib/netrc.c holds the .netrc parser.

 lib/timeval.c features replacement functions for systems that don't have
 gettimeofday() and a few support functions for timeval conversions.

 A function named `curl_version()` that returns the full curl version string
 is found in lib/version.c.
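
 As a small illustration of two of the helpers mentioned above - note the
 mandatory free() of the buffer that `curl_getenv()` returns (sketch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <curl/curl.h>

    int main(void)
    {
      /* curl_getenv() returns an allocated copy that the caller must free */
      char *proxy = curl_getenv("http_proxy");
      /* curl_getdate() parses an HTTP date string into a time_t */
      time_t when = curl_getdate("Sun, 06 Nov 1994 08:49:37 GMT", NULL);

      if(proxy) {
        printf("proxy: %s\n", proxy);
        free(proxy);
      }
      printf("parsed date: %ld\n", (long)when);
      return 0;
    }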

<a name="persistent"></a>
Persistent Connections
======================

 The persistent connection support in libcurl requires some considerations on
 how to do things inside of the library.

 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
   must never hold connection-oriented data. It is meant to hold the root data
   as well as all the options etc that the library-user may choose.

 - The `Curl_easy` struct holds the "connection cache" (an array of
   pointers to 'connectdata' structs).

 - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an already
   existing connection in the cache that we can use. Otherwise it creates a
   new one and adds that to the cache. If the cache is full already when a new
   connection is added, it will first close the oldest unused one.

 - When the transfer operation is complete, the connection is left
   open. Particular options may tell libcurl not to, and protocols may signal
   closure on connections and then they won't be kept open, of course.

 - When `curl_easy_cleanup()` is called, we close all still opened
   connections, unless of course the multi interface "owns" the connections.

 The curl handle must be re-used in order for the persistent connections to
 work.

<a name="multi"></a>
multi interface/non-blocking
============================

 The multi interface is a non-blocking interface to the library. To make that
 interface work as well as possible, no low-level function within libcurl may
 be written to work in a blocking manner. (There are still a few spots
 violating this rule.)

 One of the primary reasons we introduced c-ares support was to allow the name
 resolve phase to be perfectly non-blocking as well.

 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
 the code to allow non-blocking operations even on multi-stage command-
 response protocols. They are built around state machines that return when
 they would otherwise block waiting for data. The DICT, LDAP and TELNET
 protocols are crappy examples and they are subject for rewrite in the future
 to better fit the libcurl protocol family.

<a name="ssl"></a>
SSL libraries
=============

 Originally libcurl supported SSLeay for SSL/TLS transports, then its
 successor OpenSSL, and it has since been extended to several other SSL/TLS
 libraries. We expect and hope to further extend the support in future
 libcurl versions.

 To deal with this internally in the best way possible, we have a generic SSL
 function API as provided by the vtls/vtls.[ch] system, and those are the only
 SSL functions we use from within libcurl. vtls is then crafted to use the
 appropriate lower-level function calls to whatever SSL library that is in
 use. For example vtls/openssl.[ch] for the OpenSSL library.

<a name="symbols"></a>
Library Symbols
===============

 All symbols used internally in libcurl must use a `Curl_` prefix if they're
 used in more than a single file. Single-file symbols must be made static.
 Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
 but they are to be changed to follow this pattern in future versions.) Public
 API functions are marked with `CURL_EXTERN` in the public header files so
 that all others can be hidden on platforms where this is possible.
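
 The convention can be illustrated with a few hypothetical declarations (the
 names below are invented for the example):

    /* public API: curl_ prefix, exported, marked with CURL_EXTERN */
    CURL_EXTERN CURLcode curl_example_public(CURL *handle);

    /* internal, but used from more than one source file: Curl_ prefix */
    CURLcode Curl_example_internal(struct Curl_easy *data);

    /* used within a single source file only: static */
    static int example_local_helper(int value);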

<a name="returncodes"></a>
Return Codes and Informationals
===============================

 I've made things simple. Almost every function in libcurl returns a CURLcode,
 which must be `CURLE_OK` if everything is OK or otherwise a suitable error
 code as defined in the curl/curl.h include file. The very spot that detects
 an error must use the `Curl_failf()` function to set the human-readable error
 description.

 To aid the user in understanding what's happening and in debugging curl
 usage, we must supply a fair number of informational messages by using the
 `Curl_infof()` function. Those messages are only displayed when the user
 explicitly asks for them. They are best used when revealing information that
 isn't otherwise obvious.

<a name="abi"></a>
API/ABI
=======

 We make an effort to not export or show internals or how internals work, as
 that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
 for our promise to users.

<a name="client"></a>
Client
======

 main() resides in `src/tool_main.c`.

 `src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
 to display the complete "manual" and the `src/tool_urlglob.c` file holds the
 functions used for the URL "globbing" support - globbing in the sense that
 the {} and [] expansion stuff is there.

 The client mostly sets up its 'config' struct properly, then
 it calls the `curl_easy_*()` functions of the library and when it gets back
 control after the `curl_easy_perform()` it cleans up the library, checks
 status and exits.

 When the operation is done, the ourWriteOut() function in src/writeout.c may
 be called to report about the operation. That function uses the
 `curl_easy_getinfo()` function to extract useful information from the curl
 session.

 It may loop and do all this several times if many URLs were specified on the
 command line or in the config file.

<a name="memorydebug"></a>
Memory Debugging
================

 The file lib/memdebug.c contains debug versions of a few functions -
 functions such as malloc, free, fopen and fclose that deal with resources
 that might give us problems if we "leak" them. The functions in the memdebug
 system do nothing fancy, they do their normal function and then log
 information about what they just did. The logged data can then be analyzed
 after a complete session.

 memanalyze.pl is the perl script present in tests/ that analyzes a log file
 generated by the memory tracking system. It detects if resources are
 allocated but never freed and other kinds of errors related to resource
 management.

 Internally, the preprocessor symbol DEBUGBUILD marks code that is only
 compiled in debug-enabled builds, and the symbol CURLDEBUG marks code that
 is _only_ used for memory tracking/debugging.

 Use -DCURLDEBUG when compiling to enable memory debugging; this is also
 switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
 when compiling to enable a debug build or run configure with --enable-debug.
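
 In the source, that distinction typically shows up as two different kinds of
 conditional blocks (illustrative sketch only):

    #ifdef DEBUGBUILD
      /* extra checks and diagnostics, present in all debug-enabled builds */
    #endif

    #ifdef CURLDEBUG
      /* code that exists solely for the memory tracking system */
    #endif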

 `curl --version` will list the 'Debug' feature for debug-enabled builds, and
 will list the 'TrackMemory' feature for builds capable of curl debug memory
 tracking. These features are independent and can be controlled when running
 the configure script. When --enable-debug is given both features will be
 enabled, unless some restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

 The test suite is placed in its own subdirectory directly off the root in the
 curl archive tree, and it contains a bunch of scripts and a lot of test case
 data.

 The main test script is runtests.pl that will invoke test servers like
 httpserver.pl and ftpserver.pl before all the test cases are performed. The
 test suite currently only runs on Unix-like platforms.

 You'll find a description of the test suite in the tests/README file, and a
 description of the test case data file format in the tests/FILEFORMAT file.

 The test suite automatically detects if curl was built with the memory
 debugging enabled, and if it was, it will detect memory leaks, too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

 libcurl can be built to do name resolves asynchronously, using either the
 normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
-----------

### Build libcurl to use c-ares

1. ./configure --enable-ares=/path/to/ares/install
2. make

### c-ares on win32

 First I compiled c-ares. I changed the default C runtime library to be the
 single-threaded rather than the multi-threaded (this seems to be required to
 prevent linking errors later on). Then I simply built the areslib project
 (the other projects adig/ahost seem to fail under MSVC).

 Next was libcurl. I opened lib/config-win32.h and added:
 `#define USE_ARES 1`

 Next, I added the path for the ares includes to the include path, and
 libares.lib to the libraries.

 Lastly, I also changed libcurl to be single-threaded rather than
 multi-threaded, again to prevent some duplicate symbol errors. I'm not sure
 why I needed to change everything to single-threaded, but when I didn't I
 got redefinition errors for several CRT functions (malloc, stricmp, etc.)

<a name="curl_off_t"></a>
`curl_off_t`
============

 `curl_off_t` is a data type provided by the external libcurl include
 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options that end with LARGE. The type is 64 bits large on most modern
 platforms.
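
 For example, an application passes a `curl_off_t` to such a LARGE option
 roughly like this (minimal sketch):

    #include <curl/curl.h>

    static void set_resume_offset(CURL *curl)
    {
      /* an offset that does not fit in a 32-bit long: needs curl_off_t */
      curl_off_t resume_point = (curl_off_t)6 * 1024 * 1024 * 1024; /* 6 GB */

      curl_easy_setopt(curl, CURLOPT_RESUME_FROM_LARGE, resume_point);
    }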

<a name="curlx"></a>
curlx
=====

 The libcurl source code offers a few functions by source only. They are not
 part of the official libcurl API, but the source files might be useful for
 others so apps can optionally compile/build with these sources to gain
 additional functions.

 We provide them through a single header file for easy access for apps:
 "curlx.h"

`curlx_strtoofft()`
-------------------

 A macro that converts a string containing a number to a `curl_off_t` number.
 This might use the `curlx_strtoll()` function which is provided as source
 code in strtoofft.c. Note that the function is only provided if no
 strtoll() (or equivalent) function exists on your platform. If `curl_off_t`
 is only a 32 bit number on your platform, this macro uses strtol().

Future
------

 Several functions will be removed from the public `curl_` name space in a
 future libcurl release. They will then only become available as `curlx_`
 functions instead. To make the transition easier, we already today provide
 these functions with the `curlx_` prefix to allow sources to be built
 properly with the new function names. The concerned functions are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

<a name="contentencoding"></a>
Content Encoding
================

## About content encodings

 [HTTP/1.1][4] specifies that a client may request that a server encode its
 response. This is usually used to compress a response using one (or more)
 encodings from a set of commonly available compression techniques. These
 schemes include 'deflate' (the zlib algorithm), 'gzip', 'br' (brotli) and
 'compress'. A client requests that the server perform an encoding by
 including an Accept-Encoding header in the request. The value of the header
 should be one of the recognized tokens 'deflate', ... (there's a way to
 register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
 the client's encoding request. When a response is encoded, the server
 includes a Content-Encoding header in the response. The value of the
 Content-Encoding header indicates which encodings were used to encode the
 data, in the order in which they were applied.

 It's also possible for a client to attach priorities to different schemes so
 that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
 information on the Accept-Encoding header. See sec [3.1.2.2 of RFC 7231][15]
 for more information on the Content-Encoding header.

## Supported content encodings

 The 'deflate', 'gzip' and 'br' content encodings are supported by libcurl.
 Both regular and chunked transfers work fine. The zlib library is required
 for the 'deflate' and 'gzip' encodings, while the brotli decoding library is
 required for the 'br' encoding.

## The libcurl interface

 To cause libcurl to request a content encoding use:

  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)

 where string is the intended value of the Accept-Encoding header.

 Currently, libcurl does support multiple encodings but only
 understands how to process responses that use the "deflate", "gzip" and/or
 "br" content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
 that will work (besides "identity", which does nothing) are "deflate",
 "gzip" and "br". If a response is encoded using "compress" or another
 unsupported method, libcurl will return an error indicating that the
 response could not be decoded. If `<string>` is NULL no Accept-Encoding
 header is generated. If `<string>` is a zero-length string, then an
 Accept-Encoding header containing all supported encodings will be generated.

 The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
 content to be automatically decoded. If it is not set and the server still
 sends encoded content (despite not having been asked), the data is returned
 in its raw form and the Content-Encoding type is not checked.
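
 For instance, a minimal sketch of the two common ways to set it:

    #include <curl/curl.h>

    static void request_compressed_response(CURL *curl)
    {
      /* zero-length string: offer every encoding this libcurl build supports */
      curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");

      /* or name one specific encoding */
      curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "gzip");
    }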

## The curl interface

 Use the [--compressed][6] option with curl to cause it to ask servers to
 compress responses using any format supported by curl.

<a name="hostip"></a>
hostip.c explained
==================

 The main compile-time defines to keep in mind when reading the host*.c source
 file are these:

## `CURLRES_IPV6`

 This host has getaddrinfo() and family, and thus we use that. The host may
 not be able to resolve IPv6, but we don't really have to take that into
 account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.

## `CURLRES_ARES`

 is defined if libcurl is built to use c-ares for asynchronous name
 resolves. This can be Windows or *nix.

## `CURLRES_THREADED`

 is defined if libcurl is built to use threading for asynchronous name
 resolves. The name resolve will be done in a new thread, and the supported
 asynch API will be the same as for ares-builds. This is the default under
 (native) Windows.

 If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
 libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
 defined.

## host*.c sources

 The host*.c source files are split up like this:

 - hostip.c - method-independent resolver functions and utility functions
 - hostasyn.c - functions for asynchronous name resolves
 - hostsyn.c - functions for synchronous name resolves
 - asyn-ares.c - functions for asynchronous name resolves using c-ares
 - asyn-thread.c - functions for asynchronous name resolves using threads
 - hostip4.c - IPv4 specific functions
 - hostip6.c - IPv6 specific functions

 hostip.h is the single united header file for all this. It defines the
 `CURLRES_*` defines based on the config*.h and `curl_setup.h` defines.

<a name="memoryleak"></a>
Track Down Memory Leaks
=======================

## Single-threaded

 Please note that this memory leak system is not adjusted to work in more
 than one thread. If you want/need to use it in a multi-threaded app, please
 adjust accordingly.

## Build

 Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
 --enable-debug fixes this). 'make clean' first, then 'make' so that all
 files are actually rebuilt properly. It will also make sense to build
 libcurl with the debug option (usually -g to the compiler) so that debugging
 it will be easier if you actually do find a leak in the library.

 This will create a library that has memory debugging enabled.

## Modify Your Application

 Add a line in your application code:

 `curl_memdebug("dump");`

 This will make the malloc debug system output a full trace of all
 resource-using functions to the given file name. Make sure you rebuild your
 program and that you link with the same libcurl you built for this purpose
 as described above.

## Run Your Application

 Run your program as usual. Watch the specified memory trace file grow.

 Make your program exit and use the proper libcurl cleanup functions etc., so
 that all non-leaks are returned/freed properly.
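
 Putting the two previous steps together, a minimal instrumented application
 could look roughly like this (sketch; `curl_memdebug()` only exists in
 memory-debugging enabled libcurl builds, and its prototype normally comes
 from libcurl's own lib/memdebug.h):

    #include <curl/curl.h>

    /* declared here for the sketch; only present in memory-debug builds */
    extern void curl_memdebug(const char *logname);

    int main(void)
    {
      CURL *curl;

      curl_memdebug("dump");  /* log all resource-using calls to "dump" */
      curl_global_init(CURL_GLOBAL_ALL);

      curl = curl_easy_init();
      curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
      curl_easy_perform(curl);

      /* clean up properly so that only real leaks remain in the log */
      curl_easy_cleanup(curl);
      curl_global_cleanup();
      return 0;
    }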

## Analyze the Flow

 Use the tests/memanalyze.pl perl script to analyze the dump file:

    tests/memanalyze.pl dump

 This now outputs a report on what resources were allocated but never freed
 etc. This report is very useful for posting to the list!

 If this doesn't produce any output, no leak was detected in libcurl. Then
 the leak is most likely to be in your code.

<a name="multi_socket"></a>
`multi_socket`
==============

 Implementation of the `curl_multi_socket` API

 The main ideas of this API are simply:

 1 - The application can use whatever event system it likes as it gets info
     from libcurl about what file descriptors libcurl waits for what action
     on. (The previous API returns `fd_sets` which is very select()-centric).

 2 - When the application discovers action on a single socket, it calls
     libcurl and informs it that there was action on this particular socket
     and libcurl can then act on that socket/transfer only and not care about
     any other transfers. (The previous API always had to scan through all
     the existing transfers.)

 The idea is that [`curl_multi_socket_action()`][7] calls a given callback
 with information about what socket to wait for what action on, and the
 callback only gets called if the status of that socket has changed.

 We also added a timer callback that makes libcurl call the application when
 the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
 and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
 internally there's an added struct to each easy handle in which we store
 an "expire time" (if any). The structs are then "splay sorted" so that we
 can add and remove times from the linked list and yet somewhat swiftly
 figure out both how long there is until the next nearest timer expires
 and which timer (handle) we should take care of now. Of course, the upside
 of all this is that we get a [`curl_multi_timeout()`][8] that should also
 work with old-style applications that use [`curl_multi_perform()`][11].

 We created an internal "socket to easy handles" hash table that given
 a socket (file descriptor) returns the easy handle that waits for action on
 that socket. This hash is made using the already existing hash code
 (previously only used for the DNS cache).

 To make libcurl able to report plain sockets in the socket callback, we had
 to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
 the conversion from sockets to `fd_sets` for that function is only done in
 the last step before the data is returned. I also had to extend c-ares to
 get a function that can return plain sockets, as that library too returned
 only `fd_sets` and that is no longer good enough. The changes done to c-ares
 are available in c-ares 1.3.1 and later.
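
 From the application's point of view, the API described above is driven
 roughly like this (sketch; the callback bodies are application specific):

    #include <curl/curl.h>

    /* libcurl tells us which socket to watch for which action */
    static int socket_cb(CURL *easy, curl_socket_t s, int what,
                         void *userp, void *socketp)
    {
      /* add/modify/remove 's' in the application's event loop, based on
         'what' (CURL_POLL_IN, CURL_POLL_OUT, CURL_POLL_REMOVE, ...) */
      (void)easy; (void)s; (void)what; (void)userp; (void)socketp;
      return 0;
    }

    /* libcurl tells us when its nearest timeout changes */
    static int timer_cb(CURLM *multi, long timeout_ms, void *userp)
    {
      /* (re)arm an application timer to fire in timeout_ms milliseconds */
      (void)multi; (void)timeout_ms; (void)userp;
      return 0;
    }

    static void setup_multi_socket(CURLM *multi)
    {
      curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, socket_cb);
      curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);
    }

 When the application's event loop then sees action on a socket (or its timer
 fires), it calls `curl_multi_socket_action()` so that libcurl can act on
 that socket/transfer only.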

<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

## Curl_easy

 The `Curl_easy` struct is the one returned to the outside in the external
 API as a "CURL *". This is usually known as an easy handle in API
 documentation and examples.

 Information and state that is related to the actual connection is in the
 'connectdata' struct. When a transfer is about to be made, libcurl will
 either create a new connection or re-use an existing one. The particular
 connectdata that is used by this handle is pointed out by
 `Curl_easy->easy_conn`.

 Data and information regarding this particular single transfer is put in
 the SingleRequest sub-struct.

 When the `Curl_easy` struct is added to a multi handle, as it must be in
 order to do any transfer, the ->multi member will point to the `Curl_multi`
 struct it belongs to. The ->prev and ->next members will then be used by the
 multi code to keep a linked list of `Curl_easy` structs that are added to
 that same multi handle. libcurl always uses multi so ->multi *will* point to
 a `Curl_multi` when a transfer is in progress.

 ->mstate is the multi state of this particular `Curl_easy`. When
 `multi_runsingle()` is called, it will act on this handle according to which
 state it is in. The mstate is also what tells which sockets to return for a
 specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.

 The libcurl source code generally uses the name 'data' for the variable that
 points to the `Curl_easy`.

 When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
 an individual stream, sharing the same connectdata struct. Multiplexing
 makes it even more important to keep things associated with the right thing!

## connectdata

 A general idea in libcurl is to keep connections around in a connection
 "cache" after they have been used in case they will be used again, and then
 re-use an existing one instead of creating a new one, as this gives a
 significant performance boost.

 Each 'connectdata' identifies a single physical connection to a server. If
 the connection can't be kept alive, the connection will be closed after use
 and then this struct can be removed from the cache and freed.

 Thus, the same `Curl_easy` can be used multiple times and each time select
 another connectdata struct to use for the connection. Keep this in mind, as
 it is then important to consider if options or choices are based on the
 connection or the `Curl_easy`.

 Functions in libcurl will assume that connectdata->data points to the
 `Curl_easy` that uses this connection (for the moment).

 As a special complexity, some protocols supported by libcurl require a
 special disconnect procedure that is more than just shutting down the
 socket. It can involve sending one or more commands to the server before
 doing so. Since connections are kept in the connection cache after use, the
 original `Curl_easy` may no longer be around when the time comes to shut
 down a particular connection. For this purpose, libcurl holds a special
 dummy `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when
 needed.

 FTP uses two TCP connections for a typical transfer but it keeps both in
 this single struct and thus can be considered a single connection for most
 internal concerns.

 The libcurl source code generally uses the name 'conn' for the variable that
 points to the connectdata.
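
 A greatly simplified sketch of the relationships described above, using only
 the member names mentioned in this document (the real declarations contain
 many more members and look different):

    /* for orientation only - not the real declarations */

    struct Curl_multi;
    struct Curl_easy;

    struct connectdata {
      struct Curl_easy *data;   /* the transfer currently using this connection */
      /* ... sockets, protocol state, recv/send function pointers ... */
    };

    struct SingleRequest {
      void *protop;             /* protocol-specific per-transfer data */
      /* ... state for this single request ... */
    };

    struct Curl_easy {
      struct Curl_multi *multi;      /* multi handle this easy handle is added to */
      struct Curl_easy *prev, *next; /* links in that multi handle's list */
      struct connectdata *easy_conn; /* connection used for the current transfer */
      int mstate;                    /* multi state (really an internal enum) */
      struct SingleRequest req;      /* state for this single transfer/request */
      /* ... options, state, caches ... */
    };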

## Curl_multi

 Internally, the easy interface is implemented as a wrapper around multi
 interface functions. This makes everything multi interface.

 `Curl_multi` is the multi handle struct exposed as "CURLM *" in external
 APIs.

 This struct holds a list of `Curl_easy` structs that have been added to this
 handle with [`curl_multi_add_handle()`][13]. The start of the list is
 `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.

 `->msglist` is a linked list of messages to send back when
 [`curl_multi_info_read()`][14] is called. Basically a node is added to that
 list when an individual `Curl_easy`'s transfer has completed.

 `->hostcache` points to the name cache. It is a hash table for looking up
 names to IP addresses. The nodes have a limited life time in there and this
 cache is meant to reduce the lookup time when the same name is requested
 again within a short period of time.

 `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
 until they should be checked - normally some sort of timeout. Each
 `Curl_easy` has one node in the tree.

 `->sockhash` is a hash table that allows fast lookups of which `Curl_easy`
 uses a particular socket descriptor. This is necessary for the
 `multi_socket` API.

 `->conn_cache` points to the connection cache. It keeps track of all
 connections that are kept after use. The cache has a maximum size.

 `->closure_handle` is described in the 'connectdata' section.

 The libcurl source code generally uses the name 'multi' for the variable
 that points to the `Curl_multi` struct.
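
 From the application side, that message list is drained with
 [`curl_multi_info_read()`][14], typically in a loop like this (sketch):

    #include <stdio.h>
    #include <curl/curl.h>

    static void check_finished_transfers(CURLM *multi)
    {
      CURLMsg *msg;
      int msgs_left;

      /* one message is queued per completed Curl_easy transfer */
      while((msg = curl_multi_info_read(multi, &msgs_left))) {
        if(msg->msg == CURLMSG_DONE)
          printf("transfer done: %s\n", curl_easy_strerror(msg->data.result));
      }
    }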

## Curl_handler

 Each unique protocol that is supported by libcurl needs to provide at least
 one `Curl_handler` struct. It defines what the protocol is called and what
 functions the main code should call to deal with protocol specific issues.
 In general, there's a source file named [protocol].c in which there's a
 "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
 then a single array with pointers to all the individual `Curl_handler`
 structs, which is scanned through when a URL is given to libcurl to work
 with.

 `->scheme` is the URL scheme name, usually spelled out in uppercase. That's
 "HTTP" or "FTP" etc. SSL versions of the protocol need their own
 `Curl_handler` setup so HTTPS is separate from HTTP.

 `->setup_connection` is called to allow the protocol code to allocate
 protocol specific data that then gets associated with that `Curl_easy` for
 the rest of this transfer. It gets freed again at the end of the transfer.
 It will be called before the 'connectdata' for the transfer has been
 selected/created. Most protocols will allocate their private
 'struct [PROTOCOL]' here and assign `Curl_easy->req.protop` to point to it.

 `->connect_it` allows a protocol to do some specific actions after the TCP
 connect is done, actions that can still be considered part of the connection
 phase.

 Some protocols will alter the `connectdata->recv[]` and
 `connectdata->send[]` function pointers in this function.

 `->connecting` is similarly a function that keeps getting called as long as
 the protocol considers itself still in the connecting phase.

 `->do_it` is the function called to issue the transfer request. What we call
 the DO action internally. If the DO is not enough and things need to be kept
 getting done for the entire DO sequence to complete, `->doing` is then
 usually also provided. Each protocol that needs to do multiple commands or
 similar for do/doing needs to implement its own state machine (see SCP,
 SFTP, FTP). Some protocols (only FTP and only due to historical reasons)
 have a separate piece of the DO state called `DO_MORE`.

 `->doing` keeps getting called while issuing the transfer request
 command(s).

 `->done` gets called when the transfer is complete and DONE. That's after
 the main data has been transferred.

 `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
 this state when setting up the second connection.

 `->proto_getsock`
 `->doing_getsock`
 `->domore_getsock`
 `->perform_getsock`

 Functions that return socket information. Which socket(s) to wait for which
 action(s) during the particular multi state.

 `->disconnect` is called immediately before the TCP connection is shut down.

 `->readwrite` gets called during transfer to allow the protocol to do extra
 reads/writes.

 `->defport` is the default TCP or UDP port this protocol uses.

 `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
 have their "base" protocol set and then the SSL variation. Like
 "HTTP|HTTPS".

 `->flags` is a bitmask with additional information about the protocol that
 will make it get treated differently by the generic engine:

 - `PROTOPT_SSL` - will make it connect and negotiate SSL

 - `PROTOPT_DUAL` - this protocol uses two connections

 - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
   connection. This flag is no longer used by code, yet still set for a bunch
   of protocol handlers.

 - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
   limit which "direction" of socket actions that the main engine will
   concern itself with.

 - `PROTOPT_NONETWORK` - a protocol that doesn't use the network (read:
   file://)

 - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
   one unless one is provided

 - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
   (?foo=bar)

## conncache

 A hash table with connections for later re-use. Each `Curl_easy` has a
 pointer to its connection cache. Each multi handle sets up a connection
 cache that all added `Curl_easy`s share by default.

## Curl_share

 The libcurl share API allocates a `Curl_share` struct, exposed to the
 external API as "CURLSH *".

 The idea is that the struct can have a set of its own versions of caches and
 pools and then by providing this struct in the `CURLOPT_SHARE` option, those
 specific `Curl_easy`s will use the caches/pools that this share handle
 holds.

 Then individual `Curl_easy` structs can be made to share specific things
 that they otherwise wouldn't, such as cookies.

 The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
 session cache.

## CookieInfo

 This is the main cookie struct. It holds all known cookies and related
 information. Each `Curl_easy` has its own private CookieInfo even when
 they are added to a multi handle. They can be made to share cookies by using
 the share API.
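
 The share API mentioned above is driven from applications roughly like this
 (sketch):

    #include <curl/curl.h>

    static void share_between_handles(CURL *curl1, CURL *curl2)
    {
      CURLSH *share = curl_share_init();

      /* let both handles share cookies and the DNS cache */
      curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_COOKIE);
      curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);

      curl_easy_setopt(curl1, CURLOPT_SHARE, share);
      curl_easy_setopt(curl2, CURLOPT_SHARE, share);

      /* ... perform transfers ...
         the share handle must outlive the easy handles and is freed with
         curl_share_cleanup() when no longer needed */
    }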

[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2