curl internals
==============

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
   - [`Curl_connect`](#Curl_connect)
   - [`multi_do`](#multi_do)
   - [`Curl_readwrite`](#Curl_readwrite)
   - [`multi_done`](#multi_done)
   - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
 - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
   - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [`hostip.c` explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)
   - [Curl_easy](#Curl_easy)
   - [connectdata](#connectdata)
   - [Curl_multi](#Curl_multi)
   - [Curl_handler](#Curl_handler)
   - [conncache](#conncache)
   - [Curl_share](#Curl_share)
   - [CookieInfo](#CookieInfo)

<a name="intro"></a>
Intro
=====

 This project is split in two: the library and the client. The client part
 uses the library, but the library is designed to allow other applications to
 use it.

 The largest amount of code and complexity is in the library part.


<a name="git"></a>
git
===

 All changes to the sources are committed to the git repository as soon as
 they're somewhat verified to work. Changes shall be committed as independently
 as possible so that individual changes can be easily spotted and tracked
 afterwards.

 Tagging shall be used extensively, and by the time we release new archives we
 should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

 We write curl and libcurl to compile with C89 compilers, on 32-bit and larger
 machines. Most of libcurl assumes more or less POSIX compliance, but that is
 not a strict requirement.

 We write libcurl to build and work with lots of third party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work but are not what we work hard to maintain):

Dependencies
------------

 - OpenSSL      0.9.7
 - GnuTLS       2.11.3
 - zlib         1.1.4
 - libssh2      0.16
 - c-ares       1.6.0
 - libidn2      2.0.0
 - wolfSSL      2.0.0
 - openldap     2.0
 - MIT Kerberos 1.2.4
 - GSKit        V5R3M0
 - NSS          3.14.x
 - PolarSSL     1.3.0
 - Heimdal      ?
 - nghttp2      1.0.0

Operating Systems
-----------------

 On systems where configure runs, we aim to work on them all - if they have
 a suitable C compiler. On systems that don't run configure, we strive to keep
 curl running correctly on:

 - Windows      98
 - AS/400       V5R3M0
 - Symbian      9.1
 - Windows CE   ?
 - TPF          ?

Build tools
-----------

 When writing code (mostly for generating stuff included in release tarballs)
 we use a few "build tools" and we make sure that we remain functional with
 these versions:

 - GNU Libtool  1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4       1.4
 - perl         5.004
 - roffit       0.5
 - groff        ? (any version that supports `groff -Tps -man [in] [out]`)
 - ps2pdf (gs)  ?

<a name="winvsunix"></a>
Windows vs Unix
===============

 There are a few differences in how to program curl the Unix way compared to
 the Windows way. Perhaps the four most notable details are:

 1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same in all places except for the header file that defines them. The
   macros in use are `sclose()`, `sread()` and `swrite()`.

 2. Windows requires a couple of init calls for the socket stuff.

   That's taken care of by the `curl_global_init()` call, but if other
   libraries also do it, there might be reasons for applications to alter that
   behaviour.

 3. The file descriptors for network communication and file operations are
    not as easily interchangeable as in Unix.

   We avoid this by not trying any funny tricks on file descriptors.

 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

   We set stdout to binary mode under Windows.

 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
 we maintain a `curl_config-win32.h` file in the lib directory that is supposed
 to look exactly like a `curl_config.h` file would look on a Windows machine.
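
 As a small illustration of the preferred style, here is a sketch;
 `HAVE_GETADDRINFO` is used purely as an example of a feature define that
 configure (or `curl_config-win32.h` on Windows) may provide:

    #include "curl_setup.h" /* pulls in curl_config.h / curl_config-win32.h */

    static void resolve_example(void)
    {
    #ifdef HAVE_GETADDRINFO
      /* the feature exists: use getaddrinfo() and friends */
    #else
      /* the feature is missing: fall back to an older resolver API */
    #endif
    }

    /* Avoided: `#ifdef _WIN32` or similar OS checks in places where a
       feature check expresses the real requirement. */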

 Generally speaking: always remember that this will be compiled on dozens of
 operating systems. Don't walk on the edge!

<a name="Library"></a>
Library
=======

 (See [Structs in libcurl](#structs) for the separate section describing all
 major internal structs and their purposes.)

 There are plenty of entry points to the library, namely each publicly defined
 function that libcurl offers to applications. All of those functions are
 rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
 put in the `lib/easy.c` file.

 `curl_global_init()` and `curl_global_cleanup()` should be called by the
 application to initialize and clean up global stuff in the library. As of
 today, this handles the global SSL initialization if SSL is enabled and the
 socket layer initialization on Windows machines. libcurl itself has no
 "global" scope.

 All printf()-style functions use the supplied clones in `lib/mprintf.c`. This
 makes sure we stay absolutely platform independent.

 [`curl_easy_init()`][2] allocates an internal struct and makes some
 initializations. The returned handle does not reveal internals. This is the
 `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
 functions. All connections performed will get connect-specific data allocated
 that should be used for things related to particular connections/requests.

 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
 be passed in pairs: the parameter-ID and the parameter-value. The list of
 options is documented in the man page. This function mainly sets things in
 the `Curl_easy` struct.

 `curl_easy_perform()` is just a wrapper function that makes use of the multi
 API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
 `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
 and then returns.
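
 Conceptually, that loop looks something like the sketch below. This only
 illustrates the idea; the real code in `lib/easy.c` also deals with errors,
 messages and timeouts:

    #include <curl/curl.h>

    /* A conceptual sketch of curl_easy_perform() as a thin loop over the
       multi API - not the actual implementation. */
    static CURLcode easy_perform_sketch(CURL *easy)
    {
      CURLM *multi = curl_multi_init();
      int still_running = 0;

      curl_multi_add_handle(multi, easy);
      do {
        int numfds;
        curl_multi_perform(multi, &still_running);
        if(still_running)
          curl_multi_wait(multi, NULL, 0, 1000, &numfds);
      } while(still_running);
      curl_multi_remove_handle(multi, easy);
      curl_multi_cleanup(multi);
      return CURLE_OK; /* the real function returns the transfer's result */
    }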

 Some of the most important key functions in `url.c` are called from
 `multi.c` when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

   Analyzes the URL, separates out the different components and connects to
   the remote host. This may involve using a proxy and/or using SSL. The
   `Curl_resolv()` function in `lib/hostip.c` is used for looking up host
   names (it then uses the proper underlying method, which may vary
   between platforms and builds).

   When `Curl_connect` is done, we are connected to the remote site. Then it
   is time to tell the server to get a document/file. `multi_do()` arranges
   this.

   This function makes sure there's an allocated and initialized `connectdata`
   struct that is used for this particular connection only (although there may
   be several requests performed on the same connection). A bunch of things
   are initialized or inherited from the `Curl_easy` struct.

<a name="multi_do"></a>
multi_do()
---------

   `multi_do()` makes sure the proper protocol-specific function is called.
   The functions are named after the protocols they handle.

   The protocol-specific functions of course deal with protocol-specific
   negotiations and setup. They have access to the `Curl_sendf()` function
   (from `lib/sendf.c`) to send printf-style formatted data to the remote
   host, and when they're ready to make the actual file transfer they call
   the `Curl_setup_transfer()` function (in `lib/transfer.c`) to set up the
   transfer and return.

   If this DO function fails and the connection is being re-used, libcurl will
   then close this connection, set up a new connection and re-issue the DO
   request on that. This is because there is no way to be perfectly sure that
   we have discovered a dead connection before the DO function and thus we
   might wrongly be re-using a connection that was closed by the remote peer.

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

   Called during the transfer of the actual protocol payload.

   During transfer, the progress functions in `lib/progress.c` are called at
   frequent intervals (or at the user's choice, a specified callback might get
   called). The speedcheck functions in `lib/speedcheck.c` are also used to
   verify that the transfer is as fast as required.

<a name="multi_done"></a>
multi_done()
-----------

   Called after a transfer is done. This function takes care of everything
   that has to be done after a transfer. This function attempts to leave
   matters in a state so that `multi_do()` should be possible to call again on
   the same connection (in a persistent connection case). The connection might
   also soon be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

   When doing normal connections and transfers, no one ever tries to close any
   connections so this is not normally called when `curl_easy_perform()` is
   used. This function is only used when we are certain that no more transfers
   are going to be made on the connection. The connection can also be closed
   by force, or this can be called to make sure that libcurl doesn't keep too
   many connections alive at the same time.

   This function cleans up all resources that are associated with a single
   connection.

<a name="http"></a>
HTTP(S)
=======

 HTTP offers a lot and is the protocol in curl that uses the most lines of
 code. There is a special file `lib/formdata.c` that offers all the
 multipart post functions.

 base64 functions for user+password stuff (and more) are in `lib/base64.c`
 and all functions for parsing and sending cookies are found in
 `lib/cookie.c`.

 HTTPS uses in almost every case the same procedure as HTTP, with only two
 exceptions: the connect procedure is different and the function used to read
 or write from the socket is different, although the latter fact is hidden in
 the source by the use of `Curl_read()` for reading and `Curl_write()` for
 writing data to the remote server.

 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
 encoding.

 An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
 series of functions we use. They append data to one single buffer, and when
 the building is finished the entire request is sent off in one single write.
 This is done this way to overcome problems with flawed firewalls and lame
 servers.

<a name="ftp"></a>
FTP
===

 The `Curl_if2ip()` function can be used for getting the IP number of a
 specified network interface, and it resides in `lib/if2ip.c`.

 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
 was made a separate function to prevent us programmers from forgetting that
 the commands must be CRLF terminated. They must also be sent in one single
 `write()` to make firewalls and similar happy.

<a name="kerberos"></a>
Kerberos
========

 Kerberos support is mainly in `lib/krb5.c` and `lib/security.c` but also
 `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
 `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

 Telnet is implemented in `lib/telnet.c`.

<a name="file"></a>
FILE
====

 The `file://` protocol is dealt with in `lib/file.c`.

<a name="smb"></a>
SMB
===

 The `smb://` protocol is dealt with in `lib/smb.c`.

<a name="ldap"></a>
LDAP
====

 Everything LDAP is in `lib/ldap.c` and `lib/openldap.c`.

<a name="email"></a>
E-mail
======

 The e-mail related source code is in `lib/imap.c`, `lib/pop3.c` and
 `lib/smtp.c`.

<a name="general"></a>
General
=======

 URL encoding and decoding, called escaping and unescaping in the source code,
 is found in `lib/escape.c`.

 While transferring data in `Transfer()` a few functions might get used.
 `curl_getdate()` in `lib/parsedate.c` is for HTTP date comparisons (and
 more).

 `lib/getenv.c` offers `curl_getenv()` which is for reading environment
 variables in a neat platform independent way. That's used in the client, but
 also in `lib/url.c` when checking the proxy environment variables. Note that
 contrary to the normal Unix `getenv()`, this returns an allocated buffer that
 must be `free()`ed after use.
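
 For example (a minimal sketch; as noted above, the returned buffer is
 allocated and the caller must free it):

    #include <stdio.h>
    #include <stdlib.h>
    #include <curl/curl.h>

    int main(void)
    {
      /* curl_getenv() returns an allocated copy of the value, or NULL */
      char *proxy = curl_getenv("http_proxy");
      if(proxy) {
        printf("http_proxy is set to %s\n", proxy);
        free(proxy); /* the caller owns the returned buffer */
      }
      return 0;
    }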

 `lib/netrc.c` holds the `.netrc` parser.

 `lib/timeval.c` features replacement functions for systems that don't have
 `gettimeofday()` and a few support functions for timeval conversions.

 A function named `curl_version()` that returns the full curl version string
 is found in `lib/version.c`.

<a name="persistent"></a>
Persistent Connections
======================

 The persistent connection support in libcurl requires some considerations on
 how to do things inside of the library.

 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
   must never hold connection-oriented data. It is meant to hold the root data
   as well as all the options etc that the library-user may choose.

 - The `Curl_easy` struct holds the "connection cache" (an array of
   pointers to `connectdata` structs).

 - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an already
   existing connection in the cache that we can use. Otherwise it creates a
   new one and adds that to the cache. If the cache is already full when a new
   connection is added, it will first close the oldest unused one.

 - When the transfer operation is complete, the connection is left
   open. Particular options may tell libcurl not to, and protocols may signal
   closure on connections, in which case they won't be kept open, of course.

 - When `curl_easy_cleanup()` is called, we close all still open connections,
   unless of course the multi interface "owns" the connections.

 The curl handle must be re-used in order for the persistent connections to
 work.
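
 A minimal sketch of what that re-use looks like from an application's point
 of view (the URLs are just placeholders):

    #include <curl/curl.h>

    int main(void)
    {
      CURL *curl = curl_easy_init();
      if(curl) {
        /* first transfer: a connection is created and cached in the handle */
        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/first");
        curl_easy_perform(curl);

        /* second transfer on the SAME handle: the cached connection to the
           same host can be re-used instead of setting up a new one */
        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/second");
        curl_easy_perform(curl);

        curl_easy_cleanup(curl); /* closes connections still kept open */
      }
      return 0;
    }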

<a name="multi"></a>
multi interface/non-blocking
============================

 The multi interface is a non-blocking interface to the library. To make that
 interface work as well as possible, no low-level function within libcurl may
 be written to work in a blocking manner. (There are still a few spots
 violating this rule.)

 One of the primary reasons we introduced c-ares support was to allow the name
 resolve phase to be perfectly non-blocking as well.

 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
 the code to allow non-blocking operations even on multi-stage command-
 response protocols. They are built around state machines that return when
 they would otherwise block waiting for data. The DICT, LDAP and TELNET
 protocols are crappy examples and they are subject to rewrite in the future
 to better fit the libcurl protocol family.

<a name="ssl"></a>
SSL libraries
=============

 Originally libcurl supported SSLeay for SSL/TLS transports, then its
 successor OpenSSL, and support has since been extended to several other
 SSL/TLS libraries. We expect and hope to further extend the support in
 future libcurl versions.

 To deal with this internally in the best way possible, we have a generic SSL
 function API as provided by the `vtls/vtls.[ch]` system, and those are the
 only SSL functions the rest of libcurl may use. vtls is then crafted to use
 the appropriate lower-level function calls to whatever SSL library that is in
 use. For example `vtls/openssl.[ch]` for the OpenSSL library.

<a name="symbols"></a>
Library Symbols
===============

 All symbols used internally in libcurl must use a `Curl_` prefix if they're
 used in more than a single file. Single-file symbols must be made static.
 Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
 but they are to be changed to follow this pattern in future versions.) Public
 API functions are marked with `CURL_EXTERN` in the public header files so
 that all others can be hidden on platforms where this is possible.
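
 A small illustration of the convention; these particular function names are
 made up for the example and do not exist in the sources:

    #include <curl/curl.h>

    /* used in a single file only: keep it static */
    static int parse_header_line(const char *line);

    /* used internally across several files: Curl_ prefix */
    int Curl_parse_header_line(struct Curl_easy *data, const char *line);

    /* part of the public API: curl_ prefix, CURL_EXTERN in the public header */
    CURL_EXTERN int curl_example_function(CURL *handle);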

<a name="returncodes"></a>
Return Codes and Informationals
===============================

 I've made things simple. Almost every function in libcurl returns a CURLcode,
 which must be `CURLE_OK` if everything is OK or otherwise a suitable error
 code as the `curl/curl.h` include file defines. The very spot that detects an
 error must use the `Curl_failf()` function to set the human-readable error
 description.

 To aid the user in understanding what's happening and to debug curl usage, we
 must supply a fair number of informational messages by using the
 `Curl_infof()` function. Those messages are only displayed when the user
 explicitly asks for them. They are best used when revealing information that
 isn't otherwise obvious.
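
 As an illustration of the pattern, here is a made-up internal-style helper.
 This is a sketch only, assuming the usual internal headers that declare
 `Curl_failf()` and `Curl_infof()`:

    #include "urldata.h" /* struct Curl_easy */
    #include "sendf.h"   /* Curl_failf(), Curl_infof() */

    /* a hypothetical internal helper following the convention above */
    static CURLcode example_check(struct Curl_easy *data, curl_socket_t fd)
    {
      if(fd == CURL_SOCKET_BAD) {
        /* the spot that detects the error sets the human-readable message */
        Curl_failf(data, "no valid socket to operate on");
        return CURLE_COULDNT_CONNECT;
      }
      /* informational output, only shown when the user asked for it */
      Curl_infof(data, "operating on socket %d", (int)fd);
      return CURLE_OK;
    }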

<a name="abi"></a>
API/ABI
=======

 We make an effort to not export or show internals or how internals work, as
 that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
 for our promise to users.

<a name="client"></a>
Client
======

 `main()` resides in `src/tool_main.c`.

 `src/tool_hugehelp.c` is automatically generated by the `mkhelp.pl` perl
 script to display the complete "manual" and the `src/tool_urlglob.c` file
 holds the functions used for the URL-"globbing" support. Globbing in the
 sense that the `{}` and `[]` expansion stuff is there.

 The client mostly sets up its `config` struct properly, then
 it calls the `curl_easy_*()` functions of the library and when it gets back
 control after the `curl_easy_perform()` it cleans up the library, checks
 status and exits.

 When the operation is done, the `ourWriteOut()` function in
 `src/tool_writeout.c` may be called to report about the operation. That
 function uses the `curl_easy_getinfo()` function to extract useful
 information from the curl session.

 It may loop and do all this several times if many URLs were specified on the
 command line or in a config file.

<a name="memorydebug"></a>
Memory Debugging
================

 The file `lib/memdebug.c` contains debug versions of a few functions, such as
 `malloc()`, `free()`, `fopen()`, `fclose()` and others that deal with
 resources that might give us problems if we "leak" them. The functions in the
 memdebug system do nothing fancy: they do their normal job and then log
 information about what they just did. The logged data can then be analyzed
 after a complete session.

 `memanalyze.pl` is the perl script present in `tests/` that analyzes a log
 file generated by the memory tracking system. It detects if resources are
 allocated but never freed and other kinds of errors related to resource
 management.

 Internally, the preprocessor symbol `DEBUGBUILD` guards code that is only
 compiled for debug-enabled builds, while the symbol `CURLDEBUG` marks code
 that is _only_ used for memory tracking/debugging.

 Use `-DCURLDEBUG` when compiling to enable memory debugging; this is also
 switched on by running configure with `--enable-curldebug`. Use
 `-DDEBUGBUILD` when compiling to enable a debug build, or run configure with
 `--enable-debug`.

 `curl --version` lists the 'Debug' feature for debug-enabled builds and the
 'TrackMemory' feature for builds capable of memory tracking. These features
 are independent and can be controlled when running the configure script.
 When `--enable-debug` is given, both features will be enabled, unless some
 restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

 The test suite is placed in its own subdirectory directly off the root in the
 curl archive tree, and it contains a bunch of scripts and a lot of test case
 data.

 The main test script is `runtests.pl` that will invoke test servers like
 `httpserver.pl` and `ftpserver.pl` before all the test cases are performed.
 The test suite currently only runs on Unix-like platforms.

 You'll find a description of the test suite in the `tests/README` file and a
 description of the test case data files in the `tests/FILEFORMAT` file.

 The test suite automatically detects if curl was built with memory debugging
 enabled, and if it was, it will detect memory leaks too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

 libcurl can be built to do name resolves asynchronously, using either the
 normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
------

### Build libcurl to use c-ares

1. ./configure --enable-ares=/path/to/ares/install
2. make

### c-ares on win32

 First I compiled c-ares. I changed the default C runtime library to be the
 single-threaded rather than the multi-threaded (this seems to be required to
 prevent linking errors later on). Then I simply built the areslib project
 (the other projects adig/ahost seem to fail under MSVC).

 Next was libcurl. I opened `lib/config-win32.h` and added:
 `#define USE_ARES 1`

 Next, I added the path for the ares includes to the include path, and
 libares.lib to the libraries.

 Lastly, I also changed libcurl to be single-threaded rather than
 multi-threaded, again to prevent some duplicate symbol errors. I'm
 not sure why I needed to change everything to single-threaded, but when I
 didn't I got redefinition errors for several CRT functions (`malloc()`,
 `stricmp()`, etc.)

<a name="curl_off_t"></a>
`curl_off_t`
==========

 `curl_off_t` is a data type provided by the external libcurl include
 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options that end with `_LARGE`. The type is 64-bit large on most modern
 platforms.
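
 For example, a `_LARGE` option such as `CURLOPT_RESUME_FROM_LARGE` takes a
 `curl_off_t` argument, so an explicit cast keeps the varargs call correct on
 every platform (a minimal sketch):

    #include <curl/curl.h>

    static void resume_download_at(CURL *curl, long long offset)
    {
      /* the _LARGE options expect exactly a curl_off_t through varargs */
      curl_easy_setopt(curl, CURLOPT_RESUME_FROM_LARGE, (curl_off_t)offset);
    }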

<a name="curlx"></a>
curlx
=====

 The libcurl source code offers a few functions by source only. They are not
 part of the official libcurl API, but the source files might be useful for
 others so apps can optionally compile/build with these sources to gain
 additional functions.

 We provide them through a single header file for easy access for apps:
 `curlx.h`

`curlx_strtoofft()`
-------------------
   A macro that converts a string containing a number to a `curl_off_t` number.
   This might use the `curlx_strtoll()` function which is provided as source
   code in strtoofft.c. Note that the function is only provided if no
   `strtoll()` (or equivalent) function exists on your platform. If `curl_off_t`
   is only a 32-bit number on your platform, this macro uses `strtol()`.

Future
------

 Several functions will be removed from the public `curl_` name space in a
 future libcurl release. They will then only become available as `curlx_`
 functions instead. To make the transition easier, we already provide
 these functions with the `curlx_` prefix to allow sources to be built
 properly with the new function names. The functions concerned are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

<a name="contentencoding"></a>
Content Encoding
================

## About content encodings

 [HTTP/1.1][4] specifies that a client may request that a server encode its
 response. This is usually used to compress a response using one (or more)
 encodings from a set of commonly available compression techniques. These
 schemes include `deflate` (the zlib algorithm), `gzip`, `br` (brotli) and
 `compress`. A client requests that the server perform an encoding by including
 an `Accept-Encoding` header in the request. The value of the header
 should be one of the recognized tokens `deflate`, ... (there's a way to
 register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
 the client's encoding request. When a response is encoded, the server
 includes a `Content-Encoding` header in the response. The value of the
 `Content-Encoding` header indicates which encodings were used to encode the
 data, in the order in which they were applied.

 It's also possible for a client to attach priorities to different schemes so
 that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
 information on the `Accept-Encoding` header. See sec
 [3.1.2.2 of RFC 7231][15] for more information on the `Content-Encoding`
 header.

## Supported content encodings

 The `deflate`, `gzip` and `br` content encodings are supported by libcurl.
 Both regular and chunked transfers work fine. The zlib library is required
 for the `deflate` and `gzip` encodings, while the brotli decoding library is
 required for the `br` encoding.

## The libcurl interface

 To cause libcurl to request a content encoding use:

  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)

 where string is the intended value of the `Accept-Encoding` header.

 Currently, libcurl supports multiple encodings but only understands how to
 process responses that use the `deflate`, `gzip` and/or `br` content
 encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
 that will work (besides `identity`, which does nothing) are `deflate`,
 `gzip` and `br`. If a response is encoded using `compress` or another
 unsupported method, libcurl returns an error indicating that the response
 could not be decoded. If `<string>` is NULL no `Accept-Encoding` header is
 generated. If `<string>` is a zero-length string, then an `Accept-Encoding`
 header containing all supported encodings will be generated.

 [`CURLOPT_ACCEPT_ENCODING`][5] must be set to a non-NULL value for
 content to be automatically decoded. If it is not set and the server still
 sends encoded content (despite not having been asked), the data is returned
 in its raw form and the `Content-Encoding` type is not checked.
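
 A minimal sketch in application code (the URL is a placeholder):

    #include <curl/curl.h>

    int main(void)
    {
      CURL *curl = curl_easy_init();
      if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
        /* "" requests all encodings libcurl supports and enables automatic
           decoding of the response; a single token such as "gzip" works too */
        curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
      }
      return 0;
    }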

## The curl interface

 Use the [`--compressed`][6] option with curl to cause it to ask servers to
 compress responses using any format supported by curl.

<a name="hostip"></a>
`hostip.c` explained
====================

 The main compile-time defines to keep in mind when reading the `host*.c`
 source files are these:

## `CURLRES_IPV6`

 this host has `getaddrinfo()` and family, and thus we use that. The host may
 not be able to resolve IPv6, but we don't really have to take that into
 account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.

## `CURLRES_ARES`

 is defined if libcurl is built to use c-ares for asynchronous name
 resolves. This can be Windows or \*nix.

## `CURLRES_THREADED`

 is defined if libcurl is built to use threading for asynchronous name
 resolves. The name resolve will be done in a new thread, and the supported
 asynch API will be the same as for ares-builds. This is the default under
 (native) Windows.

 If either of the two previous is defined, `CURLRES_ASYNCH` is defined too. If
 libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
 defined.

## `host*.c` sources

 The `host*.c` source files are split up like this:

 - `hostip.c`      - method-independent resolver functions and utility functions
 - `hostasyn.c`    - functions for asynchronous name resolves
 - `hostsyn.c`     - functions for synchronous name resolves
 - `asyn-ares.c`   - functions for asynchronous name resolves using c-ares
 - `asyn-thread.c` - functions for asynchronous name resolves using threads
 - `hostip4.c`     - IPv4 specific functions
 - `hostip6.c`     - IPv6 specific functions

 `hostip.h` is the single united header file for all this. It defines the
 `CURLRES_*` defines based on the `config*.h` and `curl_setup.h` defines.

<a name="memoryleak"></a>
Track Down Memory Leaks
=======================

## Single-threaded

  Please note that this memory leak system is not adjusted to work in more
  than one thread. If you want/need to use it in a multi-threaded app, please
  adjust accordingly.

## Build

  Rebuild libcurl with `-DCURLDEBUG` (usually, rerunning configure with
  `--enable-debug` fixes this). `make clean` first, then `make` so that all
  files are actually rebuilt properly. It will also make sense to build
  libcurl with the debug option (usually `-g` to the compiler) so that
  debugging it will be easier if you actually do find a leak in the library.

  This will create a library that has memory debugging enabled.

## Modify Your Application

  Add a line in your application code:

       `curl_dbg_memdebug("dump");`

  This will make the malloc debug system output a full trace of all
  resource-using functions to the given file name. Make sure you rebuild your
  program and that you link with the same libcurl you built for this purpose
  as described above.
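
  A sketch of where such a line typically goes; `curl_dbg_memdebug()` is not
  part of the public API, so this assumes a memory-debug-enabled libcurl build
  and a matching declaration (supplied by hand here just for illustration):

    /* assumed prototype, normally provided by libcurl's lib/memdebug.h */
    extern void curl_dbg_memdebug(const char *logname);

    int main(void)
    {
      /* call it as early as possible, before any other libcurl activity,
         so that every allocation ends up in the trace file */
      curl_dbg_memdebug("dump");

      /* ... normal libcurl usage ... */

      return 0;
    }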

## Run Your Application

  Run your program as usual. Watch the specified memory trace file grow.

  Make your program exit and use the proper libcurl cleanup functions etc., so
  that all non-leaks are returned/freed properly.

## Analyze the Flow

  Use the `tests/memanalyze.pl` perl script to analyze the dump file:

    tests/memanalyze.pl dump

  This now outputs a report on what resources were allocated but never
  freed etc. This report is very fine for posting to the list!

  If this doesn't produce any output, no leak was detected in libcurl. The
  leak is then most likely in your own code.

<a name="multi_socket"></a>
`multi_socket`
==============

 Implementation of the `curl_multi_socket` API

 The main ideas of this API are simply:

 1. The application can use whatever event system it likes as it gets info
    from libcurl about what file descriptors libcurl waits for what action
    on. (The previous API returns `fd_sets` which is very
    `select()`-centric).

 2. When the application discovers action on a single socket, it calls
    libcurl and informs it that there was action on this particular socket and
    libcurl can then act on that socket/transfer only and not care about
    any other transfers. (The previous API always had to scan through all
    the existing transfers.)

 The idea is that [`curl_multi_socket_action()`][7] calls a given callback
 with information about what socket to wait for what action on, and the
 callback only gets called if the status of that socket has changed.

 We also added a timer callback that makes libcurl call the application when
 the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
 and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
 internally there's a struct added to each easy handle in which we store
 an "expire time" (if any). The structs are then "splay sorted" so that we
 can add and remove times from the linked list and yet somewhat swiftly
 figure out both how long there is until the next nearest timer expires
 and which timer (handle) we should take care of now. Of course, the upside
 of all this is that we get a [`curl_multi_timeout()`][8] that should also
 work with old-style applications that use [`curl_multi_perform()`][11].
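
 A hedged sketch of the two callbacks an application registers for this API;
 the actual event-loop integration depends on the event library used and is
 left out:

    #include <curl/curl.h>

    /* Called by libcurl to say which socket to watch for which action;
       'what' is CURL_POLL_IN/OUT/INOUT/REMOVE. Registered with
       CURLMOPT_SOCKETFUNCTION. */
    static int socket_cb(CURL *easy, curl_socket_t s, int what,
                         void *userp, void *socketp)
    {
      /* add/modify/remove 's' in the application's own event system here */
      (void)easy; (void)s; (void)what; (void)userp; (void)socketp;
      return 0;
    }

    /* Called by libcurl whenever its nearest timeout changes; -1 means
       "delete the timer". Registered with CURLMOPT_TIMERFUNCTION. */
    static int timer_cb(CURLM *multi, long timeout_ms, void *userp)
    {
      /* (re)arm a single timer in the event system here */
      (void)multi; (void)timeout_ms; (void)userp;
      return 0;
    }

    static void setup_callbacks(CURLM *multi)
    {
      curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, socket_cb);
      curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);
      /* when the event system reports activity on a socket (or the timer
         fires), call curl_multi_socket_action() with that socket (or
         CURL_SOCKET_TIMEOUT) */
    }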

 We created an internal "socket to easy handles" hash table that given
 a socket (file descriptor) returns the easy handle that waits for action on
 that socket. This hash is made using the already existing hash code
 (previously only used for the DNS cache).

 To make libcurl able to report plain sockets in the socket callback, we had
 to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
 the conversion from sockets to `fd_sets` for that function is only done in
 the last step before the data is returned. I also had to extend c-ares to
 get a function that can return plain sockets, as that library too returned
 only `fd_sets` and that is no longer good enough. The changes done to c-ares
 are available in c-ares 1.3.1 and later.

<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

<a name="Curl_easy"></a>
## Curl_easy

  The `Curl_easy` struct is the one returned to the outside in the external API
  as a `CURL *`. This is usually known as an easy handle in API documentation
  and examples.

  Information and state that is related to the actual connection is in the
  `connectdata` struct. When a transfer is about to be made, libcurl will
  either create a new connection or re-use an existing one. The particular
  connectdata that is used by this handle is pointed out by
  `Curl_easy->easy_conn`.

  Data and information that regard this particular single transfer are put in
  the `SingleRequest` sub-struct.

  When the `Curl_easy` struct is added to a multi handle, as it must be in
  order to do any transfer, the `->multi` member will point to the `Curl_multi`
  struct it belongs to. The `->prev` and `->next` members will then be used by
  the multi code to keep a linked list of `Curl_easy` structs that are added to
  that same multi handle. libcurl always uses multi so `->multi` *will* point
  to a `Curl_multi` when a transfer is in progress.

  `->mstate` is the multi state of this particular `Curl_easy`. When
  `multi_runsingle()` is called, it will act on this handle according to which
  state it is in. The mstate is also what tells which sockets to return for a
  specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.

  The libcurl source code generally uses the name `data` for the variable that
  points to the `Curl_easy`.

  When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
  an individual stream, sharing the same connectdata struct. Multiplexing
  makes it even more important to keep things associated with the right thing!

<a name="connectdata"></a>
## connectdata

  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used in case they will be used again, and to
  then re-use an existing one instead of creating a new one, as this gives a
  significant performance boost.

  Each `connectdata` identifies a single physical connection to a server. If
  the connection can't be kept alive, the connection will be closed after use
  and then this struct can be removed from the cache and freed.

  Thus, the same `Curl_easy` can be used multiple times and each time select
  another `connectdata` struct to use for the connection. Keep this in mind,
  as it is then important to consider if options or choices are based on the
  connection or the `Curl_easy`.

  Functions in libcurl will assume that `connectdata->data` points to the
  `Curl_easy` that uses this connection (for the moment).

  As a special complexity, some protocols supported by libcurl require a
  special disconnect procedure that is more than just shutting down the
  socket. It can involve sending one or more commands to the server before
  doing so. Since connections are kept in the connection cache after use, the
  original `Curl_easy` may no longer be around when the time comes to shut down
  a particular connection. For this purpose, libcurl holds a special dummy
  `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.

  FTP uses two TCP connections for a typical transfer but it keeps both in
  this single struct and thus can be considered a single connection for most
  internal concerns.

  The libcurl source code generally uses the name `conn` for the variable that
  points to the connectdata.


<a name="Curl_multi"></a>
## Curl_multi

  Internally, the easy interface is implemented as a wrapper around multi
  interface functions. This makes everything multi interface.

  `Curl_multi` is the multi handle struct exposed as `CURLM *` in external
  APIs.

  This struct holds a list of `Curl_easy` structs that have been added to this
  handle with [`curl_multi_add_handle()`][13]. The start of the list is
  `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.

  `->msglist` is a linked list of messages to send back when
  [`curl_multi_info_read()`][14] is called. Basically a node is added to that
  list when an individual `Curl_easy`'s transfer has completed.

  `->hostcache` points to the name cache. It is a hash table for looking up
  names to IP addresses. The nodes have a limited life time in there and this
  cache is meant to reduce the time for when the same name is wanted within a
  short period of time.

  `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
  until it should be checked - normally some sort of timeout. Each `Curl_easy`
  has one node in the tree.

  `->sockhash` is a hash table to allow fast lookups of which `Curl_easy` uses
  a given socket descriptor. This is necessary for the `multi_socket` API.

  `->conn_cache` points to the connection cache. It keeps track of all
  connections that are kept after use. The cache has a maximum size.

  `->closure_handle` is described in the `connectdata` section.

  The libcurl source code generally uses the name `multi` for the variable that
  points to the `Curl_multi` struct.

<a name="Curl_handler"></a>
## Curl_handler

  Each unique protocol that is supported by libcurl needs to provide at least
  one `Curl_handler` struct. It defines what the protocol is called and what
  functions the main code should call to deal with protocol specific issues.
  In general, there's a source file named `[protocol].c` in which there's a
  `struct Curl_handler Curl_handler_[protocol]` declared. In `url.c` there's
  then the main array with all individual `Curl_handler` structs pointed to
  from a single array which is scanned through when a URL is given to libcurl
  to work with. (A simplified, illustrative sketch follows at the end of this
  section.)

  `->scheme` is the URL scheme name, usually spelled out in uppercase. That's
  "HTTP" or "FTP" etc. SSL versions of the protocol need their own
  `Curl_handler` setup so HTTPS is separate from HTTP.

  `->setup_connection` is called to allow the protocol code to allocate
  protocol specific data that then gets associated with that `Curl_easy` for
  the rest of this transfer. It gets freed again at the end of the transfer.
  It will be called before the `connectdata` for the transfer has been
  selected/created. Most protocols will allocate their private
  `struct [PROTOCOL]` here and assign `Curl_easy->req.protop` to point to it.

  `->connect_it` allows a protocol to do some specific actions after the TCP
  connect is done, that can still be considered part of the connection phase.

  Some protocols will alter the `connectdata->recv[]` and
  `connectdata->send[]` function pointers in this function.

  `->connecting` is similarly a function that keeps getting called as long as
  the protocol considers itself still in the connecting phase.

  `->do_it` is the function called to issue the transfer request. What we call
  the DO action internally. If the DO is not enough and things need to be kept
  getting done for the entire DO sequence to complete, `->doing` is then
  usually also provided. Each protocol that needs to do multiple commands or
  similar for do/doing needs to implement its own state machine (see SCP,
  SFTP, FTP). Some protocols (only FTP and only due to historical reasons)
  have a separate piece of the DO state called `DO_MORE`.

  `->doing` keeps getting called while issuing the transfer request
  command(s).

  `->done` gets called when the transfer is complete and DONE. That's after the
  main data has been transferred.

  `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
  this state when setting up the second connection.

  `->proto_getsock`, `->doing_getsock`, `->domore_getsock` and
  `->perform_getsock` are functions that return socket information: which
  socket(s) to wait for which action(s) during the particular multi state.

  `->disconnect` is called immediately before the TCP connection is shut down.

  `->readwrite` gets called during transfer to allow the protocol to do extra
  reads/writes.

  `->defport` is the default TCP or UDP port this protocol uses.

  `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
  have their "base" protocol set and then the SSL variation. Like
  "HTTP|HTTPS".

  `->flags` is a bitmask with additional information about the protocol that
  will make it get treated differently by the generic engine:

  - `PROTOPT_SSL` - will make it connect and negotiate SSL

  - `PROTOPT_DUAL` - this protocol uses two connections

  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
    connection. This flag is no longer used by code, yet still set for a bunch
    of protocol handlers.

  - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
    limit which "direction" of socket actions that the main engine will
    concern itself with.

  - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read `file:`)

  - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
    one unless one is provided

  - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
    (?foo=bar)
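
  To illustrate how these pieces fit together, here is a deliberately
  simplified and hypothetical sketch of the idea. It is not the real
  `struct Curl_handler` from `lib/urldata.h`, which has more members and a
  fixed layout:

    /* Hypothetical, heavily simplified illustration of a per-protocol
       handler table entry: a scheme name, a few callbacks, a default port
       and some flag bits. Not copy-paste material. */
    typedef int result_t;     /* stand-in for CURLcode */
    struct easy;              /* stand-in for struct Curl_easy */

    struct handler_sketch {
      const char *scheme;                     /* "HTTP", "FTP", ... */
      result_t (*do_it)(struct easy *data);   /* issue the request (DO) */
      result_t (*done)(struct easy *data);    /* called when DONE */
      int defport;                            /* default port */
      unsigned int flags;                     /* PROTOPT_* style bits */
    };

    static result_t example_do(struct easy *data) { (void)data; return 0; }
    static result_t example_done(struct easy *data) { (void)data; return 0; }

    /* one such entry per protocol; url.c keeps an array of pointers to the
       real handlers and scans it when a URL is handed to libcurl */
    static const struct handler_sketch handler_example = {
      "EXAMPLE", example_do, example_done, 1234, 0
    };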

<a name="conncache"></a>
## conncache

  Is a hash table with connections for later re-use. Each `Curl_easy` has a
  pointer to its connection cache. Each multi handle sets up a connection
  cache that all added `Curl_easy`s share by default.

<a name="Curl_share"></a>
## Curl_share

  The libcurl share API allocates a `Curl_share` struct, exposed to the
  external API as `CURLSH *`.

  The idea is that the struct can have a set of its own versions of caches and
  pools and then by providing this struct in the `CURLOPT_SHARE` option, those
  specific `Curl_easy`s will use the caches/pools that this share handle
  holds.

  Then individual `Curl_easy` structs can be made to share specific things
  that they otherwise wouldn't, such as cookies.

  The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
  session cache.

<a name="CookieInfo"></a>
## CookieInfo

  This is the main cookie struct. It holds all known cookies and related
  information. Each `Curl_easy` has its own private `CookieInfo` even when
  they are added to a multi handle. They can be made to share cookies by using
  the share API.


[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2