# URL syntax and their use in curl

## Specifications

The official "URL syntax" is primarily defined in these two different
specifications:

 - [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986) (although URL is called
   "URI" in there)
 - [The WHATWG URL Specification](https://url.spec.whatwg.org/)

RFC 3986 is the earlier one, and curl has always tried to adhere to that one
(since it shipped in January 2005).

The WHATWG URL spec was written later, is incompatible with RFC 3986 and
changes over time.

## Variations

URL parsers as implemented in browsers, libraries and tools usually opt to
support one of the mentioned specifications. Bugs, differences in
interpretations and the moving nature of the WHATWG spec do, however, make it
unlikely that multiple parsers treat URLs the same way.

## Security

Due to the inherent differences between URL parser implementations, it is
considered a security risk to mix different implementations and assume the
same behavior!

For example, if you use one parser to check if a URL uses a good host name or
the correct auth field, and then pass on that same URL to a *second* parser,
there will always be a risk it treats the same URL differently. There is no
right and wrong in URL land, only differences of opinions.
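To make the risk concrete, here is a small Python sketch. The "first" parser is the standard library's `urllib.parse.urlsplit`. The "second" parser is a hypothetical wrapper that treats a backslash like a forward slash, which is what the WHATWG spec does for special schemes such as `http`. Neither is curl's parser. The only point is that the same string yields two different host names.

```python
from urllib.parse import urlsplit

URL = "http://evil.example\\@good.example/"

def host_parser_a(url: str) -> str:
    # urllib treats the backslash as an ordinary character, so the authority
    # is "evil.example\@good.example" and the host is what follows the "@".
    return urlsplit(url).hostname

def host_parser_b(url: str) -> str:
    # Hypothetical second parser: a backslash acts as a path separator,
    # so the authority ends right before it.
    return urlsplit(url.replace("\\", "/")).hostname

print(host_parser_a(URL))  # good.example
print(host_parser_b(URL))  # evil.example
```

A host check done with one of these and a request made with the other would disagree about which server is contacted.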

libcurl offers a separate API to its URL parser for this reason, among others.

Applications may at times find it convenient to allow users to specify URLs
for various purposes and that string would then end up fed to curl. Getting a
URL from an external untrusted party and using it with curl brings several
security concerns:

1. If you have an application that runs as or in a server application, getting
   an unfiltered URL can trick your application into accessing a local resource
   instead of a remote resource. Protecting yourself against localhost accesses
   is hard when accepting user-provided URLs.

2. Such custom URLs can access ports other than the ones you planned for, since
   port numbers are part of the regular URL format. The combination of a local
   host and a custom port number can allow external users to play tricks with
   your local services.

3. Such a URL might use schemes other than the ones you thought of or planned
   for.
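Below is a minimal Python sketch of an application-side pre-check that touches all three concerns. It comes with several caveats:

- It parses with Python's `urllib.parse`, not with curl. That is exactly the parser mixing warned about above.
- The allowed schemes and ports are hypothetical policy values.
- It cannot tell whether a host *name* resolves to a local address.

With libcurl, restricting schemes through `CURLOPT_PROTOCOLS_STR` and inspecting the URL with curl's own `curl_url()` API avoids the second-parser problem.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical policy for this sketch: adjust to your application.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_PORTS = {None, 80, 443}   # None means "no port given in the URL"

def looks_acceptable(url: str) -> bool:
    """Rough pre-check of an untrusted URL; passing it is no guarantee."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:      # concern 3: unexpected schemes
        return False
    try:
        port = parts.port                        # ValueError if invalid or out of range
    except ValueError:
        return False
    if port not in ALLOWED_PORTS:                # concern 2: unplanned ports
        return False
    host = parts.hostname                        # lowercased, IPv6 brackets removed
    if not host or host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A name, not a literal address: it may still resolve to a local
        # address, which this check cannot see (concern 1).
        return True
    return not (addr.is_loopback or addr.is_private or addr.is_link_local)
```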

## "RFC 3986 plus"

curl recognizes a URL syntax that we call "RFC 3986 plus". It is grounded on
the well established RFC 3986 to make sure previously written command lines and
scripts that use curl keep working.

curl's URL parser allows a few deviations from the spec in order to
interoperate better with URLs that appear in the wild.

### spaces

A URL provided to curl cannot contain spaces. Spaces need to be URL encoded
(as `%20`) to be accepted in a URL by curl.

An exception to this rule: `Location:` response headers, which tell a client
where a resource has been redirected to, sometimes contain spaces. This is a
violation of RFC 3986 but is fine in the WHATWG spec. curl handles these by
re-encoding them to `%20`.
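If you build URLs programmatically, percent-encode spaces before handing the string to curl. The Python sketch below uses the standard `urllib.parse.quote`, which keeps `/` unencoded by default. `reencode_spaces` is a hypothetical helper that only mimics the `Location:` treatment described above; it is not curl's code.

```python
from urllib.parse import quote

# Encode a path containing a space before building the URL for curl.
path = quote("/docs/my file.txt")   # "/" stays as-is, " " becomes "%20"
url = "https://example.com" + path

def reencode_spaces(location: str) -> str:
    """Hypothetical helper: turn literal spaces in a redirect target into %20."""
    return location.replace(" ", "%20")

print(url)                               # https://example.com/docs/my%20file.txt
print(reencode_spaces("/new place/x"))   # /new%20place/x
```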

### non-ASCII

Byte values in a provided URL that are outside of the printable ASCII range
are percent-encoded by curl.
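The rule can be sketched as a byte-by-byte pass. This sketch makes three assumptions, and curl's own output may differ in any of them:

- The input string is first encoded as UTF-8.
- "Printable ASCII" means 0x20 to 0x7E. Spaces are covered by the separate rule above.
- Hex digits are written in uppercase, as RFC 3986 recommends.

```python
def encode_non_ascii(url: str) -> str:
    """Percent-encode every byte outside printable ASCII (0x20-0x7E)."""
    out = []
    for byte in url.encode("utf-8"):
        if 0x20 <= byte <= 0x7E:
            out.append(chr(byte))        # printable: copy as-is
        else:
            out.append(f"%{byte:02X}")   # e.g. 0xC3 -> "%C3"
    return "".join(out)

print(encode_non_ascii("https://example.com/ö"))  # https://example.com/%C3%B6
```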

### multiple slashes

An absolute URL always starts with a "scheme" followed by a colon. For all the
schemes curl supports, the colon must be followed by two slashes according to
RFC 3986, but not according to the WHATWG spec, which allows any number of
slashes from one and up.

curl accepts one, two or three slashes after the colon and still considers
the URL valid.
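Below is a hypothetical check of the "one, two or three slashes" tolerance, written in Python. It uses the RFC 3986 scheme grammar, `ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`. It only counts slashes and does not validate anything else in the URL.

```python
import re

# RFC 3986 scheme grammar, then capture the run of slashes after the colon.
SCHEME_AND_SLASHES = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:(/*)")

def slash_count_ok(url: str) -> bool:
    """True if the scheme's colon is followed by one to three slashes."""
    m = SCHEME_AND_SLASHES.match(url)
    return m is not None and 1 <= len(m.group(1)) <= 3
```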

### "scheme-less"

curl supports "URLs" that do not start with a scheme. This is not supported by
any of the specifications. It is a shortcut for entering URLs that browsers
supported early on and that curl has mimicked.

Based on what the host name starts with, curl "guesses" which protocol to
use:

 - `ftp.` means FTP
 - `dict.` means DICT
 - `ldap.` means LDAP
 - `imap.` means IMAP
 - `smtp.` means SMTP
 - `pop3.` means POP3
 - anything else means HTTP
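The guessing rule amounts to a prefix table with an HTTP fallback. The Python sketch below uses a hypothetical `guess_scheme` helper. Matching the prefix case-insensitively is an assumption of this sketch.

```python
# Prefix table from the list above; the prefixes never overlap.
PREFIX_TO_SCHEME = {
    "ftp.": "ftp", "dict.": "dict", "ldap.": "ldap",
    "imap.": "imap", "smtp.": "smtp", "pop3.": "pop3",
}

def guess_scheme(host: str) -> str:
    """Hypothetical helper: guess the scheme for a scheme-less URL's host name."""
    lowered = host.lower()   # case-insensitive matching is an assumption here
    for prefix, scheme in PREFIX_TO_SCHEME.items():
        if lowered.startswith(prefix):
            return scheme
    return "http"            # anything else
```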

### globbing letters

The curl command line tool supports "globbing" of URLs. It means that you can
create ranges and lists using `[N-M]` and `{one,two,three}` sequences. The
letters used for this (`[]{}`) are reserved in RFC 3986 and can therefore not
legitimately be part of such a URL.

They are, however, not reserved or special in the WHATWG specification, so
globbing can mess up such URLs. Globbing can be turned off for such occasions
(using `--globoff`).
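A toy expander in Python shows two things: how a glob fans out into several URLs, and how a literal `{x}` in a URL gets consumed. This is a hypothetical sketch that handles only `{a,b}` lists and numeric `[N-M]` ranges. It is not curl's globbing code, which supports more, such as letter ranges and step values.

```python
import itertools
import re

# Only {a,b,...} lists and numeric [N-M] ranges; curl's real globbing does more.
GLOB = re.compile(r"\{([^{}]*)\}|\[(\d+)-(\d+)\]")

def expand_glob(pattern: str) -> list[str]:
    """Hypothetical expander: return every URL the pattern describes."""
    pieces = []   # one list of alternatives per position in the pattern
    pos = 0
    for m in GLOB.finditer(pattern):
        pieces.append([pattern[pos:m.start()]])                    # literal text before the glob
        if m.group(1) is not None:
            pieces.append(m.group(1).split(","))                   # {one,two,three}
        else:
            low, high = int(m.group(2)), int(m.group(3))
            pieces.append([str(n) for n in range(low, high + 1)])  # [N-M]
        pos = m.end()
    pieces.append([pattern[pos:]])                                 # trailing literal text
    return ["".join(combo) for combo in itertools.product(*pieces)]
```

With this sketch, `expand_glob("http://example.com/{x}")` returns `["http://example.com/x"]`. The braces that a WHATWG-style URL may legitimately contain are gone, which is the situation `--globoff` is meant for.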

# URL syntax details

A URL may consist of the following components - many of them are optional:

    [scheme][divider][userinfo][hostname][port number][path][query][fragment]

Each component is separated from the following component with a divider
character or string.

For example, this could look like:

    http://user:password@www.example.com:80/index.html?foo=bar#top
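One way to see the components of the example URL is Python's `urllib.parse.urlsplit`. This is a different parser from curl's (see the security notes above), but for a simple URL like this one the split is expected to match. libcurl users would instead ask curl's own URL API, using `curl_url_get()` with the `CURLUPART_*` part names.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://user:password@www.example.com:80/index.html?foo=bar#top")
print(parts.scheme)    # http
print(parts.username)  # user
print(parts.password)  # password
print(parts.hostname)  # www.example.com
print(parts.port)      # 80
print(parts.path)      # /index.html
print(parts.query)     # foo=bar
print(parts.fragment)  # top
```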

## Scheme

The scheme specifies the protocol to use. A curl build can support a few or
many different schemes. You can limit what schemes curl should accept (for
example with the `--proto` command line option or libcurl's
`CURLOPT_PROTOCOLS_STR`).

curl supports the following schemes on URLs specified to transfer. They are
matched case insensitively:

`dict`, `file`, `ftp`, `ftps`, `gopher`, `gophers`, `http`, `https`, `imap`,
`imaps`, `ldap`, `ldaps`, `mqtt`, `pop3`, `pop3s`, `rtmp`, `rtmpe`, `rtmps`,
`rtmpt`, `rtmpte`, `rtmpts`, `rtsp`, `scp`, `sftp`, `smb`, `smbs`, `smtp`,
`smtps`, `telnet`, `tftp`

When the URL is specified to identify a proxy, curl recognizes the following
schemes:

`http`, `https`, `socks4`, `socks4a`, `socks5`, `socks5h`, `socks`
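Below is a Python sketch of case-insensitive scheme matching against the two lists above. The sets are a hypothetical mirror of this document. An actual curl build may support fewer protocols; the `Protocols:` line of `curl --version` shows what a given build offers.

```python
from urllib.parse import urlsplit

TRANSFER_SCHEMES = {
    "dict", "file", "ftp", "ftps", "gopher", "gophers", "http", "https",
    "imap", "imaps", "ldap", "ldaps", "mqtt", "pop3", "pop3s", "rtmp",
    "rtmpe", "rtmps", "rtmpt", "rtmpte", "rtmpts", "rtsp", "scp", "sftp",
    "smb", "smbs", "smtp", "smtps", "telnet", "tftp",
}
PROXY_SCHEMES = {"http", "https", "socks4", "socks4a", "socks5", "socks5h", "socks"}

def scheme_accepted(url: str, allowed: set[str]) -> bool:
    # urlsplit lowercases the scheme, which gives case-insensitive matching.
    return urlsplit(url).scheme in allowed
```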

## Userinfo

The userinfo field can be used to set user name and password for
authentication purposes in this transfer. The use of this field is discouraged
since it often means passing around the password in plain text and is thus a
security risk.

URLs for IMAP, POP3 and SMTP also support *login options* as part of the
userinfo field. They are provided after the password, separated from it by a
semicolon.
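A hypothetical Python helper that splits the `user:password;options` layout described above is shown below. Next to it is Python's `urlsplit`, which knows nothing about login options and folds them into the password: another instance of two parsers disagreeing. `AUTH=PLAIN` serves only as an illustrative option.

```python
from urllib.parse import urlsplit

URL = "imap://user:secret;AUTH=PLAIN@mail.example.com/"

def split_userinfo(userinfo: str) -> tuple[str, str, str]:
    """Hypothetical split of 'user:password;options' into its three parts."""
    credentials, _, options = userinfo.partition(";")
    user, _, password = credentials.partition(":")
    return user, password, options

userinfo = urlsplit(URL).netloc.rpartition("@")[0]
print(split_userinfo(userinfo))  # ('user', 'secret', 'AUTH=PLAIN')
print(urlsplit(URL).password)    # secret;AUTH=PLAIN
```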

## Hostname

The hostname part of the URL contains the address of the server that you want
to connect to. This can be the fully qualified domain name of the server, the
local network name of the machine on your network or the IP address of the
server or machine represented by either an IPv4 or IPv6 address (the latter
within brackets). For example:

    http://www.example.com/

    http://hostname/

    http://192.168.0.1/

    http://[2001:1890:1112:1::20]/
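The brackets keep the colons inside an IPv6 address apart from the colon that introduces the port number. Here is a quick check with Python's `urlsplit` (not curl's parser), using a hypothetical port 8080:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://[2001:1890:1112:1::20]:8080/")
print(parts.hostname)  # 2001:1890:1112:1::20  (brackets stripped)
print(parts.port)      # 8080
```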

### "localhost"

Starting in curl 7.77.0, curl uses loopback IP addresses for the name
`localhost`: `127.0.0.1` and `::1`. It does not resolve the name using the
resolver functions.

This is done to make sure the host accessed is truly the localhost - the local
machine.
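The Python sketch below is a hypothetical resolver front-end that illustrates the idea. `localhost` short-circuits to fixed loopback addresses and never reaches the system resolver. The case-insensitive comparison is an assumption of this sketch, not a documented curl detail.

```python
import socket

def resolve(host: str) -> list[str]:
    """Hypothetical resolver front-end mimicking the localhost rule above."""
    if host.lower() == "localhost":          # case-insensitive match is an assumption
        return ["127.0.0.1", "::1"]          # fixed loopback answers, no lookup
    infos = socket.getaddrinfo(host, None)   # everything else goes to the resolver
    return sorted({info[4][0] for info in infos})
```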

### IDNA

If curl was built with Internationalized Domain Name (IDN) support, it can
also handle host names using non-ASCII characters.

When built with libidn2, curl uses the IDNA 2008 standard. This is equivalent
to the WHATWG URL spec, but differs from certain browsers that use IDNA 2003
Transitional Processing. The two standards have a huge overlap but differ
slightly, perhaps most famously in how they deal with the German "double s"
(`ß`).
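The `ß` difference is easy to reproduce in Python. Its built-in `idna` codec implements IDNA 2003, which maps `ß` to `ss`. Applying Punycode directly to the label keeps `ß`, which is the result IDNA 2008 gives for this particular name. This is a simplified sketch: a real IDNA 2008 implementation such as libidn2 also validates and normalizes labels.

```python
# IDNA 2003 (Python's built-in "idna" codec): ß is mapped to "ss".
print("straße.de".encode("idna"))   # b'strasse.de'

# IDNA 2008 keeps ß; the label becomes "xn--" plus its Punycode form.
label_2008 = "xn--" + "straße".encode("punycode").decode("ascii")
print(label_2008 + ".de")           # xn--strae-oqa.de
```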

When winidn is used, curl uses IDNA 2003 Transitional Processing, like the rest
of Windows.

## Port number

If there is a colon after the hostname, it should be followed by the port
number to use, a number from 1 to 65535. curl also supports a blank port
number field, but only if the URL starts with a scheme.

If the port number is not specified in the URL, curl uses a default port
based on the provided scheme:

DICT 2628, FTP 21, FTPS 990, GOPHER 70, GOPHERS 70, HTTP 80, HTTPS 443,
IMAP 143, IMAPS 993, LDAP 389, LDAPS 636, MQTT 1883, POP3 110, POP3S 995,
RTMP 1935, RTMPS 443, RTMPT 80, RTSP 554, SCP 22, SFTP 22, SMB 445, SMBS 445,
SMTP 25, SMTPS 465, TELNET 23, TFTP 69
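Below is a hypothetical Python lookup of the effective port using the table above. It parses with Python's `urlsplit`, which reports a missing or blank port (`host:`) as `None`. In both cases the sketch falls back to the scheme default.

```python
from urllib.parse import urlsplit

# Default ports as listed above (keys are lowercase schemes).
DEFAULT_PORTS = {
    "dict": 2628, "ftp": 21, "ftps": 990, "gopher": 70, "gophers": 70,
    "http": 80, "https": 443, "imap": 143, "imaps": 993, "ldap": 389,
    "ldaps": 636, "mqtt": 1883, "pop3": 110, "pop3s": 995, "rtmp": 1935,
    "rtmps": 443, "rtmpt": 80, "rtsp": 554, "scp": 22, "sftp": 22,
    "smb": 445, "smbs": 445, "smtp": 25, "smtps": 465, "telnet": 23,
    "tftp": 69,
}

def effective_port(url: str) -> int:
    """Hypothetical helper: explicit port if present, else the scheme default."""
    parts = urlsplit(url)
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
```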

# Scheme specific behaviors

## FTP

The path part of an FTP request specifies the file to retrieve and from which
directory. If the file part is omitted then libcurl downloads the directory
listing for the directory specified. If the directory is omitted then the
directory listing for the root / home directory will be returned.

FTP servers typically put the user in their "home directory" after login,
which then differs between users. To explicitly specify the root directory of
an FTP server, start the path with double slash `//` or `/%2f` (2F is the
hexadecimal value of the ASCII code for the slash).
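Below is a hypothetical Python helper that builds an FTP URL either relative to the login directory or anchored at the server root with the `/%2f` prefix described above.

```python
def ftp_url(host: str, path: str, from_root: bool = False) -> str:
    """Hypothetical helper: build an FTP URL, optionally anchored at the server root."""
    path = path.lstrip("/")
    prefix = "/%2f" if from_root else "/"   # "/%2f" marks a path from the root
    return f"ftp://{host}{prefix}{path}"
```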

## FILE

When a `FILE://` URL is accessed on Windows systems, it can be crafted in a
way so that Windows attempts to connect to a (remote) machine when curl wants
to read or write such a path.

curl only allows the hostname part of a FILE URL to be one out of these three
alternatives: `localhost`, `127.0.0.1` or blank ("", zero characters).
Anything else will make curl fail to parse the URL.
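Below is a hypothetical Python check mirroring the three allowed host forms. It uses Python's `urlsplit` and an exact, case-sensitive comparison; the case sensitivity is an assumption of this sketch.

```python
from urllib.parse import urlsplit

ALLOWED_FILE_HOSTS = {"localhost", "127.0.0.1", ""}

def file_host_ok(url: str) -> bool:
    """Hypothetical check: is the FILE URL's host one of the three allowed forms?"""
    return urlsplit(url).netloc in ALLOWED_FILE_HOSTS
```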

### Windows-specific FILE details

curl accepts that the FILE URL's path starts with a "drive letter". That is a
single letter `a` to `z` followed by a colon or a pipe character (`|`).
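The drive-letter pattern can be sketched as a regular expression over the URL path. Two assumptions apply: the optional leading `/` (as in `file:///c:/...`) is allowed, and uppercase letters are accepted as well. This is a hypothetical check, not curl's code.

```python
import re

# Optional leading "/", then one letter, then ":" or "|".
DRIVE_LETTER = re.compile(r"/?[A-Za-z][:|]")

def has_drive_letter(path: str) -> bool:
    """Hypothetical check: does a FILE URL path start with a drive letter?"""
    return DRIVE_LETTER.match(path) is not None
```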

The Windows operating system itself will convert some file accesses to perform
network accesses over SMB/CIFS, through several different file path patterns.
This way, a `file://` URL passed to curl *might* be converted into a network
access inadvertently and unknowingly to curl. This is a Windows feature curl
cannot control or disable.

## IMAP

The path part of an IMAP request not only specifies the mailbox to list or
select, but can also be used to check the `UIDVALIDITY` of the mailbox, to
specify the `UID`, `SECTION` and `PARTIAL` octets of the message to fetch and
to specify what messages to search for.

A top level folder list:

    imap://user:password@mail.example.com

A folder list on the user's inbox:

    imap://user:password@mail.example.com/INBOX

Select the user's inbox and fetch the message with `UID` 1:

    imap://user:password@mail.example.com/INBOX/;UID=1

Select the user's inbox and fetch the first message in the mailbox:

    imap://user:password@mail.example.com/INBOX/;MAILINDEX=1

Select the user's inbox, check that the `UIDVALIDITY` of the mailbox is 50 and
fetch message 2 if it is:

    imap://user:password@mail.example.com/INBOX;UIDVALIDITY=50/;UID=2

Select the user's inbox and fetch the text portion of message 3:

    imap://user:password@mail.example.com/INBOX/;UID=3/;SECTION=TEXT

Select the user's inbox and fetch the first 1024 octets of message 4:

    imap://user:password@mail.example.com/INBOX/;UID=4/;PARTIAL=0.1024

Select the user's inbox and check for NEW messages:

    imap://user:password@mail.example.com/INBOX?NEW

Select the user's inbox and search for messages containing "shadows" in the
subject line:

    imap://user:password@mail.example.com/INBOX?SUBJECT%20shadows

A search request made via the query part of the URL (after `?`) returns its
results as message sequence numbers (`MAILINDEX`). To get the results as
unique ID numbers (`UID`) instead, use a custom curl request via `-X`. `UID`
numbers are unique per session (and across multiple sessions when
`UIDVALIDITY` is the same). For example, if you are searching for `"foo bar"`
in header+body (`TEXT`) and you want the matching `MAILINDEX` numbers
returned, then you could search via URL:

    imap://user:password@mail.example.com/INBOX?TEXT%20%22foo%20bar%22

If you want matching `UID` numbers you have to use a custom request:

    imap://user:password@mail.example.com/INBOX -X "UID SEARCH TEXT \"foo bar\""

For more information about IMAP commands please see RFC 9051. For more
information about the individual components of an IMAP URL please see RFC 5092.

* Note: old curl versions would `FETCH` by message sequence number when `UID`
was specified in the URL. That was a bug fixed in 7.62.0, which added
`MAILINDEX` to `FETCH` by mail sequence number.
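Search criteria must be percent-encoded before they go into the query part. Below is a hypothetical Python helper built on `urllib.parse.quote`, which turns spaces into `%20` and double quotes into `%22`. It does not implement other IMAP URL encoding rules from RFC 5092, such as those for non-ASCII mailbox names.

```python
from urllib.parse import quote

def imap_search_url(base: str, mailbox: str, criteria: str) -> str:
    """Hypothetical helper: put an IMAP SEARCH criteria string in the URL query."""
    # quote() encodes " " as %20 and '"' as %22.
    return f"{base}/{quote(mailbox)}?{quote(criteria)}"

BASE = "imap://user:password@mail.example.com"
print(imap_search_url(BASE, "INBOX", 'TEXT "foo bar"'))
# imap://user:password@mail.example.com/INBOX?TEXT%20%22foo%20bar%22
```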

## LDAP

The path part of an LDAP request can be used to specify the Distinguished
Name, Attributes, Scope, Filter and Extension for an LDAP search. Each field
is separated by a question mark, and when a field is not required, an empty
string with the question mark separator should be included.

Search for the `DN` as `My Organization`:

    ldap://ldap.example.com/o=My%20Organization

The same search, but only returning `postalAddress` attributes:

    ldap://ldap.example.com/o=My%20Organization?postalAddress

Search for an empty `DN` and request information about the
`rootDomainNamingContext` attribute for an Active Directory server:

    ldap://ldap.example.com/?rootDomainNamingContext

For more information about the individual components of an LDAP URL please
see [RFC 4516](https://datatracker.ietf.org/doc/html/rfc4516).
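The field layout, `dn?attributes?scope?filter`, can be sketched as a small Python builder. It reproduces the three example URLs above. The helper is hypothetical and makes two simplifying choices:

- It omits the extensions field.
- It drops trailing empty fields. Empty fields in the middle keep their `?` separator.

```python
from urllib.parse import quote

def ldap_url(host: str, dn: str = "", attributes=(), scope: str = "", filt: str = "") -> str:
    """Hypothetical builder for ldap://host/dn?attributes?scope?filter."""
    fields = [",".join(attributes), scope, quote(filt, safe="=()&|!*")]
    # Trailing empty fields can be dropped; empty middle fields keep their "?".
    while fields and not fields[-1]:
        fields.pop()
    base = f"ldap://{host}/{quote(dn, safe='=,')}"
    return "?".join([base] + fields)
```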

## POP3

The path part of a POP3 request specifies the message ID to retrieve. If the
ID is not specified then a list of waiting messages is returned instead.

## SCP

The path part of an SCP URL specifies the path and file to retrieve or
upload. The file is taken as an absolute path from the root directory on the
server.

To specify a path relative to the user's home directory on the server, prepend
`~/` to the path portion.

## SFTP

The path part of an SFTP URL specifies the file to retrieve or upload. If the
path ends with a slash (`/`) then a directory listing is returned instead of a
file. If the path is omitted entirely then the directory listing for the root
/ home directory will be returned.

## SMB

The path part of an SMB request specifies the file to retrieve and from what
share and directory, or the share to upload to, and as such may not be
omitted. If the user name is embedded in the URL then it must contain the
domain name and as such, the backslash must be URL encoded as `%5c`.

When uploading to SMB, the size of the file needs to be known ahead of time,
meaning that you cannot upload a file passed to curl over a pipe like stdin.

curl supports SMB version 1 only.
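A hypothetical Python helper can embed a `DOMAIN\user` login in an SMB URL. Passing `safe=""` to `quote()` makes it encode the backslash too; `quote()` emits uppercase `%5C`, and percent-encoding hex digits are case-insensitive per RFC 3986.

```python
from urllib.parse import quote

def smb_url(domain: str, user: str, host: str, share_path: str) -> str:
    """Hypothetical helper: embed DOMAIN\\user in an SMB URL."""
    # safe="" makes quote() encode the backslash (as %5C) along with "/" etc.
    login = quote(f"{domain}\\{user}", safe="")
    return f"smb://{login}@{host}/{share_path}"
```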

## SMTP

The path part of an SMTP request specifies the host name to present during
communication with the mail server. If the path is omitted, then libcurl will
attempt to resolve the local computer's host name. However, this may not
return the fully qualified domain name that is required by some mail servers
and specifying this path allows you to set an alternative name, such as your
machine's fully qualified domain name, which you might have obtained from an
external function such as gethostname or getaddrinfo.

The default SMTP port is 25. Some servers use port 587 as an alternative.
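Below is a hypothetical Python helper that puts the name to present into the URL path. When no name is given, it falls back to `socket.getfqdn()`. That fallback is this sketch's best-effort choice; it is not necessarily what libcurl does internally.

```python
import socket
from typing import Optional

def smtp_url(server: str, local_name: Optional[str] = None, port: int = 25) -> str:
    """Hypothetical helper: put the host name to present in the SMTP URL path."""
    if local_name is None:
        local_name = socket.getfqdn()   # best-effort FQDN of this machine
    return f"smtp://{server}:{port}/{local_name}"
```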

## RTMP

There is no official URL spec for RTMP so libcurl uses the URL syntax supported
by the underlying librtmp library. It has a syntax where it wants a
traditional URL, followed by a space and a series of space-separated
`name=value` pairs.

While space is not typically a "legal" character, libcurl accepts them. When a
user wants to pass in a `#` (hash) character it will be treated as a fragment
and get cut off by libcurl if provided literally. You will instead have to
escape it by providing it as backslash and its ASCII value in hexadecimal:
`\23`.
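A hypothetical Python helper can assemble the librtmp-style string: the URL, then space-separated `name=value` pairs, with `#` written as `\23`. The option names `playpath` and `live` are used for illustration only.

```python
def escape_hash(value: str) -> str:
    # Backslash followed by the hex ASCII code of '#'.
    return value.replace("#", "\\23")

def rtmp_url(base: str, options: dict[str, str]) -> str:
    """Hypothetical helper: URL followed by space-separated name=value pairs."""
    pairs = [f"{name}={escape_hash(value)}" for name, value in options.items()]
    return " ".join([base] + pairs)
```

For example, `rtmp_url("rtmp://media.example.com/app", {"playpath": "clip#1", "live": "1"})` yields `rtmp://media.example.com/app playpath=clip\231 live=1`.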