# URL syntax and their use in curl

## Specifications

The official "URL syntax" is primarily defined in these two different
specifications:

 - [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986) (although URL is called
   "URI" in there)
 - [The WHATWG URL Specification](https://url.spec.whatwg.org/)

RFC 3986 is the earlier one, and curl has always tried to adhere to that one
(since it shipped in January 2005).

The WHATWG URL spec was written later, is incompatible with RFC 3986 and
changes over time.

## Variations

URL parsers as implemented in browsers, libraries and tools usually opt to
support one of the mentioned specifications. Bugs, differences in
interpretation and the moving nature of the WHATWG spec do however make it
unlikely that multiple parsers treat URLs the same way.

## Security

Due to the inherent differences between URL parser implementations, it is
considered a security risk to mix different implementations and assume the
same behavior!

For example, if you use one parser to check if a URL uses a good host name or
the correct auth field, and then pass on that same URL to a *second* parser,
there will always be a risk it treats the same URL differently. There is no
right and wrong in URL land, only differences of opinions.

libcurl offers a separate API to its URL parser for this reason, among others.

Applications may at times find it convenient to allow users to specify URLs
for various purposes and that string would then end up fed to curl. Getting a
URL from an external untrusted party and using it with curl brings several
security concerns:

1. If you have an application that runs as or in a server application, getting
   an unfiltered URL can trick your application to access a local resource
   instead of a remote resource. Protecting yourself against localhost accesses
   is hard when accepting user provided URLs.

2. Such custom URLs can access other ports than you planned as port numbers
   are part of the regular URL format. The combination of a local host and a
   custom port number can allow external users to play tricks with your local
   services.

3. Such a URL might use other schemes than you thought of or planned for.

## "RFC 3986 plus"

curl recognizes a URL syntax that we call "RFC 3986 plus". It is based on the
well established RFC 3986 to make sure previously written command lines and
scripts that use curl remain working.

curl's URL parser allows a few deviations from the spec in order to
inter-operate better with URLs that appear in the wild.

### spaces

A URL provided to curl cannot contain spaces. They need to be provided URL
encoded to be accepted in a URL by curl.

An exception to this rule: `Location:` response headers that indicate to a
client where a resource has been redirected to, sometimes contain spaces. This
is a violation of RFC 3986 but is fine in the WHATWG spec. curl handles these
by re-encoding them to `%20`.

### non-ASCII

Byte values in a provided URL that are outside of the printable ASCII range
are percent-encoded by curl.

### multiple slashes

An absolute URL always starts with a "scheme" followed by a colon. For all the
schemes curl supports, the colon must be followed by two slashes according to
RFC 3986 but not according to the WHATWG spec, which allows any number of
slashes from one upward.

curl allows one, two or three slashes after the colon to still be considered a
valid URL.

### "scheme-less"

curl supports "URLs" that do not start with a scheme. This is not supported by
any of the specifications. This is a shortcut to entering URLs that was
supported by browsers early on and has been mimicked by curl.
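
The prefix-based guessing that this section describes can be illustrated with
a small Python sketch. The helper name `guess_scheme` and the case-insensitive
comparison are assumptions made for illustration only; this is not curl's
actual implementation.

```python
# Hypothetical helper: the prefix table mirrors the prefixes listed in this
# section of the document.
GUESS_PREFIXES = (
    ("ftp.", "ftp"),
    ("dict.", "dict"),
    ("ldap.", "ldap"),
    ("imap.", "imap"),
    ("smtp.", "smtp"),
    ("pop3.", "pop3"),
)


def guess_scheme(host: str) -> str:
    """Guess a scheme from how a scheme-less host name starts."""
    lowered = host.lower()  # assumption: the comparison ignores case
    for prefix, scheme in GUESS_PREFIXES:
        if lowered.startswith(prefix):
            return scheme
    return "http"  # any other host name is treated as HTTP


print(guess_scheme("ftp.example.com"))
print(guess_scheme("www.example.com"))
```

Note that only a full prefix including the dot matches in this sketch, so a
name such as `ftpserver.example.com` falls through to HTTP.
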

Based on what the host name starts with, curl will "guess" what protocol to
use:

 - `ftp.` means FTP
 - `dict.` means DICT
 - `ldap.` means LDAP
 - `imap.` means IMAP
 - `smtp.` means SMTP
 - `pop3.` means POP3
 - all other means HTTP

### globbing letters

The curl command line tool supports "globbing" of URLs. It means that you can
create ranges and lists using `[N-M]` and `{one,two,three}` sequences. The
letters used for this (`[]{}`) are reserved in RFC 3986 and can therefore not
legitimately be part of such a URL.

They are however not reserved or special in the WHATWG specification, so
globbing can mess up such URLs. Globbing can be turned off for such occasions
(using `--globoff`).

# URL syntax details

A URL may consist of the following components - many of them are optional:

    [scheme][divider][userinfo][hostname][port number][path][query][fragment]

Each component is separated from the following component with a divider
character or string.

For example, this could look like:

    http://user:password@www.example.com:80/index.html?foo=bar#top

## Scheme

The scheme specifies the protocol to use. A curl build can support a few or
many different schemes. You can limit what schemes curl should accept.

curl supports the following schemes on URLs specified to transfer.
They are matched case insensitively:

`dict`, `file`, `ftp`, `ftps`, `gopher`, `gophers`, `http`, `https`, `imap`,
`imaps`, `ldap`, `ldaps`, `mqtt`, `pop3`, `pop3s`, `rtmp`, `rtmpe`, `rtmps`,
`rtmpt`, `rtmpte`, `rtmpts`, `rtsp`, `smb`, `smbs`, `smtp`, `smtps`, `telnet`,
`tftp`

When the URL is specified to identify a proxy, curl recognizes the following
schemes:

`http`, `https`, `socks4`, `socks4a`, `socks5`, `socks5h`, `socks`

## Userinfo

The userinfo field can be used to set user name and password for
authentication purposes in this transfer. The use of this field is discouraged
since it often means passing around the password in plain text and is thus a
security risk.

URLs for IMAP, POP3 and SMTP also support *login options* as part of the
userinfo field. They are provided after the password, separated from it by a
semicolon.

## Hostname

The hostname part of the URL contains the address of the server that you want
to connect to. This can be the fully qualified domain name of the server, the
local network name of the machine on your network or the IP address of the
server or machine represented by either an IPv4 or IPv6 address (within
brackets). For example:

    http://www.example.com/

    http://hostname/

    http://192.168.0.1/

    http://[2001:1890:1112:1::20]/

### "localhost"

Starting in curl 7.77.0, curl uses loopback IP addresses for the name
`localhost`: `127.0.0.1` and `::1`. It does not resolve the name using the
resolver functions.

This is done to make sure the host accessed is truly the localhost - the local
machine.

### IDNA

If curl was built with International Domain Name (IDN) support, it can also
handle host names using non-ASCII characters.

When built with libidn2, curl uses the IDNA 2008 standard.
This is equivalent
to the WHATWG URL spec, but differs from certain browsers that use IDNA 2003
Transitional Processing. The two standards have a huge overlap but differ
slightly, perhaps most famously in how they deal with the German "double s"
(`ß`).

When winidn is used, curl uses IDNA 2003 Transitional Processing, like the rest
of Windows.

## Port number

If there is a colon after the hostname, that should be followed by the port
number to use, in the range 1 - 65535. curl also supports a blank port number
field - but only if the URL starts with a scheme.

If the port number is not specified in the URL, curl uses a default port
based on the provided scheme:

DICT 2628, FTP 21, FTPS 990, GOPHER 70, GOPHERS 70, HTTP 80, HTTPS 443,
IMAP 143, IMAPS 993, LDAP 389, LDAPS 636, MQTT 1883, POP3 110, POP3S 995,
RTMP 1935, RTMPS 443, RTMPT 80, RTSP 554, SCP 22, SFTP 22, SMB 445, SMBS 445,
SMTP 25, SMTPS 465, TELNET 23, TFTP 69

# Scheme specific behaviors

## FTP

The path part of an FTP request specifies the file to retrieve and from which
directory. If the file part is omitted then libcurl downloads the directory
listing for the directory specified. If the directory is omitted then the
directory listing for the root / home directory will be returned.

FTP servers typically put the user in their "home directory" after login,
which then differs between users. To explicitly specify the root directory of
an FTP server, start the path with double slash `//` or `/%2f` (2F is the
hexadecimal value of the ASCII code for the slash).

## FILE

When a `FILE://` URL is accessed on Windows systems, it can be crafted in a
way so that Windows attempts to connect to a (remote) machine when curl wants
to read or write such a path.

curl only allows the hostname part of a FILE URL to be one out of these three
alternatives: `localhost`, `127.0.0.1` or blank ("", zero characters).
Anything else will make curl fail to parse the URL.

### Windows-specific FILE details

curl accepts that the FILE URL's path starts with a "drive letter". That is a
single letter `a` to `z` followed by a colon or a pipe character (`|`).

The Windows operating system itself will convert some file accesses to perform
network accesses over SMB/CIFS, through several different file path patterns.
This way, a `file://` URL passed to curl *might* be converted into a network
access inadvertently and unknowingly to curl. This is a Windows feature curl
cannot control or disable.

## IMAP

The path part of an IMAP request not only specifies the mailbox to list or
select, but can also be used to check the `UIDVALIDITY` of the mailbox, to
specify the `UID`, `SECTION` and `PARTIAL` octets of the message to fetch and
to specify what messages to search for.
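
URLs of the kinds this section describes can also be assembled
programmatically. The Python sketch below uses two hypothetical helpers,
`imap_search_url` and `imap_fetch_url`, and assumes that percent-encoding
every reserved character in the mailbox name and the search criteria is
acceptable. Credentials are left out of the URL on purpose.

```python
from urllib.parse import quote


def imap_search_url(host: str, mailbox: str, criteria: str) -> str:
    """Build an IMAP URL whose query part carries a search request."""
    # safe="" percent-encodes everything except unreserved characters, so a
    # space becomes %20 and a double quote becomes %22.
    return f"imap://{host}/{quote(mailbox, safe='')}?{quote(criteria, safe='')}"


def imap_fetch_url(host: str, mailbox: str, uid: int) -> str:
    """Build an IMAP URL that selects a mailbox and fetches one UID."""
    return f"imap://{host}/{quote(mailbox, safe='')}/;UID={uid}"


print(imap_search_url("mail.example.com", "INBOX", 'TEXT "foo bar"'))
# imap://mail.example.com/INBOX?TEXT%20%22foo%20bar%22
print(imap_fetch_url("mail.example.com", "INBOX", 1))
# imap://mail.example.com/INBOX/;UID=1
```

Building the string this way keeps the encoding step in one place. Handing
the result to curl is still subject to the parser differences discussed in
the Security section.
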

A top level folder list:

    imap://user:password@mail.example.com

A folder list on the user's inbox:

    imap://user:password@mail.example.com/INBOX

Select the user's inbox and fetch message with `uid = 1`:

    imap://user:password@mail.example.com/INBOX/;UID=1

Select the user's inbox and fetch the first message in the mailbox:

    imap://user:password@mail.example.com/INBOX/;MAILINDEX=1

Select the user's inbox, check that the `UIDVALIDITY` of the mailbox is 50 and
fetch message 2 if it is:

    imap://user:password@mail.example.com/INBOX;UIDVALIDITY=50/;UID=2

Select the user's inbox and fetch the text portion of message 3:

    imap://user:password@mail.example.com/INBOX/;UID=3/;SECTION=TEXT

Select the user's inbox and fetch the first 1024 octets of message 4:

    imap://user:password@mail.example.com/INBOX/;UID=4/;PARTIAL=0.1024

Select the user's inbox and check for NEW messages:

    imap://user:password@mail.example.com/INBOX?NEW

Select the user's inbox and search for messages containing "shadows" in the
subject line:

    imap://user:password@mail.example.com/INBOX?SUBJECT%20shadows

Searching via the query part of the URL `?` is a search request for the
results to be returned as message sequence numbers (`MAILINDEX`). It is
possible to make a search request for results to be returned as unique ID
numbers (`UID`) by using a custom curl request via `-X`. `UID` numbers are
unique per session (and multiple sessions when `UIDVALIDITY` is the same).
For
example, if you are searching for `"foo bar"` in header+body (`TEXT`) and you
want the matching `MAILINDEX` numbers returned then you could search via URL:

    imap://user:password@mail.example.com/INBOX?TEXT%20%22foo%20bar%22

If you want matching `UID` numbers you have to use a custom request:

    imap://user:password@mail.example.com/INBOX -X "UID SEARCH TEXT \"foo bar\""

For more information about IMAP commands please see RFC 9051. For more
information about the individual components of an IMAP URL please see RFC 5092.

Note: old curl versions would `FETCH` by message sequence number when `UID`
was specified in the URL. That was a bug fixed in 7.62.0, which added
`MAILINDEX` to `FETCH` by mail sequence number.

## LDAP

The path part of an LDAP request can be used to specify the Distinguished
Name, Attributes, Scope, Filter and Extension for an LDAP search. Each field
is separated by a question mark and when that field is not required an empty
string with the question mark separator should be included.

Search for the `DN` as `My Organization`:

    ldap://ldap.example.com/o=My%20Organization

The same search, but returning only `postalAddress` attributes:

    ldap://ldap.example.com/o=My%20Organization?postalAddress

Search for an empty `DN` and request information about the
`rootDomainNamingContext` attribute for an Active Directory server:

    ldap://ldap.example.com/?rootDomainNamingContext

For more information about the individual components of an LDAP URL please
see [RFC 4516](https://datatracker.ietf.org/doc/html/rfc4516).

## POP3

The path part of a POP3 request specifies the message ID to retrieve. If the
ID is not specified then a list of waiting messages is returned instead.

## SCP

The path part of an SCP URL specifies the path and file to retrieve or
upload.
The file is taken as an absolute path from the root directory on the
server.

To specify a path relative to the user's home directory on the server, prepend
`~/` to the path portion.

## SFTP

The path part of an SFTP URL specifies the file to retrieve or upload. If the
path ends with a slash (`/`) then a directory listing is returned instead of a
file. If the path is omitted entirely then the directory listing for the root
/ home directory will be returned.

## SMB

The path part of an SMB request specifies the file to retrieve and from what
share and directory or the share to upload to and as such, may not be omitted.
If the user name is embedded in the URL then it must contain the domain name
and as such, the backslash must be URL encoded as `%2f`.

When uploading to SMB, the size of the file needs to be known ahead of time,
meaning that you cannot upload a file passed to curl over a pipe like stdin.

curl supports SMB version 1 (only).

## SMTP

The path part of an SMTP request specifies the host name to present during
communication with the mail server. If the path is omitted, then libcurl will
attempt to resolve the local computer's host name. However, this may not
return the fully qualified domain name that is required by some mail servers
and specifying this path allows you to set an alternative name, such as your
machine's fully qualified domain name, which you might have obtained from an
external function such as `gethostname` or `getaddrinfo`.

The default SMTP port is 25. Some servers use port 587 as an alternative.

## RTMP

There is no official URL spec for RTMP so libcurl uses the URL syntax supported
by the underlying librtmp library. It has a syntax where it wants a
traditional URL, followed by a space and a series of space-separated
`name=value` pairs.

While space is not typically a "legal" letter, libcurl accepts them.
When a
user wants to pass in a `#` (hash) character it will be treated as a fragment
and get cut off by libcurl if provided literally. You will instead have to
escape it by providing it as backslash and its ASCII value in hexadecimal:
`\23`.
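
As a rough illustration of the escaping rule above, the following Python
sketch builds a librtmp-style URL string from a base URL and `name=value`
pairs. The helper names `rtmp_escape` and `rtmp_url` are hypothetical, the
`playpath` option is only an example name, and the sketch assumes `#` is the
only character that needs escaping.

```python
def rtmp_escape(value: str) -> str:
    """Replace each literal '#' with a backslash followed by hex 23."""
    return value.replace("#", "\\23")


def rtmp_url(base: str, options: dict[str, str]) -> str:
    """Append space-separated name=value pairs to a base RTMP URL."""
    pairs = [f"{name}={rtmp_escape(value)}" for name, value in options.items()]
    return " ".join([base, *pairs])


print(rtmp_url("rtmp://media.example.com/app", {"playpath": "live#hd"}))
# rtmp://media.example.com/app playpath=live\23hd
```

Escaping before joining keeps the hash out of the final string, so it is not
mistaken for the start of a fragment.
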