---
c: Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
Title: curl_url_get
Section: 3
Source: libcurl
See-also:
  - CURLOPT_CURLU (3)
  - curl_url (3)
  - curl_url_cleanup (3)
  - curl_url_dup (3)
  - curl_url_set (3)
  - curl_url_strerror (3)
Protocol:
  - All
---

# NAME

curl_url_get - extract a part from a URL

# SYNOPSIS

~~~c
#include <curl/curl.h>

CURLUcode curl_url_get(const CURLU *url,
                       CURLUPart part,
                       char **content,
                       unsigned int flags);
~~~

# DESCRIPTION

Given a *url* handle to a URL object, this function extracts an individual
part, or the full URL, from it.

The *part* argument specifies which part to extract (see the list below) and
*content* points to a 'char *' that is updated to point to a newly allocated
string holding the contents.

The *flags* argument is a bitmask of the individual features described below.

The returned content pointer must be freed with curl_free(3) after use.

# FLAGS

The flags argument is zero, one or more bits set in a bitmask.

## CURLU_DEFAULT_PORT

If the handle has no port stored, this option makes curl_url_get(3)
return the default port for the scheme in use.

## CURLU_DEFAULT_SCHEME

If the handle has no scheme stored, this option makes curl_url_get(3)
return the default scheme instead of an error.

## CURLU_NO_DEFAULT_PORT

Instructs curl_url_get(3) not to return a port number if it matches the
default port for the scheme.

## CURLU_URLDECODE

Asks curl_url_get(3) to URL decode the contents before returning them. It
does not decode the scheme, the port number or the full URL.

When this bit is set, the query component also gets plus-to-space conversion.

Note that this URL decoding is charset unaware: you get a zero-terminated
string back with data that may be intended for a particular encoding.

If the decoded string contains byte values lower than 32, the get operation
returns an error (*CURLUE_URLDECODE*) instead.

## CURLU_URLENCODE

If set, curl_url_get(3) URL encodes the hostname part when a full URL is
retrieved. If not set (the default), libcurl returns the URL with the hostname
raw so that IDN names appear as-is. IDN hostnames typically use non-ASCII
bytes that would otherwise get percent-encoded.

Note that even when not asking for URL encoding, the '%' (byte 37) is URL
encoded to make sure the hostname remains valid.

## CURLU_PUNYCODE

If set, *CURLU_URLENCODE* is not set, and the **CURLUPART_HOST** or
**CURLUPART_URL** part is requested, libcurl returns the hostname in its
punycode version if it contains any non-ASCII octets (and is an IDN name).

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname contains
anything outside the ASCII range.

(Added in curl 7.88.0)

## CURLU_PUNY2IDN

If set and the **CURLUPART_HOST** or **CURLUPART_URL** part is requested,
libcurl returns the hostname in its IDN (International Domain Name) UTF-8
version if it is otherwise in punycode. If the punycode name cannot be
converted to IDN correctly, libcurl returns *CURLUE_BAD_HOSTNAME*.

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname is using
punycode.

(Added in curl 8.3.0)

## CURLU_GET_EMPTY

When this flag is set, curl_url_get(3) returns empty query and fragment parts,
and keeps them when returning the full URL. By default, libcurl considers
empty parts non-existent.

An empty query part is one where there is nothing following the question mark
(before the possible fragment). An empty fragment part is one where there is
nothing following the hash sign.

(Added in curl 8.8.0)

# PARTS

## CURLUPART_URL

When asked to return the full URL, curl_url_get(3) returns a normalized and
possibly cleaned up version using all available URL parts.

We advise using the *CURLU_PUNYCODE* flag to get the URL as "normalized" as
possible, since IDN allows a hostname to be written in many different ways
that all end up as the same punycode version.

Zero-length queries and fragments are excluded from the URL unless
CURLU_GET_EMPTY is set.

## CURLUPART_SCHEME

The scheme cannot be URL decoded on get.

## CURLUPART_USER

## CURLUPART_PASSWORD

## CURLUPART_OPTIONS

The options field is an optional field that might follow the password in the
userinfo part. It is only recognized and used when parsing URLs for the
following schemes: pop3, smtp and imap. The URL API still allows users to set
and get this field independently of scheme when not parsing full URLs.

## CURLUPART_HOST

The hostname. If it is a numeric IPv6 address, the zone id is not part of it
but is provided separately in *CURLUPART_ZONEID*. Numeric IPv6 addresses are
returned within brackets ([]).

Numeric IPv6 addresses are normalized when set, which should make them as
short as possible while maintaining correct syntax.

## CURLUPART_ZONEID

If the hostname is a numeric IPv6 address, this field might also be set.

## CURLUPART_PORT

A port cannot be URL decoded on get. The port number is returned as a string,
just like all other parts. That string is guaranteed to hold a valid port
number in ASCII using base 10.

## CURLUPART_PATH

The returned path is always at least a slash ('/'), even if no path was
supplied in the URL. A URL path always starts with a slash.

## CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a
delimiter only. It is not part of the query contents.

If there is no query, *content* is set to NULL and *CURLUE_NO_QUERY* is
returned.

A zero-length query also sets *content* to NULL unless CURLU_GET_EMPTY is set.

When URL decoding is requested with the CURLU_URLDECODE bit, pluses in the
query part are converted to spaces.

## CURLUPART_FRAGMENT

The initial hash sign that denotes the beginning of the fragment is a
delimiter only. It is not part of the fragment contents.

If there is no fragment, *content* is set to NULL and *CURLUE_NO_FRAGMENT* is
returned.

A zero-length fragment also sets *content* to NULL unless CURLU_GET_EMPTY is
set.

# EXAMPLE

~~~c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURLUcode rc;
  CURLU *url = curl_url();
  if(!url)
    return 1;

  rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0);
  if(!rc) {
    char *scheme;
    rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0);
    if(!rc) {
      printf("the scheme is %s\n", scheme);
      curl_free(scheme); /* returned parts are freed with curl_free() */
    }
  }
  curl_url_cleanup(url); /* free the handle even if curl_url_set() failed */
  return rc ? 1 : 0;
}
~~~

# AVAILABILITY

Added in 7.62.0. CURLUPART_ZONEID was added in 7.65.0.

# RETURN VALUE

Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went
fine. See the libcurl-errors(3) man page for the full list with
descriptions.

If this function returns an error, no URL part is returned.