---
c: Copyright (C) Daniel Stenberg, <daniel.se>, et al.
SPDX-License-Identifier: curl
Title: curl_url_get
Section: 3
Source: libcurl
See-also:
  - CURLOPT_CURLU (3)
  - curl_url (3)
  - curl_url_cleanup (3)
  - curl_url_dup (3)
  - curl_url_set (3)
  - curl_url_strerror (3)
---

# NAME

curl_url_get - extract a part from a URL

# SYNOPSIS

~~~c
#include <curl/curl.h>

CURLUcode curl_url_get(const CURLU *url,
                       CURLUPart part,
                       char **content,
                       unsigned int flags);
~~~

# DESCRIPTION

Given a *url* handle of a URL object, this function extracts an individual
piece or the full URL from it.

The *part* argument specifies which part to extract (see the list below), and
*content* points to a 'char *' that gets updated to point to a newly
allocated string with the contents.

The *flags* argument is a bitmask with individual features.

The returned content pointer must be freed with curl_free(3) after use.

# FLAGS

The *flags* argument is zero, one or more bits set in a bitmask.

## CURLU_DEFAULT_PORT

If the handle has no port stored, this option makes curl_url_get(3)
return the default port for the used scheme.

## CURLU_DEFAULT_SCHEME

If the handle has no scheme stored, this option makes curl_url_get(3)
return the default scheme instead of an error.

## CURLU_NO_DEFAULT_PORT

Instructs curl_url_get(3) not to return a port number if it matches the
default port for the scheme.

## CURLU_URLDECODE

Asks curl_url_get(3) to URL decode the contents before returning it. It
does not decode the scheme, the port number or the full URL.

The query component also gets plus-to-space conversion as a bonus when this
bit is set.

Note that this URL decoding is charset unaware: you get a zero-terminated
string back with data that could be intended for a particular encoding.

If there are byte values lower than 32 in the decoded string, the get
operation returns an error instead.

## CURLU_URLENCODE

If set, curl_url_get(3) URL encodes the hostname part when a full URL
is retrieved. If not set (default), libcurl returns the URL with the hostname
"raw" so that IDN names appear as-is. IDN hostnames typically contain
non-ASCII bytes that would otherwise get percent-encoded.

Note that even when not asking for URL encoding, the '%' (byte 37) is URL
encoded to make sure the hostname remains valid.

## CURLU_PUNYCODE

If set and *CURLU_URLENCODE* is not set, and asked to retrieve the
**CURLUPART_HOST** or **CURLUPART_URL** parts, libcurl returns the hostname
in its punycode version if it contains any non-ASCII octets (and is an
IDN name).

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname contains
anything outside the ASCII range.

(Added in curl 7.88.0)

## CURLU_PUNY2IDN

If set and asked to retrieve the **CURLUPART_HOST** or **CURLUPART_URL**
parts, libcurl returns the hostname in its IDN (International Domain Name)
UTF-8 version if it is otherwise a punycode version. If the punycode name
cannot be converted to IDN correctly, libcurl returns
*CURLUE_BAD_HOSTNAME*.

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname is using
punycode.

(Added in curl 8.3.0)

# PARTS

## CURLUPART_URL

When asked to return the full URL, curl_url_get(3) returns a normalized
and possibly cleaned up version using all available URL parts.

We advise using the *CURLU_PUNYCODE* option to get the URL as "normalized"
as possible, since IDN allows hostnames to be written in many different ways
that all end up as the same punycode version.

## CURLUPART_SCHEME

Scheme cannot be URL decoded on get.

## CURLUPART_USER

## CURLUPART_PASSWORD

## CURLUPART_OPTIONS

The options field is an optional field that might follow the password in the
userinfo part. It is only recognized/used when parsing URLs for the following
schemes: pop3, smtp and imap. The URL API still allows users to set and get
this field independently of scheme when not parsing full URLs.

## CURLUPART_HOST

The hostname. If it is an IPv6 numeric address, the zone id is not part of it
but is provided separately in *CURLUPART_ZONEID*. IPv6 numerical addresses
are returned within brackets ([]).

IPv6 names are normalized when set, which should make them as short as
possible while maintaining correct syntax.

## CURLUPART_ZONEID

If the hostname is a numeric IPv6 address, this field might also be set.

## CURLUPART_PORT

A port cannot be URL decoded on get. This number is returned in a string just
like all other parts. That string is guaranteed to hold a valid port number in
ASCII using base 10.

## CURLUPART_PATH

The returned path is always at least a slash ('/') even if no path was
supplied in the URL. A URL path always starts with a slash.

## CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a
delimiter only. It is not part of the query contents.

A not-present query makes the function return *CURLUE_NO_QUERY* with
*content* set to NULL.
A zero-length query returns *content* as a zero-length string.

The query part gets pluses converted to spaces when asked to URL decode on get
with the CURLU_URLDECODE bit.

## CURLUPART_FRAGMENT

The initial hash sign that denotes the beginning of the fragment is a
delimiter only. It is not part of the fragment contents.

# EXAMPLE

~~~c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURLUcode rc;
  CURLU *url = curl_url();
  if(!url)
    return 1;
  rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0);
  if(!rc) {
    char *scheme;
    rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0);
    if(!rc) {
      printf("the scheme is %s\n", scheme);
      curl_free(scheme);
    }
  }
  /* free the handle whether or not curl_url_set() succeeded */
  curl_url_cleanup(url);
  return 0;
}
~~~

# AVAILABILITY

Added in 7.62.0. CURLUPART_ZONEID was added in 7.65.0.

# RETURN VALUE

Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went
fine. See the libcurl-errors(3) man page for the full list with
descriptions.

If this function returns an error, no URL part is returned.