---
c: Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
Title: curl_url_get
Section: 3
Source: libcurl
See-also:
  - CURLOPT_CURLU (3)
  - curl_url (3)
  - curl_url_cleanup (3)
  - curl_url_dup (3)
  - curl_url_set (3)
  - curl_url_strerror (3)
Protocol:
  - All
---

# NAME

curl_url_get - extract a part from a URL

# SYNOPSIS

~~~c
#include <curl/curl.h>

CURLUcode curl_url_get(const CURLU *url,
                       CURLUPart part,
                       char **content,
                       unsigned int flags);
~~~

# DESCRIPTION

Given a *url* handle of a URL object, this function extracts an individual
piece or the full URL from it.

The *part* argument specifies which part to extract (see list below) and
*content* points to a 'char *' that gets updated to point to a newly
allocated string with the contents.

The *flags* argument is a bitmask of the individual features described
below.

The returned content pointer must be freed with curl_free(3) after use.

# FLAGS

The flags argument is zero, one or more bits set in a bitmask.

## CURLU_DEFAULT_PORT

If the handle has no port stored, this option makes curl_url_get(3)
return the default port for the used scheme.

## CURLU_DEFAULT_SCHEME

If the handle has no scheme stored, this option makes curl_url_get(3)
return the default scheme instead of an error.

## CURLU_NO_DEFAULT_PORT

Instructs curl_url_get(3) to not return a port number if it matches the
default port for the scheme.

## CURLU_URLDECODE

Asks curl_url_get(3) to URL decode the contents before returning them. It
does not decode the scheme, the port number or the full URL.

The query component also gets plus-to-space conversion as a bonus when this
bit is set.

Note that this URL decoding is charset unaware and you get a zero-terminated
string back with data that could be intended for a particular encoding.

If there are byte values lower than 32 in the decoded string, the get
operation returns an error instead.

## CURLU_URLENCODE

If set, curl_url_get(3) URL encodes the hostname part when a full URL is
retrieved. If not set (default), libcurl returns the URL with the hostname
raw, so that IDN names appear as-is. IDN hostnames typically contain
non-ASCII bytes that would otherwise get percent-encoded.

Note that even when not asking for URL encoding, the '%' (byte 37) is URL
encoded to make sure the hostname remains valid.

## CURLU_PUNYCODE

If this bit is set, *CURLU_URLENCODE* is not set, and the **CURLUPART_HOST**
or **CURLUPART_URL** part is retrieved, libcurl returns the hostname in its
punycode version if it contains any non-ASCII octets (and is an IDN name).

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname contains
anything outside the ASCII range.

(Added in curl 7.88.0)

## CURLU_PUNY2IDN

If set and asked to retrieve the **CURLUPART_HOST** or **CURLUPART_URL**
parts, libcurl returns the hostname in its IDN (International Domain Name)
UTF-8 version if it is otherwise a punycode version. If the punycode name
cannot be converted to IDN correctly, libcurl returns
*CURLUE_BAD_HOSTNAME*.

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname is using
punycode.

(Added in curl 8.3.0)

## CURLU_GET_EMPTY

When this flag is used, curl_url_get(3) returns empty query and fragment
parts, both when they are asked for individually and when they are part of
the full URL. By default, libcurl considers empty parts non-existing.

An empty query part is one where there is nothing following the question mark
(before the possible fragment). An empty fragment part is one where there is
nothing following the hash sign.

(Added in curl 8.8.0)

# PARTS

## CURLUPART_URL

When asked to return the full URL, curl_url_get(3) returns a normalized and
possibly cleaned up version using all available URL parts.

We advise using the *CURLU_PUNYCODE* option to get the URL as "normalized" as
possible, since IDN allows hostnames to be written in many different ways that
still end up as the same punycode version.

Zero-length queries and fragments are excluded from the URL unless
CURLU_GET_EMPTY is set.

## CURLUPART_SCHEME

The scheme cannot be URL decoded on get.

## CURLUPART_USER

## CURLUPART_PASSWORD

## CURLUPART_OPTIONS

The options field is an optional field that might follow the password in the
userinfo part. It is only recognized and used when parsing URLs for the
following schemes: pop3, smtp and imap. The URL API still allows users to set
and get this field independently of scheme when not parsing full URLs.

## CURLUPART_HOST

The hostname. If it is an IPv6 numeric address, the zone id is not part of it
but is provided separately in *CURLUPART_ZONEID*. IPv6 numerical addresses
are returned within brackets ([]).

IPv6 addresses are normalized when set, which should make them as short as
possible while maintaining correct syntax.

## CURLUPART_ZONEID

If the hostname is a numeric IPv6 address, this field might also be set.

## CURLUPART_PORT

A port cannot be URL decoded on get. This number is returned in a string just
like all other parts. That string is guaranteed to hold a valid port number in
ASCII using base 10.

## CURLUPART_PATH

The returned path is always at least a slash ('/'), even if no path was
supplied in the URL. A URL path always starts with a slash.

## CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a
delimiter only. It is not part of the query contents.

A not-present query makes the function return *CURLUE_NO_QUERY* with
*content* set to NULL.

A zero-length query returns *content* as NULL unless CURLU_GET_EMPTY is set.

The query part gets pluses converted to space when asked to URL decode on get
with the CURLU_URLDECODE bit.

## CURLUPART_FRAGMENT

The initial hash sign that denotes the beginning of the fragment is a
delimiter only. It is not part of the fragment contents.

A not-present fragment makes the function return *CURLUE_NO_FRAGMENT* with
*content* set to NULL.

A zero-length fragment returns *content* as NULL unless CURLU_GET_EMPTY is
set.

# EXAMPLE

~~~c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURLUcode rc;
  CURLU *url = curl_url();
  if(!url)
    return 1;
  rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0);
  if(!rc) {
    char *scheme;
    rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0);
    if(!rc) {
      printf("the scheme is %s\n", scheme);
      curl_free(scheme);
    }
  }
  /* free the handle whether or not the calls above succeeded */
  curl_url_cleanup(url);
  return rc ? 1 : 0;
}
~~~

# AVAILABILITY

Added in 7.62.0. CURLUPART_ZONEID was added in 7.65.0.

# RETURN VALUE

Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went
fine. See the libcurl-errors(3) man page for the full list with
descriptions.

If this function returns an error, no URL part is returned.