:mod:`urllib.parse` --- Parse URLs into components
==================================================
3
4.. module:: urllib.parse
5   :synopsis: Parse URLs into or assemble them from components.
6
7**Source code:** :source:`Lib/urllib/parse.py`
8
9.. index::
10   single: WWW
11   single: World Wide Web
12   single: URL
13   pair: URL; parsing
14   pair: relative; URL
15
16--------------
17
This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL".
22
23The module has been designed to match the Internet RFC on Relative Uniform
24Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
25``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
26``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
27``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
28``wais``, ``ws``, ``wss``.
29
30The :mod:`urllib.parse` module defines functions that fall into two broad
31categories: URL parsing and URL quoting. These are covered in detail in
32the following sections.
33
URL Parsing
-----------
36
37The URL parsing functions focus on splitting a URL string into its components,
38or on combining URL components into a URL string.
39
40.. function:: urlparse(urlstring, scheme='', allow_fragments=True)
41
42   Parse a URL into six components, returning a 6-item :term:`named tuple`.  This
43   corresponds to the general structure of a URL:
44   ``scheme://netloc/path;parameters?query#fragment``.
45   Each tuple item is a string, possibly empty. The components are not broken up
46   into smaller parts (for example, the network location is a single string), and %
47   escapes are not expanded. The delimiters as shown above are not part of the
48   result, except for a leading slash in the *path* component, which is retained if
49   present.  For example:
50
51      >>> from urllib.parse import urlparse
52      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
53      >>> o   # doctest: +NORMALIZE_WHITESPACE
54      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
55                  params='', query='', fragment='')
56      >>> o.scheme
57      'http'
58      >>> o.port
59      80
60      >>> o.geturl()
61      'http://www.cwi.nl:80/%7Eguido/Python.html'
62
   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``.  Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.
67
68   .. doctest::
69      :options: +NORMALIZE_WHITESPACE
70
71      >>> from urllib.parse import urlparse
72      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
73      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
74                  params='', query='', fragment='')
75      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
76      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
77                  params='', query='', fragment='')
78      >>> urlparse('help/Python.html')
79      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
80                  query='', fragment='')
81
82   The *scheme* argument gives the default addressing scheme, to be
83   used only if the URL does not specify one.  It should be the same type
84   (text or bytes) as *urlstring*, except that the default value ``''`` is
85   always allowed, and is automatically converted to ``b''`` if appropriate.
86
87   If the *allow_fragments* argument is false, fragment identifiers are not
88   recognized.  Instead, they are parsed as part of the path, parameters
89   or query component, and :attr:`fragment` is set to the empty string in
90   the return value.
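
   A minimal sketch of how the two optional arguments behave; the
   ``www.example.com`` URLs are placeholders chosen for illustration:

   .. code-block:: python

      from urllib.parse import urlparse

      # The URL has no scheme, so the default given as *scheme* is used.
      with_default = urlparse('//www.example.com/index.html', scheme='https')
      print(with_default.scheme)          # https

      # A scheme present in the URL takes precedence over the default.
      explicit = urlparse('http://www.example.com/', scheme='https')
      print(explicit.scheme)              # http

      # With allow_fragments=False, '#section' stays in the path.
      no_fragment = urlparse('http://www.example.com/page#section',
                             allow_fragments=False)
      print(no_fragment.path)             # /page#section
      print(repr(no_fragment.fragment))   # ''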
91
92   The return value is a :term:`named tuple`, which means that its items can
93   be accessed by index or as named attributes, which are:
94
95   +------------------+-------+--------------------------+----------------------+
96   | Attribute        | Index | Value                    | Value if not present |
97   +==================+=======+==========================+======================+
98   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
99   +------------------+-------+--------------------------+----------------------+
100   | :attr:`netloc`   | 1     | Network location part    | empty string         |
101   +------------------+-------+--------------------------+----------------------+
102   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
103   +------------------+-------+--------------------------+----------------------+
104   | :attr:`params`   | 3     | Parameters for last path | empty string         |
105   |                  |       | element                  |                      |
106   +------------------+-------+--------------------------+----------------------+
107   | :attr:`query`    | 4     | Query component          | empty string         |
108   +------------------+-------+--------------------------+----------------------+
109   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
110   +------------------+-------+--------------------------+----------------------+
111   | :attr:`username` |       | User name                | :const:`None`        |
112   +------------------+-------+--------------------------+----------------------+
113   | :attr:`password` |       | Password                 | :const:`None`        |
114   +------------------+-------+--------------------------+----------------------+
115   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
116   +------------------+-------+--------------------------+----------------------+
117   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
118   |                  |       | if present               |                      |
119   +------------------+-------+--------------------------+----------------------+
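
   The following sketch reads the same result by index and by attribute.
   The URL, user name and password are made-up values used only for
   illustration:

   .. code-block:: python

      from urllib.parse import urlparse

      result = urlparse('http://User:Secret@WWW.Example.COM:8080/docs?x=1#top')

      # Index access and attribute access refer to the same item.
      print(result[1] == result.netloc)   # True
      print(result.netloc)                # User:Secret@WWW.Example.COM:8080

      # The derived attributes split the network location further.
      print(result.username)              # User
      print(result.password)              # Secret
      print(result.hostname)              # www.example.com (lower case)
      print(result.port)                  # 8080 (an int)

      # Without a network location the derived attributes are None.
      bare = urlparse('/docs')
      print(bare.hostname, bare.port)     # None None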
120
121   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
122   an invalid port is specified in the URL.  See section
123   :ref:`urlparse-result-object` for more information on the result object.
124
125   Unmatched square brackets in the :attr:`netloc` attribute will raise a
126   :exc:`ValueError`.
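
   As an illustration of the two error cases above (the URLs are
   placeholders), an out-of-range port is only reported when
   :attr:`port` is read, whereas unmatched brackets are reported while
   parsing:

   .. code-block:: python

      from urllib.parse import urlparse

      # Parsing succeeds; the port is validated when the attribute is read.
      result = urlparse('http://www.example.com:99999/')
      try:
          result.port
      except ValueError as exc:
          port_error = exc
          print('invalid port:', exc)

      # An opening bracket without a closing one is rejected by urlparse().
      try:
          urlparse('http://[::1/')
      except ValueError as exc:
          bracket_error = exc
          print('invalid netloc:', exc)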
127
128   Characters in the :attr:`netloc` attribute that decompose under NFKC
129   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
130   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
131   decomposed before parsing, no error will be raised.
132
133   As is the case with all named tuples, the subclass has a few additional methods
134   and attributes that are particularly useful. One such method is :meth:`_replace`.
135   The :meth:`_replace` method will return a new ParseResult object replacing specified
136   fields with new values.
137
138   .. doctest::
139      :options: +NORMALIZE_WHITESPACE
140
141      >>> from urllib.parse import urlparse
142      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
143      >>> u
144      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
145                  params='', query='', fragment='')
146      >>> u._replace(scheme='http')
147      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
148                  params='', query='', fragment='')
149
150
151   .. versionchanged:: 3.2
152      Added IPv6 URL parsing capabilities.
153
154   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`.  Previously, a whitelist of
      schemes that support fragments existed.
158
159   .. versionchanged:: 3.6
160      Out-of-range port numbers now raise :exc:`ValueError`, instead of
161      returning :const:`None`.
162
163   .. versionchanged:: 3.8
164      Characters that affect netloc parsing under NFKC normalization will
165      now raise :exc:`ValueError`.
166
167
168.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
169
170   Parse a query string given as a string argument (data of type
171   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
172   dictionary.  The dictionary keys are the unique query variable names and the
173   values are lists of values for each name.
174
175   The optional argument *keep_blank_values* is a flag indicating whether blank
176   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
178   value indicates that blank values are to be ignored and treated as if they were
179   not included.
180
181   The optional argument *strict_parsing* is a flag indicating what to do with
182   parsing errors.  If false (the default), errors are silently ignored.  If true,
183   errors raise a :exc:`ValueError` exception.
184
185   The optional *encoding* and *errors* parameters specify how to decode
186   percent-encoded sequences into Unicode characters, as accepted by the
187   :meth:`bytes.decode` method.
188
   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when more than
   *max_num_fields* fields are read.
192
193   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
194   parameter set to ``True``) to convert such dictionaries into query
195   strings.
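
   A short sketch of the options described above, using an arbitrary
   query string invented for this example:

   .. code-block:: python

      from urllib.parse import parse_qs, urlencode

      query = 'tag=a&tag=b&empty=&page=2'

      # Repeated names are collected into one list; blank values are dropped.
      fields = parse_qs(query)
      print(fields)          # {'tag': ['a', 'b'], 'page': ['2']}

      # keep_blank_values=True retains 'empty' with a blank string.
      with_blanks = parse_qs(query, keep_blank_values=True)
      print(with_blanks)     # {'tag': ['a', 'b'], 'empty': [''], 'page': ['2']}

      # The query has four fields, so a limit of two raises ValueError.
      try:
          parse_qs(query, max_num_fields=2)
      except ValueError as exc:
          limit_error = exc

      # doseq=True expands each list back into repeated key=value pairs.
      rebuilt = urlencode(with_blanks, doseq=True)
      print(rebuilt)         # tag=a&tag=b&empty=&page=2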
196
197
198   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.
200
201   .. versionchanged:: 3.8
202      Added *max_num_fields* parameter.
203
204
205.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
206
207   Parse a query string given as a string argument (data of type
208   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a list of
209   name, value pairs.
210
211   The optional argument *keep_blank_values* is a flag indicating whether blank
212   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
214   value indicates that blank values are to be ignored and treated as if they were
215   not included.
216
217   The optional argument *strict_parsing* is a flag indicating what to do with
218   parsing errors.  If false (the default), errors are silently ignored.  If true,
219   errors raise a :exc:`ValueError` exception.
220
221   The optional *encoding* and *errors* parameters specify how to decode
222   percent-encoded sequences into Unicode characters, as accepted by the
223   :meth:`bytes.decode` method.
224
   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when more than
   *max_num_fields* fields are read.
228
229   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
230   query strings.
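
   For comparison with :func:`parse_qs`, a small sketch with made-up
   query strings; the list form keeps the original order and every
   repeated name:

   .. code-block:: python

      from urllib.parse import parse_qsl, urlencode

      pairs = parse_qsl('b=2&a=1&b=3')
      print(pairs)           # [('b', '2'), ('a', '1'), ('b', '3')]

      # Plus signs and %xx escapes are decoded in names and values.
      decoded = parse_qsl('name=El+Ni%C3%B1o')
      print(decoded)         # [('name', 'El Niño')]

      # A list of pairs can be passed straight back to urlencode().
      rebuilt = urlencode(pairs)
      print(rebuilt)         # b=2&a=1&b=3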
231
232   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.
234
235   .. versionchanged:: 3.8
236      Added *max_num_fields* parameter.
237
238
239.. function:: urlunparse(parts)
240
241   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
242   argument can be any six-item iterable. This may result in a slightly
243   different, but equivalent URL, if the URL that was parsed originally had
244   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
245   states that these are equivalent).
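
   A brief sketch of both points, with placeholder URLs: building a URL
   from a plain six-item tuple, and a round trip that drops an
   unnecessary ``?``:

   .. code-block:: python

      from urllib.parse import urlparse, urlunparse

      # Any six-item iterable works: scheme, netloc, path, params, query, fragment.
      built = urlunparse(('https', 'www.example.com', '/a/b', '', 'q=1', 'top'))
      print(built)           # https://www.example.com/a/b?q=1#top

      # The trailing '?' introduces an empty query and is not reproduced.
      round_trip = urlunparse(urlparse('http://www.example.com/path?'))
      print(round_trip)      # http://www.example.com/path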
246
247
248.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)
249
250   This is similar to :func:`urlparse`, but does not split the params from the URL.
251   This should generally be used instead of :func:`urlparse` if the more recent URL
252   syntax allowing parameters to be applied to each segment of the *path* portion
253   of the URL (see :rfc:`2396`) is wanted.  A separate function is needed to
254   separate the path segments and parameters.  This function returns a 5-item
255   :term:`named tuple`::
256
257      (addressing scheme, network location, path, query, fragment identifier).
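
   To illustrate the difference from :func:`urlparse`, the sketch below
   parses one placeholder URL with both functions:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = 'http://www.example.com/path;type=a?x=1#top'

      # urlparse() separates the parameters of the last path segment.
      parsed = urlparse(url)
      print(parsed.path, parsed.params)   # /path type=a

      # urlsplit() leaves them in the path and returns five items.
      split = urlsplit(url)
      print(split.path)                   # /path;type=a
      print(len(parsed), len(split))      # 6 5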
258
   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:
261
262   +------------------+-------+-------------------------+----------------------+
263   | Attribute        | Index | Value                   | Value if not present |
264   +==================+=======+=========================+======================+
265   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
266   +------------------+-------+-------------------------+----------------------+
267   | :attr:`netloc`   | 1     | Network location part   | empty string         |
268   +------------------+-------+-------------------------+----------------------+
269   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
270   +------------------+-------+-------------------------+----------------------+
271   | :attr:`query`    | 3     | Query component         | empty string         |
272   +------------------+-------+-------------------------+----------------------+
273   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
274   +------------------+-------+-------------------------+----------------------+
275   | :attr:`username` |       | User name               | :const:`None`        |
276   +------------------+-------+-------------------------+----------------------+
277   | :attr:`password` |       | Password                | :const:`None`        |
278   +------------------+-------+-------------------------+----------------------+
279   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
280   +------------------+-------+-------------------------+----------------------+
281   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
282   |                  |       | if present              |                      |
283   +------------------+-------+-------------------------+----------------------+
284
285   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
286   an invalid port is specified in the URL.  See section
287   :ref:`urlparse-result-object` for more information on the result object.
288
289   Unmatched square brackets in the :attr:`netloc` attribute will raise a
290   :exc:`ValueError`.
291
292   Characters in the :attr:`netloc` attribute that decompose under NFKC
293   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
294   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
295   decomposed before parsing, no error will be raised.
296
297   .. versionchanged:: 3.6
298      Out-of-range port numbers now raise :exc:`ValueError`, instead of
299      returning :const:`None`.
300
301   .. versionchanged:: 3.8
302      Characters that affect netloc parsing under NFKC normalization will
303      now raise :exc:`ValueError`.
304
305
306.. function:: urlunsplit(parts)
307
308   Combine the elements of a tuple as returned by :func:`urlsplit` into a
309   complete URL as a string. The *parts* argument can be any five-item
310   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).
313
314
315.. function:: urljoin(base, url, allow_fragments=True)
316
317   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
318   another URL (*url*).  Informally, this uses components of the base URL, in
319   particular the addressing scheme, the network location and (part of) the
320   path, to provide missing components in the relative URL.  For example:
321
322      >>> from urllib.parse import urljoin
323      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
324      'http://www.cwi.nl/%7Eguido/FAQ.html'
325
326   The *allow_fragments* argument has the same meaning and default as for
327   :func:`urlparse`.
328
329   .. note::
330
331      If *url* is an absolute URL (that is, it starts with ``//`` or ``scheme://``),
332      the *url*'s hostname and/or scheme will be present in the result.  For example:
333
334      .. doctest::
335
336         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
337         ...         '//www.python.org/%7Eguido')
338         'http://www.python.org/%7Eguido'
339
340      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
341      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
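
      One possible way to do this preprocessing is sketched below.  The
      helper name ``join_keeping_base_host`` is hypothetical and not part of
      this module, and the sketch assumes the path of *url* does not itself
      start with ``//``:

      .. code-block:: python

         from urllib.parse import urljoin, urlsplit, urlunsplit

         def join_keeping_base_host(base, url):
             """Join *url* to *base*, ignoring any scheme or netloc in *url*."""
             parts = urlsplit(url)
             # Rebuild the URL with empty scheme and netloc components.
             relative = urlunsplit(('', '', parts.path, parts.query,
                                    parts.fragment))
             return urljoin(base, relative)

         base = 'http://www.cwi.nl/%7Eguido/Python.html'
         print(join_keeping_base_host(base, '//www.python.org/%7Eguido'))
         # http://www.cwi.nl/%7Eguido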
342
343
344   .. versionchanged:: 3.5
345
346      Behavior updated to match the semantics defined in :rfc:`3986`.
347
348
349.. function:: urldefrag(url)
350
351   If *url* contains a fragment identifier, return a modified version of *url*
352   with no fragment identifier, and the fragment identifier as a separate
353   string.  If there is no fragment identifier in *url*, return *url* unmodified
354   and an empty string.
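
   A small sketch with placeholder URLs, covering both the case with a
   fragment and the case without one:

   .. code-block:: python

      from urllib.parse import urldefrag

      result = urldefrag('http://www.example.com/page.html#section-2')
      print(result.url)        # http://www.example.com/page.html
      print(result.fragment)   # section-2

      # The result also unpacks like an ordinary 2-tuple.
      url, fragment = result

      # Without a fragment the URL is returned unmodified.
      plain = urldefrag('http://www.example.com/page.html')
      print(plain.url, repr(plain.fragment))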
355
   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:
358
359   +------------------+-------+-------------------------+----------------------+
360   | Attribute        | Index | Value                   | Value if not present |
361   +==================+=======+=========================+======================+
362   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
363   +------------------+-------+-------------------------+----------------------+
364   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
365   +------------------+-------+-------------------------+----------------------+
366
367   See section :ref:`urlparse-result-object` for more information on the result
368   object.
369
370   .. versionchanged:: 3.2
371      Result is a structured object rather than a simple 2-tuple.
372
373.. function:: unwrap(url)
374
   Extract the URL from a wrapped URL (that is, a string formatted as
376   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
377   or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
378   without changes.
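
   As a quick illustration (the URL is a placeholder), the wrapped forms
   listed above all reduce to the same string:

   .. code-block:: python

      from urllib.parse import unwrap

      target = 'http://www.example.com/path'

      from_url_brackets = unwrap('<URL:http://www.example.com/path>')
      from_brackets = unwrap('<http://www.example.com/path>')
      from_prefix = unwrap('URL:http://www.example.com/path')
      unchanged = unwrap(target)

      print(from_url_brackets)   # http://www.example.com/path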
379
380.. _parsing-ascii-encoded-bytes:
381
Parsing ASCII Encoded Bytes
---------------------------
384
385The URL parsing functions were originally designed to operate on character
386strings only. In practice, it is useful to be able to manipulate properly
387quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
388URL parsing functions in this module all operate on :class:`bytes` and
389:class:`bytearray` objects in addition to :class:`str` objects.
390
391If :class:`str` data is passed in, the result will also contain only
392:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
393passed in, the result will contain only :class:`bytes` data.
394
395Attempting to mix :class:`str` data with :class:`bytes` or
396:class:`bytearray` in a single function call will result in a
397:exc:`TypeError` being raised, while attempting to pass in non-ASCII
398byte values will trigger :exc:`UnicodeDecodeError`.
399
400To support easier conversion of result objects between :class:`str` and
401:class:`bytes`, all return values from URL parsing functions provide
402either an :meth:`encode` method (when the result contains :class:`str`
403data) or a :meth:`decode` method (when the result contains :class:`bytes`
404data). The signatures of these methods match those of the corresponding
405:class:`str` and :class:`bytes` methods (except that the default encoding
406is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
407corresponding type that contains either :class:`bytes` data (for
408:meth:`encode` methods) or :class:`str` data (for
409:meth:`decode` methods).
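
The rules above can be sketched as follows, assuming a placeholder URL;
the variable names are illustrative only:

.. code-block:: python

   from urllib.parse import SplitResult, SplitResultBytes, urljoin, urlsplit

   # Bytes in, bytes out.
   as_bytes = urlsplit(b'http://www.example.com/path?q=1')
   print(as_bytes.netloc)                 # b'www.example.com'

   # decode() converts every field to str; encode() converts back.
   as_text = as_bytes.decode()
   print(as_text.netloc)                  # www.example.com
   print(as_text.encode() == as_bytes)    # True

   # Mixing str and bytes in a single call is rejected.
   try:
       urljoin('http://www.example.com/', b'page.html')
   except TypeError as exc:
       mixing_error = exc
       print('cannot mix types:', exc)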
410
411Applications that need to operate on potentially improperly quoted URLs
412that may contain non-ASCII data will need to do their own decoding from
413bytes to characters before invoking the URL parsing methods.
414
415The behaviour described in this section applies only to the URL parsing
416functions. The URL quoting functions use their own rules when producing
417or consuming byte sequences as detailed in the documentation of the
418individual URL quoting functions.
419
420.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.
422
423
424.. _urlparse-result-object:
425
Structured Parse Results
------------------------
428
The result objects from the :func:`urlparse`, :func:`urlsplit` and
430:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
431These subclasses add the attributes listed in the documentation for
432those functions, the encoding and decoding support described in the
433previous section, as well as an additional method:
434
435.. method:: urllib.parse.SplitResult.geturl()
436
437   Return the re-combined version of the original URL as a string. This may
438   differ from the original URL in that the scheme may be normalized to lower
439   case and empty components may be dropped. Specifically, empty parameters,
440   queries, and fragment identifiers will be removed.
441
442   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
443   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
444   made to the URL returned by this method.
445
446   The result of this method remains unchanged if passed back through the original
447   parsing function:
448
449      >>> from urllib.parse import urlsplit
450      >>> url = 'HTTP://www.Python.org/doc/#'
451      >>> r1 = urlsplit(url)
452      >>> r1.geturl()
453      'http://www.Python.org/doc/'
454      >>> r2 = urlsplit(r1.geturl())
455      >>> r2.geturl()
456      'http://www.Python.org/doc/'
457
458
459The following classes provide the implementations of the structured parse
460results when operating on :class:`str` objects:
461
462.. class:: DefragResult(url, fragment)
463
464   Concrete class for :func:`urldefrag` results containing :class:`str`
465   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
466   instance.
467
468   .. versionadded:: 3.2
469
470.. class:: ParseResult(scheme, netloc, path, params, query, fragment)
471
472   Concrete class for :func:`urlparse` results containing :class:`str`
473   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
474   instance.
475
476.. class:: SplitResult(scheme, netloc, path, query, fragment)
477
478   Concrete class for :func:`urlsplit` results containing :class:`str`
479   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
480   instance.
481
482
483The following classes provide the implementations of the parse results when
484operating on :class:`bytes` or :class:`bytearray` objects:
485
486.. class:: DefragResultBytes(url, fragment)
487
488   Concrete class for :func:`urldefrag` results containing :class:`bytes`
489   data. The :meth:`decode` method returns a :class:`DefragResult`
490   instance.
491
492   .. versionadded:: 3.2
493
494.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)
495
496   Concrete class for :func:`urlparse` results containing :class:`bytes`
497   data. The :meth:`decode` method returns a :class:`ParseResult`
498   instance.
499
500   .. versionadded:: 3.2
501
502.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)
503
504   Concrete class for :func:`urlsplit` results containing :class:`bytes`
505   data. The :meth:`decode` method returns a :class:`SplitResult`
506   instance.
507
508   .. versionadded:: 3.2
509
510
URL Quoting
-----------
513
514The URL quoting functions focus on taking program data and making it safe
515for use as URL components by quoting special characters and appropriately
516encoding non-ASCII text. They also support reversing these operations to
517recreate the original data from the contents of a URL component if that
518task isn't already covered by the URL parsing functions above.
519
520.. function:: quote(string, safe='/', encoding=None, errors=None)
521
522   Replace special characters in *string* using the ``%xx`` escape. Letters,
523   digits, and the characters ``'_.-~'`` are never quoted. By default, this
524   function is intended for quoting the path section of a URL. The optional
525   *safe* parameter specifies additional ASCII characters that should not be
526   quoted --- its default value is ``'/'``.
527
528   *string* may be either a :class:`str` or a :class:`bytes` object.
529
530   .. versionchanged:: 3.7
531      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
532      included in the set of unreserved characters.
533
534   The optional *encoding* and *errors* parameters specify how to deal with
535   non-ASCII characters, as accepted by the :meth:`str.encode` method.
536   *encoding* defaults to ``'utf-8'``.
537   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
538   :class:`UnicodeEncodeError`.
539   *encoding* and *errors* must not be supplied if *string* is a
540   :class:`bytes`, or a :class:`TypeError` is raised.
541
542   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
543   ``quote_from_bytes(string.encode(encoding, errors), safe)``.
544
545   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
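
   A few more cases, sketched with arbitrary sample strings, show the
   effect of *safe* and *encoding* and the restriction on
   :class:`bytes` input:

   .. code-block:: python

      from urllib.parse import quote, quote_from_bytes

      # With the default safe='/', slashes are kept.
      print(quote('/El Niño/'))                   # /El%20Ni%C3%B1o/

      # An empty *safe* quotes the slash as well.
      no_safe = quote('a/b c', safe='')
      print(no_safe)                              # a%2Fb%20c

      # *encoding* selects how non-ASCII characters become bytes.
      latin = quote('Niño', encoding='latin-1')
      print(latin)                                # Ni%F1o

      # The equivalence noted above, written out with the default values.
      same = quote('Niño') == quote_from_bytes('Niño'.encode('utf-8', 'strict'), '/')
      print(same)                                 # True

      # *encoding* must not be given together with bytes input.
      try:
          quote(b'abc', encoding='utf-8')
      except TypeError as exc:
          bytes_error = exc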
546
547
548.. function:: quote_plus(string, safe='', encoding=None, errors=None)
549
550   Like :func:`quote`, but also replace spaces with plus signs, as required for
551   quoting HTML form values when building up a query string to go into a URL.
552   Plus signs in the original string are escaped unless they are included in
553   *safe*.  It also does not have *safe* default to ``'/'``.
554
555   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.
556
557
558.. function:: quote_from_bytes(bytes, safe='/')
559
560   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
561   :class:`str`, and does not perform string-to-bytes encoding.
562
563   Example: ``quote_from_bytes(b'a&\xef')`` yields
564   ``'a%26%EF'``.
565
566
567.. function:: unquote(string, encoding='utf-8', errors='replace')
568
569   Replace ``%xx`` escapes with their single-character equivalent.
570   The optional *encoding* and *errors* parameters specify how to decode
571   percent-encoded sequences into Unicode characters, as accepted by the
572   :meth:`bytes.decode` method.
573
574   *string* may be either a :class:`str` or a :class:`bytes` object.
575
576   *encoding* defaults to ``'utf-8'``.
577   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
578   by a placeholder character.
579
580   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
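
   The effect of *encoding* and *errors* can be sketched with a
   percent-encoded Latin-1 byte, which is not valid UTF-8 on its own
   (the sample string is chosen only for illustration):

   .. code-block:: python

      from urllib.parse import unquote

      # Decoded as Latin-1, %F1 is the letter 'ñ'.
      latin = unquote('Ni%F1o', encoding='latin-1')
      print(latin)                       # Niño

      # With the defaults, the invalid UTF-8 byte becomes U+FFFD.
      replaced = unquote('Ni%F1o')
      print(replaced == 'Ni\ufffdo')     # True

      # errors='strict' raises instead of substituting a placeholder.
      try:
          unquote('Ni%F1o', errors='strict')
      except UnicodeDecodeError as exc:
          strict_error = exc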
581
582   .. versionchanged:: 3.9
583      *string* parameter supports bytes and str objects (previously only str).
584
585
586
587
588.. function:: unquote_plus(string, encoding='utf-8', errors='replace')
589
590   Like :func:`unquote`, but also replace plus signs with spaces, as required
591   for unquoting HTML form values.
592
593   *string* must be a :class:`str`.
594
595   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.
596
597
598.. function:: unquote_to_bytes(string)
599
600   Replace ``%xx`` escapes with their single-octet equivalent, and return a
601   :class:`bytes` object.
602
603   *string* may be either a :class:`str` or a :class:`bytes` object.
604
605   If it is a :class:`str`, unescaped non-ASCII characters in *string*
606   are encoded into UTF-8 bytes.
607
608   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.
609
610
611.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
612                        errors=None, quote_via=quote_plus)
613
614   Convert a mapping object or a sequence of two-element tuples, which may
615   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string.  If the resultant string is to be used as the *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.
620
621   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
622   characters, where both *key* and *value* are quoted using the *quote_via*
   function.  By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``).  An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``.
   Because *safe* (default ``''``) is passed down to *quote_via*, ``'/'``
   characters are left unencoded only if *safe* includes ``'/'``.  For maximum
   control of what is quoted, use :func:`quote` and specify a value for *safe*.
630
631   When a sequence of two-element tuples is used as the *query*
632   argument, the first element of each tuple is a key and the second is a
633   value. The value element in itself can be a sequence and in that case, if
634   the optional parameter *doseq* evaluates to ``True``, individual
635   ``key=value`` pairs separated by ``'&'`` are generated for each element of
636   the value sequence for the key.  The order of parameters in the encoded
637   string will match the order of parameter tuples in the sequence.
638
639   The *safe*, *encoding*, and *errors* parameters are passed down to
640   *quote_via* (the *encoding* and *errors* parameters are only passed
641   when a query element is a :class:`str`).
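
   The options described above can be sketched as follows; the keys and
   values are sample data invented for this example:

   .. code-block:: python

      from urllib.parse import quote, urlencode

      # Default quoting via quote_plus(): space -> '+', '/' -> %2F.
      default = urlencode({'q': 'a b/c'})
      print(default)         # q=a+b%2Fc

      # quote() with safe='/' encodes the space as %20 and keeps the slash.
      path_style = urlencode({'q': 'a b/c'}, safe='/', quote_via=quote)
      print(path_style)      # q=a%20b/c

      # doseq=True emits one key=value pair per element of a sequence value.
      repeated = urlencode([('tag', ['a', 'b']), ('page', 1)], doseq=True)
      print(repeated)        # tag=a&tag=b&page=1

      # The result is ASCII text; encode it to bytes before using it as
      # POST data.
      post_data = urlencode({'name': 'El Niño'}).encode('ascii')
      print(post_data)       # b'name=El+Ni%C3%B1o'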
642
643   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
644   provided in this module to parse query strings into Python data structures.
645
   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urllib.parse.urlencode` function can be used for generating the query
   string of a URL or data for a POST request.
649
650   .. versionchanged:: 3.2
651      *query* supports bytes and string objects.
652
653   .. versionadded:: 3.5
654      *quote_via* parameter.
655
656
657.. seealso::
658
659   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.
664
665   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
666      This specifies the parsing requirements of IPv6 URLs.
667
668   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
669      Document describing the generic syntactic requirements for both Uniform Resource
670      Names (URNs) and Uniform Resource Locators (URLs).
671
672   :rfc:`2368` - The mailto URL scheme.
673      Parsing requirements for mailto URL schemes.
674
675   :rfc:`1808` - Relative Uniform Resource Locators
676      This Request For Comments includes the rules for joining an absolute and a
677      relative URL, including a fair number of "Abnormal Examples" which govern the
678      treatment of border cases.
679
680   :rfc:`1738` - Uniform Resource Locators (URL)
681      This specifies the formal syntax and semantics of absolute URLs.
682