:mod:`!urllib.parse` --- Parse URLs into components
===================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``itms-services``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtsps``, ``rtspu``,
``sftp``, ``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``,
``telnet``, ``wais``, ``ws``, ``wss``.

.. impl-detail::

   The inclusion of the ``itms-services`` URL scheme can prevent an app from
   passing Apple's App Store review process for the macOS and iOS App Stores.
   Handling for the ``itms-services`` scheme is always removed on iOS; on
   macOS, it *may* be removed if CPython has been built with the
   :option:`--with-app-store-compliance` option.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

This module's functions use the deprecated term ``netloc`` (or ``net_loc``),
which was introduced in :rfc:`1808`. However, this term has been obsoleted by
:rfc:`3986`, which introduced the term ``authority`` as its replacement.
The use of ``netloc`` is continued for backward compatibility.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-item :term:`named tuple`.  This
   corresponds to the general structure of a URL:
   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and %
   escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present.  For example:

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse("scheme://netloc/path;parameters?query#fragment")
      ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
                  query='query', fragment='fragment')
      >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
      ...              "highlight=params#url-parsing")
      >>> o
      ParseResult(scheme='http', netloc='docs.python.org:80',
                  path='/3/library/urllib.parse.html', params='',
                  query='highlight=params', fragment='url-parsing')
      >>> o.scheme
      'http'
      >>> o.netloc
      'docs.python.org:80'
      >>> o.hostname
      'docs.python.org'
      >>> o.port
      80
      >>> o._replace(fragment="").geturl()
      'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``.  Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one.  It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized.  Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.
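
   The following sketch illustrates the *scheme* and *allow_fragments*
   arguments.  The URLs are made-up examples, and the values in the comments
   follow from the rules described above:

   .. code-block:: python

      from urllib.parse import urlparse

      # *scheme* is only a default: it applies when the URL has no scheme of its own.
      default_scheme = urlparse("//example.com/page", scheme="https").scheme
      explicit_scheme = urlparse("ftp://example.com/page", scheme="https").scheme
      print(default_scheme, explicit_scheme)  # https ftp

      # With bytes input, the default '' is converted to b''.
      print(urlparse(b"//example.com/page").scheme)  # b''

      url = "http://example.com/page?q=1#section"
      with_frag = urlparse(url)
      no_frag = urlparse(url, allow_fragments=False)
      print(with_frag.query, with_frag.fragment)  # q=1 section
      # Without fragment recognition, "#section" stays in the query.
      print(no_frag.query, repr(no_frag.fragment))  # q=1#section ''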

   The return value is a :term:`named tuple`, which means that its items can
   be accessed by index or as named attributes, which are:

   +------------------+-------+-------------------------+------------------------+
   | Attribute        | Index | Value                   | Value if not present   |
   +==================+=======+=========================+========================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter     |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`params`   | 3     | Parameters for last     | empty string           |
   |                  |       | path element            |                        |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`query`    | 4     | Query component         | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`fragment` | 5     | Fragment identifier     | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`username` |       | User name               | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`password` |       | Password                | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`          |
   |                  |       | if present              |                        |
   +------------------+-------+-------------------------+------------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL.  See section
   :ref:`urlparse-result-object` for more information on the result object.
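
   A short sketch of the derived attributes, using a made-up URL that carries
   illustrative credentials, a mixed-case host, and an explicit port:

   .. code-block:: python

      from urllib.parse import urlparse

      parts = urlparse("https://user:secret@Example.COM:8443/path")
      print(parts.username, parts.password)  # user secret
      print(parts.hostname)  # example.com (lower-cased)
      print(parts.port)  # 8443 (an int, not a string)

      # The port is validated lazily, when the attribute is read.
      port_error = None
      try:
          urlparse("https://example.com:99999/").port
      except ValueError as exc:
          port_error = exc
      print(port_error)  # e.g. Port out of range 0-65535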

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.
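
   Both :exc:`ValueError` cases can be sketched as follows.  The URLs are
   contrived, ``try_parse`` is a hypothetical helper for the demonstration,
   and U+FF0F (FULLWIDTH SOLIDUS) is used only because it normalizes to ``/``
   under NFKC:

   .. code-block:: python

      import unicodedata
      from urllib.parse import urlparse

      def try_parse(url):
          """Return the parsed netloc, or the error message if parsing fails."""
          try:
              return urlparse(url).netloc
          except ValueError as exc:
              return f"ValueError: {exc}"

      # Unmatched "[" in the network location.
      print(try_parse("http://[::1/index.html"))

      # U+FF0F decomposes to "/" under NFKC, so this netloc is rejected.
      tricky = "http://example.com\uff0f@evil.example/"
      print(try_parse(tricky))

      # Normalizing first avoids the error; the "/" now ends the netloc.
      print(try_parse(unicodedata.normalize("NFKC", tricky)))  # example.com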

   As is the case with all named tuples, the subclass has a few additional methods
   and attributes that are particularly useful. One such method is :meth:`_replace`,
   which returns a new :class:`ParseResult` object with the specified fields
   replaced by new values.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      >>> u
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> u._replace(scheme='http')
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')

   .. warning::

      :func:`urlparse` does not perform validation.  See :ref:`URL parsing
      security <url-parsing-security>` for details.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`.  Previously, an allowlist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
   dictionary.  The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than *max_num_fields*
   fields are read.

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.
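
   A sketch of how the optional arguments change the result.  The query
   strings are made up, and the *separator* comments assume Python 3.10 or
   later:

   .. code-block:: python

      from urllib.parse import parse_qs, urlencode

      qs = "a=1&a=2&b="
      print(parse_qs(qs))  # {'a': ['1', '2']}
      print(parse_qs(qs, keep_blank_values=True))  # {'a': ['1', '2'], 'b': ['']}

      # Only the chosen separator splits fields.
      print(parse_qs("a=1;b=2"))  # {'a': ['1;b=2']}
      print(parse_qs("a=1;b=2", separator=";"))  # {'a': ['1'], 'b': ['2']}

      # max_num_fields rejects inputs with too many fields.
      too_many = None
      try:
          parse_qs("a=1&b=2&c=3", max_num_fields=2)
      except ValueError as exc:
          too_many = exc

      # doseq=True expands the value lists back into repeated keys.
      round_trip = urlencode(parse_qs("a=1&a=2"), doseq=True)
      print(round_trip)  # a=1&a=2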

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator, with ``&`` as the default.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than *max_num_fields*
   fields are read.

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.
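
   Unlike the dictionary form, the list form keeps the original field order
   and repeated names; ``+`` and percent-escapes are decoded.  A minimal
   sketch with made-up data:

   .. code-block:: python

      from urllib.parse import parse_qsl, urlencode

      pairs = parse_qsl("tag=b&name=El+Ni%C3%B1o&tag=a")
      print(pairs)  # [('tag', 'b'), ('name', 'El Niño'), ('tag', 'a')]

      # urlencode() re-quotes the pairs in the same order.
      encoded = urlencode(pairs)
      print(encoded)  # tag=b&name=El+Ni%C3%B1o&tag=a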

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator, with ``&`` as the default.


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).
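
   A sketch of building a URL from its six components, and of the
   "equivalent but not identical" round trip described above (the URLs are
   made up):

   .. code-block:: python

      from urllib.parse import urlparse, urlunparse

      # (scheme, netloc, path, params, query, fragment)
      parts = ("https", "example.com", "/search", "", "q=python", "")
      built = urlunparse(parts)
      print(built)  # https://example.com/search?q=python

      # A "?" followed by an empty query is dropped on the way back.
      normalized = urlunparse(urlparse("http://example.com/path?"))
      print(normalized)  # http://example.com/path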


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted.  A separate function is needed to
   separate the path segments and parameters.  This function returns a 5-item
   :term:`named tuple`::

      (addressing scheme, network location, path, query, fragment identifier).
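
   The difference from :func:`urlparse` is easiest to see with a path that
   carries parameters.  This is a sketch with a made-up URL; ``http`` is one
   of the schemes for which :func:`urlparse` splits off parameters:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = "http://example.com/a;x=1/b;y=2?q=1"

      p = urlparse(url)
      # Only the parameters of the last path segment are split off.
      print(p.path, p.params)  # /a;x=1/b y=2

      s = urlsplit(url)
      # The path is left intact, parameters on every segment included.
      print(s.path)  # /a;x=1/b;y=2
      print(len(p), len(s))  # 6 5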

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL.  See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
   control and space characters are stripped from the URL. ``\n``,
   ``\r`` and tab ``\t`` characters are removed from the URL at any position.
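
   A minimal sketch of this stripping on a made-up URL, assuming Python 3.12
   or later (earlier releases may behave differently, as the version notes
   below indicate):

   .. code-block:: python

      from urllib.parse import urlsplit

      # Leading space and tab, plus an embedded newline and a trailing carriage return.
      messy = " \thttps://example.com/pa\nth\r"
      parts = urlsplit(messy)
      print(parts.scheme)  # https
      print(parts.path)  # /path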

   .. warning::

      :func:`urlsplit` does not perform validation.  See :ref:`URL parsing
      security <url-parsing-security>` for details.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

   .. versionchanged:: 3.10
      ASCII newline and tab characters are stripped from the URL.

   .. versionchanged:: 3.12
      Leading WHATWG C0 control and space characters are stripped from the URL.

.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).
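
   Combined with the named tuple :meth:`_replace` method, this gives a simple
   way to rewrite one part of a URL.  A sketch with a made-up URL:

   .. code-block:: python

      from urllib.parse import urlsplit, urlunsplit

      # urlsplit() lower-cases the scheme but leaves the netloc as written.
      parts = urlsplit("HTTP://Example.com/search?q=old#top")

      # Replace the query, drop the fragment, then reassemble.
      new_url = urlunsplit(parts._replace(query="q=new", fragment=""))
      print(new_url)  # http://Example.com/search?q=new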


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*).  Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL.  For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.
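
   A few more resolutions against a single made-up base URL, sketched below.
   Note that whether the base path ends in ``/`` changes the result:

   .. code-block:: python

      from urllib.parse import urljoin

      base = "http://example.com/a/b/c"
      print(urljoin(base, "d"))  # http://example.com/a/b/d
      print(urljoin(base, "../d"))  # http://example.com/a/d
      print(urljoin(base, "/d"))  # http://example.com/d
      print(urljoin(base, "?q=1"))  # http://example.com/a/b/c?q=1
      print(urljoin(base, "#frag"))  # http://example.com/a/b/c#frag

      # With a trailing slash, "b/" is kept as a directory.
      print(urljoin("http://example.com/a/b/", "d"))  # http://example.com/a/b/d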

   .. note::

      If *url* is an absolute URL (that is, it starts with ``//`` or ``scheme://``),
      the *url*'s hostname and/or scheme will be present in the result.  For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

   .. warning::

      Because an absolute URL may be passed as the ``url`` parameter, it is
      generally **not secure** to use ``urljoin`` with an attacker-controlled
      ``url``. For example, in
      ``urljoin("https://website.com/users/", username)``, if ``username`` can
      contain an absolute URL, the result of ``urljoin`` will be the absolute
      URL.
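
   One defensive pattern, sketched below, is to quote the untrusted value so
   that it can only ever be a single path segment, and then to check that the
   result keeps the base URL's scheme and host.  ``join_user_segment`` is a
   hypothetical helper, not part of this module, and it is not a complete
   validation strategy:

   .. code-block:: python

      from urllib.parse import quote, urljoin, urlsplit

      def join_user_segment(base, segment):
          """Append one untrusted path segment to *base* (hypothetical helper)."""
          # urljoin() resolves "." and ".." like directory references, so reject them.
          if segment in ("", ".", ".."):
              raise ValueError(f"invalid path segment: {segment!r}")
          # safe="" also quotes "/", ":", "?", "#" and "@", so the value cannot
          # introduce a scheme, a network location, or extra path segments.
          joined = urljoin(base, quote(segment, safe=""))
          # Defense in depth: the result must keep the base scheme and netloc.
          b, j = urlsplit(base), urlsplit(joined)
          if (j.scheme, j.netloc) != (b.scheme, b.netloc):
              raise ValueError("joined URL escaped the base URL")
          return joined

      base = "https://website.com/users/"
      ok = join_user_segment(base, "alice")
      print(ok)  # https://website.com/users/alice
      contained = join_user_segment(base, "https://evil.example/")
      print(contained)  # https://website.com/users/https%3A%2F%2Fevil.example%2F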


   .. versionchanged:: 3.5

      Behavior updated to match the semantics defined in :rfc:`3986`.


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string.  If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
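
   A minimal sketch with made-up URLs:

   .. code-block:: python

      from urllib.parse import urldefrag

      result = urldefrag("http://example.com/page?x=1#section-2")
      print(result.url)  # http://example.com/page?x=1
      print(result.fragment)  # section-2
      print(result.geturl())  # http://example.com/page?x=1#section-2

      # Without a fragment, the URL comes back unchanged with an empty fragment.
      plain = urldefrag("http://example.com/page")
      print(plain)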

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. function:: unwrap(url)

   Extract the URL from a wrapped URL (that is, a string formatted as
   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
   or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
   without changes.
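
   A minimal sketch of the accepted forms, using made-up URLs:

   .. code-block:: python

      from urllib.parse import unwrap

      print(unwrap("<URL:http://example.com/path>"))  # http://example.com/path
      print(unwrap("<http://example.com/path>"))  # http://example.com/path
      print(unwrap("URL:http://example.com/path"))  # http://example.com/path
      print(unwrap("http://example.com/path"))  # returned unchanged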

.. _url-parsing-security:

URL parsing security
--------------------

The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation** of
inputs.  They may not raise errors on inputs that other applications consider
invalid.  They may also succeed on some inputs that might not be considered
URLs elsewhere.  Their purpose is for practical functionality rather than
purity.

Rather than raising an exception on unusual input, they may return some
component parts as empty strings, or return components that contain more
than they perhaps should.

We recommend that users of these APIs code defensively wherever the returned
values may be used in a context with security implications.  Do some
verification within your code before trusting a returned component part.  Does
that ``scheme`` make sense?  Is that a sensible ``path``?  Is there anything
strange about that ``hostname``?  And so on.
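
As an illustration only, such checks might look like the following sketch.
``is_acceptable_url``, the scheme allowlist, and the host allowlist are
hypothetical application policy, not part of this module:

.. code-block:: python

   from urllib.parse import urlsplit

   ALLOWED_SCHEMES = {"http", "https"}  # hypothetical policy

   def is_acceptable_url(url, allowed_hosts):
       """Return True if *url* passes some basic, application-defined checks."""
       try:
           parts = urlsplit(url)
           parts.port  # reading .port validates it and may raise ValueError
       except ValueError:
           return False
       if parts.scheme not in ALLOWED_SCHEMES:
           return False
       # hostname is lower-cased, or None when there is no network location.
       if parts.hostname is None or parts.hostname not in allowed_hosts:
           return False
       # Reject embedded credentials such as "user:pass@host".
       if parts.username is not None or parts.password is not None:
           return False
       return True

   hosts = {"example.com"}
   print(is_acceptable_url("https://Example.com/docs", hosts))  # True
   print(is_acceptable_url("javascript:alert(1)", hosts))  # False
   print(is_acceptable_url("https://example.com:99999/", hosts))  # False
   print(is_acceptable_url("https://user@example.com/", hosts))  # False

Each check mirrors one of the questions above; real applications will need
rules suited to how the URL is actually used.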

What constitutes a URL is not universally well defined.  Different applications
have different needs and desired constraints.  For instance, the living `WHATWG
spec`_ describes what user-facing web clients such as a web browser require,
while :rfc:`3986` is more general.  These functions incorporate some aspects of
both, but cannot claim to be compliant with either.  The APIs and existing user
code with expectations on specific behaviors predate both standards, leading us
to be very cautious about making API behavior changes.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).
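
A minimal sketch of these rules, using made-up URLs:

.. code-block:: python

   from urllib.parse import urljoin, urlsplit

   raw = urlsplit(b"http://example.com/path?q=1")
   print(type(raw).__name__, raw.netloc)  # SplitResultBytes b'example.com'

   text = raw.decode()  # decodes each component with the default 'ascii'
   print(type(text).__name__, text.netloc)  # SplitResult example.com
   print(text.encode() == raw)  # True

   mix_error = None
   try:
       urljoin("http://example.com/", b"page")  # str and bytes mixed
   except TypeError as exc:
       mix_error = exc

   decode_error = None
   try:
       urlsplit(b"http://example.com/\xff")  # non-ASCII byte value
   except UnicodeDecodeError as exc:
       decode_error = exc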

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the :samp:`%{xx}` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted --- its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
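
   A short sketch of the *safe* and *encoding* arguments, using made-up
   inputs:

   .. code-block:: python

      from urllib.parse import quote, quote_from_bytes

      print(quote("a b/c?d"))  # a%20b/c%3Fd ("/" is safe by default)
      print(quote("a b/c?d", safe=""))  # a%20b%2Fc%3Fd ("/" is quoted too)

      # Text is encoded to bytes first, as UTF-8 unless told otherwise.
      print(quote("café"))  # caf%C3%A9
      print(quote("café", encoding="latin-1"))  # caf%E9
      print(quote("café") == quote_from_bytes("café".encode("utf-8")))  # True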


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces with plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*.  Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.
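
   The two functions differ in how they treat spaces and ``/``; a quick
   comparison on a made-up string:

   .. code-block:: python

      from urllib.parse import quote, quote_plus

      value = "a+b c/d"
      print(quote(value))  # a%2Bb%20c/d
      print(quote_plus(value))  # a%2Bb+c%2Fd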
657
658
659.. function:: quote_from_bytes(bytes, safe='/')
660
661   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
662   :class:`str`, and does not perform string-to-bytes encoding.
663
664   Example: ``quote_from_bytes(b'a&\xef')`` yields
665   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace :samp:`%{xx}` escapes with their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* may be either a :class:`str` or a :class:`bytes` object; the
   result is always a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by the Unicode replacement character (U+FFFD).

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
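
   A sketch of :func:`unquote` with arbitrary inputs, assuming Python 3.9 or
   later for the :class:`bytes` case:

   .. code-block:: python

      from urllib.parse import unquote

      # UTF-8 escapes are decoded by default.
      text = unquote('/El%20Ni%C3%B1o/')               # '/El Niño/'

      # bytes input is accepted, and the result is still a str.
      from_bytes = unquote(b'/El%20Ni%C3%B1o/')        # '/El Niño/'

      # A different encoding interprets the same octets differently.
      latin1 = unquote('Ni%F1o', encoding='latin-1')   # 'Niño'

      # With errors='replace' (the default), an invalid UTF-8 octet becomes U+FFFD.
      replaced = unquote('%FF')                        # '\ufffd'

      # With errors='strict', the same input raises instead.
      try:
          unquote('%FF', errors='strict')
          strict_failed = False
      except UnicodeDecodeError:
          strict_failed = True

      # '+' is not treated as a space here; unquote_plus() does that.
      plus_kept = unquote('a+b')                       # 'a+b'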

   .. versionchanged:: 3.9
      The *string* parameter supports both :class:`bytes` and :class:`str`
      objects (previously only :class:`str`).


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs with spaces, as required
   for unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.
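
   A sketch showing how :func:`unquote_plus` treats ``+`` and ``%2B``
   differently.  The :exc:`TypeError` for :class:`bytes` input reflects current
   CPython behavior rather than a documented guarantee:

   .. code-block:: python

      from urllib.parse import unquote_plus

      # '+' decodes to a space; an escaped plus (%2B) decodes to a literal '+'.
      value = unquote_plus('a%2Bb+c')                  # 'a+b c'
      spanish = unquote_plus('/El+Ni%C3%B1o/')         # '/El Niño/'

      # bytes input is not supported.
      try:
          unquote_plus(b'a+b')
          bytes_rejected = False
      except TypeError:
          bytes_rejected = True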


.. function:: unquote_to_bytes(string)

   Replace :samp:`%{xx}` escapes with their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.
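
   A minimal sketch of :func:`unquote_to_bytes` with arbitrary inputs:

   .. code-block:: python

      from urllib.parse import unquote_to_bytes

      # Escapes become raw octets; nothing is decoded to text.
      raw = unquote_to_bytes('a%26%EF')          # b'a&\xef'

      # Unescaped non-ASCII characters in a str are encoded as UTF-8 first.
      mixed = unquote_to_bytes('ñ%20')           # b'\xc3\xb1 '

      # bytes input works the same way.
      from_bytes = unquote_to_bytes(b'%41%42')   # b'AB'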


.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string.  If the resulting string is to be used as *data* for a POST
   request with the :func:`~urllib.request.urlopen` function, it must first be
   encoded to bytes; otherwise a :exc:`TypeError` is raised.
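
   A sketch of preparing POST data, assuming the hypothetical endpoint
   ``http://example.com/submit``.  Constructing a
   :class:`~urllib.request.Request` does not contact the server, so nothing is
   actually sent:

   .. code-block:: python

      from urllib.parse import urlencode
      from urllib.request import Request

      # urlencode() returns str; a POST body must be bytes.
      body = urlencode({'name': 'El Niño', 'lang': 'es'})  # 'name=El+Ni%C3%B1o&lang=es'
      data = body.encode('ascii')  # the output of urlencode() is pure ASCII

      # Supplying data makes the request a POST (placeholder URL, not opened here).
      req = Request('http://example.com/submit', data=data)
      method = req.get_method()    # 'POST'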

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function.  By default, :func:`quote_plus` is used, which means spaces are
   quoted as a ``'+'`` character and ``'/'`` characters are encoded as
   ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``).  An alternate function that can be
   passed as *quote_via* is :func:`quote`, which encodes spaces as ``%20``.
   Because *safe* is passed down to *quote_via* and defaults to ``''``,
   ``'/'`` is still encoded unless it is included in *safe*.  For maximum
   control of what is quoted, use ``quote`` and specify a value for *safe*.
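
   The effect of *quote_via* and *safe* on an arbitrary value containing a
   space and a slash can be sketched as:

   .. code-block:: python

      from urllib.parse import quote, urlencode

      params = {'path': '/a b'}

      default = urlencode(params)                                # 'path=%2Fa+b'
      with_quote = urlencode(params, quote_via=quote)            # 'path=%2Fa%20b'
      keep_slash = urlencode(params, quote_via=quote, safe='/')  # 'path=/a%20b'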

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value.  The value element can itself be a sequence (this also applies to
   the values of a mapping); in that case, if the optional parameter *doseq*
   is true, a separate ``key=value`` pair is generated for each element of the
   value sequence, and the pairs are separated by ``'&'``.  The order of
   parameters in the encoded string matches the order of parameter tuples in
   the sequence.
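
   A sketch of how *doseq* changes the output, using arbitrary keys and values.
   With the default ``doseq=False``, a list value is converted with
   :func:`str` and quoted as a single value:

   .. code-block:: python

      from urllib.parse import urlencode

      # A sequence of pairs: repeated keys are allowed and order is preserved.
      pairs = urlencode([('a', 1), ('b', 2), ('a', 3)])    # 'a=1&b=2&a=3'

      # doseq=True expands a sequence value into one pair per element.
      expanded = urlencode({'tag': ['x', 'y'], 'q': 'z'}, doseq=True)
      # 'tag=x&tag=y&q=z'

      # doseq=False quotes str(['x', 'y']) as one value.
      flattened = urlencode({'tag': ['x', 'y']})
      # 'tag=%5B%27x%27%2C+%27y%27%5D'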

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).
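
   A sketch of how *encoding* applies to :class:`str` elements but not to
   :class:`bytes` elements (the inputs are arbitrary):

   .. code-block:: python

      from urllib.parse import urlencode

      # encoding is used when quoting str elements.
      latin1 = urlencode({'name': 'Niño'}, encoding='latin-1')  # 'name=Ni%F1o'

      # bytes elements are quoted as-is, without any encoding step.
      raw = urlencode({b'k': b'\xf1'})                           # 'k=%F1'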

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.
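
   A round-trip sketch with arbitrary data; :func:`parse_qs` always returns a
   list of values for each key:

   .. code-block:: python

      from urllib.parse import parse_qs, parse_qsl, urlencode

      query = urlencode({'tag': ['x', 'y'], 'q': 'a b/c'}, doseq=True)
      # 'tag=x&tag=y&q=a+b%2Fc'

      as_dict = parse_qs(query)    # {'tag': ['x', 'y'], 'q': ['a b/c']}
      as_pairs = parse_qsl(query)  # [('tag', 'x'), ('tag', 'y'), ('q', 'a b/c')]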

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urllib.parse.urlencode` function can be used for generating the query
   string of a URL or data for a POST request.

   .. versionchanged:: 3.2
      *query* supports bytes and string objects.

   .. versionchanged:: 3.5
      Added the *quote_via* parameter.


.. seealso::

   `WHATWG`_ - URL Living Standard
      The WHATWG URL Standard, which defines URLs, domains, IP addresses, the
      application/x-www-form-urlencoded format, and their API.

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Some deviations exist, mostly for backward
      compatibility and for de facto parsing requirements commonly observed in
      major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URLs
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.

.. _WHATWG: https://url.spec.whatwg.org/
