Lines Matching +full:url +full:- +full:parse

1 :mod:`urllib.parse` --- Parse URLs into components
4 .. module:: urllib.parse
5 :synopsis: Parse URLs into or assemble them from components.
7 **Source code:** :source:`Lib/urllib/parse.py`
12 single: URL
13 pair: URL; parsing
14 pair: relative; URL
16 --------------
18 This module defines a standard interface to break Uniform Resource Locator (URL)
20 combine the components back into a URL string, and to convert a "relative URL"
21 to an absolute URL given a "base URL."
24 Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
30 The :mod:`urllib.parse` module defines functions that fall into two broad
31 categories: URL parsing and URL quoting. These are covered in detail in
34 URL Parsing
35 -----------
37 The URL parsing functions focus on splitting a URL string into its components,
38 or on combining URL components into a URL string.
42 Parse a URL into six components, returning a 6-item :term:`named tuple`. This
43 corresponds to the general structure of a URL:
54 >>> from urllib.parse import urlparse
58 >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
59 ... "highlight=params#url-parsing")
62 path='/3/library/urllib.parse.html', params='',
63 query='highlight=params', fragment='url-parsing')
73 'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
77 input is presumed to be a relative URL and thus to start with
83 >>> from urllib.parse import urlparse
95 used only if the URL does not specify one. It should be the same type
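As a minimal sketch of the two points above (how a missing ``//`` affects the netloc, and how the *scheme* argument acts only as a fallback), consider the following. The ``www.example.com`` URLs are illustrative assumptions, not taken from the original text.

```python
from urllib.parse import urlparse

# With a leading '//', the host is recognized as the netloc.
with_slashes = urlparse("//www.example.com:80/docs/index.html")

# Without it, the whole string is treated as a relative path.
without_slashes = urlparse("www.example.com/docs/index.html")

# The scheme argument is only a fallback; a scheme in the URL takes precedence.
fallback = urlparse("//www.example.com/docs", scheme="https")
explicit = urlparse("http://www.example.com/docs", scheme="https")

print(with_slashes.netloc, repr(without_slashes.netloc))
print(fallback.scheme, explicit.scheme)
```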
107 +------------------+-------+-------------------------+------------------------+
110 | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter |
111 +------------------+-------+-------------------------+------------------------+
113 +------------------+-------+-------------------------+------------------------+
115 +------------------+-------+-------------------------+------------------------+
118 +------------------+-------+-------------------------+------------------------+
120 +------------------+-------+-------------------------+------------------------+
122 +------------------+-------+-------------------------+------------------------+
124 +------------------+-------+-------------------------+------------------------+
126 +------------------+-------+-------------------------+------------------------+
128 +------------------+-------+-------------------------+------------------------+
131 +------------------+-------+-------------------------+------------------------+
134 an invalid port is specified in the URL. See section
135 :ref:`urlparse-result-object` for more information on the result object.
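The result attributes can be read by name or by index. The sketch below reuses the URL from the example above; the ``example.com`` URL with an out-of-range port is an illustrative assumption.

```python
from urllib.parse import urlparse

o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
             "highlight=params#url-parsing")

# Fields are available by name or by index.
scheme_by_name = o.scheme
scheme_by_index = o[0]

# hostname and port are derived from the netloc component.
host, port = o.hostname, o.port

# An invalid (here: out-of-range) port results in ValueError.
try:
    urlparse("http://example.com:99999/").port
    port_error = None
except ValueError as exc:
    port_error = exc

print(scheme_by_name, host, port, port_error)
```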
142 ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
153 >>> from urllib.parse import urlparse
164 :func:`urlparse` does not perform validation. See :ref:`URL parsing
165 security <url-parsing-security>` for details.
168 Added IPv6 URL parsing capabilities.
171 The fragment is now parsed for all URL schemes (unless *allow_fragment* is
176 Out-of-range port numbers now raise :exc:`ValueError`, instead of
184 .. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors=…
186 Parse a query string given as a string argument (data of type
187 :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
192 values in percent-encoded queries should be treated as blank strings. A true value
202 percent-encoded sequences into Unicode characters, as accepted by the
212 Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
230 .. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors…
232 Parse a query string given as a string argument (data of type
233 :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
237 values in percent-encoded queries should be treated as blank strings. A true value
247 percent-encoded sequences into Unicode characters, as accepted by the
257 Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
275 Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
276 argument can be any six-item iterable. This may result in a slightly
277 different, but equivalent URL, if the URL that was parsed originally had
284 This is similar to :func:`urlparse`, but does not split the params from the URL.
285 This should generally be used instead of :func:`urlparse` if the more recent URL
287 of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
288 separate the path segments and parameters. This function returns a 5-item
296 +------------------+-------+-------------------------+----------------------+
299 | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter |
300 +------------------+-------+-------------------------+----------------------+
302 +------------------+-------+-------------------------+----------------------+
304 +------------------+-------+-------------------------+----------------------+
306 +------------------+-------+-------------------------+----------------------+
308 +------------------+-------+-------------------------+----------------------+
310 +------------------+-------+-------------------------+----------------------+
312 +------------------+-------+-------------------------+----------------------+
314 +------------------+-------+-------------------------+----------------------+
317 +------------------+-------+-------------------------+----------------------+
320 an invalid port is specified in the URL. See section
321 :ref:`urlparse-result-object` for more information on the result object.
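The difference between :func:`urlsplit` and :func:`urlparse` can be sketched as follows. The URL is an illustrative assumption; it carries a ``;type=a`` parameter on its last path segment.

```python
from urllib.parse import urlparse, urlsplit

url = "http://example.com/path;type=a?x=1#top"

parsed = urlparse(url)  # 6 items: params are split from the path
split = urlsplit(url)   # 5 items: params stay inside the path

print(len(parsed), parsed.path, parsed.params)
print(len(split), split.path)
```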
328 ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
332 control and space characters are stripped from the URL. ``\n``,
333 ``\r`` and tab ``\t`` characters are removed from the URL at any position.
337 :func:`urlsplit` does not perform validation. See :ref:`URL parsing
338 security <url-parsing-security>` for details.
341 Out-of-range port numbers now raise :exc:`ValueError`, instead of
349 ASCII newline and tab characters are stripped from the URL.
352 Leading WHATWG C0 control and space characters are stripped from the URL.
354 .. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
359 complete URL as a string. The *parts* argument can be any five-item
360 iterable. This may result in a slightly different, but equivalent URL, if the
361 URL that was parsed originally had unnecessary delimiters (for example, a ?
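As a rough illustration of :func:`urlunsplit`, the sketch below builds a URL from a plain five-item tuple and then round-trips a URL that ends in a bare ``?``. The ``example.com`` URLs are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Any five-item iterable works, not only a SplitResult.
built = urlunsplit(("https", "example.com", "/a/b", "x=1", "frag"))

# A trailing '?' with an empty query is an unnecessary delimiter,
# so it does not survive the round trip.
round_trip = urlunsplit(urlsplit("http://example.com/path?"))

print(built)
print(round_trip)
```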
365 .. function:: urljoin(base, url, allow_fragments=True)
367 Construct a full ("absolute") URL by combining a "base URL" (*base*) with
368 another URL (*url*). Informally, this uses components of the base URL, in
370 path, to provide missing components in the relative URL. For example:
372 >>> from urllib.parse import urljoin
381 If *url* is an absolute URL (that is, it starts with ``//`` or ``scheme://``),
382 the *url*'s hostname and/or scheme will be present in the result. For example:
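The following sketch shows both cases, plus one possible way of discarding the scheme and hostname of *url* before joining. The ``example.com`` and ``other.example`` names are illustrative assumptions.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = "http://example.com/a/b.html"

relative = urljoin(base, "c.html")                        # resolved against base
protocol_relative = urljoin(base, "//other.example/x/y")  # host replaced
absolute = urljoin(base, "https://other.example/x/y")     # scheme and host replaced

# Keep only path, query, and fragment of the untrusted URL.
parts = urlsplit("//other.example/x/y")
path_only = urlunsplit(("", "", parts.path, parts.query, parts.fragment))
same_host = urljoin(base, path_only)

print(relative, protocol_relative, absolute, same_host, sep="\n")
```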
390 If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
399 .. function:: urldefrag(url)
401 If *url* contains a fragment identifier, return a modified version of *url*
403 string. If there is no fragment identifier in *url*, return *url* unmodified
409 +------------------+-------+-------------------------+----------------------+
412 | :attr:`url` | 0 | URL with no fragment | empty string |
413 +------------------+-------+-------------------------+----------------------+
415 +------------------+-------+-------------------------+----------------------+
417 See section :ref:`urlparse-result-object` for more information on the result
421 Result is a structured object rather than a simple 2-tuple.
423 .. function:: unwrap(url)
425 Extract the URL from a wrapped URL (that is, a string formatted as
426 ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
427 or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
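The four accepted forms can be sketched as follows; the ``example.com`` URL is an illustrative assumption.

```python
from urllib.parse import unwrap

wrapped_forms = [
    "<URL:http://example.com/path>",
    "<http://example.com/path>",
    "URL:http://example.com/path",
    "http://example.com/path",
]

# Every form should reduce to the same bare URL.
unwrapped = [unwrap(form) for form in wrapped_forms]
print(unwrapped)
```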
430 .. _url-parsing-security:
432 URL parsing security
433 --------------------
451 What constitutes a URL is not universally well defined. Different applications
459 .. _parsing-ascii-encoded-bytes:
462 ---------------------------
464 The URL parsing functions were originally designed to operate on character
467 URL parsing functions in this module all operate on :class:`bytes` and
476 :exc:`TypeError` being raised, while attempting to pass in non-ASCII
480 :class:`bytes`, all return values from URL parsing functions provide
485 is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
491 that may contain non-ASCII data will need to do their own decoding from
492 bytes to characters before invoking the URL parsing methods.
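The rules above can be sketched as follows. The URLs are illustrative assumptions, and the exact exception for non-ASCII bytes is taken to be :exc:`UnicodeDecodeError`.

```python
from urllib.parse import urljoin, urlsplit

str_result = urlsplit("http://example.com/path?q=1")
bytes_result = urlsplit(b"http://example.com/path?q=1")

# Results convert between str and bytes with decode() and encode();
# both default to the 'ascii' encoding.
decoded = bytes_result.decode()
encoded = str_result.encode()

# Mixing str and bytes in a single call is rejected.
try:
    urljoin(b"http://example.com/", "page.html")
    mix_error = None
except TypeError as exc:
    mix_error = exc

# Non-ASCII bytes must be decoded by the caller before parsing.
raw = "http://example.com/caf\u00e9".encode("utf-8")
try:
    urlsplit(raw)
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
path = urlsplit(raw.decode("utf-8")).path

print(bytes_result.netloc, decoded.netloc, path)
```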
494 The behaviour described in this section applies only to the URL parsing
495 functions. The URL quoting functions use their own rules when producing
497 individual URL quoting functions.
500 URL parsing functions now accept ASCII encoded byte sequences
503 .. _urlparse-result-object:
505 Structured Parse Results
506 ------------------------
514 .. method:: urllib.parse.SplitResult.geturl()
516 Return the re-combined version of the original URL as a string. This may
517 differ from the original URL in that the scheme may be normalized to lower
523 made to the URL returned by this method.
528 >>> from urllib.parse import urlsplit
529 >>> url = 'HTTP://www.Python.org/doc/#'
530 >>> r1 = urlsplit(url)
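A similar sketch with a different, illustrative URL shows that the normalization happens once and that re-parsing the result changes nothing further:

```python
from urllib.parse import urlsplit

original = "HTTP://Example.COM/Index.html#"

r1 = urlsplit(original)
first = r1.geturl()   # scheme lower-cased, empty fragment dropped

r2 = urlsplit(first)
second = r2.geturl()  # already normalized, so unchanged

print(first)
print(second)
```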
538 The following classes provide the implementations of the structured parse
541 .. class:: DefragResult(url, fragment)
562 The following classes provide the implementations of the parse results when
565 .. class:: DefragResultBytes(url, fragment)
590 URL Quoting
591 -----------
593 The URL quoting functions focus on taking program data and making it safe
594 for use as URL components by quoting special characters and appropriately
595 encoding non-ASCII text. They also support reversing these operations to
596 recreate the original data from the contents of a URL component if that
597 task isn't already covered by the URL parsing functions above.
602 digits, and the characters ``'_.-~'`` are never quoted. By default, this
603 function is intended for quoting the path section of a URL. The optional
605 quoted --- its default value is ``'/'``.
610 Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
614 non-ASCII characters, as accepted by the :meth:`str.encode` method.
615 *encoding* defaults to ``'utf-8'``.
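A short sketch of how *safe* and *encoding* affect :func:`quote`; the input strings are illustrative.

```python
from urllib.parse import quote

default_safe = quote("/El Niño/")           # '/' is kept by default
no_safe = quote("/El Niño/", safe="")       # '/' is quoted as well
latin1 = quote("Niño", encoding="latin-1")  # one byte for the non-ASCII letter
unreserved = quote("a_b.c-d~e")             # these characters are never quoted

print(default_safe, no_safe, latin1, unreserved, sep="\n")
```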
630 quoting HTML form values when building up a query string to go into a URL.
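For comparison with :func:`quote`, a minimal sketch of :func:`quote_plus` on illustrative form values:

```python
from urllib.parse import quote_plus

# Spaces become '+', and '/' is quoted because it is not safe by default here.
value = quote_plus("/El Niño/")

# A literal '+' in the data is quoted so it is not confused with a space.
with_plus = quote_plus("a+b c")

print(value)
print(with_plus)
```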
640 :class:`str`, and does not perform string-to-bytes encoding.
646 .. function:: unquote(string, encoding='utf-8', errors='replace')
648 Replace ``%xx`` escapes with their single-character equivalent.
650 percent-encoded sequences into Unicode characters, as accepted by the
655 *encoding* defaults to ``'utf-8'``.
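The effect of *encoding* and the default ``errors='replace'`` can be sketched as follows; the input strings are illustrative.

```python
from urllib.parse import unquote

utf8 = unquote("/El%20Ni%C3%B1o/")

# %F1 alone is not valid UTF-8, so it becomes the replacement character.
replaced = unquote("Ni%F1o")

# The same escape decodes cleanly when the right encoding is given.
latin1 = unquote("Ni%F1o", encoding="latin-1")

print(utf8, replaced, latin1)
```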
667 .. function:: unquote_plus(string, encoding='utf-8', errors='replace')
679 Replace ``%xx`` escapes with their single-octet equivalent, and return a
684 If it is a :class:`str`, unescaped non-ASCII characters in *string*
685 are encoded into UTF-8 bytes.
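A brief sketch of :func:`unquote_to_bytes` with illustrative inputs:

```python
from urllib.parse import unquote_to_bytes

# Escapes become single octets; no text decoding is attempted.
octets = unquote_to_bytes("a%26%EF")

# An unescaped non-ASCII character in a str input is encoded as UTF-8.
mixed = unquote_to_bytes("ñ%20")

print(octets, mixed)
```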
693 Convert a mapping object or a sequence of two-element tuples, which may
694 contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
705 (``application/x-www-form-urlencoded``). An alternate function that can be
710 When a sequence of two-element tuples is used as the *query*
723 provided in this module to parse query strings into Python data structures.
725 Refer to :ref:`urllib examples <urllib-examples>` to find out how the
726 :func:`urllib.parse.urlencode` function can be used for generating the query
727 string of a URL or data for a POST request.
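As a minimal sketch, the following encodes a mapping with a sequence value, parses it back with :func:`parse_qs` and :func:`parse_qsl`, and prepares bytes suitable as POST data. The field names and values are illustrative assumptions.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

fields = {"q": "url parse", "tag": ["a", "b"]}

# doseq=True emits one key=value pair per element of a sequence value.
query = urlencode(fields, doseq=True)

as_dict = parse_qs(query)    # values are always lists
as_pairs = parse_qsl(query)  # order and repeated keys are preserved

# A sequence of two-element tuples keeps its order.
from_pairs = urlencode([("a", "1"), ("a", "2")])

# POST data must be bytes; the encoded string is plain ASCII.
post_data = query.encode("ascii")

print(query, as_dict, as_pairs, from_pairs, post_data, sep="\n")
```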
738 `WHATWG`_ - URL Living standard
739 Working Group for the URL Standard that defines URLs, domains, IP addresses, the
740 application/x-www-form-urlencoded format, and their API.
742 :rfc:`3986` - Uniform Resource Identifiers
743 This is the current standard (STD66). Any changes to the urllib.parse module
745 mostly for backward compatibility purposes and for certain de-facto
748 :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
751 :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
755 :rfc:`2368` - The mailto URL scheme.
756 Parsing requirements for mailto URL schemes.
758 :rfc:`1808` - Relative Uniform Resource Locators
760 relative URL, including a fair number of "Abnormal Examples" which govern the
763 :rfc:`1738` - Uniform Resource Locators (URL)
766 .. _WHATWG: https://url.spec.whatwg.org/