:mod:`!urllib.parse` --- Parse URLs into components
===================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``itms-services``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtsps``, ``rtspu``,
``sftp``, ``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``,
``telnet``, ``wais``, ``ws``, ``wss``.

.. impl-detail::

   The inclusion of the ``itms-services`` URL scheme can prevent an app from
   passing Apple's App Store review process for the macOS and iOS App Stores.
   Handling for the ``itms-services`` scheme is always removed on iOS; on
   macOS, it *may* be removed if CPython has been built with the
   :option:`--with-app-store-compliance` option.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

This module's functions use the deprecated term ``netloc`` (or ``net_loc``),
which was introduced in :rfc:`1808`. However, this term has been obsoleted by
:rfc:`3986`, which introduced the term ``authority`` as its replacement.
The use of ``netloc`` is continued for backward compatibility.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-item :term:`named tuple`. This
   corresponds to the general structure of a URL:
   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and %
   escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse("scheme://netloc/path;parameters?query#fragment")
      ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
                  query='query', fragment='fragment')
      >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
      ...              "highlight=params#url-parsing")
      >>> o
      ParseResult(scheme='http', netloc='docs.python.org:80',
                  path='/3/library/urllib.parse.html', params='',
                  query='highlight=params', fragment='url-parsing')
      >>> o.scheme
      'http'
      >>> o.netloc
      'docs.python.org:80'
      >>> o.hostname
      'docs.python.org'
      >>> o.port
      80
      >>> o._replace(fragment="").geturl()
      'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'

   Following the syntax specifications in :rfc:`1808`, urlparse recognizes
   a netloc only if it is properly introduced by '//'. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.
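
   The effect of the two optional arguments can be sketched as follows. This
   is an illustrative example only; the ``www.example.com`` URLs and the
   variable names are made up for the purpose:

   .. code-block:: python

      from urllib.parse import urlparse

      # The URL has no scheme, so the *scheme* argument supplies the default.
      with_default = urlparse('//www.example.com/index.html', scheme='https')
      print(with_default.scheme)    # https

      # A scheme present in the URL takes precedence over the default.
      explicit = urlparse('http://www.example.com/index.html', scheme='https')
      print(explicit.scheme)        # http

      # With allow_fragments=False the '#top' text stays in the path
      # (hypothetical URL without a query component).
      no_frag = urlparse('http://www.example.com/index.html#top',
                         allow_fragments=False)
      print(no_frag.path)           # /index.html#top
      print(repr(no_frag.fragment)) # ''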

   The return value is a :term:`named tuple`, which means that its items can
   be accessed by index or as named attributes, which are:

   +------------------+-------+-------------------------+------------------------+
   | Attribute        | Index | Value                   | Value if not present   |
   +==================+=======+=========================+========================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter     |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`params`   | 3     | Parameters for last     | empty string           |
   |                  |       | path element            |                        |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`query`    | 4     | Query component         | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`fragment` | 5     | Fragment identifier     | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`username` |       | User name               | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`password` |       | Password                | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`          |
   |                  |       | if present              |                        |
   +------------------+-------+-------------------------+------------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   As is the case with all named tuples, the subclass has a few additional methods
   and attributes that are particularly useful. One such method is :meth:`_replace`.
   The :meth:`_replace` method will return a new ParseResult object replacing specified
   fields with new values.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      >>> u
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> u._replace(scheme='http')
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')

   .. warning::

      :func:`urlparse` does not perform validation. See :ref:`URL parsing
      security <url-parsing-security>` for details.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, an allowlist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than
   *max_num_fields* fields are read.

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.


   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separator. This has been changed to allow only a single
      separator key, with ``&`` as the default separator.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than
   *max_num_fields* fields are read.

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separator. This has been changed to allow only a single
      separator key, with ``&`` as the default separator.


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-item
   :term:`named tuple`::

      (addressing scheme, network location, path, query, fragment identifier).
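
   The difference from :func:`urlparse` can be illustrated with a URL that
   carries parameters on more than one path segment. This is a minimal
   sketch; the URL is a made-up example:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = 'http://www.example.com/a;x=1/b;y=2?q=1#frag'

      # urlparse() splits off only the parameters of the last path segment.
      parsed = urlparse(url)
      print(parsed.path)     # /a;x=1/b
      print(parsed.params)   # y=2

      # urlsplit() leaves every ';' parameter inside the path.
      split = urlsplit(url)
      print(split.path)      # /a;x=1/b;y=2
      print(len(split))      # 5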

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.
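
   The two error cases above can be sketched as follows. The URLs and the
   flag variables are made up for illustration, and the timing noted in the
   comments reflects current CPython behavior rather than a documented
   guarantee:

   .. code-block:: python

      from urllib.parse import urlsplit

      # Parsing succeeds; the port is only checked when the attribute is read.
      result = urlsplit('http://www.example.com:99999/')
      try:
          result.port
          port_ok = True
      except ValueError as exc:
          port_ok = False
          print('invalid port:', exc)

      # A '[' without a matching ']' in the netloc is rejected by urlsplit().
      try:
          urlsplit('http://[::1/path')
          brackets_ok = True
      except ValueError as exc:
          brackets_ok = False
          print('invalid netloc:', exc)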

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
   control and space characters are stripped from the URL. ``\n``,
   ``\r`` and tab ``\t`` characters are removed from the URL at any position.

   .. warning::

      :func:`urlsplit` does not perform validation. See :ref:`URL parsing
      security <url-parsing-security>` for details.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

   .. versionchanged:: 3.10
      ASCII newline and tab characters are stripped from the URL.

   .. versionchanged:: 3.12
      Leading WHATWG C0 control and space characters are stripped from the URL.

.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, it starts with ``//`` or ``scheme://``),
      the *url*'s hostname and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

   .. warning::

      Because an absolute URL may be passed as the ``url`` parameter, it is
      generally **not secure** to use ``urljoin`` with an attacker-controlled
      ``url``. For example, in
      ``urljoin("https://website.com/users/", username)``, if ``username`` can
      contain an absolute URL, the result of ``urljoin`` will be the absolute
      URL.


   .. versionchanged:: 3.5

      Behavior updated to match the semantics defined in :rfc:`3986`.


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.
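
   Both cases can be sketched as follows. This is an illustrative example
   only; the URLs and variable names are made up:

   .. code-block:: python

      from urllib.parse import urldefrag

      # A URL with a fragment: the two parts are returned separately.
      with_fragment = urldefrag('http://www.example.com/doc.html#section-2')
      print(with_fragment.url)        # http://www.example.com/doc.html
      print(with_fragment.fragment)   # section-2

      # A URL without a fragment comes back unmodified, paired with ''.
      url, fragment = urldefrag('http://www.example.com/doc.html')
      print(url)                      # http://www.example.com/doc.html
      print(repr(fragment))           # ''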

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. function:: unwrap(url)

   Extract the URL from a wrapped URL (that is, a string formatted as
   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
   or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
   without changes.

.. _url-parsing-security:

URL parsing security
--------------------

The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation** of
inputs. They may not raise errors on inputs that other applications consider
invalid. They may also succeed on some inputs that might not be considered
URLs elsewhere. Their purpose is for practical functionality rather than
purity.

Instead of raising an exception on unusual input, they may instead return some
component parts as empty strings. Or components may contain more than perhaps
they should.

We recommend that users of these APIs code defensively wherever the returned
values may be used with security implications. Do some verification within your
code before trusting a returned component part. Does that ``scheme`` make
sense? Is that a sensible ``path``? Is there anything strange about that
``hostname``? etc.

What constitutes a URL is not universally well defined. Different applications
have different needs and desired constraints. For instance, the living `WHATWG
spec`_ describes what user-facing web clients such as a web browser require,
while :rfc:`3986` is more general. These functions incorporate some aspects of
both, but cannot be claimed compliant with either. The APIs and existing user
code with expectations on specific behaviors predate both standards, leading us
to be very cautious about making API behavior changes.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the :samp:`%{xx}` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted --- its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces with plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace :samp:`%{xx}` escapes with their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

   .. versionchanged:: 3.9
      *string* parameter supports bytes and str objects (previously only str).


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs with spaces, as required
   for unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace :samp:`%{xx}` escapes with their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.


.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resultant string is to be used as the *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and '/' characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``
   and not encode '/' characters. For maximum control of what is quoted, use
   ``quote`` and specify a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.
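
   The behavior of *doseq* and *quote_via* described above can be sketched as
   follows. This is an illustrative example only; the query data and variable
   names are made up:

   .. code-block:: python

      from urllib.parse import parse_qs, quote, urlencode

      query = [('tag', ['python', 'url parsing']), ('page', 1)]

      # doseq=True emits one key=value pair per element of a sequence value.
      plus_encoded = urlencode(query, doseq=True)
      print(plus_encoded)      # tag=python&tag=url+parsing&page=1

      # quote_via=quote encodes spaces as %20 instead of '+'.
      percent_encoded = urlencode(query, doseq=True, quote_via=quote)
      print(percent_encoded)   # tag=python&tag=url%20parsing&page=1

      # parse_qs() reverses the encoding; every value comes back as a
      # list of strings, so the integer 1 is returned as '1'.
      decoded = parse_qs(plus_encoded)
      print(decoded)           # {'tag': ['python', 'url parsing'], 'page': ['1']}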

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urllib.parse.urlencode` function can be used for generating the query
   string of a URL or data for a POST request.

   .. versionchanged:: 3.2
      *query* supports bytes and string objects.

   .. versionchanged:: 3.5
      Added the *quote_via* parameter.


.. seealso::

   `WHATWG`_ - URL Living standard
      Working Group for the URL Standard that defines URLs, domains, IP addresses, the
      application/x-www-form-urlencoded format, and their API.

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to urllib.parse module
      should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.

.. _WHATWG: https://url.spec.whatwg.org/