:mod:`!urllib.request` --- Extensible library for opening URLs
==============================================================

.. module:: urllib.request
   :synopsis: Extensible library for opening URLs.

.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>

**Source code:** :source:`Lib/urllib/request.py`

--------------

The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

.. seealso::

   The `Requests package <https://requests.readthedocs.io/en/master/>`_
   is recommended for a higher-level HTTP client interface.

.. warning::

   On macOS it is unsafe to use this module in programs using
   :func:`os.fork` because the :func:`getproxies` implementation for
   macOS uses a higher-level system API. Set the environment variable
   ``no_proxy`` to ``*`` to avoid this problem
   (for example, ``os.environ["no_proxy"] = "*"``).

.. include:: ../includes/wasm-notavail.rst

The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout], *, context=None)

   Open *url*, which can be either a string containing a valid, properly
   encoded URL, or a :class:`Request` object.

   *data* must be an object specifying additional data to be sent to the
   server, or ``None`` if no such data is needed. See :class:`Request`
   for details.

   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection:close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). The timeout only
   applies to HTTP, HTTPS and FTP connections.
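
   For example, a ``data:`` URL can be opened without any network access,
   because the default opener includes :class:`DataHandler`. The URL below is
   an illustrative assumption; *timeout* is accepted here but, as noted above,
   only has an effect for HTTP, HTTPS and FTP connections::

      import urllib.request

      # Illustrative data: URL; it is decoded locally by DataHandler.
      url = "data:text/plain;charset=utf-8,Hello%2C%20world"

      # The returned object is a context manager; leaving the block closes it.
      with urllib.request.urlopen(url, timeout=10) as resp:
          body = resp.read().decode("utf-8")
          content_type = resp.headers["Content-Type"]

      print(body)          # Hello, world
      print(content_type)  # text/plain;charset=utf-8
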

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~http.client.HTTPSConnection`
   for more details.

   This function always returns an object which can work as a
   :term:`context manager` and has the properties *url*, *headers*, and *status*.
   See :class:`urllib.response.addinfourl` for more detail on these properties.

   For HTTP and HTTPS URLs, this function returns a slightly modified
   :class:`http.client.HTTPResponse` object. In addition to the three new
   properties above, the msg attribute contains the same information as the
   :attr:`~http.client.HTTPResponse.reason` attribute (the reason phrase
   returned by the server) instead of the response headers, as specified in
   the documentation for :class:`~http.client.HTTPResponse`.

   For FTP, file, and data URLs and requests explicitly handled by legacy
   :class:`URLopener` and :class:`FancyURLopener` classes, this function
   returns a :class:`urllib.response.addinfourl` object.

   Raises :exc:`~urllib.error.URLError` on protocol errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`!http_proxy` is set),
   :class:`ProxyHandler` is installed by default and ensures that requests are
   handled through the proxy.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urllib.request.urlopen` corresponds to the old
   ``urllib2.urlopen``. Proxy handling, which was done by passing a dictionary
   parameter to ``urllib.urlopen``, can be obtained by using
   :class:`ProxyHandler` objects.

   .. audit-event:: urllib.Request fullurl,data,headers,method urllib.request.urlopen

      The default opener raises an :ref:`auditing event <auditing>`
      ``urllib.Request`` with arguments ``fullurl``, ``data``, ``headers``,
      ``method`` taken from the request object.

   .. versionchanged:: 3.2
      *cafile* and *capath* were added.

      HTTPS virtual hosts are now supported if possible (that is, if
      :const:`ssl.HAS_SNI` is true).

      *data* can be an iterable object.

   .. versionchanged:: 3.3
      *cadefault* was added.

   .. versionchanged:: 3.4.3
      *context* was added.

   .. versionchanged:: 3.10
      HTTPS connections now send an ALPN extension with protocol indicator
      ``http/1.1`` when no *context* is given. Custom *context* should set
      ALPN protocols with :meth:`~ssl.SSLContext.set_alpn_protocols`.

   .. versionchanged:: 3.13
      Removed the *cafile*, *capath* and *cadefault* parameters; use the
      *context* parameter instead.


.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use
   that opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`~urllib.request.urlopen`. The code does not check for a real
   :class:`OpenerDirector`, and any class with the appropriate interface will
   work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters).
   Instances of the following classes are placed in front of the *handler*\s,
   unless the *handler*\s contain them, instances of them, or subclasses of
   them: :class:`ProxyHandler` (if proxy settings are detected),
   :class:`UnknownHandler`, :class:`HTTPHandler`,
   :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
   :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`,
   :class:`DataHandler`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   attribute to modify its position in the handlers list.


.. function:: pathname2url(path)

   Convert the given local path to a ``file:`` URL. This function uses
   :func:`~urllib.parse.quote` to encode the path. For historical
   reasons, the return value omits the ``file:`` scheme prefix. This example
   shows the function being used on Windows::

      >>> from urllib.request import pathname2url
      >>> path = 'C:\\Program Files'
      >>> 'file:' + pathname2url(path)
      'file:///C:/Program%20Files'


.. function:: url2pathname(url)

   Convert the given ``file:`` URL to a local path. This function uses
   :func:`~urllib.parse.unquote` to decode the URL. For historical reasons,
   the given value *must* omit the ``file:`` scheme prefix. This example shows
   the function being used on Windows::

      >>> from urllib.request import url2pathname
      >>> url = 'file:///C:/Program%20Files'
      >>> url2pathname(url.removeprefix('file:'))
      'C:\\Program Files'

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings.
   It first scans the environment for variables named ``<scheme>_proxy``
   (case-insensitively) on all operating systems; if it finds none, it looks
   for proxy information in the System Configuration framework on macOS and
   in the Windows registry on Windows.
   If both lowercase and uppercase environment variables exist (and disagree),
   lowercase is preferred.

   .. note::

      If the environment variable ``REQUEST_METHOD`` is set, which usually
      indicates your script is running in a CGI environment, the environment
      variable ``HTTP_PROXY`` (uppercase ``_PROXY``) will be ignored. This is
      because that variable can be injected by a client using the "Proxy:" HTTP
      header. If you need to use an HTTP proxy in a CGI environment, either use
      ``ProxyHandler`` explicitly, or make sure the variable name is in
      lowercase (or at least the ``_proxy`` suffix).


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid, properly encoded URL.

   *data* must be an object specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*. The supported object
   types include bytes, file-like objects, and iterables of bytes-like objects.
   If neither a ``Content-Length`` nor a ``Transfer-Encoding`` header field
   has been provided, :class:`HTTPHandler` will set these headers according
   to the type of *data*. ``Content-Length`` will be used to send
   bytes objects, while ``Transfer-Encoding: chunked`` as specified in
   :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.

   For an HTTP POST request method, *data* should be a buffer in the
   standard :mimetype:`application/x-www-form-urlencoded` format.
   The :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns an ASCII string in this format. It should
   be encoded to bytes before being used as the *data* parameter.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header value, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is ``"Python-urllib/X.Y"``,
   where *X.Y* is the Python version (for example, ``"Python-urllib/2.6"``
   on Python 2.6).
   All header keys are sent in camel case.

   An appropriate ``Content-Type`` header should be included if the *data*
   argument is present. If this header has not been provided and *data*
   is not ``None``, ``Content-Type: application/x-www-form-urlencoded`` will
   be added as a default.

   The next two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve.
   For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be ``True``.

   *method* should be a string that indicates the HTTP request method that
   will be used (e.g. ``'HEAD'``). If provided, its value is stored in the
   :attr:`~Request.method` attribute and is used by :meth:`get_method`.
   The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
   Subclasses may indicate a different default method by setting the
   :attr:`~Request.method` attribute in the class itself.

   .. note::
      The request will not work as expected if the data object is unable
      to deliver its content more than once (e.g. a file or an iterable
      that can produce the content only once) and the request is retried
      for HTTP redirects or authentication. The *data* is sent to the
      HTTP server immediately after the headers. There is no support for
      a 100-continue expectation in the library.

   .. versionchanged:: 3.3
      The *method* argument was added to the :class:`Request` class.

   .. versionchanged:: 3.4
      Default :attr:`Request.method` may be indicated at the class level.

   .. versionchanged:: 3.6
      Do not raise an error if the ``Content-Length`` has not been
      provided and *data* is neither ``None`` nor a bytes object.
      Fall back to using chunked transfer encoding instead.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`~urllib.error.HTTPError` exceptions.
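
   As a sketch of that behavior, the following example installs a
   hypothetical ``FakeHTTPHandler`` that answers every ``http://`` URL with a
   canned 404 response instead of touching the network (the handler name,
   URL and body are illustrative assumptions). The default
   :class:`HTTPErrorProcessor` passes the non-2xx response to
   :meth:`OpenerDirector.error`, and this class's catch-all turns it into an
   :exc:`~urllib.error.HTTPError`::

      import email.message
      import io
      import urllib.error
      import urllib.request
      import urllib.response


      class FakeHTTPHandler(urllib.request.HTTPHandler):
          """Hypothetical handler: every http:// URL gets a canned 404."""

          def http_open(self, req):
              body = io.BytesIO(b"no such page")
              resp = urllib.response.addinfourl(
                  body, email.message.Message(), req.full_url, 404)
              # HTTPErrorProcessor reads the reason phrase from .msg.
              resp.msg = "Not Found"
              return resp


      # Subclassing HTTPHandler keeps build_opener() from adding the real one;
      # ProxyHandler({}) turns off proxy autodetection.
      opener = urllib.request.build_opener(
          urllib.request.ProxyHandler({}), FakeHTTPHandler)

      try:
          opener.open("http://example.com/missing")
      except urllib.error.HTTPError as err:
          status = (err.code, err.reason)
          err.close()

      print(status)  # (404, 'Not Found')
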


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   ``<protocol>_proxy``. If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a macOS environment proxy information
   is retrieved from the System Configuration Framework.

   To disable proxy autodetection, pass an empty dictionary.

   The :envvar:`no_proxy` environment variable can be used to specify hosts
   which shouldn't be reached via proxy; if set, it should be a comma-separated
   list of hostname suffixes, optionally with ``:port`` appended, for example
   ``cern.ch,ncsa.uiuc.edu,some.host:8080``.

   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.request.getproxies`.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: HTTPPasswordMgrWithPriorAuth()

   A variant of :class:`HTTPPasswordMgrWithDefaultRealm` that also has a
   database of ``uri -> is_authenticated`` mappings. It can be used by a
   basic authentication handler to determine when to send authentication
   credentials immediately instead of waiting for a ``401`` response first.

   .. versionadded:: 3.5


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported. If *password_mgr* also provides ``is_authenticated`` and
   ``update_authenticated`` methods (see
   :ref:`http-password-mgr-with-prior-auth`), then the handler will use the
   ``is_authenticated`` result for a given URI to determine whether or not to
   send authentication credentials with the request. If ``is_authenticated``
   returns ``True`` for the URI, credentials are sent. If ``is_authenticated``
   is ``False``, credentials are not sent, and then if a ``401`` response is
   received the request is re-sent with the authentication credentials. If
   authentication succeeds, ``update_authenticated`` is called to set
   ``is_authenticated`` to ``True`` for the URI, so that subsequent requests to
   the URI or any of its super-URIs will automatically include the
   authentication credentials.

   .. versionadded:: 3.5
      Added ``is_authenticated`` support.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. :class:`HTTPBasicAuthHandler` will raise a :exc:`ValueError`
   when presented with an unsupported authentication scheme.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.
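
As a sketch of how the password managers and authentication handlers fit
together, the following example registers credentials for a hypothetical
``https://example.com/api/`` location (the URL, user name and password are
illustrative assumptions) and shows which lookups match. Nothing is sent over
the network; with this manager class, :class:`HTTPBasicAuthHandler` only
consults the manager after a server answers with a ``401`` challenge::

   import urllib.request

   mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
   # realm=None makes this entry the catch-all for any realm at this URI.
   mgr.add_password(None, "https://example.com/api/", "alice", "s3cret")

   # A lookup matches the stored URI and any URI below it on the same host.
   hit = mgr.find_user_password("Some Realm", "https://example.com/api/v1/items")
   miss = mgr.find_user_password("Some Realm", "https://other.example/")
   print(hit)   # ('alice', 's3cret')
   print(miss)  # (None, None)

   # Wire the manager into an opener; it is used when a 401 arrives.
   opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
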


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. When both the Digest Authentication Handler and the Basic
   Authentication Handler are added, Digest Authentication is always tried
   first. If Digest Authentication returns a 40x response again, the response
   is passed to the Basic Authentication handler. This handler method will
   raise a :exc:`ValueError` when presented with an authentication scheme
   other than Digest or Basic.

   .. versionchanged:: 3.3
      Raise :exc:`ValueError` on unsupported Authentication Scheme.



.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

   A class to handle opening of HTTPS URLs. *context* and *check_hostname*
   have the same meaning as in :class:`http.client.HTTPSConnection`.

   .. versionchanged:: 3.2
      *context* and *check_hostname* were added.


.. class:: FileHandler()

   Open local files.

.. class:: DataHandler()

   Open data URLs.

   .. versionadded:: 3.4

.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. The class also defines several
public attributes that clients can use to inspect the parsed request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

   .. versionchanged:: 3.4

      :attr:`Request.full_url` is a property with a setter, a getter and a
      deleter. Getting :attr:`~Request.full_url` returns the original request
      URL with the fragment, if it was present.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

   .. versionchanged:: 3.4
      Changing the value of :attr:`Request.data` now deletes the
      ``Content-Length`` header if it was previously set or calculated.

.. attribute:: Request.unverifiable

   Boolean indicating whether the request is unverifiable as defined
   by :rfc:`2965`.

.. attribute:: Request.method

   The HTTP request method to use.
   By default its value is :const:`None`,
   which means that :meth:`~Request.get_method` will do its normal computation
   of the method to be used. Its value can be set (thus overriding the default
   computation in :meth:`~Request.get_method`) either by providing a default
   value by setting it at the class level in a :class:`Request` subclass, or by
   passing a value to the :class:`Request` constructor via the *method*
   argument.

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      A default value can now be set in subclasses; previously it could only
      be set via the constructor argument.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. If
   :attr:`Request.method` is not ``None``, return its value, otherwise return
   ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not.
   This is only meaningful for HTTP requests.

   .. versionchanged:: 3.3
      :meth:`get_method` now looks at the value of :attr:`Request.method`.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header. Note that headers added using
   this method are also added to redirected requests.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).
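
The following sketch builds a :class:`Request` for a hypothetical
``http://example.com/search`` endpoint and inspects it without sending
anything. It shows the parsed attributes, the method chosen by
:meth:`~Request.get_method`, and one detail of the header methods above:
the :class:`Request` object stores header names in :meth:`str.capitalize`
form (``User-agent``), and :meth:`~Request.has_header` and
:meth:`~Request.get_header` compare names exactly::

   import urllib.parse
   import urllib.request

   # Hypothetical form fields and endpoint, for illustration only.
   form = urllib.parse.urlencode({"q": "urllib", "page": 2}).encode("ascii")
   req = urllib.request.Request(
       "http://example.com/search?x=1#top",
       data=form,
       headers={"user-agent": "demo-client/0.1"},
   )

   print(req.get_method())    # POST, because data is not None
   print(req.type, req.host)  # http example.com
   print(req.selector)        # /search?x=1
   print(req.full_url)        # http://example.com/search?x=1#top

   print(req.has_header("User-agent"))  # True
   print(req.has_header("User-Agent"))  # False: lookups are not normalized
   print(req.get_header("User-agent"))  # demo-client/0.1
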


.. method:: Request.remove_header(header)

   Remove the named header from the request instance (both from regular and
   unredirected headers).

   .. versionadded:: 3.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.

   .. versionchanged:: 3.4
      Returns :attr:`Request.full_url`.


.. method:: Request.set_proxy(host, type)

   Prepare the request for connecting to a proxy server. The *host* and *type*
   will replace those of the instance, and the instance's selector will be the
   original URL given in the constructor.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.

.. versionchanged:: 3.4
   The request methods add_data, has_data, get_data, get_type, get_host,
   get_selector, get_origin_req_host and is_unverifiable, which were deprecated
   in 3.3, have been removed.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched for, and added to the possible chains (note that HTTP errors are a
   special case). Note that, in the following, *protocol* should be replaced
   with the actual protocol to handle, for example :meth:`http_response` would
   be the HTTP protocol response handler. Also *type* should be replaced with
   the actual HTTP code, for example :meth:`http_error_404` would handle HTTP
   404 errors.

   * :meth:`!<protocol>_open` --- signal that the handler knows how to open *protocol*
     URLs.

     See |protocol_open|_ for more information.

   * :meth:`!http_error_\<type\>` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

     See |http_error_nnn|_ for more information.

   * :meth:`!<protocol>_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`!<protocol>_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

     See |protocol_request|_ for more information.

   * :meth:`!<protocol>_response` --- signal that the handler knows how to
     post-process *protocol* responses.

     See |protocol_response|_ for more information.

.. |protocol_open| replace:: :meth:`BaseHandler.<protocol>_open`
.. |http_error_nnn| replace:: :meth:`BaseHandler.http_error_\<nnn\>`
.. |protocol_request| replace:: :meth:`BaseHandler.<protocol>_request`
.. |protocol_response| replace:: :meth:`BaseHandler.<protocol>_response`

.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout only applies to HTTP, HTTPS and
   FTP connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific).
   The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`!http_error_\<type\>`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages, described below. Within each
stage, the order in which the handler methods are called is determined by
sorting the handler instances (by their :attr:`!handler_order` attribute).

#. Every handler with a method named like :meth:`!<protocol>_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`!<protocol>_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually
   :exc:`~urllib.error.URLError`). Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`~BaseHandler.default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`!<protocol>_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`~BaseHandler.unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`!<protocol>_response` has that
   method called to post-process the response.


.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attribute and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`!<protocol>_request` or :meth:`!<protocol>_response` methods are named
   :class:`!\*Processor`; all others are named :class:`!\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`~OpenerDirector.open` method of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`~urllib.error.URLError`, unless a truly exceptional
   thing happens (for example, :exc:`MemoryError` should not be mapped to
   :exc:`~urllib.error.URLError`).

   This method will be called before any protocol-specific open method.


.. _protocol_open:
.. method:: BaseHandler.<protocol>_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`~BaseHandler.default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.
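
   To illustrate how :class:`OpenerDirector` dispatches on these method names,
   the sketch below registers a hypothetical ``EchoHandler`` for a made-up
   ``echo:`` scheme (both are assumptions for illustration). Its
   ``echo_open`` method handles ``echo:`` URLs, while a scheme that no handler
   claims falls through to the default :class:`UnknownHandler`, whose
   ``unknown_open`` raises :exc:`~urllib.error.URLError`::

      import email.message
      import io
      import urllib.error
      import urllib.request
      import urllib.response


      class EchoHandler(urllib.request.BaseHandler):
          """Hypothetical handler that returns the rest of an echo: URL."""

          def echo_open(self, req):
              # Everything after "echo:" becomes the response body.
              payload = req.full_url.split(":", 1)[1].encode("utf-8")
              return urllib.response.addinfourl(
                  io.BytesIO(payload), email.message.Message(), req.full_url)


      opener = urllib.request.build_opener(EchoHandler)

      with opener.open("echo:hello") as resp:
          echoed = resp.read()
      print(echoed)  # b'hello'

      # No handler defines nosuch_open, so UnknownHandler.unknown_open runs.
      try:
          opener.open("nosuch:whatever")
      except urllib.error.URLError as err:
          reason = err.reason
      print(reason)  # unknown url type: nosuch
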

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. _http_error_nnn:
.. method:: BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`~BaseHandler.http_error_default`.


.. _protocol_request:
.. method:: BaseHandler.<protocol>_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. _protocol_response:
.. method:: BaseHandler.<protocol>_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`~urllib.error.HTTPError` is raised. See :rfc:`2616` for
   details of the precise meanings of the various redirection codes.

   An :exc:`~urllib.error.HTTPError` exception is raised, as a security
   consideration, if the :class:`HTTPRedirectHandler` is presented with a
   redirected URL which is not an HTTP, HTTPS or FTP URL.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`!http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`!http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`~urllib.error.HTTPError` if
   no other handler should try to handle this URL, or return ``None`` if you
   can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user.
      In reality, browsers do allow automatic redirection of these responses,
      changing the ``POST`` to a ``GET``, and the default implementation
      reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL.  This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.  It does not allow changing the request method from ``POST``
   to ``GET``.


.. method:: HTTPRedirectHandler.http_error_308(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'permanent redirect'
   response.  It does not allow changing the request method from ``POST``
   to ``GET``.

   .. versionadded:: 3.11


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.<protocol>_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`!<protocol>_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor.
   The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI or a sequence of URIs.  *realm*, *user* and
   *passwd* must be strings.  This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication is requested for *realm* at any of
   the given URIs or at a URI below one of them.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get the user/password for the given realm and URI, if any.  This method will
   return ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


.. _http-password-mgr-with-prior-auth:

HTTPPasswordMgrWithPriorAuth Objects
------------------------------------

This password manager extends :class:`HTTPPasswordMgrWithDefaultRealm` to support
tracking URIs for which authentication credentials should always be sent.


.. method:: HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, \
            passwd, is_authenticated=False)

   *realm*, *uri*, *user*, *passwd* are as for
   :meth:`HTTPPasswordMgr.add_password`.  *is_authenticated* sets the initial
   value of the ``is_authenticated`` flag for the given URI or list of URIs.
   If *is_authenticated* is specified as ``True``, *realm* is ignored.


.. method:: HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

   Same as for :class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgrWithPriorAuth.update_authenticated(uri, \
            is_authenticated=False)

   Update the ``is_authenticated`` flag for the given *uri* or list
   of URIs.


.. method:: HTTPPasswordMgrWithPriorAuth.is_authenticated(authuri)

   Return the current state of the ``is_authenticated`` flag for
   the given URI.


.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and retrying
   the request.  *authreq* should be the name of the response header that carries
   the information about the realm, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``).  In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the response header that carries the
   information about the realm, *host* should be the host to authenticate to,
   *req* should be the (failed) :class:`Request` object, and *headers* should be
   the error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``.

   .. versionchanged:: 3.2
      This method is applicable only for local hostnames.  When a remote
      hostname is given, a :exc:`~urllib.error.URLError` is raised.


.. _data-handler-objects:

DataHandler Objects
-------------------

.. method:: DataHandler.data_open(req)

   Read a data URL.  This kind of URL contains the content encoded in the URL
   itself.  The data URL syntax is specified in :rfc:`2397`.  This implementation
   ignores whitespace in base64-encoded data URLs, so the URL may be wrapped
   in whatever source file it comes from.  However, although some browsers
   tolerate missing padding at the end of a base64-encoded data URL, this
   implementation will raise a :exc:`ValueError` in that case.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*.  If the URL does not include a user
   name and password, the login is done with an empty username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set the timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set the maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open(req)

   Raise a :exc:`~urllib.error.URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 2xx (success) status codes, the response object is returned immediately.

   For all other status codes, this simply passes the job on to the
   :meth:`!http_error_\<type\>` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`~urllib.error.HTTPError` if no other handler handles the error.


.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.


.. _urllib-request-examples:

Examples
--------

In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(300))
   ...
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that the ``read()`` method of the object returned by :func:`urlopen`
returns a bytes object.  This is because there is no way for :func:`urlopen`
to automatically determine the encoding of the byte stream it receives from
the HTTP server.  In general, a program will decode the returned bytes object
to a string once it determines or guesses the appropriate encoding.

The following W3C document, https://www.w3.org/International/O-charset\ , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::

   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(100).decode('utf-8'))
   ...
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

It is also possible to achieve the same result without using the
:term:`context manager` approach, in which case the response should be
closed explicitly. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm
   >>> f.close()

In the following example, we send a data stream to the stdin of a CGI script
and read the data it returns to us.  Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> with urllib.request.urlopen(req) as f:
   ...     print(f.read().decode('utf-8'))
   ...
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Here is an example of doing a ``PUT`` request using :class:`Request`::

   import urllib.request
   DATA = b'some data'
   req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
   with urllib.request.urlopen(req) as f:
       pass
   print(f.status)
   print(f.reason)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   # The credentials are only sent for URIs at or below the one registered above.
   urllib.request.urlopen('https://mahler:8092/site-updates.py')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`.  By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved.  For example, the :envvar:`!http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib.request
   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   # Customize the default User-Agent header value:
   req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.
To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`)
are added when the :class:`Request` is passed to :func:`urlopen` (or
:meth:`OpenerDirector.open`).

.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
   >>> with urllib.request.urlopen(url) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses the ``POST`` method instead.  Note that the output of
:func:`urllib.parse.urlencode` is encoded to bytes before it is passed to
:func:`urlopen` as *data*::

   >>> import urllib.request
   >>> import urllib.parse
   >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = data.encode('ascii')
   >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> with opener.open("http://www.python.org") as f:
   ...     f.read().decode('utf-8')
   ...

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> with opener.open("http://www.python.org/") as f:
   ...     f.read().decode('utf-8')
   ...


Legacy interface
----------------

The following functions and classes are ported from the Python 2 module
``urllib`` (as opposed to ``urllib2``).  They might become deprecated at
some point in the future.

.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file.  If the URL
   points to a local file, the object will not be copied unless *filename* is supplied.
   Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`!info` method of the object returned by :func:`urlopen` returned (for
   a remote object).  Exceptions are the same as for :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name).  The third
   argument, if present, is a callable that will be called once on
   establishment of the network connection and once after each block read
   thereafter.  The callable will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file.  The
   total size may be ``-1`` on older FTP servers which do not return a file
   size in response to a retrieval request.

   The following example illustrates the most common usage scenario::

      >>> import urllib.request
      >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
      >>> html = open(local_filename)
      >>> html.close()

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request
   type is ``GET``).
   The *data* argument must be a bytes object in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`~urllib.error.ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header).  This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises the
   exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`!content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check the size
   of the data it has downloaded, and just returns it.  In this case you just have
   to assume that the download was successful.

.. function:: urlcleanup()

   Cleans up temporary files that may have been left behind by previous
   calls to :func:`urlretrieve`.

.. class:: URLopener(proxies=None, **x509)

   .. deprecated:: 3.3

   Base class for opening and reading URLs.  Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely.  Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme.  The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`OSError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol.  This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments.  If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.

      This method always quotes *fullurl* using :func:`~urllib.parse.quote`.

   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*.  The return value
      is a tuple consisting of a local filename and either an
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs).  The caller must then open and read the
      contents of *filename*.  If *filename* is not given and the URL refers to a
      local file, the input filename is returned.
      If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL.  If *reporthook* is given, it must be a function accepting three numeric
      parameters: a chunk number, the maximum size in which chunks are read, and the
      total size of the download (-1 if unknown).  It will be called once at the
      start and after each chunk of data is read from the network.  *reporthook* is
      ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``).  The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object.  To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   .. deprecated:: 3.3

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401.  For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL.  For 401 response codes (authentication required), basic HTTP
   authentication is performed.  For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`~BaseHandler.http_error_default` is called,
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user.  In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behavior.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method.  The default implementation asks the
      user for the required information on the controlling terminal.  A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm.  The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The default implementation prompts for this information on the terminal; an
      application should override this method to use an appropriate interaction
      model in the local environment.


:mod:`urllib.request` Restrictions
----------------------------------

.. index::
   pair: HTTP; protocol
   pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, local files, and data URLs.

  .. versionchanged:: 3.4 Added support for data URLs.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol.  This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up.  This means
  that it is difficult to build an interactive web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server.  This may be binary data (such as an image), plain text
  or (for example) HTML.  The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header.  If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory.  This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible.  If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly.  But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off.  This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file.
  If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.



:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.
Functions defined by this module are used internally by the :mod:`urllib.request` module.
The typical response object is a :class:`urllib.response.addinfourl` instance:

.. class:: addinfourl

   .. attribute:: url

      URL of the resource retrieved, commonly used to determine if a redirect was followed.

   .. attribute:: headers

      The headers of the response, in the form of an :class:`~email.message.Message`
      instance.

   .. attribute:: status

      .. versionadded:: 3.9

      Status code returned by the server.

   .. method:: geturl()

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.url`.

   .. method:: info()

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.headers`.

   .. attribute:: code

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.status`.

   .. method:: getcode()

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.status`.
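
As a minimal sketch that needs no network access, the attributes above can be
inspected on the object that :func:`urllib.request.urlopen` returns for a
``data:`` URL.  The helper ``describe_response`` and the sample URL are
hypothetical, chosen only for this illustration.  In CPython,
:class:`~urllib.request.DataHandler` does not supply a status code, so
:attr:`~addinfourl.status` is ``None`` for this kind of URL.

```python
import email.message
import urllib.request


def describe_response(url):
    """Open *url* and collect the addinfourl attributes of interest.

    describe_response is a hypothetical helper used only for this sketch.
    """
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
        return {
            "url": resp.url,           # the URL that was actually retrieved
            "status": resp.status,     # None here: data: URLs carry no status code
            "content_type": resp.headers["Content-Type"],
            "is_message": isinstance(resp.headers, email.message.Message),
            "body": body,
        }


info = describe_response("data:text/plain;charset=utf-8,hello%20world")
print(info["url"])           # data:text/plain;charset=utf-8,hello%20world
print(info["content_type"])  # text/plain;charset=utf-8
print(info["body"])          # b'hello world'
```

The ``%20`` in the URL is percent-decoded by the data handler, which is why the
body contains a literal space.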