:mod:`urllib.request` --- Extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Extensible library for opening URLs.

.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>

**Source code:** :source:`Lib/urllib/request.py`

--------------

The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

.. seealso::

   The `Requests package <https://requests.readthedocs.io/en/master/>`_
   is recommended for a higher-level HTTP client interface.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   *data* must be an object specifying additional data to be sent to the
   server, or ``None`` if no such data is needed.  See :class:`Request`
   for details.

   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection:close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used).  The timeout only
   takes effect for HTTP, HTTPS and FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options.  See :class:`~http.client.HTTPSConnection`
   for more details.

   The optional *cafile* and *capath* parameters specify a set of trusted
   CA certificates for HTTPS requests.  *cafile* should point to a single
   file containing a bundle of CA certificates, whereas *capath* should
   point to a directory of hashed certificate files.  More information can
   be found in :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function always returns an object which can work as a
   :term:`context manager` and has the properties *url*, *headers*, and *status*.
   See :class:`urllib.response.addinfourl` for more detail on these properties.

   For HTTP and HTTPS URLs, this function returns a slightly modified
   :class:`http.client.HTTPResponse` object.  In addition to the three new
   properties above, its ``msg`` attribute contains the same information as the
   :attr:`~http.client.HTTPResponse.reason` attribute (the reason phrase
   returned by the server) instead of the response headers, which is what the
   documentation for :class:`~http.client.HTTPResponse` specifies.

   For FTP, file, and data URLs and requests explicitly handled by legacy
   :class:`URLopener` and :class:`FancyURLopener` classes, this function
   returns a :class:`urllib.response.addinfourl` object.

   Raises :exc:`~urllib.error.URLError` on protocol errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urllib.request.urlopen` corresponds to the old
   ``urllib2.urlopen``.  Proxy handling, which was done by passing a dictionary
   parameter to ``urllib.urlopen``, can be obtained by using
   :class:`ProxyHandler` objects.

   .. audit-event:: urllib.Request fullurl,data,headers,method urllib.request.urlopen

   The default opener raises an :ref:`auditing event <auditing>`
   ``urllib.Request`` with arguments ``fullurl``, ``data``, ``headers``,
   ``method`` taken from the request object.

   .. versionchanged:: 3.2
      *cafile* and *capath* were added.

   .. versionchanged:: 3.2
      HTTPS virtual hosts are now supported if possible (that is, if
      :data:`ssl.HAS_SNI` is true).

   .. versionadded:: 3.2
      *data* can be an iterable object.

   .. versionchanged:: 3.3
      *cadefault* was added.

   .. versionchanged:: 3.4.3
      *context* was added.

   .. versionchanged:: 3.10
      HTTPS connections now send an ALPN extension with protocol indicator
      ``http/1.1`` when no *context* is given.  A custom *context* should set
      ALPN protocols with :meth:`~ssl.SSLContext.set_alpn_protocols`.

   .. deprecated:: 3.6

      *cafile*, *capath* and *cadefault* are deprecated in favor of *context*.
      Please use :meth:`ssl.SSLContext.load_verify_locations` instead, or let
      :func:`ssl.create_default_context` select the system's trusted CA
      certificates for you.


.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use
   that opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`~urllib.request.urlopen`.  The code does not check for a real
   :class:`OpenerDirector`, and any class with the appropriate interface will
   work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given.  *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters).  Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`,
   :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
   :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   attribute to modify its position in the handlers list.


.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL.  This does not produce a complete URL.  The return
   value will already be quoted using the :func:`~urllib.parse.quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax for a
   path.  This does not accept a complete URL.  This function uses
   :func:`~urllib.parse.unquote` to decode *path*.

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings.  On all operating systems, it first scans the environment for
   variables named ``<scheme>_proxy``, matched case-insensitively.  If it
   cannot find any, it looks up proxy information in the System Configuration
   on macOS or in the Windows Systems Registry on Windows.
   If both lowercase and uppercase environment variables exist (and disagree),
   lowercase is preferred.

   .. note::

      If the environment variable ``REQUEST_METHOD`` is set, which usually
      indicates your script is running in a CGI environment, the environment
      variable ``HTTP_PROXY`` (uppercase ``_PROXY``) will be ignored.  This is
      because that variable can be injected by a client using the "Proxy:" HTTP
      header.  If you need to use an HTTP proxy in a CGI environment, either use
      ``ProxyHandler`` explicitly, or make sure the variable name is in
      lowercase (or at least the ``_proxy`` suffix).


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* must be an object specifying additional data to send to the
   server, or ``None`` if no such data is needed.  Currently HTTP
   requests are the only ones that use *data*.  The supported object
   types include bytes, file-like objects, and iterables of bytes-like objects.
   If neither a ``Content-Length`` nor a ``Transfer-Encoding`` header field
   has been provided, :class:`HTTPHandler` will set these headers according
   to the type of *data*.  ``Content-Length`` will be used to send
   bytes objects, while ``Transfer-Encoding: chunked`` as specified in
   :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.

   For an HTTP POST request method, *data* should be a buffer in the
   standard :mimetype:`application/x-www-form-urlencoded` format.  The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns an ASCII string in this format.  It should
   be encoded to bytes before being used as the *data* parameter.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header value, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   An appropriate ``Content-Type`` header should be included if the *data*
   argument is present.  If this header has not been provided and *data*
   is not ``None``, ``Content-Type: application/x-www-form-urlencoded`` will
   be added as a default.

   The next two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`.  It defaults to
   ``http.cookiejar.request_host(self)``.  This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`.  It defaults to ``False``.  An unverifiable
   request is one whose URL the user did not have the option to
   approve.  For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.

   *method* should be a string that indicates the HTTP request method that
   will be used (e.g. ``'HEAD'``).  If provided, its value is stored in the
   :attr:`~Request.method` attribute and is used by :meth:`get_method()`.
   The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
   Subclasses may indicate a different default method by setting the
   :attr:`~Request.method` attribute in the class itself.

   .. note::
      The request will not work as expected if the data object is unable
      to deliver its content more than once (e.g. a file or an iterable
      that can produce the content only once) and the request is retried
      for HTTP redirects or authentication.  The *data* is sent to the
      HTTP server right away after the headers.  There is no support for
      a 100-continue expectation in the library.

   .. versionchanged:: 3.3
      The *method* argument was added to the :class:`Request` class.

   .. versionchanged:: 3.4
      Default :attr:`Request.method` may be indicated at the class level.

   .. versionchanged:: 3.6
      Do not raise an error if the ``Content-Length`` has not been
      provided and *data* is neither ``None`` nor a bytes object.
      Fall back to chunked transfer encoding instead.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together.  It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`~urllib.error.HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy.  If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies.  The default is to read
   the list of proxies from the environment variables
   ``<protocol>_proxy``.  If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a macOS environment proxy information
   is retrieved from the System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.

   The :envvar:`no_proxy` environment variable can be used to specify hosts
   which shouldn't be reached via proxy; if set, it should be a comma-separated
   list of hostname suffixes, optionally with ``:port`` appended, for example
   ``cern.ch,ncsa.uiuc.edu,some.host:8080``.

   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.request.getproxies`.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.  A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: HTTPPasswordMgrWithPriorAuth()

   A variant of :class:`HTTPPasswordMgrWithDefaultRealm` that also has a
   database of ``uri -> is_authenticated`` mappings.  Can be used by a
   BasicAuth handler to determine when to send authentication credentials
   immediately instead of waiting for a ``401`` response first.

   .. versionadded:: 3.5


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy.  *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.  If *password_mgr* also provides ``is_authenticated`` and
   ``update_authenticated`` methods (see
   :ref:`http-password-mgr-with-prior-auth`), then the handler will use the
   ``is_authenticated`` result for a given URI to determine whether or not to
   send authentication credentials with the request.  If ``is_authenticated``
   returns ``True`` for the URI, credentials are sent.  If ``is_authenticated``
   is ``False``, credentials are not sent, and then if a ``401`` response is
   received the request is re-sent with the authentication credentials.  If
   authentication succeeds, ``update_authenticated`` is called to set
   ``is_authenticated`` to ``True`` for the URI, so that subsequent requests to
   the URI or any of its super-URIs will automatically include the
   authentication credentials.

   .. versionadded:: 3.5
      Added ``is_authenticated`` support.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host.  *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported.  :class:`HTTPBasicAuthHandler` will raise a :exc:`ValueError`
   when presented with an unsupported authentication scheme.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy.  *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy.  *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host.  *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported.  When both the Digest and the Basic authentication handlers
   are added, Digest authentication is always tried first.  If Digest
   authentication returns a 40x response again, the request is passed to the
   Basic authentication handler.  This handler will raise a
   :exc:`ValueError` when presented with an authentication scheme other than
   Digest or Basic.

   .. versionchanged:: 3.3
      Raise :exc:`ValueError` on an unsupported authentication scheme.



.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy.  *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

   A class to handle opening of HTTPS URLs.  *context* and *check_hostname*
   have the same meaning as in :class:`http.client.HTTPSConnection`.

   .. versionchanged:: 3.2
      *context* and *check_hostname* were added.


.. class:: FileHandler()

   Open local files.

.. class:: DataHandler()

   Open data URLs.

   .. versionadded:: 3.4

.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses.  It also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

   .. versionchanged:: 3.4

      :attr:`Request.full_url` is a property with a setter, getter and deleter.
      Getting :attr:`~Request.full_url` returns the original request URL with the
      fragment, if it was present.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path.  If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

   .. versionchanged:: 3.4
      Changing the value of :attr:`Request.data` now deletes the
      ``Content-Length`` header if it was previously set or calculated.

.. attribute:: Request.unverifiable

   Boolean; indicates whether the request is unverifiable as defined
   by :rfc:`2965`.

.. attribute:: Request.method

   The HTTP request method to use.  By default its value is :const:`None`,
   which means that :meth:`~Request.get_method` will do its normal computation
   of the method to be used.  Its value can be set (thus overriding the default
   computation in :meth:`~Request.get_method`) either by setting a default
   value at the class level in a :class:`Request` subclass, or by passing a
   value to the :class:`Request` constructor via the *method* argument.

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      A default value can now be set in subclasses; previously it could only
      be set via the constructor argument.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method.  If
   :attr:`Request.method` is not ``None``, return its value, otherwise return
   ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not.
   This is only meaningful for HTTP requests.

   .. versionchanged:: 3.3
      :meth:`get_method` now looks at the value of :attr:`Request.method`.


.. method:: Request.add_header(key, val)

   Add another header to the request.  Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server.  Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


.. method:: Request.remove_header(header)

   Remove the named header from the request instance (both from regular and
   unredirected headers).

   .. versionadded:: 3.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.

   .. versionchanged:: 3.4

      Returns :attr:`Request.full_url`.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server.  The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header.  If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of ``(header_name, header_value)`` tuples for the request headers.

.. versionchanged:: 3.4
   The request methods ``add_data``, ``has_data``, ``get_data``, ``get_type``,
   ``get_host``, ``get_selector``, ``get_origin_req_host`` and
   ``is_unverifiable``, which had been deprecated since 3.3, have been removed.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`.  The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).  Note that, in the following, *protocol* should be replaced
   with the actual protocol to handle, for example :meth:`http_response` would
   be the HTTP protocol response handler.  Also *type* should be replaced with
   the actual HTTP code, for example :meth:`http_error_404` would handle HTTP
   404 errors.

   * :meth:`<protocol>_open` --- signal that the handler knows how to open *protocol*
     URLs.

     See |protocol_open|_ for more information.

   * :meth:`http_error_\<type\>` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

     See |http_error_nnn|_ for more information.

   * :meth:`<protocol>_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`<protocol>_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

     See |protocol_request|_ for more information.

   * :meth:`<protocol>_response` --- signal that the handler knows how to
     post-process *protocol* responses.

     See |protocol_response|_ for more information.

.. |protocol_open| replace:: :meth:`BaseHandler.<protocol>_open`
.. |http_error_nnn| replace:: :meth:`BaseHandler.http_error_\<nnn\>`
.. |protocol_request| replace:: :meth:`BaseHandler.<protocol>_request`
.. |protocol_response| replace:: :meth:`BaseHandler.<protocol>_response`

.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*.  Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`).  The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used).  The timeout only takes effect for
   HTTP, HTTPS and FTP connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol.  This will call the registered error
   handlers for the given protocol with the given arguments (which are
   protocol-specific).  The HTTP protocol is a special case which uses the HTTP
   response code to determine the specific error handler; refer to the
   :meth:`http_error_\<type\>` methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.
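
The method-naming rules described for :meth:`OpenerDirector.add_handler` can be seen in the minimal sketch below, which uses only the standard library. ``RecordingProcessor``, ``seen`` and ``response_count`` are hypothetical names chosen for illustration. The example opens a ``data:`` URL so it runs without network access. It assumes that :func:`build_opener` installs a handler for ``data:`` URLs (:class:`DataHandler`) by default, which is the case in current CPython releases. Defining ``data_request`` and ``data_response`` should be enough for :meth:`OpenerDirector.add_handler` to register the instance for the pre-processing and post-processing stages of the ``data`` protocol.

```python
import urllib.request


class RecordingProcessor(urllib.request.BaseHandler):
    """Hypothetical processor for data: URLs, for illustration only.

    add_handler() picks up data_request() for the pre-processing stage and
    data_response() for the post-processing stage of the "data" protocol.
    """

    def __init__(self):
        self.seen = []           # requests that went through data_request()
        self.response_count = 0  # responses passed to data_response()

    def data_request(self, req):
        # Pre-process the request before any *_open method is called.
        req.add_header("X-Example", "1")
        self.seen.append(req)
        return req  # must return a Request object

    def data_response(self, req, response):
        # Post-process the response produced by the *_open stage.
        self.response_count += 1
        return response  # must return a response-like object


processor = RecordingProcessor()
# build_opener() accepts handler instances or classes; the default handlers
# (assumed here to include one for data: URLs) are added automatically.
opener = urllib.request.build_opener(processor)

with opener.open("data:text/plain;charset=utf-8,hello") as resp:
    body = resp.read()

print(body)                                       # b'hello'
# add_header() stores keys via str.capitalize(), hence "X-example".
print(processor.seen[0].get_header("X-example"))  # 1
print(processor.response_count)                   # 1
```

Because the processor is passed to :func:`build_opener` as an instance, its constructor may take arguments. Passing the class instead would require a constructor that can be called without parameters, as described for :func:`build_opener`.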
670 671OpenerDirector objects open URLs in three stages: 672 673The order in which these methods are called within each stage is determined by 674sorting the handler instances. 675 676#. Every handler with a method named like :meth:`<protocol>_request` has that 677 method called to pre-process the request. 678 679#. Handlers with a method named like :meth:`<protocol>_open` are called to handle 680 the request. This stage ends when a handler either returns a non-\ :const:`None` 681 value (ie. a response), or raises an exception (usually 682 :exc:`~urllib.error.URLError`). Exceptions are allowed to propagate. 683 684 In fact, the above algorithm is first tried for methods named 685 :meth:`default_open`. If all such methods return :const:`None`, the algorithm 686 is repeated for methods named like :meth:`<protocol>_open`. If all such methods 687 return :const:`None`, the algorithm is repeated for methods named 688 :meth:`unknown_open`. 689 690 Note that the implementation of these methods may involve calls of the parent 691 :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and 692 :meth:`~OpenerDirector.error` methods. 693 694#. Every handler with a method named like :meth:`<protocol>_response` has that 695 method called to post-process the response. 696 697 698.. _base-handler-objects: 699 700BaseHandler Objects 701------------------- 702 703:class:`BaseHandler` objects provide a couple of methods that are directly 704useful, and others that are meant to be used by derived classes. These are 705intended for direct use: 706 707 708.. method:: BaseHandler.add_parent(director) 709 710 Add a director as parent. 711 712 713.. method:: BaseHandler.close() 714 715 Remove any parents. 716 717The following attribute and methods should only be used by classes derived from 718:class:`BaseHandler`. 719 720.. 
.. note::

   The convention has been adopted that subclasses defining
   :meth:`<protocol>_request` or :meth:`<protocol>_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open a URL using a
   different protocol, or to handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described for
   the return value of :meth:`OpenerDirector.open`, or ``None``. On failure it
   should raise :exc:`~urllib.error.URLError`, unless a truly exceptional
   condition occurs (for example, :exc:`MemoryError` should not be mapped to
   :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. _protocol_open:
.. method:: BaseHandler.<protocol>_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open them.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. _http_error_nnn:
.. method:: BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


.. _protocol_request:
.. method:: BaseHandler.<protocol>_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. _protocol_response:
.. method:: BaseHandler.<protocol>_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`~urllib.error.HTTPError` is raised. See :rfc:`2616` for
   details of the precise meanings of the various redirection codes.

   An :class:`HTTPError` exception is raised as a security consideration if the
   :class:`HTTPRedirectHandler` is presented with a redirected URL which is not an
   HTTP, HTTPS or FTP URL.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`~urllib.error.HTTPError` if
   no other handler should try to handle this URL, or return ``None`` if you
   can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user.
      In reality, browsers do allow automatic redirection of these responses,
      changing the ``POST`` to a ``GET``, and the default implementation
      reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.<protocol>_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`<protocol>_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


.. _http-password-mgr-with-prior-auth:

HTTPPasswordMgrWithPriorAuth Objects
------------------------------------

This password manager extends :class:`HTTPPasswordMgrWithDefaultRealm` to support
tracking URIs for which authentication credentials should always be sent.


.. method:: HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, \
            passwd, is_authenticated=False)

   *realm*, *uri*, *user*, *passwd* are as for
   :meth:`HTTPPasswordMgr.add_password`. *is_authenticated* sets the initial
   value of the ``is_authenticated`` flag for the given URI or list of URIs.
   If *is_authenticated* is specified as ``True``, *realm* is ignored.


.. method:: HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

   Same as for :class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgrWithPriorAuth.update_authenticated(uri, \
            is_authenticated=False)

   Update the ``is_authenticated`` flag for the given *uri* or list
   of URIs.


.. method:: HTTPPasswordMgrWithPriorAuth.is_authenticated(authuri)

   Returns the current state of the ``is_authenticated`` flag for
   the given URI.


.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on whether
   *req* carries data (that is, whether ``req.data`` is not ``None``).


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on whether
   *req* carries data (that is, whether ``req.data`` is not ``None``).


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``.

   .. versionchanged:: 3.2
      This method is applicable only for local hostnames. When a remote
      hostname is given, an :exc:`~urllib.error.URLError` is raised.


.. _data-handler-objects:

DataHandler Objects
-------------------

.. method:: DataHandler.data_open(req)

   Read a data URL. This kind of URL contains the content encoded in the URL
   itself. The data URL syntax is specified in :rfc:`2397`. This implementation
   ignores whitespace in base64-encoded data URLs, so the URL may be wrapped
   in whatever source file it comes from.
   However, although some browsers tolerate missing padding at the end of a
   base64-encoded data URL, this implementation raises a :exc:`ValueError` in
   that case.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open(req)

   Raise a :exc:`~urllib.error.URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For successful (2xx) response codes, the response object is returned
   immediately.

   For all other response codes, this simply passes the job on to the
   :meth:`http_error_\<type\>` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`~urllib.error.HTTPError` if no other handler handles the error.


.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.


.. _urllib-request-examples:

Examples
--------

In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(300))
   ...
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that the :meth:`read` method of the response object returns a bytes
object. This is because there is no way for urlopen to automatically determine
the encoding of the byte stream it receives from the HTTP server. In general, a
program will decode the returned bytes object to a string once it determines or
guesses the appropriate encoding.

The following W3C document, https://www.w3.org/International/O-charset\ , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::

   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(100).decode('utf-8'))
   ...
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

It is also possible to achieve the same result without using the
:term:`context manager` approach.
::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm
   >>> f.close()

In the following example, we send a data stream to the stdin of a CGI script
and read the data it returns. Note that this example will only work when the
Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> with urllib.request.urlopen(req) as f:
   ...     print(f.read().decode('utf-8'))
   ...
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Here is an example of doing a ``PUT`` request using :class:`Request`::

   import urllib.request
   DATA = b'some data'
   req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
   with urllib.request.urlopen(req) as f:
       pass
   print(f.status)
   print(f.reason)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib.request
   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   # Customize the default User-Agent header value:
   req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.
To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`)
are added when the :class:`Request` is passed to :func:`urlopen` (or
:meth:`OpenerDirector.open`).

.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
   >>> with urllib.request.urlopen(url) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses the ``POST`` method instead. Note that the output
of :func:`~urllib.parse.urlencode` is encoded to bytes before it is passed to
urlopen as *data*::

   >>> import urllib.request
   >>> import urllib.parse
   >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = data.encode('ascii')
   >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> with opener.open("http://www.python.org") as f:
   ...     f.read().decode('utf-8')
   ...

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> with opener.open("http://www.python.org/") as f:
   ...     f.read().decode('utf-8')
   ...


Legacy interface
----------------

The following functions and classes are ported from the Python 2 module
``urllib`` (as opposed to ``urllib2``). They might become deprecated at
some point in the future.

.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file. If the URL
   points to a local file, the object will not be copied unless *filename* is
   supplied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object). Exceptions are the same as for :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a callable that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The callable will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   total size may be ``-1`` on older FTP servers which do not return a file
   size in response to a retrieval request.

   The following example illustrates the most common usage scenario::

      >>> import urllib.request
      >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
      >>> html = open(local_filename)
      >>> html.close()

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request
   type is ``GET``).
   The *data* argument must be a bytes object in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`~urllib.error.ContentTooShortError` when
   it detects that the amount of data available was less than the expected
   amount (which is the size reported by a *Content-Length* header). This can
   occur, for example, when the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   urlretrieve reads more data, but if less data is available, it raises the
   exception.

   You can still retrieve the partially downloaded data in this case: the
   :attr:`content` attribute of the exception instance holds the
   ``(filename, headers)`` tuple that :func:`urlretrieve` would otherwise have
   returned.

   If no *Content-Length* header was supplied, urlretrieve cannot check the size
   of the data it has downloaded, and just returns it. In this case you just have
   to assume that the download was successful.

.. function:: urlcleanup()

   Cleans up temporary files that may have been left behind by previous
   calls to :func:`urlretrieve`.

.. class:: URLopener(proxies=None, **x509)

   .. deprecated:: 3.3

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`OSError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.

      This method always quotes *fullurl* using :func:`~urllib.parse.quote`.

   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places them in *filename*. The return value
      is a tuple consisting of a local filename and either an
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned.
      If the URL is non-local and *filename* is not given, the filename is the
      output of :func:`tempfile.mktemp` with a suffix that matches the suffix of the
      last path component of the input URL. If *reporthook* is given, it must be a
      function accepting three numeric parameters: a chunk number, the maximum size
      chunks are read in, and the total size of the download (-1 if unknown). It
      will be called once at the start and after each chunk of data is read from
      the network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   .. deprecated:: 3.3

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behavior.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      user for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP and HTTPS, FTP,
  local files, and data URLs.

  .. versionchanged:: 3.4 Added support for data URLs.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol. This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up. This means
  that it is difficult to build an interactive web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server. This may be binary data (such as an image), plain text
  or (for example) HTML. The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header. If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory. This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible. If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly. But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off. This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file.
  If fine-grained control is needed, consider using the :mod:`ftplib` module,
  subclassing :class:`FancyURLopener`, or changing *_urlopener* to meet your
  needs.



:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.
Functions defined by this module are used internally by the :mod:`urllib.request` module.
The typical response object is a :class:`urllib.response.addinfourl` instance:

.. class:: addinfourl

   .. attribute:: url

      URL of the resource retrieved, commonly used to determine if a redirect was followed.

   .. attribute:: headers

      The headers of the response, in the form of an :class:`email.message.Message` instance.

   .. attribute:: status

      .. versionadded:: 3.9

      Status code returned by the server.

   .. method:: geturl()

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.url`.

   .. method:: info()

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.headers`.

   .. attribute:: code

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.status`.

   .. method:: getcode()

      .. deprecated:: 3.9
         Deprecated in favor of :attr:`~addinfourl.status`.
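
Data URLs are handled by :class:`~urllib.request.DataHandler` and, like FTP and
file URLs, come back from :func:`~urllib.request.urlopen` as :class:`addinfourl`
objects, so they offer a network-free way to try the attributes above. The
following is a minimal sketch rather than a reference implementation: the URL
and the variable names are illustrative assumptions, and it assumes Python 3.9
or later (for :attr:`~addinfourl.status`). Because no HTTP exchange takes place,
:attr:`~addinfourl.status` is expected to be ``None`` here::

   import urllib.request
   import urllib.response

   # Illustrative data URL (RFC 2397): the payload is embedded in the URL
   # itself, so no network access is needed.
   DATA_URL = "data:text/plain;charset=utf-8,Hello%2C%20World"

   with urllib.request.urlopen(DATA_URL) as resp:
       # DataHandler wraps the decoded payload in an addinfourl object.
       is_addinfourl = isinstance(resp, urllib.response.addinfourl)
       url = resp.url                                  # the URL that was opened
       content_type = resp.headers.get_content_type()  # media type from the URL
       charset = resp.headers.get_content_charset()    # charset parameter, if any
       status = resp.status                            # no HTTP status for data URLs
       body = resp.read().decode(charset)

   print(is_addinfourl, url, content_type, charset, status)
   print(body)

The values are captured inside the ``with`` block because the response is
closed when the block exits.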