:mod:`urllib.request` --- Extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Extensible library for opening URLs.

.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>

**Source code:** :source:`Lib/urllib/request.py`

--------------

The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

.. seealso::

   The `Requests package <http://docs.python-requests.org/>`_
   is recommended for a higher-level HTTP client interface.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   *data* must be an object specifying additional data to be sent to the
   server, or ``None`` if no such data is needed. See :class:`Request`
   for details.

   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection:close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). This works only
   for HTTP, HTTPS and FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~http.client.HTTPSConnection`
   for more details.

   The optional *cafile* and *capath* parameters specify a set of trusted
   CA certificates for HTTPS requests.
   *cafile* should point to a single
   file containing a bundle of CA certificates, whereas *capath* should
   point to a directory of hashed certificate files. More information can
   be found in :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function always returns an object which can work as a
   :term:`context manager` and has methods such as

   * :meth:`~urllib.response.addinfourl.geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`~urllib.response.addinfourl.info` --- return the meta-information of the page, such as headers,
     in the form of an :func:`email.message_from_string` instance (see
     `Quick Reference to HTTP Headers <http://jkorpela.fi/http.html>`_)

   * :meth:`~urllib.response.addinfourl.getcode` --- return the HTTP status code of the response.

   For HTTP and HTTPS URLs, this function returns a slightly modified
   :class:`http.client.HTTPResponse` object. In addition
   to the three methods above, the ``msg`` attribute contains the
   same information as the :attr:`~http.client.HTTPResponse.reason`
   attribute --- the reason phrase returned by the server --- instead of
   the response headers as specified in the documentation for
   :class:`~http.client.HTTPResponse`.

   For FTP, file, and data URLs and requests explicitly handled by legacy
   :class:`URLopener` and :class:`FancyURLopener` classes, this function
   returns a :class:`urllib.response.addinfourl` object.

   Raises :exc:`~urllib.error.URLError` on protocol errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).
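As a minimal sketch of the behaviour described above, the following opens a ``data:`` URL and inspects the returned object. The ``data:`` URL is an assumption chosen only so that the example runs without network access, and ``nosuchscheme`` is a made-up scheme used to trigger :exc:`~urllib.error.URLError`.

```python
from urllib.request import urlopen
from urllib.error import URLError

# A data: URL keeps the example offline; a Request object would also work.
url = "data:text/plain;charset=utf-8,hello%20world"

# The returned object works as a context manager.
with urlopen(url) as response:
    body = response.read()          # raw bytes of the resource
    final_url = response.geturl()   # URL of the resource retrieved
    headers = response.info()       # meta-information such as headers

print(body, final_url, headers.get_content_type())

# A scheme with no registered handler reaches UnknownHandler, which
# raises URLError rather than letting urlopen() return None.
try:
    urlopen("nosuchscheme://example.invalid/")
    unknown_scheme_failed = False
except URLError:
    unknown_scheme_failed = True
```

For HTTP and HTTPS URLs the same pattern applies, but the object returned is the modified :class:`http.client.HTTPResponse` described above.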

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urllib.request.urlopen` corresponds to the old
   ``urllib2.urlopen``. Proxy handling, which was done by passing a dictionary
   parameter to ``urllib.urlopen``, can be obtained by using
   :class:`ProxyHandler` objects.

   .. versionchanged:: 3.2
      *cafile* and *capath* were added.

   .. versionchanged:: 3.2
      HTTPS virtual hosts are now supported if possible (that is, if
      :data:`ssl.HAS_SNI` is true).

   .. versionadded:: 3.2
      *data* can be an iterable object.

   .. versionchanged:: 3.3
      *cadefault* was added.

   .. versionchanged:: 3.4.3
      *context* was added.

   .. deprecated:: 3.6

      *cafile*, *capath* and *cadefault* are deprecated in favor of *context*.
      Please use :meth:`ssl.SSLContext.load_cert_chain` instead, or let
      :func:`ssl.create_default_context` select the system's trusted CA
      certificates for you.

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`~urllib.request.urlopen`. The code does not check for a real
   :class:`OpenerDirector`, and any class with the appropriate interface will
   work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given.
   *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`,
   :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
   :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   attribute to modify its position in the handlers list.


.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`~urllib.parse.quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax for a
   path. This does not accept a complete URL. This function uses
   :func:`~urllib.parse.unquote` to decode *path*.

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``, matching the name case-insensitively.
   If it finds none, it looks for proxy information in the Mac OS X System
   Configuration on Mac OS X and in the Windows Systems Registry on Windows.
   If both lowercase and uppercase environment variables exist (and disagree),
   lowercase is preferred.

   .. note::

      If the environment variable ``REQUEST_METHOD`` is set, which usually
      indicates your script is running in a CGI environment, the environment
      variable ``HTTP_PROXY`` (uppercase ``_PROXY``) will be ignored. This is
      because that variable can be injected by a client using the "Proxy:" HTTP
      header. If you need to use an HTTP proxy in a CGI environment, either use
      ``ProxyHandler`` explicitly, or make sure the variable name is in
      lowercase (or at least the ``_proxy`` suffix).


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* must be an object specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*. The supported object
   types include bytes, file-like objects, and iterables. If neither a
   ``Content-Length`` nor a ``Transfer-Encoding`` header field
   has been provided, :class:`HTTPHandler` will set these headers according
   to the type of *data*. ``Content-Length`` will be used to send
   bytes objects, while ``Transfer-Encoding: chunked`` as specified in
   :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.

   For an HTTP POST request method, *data* should be a buffer in the
   standard :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns an ASCII string in this format. It should
   be encoded to bytes before being used as the *data* parameter.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header value, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   An appropriate ``Content-Type`` header should be included if the *data*
   argument is present. If this header has not been provided and *data*
   is not ``None``, ``Content-Type: application/x-www-form-urlencoded`` will
   be added as a default.

   The next two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.

   *method* should be a string that indicates the HTTP request method that
   will be used (e.g. ``'HEAD'``). If provided, its value is stored in the
   :attr:`~Request.method` attribute and is used by :meth:`get_method()`.
   The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
   Subclasses may indicate a different default method by setting the
   :attr:`~Request.method` attribute in the class itself.

   .. note::
      The request will not work as expected if the data object is unable
      to deliver its content more than once (e.g. a file or an iterable
      that can produce the content only once) and the request is retried
      for HTTP redirects or authentication. The *data* is sent to the
      HTTP server right away after the headers. There is no support for
      a 100-continue expectation in the library.

   .. versionchanged:: 3.3
      The :attr:`Request.method` argument was added to the Request class.

   .. versionchanged:: 3.4
      Default :attr:`Request.method` may be indicated at the class level.

   .. versionchanged:: 3.6
      Do not raise an error if the ``Content-Length`` has not been
      provided and *data* is neither ``None`` nor a bytes object.
      Fall back to using chunked transfer encoding instead.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers; it handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`~urllib.error.HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   ``<protocol>_proxy``.
   If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable an autodetected proxy, pass an empty dictionary.

   The :envvar:`no_proxy` environment variable can be used to specify hosts
   which shouldn't be reached via proxy; if set, it should be a comma-separated
   list of hostname suffixes, optionally with ``:port`` appended, for example
   ``cern.ch,ncsa.uiuc.edu,some.host:8080``.

   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.request.getproxies`.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: HTTPPasswordMgrWithPriorAuth()

   A variant of :class:`HTTPPasswordMgrWithDefaultRealm` that also has a
   database of ``uri -> is_authenticated`` mappings. Can be used by a
   BasicAuth handler to determine when to send authentication credentials
   immediately instead of waiting for a ``401`` response first.

   .. versionadded:: 3.5


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.
   If *password_mgr* also provides ``is_authenticated`` and
   ``update_authenticated`` methods (see
   :ref:`http-password-mgr-with-prior-auth`), then the handler will use the
   ``is_authenticated`` result for a given URI to determine whether or not to
   send authentication credentials with the request. If ``is_authenticated``
   returns ``True`` for the URI, credentials are sent. If ``is_authenticated``
   is ``False``, credentials are not sent, and then if a ``401`` response is
   received the request is re-sent with the authentication credentials. If
   authentication succeeds, ``update_authenticated`` is called to set
   ``is_authenticated`` to ``True`` for the URI, so that subsequent requests to
   the URI or any of its super-URIs will automatically include the
   authentication credentials.

   .. versionadded:: 3.5
      Added ``is_authenticated`` support.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. HTTPBasicAuthHandler will raise a :exc:`ValueError` when
   presented with a wrong Authentication scheme.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy.
   *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. When both the Digest Authentication handler and the Basic
   Authentication handler are added, Digest Authentication is always tried
   first. If Digest Authentication returns a 40x response again, the request is
   passed to the Basic Authentication handler. This handler method will raise a
   :exc:`ValueError` when presented with an authentication scheme other than
   Digest or Basic.

   .. versionchanged:: 3.3
      Raise :exc:`ValueError` on unsupported Authentication Scheme.



.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

   A class to handle opening of HTTPS URLs. *context* and *check_hostname*
   have the same meaning as in :class:`http.client.HTTPSConnection`.

   .. versionchanged:: 3.2
      *context* and *check_hostname* were added.


.. class:: FileHandler()

   Open local files.

.. class:: DataHandler()

   Open data URLs.

   .. versionadded:: 3.4

.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. The class also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

   .. versionchanged:: 3.4

      Request.full_url is a property with setter, getter and a deleter. Getting
      :attr:`~Request.full_url` returns the original request URL with the
      fragment, if it was present.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

   .. versionchanged:: 3.4
      Changing value of :attr:`Request.data` now deletes "Content-Length"
      header if it was previously set or calculated.

.. attribute:: Request.unverifiable

   Boolean; indicates whether the request is unverifiable as defined
   by :rfc:`2965`.

.. attribute:: Request.method

   The HTTP request method to use. By default its value is :const:`None`,
   which means that :meth:`~Request.get_method` will do its normal computation
   of the method to be used.
   Its value can be set (thus overriding the default
   computation in :meth:`~Request.get_method`) either by providing a default
   value by setting it at the class level in a :class:`Request` subclass, or by
   passing a value in to the :class:`Request` constructor via the *method*
   argument.

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      A default value can now be set in subclasses; previously it could only
      be set via the constructor argument.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. If
   :attr:`Request.method` is not ``None``, return its value, otherwise return
   ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not.
   This is only meaningful for HTTP requests.

   .. versionchanged:: 3.3
      get_method now looks at the value of :attr:`Request.method`.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


.. method:: Request.remove_header(header)

   Remove named header from the request instance (both from regular and
   unredirected headers).

   .. versionadded:: 3.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.

   .. versionchanged:: 3.4

      Returns :attr:`Request.full_url`


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.

.. versionchanged:: 3.4
   The request methods add_data, has_data, get_data, get_type, get_host,
   get_selector, get_origin_req_host and is_unverifiable that were deprecated
   since 3.3 have been removed.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature works only for
   HTTP, HTTPS and FTP connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually
   :exc:`~urllib.error.URLError`). Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`.
   If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.


.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attribute and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`~urllib.error.URLError`, unless a truly exceptional
   thing happens (for example, :exc:`MemoryError` should not be mapped to
   :exc:`URLError`).
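The following is a minimal sketch of a handler that defines ``default_open``. The class name ``CannedResponseHandler``, the made-up ``demo`` scheme and the canned body are hypothetical, chosen only so that the example runs without network access.

```python
import io
from email.message import Message
from urllib.request import BaseHandler, build_opener
from urllib.response import addinfourl


class CannedResponseHandler(BaseHandler):
    """Hypothetical handler that answers every request itself."""

    def default_open(self, req):
        # Called by the parent OpenerDirector for every URL.
        headers = Message()
        headers["Content-Type"] = "text/plain"
        # Return a file-like response object; returning None instead
        # would let other handlers try to open the URL.
        return addinfourl(io.BytesIO(b"canned response"), headers, req.full_url)


# build_opener() accepts handler classes as well as instances.
opener = build_opener(CannedResponseHandler)

# "demo" is a made-up scheme that no other handler knows how to open.
with opener.open("demo://anything/at/all") as response:
    canned_body = response.read()

print(canned_body)
```

Because ``default_open`` catches all URLs, a real handler would normally return ``None`` for requests it does not want to answer.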

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.
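As a hedged illustration, the sketch below defines an ``http_error_404`` method and triggers it by calling :meth:`OpenerDirector.error` directly, so no network access is needed. The class name ``NotFoundFallbackHandler``, the host name and the stub body are hypothetical.

```python
import io
from email.message import Message
from urllib.error import HTTPError
from urllib.request import BaseHandler, Request, build_opener
from urllib.response import addinfourl


class NotFoundFallbackHandler(BaseHandler):
    """Hypothetical handler that replaces 404 errors with a stub body."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        # Same arguments as http_error_default(); returning a response
        # object ends error handling for this request.
        return addinfourl(io.BytesIO(b"fallback body"), hdrs, req.full_url, code)


opener = build_opener(NotFoundFallbackHandler)
req = Request("http://example.invalid/missing")

# Simulate the error dispatch an OpenerDirector performs for HTTP.
handled = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", Message())
handled_body = handled.read()

# A code without a matching http_error_<nnn> method falls through to
# the default error handler, which raises HTTPError.
try:
    opener.error("http", req, io.BytesIO(b""), 500, "Server Error", Message())
    unhandled_code = None
except HTTPError as exc:
    unhandled_code = exc.code

print(handled_body, unhandled_code)
```

In normal use the error chain is entered by the opener itself when a response arrives, not by calling ``error()`` by hand.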

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`~urllib.error.HTTPError` is raised. See :rfc:`2616` for
   details of the precise meanings of the various redirection codes.

   An :class:`HTTPError` exception is raised as a security consideration if the
   HTTPRedirectHandler is presented with a redirected URL which is not an HTTP,
   HTTPS or FTP URL.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect.
   This is called by the default implementations of the
   :meth:`http_error_30\*` methods when a
   redirection is received from the server.  If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*.  Otherwise, raise :exc:`~urllib.error.HTTPError` if
   no other handler should try to handle this URL, or return ``None`` if you
   can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user.  In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL.  This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor.  The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any.  This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


.. _http-password-mgr-with-prior-auth:

HTTPPasswordMgrWithPriorAuth Objects
------------------------------------

This password manager extends :class:`HTTPPasswordMgrWithDefaultRealm` to support
tracking URIs for which authentication credentials should always be sent.


.. method:: HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, \
            passwd, is_authenticated=False)

   *realm*, *uri*, *user*, *passwd* are as for
   :meth:`HTTPPasswordMgr.add_password`.
   *is_authenticated* sets the initial
   value of the ``is_authenticated`` flag for the given URI or list of URIs.
   If *is_authenticated* is specified as ``True``, *realm* is ignored.


.. method:: HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

   Same as for :class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgrWithPriorAuth.update_authenticated(uri, \
            is_authenticated=False)

   Update the ``is_authenticated`` flag for the given *uri* or list
   of URIs.


.. method:: HTTPPasswordMgrWithPriorAuth.is_authenticated(authuri)

   Returns the current state of the ``is_authenticated`` flag for
   the given URI.


.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request.  *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which by default is a ``GET`` or a ``POST``,
   depending on whether *req* carries data (``req.data is not None``).


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which by default is a ``GET`` or a ``POST``,
   depending on whether *req* carries data (``req.data is not None``).


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``.

   .. versionchanged:: 3.2
      This method is applicable only for local hostnames.  When a remote
      hostname is given, an :exc:`~urllib.error.URLError` is raised.


.. _data-handler-objects:

DataHandler Objects
-------------------

.. method:: DataHandler.data_open(req)

   Read a data URL. This kind of URL contains the content encoded in the URL
   itself. The data URL syntax is specified in :rfc:`2397`. This implementation
   ignores whitespace in base64 encoded data URLs so the URL may be wrapped
   in whatever source file it comes from. But even though some browsers don't
   mind a missing padding at the end of a base64 encoded data URL, this
   implementation will raise a :exc:`ValueError` in that case.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`~urllib.error.URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For successful (2xx) status codes, the response object is returned
   immediately.

   For all other status codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`~urllib.error.HTTPError` if no other handler handles the error.


.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.


.. _urllib-request-examples:

Examples
--------

In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(300))
   ...
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that the ``read()`` method of the object returned by urlopen returns a
bytes object.  This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server. In general, a program will decode
the returned bytes object to string once it determines or guesses
the appropriate encoding.

The following W3C document, https://www.w3.org/International/O-charset\ , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.
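
As a rough sketch of the decoding step described above, the charset declared in
the :mailheader:`Content-Type` header can be consulted before falling back to a
guess.  The ``read_text`` helper name, the ``utf-8`` fallback and the ``data:``
URL are illustrative assumptions, chosen so that the snippet runs without
network access:

```python
import urllib.request


def read_text(url, fallback="utf-8"):
    """Fetch *url* and decode the body using the charset from its headers.

    Hypothetical helper: uses *fallback* when the headers declare no charset.
    """
    with urllib.request.urlopen(url) as response:
        # response.headers behaves like an email.message.Message object.
        charset = response.headers.get_content_charset() or fallback
        return response.read().decode(charset)


# A data: URL keeps the example self-contained (no network needed).
text = read_text("data:text/plain;charset=utf-8,caf%C3%A9")
print(text)
```

A charset declared only inside the document body (for example in a ``<meta>``
tag) is not visible to this approach and would still have to be detected
separately.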

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::

   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(100).decode('utf-8'))
   ...
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

It is also possible to achieve the same result without using the
:term:`context manager` approach. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> with urllib.request.urlopen(req) as f:
   ...     print(f.read().decode('utf-8'))
   ...
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Here is an example of doing a ``PUT`` request using :class:`Request`::

   import urllib.request
   DATA = b'some data'
   req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
   with urllib.request.urlopen(req) as f:
       pass
   print(f.status)
   print(f.reason)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`.  By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved.  For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   # Customize the default User-Agent header value:
   req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.
To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`)
are added when the :class:`Request` is passed to :func:`urlopen` (or
:meth:`OpenerDirector.open`).

.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
   >>> with urllib.request.urlopen(url) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses the ``POST`` method instead. Note that the output
of urlencode is encoded to bytes before it is sent to urlopen as data::

   >>> import urllib.request
   >>> import urllib.parse
   >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = data.encode('ascii')
   >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> with opener.open("http://www.python.org") as f:
   ...     f.read().decode('utf-8')
   ...

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> with opener.open("http://www.python.org/") as f:
   ...     f.read().decode('utf-8')
   ...


Legacy interface
----------------

The following functions and classes are ported from the Python 2 module
``urllib`` (as opposed to ``urllib2``).  They might become deprecated at
some point in the future.

.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file. If the URL
   points to a local file, the object will not be copied unless *filename* is
   supplied.
   Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object). Exceptions are the same as for :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a callable that will be called once on
   establishment of the network connection and once after each block read
   thereafter.  The callable will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file.  The
   total size passed to the callable may be ``-1`` on older FTP servers which do
   not return a file size in response to a retrieval request.

   The following example illustrates the most common usage scenario::

      >>> import urllib.request
      >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
      >>> html = open(local_filename)
      >>> html.close()

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request
   type is ``GET``).
   The *data* argument must be a bytes object in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   just have to assume that the download was successful.

.. function:: urlcleanup()

   Cleans up temporary files that may have been left behind by previous
   calls to :func:`urlretrieve`.

.. class:: URLopener(proxies=None, **x509)

   .. deprecated:: 3.3

   Base class for opening and reading URLs.  Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely.  Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme.  The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`OSError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol.  This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments.  If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*.  The return value
      is a tuple consisting of a local filename and either an
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs).  The caller must then open and read the
      contents of *filename*.  If *filename* is not given and the URL refers to a
      local file, the input filename is returned.  If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL.
      If *reporthook* is given, it must be a function accepting three numeric
      parameters: a chunk number, the maximum size chunks are read in and the
      total size of the download (-1 if unknown).  It will be called once at the
      start and after each chunk of data is read from the network.  *reporthook*
      is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``).  The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object.  To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   .. deprecated:: 3.3

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401.  For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL.  For 401 response codes (authentication required), basic HTTP
   authentication is performed.  For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user.
      In reality, browsers do allow automatic redirection of these responses,
      changing the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method.  The default implementation asks the
      users for the required information on the controlling terminal.  A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm.  The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP, FTP, local
  files, and data URLs.

  .. versionchanged:: 3.4 Added support for data URLs.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol.
  This can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up.  This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server.  This may be binary data (such as an image), plain text
  or (for example) HTML.  The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header.  If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory.  This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible.  If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly.  But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off.  This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.
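
The :mailheader:`Content-Type` inspection mentioned in the list above could be
sketched roughly as follows.  The ``TagCollector`` class and ``start_tags``
function are hypothetical names introduced for illustration, and a ``data:``
URL stands in for a real server so that the snippet runs offline:

```python
import urllib.request
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical parser that records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(url):
    """Return the start tags found at *url* if it is HTML, else an empty list."""
    with urllib.request.urlopen(url) as response:
        # Only hand the body to the HTML parser when the server says it is HTML.
        if response.headers.get_content_type() != "text/html":
            return []
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset)
    parser = TagCollector()
    parser.feed(body)
    parser.close()
    return parser.tags


# Percent-encoded form of "<p><b>hi</b></p>".
print(start_tags("data:text/html,%3Cp%3E%3Cb%3Ehi%3C%2Fb%3E%3C%2Fp%3E"))
```

Checking the declared type first avoids feeding binary data, such as an image,
to the HTML parser.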



:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.  The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
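
The interface described above can be explored with a small sketch.  It assumes
that a ``data:`` URL is opened by the default opener and comes back as an
addinfourl instance; newer Python versions may prefer the ``url`` and
``headers`` attributes over ``geturl()`` and ``info()``:

```python
import urllib.request
import urllib.response

# Percent-encoded newlines give the body two lines: b"first\n" and b"second\n".
url = "data:text/plain,first%0Asecond%0A"

with urllib.request.urlopen(url) as response:
    is_addinfourl = isinstance(response, urllib.response.addinfourl)
    first_line = response.readline()   # file-like interface
    rest = response.read()             # whatever is left after the first line
    final_url = response.geturl()      # URL of the resource retrieved
    content_type = response.info()["Content-Type"]

print(is_addinfourl, first_line, rest, final_url, content_type)
```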