:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``, it defaults
   to ``'r'``.
   Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`.
   The currently possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
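
The mode strings accepted by :func:`tarfile.open` and the check performed by
:func:`is_tarfile` can be tried out with a short sketch. It is only an
illustration: the helper name ``build_and_probe``, the archive name
``sample.tar.gz`` and the member ``hello.txt`` are hypothetical, and the archive
is written to a temporary directory so nothing outside it is touched.

```python
import io
import os
import tarfile
import tempfile


def build_and_probe():
    """Create a small gzip-compressed archive and probe it with several modes."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical archive name, used only for this illustration.
        path = os.path.join(tmp, "sample.tar.gz")
        payload = b"hello"

        # 'w:gz' creates (or overwrites) a gzip-compressed archive.
        with tarfile.open(path, "w:gz") as tar:
            info = tarfile.TarInfo("hello.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

        results["is_tarfile"] = tarfile.is_tarfile(path)

        # 'r' detects the compression transparently.
        with tarfile.open(path, "r") as tar:
            results["names"] = tar.getnames()

        # 'r:' insists on an uncompressed archive, so it rejects this file.
        try:
            tarfile.open(path, "r:")
        except tarfile.ReadError:
            results["plain_read_failed"] = True

        # 'x:gz' refuses to overwrite an existing file.
        try:
            tarfile.open(path, "x:gz")
        except FileExistsError:
            results["exclusive_failed"] = True
    return results


if __name__ == "__main__":
    print(build_and_probe())
```

Opening with plain ``'r'`` is usually the least surprising choice for reading,
because the compression does not have to be known in advance.
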


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, format will be automatically detected, even
   if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`.
Aside from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names, sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally-supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information.
  Some archives have miscalculated header checksums in case of fields with
  non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.
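
The effect of a mismatched *encoding* can be reproduced in memory. The sketch
below is only an illustration: the helper names ``write_gnu_archive`` and
``read_names`` and the member name are hypothetical. It writes a
:const:`GNU_FORMAT` archive with a *UTF-8* encoded member name and reads it back
with three different encodings.

```python
import io
import tarfile


def write_gnu_archive(member_name, encoding):
    """Return the bytes of a GNU-format archive holding one empty member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(member_name))
    return buf.getvalue()


def read_names(data, encoding, errors="surrogateescape"):
    """Return the member names decoded with the given encoding."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r",
                      encoding=encoding, errors=errors) as tar:
        return tar.getnames()


# Hypothetical member name containing one non-ASCII character.
data = write_gnu_archive("caf\u00e9.txt", "utf-8")

same = read_names(data, "utf-8")       # matching encoding: name is intact
damaged = read_names(data, "latin-1")  # wrong encoding: name is garbled
escaped = read_names(data, "ascii")    # undecodable bytes become surrogates

# With 'surrogateescape' the original bytes can still be recovered.
recovered = escaped[0].encode("ascii", "surrogateescape")
```

With the default ``'surrogateescape'`` handler no information is lost even when
the encoding is wrong for decoding, which is why ``recovered`` holds the
original *UTF-8* bytes of the name.
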

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
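
As a rough illustration of this behaviour, the following sketch writes a
:const:`PAX_FORMAT` archive with a non-ASCII member name and reads it back with
``encoding="ascii"``. The member name is hypothetical, and the sketch assumes
that the name ends up in a pax extended header under the ``path`` keyword.

```python
import io
import tarfile

# Write a pax archive holding one empty member with a non-ASCII name.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    tar.addfile(tarfile.TarInfo("caf\u00e9.txt"))

# Read it back with an encoding that cannot represent the name at all.
with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r",
                  encoding="ascii") as tar:
    member = tar.getmembers()[0]
    pax_name = member.name
    # The extended header that carried the UTF-8 name.
    pax_path = member.pax_headers.get("path")
```

Here the name is expected to come back intact in ``pax_name`` regardless of the
*encoding* argument, because it is taken from the pax header rather than from
the fixed-width ustar name field.
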