:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it
   defaults to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified it must be a keyword argument. It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file name, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the two parts of the sentence on one line, separated by a space.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at most 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
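
The effect of *encoding* on the different formats can be tried out with a small
in-memory round trip. The following is only a sketch: the member name, the
payload and the ``roundtrip`` helper are made up for illustration, and the
archives are kept in :class:`io.BytesIO` buffers so that nothing is written to
disk. It writes the same non-ASCII member name once in GNU format and once in
pax format, then reads each archive back with a deliberately mismatched
*encoding*.

```python
import io
import tarfile

# Hypothetical member name ("café/naïve.txt") and payload, for illustration only.
MEMBER_NAME = "caf\u00e9/na\u00efve.txt"
PAYLOAD = b"hello"


def roundtrip(fmt, write_encoding, read_encoding):
    """Write one member to an in-memory archive and return the names read back."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt,
                      encoding=write_encoding) as tar:
        info = tarfile.TarInfo(name=MEMBER_NAME)
        info.size = len(PAYLOAD)
        tar.addfile(info, io.BytesIO(PAYLOAD))

    # Reopen the same bytes, this time decoding metadata with read_encoding.
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:", encoding=read_encoding) as tar:
        return tar.getnames()


# GNU format stores the name as encoded bytes, so a mismatched encoding garbles it.
gnu_names = roundtrip(tarfile.GNU_FORMAT, "utf-8", "latin-1")

# pax format records the non-ASCII name in a UTF-8 extended header instead.
pax_names = roundtrip(tarfile.PAX_FORMAT, "utf-8", "latin-1")

print(gnu_names == [MEMBER_NAME])  # False
print(pax_names == [MEMBER_NAME])  # True
```

With the GNU archive the reader has to guess the right *encoding*; with the pax
archive the name survives even though ``'latin-1'`` was passed when reading.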