:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``, it defaults
   to ``'r'``.
   Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
   keyword argument *preset* to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin.buffer``, a socket :term:`file object`
   or a tape device.
   However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. class:: TarFile
   :noindex:

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.

   .. versionchanged:: 3.9
      Support for file and file-like objects.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


.. exception:: FilterError

   Base class for members :ref:`refused <tarfile-extraction-refuse>` by
   filters.

   .. attribute:: tarinfo

      Information about the member that the filter refused to extract,
      as :ref:`TarInfo <tarinfo-objects>`.

.. exception:: AbsolutePathError

   Raised to refuse extracting a member with an absolute path.

.. exception:: OutsideDestinationError

   Raised to refuse extracting a member outside the destination directory.

.. exception:: SpecialFileError

   Raised to refuse extracting a special file (e.g. a device or pipe).

.. exception:: AbsoluteLinkError

   Raised to refuse extracting a symbolic link with an absolute path.

.. exception:: LinkOutsideDestinationError

   Raised to refuse extracting a symbolic link pointing outside the destination
   directory.

.. exception:: LinkFallbackError

   Raised to refuse emulating a link (hard or symbolic) by extracting another
   archive member, when that member would be rejected by the filter location.
   The exception that was raised to reject the replacement member is available
   as :attr:`!BaseException.__context__`.

   .. versionadded:: next


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks.
An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.
   When reading, format will be automatically detected, even
   if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   *errorlevel* controls how extraction errors are handled,
   see :attr:`the corresponding attribute <~TarFile.errorlevel>`.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.
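
   The following is a minimal sketch of such a call, not a definitive recipe.
   The helper name ``extract_text_members`` and the two demo files are
   hypothetical and exist only for illustration. It passes a subset of
   :meth:`getmembers` as *members* and, assuming a Python version that
   provides the *filter* argument shown in the signature above, requests the
   ``'data'`` filter::

      import os
      import tarfile
      import tempfile

      def extract_text_members(archive_path, dest):
          """Extract only the regular ``.txt`` members into *dest*."""
          with tarfile.open(archive_path) as tar:
              # *members* must be a subset of the list from getmembers().
              wanted = [m for m in tar.getmembers()
                        if m.isfile() and m.name.endswith(".txt")]
              if hasattr(tarfile, "data_filter"):
                  # Extraction filters exist in this Python version.
                  tar.extractall(path=dest, members=wanted, filter="data")
              else:
                  # Older Python: no *filter* argument is available.
                  tar.extractall(path=dest, members=wanted)
              return [m.name for m in wanted]

      # Build a small throwaway archive so the sketch runs on its own.
      workdir = tempfile.mkdtemp()
      for filename in ("notes.txt", "image.bin"):
          with open(os.path.join(workdir, filename), "wb") as f:
              f.write(b"example")
      archive = os.path.join(workdir, "demo.tar")
      with tarfile.open(archive, "w") as tar:
          for filename in ("notes.txt", "image.bin"):
              tar.add(os.path.join(workdir, filename), arcname=filename)

      dest = os.path.join(workdir, "out")
      os.makedirs(dest)
      extracted = extract_text_members(archive, dest)

   Checking for ``tarfile.data_filter`` keeps the sketch usable on
   interpreters that predate extraction filters; on those versions the call
   falls back to unfiltered extraction and should only be used with trusted
   archives.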

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   The *filter* argument, which was added in Python 3.11.4, specifies how
   ``members`` are modified or rejected before extraction.
   See :ref:`tarfile-extraction-filter` for details.
   It is recommended to set this explicitly depending on which *tar* features
   you need to support.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.11.4
      Added the *filter* parameter.


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   The *numeric_owner* and *filter* arguments are the same as
   for :meth:`extractall`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.11.4
      Added the *filter* parameter.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be
   a filename or a :class:`TarInfo` object. If *member* is a regular file or
   a link, an :class:`io.BufferedReader` object is returned. For all other
   existing members, :const:`None` is returned. If *member* does not appear
   in the archive, :exc:`KeyError` is raised.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

.. attribute:: TarFile.errorlevel
   :type: int

   If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract`
   and :meth:`TarFile.extractall`.
   Nevertheless, they appear as error messages in the debug output when
   *debug* is greater than 0.
   If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or
   :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised
   as :exc:`TarError` exceptions as well.

   Some exceptions, e.g. ones caused by wrong argument types or data
   corruption, are always raised.

   Custom :ref:`extraction filters <tarfile-extraction-filter>`
   should raise :exc:`FilterError` for *fatal* errors
   and :exc:`ExtractError` for *non-fatal* ones.

   Note that when an exception is raised, the archive may be partially
   extracted. It is the user’s responsibility to clean up.

.. attribute:: TarFile.extraction_filter

   .. versionadded:: 3.11.4

   The :ref:`extraction filter <tarfile-extraction-filter>` used
   as a default for the *filter* argument of :meth:`~TarFile.extract`
   and :meth:`~TarFile.extractall`.

   The attribute may be ``None`` or a callable.
   String names are not allowed for this attribute, unlike the *filter*
   argument to :meth:`~TarFile.extract`.

   If ``extraction_filter`` is ``None`` (the default),
   calling an extraction method without a *filter* argument will
   use the :func:`fully_trusted <fully_trusted_filter>` filter for
   compatibility with previous Python versions.

   In Python 3.12+, leaving ``extraction_filter=None`` will emit a
   ``DeprecationWarning``.

   In Python 3.14+, leaving ``extraction_filter=None`` will cause
   extraction methods to use the :func:`data <data_filter>` filter by default.

   The attribute may be set on instances or overridden in subclasses.
   It also is possible to set it on the ``TarFile`` class itself to set a
   global default, although, since it affects all uses of *tarfile*,
   it is best practice to only do so in top-level applications or
   :mod:`site configuration <site>`.
   To set a global default this way, a filter function needs to be wrapped in
   :func:`staticmethod()` to prevent injection of a ``self`` argument.

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object.
   If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
:meth:`~TarFile.gettarinfo`.

Modifying the objects returned by :meth:`~!TarFile.getmember` or
:meth:`~!TarFile.getmembers` will affect all subsequent
operations on the archive.
For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
call the :meth:`~TarInfo.replace` method to create a modified copy in one step.

Several attributes can be set to ``None`` to indicate that a piece of metadata
is unused or unknown.
Different :class:`TarFile` methods handle ``None`` differently:

- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will
  ignore the corresponding metadata, leaving it set to a default.
- :meth:`~TarFile.addfile` will fail.
- :meth:`~TarFile.list` will print a placeholder string.


.. versionchanged:: 3.11.4
   Added :meth:`~TarInfo.replace` and handling of ``None``.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name
   :type: str

   Name of the archive member.


.. attribute:: TarInfo.size
   :type: int

   Size in bytes.


.. attribute:: TarInfo.mtime
   :type: int | float

   Time of last modification in seconds since the :ref:`epoch <epoch>`,
   as in :attr:`os.stat_result.st_mtime`.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.mode
   :type: int

   Permission bits, as for :func:`os.chmod`.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname
   :type: str

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid
   :type: int

   User ID of the user who originally stored this member.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gid
   :type: int

   Group ID of the user who originally stored this member.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.uname
   :type: str

   User name.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gname
   :type: str

   Group name.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.pax_headers
   :type: dict

   A dictionary containing key-value pairs of an associated pax extended header.

.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=..., uid=..., gid=..., uname=..., gname=..., deep=True)

   .. versionadded:: 3.11.4

   Return a *new* copy of the :class:`!TarInfo` object with the given attributes
   changed. For example, to return a ``TarInfo`` with the group name set to
   ``'staff'``, use::

       new_tarinfo = old_tarinfo.replace(gname='staff')

   By default, a deep copy is made.
   If *deep* is false, the copy is shallow, i.e. ``pax_headers``
   and any custom attributes are shared with the original ``TarInfo`` object.

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tarfile-extraction-filter:

Extraction filters
------------------

.. versionadded:: 3.11.4

The *tar* format is designed to capture all details of a UNIX-like filesystem,
which makes it very powerful.
Unfortunately, the features make it easy to create tar files that have
unintended -- and possibly malicious -- effects when extracted.
For example, extracting a tar file can overwrite arbitrary files in various
ways (e.g. by using absolute paths, ``..`` path components, or symlinks that
affect later members).

In most cases, the full functionality is not needed.
Therefore, *tarfile* supports extraction filters: a mechanism to limit
functionality, and thus mitigate some of the security issues.

.. seealso::

   :pep:`706`
      Contains further motivation and rationale behind the design.

The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall`
can be:

* the string ``'fully_trusted'``: Honor all metadata as specified in the
  archive.
  Should be used if the user trusts the archive completely, or implements
  their own complex verification.

* the string ``'tar'``: Honor most *tar*-specific features (i.e.
  features of UNIX-like filesystems), but block features that are very likely
  to be surprising or malicious. See :func:`tar_filter` for details.

* the string ``'data'``: Ignore or block most features specific to UNIX-like
  filesystems. Intended for extracting cross-platform data archives.
  See :func:`data_filter` for details.

* ``None`` (default): Use :attr:`TarFile.extraction_filter`.

  If that is also ``None`` (the default), the ``'fully_trusted'``
  filter will be used (for compatibility with earlier versions of Python).

  In Python 3.12, the default will emit a ``DeprecationWarning``.

  In Python 3.14, the ``'data'`` filter will become the default instead.
  It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.

* A callable which will be called for each extracted member with a
  :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
  path where the archive is extracted (i.e. the same path is used for all
  members)::

      filter(/, member: TarInfo, path: str) -> TarInfo | None

  The callable is called just before each member is extracted, so it can
  take the current state of the disk into account.
  It can:

  - return a :class:`TarInfo` object which will be used instead of the metadata
    in the archive, or
  - return ``None``, in which case the member will be skipped, or
  - raise an exception to abort the operation or skip the member,
    depending on :attr:`~TarFile.errorlevel`.
    Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave
    the archive partially extracted. It does not attempt to clean up.

Default named filters
~~~~~~~~~~~~~~~~~~~~~

The pre-defined, named filters are available as functions, so they can be
reused in custom filters:

.. function:: fully_trusted_filter(/, member, path)

   Return *member* unchanged.

   This implements the ``'fully_trusted'`` filter.

.. function:: tar_filter(/, member, path)

   Implements the ``'tar'`` filter.

   - Strip leading slashes (``/`` and :attr:`os.sep`) from filenames.
   - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute
     paths (in case the name is absolute
     even after stripping slashes, e.g. ``C:/foo`` on Windows).
     This raises :class:`~tarfile.AbsolutePathError`.
   - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute
     path (after following symlinks) would end up outside the destination.
     This raises :class:`~tarfile.OutsideDestinationError`.
   - Clear high mode bits (setuid, setgid, sticky) and group/other write bits
     (:attr:`~stat.S_IWGRP`|:attr:`~stat.S_IWOTH`).

   Return the modified ``TarInfo`` member.

.. function:: data_filter(/, member, path)

   Implements the ``'data'`` filter.
   In addition to what ``tar_filter`` does:

   - Normalize link targets (:attr:`TarInfo.linkname`) using
     :func:`os.path.normpath`.
     Note that this removes internal ``..`` components, which may change the
     meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
     symbolic links.

   - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
     that link to absolute paths, or ones that link outside the destination.

     This raises :class:`~tarfile.AbsoluteLinkError` or
     :class:`~tarfile.LinkOutsideDestinationError`.

     Note that such files are refused even on platforms that do not support
     symbolic links.

   - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files
     (including pipes).
     This raises :class:`~tarfile.SpecialFileError`.

   - For regular files, including hard links:

     - Set the owner read and write permissions
       (:attr:`~stat.S_IRUSR`|:attr:`~stat.S_IWUSR`).
     - Remove the group & other executable permission
       (:attr:`~stat.S_IXGRP`|:attr:`~stat.S_IXOTH`)
       if the owner doesn’t have it (:attr:`~stat.S_IXUSR`).

   - For other files (directories), set ``mode`` to ``None``, so
     that extraction methods skip applying permission bits.
   - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``)
     to ``None``, so that extraction methods skip setting it.

   Return the modified ``TarInfo`` member.

   .. versionchanged:: next

      Link targets are now normalized.


.. _tarfile-extraction-refuse:

Filter errors
~~~~~~~~~~~~~

When a filter refuses to extract a file, it will raise an appropriate exception,
a subclass of :class:`~tarfile.FilterError`.
This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more.
With ``errorlevel=0`` the error will be logged and the member will be skipped,
but extraction will continue.


Hints for further verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted
files without prior inspection.
Among other issues, the pre-defined filters do not prevent denial-of-service
attacks. Users should do additional checks.

Here is an incomplete list of things to consider:

* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
  to prevent e.g. exploiting pre-existing links, and to make it easier to
  clean up after a failed extraction.
* Disallow symbolic links if you do not need the functionality.
* When working with untrusted data, use external (e.g. OS-level) limits on
  disk, memory and CPU usage.
* Check filenames against an allow-list of characters
  (to filter out control characters, confusables, foreign path separators,
  etc.).
* Check that filenames have expected extensions (discouraging files that
  execute when you “click on them”, or extension-less files like Windows
  special device names).
* Limit the number of extracted files, total size of extracted data,
  filename length (including symlink length), and size of individual files.
* Check for files that would be shadowed on case-insensitive filesystems.

Also note that:

* Tar files may contain multiple versions of the same file.
  Later ones are expected to overwrite any earlier ones.
  This feature is crucial to allow updating tape archives, but can be abused
  maliciously.
* *tarfile* does not protect against issues with “live” data,
  e.g. an attacker tinkering with the destination (or source) directory while
  extraction (or archiving) is in progress.


Supporting older Python versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Extraction filters were added to Python 3.12, and are backported to older
versions as security updates.
To check whether the feature is available, use e.g.
``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.

The following examples show how to support Python versions with and without
the feature.
Note that setting ``extraction_filter`` will affect any subsequent operations.

* Fully trusted archive::

    my_tarfile.extraction_filter = (lambda member, path: member)
    my_tarfile.extractall()

* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior
  (``'fully_trusted'``) if this feature is not available::

    my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
                                           (lambda member, path: member))
    my_tarfile.extractall()

* Use the ``'data'`` filter; *fail* if it is not available::

    my_tarfile.extractall(filter=tarfile.data_filter)

  or::

    my_tarfile.extraction_filter = tarfile.data_filter
    my_tarfile.extractall()

* Use the ``'data'`` filter; *warn* if it is not available::

   if hasattr(tarfile, 'data_filter'):
       my_tarfile.extractall(filter='data')
   else:
       # remove this when no longer needed
       warn_the_user('Extracting may be unsafe; consider updating Python')
       my_tarfile.extractall()


Stateful extraction filter example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While *tarfile*'s extraction methods take a simple *filter* callable,
custom filters may be more complex objects with an internal state.
It may be useful to write these as context managers, to be used like this::

    with StatefulFilter() as filter_func:
        tar.extractall(path, filter=filter_func)

Such a filter can be written as, for example::

    class StatefulFilter:
        def __init__(self):
            self.file_count = 0

        def __enter__(self):
            return self

        def __call__(self, member, path):
            self.file_count += 1
            return member

        def __exit__(self, *exc_info):
            print(f'{self.file_count} files extracted')


.. _tarfile-commandline:
.. program:: tarfile


Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. cmdoption:: --filter <filtername>

   Specifies the *filter* for ``--extract``.
   See :ref:`tarfile-extraction-filter` for details.
   Only string names are accepted (that is, ``fully_trusted``, ``tar``,
   and ``data``).

   .. versionadded:: 3.11.4

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames
  and large files, and stores pathnames in a portable way. Modern tar
  implementations, including GNU tar, bsdtar/libarchive and star, fully
  support extended *pax* features; some old or unmaintained libraries may not,
  but should treat *pax* archives as if they were in the universally supported
  *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters; there is no user/group name information.
  Some archives have miscalculated header checksums in case of fields with
  non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem with the original format (which is the basis of all other formats) is
that it has no concept of different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.
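
The effect of *encoding* and *errors* can be illustrated with a small sketch.
It builds a :const:`GNU_FORMAT` archive in memory whose single member name is
encoded as *Latin-1*, then reads it back once with the matching encoding and
once with *UTF-8*. The helper names ``build_archive`` and ``read_names`` and the
member name ``café.txt`` are made up for this illustration and are not part of
:mod:`tarfile`:

```python
import io
import tarfile


def build_archive(member_name, encoding):
    """Return the bytes of a GNU-format archive holding one empty member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT,
                      encoding=encoding) as tar:
        # An empty regular file is enough; only the header matters here.
        tar.addfile(tarfile.TarInfo(member_name))
    return buffer.getvalue()


def read_names(data, encoding, errors="surrogateescape"):
    """Decode the member names of an uncompressed archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      encoding=encoding, errors=errors) as tar:
        return tar.getnames()


# Hypothetical member name containing one non-ASCII character.
data = build_archive("caf\u00e9.txt", "latin-1")

matching = read_names(data, "latin-1")    # same encoding as the writer
mismatched = read_names(data, "utf-8")    # wrong guess at the encoding

print(ascii(matching))
print(ascii(mismatched))
```

With the matching encoding the name is recovered intact. With the wrong
encoding and the default ``'surrogateescape'`` handler, the undecodable byte
is kept as a lone surrogate instead of raising an error, so the name appears
damaged but the original bytes can still be recovered.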

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
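
As a rough illustration of this point, the following sketch writes a
:const:`PAX_FORMAT` archive with a non-ASCII member name into memory and reads
it back with several different *encoding* values. The member name ``café.txt``
and the in-memory buffer are assumptions chosen for the example:

```python
import io
import tarfile

# Write a pax archive with one empty member whose name is not pure ASCII.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
    tar.addfile(tarfile.TarInfo("caf\u00e9.txt"))

# Read the same bytes back under different *encoding* settings.
names = {}
pax_paths = {}
for encoding in ("utf-8", "latin-1", "ascii"):
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:", encoding=encoding) as tar:
        member = tar.next()
        names[encoding] = member.name
        # The non-ASCII name travels in the pax extended header "path".
        pax_paths[encoding] = member.pax_headers.get("path")

for encoding, name in names.items():
    print(encoding, ascii(name))
```

Because the name is stored in a *UTF-8* encoded pax extended header, each
reader recovers the same name whichever *encoding* was passed.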