:mod:`!tarfile` --- Read and write tar archive files
====================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.

.. versionchanged:: 3.12
   Archives are extracted using a :ref:`filter <tarfile-extraction-filter>`,
   which makes it possible to either limit surprising/dangerous features,
   or to acknowledge that they are expected and the archive is fully trusted.
   By default, archives are fully trusted, but this default is deprecated
   and slated to change in Python 3.14.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

   Return a :class:`TarFile` object for the pathname *name*.
   For detailed information on :class:`TarFile` objects and the keyword
   arguments that are allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'`` and
   defaults to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'x:gz'``, ``'w|gz'``, ``'w:bz2'``, ``'x:bz2'``,
   ``'w|bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
   keyword argument *preset* to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file.
   If given, *fileobj* may be any object that has a
   :meth:`~io.RawIOBase.read` or :meth:`~io.RawIOBase.write` method
   (depending on the *mode*) that works with bytes.
   *bufsize* specifies the blocksize and defaults to ``20 * 512`` bytes.
   Use this variant in combination with e.g. ``sys.stdin.buffer``, a socket
   :term:`file object` or a tape device.
   However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.12
      The *compresslevel* keyword argument also works for streams.


.. class:: TarFile
   :noindex:

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.

   .. versionchanged:: 3.9
      Support for file and file-like objects.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


.. exception:: FilterError

   Base class for members :ref:`refused <tarfile-extraction-refuse>` by
   filters.

   .. attribute:: tarinfo

      Information about the member that the filter refused to extract,
      as :ref:`TarInfo <tarinfo-objects>`.

.. exception:: AbsolutePathError

   Raised to refuse extracting a member with an absolute path.

.. exception:: OutsideDestinationError

   Raised to refuse extracting a member outside the destination directory.

.. exception:: SpecialFileError

   Raised to refuse extracting a special file (e.g. a device or pipe).

.. exception:: AbsoluteLinkError

   Raised to refuse extracting a symbolic link with an absolute path.

.. exception:: LinkOutsideDestinationError

   Raised to refuse extracting a symbolic link pointing outside the destination
   directory.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.

.. data:: REGTYPE
          AREGTYPE

   A regular file :attr:`~TarInfo.type`.

.. data:: LNKTYPE

   A link (inside tarfile) :attr:`~TarInfo.type`.

.. data:: SYMTYPE

   A symbolic link :attr:`~TarInfo.type`.

.. data:: CHRTYPE

   A character special device :attr:`~TarInfo.type`.

.. data:: BLKTYPE

   A block special device :attr:`~TarInfo.type`.

.. data:: DIRTYPE

   A directory :attr:`~TarInfo.type`.

.. data:: FIFOTYPE

   A FIFO special device :attr:`~TarInfo.type`.

.. data:: CONTTYPE

   A contiguous file :attr:`~TarInfo.type`.

.. data:: GNUTYPE_LONGNAME

   A GNU tar longname :attr:`~TarInfo.type`.

.. data:: GNUTYPE_LONGLINK

   A GNU tar longlink :attr:`~TarInfo.type`.

.. data:: GNUTYPE_SPARSE

   A GNU tar sparse file :attr:`~TarInfo.type`.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1, stream=False)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`!name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, format will be automatically detected, even
   if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   *errorlevel* controls how extraction errors are handled,
   see :attr:`the corresponding attribute <TarFile.errorlevel>`.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   If *stream* is set to :const:`True`, information about the files in the
   archive is not cached while reading, which saves memory.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.13
      Add the *stream* parameter.

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names.
   It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   The *filter* argument specifies how ``members`` are modified or rejected
   before extraction.
   See :ref:`tarfile-extraction-filter` for details.
   It is recommended to set this explicitly depending on which *tar* features
   you need to support.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.12
      Added the *filter* parameter.


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   The *numeric_owner* and *filter* arguments are the same as
   for :meth:`extractall`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.12
      Added the *filter* parameter.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object.
   *member* may be
   a filename or a :class:`TarInfo` object. If *member* is a regular file or
   a link, an :class:`io.BufferedReader` object is returned. For all other
   existing members, :const:`None` is returned. If *member* does not appear
   in the archive, :exc:`KeyError` is raised.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

   .. versionchanged:: 3.13
      The returned :class:`io.BufferedReader` object has the :attr:`!mode`
      attribute which is always equal to ``'rb'``.

.. attribute:: TarFile.errorlevel
   :type: int

   If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract`
   and :meth:`TarFile.extractall`.
   Nevertheless, they appear as error messages in the debug output when
   *debug* is greater than 0.
   If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or
   :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised
   as :exc:`TarError` exceptions as well.

   Some exceptions, e.g. ones caused by wrong argument types or data
   corruption, are always raised.

   Custom :ref:`extraction filters <tarfile-extraction-filter>`
   should raise :exc:`FilterError` for *fatal* errors
   and :exc:`ExtractError` for *non-fatal* ones.

   Note that when an exception is raised, the archive may be partially
   extracted. It is the user’s responsibility to clean up.

.. attribute:: TarFile.extraction_filter

   .. versionadded:: 3.12

   The :ref:`extraction filter <tarfile-extraction-filter>` used
   as a default for the *filter* argument of :meth:`~TarFile.extract`
   and :meth:`~TarFile.extractall`.

   The attribute may be ``None`` or a callable.
   String names are not allowed for this attribute, unlike the *filter*
   argument to :meth:`~TarFile.extract`.

   If ``extraction_filter`` is ``None`` (the default),
   calling an extraction method without a *filter* argument will raise a
   ``DeprecationWarning``,
   and fall back to the :func:`fully_trusted <fully_trusted_filter>` filter,
   whose dangerous behavior matches previous versions of Python.

   In Python 3.14+, leaving ``extraction_filter=None`` will cause
   extraction methods to use the :func:`data <data_filter>` filter by default.

   The attribute may be set on instances or overridden in subclasses.
   It is also possible to set it on the ``TarFile`` class itself to set a
   global default, although, since it affects all uses of *tarfile*,
   it is best practice to only do so in top-level applications or
   :mod:`site configuration <site>`.
   To set a global default this way, a filter function needs to be wrapped in
   :func:`staticmethod` to prevent injection of a ``self`` argument.

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive.
   If *tarinfo* represents
   a non-zero-size regular file, the *fileobj* argument should be a :term:`binary file`,
   and ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

   .. versionchanged:: 3.13

      *fileobj* must be given for non-zero-sized regular files.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers
   :type: dict

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`.
Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
:meth:`~TarFile.gettarinfo`.

Modifying the objects returned by :meth:`~TarFile.getmember` or
:meth:`~TarFile.getmembers` will affect all subsequent
operations on the archive.
For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
call the :meth:`~TarInfo.replace` method to create a modified copy in one step.

Several attributes can be set to ``None`` to indicate that a piece of metadata
is unused or unknown.
Different :class:`TarFile` methods handle ``None`` differently:

- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will
  ignore the corresponding metadata, leaving it set to a default.
- :meth:`~TarFile.addfile` will fail.
- :meth:`~TarFile.list` will print a placeholder string.

.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from the bytes buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a bytes buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.
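
As an illustration of :meth:`TarInfo.tobuf` and :meth:`TarInfo.frombuf`, the
following minimal sketch serializes a member's metadata into a header block
and parses it back. The member name and attribute values are made up for the
example, and :const:`USTAR_FORMAT` is chosen only so that the result is a
single header block; this is not how :class:`TarFile` must be used.

```python
import tarfile

# Hypothetical member metadata, chosen only for illustration.
info = tarfile.TarInfo(name="notes/readme.txt")
info.size = 11
info.mode = 0o644
info.mtime = 1700000000

# Serialize the metadata into a ustar header block (a bytes object).
header = info.tobuf(format=tarfile.USTAR_FORMAT)

# Parse the block back; encoding and errors have no defaults in frombuf().
parsed = tarfile.TarInfo.frombuf(header, encoding="utf-8", errors="surrogateescape")
print(parsed.name, parsed.size, oct(parsed.mode))

# A block that does not hold a valid header is rejected with HeaderError
# (or one of its subclasses).
rejected = False
try:
    tarfile.TarInfo.frombuf(b"x" * 512, "utf-8", "surrogateescape")
except tarfile.HeaderError:
    rejected = True
```

Only the header is produced here; as noted above, a :class:`TarInfo` object
does not contain the file's data.
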


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name
   :type: str

   Name of the archive member.


.. attribute:: TarInfo.size
   :type: int

   Size in bytes.


.. attribute:: TarInfo.mtime
   :type: int | float

   Time of last modification in seconds since the :ref:`epoch <epoch>`,
   as in :attr:`os.stat_result.st_mtime`.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.mode
   :type: int

   Permission bits, as for :func:`os.chmod`.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname
   :type: str

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.

   For symbolic links (``SYMTYPE``), the *linkname* is relative to the directory
   that contains the link.
   For hard links (``LNKTYPE``), the *linkname* is relative to the root of
   the archive.


.. attribute:: TarInfo.uid
   :type: int

   User ID of the user who originally stored this member.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gid
   :type: int

   Group ID of the user who originally stored this member.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.uname
   :type: str

   User name.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gname
   :type: str

   Group name.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.chksum
   :type: int

   Header checksum.


.. attribute:: TarInfo.devmajor
   :type: int

   Device major number.


.. attribute:: TarInfo.devminor
   :type: int

   Device minor number.


.. attribute:: TarInfo.offset
   :type: int

   The tar header starts here.


.. attribute:: TarInfo.offset_data
   :type: int

   The file's data starts here.


.. attribute:: TarInfo.sparse

   Sparse member information.


.. attribute:: TarInfo.pax_headers
   :type: dict

   A dictionary containing key-value pairs of an associated pax extended header.

.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=..., \
                            uid=..., gid=..., uname=..., gname=..., \
                            deep=True)

   .. versionadded:: 3.12

   Return a *new* copy of the :class:`!TarInfo` object with the given attributes
   changed.
   For example, to return a ``TarInfo`` with the group name set to
   ``'staff'``, use::

       new_tarinfo = old_tarinfo.replace(gname='staff')

   By default, a deep copy is made.
   If *deep* is false, the copy is shallow, i.e. ``pax_headers``
   and any custom attributes are shared with the original ``TarInfo`` object.

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tarfile-extraction-filter:

Extraction filters
------------------

.. versionadded:: 3.12

The *tar* format is designed to capture all details of a UNIX-like filesystem,
which makes it very powerful.
Unfortunately, the features make it easy to create tar files that have
unintended -- and possibly malicious -- effects when extracted.
For example, extracting a tar file can overwrite arbitrary files in various
ways (e.g. by using absolute paths, ``..`` path components, or symlinks that
affect later members).

In most cases, the full functionality is not needed.
Therefore, *tarfile* supports extraction filters: a mechanism to limit
functionality, and thus mitigate some of the security issues.

.. seealso::

   :pep:`706`
      Contains further motivation and rationale behind the design.

The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall`
can be:

* the string ``'fully_trusted'``: Honor all metadata as specified in the
  archive.
  Should be used if the user trusts the archive completely, or implements
  their own complex verification.

* the string ``'tar'``: Honor most *tar*-specific features (i.e. features of
  UNIX-like filesystems), but block features that are very likely to be
  surprising or malicious. See :func:`tar_filter` for details.

* the string ``'data'``: Ignore or block most features specific to UNIX-like
  filesystems. Intended for extracting cross-platform data archives.
  See :func:`data_filter` for details.

* ``None`` (default): Use :attr:`TarFile.extraction_filter`.

  If that is also ``None`` (the default), raise a ``DeprecationWarning``,
  and fall back to the ``'fully_trusted'`` filter, whose dangerous behavior
  matches previous versions of Python.

  In Python 3.14, the ``'data'`` filter will become the default instead.
  It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.

* A callable which will be called for each extracted member with a
  :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
  path to where the archive is extracted (i.e. the same path is used for all
  members)::

      filter(member: TarInfo, path: str, /) -> TarInfo | None

  The callable is called just before each member is extracted, so it can
  take the current state of the disk into account.
  It can:

  - return a :class:`TarInfo` object which will be used instead of the metadata
    in the archive, or
  - return ``None``, in which case the member will be skipped, or
  - raise an exception to abort the operation or skip the member,
    depending on :attr:`~TarFile.errorlevel`.
    Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave
    the archive partially extracted. It does not attempt to clean up.

Default named filters
~~~~~~~~~~~~~~~~~~~~~

The pre-defined, named filters are available as functions, so they can be
reused in custom filters:

.. function:: fully_trusted_filter(member, path)

   Return *member* unchanged.

   This implements the ``'fully_trusted'`` filter.

.. function:: tar_filter(member, path)

   Implements the ``'tar'`` filter.

   - Strip leading slashes (``/`` and :data:`os.sep`) from filenames.
   - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute
     paths (in case the name is absolute
     even after stripping slashes, e.g. ``C:/foo`` on Windows).
     This raises :class:`~tarfile.AbsolutePathError`.
   - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute
     path (after following symlinks) would end up outside the destination.
     This raises :class:`~tarfile.OutsideDestinationError`.
   - Clear high mode bits (setuid, setgid, sticky) and group/other write bits
     (:const:`~stat.S_IWGRP` | :const:`~stat.S_IWOTH`).

   Return the modified ``TarInfo`` member.

.. function:: data_filter(member, path)

   Implements the ``'data'`` filter.
   In addition to what ``tar_filter`` does:

   - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
     that link to absolute paths, or ones that link outside the destination.

     This raises :class:`~tarfile.AbsoluteLinkError` or
     :class:`~tarfile.LinkOutsideDestinationError`.

     Note that such files are refused even on platforms that do not support
     symbolic links.

   - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files
     (including pipes).
     This raises :class:`~tarfile.SpecialFileError`.

   - For regular files, including hard links:

     - Set the owner read and write permissions
       (:const:`~stat.S_IRUSR` | :const:`~stat.S_IWUSR`).
     - Remove the group & other executable permission
       (:const:`~stat.S_IXGRP` | :const:`~stat.S_IXOTH`)
       if the owner doesn’t have it (:const:`~stat.S_IXUSR`).

   - For other files (directories), set ``mode`` to ``None``, so
     that extraction methods skip applying permission bits.
   - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``)
     to ``None``, so that extraction methods skip setting it.

   Return the modified ``TarInfo`` member.


.. _tarfile-extraction-refuse:

Filter errors
~~~~~~~~~~~~~

When a filter refuses to extract a file, it will raise an appropriate exception,
a subclass of :class:`~tarfile.FilterError`.
This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more.
With ``errorlevel=0`` the error will be logged and the member will be skipped,
but extraction will continue.


Hints for further verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted
files without prior inspection.
Among other issues, the pre-defined filters do not prevent denial-of-service
attacks. Users should do additional checks.

Here is an incomplete list of things to consider:

* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
  to prevent e.g.
  exploiting pre-existing links, and to make it easier to
  clean up after a failed extraction.
* When working with untrusted data, use external (e.g. OS-level) limits on
  disk, memory and CPU usage.
* Check filenames against an allow-list of characters
  (to filter out control characters, confusables, foreign path separators,
  etc.).
* Check that filenames have expected extensions (discouraging files that
  execute when you “click on them”, or extension-less files like Windows
  special device names).
* Limit the number of extracted files, total size of extracted data,
  filename length (including symlink length), and size of individual files.
* Check for files that would be shadowed on case-insensitive filesystems.

Also note that:

* Tar files may contain multiple versions of the same file.
  Later ones are expected to overwrite any earlier ones.
  This feature is crucial to allow updating tape archives, but can be abused
  maliciously.
* *tarfile* does not protect against issues with “live” data,
  e.g. an attacker tinkering with the destination (or source) directory while
  extraction (or archiving) is in progress.


Supporting older Python versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Extraction filters were added to Python 3.12, but may be backported to older
versions as security updates.
To check whether the feature is available, use e.g.
``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.

The following examples show how to support Python versions with and without
the feature.
Note that setting ``extraction_filter`` will affect any subsequent operations.

* Fully trusted archive::

    my_tarfile.extraction_filter = (lambda member, path: member)
    my_tarfile.extractall()

* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior
  (``'fully_trusted'``) if this feature is not available::

    my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
                                           (lambda member, path: member))
    my_tarfile.extractall()

* Use the ``'data'`` filter; *fail* if it is not available::

    my_tarfile.extractall(filter=tarfile.data_filter)

  or::

    my_tarfile.extraction_filter = tarfile.data_filter
    my_tarfile.extractall()

* Use the ``'data'`` filter; *warn* if it is not available::

    if hasattr(tarfile, 'data_filter'):
        my_tarfile.extractall(filter='data')
    else:
        # remove this when no longer needed
        warn_the_user('Extracting may be unsafe; consider updating Python')
        my_tarfile.extractall()


Stateful extraction filter example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While *tarfile*'s extraction methods take a simple *filter* callable,
custom filters may be more complex objects with an internal state.
It may be useful to write these as context managers, to be used like this::

    with StatefulFilter() as filter_func:
        tar.extractall(path, filter=filter_func)

Such a filter can be written as, for example::

    class StatefulFilter:
        def __init__(self):
            self.file_count = 0

        def __enter__(self):
            return self

        def __call__(self, member, path):
            self.file_count += 1
            return member

        def __exit__(self, *exc_info):
            print(f'{self.file_count} files extracted')


.. _tarfile-commandline:
.. program:: tarfile


Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. option:: -l <tarfile>
            --list <tarfile>

   List files in a tarfile.

.. option:: -c <tarfile> <source1> ... <sourceN>
            --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. option:: -e <tarfile> [<output_dir>]
            --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. option:: -t <tarfile>
            --test <tarfile>

   Test whether the tarfile is valid or not.

.. option:: -v, --verbose

   Verbose output.

.. option:: --filter <filtername>

   Specifies the *filter* for ``--extract``.
   See :ref:`tarfile-extraction-filter` for details.
   Only string names are accepted (that is, ``fully_trusted``, ``tar``,
   and ``data``).

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall(filter='data')
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names, sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information.
  Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.
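
The effect of a mismatched *encoding* can be reproduced with a small in-memory
archive. The following is a minimal sketch, not a recommended workflow: it
assumes :const:`GNU_FORMAT` (chosen here because it stores names as raw bytes
in the caller's *encoding*), and the member name, payload and the helper
``first_name`` are illustrative only.

```python
import io
import tarfile

name = "caf\u00e9.txt"          # illustrative member name with one non-ASCII character
payload = b"hello"

# Write a GNU-format archive into memory, encoding metadata as UTF-8.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="utf-8") as tar:
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def first_name(encoding):
    """Return the first member name, decoding metadata with *encoding*."""
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding=encoding) as tar:
        return tar.getnames()[0]


right = first_name("utf-8")     # same encoding as the writer: name round-trips
wrong = first_name("latin-1")   # mismatch: each UTF-8 byte becomes one character

print(repr(right))
print(repr(wrong))
```

Reading with the writer's encoding returns the original name, while reading
with ``'latin-1'`` returns a damaged name in which the two UTF-8 bytes of the
accented letter show up as two separate characters.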

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
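
As a rough illustration of this point, the sketch below writes a pax archive
into memory and reads it back with several different *encoding* values. The
member name, payload and the helper ``read_member`` are assumptions made for
the example; the non-ASCII name is expected to be carried in a pax ``path``
record, so the reader's *encoding* should not affect it.

```python
import io
import tarfile

name = "caf\u00e9.txt"          # illustrative member name with one non-ASCII character
payload = b"hello"

# Write a pax-format archive into memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def read_member(encoding):
    """Return the first member, reading the archive with *encoding*."""
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding=encoding) as tar:
        return tar.getmembers()[0]


members = [read_member(enc) for enc in ("utf-8", "latin-1", "ascii")]
names = [member.name for member in members]

# The name stored in the pax extended header, if one was written.
pax_path = members[0].pax_headers.get("path")

print(names)
print(pax_path)
```

All three reads should yield the same member name, which is the practical
difference from the older formats, where the reader has to know the encoding
the writer used.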