:mod:`!tarfile` --- Read and write tar archive files
====================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.

.. versionchanged:: 3.12
   Archives are extracted using a :ref:`filter <tarfile-extraction-filter>`,
   which makes it possible to either limit surprising/dangerous features,
   or to acknowledge that they are expected and the archive is fully trusted.
   By default, archives are fully trusted, but this default is deprecated
   and slated to change in Python 3.14.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'`` and
   defaults to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this.  If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'x:gz'``, ``'w|gz'``, ``'w:bz2'``, ``'x:bz2'``,
   ``'w|bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
   keyword argument *preset* to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks.  No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`~io.RawIOBase.read` or :meth:`~io.RawIOBase.write` method
   (depending on the *mode*) that works with bytes.
   *bufsize* specifies the blocksize and defaults to ``20 * 512`` bytes.
   Use this variant in combination with e.g. ``sys.stdin.buffer``, a socket
   :term:`file object` or a tape device.
   However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`.  The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.12
      The *compresslevel* keyword argument also works for streams.
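
As a minimal sketch of the modes above, the following writes a gzip-compressed archive into an in-memory :class:`io.BytesIO` buffer passed as *fileobj* (with *compresslevel* set to ``6``), then reads it back once with random access (``'r:*'``) and once as a stream (``'r|*'``). The member name and its contents are arbitrary placeholders.

.. code-block:: python

   import io
   import tarfile

   payload = b"hello, tar\n"

   # Write: 'w:gz' plus the compresslevel keyword (tarfile's default is 9).
   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=6) as tf:
       info = tarfile.TarInfo(name="greeting.txt")
       info.size = len(payload)
       tf.addfile(info, io.BytesIO(payload))

   # Random-access read; 'r:*' detects the gzip compression automatically.
   buf.seek(0)
   with tarfile.open(fileobj=buf, mode="r:*") as tf:
       names = tf.getnames()

   # Stream read: members are visited strictly in order, without seeking back.
   buf.seek(0)
   streamed = {}
   with tarfile.open(fileobj=buf, mode="r|*") as tf:
       for member in tf:
           f = tf.extractfile(member)  # None for non-file members
           streamed[member.name] = f.read() if f is not None else None

   print(names)     # ['greeting.txt']
   print(streamed)  # {'greeting.txt': b'hello, tar\n'}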


.. class:: TarFile
   :noindex:

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.

   .. versionchanged:: 3.9
      Support for file and file-like objects.
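
A small sketch in which in-memory :class:`io.BytesIO` objects stand in for real files (file-like objects are accepted since Python 3.9); the member name and the text are placeholders:

.. code-block:: python

   import io
   import tarfile

   # A one-member, uncompressed archive held in memory.
   archive = io.BytesIO()
   with tarfile.open(fileobj=archive, mode="w") as tf:
       tf.addfile(tarfile.TarInfo("empty.txt"))  # zero-size regular file
   archive.seek(0)

   plain_text = io.BytesIO(b"just some text, not a tar archive")

   is_archive = tarfile.is_tarfile(archive)
   is_text_archive = tarfile.is_tarfile(plain_text)
   print(is_archive, is_text_archive)  # True False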


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


.. exception:: FilterError

   Base class for members :ref:`refused <tarfile-extraction-refuse>` by
   filters.

   .. attribute:: tarinfo

      Information about the member that the filter refused to extract,
      as :ref:`TarInfo <tarinfo-objects>`.

.. exception:: AbsolutePathError

   Raised to refuse extracting a member with an absolute path.

.. exception:: OutsideDestinationError

   Raised to refuse extracting a member outside the destination directory.

.. exception:: SpecialFileError

   Raised to refuse extracting a special file (e.g. a device or pipe).

.. exception:: AbsoluteLinkError

   Raised to refuse extracting a symbolic link with an absolute path.

.. exception:: LinkOutsideDestinationError

   Raised to refuse extracting a symbolic link pointing outside the destination
   directory.
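
The sketch below, which assumes Python 3.12 or later for :func:`data_filter`, triggers two of these exceptions: :exc:`ReadError` for data that is not a tar archive, and :exc:`OutsideDestinationError` (a :exc:`FilterError` subclass) when the ``'data'`` filter is asked to approve a member named ``../escape.txt``. The destination ``extract-here`` is a placeholder path and does not need to exist.

.. code-block:: python

   import io
   import tarfile

   read_error = refused = None

   # ReadError: no supported compression or tar header matches the data.
   try:
       tarfile.open(fileobj=io.BytesIO(b"definitely not a tar archive"))
   except tarfile.ReadError as exc:
       read_error = exc

   # FilterError subclass: the member would land outside the destination.
   escaping = tarfile.TarInfo("../escape.txt")
   try:
       tarfile.data_filter(escaping, "extract-here")
   except tarfile.OutsideDestinationError as exc:
       refused = exc

   print(isinstance(read_error, tarfile.TarError))  # True
   print(isinstance(refused, tarfile.FilterError))  # True
   print(refused.tarinfo.name)                      # ../escape.txt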


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.

.. data:: REGTYPE
          AREGTYPE

   A regular file :attr:`~TarInfo.type`.

.. data:: LNKTYPE

   A hard link (to another member of the archive) :attr:`~TarInfo.type`.

.. data:: SYMTYPE

   A symbolic link :attr:`~TarInfo.type`.

.. data:: CHRTYPE

   A character special device :attr:`~TarInfo.type`.

.. data:: BLKTYPE

   A block special device :attr:`~TarInfo.type`.

.. data:: DIRTYPE

   A directory :attr:`~TarInfo.type`.

.. data:: FIFOTYPE

   A FIFO special device :attr:`~TarInfo.type`.

.. data:: CONTTYPE

   A contiguous file :attr:`~TarInfo.type`.

.. data:: GNUTYPE_LONGNAME

   A GNU tar longname :attr:`~TarInfo.type`.

.. data:: GNUTYPE_LONGLINK

   A GNU tar longlink :attr:`~TarInfo.type`.

.. data:: GNUTYPE_SPARSE

   A GNU tar sparse file :attr:`~TarInfo.type`.
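
These constants are compared against :attr:`TarInfo.type`. In the current implementation they are single-byte :class:`bytes` values (for example ``DIRTYPE == b"5"``), but code normally relies on the names or on the ``is*()`` helpers rather than on the raw values. A minimal sketch that builds :class:`TarInfo` objects by hand and classifies them (the member names are placeholders):

.. code-block:: python

   import tarfile

   # Member names mapped to the type each one should have.
   wanted = {
       "docs": tarfile.DIRTYPE,
       "docs/readme.txt": tarfile.REGTYPE,
       "docs/latest": tarfile.SYMTYPE,
       "docs/pipe": tarfile.FIFOTYPE,
   }

   kinds = {}
   for name, member_type in wanted.items():
       info = tarfile.TarInfo(name)
       info.type = member_type
       # The is*() helpers compare info.type against the constants above.
       if info.isdir():
           kinds[name] = "directory"
       elif info.issym():
           kinds[name] = "symbolic link"
       elif info.isfile():
           kinds[name] = "regular file"
       elif info.isfifo():
           kinds[name] = "fifo"
       else:
           kinds[name] = "other"

   print(kinds)
   # {'docs': 'directory', 'docs/readme.txt': 'regular file',
   #  'docs/latest': 'symbolic link', 'docs/pipe': 'fifo'}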


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.
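
To see how the formats differ in practice, this sketch writes one empty member whose 120-character name exceeds the 100-byte name field of a classic header. In the current implementation, :const:`USTAR_FORMAT` rejects such a name with :exc:`ValueError` (it contains no ``/`` at which it could be split into the ustar prefix field), while :const:`GNU_FORMAT` and :const:`PAX_FORMAT` store it through their extensions. The name itself is a placeholder.

.. code-block:: python

   import io
   import tarfile

   LONG_NAME = "x" * 120  # longer than the 100-byte ustar name field, no "/" to split on

   def roundtrip(fmt):
       """Write one empty member named LONG_NAME in *fmt*; report whether it reads back intact."""
       buf = io.BytesIO()
       try:
           with tarfile.open(fileobj=buf, mode="w", format=fmt) as tf:
               tf.addfile(tarfile.TarInfo(LONG_NAME))  # zero-size file: no fileobj needed
       except ValueError as exc:
           return type(exc).__name__  # the format cannot represent the name
       buf.seek(0)
       with tarfile.open(fileobj=buf) as tf:  # the format is detected when reading
           return tf.getnames() == [LONG_NAME]

   results = {
       "ustar": roundtrip(tarfile.USTAR_FORMAT),
       "gnu": roundtrip(tarfile.GNU_FORMAT),
       "pax": roundtrip(tarfile.PAX_FORMAT),
   }
   print(results)  # {'ustar': 'ValueError', 'gnu': True, 'pax': True}
   print(tarfile.DEFAULT_FORMAT == tarfile.PAX_FORMAT)  # True (Python 3.8+)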


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.
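
A minimal sketch of both cases, using in-memory :class:`io.BytesIO` objects as *fileobj* so that the written bytes can be inspected afterwards; ``RuntimeError`` merely simulates a failure. The sizes in the comments follow from the 512-byte block structure and from :meth:`TarFile.close` padding a finished archive to a multiple of the 10240-byte record size in the current implementation.

.. code-block:: python

   import io
   import tarfile

   payload = b"example payload"  # 15 bytes, padded to one 512-byte data block
   info = tarfile.TarInfo("payload.bin")
   info.size = len(payload)

   # Normal exit: close() appends the end-of-archive blocks and record padding.
   done = io.BytesIO()
   with tarfile.open(fileobj=done, mode="w") as tf:
       tf.addfile(info, io.BytesIO(payload))
   print(tf.closed, len(done.getvalue()))  # True 10240

   # Exit through an exception: the archive is *not* finalized.
   broken = io.BytesIO()
   try:
       with tarfile.open(fileobj=broken, mode="w") as tf2:
           tf2.addfile(info, io.BytesIO(payload))
           raise RuntimeError("simulated failure")
   except RuntimeError:
       pass
   print(tf2.closed, len(broken.getvalue()))  # True 1024 (header + data block only)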

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1, stream=False)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`!name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, format will be automatically detected, even
   if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   *errorlevel* controls how extraction errors are handled,
   see :attr:`the corresponding attribute <TarFile.errorlevel>`.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   If *stream* is set to :const:`True`, information about the members of the
   archive is not cached while reading, saving memory.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.13
      Add the *stream* parameter.
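
A sketch of the *format* and *pax_headers* arguments, passed through :func:`tarfile.open` (which forwards extra keyword arguments to the constructor). The header key ``comment`` and its value are arbitrary examples; the global header is read back through :attr:`TarFile.pax_headers`.

.. code-block:: python

   import io
   import tarfile

   buf = io.BytesIO()
   with tarfile.open(
       fileobj=buf,
       mode="w",
       format=tarfile.PAX_FORMAT,  # pax_headers is only written for PAX_FORMAT
       pax_headers={"comment": "example global comment"},
   ) as tf:
       tf.addfile(tarfile.TarInfo("empty.txt"))

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tf:
       names = tf.getnames()
       headers = dict(tf.pax_headers)  # filled in while the members are read

   print(names)    # ['empty.txt']
   print(headers)  # {'comment': 'example global comment'}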

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.
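
The following sketch stores the same name twice and shows that :meth:`getmember` returns the later copy, while :meth:`getnames` lists every occurrence. The ``add_bytes`` helper is a hypothetical convenience function, not part of :mod:`tarfile`, and all names are placeholders.

.. code-block:: python

   import io
   import tarfile

   def add_bytes(tf, name, data):
       """Hypothetical helper: store *data* under *name* in the open TarFile *tf*."""
       info = tarfile.TarInfo(name)
       info.size = len(data)
       tf.addfile(info, io.BytesIO(data))

   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w") as tf:
       add_bytes(tf, "config.ini", b"version=1\n")
       add_bytes(tf, "notes.txt", b"hello\n")
       add_bytes(tf, "config.ini", b"version=2\n")  # same name stored a second time

   buf.seek(0)
   missing = False
   with tarfile.open(fileobj=buf) as tf:
       names = tf.getnames()                # every occurrence, in archive order
       latest = tf.getmember("config.ini")  # the last occurrence is returned
       content = tf.extractfile(latest).read()
       try:
           tf.getmember("missing.txt")
       except KeyError:
           missing = True

   print(names)    # ['config.ini', 'notes.txt', 'config.ini']
   print(content)  # b'version=2\n'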


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.
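
A sketch of walking an archive member by member with :meth:`next` until it returns ``None``; iterating over the :class:`TarFile` object with ``for member in tf`` visits the same members. The member names are placeholders.

.. code-block:: python

   import io
   import tarfile

   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w") as tf:
       for name in ("a.txt", "b.txt", "c.txt"):
           tf.addfile(tarfile.TarInfo(name))  # empty regular files

   buf.seek(0)
   seen = []
   with tarfile.open(fileobj=buf) as tf:
       member = tf.next()
       while member is not None:  # None signals that no members are left
           seen.append(member.name)
           member = tf.next()

   print(seen)  # ['a.txt', 'b.txt', 'c.txt']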


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   The *filter* argument specifies how ``members`` are modified or rejected
   before extraction.
   See :ref:`tarfile-extraction-filter` for details.
   It is recommended to set this explicitly depending on which *tar* features
   you need to support.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.12
      Added the *filter* parameter.
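
A minimal sketch of extracting with an explicit ``filter='data'``, assuming Python 3.12 or later (or an earlier release that includes the backported extraction filters). The archive is built in memory and extracted into a temporary directory; the member name ``pkg/main.py`` is a placeholder. No separate directory member is needed, because missing parent directories are created during extraction.

.. code-block:: python

   import io
   import os
   import tarfile
   import tempfile

   data = b"print('hi')\n"
   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w:gz") as tf:
       info = tarfile.TarInfo("pkg/main.py")
       info.size = len(data)
       tf.addfile(info, io.BytesIO(data))

   buf.seek(0)
   with tempfile.TemporaryDirectory() as dest:
       with tarfile.open(fileobj=buf) as tf:
           # The 'data' filter refuses members that would escape dest,
           # links pointing outside it, and special files such as devices.
           tf.extractall(path=dest, filter="data")
       extracted = sorted(
           os.path.relpath(os.path.join(root, fname), dest).replace(os.sep, "/")
           for root, _dirs, files in os.walk(dest)
           for fname in files
       )
       with open(os.path.join(dest, "pkg", "main.py"), "rb") as f:
           same = f.read() == data

   print(extracted, same)  # ['pkg/main.py'] True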


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   The *numeric_owner* and *filter* arguments are the same as
   for :meth:`extractall`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.12
      Added the *filter* parameter.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be
   a filename or a :class:`TarInfo` object. If *member* is a regular file or
   a link, an :class:`io.BufferedReader` object is returned. For all other
   existing members, :const:`None` is returned. If *member* does not appear
   in the archive, :exc:`KeyError` is raised.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

   .. versionchanged:: 3.13
      The returned :class:`io.BufferedReader` object has the :attr:`!mode`
      attribute which is always equal to ``'rb'``.
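
A sketch of reading a member's contents without writing anything to disk. The returned object can be used as a context manager, and a directory member yields ``None``; the member names are placeholders.

.. code-block:: python

   import io
   import tarfile

   body = b"line 1\nline 2\n"
   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w") as tf:
       info = tarfile.TarInfo("log.txt")
       info.size = len(body)
       tf.addfile(info, io.BytesIO(body))
       subdir = tarfile.TarInfo("subdir")
       subdir.type = tarfile.DIRTYPE
       tf.addfile(subdir)

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tf:
       with tf.extractfile("log.txt") as f:  # an io.BufferedReader
           lines = f.read().splitlines()
       dir_handle = tf.extractfile("subdir")  # not a regular file or link

   print(lines)       # [b'line 1', b'line 2']
   print(dir_handle)  # None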

.. attribute:: TarFile.errorlevel
   :type: int

   If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract`
   and :meth:`TarFile.extractall`.
   Nevertheless, they appear as error messages in the debug output when
   *debug* is greater than 0.
   If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or
   :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised
   as :exc:`TarError` exceptions as well.

   Some exceptions, e.g. ones caused by wrong argument types or data
   corruption, are always raised.

   Custom :ref:`extraction filters <tarfile-extraction-filter>`
   should raise :exc:`FilterError` for *fatal* errors
   and :exc:`ExtractError` for *non-fatal* ones.

   Note that when an exception is raised, the archive may be partially
   extracted. It is the user’s responsibility to clean up.

.. attribute:: TarFile.extraction_filter

   .. versionadded:: 3.12

   The :ref:`extraction filter <tarfile-extraction-filter>` used
   as a default for the *filter* argument of :meth:`~TarFile.extract`
   and :meth:`~TarFile.extractall`.

   The attribute may be ``None`` or a callable.
   String names are not allowed for this attribute, unlike the *filter*
   argument to :meth:`~TarFile.extract`.

   If ``extraction_filter`` is ``None`` (the default),
   calling an extraction method without a *filter* argument will emit a
   ``DeprecationWarning``
   and fall back to the :func:`fully_trusted <fully_trusted_filter>` filter,
   whose dangerous behavior matches previous versions of Python.

   In Python 3.14+, leaving ``extraction_filter=None`` will cause
   extraction methods to use the :func:`data <data_filter>` filter by default.

   The attribute may be set on instances or overridden in subclasses.
   It is also possible to set it on the ``TarFile`` class itself to set a
   global default, although, since it affects all uses of *tarfile*,
   it is best practice to only do so in top-level applications or
   :mod:`site configuration <site>`.
   To set a global default this way, a filter function needs to be wrapped in
   :func:`staticmethod` to prevent injection of a ``self`` argument.
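
One way to apply this in a subclass, sketched under the assumption of Python 3.12 or later: ``DataTarFile`` is a hypothetical name, and the archive deliberately contains the unsafe member ``../escape.txt``. With the default :attr:`~TarFile.errorlevel` of ``1``, the filter's refusal surfaces as an exception after the safe member has already been extracted, which illustrates the partial extraction described under :attr:`TarFile.errorlevel`.

.. code-block:: python

   import io
   import os
   import tarfile
   import tempfile

   class DataTarFile(tarfile.TarFile):
       """Hypothetical subclass whose default extraction filter is 'data'."""
       # staticmethod() stops the function from being bound to the instance,
       # so it is called as filter(member, path) without a self argument.
       extraction_filter = staticmethod(tarfile.data_filter)

   buf = io.BytesIO()
   with DataTarFile.open(fileobj=buf, mode="w") as tf:
       for name in ("ok.txt", "../escape.txt"):  # the second name is unsafe on purpose
           tf.addfile(tarfile.TarInfo(name))

   buf.seek(0)
   refused = None
   with tempfile.TemporaryDirectory() as dest:
       with DataTarFile.open(fileobj=buf) as tf:
           try:
               tf.extractall(dest)  # no filter argument: the class default applies
           except tarfile.OutsideDestinationError as exc:
               refused = exc.tarinfo.name
       ok_extracted = os.path.exists(os.path.join(dest, "ok.txt"))

   print(refused, ok_extracted)  # ../escape.txt True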

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it should be a function that takes a :class:`TarInfo`
   object argument and returns the changed :class:`TarInfo` object. If it
   instead returns :const:`None`, the :class:`TarInfo` object will be excluded
   from the archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.
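
A sketch of an :meth:`add` call whose *filter* resets ownership and skips editor backup files ending in ``~``, in the spirit of the reset-user example in :ref:`tar-examples`. The directory layout, the file names and the filtering policy are illustrative assumptions; the output order reflects the sorted recursion.

.. code-block:: python

   import io
   import os
   import tarfile
   import tempfile

   def reset_owner(info):
       """Illustrative add() filter: anonymize ownership, drop backup files."""
       if info.name.endswith("~"):
           return None  # returning None excludes the member from the archive
       info.uid = info.gid = 0
       info.uname = info.gname = "root"
       return info

   buf = io.BytesIO()
   with tempfile.TemporaryDirectory() as src:
       project = os.path.join(src, "project")
       os.mkdir(project)
       for fname in ("b.txt", "a.txt", "a.txt~"):
           with open(os.path.join(project, fname), "w") as f:
               f.write(fname)
       with tarfile.open(fileobj=buf, mode="w") as tf:
           tf.add(project, arcname="project", filter=reset_owner)

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tf:
       entries = [(m.name, m.uid, m.uname) for m in tf.getmembers()]

   print(entries)
   # [('project', 0, 'root'), ('project/a.txt', 0, 'root'), ('project/b.txt', 0, 'root')]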


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *tarinfo* represents
   a regular file of non-zero size, the *fileobj* argument should be a :term:`binary file`,
   and ``tarinfo.size`` bytes are read from it and added to the archive.  You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

   .. versionchanged:: 3.13

      *fileobj* must be given for non-zero-sized regular files.
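
A sketch of :meth:`addfile` with hand-built :class:`TarInfo` objects: the regular file needs a binary *fileobj* from which ``tarinfo.size`` bytes are read, while the directory entry carries no data. The names, the modes and the use of the current time are illustrative choices; a fresh :class:`TarInfo` otherwise has an ``mtime`` of ``0``.

.. code-block:: python

   import io
   import tarfile
   import time

   data = "generated at build time\n".encode("utf-8")

   info = tarfile.TarInfo("build/info.txt")
   info.size = len(data)          # must match the bytes available from fileobj
   info.mtime = int(time.time())  # a new TarInfo defaults to mtime 0 (the epoch)
   info.mode = 0o644

   folder = tarfile.TarInfo("build/empty-dir")
   folder.type = tarfile.DIRTYPE
   folder.mode = 0o755

   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w") as tf:
       tf.addfile(info, io.BytesIO(data))  # regular file: data comes from fileobj
       tf.addfile(folder)                  # directory: no fileobj

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tf:
       stored = tf.extractfile("build/info.txt").read()
       kinds = [(m.name, m.isdir()) for m in tf.getmembers()]

   print(stored == data, kinds)
   # True [('build/info.txt', False), ('build/empty-dir', True)]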


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file.  The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`.  If given, *arcname* specifies
   an alternative name for the file in the archive; otherwise, the name is
   taken from *fileobj*’s :attr:`~io.FileIO.name` attribute, or the *name*
   argument.  The name should be a text string.

   You can modify some of the :class:`TarInfo`’s attributes before you add it
   using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.
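
A sketch combining :meth:`gettarinfo` and :meth:`addfile`: the header is derived from the file's :func:`os.stat` data, stored under a different *arcname*, and its permission bits are adjusted before the member is added. The temporary file and the ``exports/`` prefix are illustrative.

.. code-block:: python

   import io
   import os
   import tarfile
   import tempfile

   buf = io.BytesIO()
   with tempfile.TemporaryDirectory() as tmp:
       path = os.path.join(tmp, "report.csv")
       with open(path, "wb") as f:
           f.write(b"id,value\n1,42\n")  # 14 bytes

       with tarfile.open(fileobj=buf, mode="w") as tf:
           info = tf.gettarinfo(path, arcname="exports/report.csv")
           info.mode = 0o600  # adjust metadata before adding
           with open(path, "rb") as f:
               tf.addfile(info, f)  # info.size already matches the file on disk

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tf:
       member = tf.getmember("exports/report.csv")
       summary = (member.name, member.size, oct(member.mode))

   print(summary)  # ('exports/report.csv', 14, '0o600')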


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers
   :type: dict

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
:meth:`~TarFile.gettarinfo`.

Modifying the objects returned by :meth:`~TarFile.getmember` or
:meth:`~TarFile.getmembers` will affect all subsequent
operations on the archive.
For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
call the :meth:`~TarInfo.replace` method to create a modified copy in one step.

Several attributes can be set to ``None`` to indicate that a piece of metadata
is unused or unknown.
Different :class:`TarFile` methods handle ``None`` differently:

- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will
  ignore the corresponding metadata, leaving it set to a default.
- :meth:`~TarFile.addfile` will fail.
- :meth:`~TarFile.list` will print a placeholder string.
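
A sketch of the difference between modifying a returned :class:`TarInfo` in place and working on a copy. :meth:`TarInfo.replace` assumes Python 3.12 or later (or an earlier release that includes the backported extraction filters); the member names are placeholders.

.. code-block:: python

   import copy
   import io
   import tarfile

   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w") as tf:
       tf.addfile(tarfile.TarInfo("original.txt"))

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tf:
       member = tf.getmember("original.txt")

       renamed = member.replace(name="renamed.txt")  # modified copy in one step
       detached = copy.copy(member)                  # shallow copy, then edit freely
       detached.uid = None                           # None marks the uid as unknown

       names_after = tf.getnames()  # the archive's own TarInfo is unchanged

   print(member.name, renamed.name, names_after)  # original.txt renamed.txt ['original.txt']
   print(member.uid, detached.uid)                # 0 None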

.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from the 512-byte :class:`bytes` header block *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a :class:`bytes` buffer holding the header block(s) for a :class:`TarInfo`
   object. For information on the arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.
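
A round-trip sketch at the header level. It uses :const:`USTAR_FORMAT` so that the short name fits in a single 512-byte block; the name, size and timestamp are placeholders, and the second call shows :exc:`HeaderError` for a buffer that is not a complete header block.

.. code-block:: python

   import tarfile

   info = tarfile.TarInfo("hello.txt")
   info.size = 5
   info.mtime = 1_700_000_000

   header = info.tobuf(format=tarfile.USTAR_FORMAT)  # bytes
   parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")
   print(len(header), parsed.name, parsed.size, parsed.mtime)  # 512 hello.txt 5 1700000000

   bad_header = None
   try:
       tarfile.TarInfo.frombuf(b"\0" * 100, tarfile.ENCODING, "surrogateescape")
   except tarfile.HeaderError as exc:  # too short to be a header block
       bad_header = exc
   print(isinstance(bad_header, tarfile.HeaderError))  # True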


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name
   :type: str

   Name of the archive member.


.. attribute:: TarInfo.size
   :type: int

   Size in bytes.


.. attribute:: TarInfo.mtime
   :type: int | float

   Time of last modification in seconds since the :ref:`epoch <epoch>`,
   as in :attr:`os.stat_result.st_mtime`.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.mode
   :type: int

   Permission bits, as for :func:`os.chmod`.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname
   :type: str

796   Name of the target file name, which is only present in :class:`TarInfo` objects
797   of type :const:`LNKTYPE` and :const:`SYMTYPE`.
798
799   For symbolic links (``SYMTYPE``), the *linkname* is relative to the directory
800   that contains the link.
801   For hard links (``LNKTYPE``), the *linkname* is relative to the root of
802   the archive.
803
804
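To make the two resolution rules concrete, here is a small sketch that computes the archive-relative path a link member refers to. ``link_target_in_archive`` is a hypothetical helper, not part of :mod:`tarfile`; it uses :mod:`posixpath` because member names always use ``/`` as the separator, and the member names are made up.

```python
import posixpath
import tarfile

def link_target_in_archive(member):
    """Hypothetical helper: archive-relative path that a link member refers to."""
    if member.issym():
        # Symlink targets are relative to the directory containing the link.
        base = posixpath.dirname(member.name)
        return posixpath.normpath(posixpath.join(base, member.linkname))
    if member.islnk():
        # Hard link targets are relative to the root of the archive.
        return posixpath.normpath(member.linkname)
    raise ValueError(f"{member.name!r} is not a link")

sym = tarfile.TarInfo("docs/latest")
sym.type = tarfile.SYMTYPE
sym.linkname = "v1.txt"

hard = tarfile.TarInfo("backup/v1.txt")
hard.type = tarfile.LNKTYPE
hard.linkname = "docs/v1.txt"

print(link_target_in_archive(sym))   # docs/v1.txt
print(link_target_in_archive(hard))  # docs/v1.txt
```

This is a purely lexical computation: it does not follow any intermediate symbolic links, which is exactly what the extraction filters described below have to take into account.
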
.. attribute:: TarInfo.uid
   :type: int

   User ID of the user who originally stored this member.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gid
   :type: int

   Group ID of the user who originally stored this member.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.uname
   :type: str

   User name.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gname
   :type: str

   Group name.

   .. versionchanged:: 3.12

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.chksum
   :type: int

   Header checksum.


.. attribute:: TarInfo.devmajor
   :type: int

   Device major number.


.. attribute:: TarInfo.devminor
   :type: int

   Device minor number.


.. attribute:: TarInfo.offset
   :type: int

   Byte offset in the archive at which the member's tar header starts.


.. attribute:: TarInfo.offset_data
   :type: int

   Byte offset in the archive at which the member's file data starts.


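As an illustration of :attr:`~TarInfo.offset` and :attr:`~TarInfo.offset_data`, the sketch below writes a one-member archive to an in-memory buffer and slices the member's data straight out of the raw bytes. It assumes an uncompressed archive whose member needs no extended headers, so the data follows a single 512-byte header block; the name ``a.txt`` and the payload are illustrative.

```python
import io
import tarfile

payload = b"abc"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo("a.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:") as tar:
    member = tar.getmember("a.txt")

# Offsets are byte positions within the (uncompressed) archive stream.
raw = buffer.getvalue()
print(member.offset, member.offset_data)  # 0 512
assert raw[member.offset_data:member.offset_data + member.size] == payload
```
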
.. attribute:: TarInfo.sparse

   Sparse member information.


.. attribute:: TarInfo.pax_headers
   :type: dict

   A dictionary containing key-value pairs of an associated pax extended header.

.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=..., \
                            uid=..., gid=..., uname=..., gname=..., \
                            deep=True)

   .. versionadded:: 3.12

   Return a *new* copy of the :class:`!TarInfo` object with the given attributes
   changed. For example, to return a ``TarInfo`` with the group name set to
   ``'staff'``, use::

       new_tarinfo = old_tarinfo.replace(gname='staff')

   By default, a deep copy is made.
   If *deep* is false, the copy is shallow, i.e. ``pax_headers``
   and any custom attributes are shared with the original ``TarInfo`` object.

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, a block device, or a FIFO.


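The query methods make it possible to dispatch on the member type without comparing :attr:`~TarInfo.type` against the constants directly. The sketch below builds a small in-memory archive containing a directory, a symbolic link and a regular file (all names are illustrative) and classifies each member; ``describe`` is a hypothetical helper.

```python
import io
import tarfile

def describe(member):
    """Hypothetical helper: classify a member with the TarInfo query methods."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symlink"
    if member.isfile():
        return "regular file"
    if member.isdev():
        return "device or FIFO"
    return "other"

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    directory = tarfile.TarInfo("docs")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

    link = tarfile.TarInfo("docs/latest")
    link.type = tarfile.SYMTYPE
    link.linkname = "v1.txt"
    tar.addfile(link)

    data = b"version 1\n"
    regular = tarfile.TarInfo("docs/v1.txt")
    regular.size = len(data)
    tar.addfile(regular, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:") as tar:
    kinds = {member.name: describe(member) for member in tar}

print(kinds)
```
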
.. _tarfile-extraction-filter:

Extraction filters
------------------

.. versionadded:: 3.12

The *tar* format is designed to capture all details of a UNIX-like filesystem,
which makes it very powerful.
Unfortunately, these features also make it easy to create tar files that have
unintended -- and possibly malicious -- effects when extracted.
For example, extracting a tar file can overwrite arbitrary files in various
ways (e.g. by using absolute paths, ``..`` path components, or symlinks that
affect later members).

In most cases, the full functionality is not needed.
Therefore, *tarfile* supports extraction filters: a mechanism to limit
functionality, and thus mitigate some of the security issues.

.. seealso::

   :pep:`706`
      Contains further motivation and rationale behind the design.

The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall`
can be:

* the string ``'fully_trusted'``: Honor all metadata as specified in the
  archive.
  Should be used if the user trusts the archive completely, or implements
  their own complex verification.

* the string ``'tar'``: Honor most *tar*-specific features (i.e. features of
  UNIX-like filesystems), but block features that are very likely to be
  surprising or malicious. See :func:`tar_filter` for details.

* the string ``'data'``: Ignore or block most features specific to UNIX-like
  filesystems. Intended for extracting cross-platform data archives.
  See :func:`data_filter` for details.

* ``None`` (default): Use :attr:`TarFile.extraction_filter`.

  If that is also ``None`` (the default), raise a ``DeprecationWarning``,
  and fall back to the ``'fully_trusted'`` filter, whose dangerous behavior
  matches previous versions of Python.

  In Python 3.14, the ``'data'`` filter will become the default instead.
  It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.

* A callable which will be called for each extracted member with a
  :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
  path where the archive is extracted (i.e. the same path is used for all
  members)::

      filter(member: TarInfo, path: str, /) -> TarInfo | None

  The callable is called just before each member is extracted, so it can
  take the current state of the disk into account.
  It can:

  - return a :class:`TarInfo` object which will be used instead of the metadata
    in the archive, or
  - return ``None``, in which case the member will be skipped, or
  - raise an exception to abort the operation or skip the member,
    depending on :attr:`~TarFile.errorlevel`.
    Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave
    the archive partially extracted. It does not attempt to clean up.

Default named filters
~~~~~~~~~~~~~~~~~~~~~

The pre-defined, named filters are available as functions, so they can be
reused in custom filters:

.. function:: fully_trusted_filter(member, path)

   Return *member* unchanged.

   This implements the ``'fully_trusted'`` filter.

.. function:: tar_filter(member, path)

  Implements the ``'tar'`` filter.

  - Strip leading slashes (``/`` and :data:`os.sep`) from filenames.
  - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute
    paths (in case the name is absolute even after stripping slashes, e.g.
    ``C:/foo`` on Windows).
    This raises :class:`~tarfile.AbsolutePathError`.
  - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute
    path (after following symlinks) would end up outside the destination.
    This raises :class:`~tarfile.OutsideDestinationError`.
  - Clear high mode bits (setuid, setgid, sticky) and group/other write bits
    (:const:`~stat.S_IWGRP` | :const:`~stat.S_IWOTH`).

  Return the modified ``TarInfo`` member.

.. function:: data_filter(member, path)

  Implements the ``'data'`` filter.
  In addition to what ``tar_filter`` does:

  - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
    that link to absolute paths, or ones that link outside the destination.

    This raises :class:`~tarfile.AbsoluteLinkError` or
    :class:`~tarfile.LinkOutsideDestinationError`.

    Note that such files are refused even on platforms that do not support
    symbolic links.

  - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files
    (including pipes).
    This raises :class:`~tarfile.SpecialFileError`.

  - For regular files, including hard links:

    - Set the owner read and write permissions
      (:const:`~stat.S_IRUSR` | :const:`~stat.S_IWUSR`).
    - Remove the group and other executable permission
      (:const:`~stat.S_IXGRP` | :const:`~stat.S_IXOTH`)
      if the owner doesn’t have it (:const:`~stat.S_IXUSR`).

  - For other files (directories), set ``mode`` to ``None``, so
    that extraction methods skip applying permission bits.
  - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``)
    to ``None``, so that extraction methods skip setting it.

  Return the modified ``TarInfo`` member.


.. _tarfile-extraction-refuse:

Filter errors
~~~~~~~~~~~~~

When a filter refuses to extract a file, it will raise an appropriate exception,
a subclass of :class:`~tarfile.FilterError`.
This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more.
With ``errorlevel=0`` the error will be logged and the member will be skipped,
but extraction will continue.


Hints for further verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted
files without prior inspection.
Among other issues, the pre-defined filters do not prevent denial-of-service
attacks. Users should perform additional checks.

Here is an incomplete list of things to consider:

* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
  to prevent e.g. exploiting pre-existing links, and to make it easier to
  clean up after a failed extraction.
* When working with untrusted data, use external (e.g. OS-level) limits on
  disk, memory and CPU usage.
* Check filenames against an allow-list of characters
  (to filter out control characters, confusables, foreign path separators,
  etc.).
* Check that filenames have expected extensions (discouraging files that
  execute when you “click on them”, or extension-less files like Windows
  special device names).
* Limit the number of extracted files, total size of extracted data,
  filename length (including symlink length), and size of individual files.
* Check for files that would be shadowed on case-insensitive filesystems.

Also note that:

* Tar files may contain multiple versions of the same file.
  Later ones are expected to overwrite any earlier ones.
  This feature is crucial to allow updating tape archives, but can be abused
  maliciously.
* *tarfile* does not protect against issues with “live” data,
  e.g. an attacker tinkering with the destination (or source) directory while
  extraction (or archiving) is in progress.


Supporting older Python versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Extraction filters were added in Python 3.12, but may be backported to older
versions as security updates.
To check whether the feature is available, use e.g.
``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.

The following examples show how to support Python versions with and without
the feature.
Note that setting ``extraction_filter`` will affect any subsequent operations.

* Fully trusted archive::

    my_tarfile.extraction_filter = (lambda member, path: member)
    my_tarfile.extractall()

* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior
  (``'fully_trusted'``) if this feature is not available::

    my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
                                           (lambda member, path: member))
    my_tarfile.extractall()

* Use the ``'data'`` filter; *fail* if it is not available::

    my_tarfile.extractall(filter=tarfile.data_filter)

  or::

    my_tarfile.extraction_filter = tarfile.data_filter
    my_tarfile.extractall()

* Use the ``'data'`` filter; *warn* if it is not available::

    if hasattr(tarfile, 'data_filter'):
        my_tarfile.extractall(filter='data')
    else:
        # remove this when no longer needed
        warn_the_user('Extracting may be unsafe; consider updating Python')
        my_tarfile.extractall()


Stateful extraction filter example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While *tarfile*'s extraction methods take a simple *filter* callable,
custom filters may be more complex objects with an internal state.
It may be useful to write these as context managers, to be used like this::

    with StatefulFilter() as filter_func:
        tar.extractall(path, filter=filter_func)

Such a filter can be written as, for example::

    class StatefulFilter:
        def __init__(self):
            self.file_count = 0

        def __enter__(self):
            return self

        def __call__(self, member, path):
            self.file_count += 1
            return member

        def __exit__(self, *exc_info):
            print(f'{self.file_count} files extracted')


.. _tarfile-commandline:
.. program:: tarfile


Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. option:: -l <tarfile>
            --list <tarfile>

   List files in a tarfile.

.. option:: -c <tarfile> <source1> ... <sourceN>
            --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. option:: -e <tarfile> [<output_dir>]
            --extract <tarfile> [<output_dir>]

   Extract tarfile into *output_dir*, or into the current directory if
   *output_dir* is not specified.

.. option:: -t <tarfile>
            --test <tarfile>

   Test whether the tarfile is valid.

.. option:: -v, --verbose

   Verbose output.

.. option:: --filter <filtername>

   Specifies the *filter* for ``--extract``.
   See :ref:`tarfile-extraction-filter` for details.
   Only string names are accepted (that is, ``fully_trusted``, ``tar``,
   and ``data``).

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall(filter='data')
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar), filter='data')
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip-compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


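How to add a member whose contents come from memory rather than from a file on disk, using :meth:`TarFile.addfile` with a :class:`TarInfo` whose :attr:`~TarInfo.size` matches the data. In this sketch the archive itself is also kept in an :class:`io.BytesIO` buffer; the member name and contents are illustrative.

```python
import io
import tarfile
import time

payload = "Hello, tar!\n".encode("utf-8")

info = tarfile.TarInfo(name="greeting.txt")
info.size = len(payload)       # must match the number of bytes supplied
info.mtime = int(time.time())  # otherwise the timestamp defaults to 0 (the epoch)

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    tar.addfile(info, io.BytesIO(payload))

# Read the member back from the same in-memory buffer.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    data = tar.extractfile("greeting.txt").read()

print(data.decode("utf-8"), end="")  # Hello, tar!
```
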
.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files, and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  extended headers only affect the subsequent file header, while global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

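The name-length limits can be observed directly. This sketch tries to store a made-up 308-character member name in each writable format. Assuming the ustar layout of a 155-character prefix field plus a 100-character name field, ustar cannot split this name, so :meth:`TarFile.addfile` raises :exc:`ValueError`; the GNU and pax formats store it in an extended header instead. The ``stored_name`` helper is hypothetical.

```python
import io
import tarfile

LONG_NAME = "d/" * 150 + "file.txt"  # 308 characters

def stored_name(fmt):
    """Hypothetical helper: write one member in *fmt*; return the name read back."""
    buffer = io.BytesIO()
    try:
        with tarfile.open(fileobj=buffer, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(LONG_NAME))
    except ValueError:
        return None  # this format cannot represent the name
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:") as tar:
        return tar.getnames()[0]

for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    print(label, "ok" if stored_name(fmt) == LONG_NAME else "failed")
# ustar failed
# gnu ok
# pax ok
```
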
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

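The effect of a mismatched *encoding* can be reproduced with an in-memory archive. In this sketch, the member name ``café.txt`` is written to a ustar archive as Latin-1 and read back as UTF-8, so the undecodable byte becomes a lone surrogate under the default ``'surrogateescape'`` handler. The same name stored in a pax archive survives even when the reader's *encoding* is ``'ascii'``. The ``roundtrip`` helper is illustrative only.

```python
import io
import tarfile

NAME = "caf\u00e9.txt"  # "café.txt"

def roundtrip(fmt, write_encoding, read_encoding):
    """Illustrative helper: store NAME with one encoding, read it with another."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt,
                      encoding=write_encoding) as tar:
        tar.addfile(tarfile.TarInfo(NAME))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:", encoding=read_encoding) as tar:
        return tar.getnames()[0]

print(ascii(roundtrip(tarfile.USTAR_FORMAT, "latin-1", "utf-8")))    # 'caf\udce9.txt'
print(ascii(roundtrip(tarfile.USTAR_FORMAT, "latin-1", "latin-1")))  # 'caf\xe9.txt'
print(ascii(roundtrip(tarfile.PAX_FORMAT, "utf-8", "ascii")))        # 'caf\xe9.txt'
```
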