:mod:`tarfile` --- Read and write tar archive files
===================================================
3
4.. module:: tarfile
5   :synopsis: Read and write tar-format archive files.
6
7.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
8.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>
9
10**Source code:** :source:`Lib/tarfile.py`
11
12--------------
13
14The :mod:`tarfile` module makes it possible to read and write tar
15archives, including those using gzip, bz2 and lzma compression.
16Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
17higher-level functions in :ref:`shutil <archiving-operations>`.
18
19Some facts and figures:
20
21* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
22  if the respective modules are available.
23
24* read/write support for the POSIX.1-1988 (ustar) format.
25
26* read/write support for the GNU tar format including *longname* and *longlink*
27  extensions, read-only support for all variants of the *sparse* extension
28  including restoration of sparse files.
29
30* read/write support for the POSIX.1-2001 (pax) format.
31
32* handles directories, regular files, hardlinks, symbolic links, fifos,
33  character devices and block devices and is able to acquire and restore file
34  information like timestamp, access permissions and owner.
35
36.. versionchanged:: 3.3
37   Added support for :mod:`lzma` compression.
38
39
40.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
41
42   Return a :class:`TarFile` object for the pathname *name*. For detailed
43   information on :class:`TarFile` objects and the keyword arguments that are
44   allowed, see :ref:`tarfile-objects`.
45
   *mode* has to be a string of the form ``'filemode[:compression]'`` and defaults
   to ``'r'``. Here is a full list of mode combinations:
48
49   +------------------+---------------------------------------------+
50   | mode             | action                                      |
51   +==================+=============================================+
52   | ``'r' or 'r:*'`` | Open for reading with transparent           |
53   |                  | compression (recommended).                  |
54   +------------------+---------------------------------------------+
55   | ``'r:'``         | Open for reading exclusively without        |
56   |                  | compression.                                |
57   +------------------+---------------------------------------------+
58   | ``'r:gz'``       | Open for reading with gzip compression.     |
59   +------------------+---------------------------------------------+
60   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
61   +------------------+---------------------------------------------+
62   | ``'r:xz'``       | Open for reading with lzma compression.     |
63   +------------------+---------------------------------------------+
64   | ``'x'`` or       | Create a tarfile exclusively without        |
65   | ``'x:'``         | compression.                                |
66   |                  | Raise a :exc:`FileExistsError` exception    |
67   |                  | if it already exists.                       |
68   +------------------+---------------------------------------------+
69   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
70   |                  | Raise a :exc:`FileExistsError` exception    |
71   |                  | if it already exists.                       |
72   +------------------+---------------------------------------------+
73   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
74   |                  | Raise a :exc:`FileExistsError` exception    |
75   |                  | if it already exists.                       |
76   +------------------+---------------------------------------------+
77   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
78   |                  | Raise a :exc:`FileExistsError` exception    |
79   |                  | if it already exists.                       |
80   +------------------+---------------------------------------------+
81   | ``'a' or 'a:'``  | Open for appending with no compression. The |
82   |                  | file is created if it does not exist.       |
83   +------------------+---------------------------------------------+
84   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
85   +------------------+---------------------------------------------+
86   | ``'w:gz'``       | Open for gzip compressed writing.           |
87   +------------------+---------------------------------------------+
88   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
89   +------------------+---------------------------------------------+
90   | ``'w:xz'``       | Open for lzma compressed writing.           |
91   +------------------+---------------------------------------------+
92
   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not supported. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this.  If a
   compression method is not supported, :exc:`CompressionError` is raised.
97
98   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
99   opened in binary mode for *name*. It is supposed to be at position 0.
100
   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'`` and
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.
104
105   For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
106   keyword argument *preset* to specify the compression level of the file.
107
108   For special purposes, there is a second format for *mode*:
109   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
110   object that processes its data as a stream of blocks.  No random seeking will
111   be done on the file. If given, *fileobj* may be any object that has a
112   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
113   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
114   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
115   device. However, such a :class:`TarFile` object is limited in that it does
116   not allow random access, see :ref:`tar-examples`.  The currently
117   possible modes:
118
119   +-------------+--------------------------------------------+
120   | Mode        | Action                                     |
121   +=============+============================================+
122   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
123   |             | with transparent compression.              |
124   +-------------+--------------------------------------------+
125   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
126   |             | for reading.                               |
127   +-------------+--------------------------------------------+
128   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
129   |             | reading.                                   |
130   +-------------+--------------------------------------------+
131   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
132   |             | reading.                                   |
133   +-------------+--------------------------------------------+
134   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
135   |             | reading.                                   |
136   +-------------+--------------------------------------------+
137   | ``'w|'``    | Open an uncompressed *stream* for writing. |
138   +-------------+--------------------------------------------+
139   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
140   |             | writing.                                   |
141   +-------------+--------------------------------------------+
142   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
143   |             | writing.                                   |
144   +-------------+--------------------------------------------+
145   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
146   |             | writing.                                   |
147   +-------------+--------------------------------------------+
148
149   .. versionchanged:: 3.5
150      The ``'x'`` (exclusive creation) mode was added.
151
152   .. versionchanged:: 3.6
153      The *name* parameter accepts a :term:`path-like object`.
154
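The mode strings above can be tried without touching the file system. The
following is a minimal sketch, assuming an in-memory ``io.BytesIO`` buffer
stands in for a real file; the names ``payload`` and ``buffer`` are
illustrative only.

```python
import io
import tarfile

# Build a small gzip-compressed archive entirely in memory.
payload = b"hello tar\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz", compresslevel=6) as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# 'r' (equivalent to 'r:*') detects the compression automatically.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()

# 'r|gz' reads the same bytes as a forward-only stream of blocks.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r|gz") as tar:
    streamed = [member.name for member in tar]

print(names, streamed)
```

In stream mode the members are only visited in archive order, which is why
the sketch iterates over the archive instead of looking up a member by name.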
155
156.. class:: TarFile
157   :noindex:
158
159   Class for reading and writing tar archives. Do not use this class directly:
160   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.
161
162
163.. function:: is_tarfile(name)
164
   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.
167
168   .. versionchanged:: 3.9
169      Support for file and file-like objects.
170
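As a small illustration of :func:`is_tarfile`, the sketch below checks a
freshly written archive and a plain text file. The file names are made up
for the example, and the file-object call assumes Python 3.9 or later.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "sample.tar")
    text_path = os.path.join(tmp, "notes.txt")

    with open(text_path, "w") as f:
        f.write("just some text\n")
    with tarfile.open(archive_path, "w") as tar:
        tar.add(text_path, arcname="notes.txt")

    path_is_tar = tarfile.is_tarfile(archive_path)
    text_is_tar = tarfile.is_tarfile(text_path)

    # File objects are accepted as well (Python 3.9+).
    with open(archive_path, "rb") as f:
        fileobj_is_tar = tarfile.is_tarfile(f)

print(path_is_tar, text_is_tar, fileobj_is_tar)
```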
171
172The :mod:`tarfile` module defines the following exceptions:
173
174
175.. exception:: TarError
176
177   Base class for all :mod:`tarfile` exceptions.
178
179
180.. exception:: ReadError
181
   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is otherwise invalid.
184
185
186.. exception:: CompressionError
187
188   Is raised when a compression method is not supported or when the data cannot be
189   decoded properly.
190
191
192.. exception:: StreamError
193
194   Is raised for the limitations that are typical for stream-like :class:`TarFile`
195   objects.
196
197
198.. exception:: ExtractError
199
200   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
201   :attr:`TarFile.errorlevel`\ ``== 2``.
202
203
204.. exception:: HeaderError
205
206   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
207
208
209.. exception:: FilterError
210
211   Base class for members :ref:`refused <tarfile-extraction-refuse>` by
212   filters.
213
214   .. attribute:: tarinfo
215
216      Information about the member that the filter refused to extract,
217      as :ref:`TarInfo <tarinfo-objects>`.
218
219.. exception:: AbsolutePathError
220
221   Raised to refuse extracting a member with an absolute path.
222
223.. exception:: OutsideDestinationError
224
225   Raised to refuse extracting a member outside the destination directory.
226
227.. exception:: SpecialFileError
228
229   Raised to refuse extracting a special file (e.g. a device or pipe).
230
231.. exception:: AbsoluteLinkError
232
233   Raised to refuse extracting a symbolic link with an absolute path.
234
235.. exception:: LinkOutsideDestinationError
236
237   Raised to refuse extracting a symbolic link pointing outside the destination
238   directory.
239
240.. exception:: LinkFallbackError
241
242   Raised to refuse emulating a link (hard or symbolic) by extracting another
243   archive member, when that member would be rejected by the filter location.
244   The exception that was raised to reject the replacement member is available
245   as :attr:`!BaseException.__context__`.
246
247   .. versionadded:: next
248
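Because every exception above derives from :exc:`TarError`, callers can
catch a specific failure first and fall back to the base class. A minimal
sketch, assuming the input is an arbitrary byte string that is not a tar
archive:

```python
import io
import tarfile

garbage = io.BytesIO(b"this is definitely not a tar archive")

try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.ReadError as exc:
    # Raised because no supported (compressed) tar format matched.
    outcome = f"ReadError: {exc}"
except tarfile.TarError as exc:
    # Any other tarfile-specific failure.
    outcome = f"other tarfile error: {exc}"
else:
    outcome = "opened"

print(outcome.splitlines()[0])
```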
249
250The following constants are available at the module level:
251
252.. data:: ENCODING
253
254   The default character encoding: ``'utf-8'`` on Windows, the value returned by
255   :func:`sys.getfilesystemencoding` otherwise.
256
257
258Each of the following constants defines a tar archive format that the
259:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
260details.
261
262
263.. data:: USTAR_FORMAT
264
265   POSIX.1-1988 (ustar) format.
266
267
268.. data:: GNU_FORMAT
269
270   GNU tar format.
271
272
273.. data:: PAX_FORMAT
274
275   POSIX.1-2001 (pax) format.
276
277
278.. data:: DEFAULT_FORMAT
279
280   The default format for creating archives. This is currently :const:`PAX_FORMAT`.
281
282   .. versionchanged:: 3.8
283      The default format for new archives was changed to
284      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.
285
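The format constants are passed through the *format* keyword when writing.
The sketch below requests :const:`GNU_FORMAT` explicitly and stores a member
whose name is longer than 100 characters, which relies on the *longname*
extension mentioned earlier. The member name is invented for the example.

```python
import io
import tarfile

long_name = "d/" * 60 + "file.txt"  # 128 characters

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
    tar.addfile(tarfile.TarInfo(long_name))  # empty regular file
    chosen_format = tar.format

# When reading, the format is detected automatically.
buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    read_back = tar.getnames()

print(chosen_format == tarfile.GNU_FORMAT, read_back == [long_name])
```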
286
287.. seealso::
288
289   Module :mod:`zipfile`
290      Documentation of the :mod:`zipfile` standard module.
291
292   :ref:`archiving-operations`
293      Documentation of the higher-level archiving facilities provided by the
294      standard :mod:`shutil` module.
295
296   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
297      Documentation for tar archive files, including GNU tar extensions.
298
299
300.. _tarfile-objects:
301
302TarFile Objects
303---------------
304
305The :class:`TarFile` object provides an interface to a tar archive. A tar
306archive is a sequence of blocks. An archive member (a stored file) is made up of
307a header block followed by data blocks. It is possible to store a file in a tar
308archive several times. Each archive member is represented by a :class:`TarInfo`
309object, see :ref:`tarinfo-objects` for details.
310
311A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
312statement. It will automatically be closed when the block is completed. Please
313note that in the event of an exception an archive opened for writing will not
314be finalized; only the internally used file object will be closed. See the
315:ref:`tar-examples` section for a use case.
316
317.. versionadded:: 3.2
318   Added support for the context management protocol.
319
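A short sketch of the context-manager behaviour described above, assuming an
in-memory buffer as the target; ``closed`` is the attribute that
:class:`TarFile` sets when it has been closed.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"context managers close the archive\n"
    info = tarfile.TarInfo("readme.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Leaving the block closed the archive and wrote the end-of-archive blocks.
closed_after_block = tar.closed
archive_size = len(buffer.getvalue())

print(closed_after_block, archive_size)
```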
320.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1)
321
322   All following arguments are optional and can be accessed as instance attributes
323   as well.
324
325   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
326   It can be omitted if *fileobj* is given.
327   In this case, the file object's :attr:`name` attribute is used if it exists.
328
329   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
330   data to an existing file, ``'w'`` to create a new file overwriting an existing
331   one, or ``'x'`` to create a new file only if it does not already exist.
332
333   If *fileobj* is given, it is used for reading or writing data. If it can be
334   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
335   from position 0.
336
337   .. note::
338
      *fileobj* is not closed when :class:`TarFile` is closed.
340
341   *format* controls the archive format for writing. It must be one of the constants
342   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
343   defined at module level. When reading, format will be automatically detected, even
344   if different formats are present in a single archive.
345
346   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
347   with a different one.
348
349   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
350   is :const:`True`, add the content of the target files to the archive. This has no
351   effect on systems that do not support symbolic links.
352
353   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
354   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
355   as possible. This is only useful for reading concatenated or damaged archives.
356
357   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
358   messages). The messages are written to ``sys.stderr``.
359
360   *errorlevel* controls how extraction errors are handled,
361   see :attr:`the corresponding attribute <~TarFile.errorlevel>`.
362
363   The *encoding* and *errors* arguments define the character encoding to be
364   used for reading or writing the archive and how conversion errors are going
365   to be handled. The default settings will work for most users.
366   See section :ref:`tar-unicode` for in-depth information.
367
368   The *pax_headers* argument is an optional dictionary of strings which
369   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
370
371   .. versionchanged:: 3.2
372      Use ``'surrogateescape'`` as the default for the *errors* argument.
373
374   .. versionchanged:: 3.5
375      The ``'x'`` (exclusive creation) mode was added.
376
377   .. versionchanged:: 3.6
378      The *name* parameter accepts a :term:`path-like object`.
379
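The effect of *ignore_zeros* is easiest to see on two archives that have been
concatenated. This is a sketch under that assumption; ``make_archive`` is a
hypothetical helper written for the example, not part of the module.

```python
import io
import tarfile

def make_archive(name, data):
    """Return the bytes of an uncompressed archive holding one member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

# Two complete archives glued together, end-of-archive blocks included.
combined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

# By default, reading stops at the first empty block.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:") as tar:
    default_names = tar.getnames()

# With ignore_zeros=True, empty blocks are skipped and reading continues.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:",
                  ignore_zeros=True) as tar:
    all_names = tar.getnames()

print(default_names, all_names)
```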
380
381.. classmethod:: TarFile.open(...)
382
383   Alternative constructor. The :func:`tarfile.open` function is actually a
384   shortcut to this classmethod.
385
386
387.. method:: TarFile.getmember(name)
388
   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.
391
392   .. note::
393
394      If a member occurs more than once in the archive, its last occurrence is assumed
395      to be the most up-to-date version.
396
397
398.. method:: TarFile.getmembers()
399
400   Return the members of the archive as a list of :class:`TarInfo` objects. The
401   list has the same order as the members in the archive.
402
403
404.. method:: TarFile.getnames()
405
406   Return the members as a list of their names. It has the same order as the list
407   returned by :meth:`getmembers`.
408
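The three lookup methods above can be compared on an archive that stores one
name twice. A minimal sketch with invented member names:

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    # "config.ini" is stored twice to show which copy getmember() reports.
    for name, data in [("config.ini", b"v1"),
                       ("data.bin", b"\x00\x01"),
                       ("config.ini", b"version 2")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    names = tar.getnames()                       # archive order, duplicates kept
    latest_size = tar.getmember("config.ini").size  # last occurrence wins
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing_raises = True
    else:
        missing_raises = False

print(names, latest_size, missing_raises)
```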
409
410.. method:: TarFile.list(verbose=True, *, members=None)
411
412   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
413   only the names of the members are printed. If it is :const:`True`, output
414   similar to that of :program:`ls -l` is produced. If optional *members* is
415   given, it must be a subset of the list returned by :meth:`getmembers`.
416
417   .. versionchanged:: 3.5
418      Added the *members* parameter.
419
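Since :meth:`list` prints to ``sys.stdout``, its output can be captured with
:func:`contextlib.redirect_stdout`. The sketch below lists only a chosen
subset of members; the member names are illustrative.

```python
import contextlib
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("a.txt", "b.txt"):
        tar.addfile(tarfile.TarInfo(name))  # empty regular files

buffer.seek(0)
captured = io.StringIO()
with tarfile.open(fileobj=buffer) as tar:
    # members must come from getmembers(); here only "b.txt" is kept.
    subset = [m for m in tar.getmembers() if m.name == "b.txt"]
    with contextlib.redirect_stdout(captured):
        tar.list(verbose=False, members=subset)

listing = captured.getvalue()
print(listing)
```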
420
421.. method:: TarFile.next()
422
   Return the next member of the archive as a :class:`TarInfo` object when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.
426
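A sketch of walking an archive with :meth:`next` until it returns
:const:`None`; the member names are made up for the example.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("one.txt", "two.txt", "three.txt"):
        tar.addfile(tarfile.TarInfo(name))

buffer.seek(0)
seen = []
with tarfile.open(fileobj=buffer) as tar:
    member = tar.next()
    while member is not None:   # None signals the end of the archive
        seen.append(member.name)
        member = tar.next()

print(seen)
```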
427
428.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None)
429
430   Extract all members from the archive to the current working directory or
431   directory *path*. If optional *members* is given, it must be a subset of the
432   list returned by :meth:`getmembers`. Directory information like owner,
433   modification time and permissions are set after all members have been extracted.
434   This is done to work around two problems: A directory's modification time is
435   reset each time a file is created in it. And, if a directory's permissions do
436   not allow writing, extracting files to it will fail.
437
438   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
439   are used to set the owner/group for the extracted files. Otherwise, the named
440   values from the tarfile are used.
441
442   The *filter* argument, which was added in Python 3.11.4, specifies how
443   ``members`` are modified or rejected before extraction.
444   See :ref:`tarfile-extraction-filter` for details.
445   It is recommended to set this explicitly depending on which *tar* features
446   you need to support.
447
448   .. warning::
449
450      Never extract archives from untrusted sources without prior inspection.
451      It is possible that files are created outside of *path*, e.g. members
452      that have absolute filenames starting with ``"/"`` or filenames with two
453      dots ``".."``.
454
455      Set ``filter='data'`` to prevent the most dangerous security issues,
456      and read the :ref:`tarfile-extraction-filter` section for details.
457
458   .. versionchanged:: 3.5
459      Added the *numeric_owner* parameter.
460
461   .. versionchanged:: 3.6
462      The *path* parameter accepts a :term:`path-like object`.
463
464   .. versionchanged:: 3.11.4
465      Added the *filter* parameter.
466
467
468.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)
469
470   Extract a member from the archive to the current working directory, using its
471   full name. Its file information is extracted as accurately as possible. *member*
472   may be a filename or a :class:`TarInfo` object. You can specify a different
473   directory using *path*. *path* may be a :term:`path-like object`.
474   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.
475
476   The *numeric_owner* and *filter* arguments are the same as
477   for :meth:`extractall`.
478
479   .. note::
480
481      The :meth:`extract` method does not take care of several extraction issues.
482      In most cases you should consider using the :meth:`extractall` method.
483
484   .. warning::
485
486      See the warning for :meth:`extractall`.
487
488      Set ``filter='data'`` to prevent the most dangerous security issues,
489      and read the :ref:`tarfile-extraction-filter` section for details.
490
491   .. versionchanged:: 3.2
492      Added the *set_attrs* parameter.
493
494   .. versionchanged:: 3.5
495      Added the *numeric_owner* parameter.
496
497   .. versionchanged:: 3.6
498      The *path* parameter accepts a :term:`path-like object`.
499
500   .. versionchanged:: 3.11.4
501      Added the *filter* parameter.
502
503
504.. method:: TarFile.extractfile(member)
505
506   Extract a member from the archive as a file object. *member* may be
507   a filename or a :class:`TarInfo` object. If *member* is a regular file or
508   a link, an :class:`io.BufferedReader` object is returned. For all other
509   existing members, :const:`None` is returned. If *member* does not appear
510   in the archive, :exc:`KeyError` is raised.
511
512   .. versionchanged:: 3.3
513      Return an :class:`io.BufferedReader` object.
514
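A sketch of reading a member's content without writing it to disk, and of the
:const:`None` result for a directory member. The archive layout is invented
for the example.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    folder = tarfile.TarInfo("notes")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    body = b"buy milk\n"
    item = tarfile.TarInfo("notes/todo.txt")
    item.size = len(body)
    tar.addfile(item, io.BytesIO(body))

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    reader = tar.extractfile("notes/todo.txt")
    is_buffered = isinstance(reader, io.BufferedReader)
    text = reader.read().decode("utf-8")
    directory_reader = tar.extractfile("notes")  # neither a file nor a link

print(repr(text), directory_reader)
```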
515.. attribute:: TarFile.errorlevel
516   :type: int
517
518   If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract`
519   and :meth:`TarFile.extractall`.
520   Nevertheless, they appear as error messages in the debug output when
521   *debug* is greater than 0.
522   If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or
523   :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised
524   as :exc:`TarError` exceptions as well.
525
526   Some exceptions, e.g. ones caused by wrong argument types or data
527   corruption, are always raised.
528
529   Custom :ref:`extraction filters <tarfile-extraction-filter>`
530   should raise :exc:`FilterError` for *fatal* errors
531   and :exc:`ExtractError` for *non-fatal* ones.
532
533   Note that when an exception is raised, the archive may be partially
534   extracted. It is the user’s responsibility to clean up.
535
536.. attribute:: TarFile.extraction_filter
537
538   .. versionadded:: 3.11.4
539
540   The :ref:`extraction filter <tarfile-extraction-filter>` used
541   as a default for the *filter* argument of :meth:`~TarFile.extract`
542   and :meth:`~TarFile.extractall`.
543
544   The attribute may be ``None`` or a callable.
545   String names are not allowed for this attribute, unlike the *filter*
546   argument to :meth:`~TarFile.extract`.
547
548   If ``extraction_filter`` is ``None`` (the default),
549   calling an extraction method without a *filter* argument will
550   use the :func:`fully_trusted <fully_trusted_filter>` filter for
551   compatibility with previous Python versions.
552
553   In Python 3.12+, leaving ``extraction_filter=None`` will emit a
554   ``DeprecationWarning``.
555
556   In Python 3.14+, leaving ``extraction_filter=None`` will cause
557   extraction methods to use the :func:`data <data_filter>` filter by default.
558
559   The attribute may be set on instances or overridden in subclasses.
560   It also is possible to set it on the ``TarFile`` class itself to set a
561   global default, although, since it affects all uses of *tarfile*,
562   it is best practice to only do so in top-level applications or
563   :mod:`site configuration <site>`.
564   To set a global default this way, a filter function needs to be wrapped in
565   :func:`staticmethod()` to prevent injection of a ``self`` argument.
566
567.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)
568
569   Add the file *name* to the archive. *name* may be any type of file
570   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
571   alternative name for the file in the archive. Directories are added
572   recursively by default. This can be avoided by setting *recursive* to
573   :const:`False`. Recursion adds entries in sorted order.
574   If *filter* is given, it
575   should be a function that takes a :class:`TarInfo` object argument and
576   returns the changed :class:`TarInfo` object. If it instead returns
577   :const:`None` the :class:`TarInfo` object will be excluded from the
578   archive. See :ref:`tar-examples` for an example.
579
580   .. versionchanged:: 3.2
581      Added the *filter* parameter.
582
583   .. versionchanged:: 3.7
584      Recursion adds entries in sorted order.
585
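A sketch of the *filter* argument of :meth:`add`: the hypothetical function
``anonymize`` drops editor backup files and resets ownership on everything
else. The directory layout is created only for the example.

```python
import io
import os
import tarfile
import tempfile

def anonymize(info):
    """Exclude names ending in '~' and strip ownership from the rest."""
    if info.name.endswith("~"):
        return None            # returning None excludes the member
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.mkdir(src)
    for filename in ("main.py", "main.py~"):
        with open(os.path.join(src, filename), "w") as f:
            f.write("print('hi')\n")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        tar.add(src, arcname="project", filter=anonymize)

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    names = tar.getnames()
    owners = {m.uname for m in tar.getmembers()}

print(names, owners)
```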
586
587.. method:: TarFile.addfile(tarinfo, fileobj=None)
588
589   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
590   it should be a :term:`binary file`, and
591   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
592   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.
593
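A sketch of :meth:`addfile` with hand-built :class:`TarInfo` objects: one
regular file whose data comes from a binary file object, and one symbolic
link, which carries no data. The names are illustrative.

```python
import io
import tarfile
import time

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"hello\n"
    info = tarfile.TarInfo("greeting.txt")
    info.size = len(data)          # addfile() reads exactly this many bytes
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))

    link = tarfile.TarInfo("latest")
    link.type = tarfile.SYMTYPE
    link.linkname = "greeting.txt"
    tar.addfile(link)              # links have no data, so no fileobj

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    stored = tar.extractfile("greeting.txt").read()
    stored_link = tar.getmember("latest")
    link_target = stored_link.linkname
    link_is_symlink = stored_link.issym()

print(stored, link_target, link_is_symlink)
```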
594
595.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
596
597   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
598   equivalent on an existing file.  The file is either named by *name*, or
599   specified as a :term:`file object` *fileobj* with a file descriptor.
600   *name* may be a :term:`path-like object`.  If
601   given, *arcname* specifies an alternative name for the file in the
602   archive, otherwise, the name is taken from *fileobj*’s
603   :attr:`~io.FileIO.name` attribute, or the *name* argument.  The name
604   should be a text string.
605
606   You can modify
607   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
608   If the file object is not an ordinary file object positioned at the
609   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
610   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
611   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
612   could be a dummy string.
613
614   .. versionchanged:: 3.6
615      The *name* parameter accepts a :term:`path-like object`.
616
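A sketch combining :meth:`gettarinfo` and :meth:`addfile`: the metadata is
taken from a file on disk, adjusted, and then stored under a different name.
The paths are created only for the example.

```python
import io
import os
import tarfile
import tempfile

content = b"quarterly numbers\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as f:
        f.write(content)

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tar.gettarinfo(path, arcname="reports/q1.txt")
        info.mode = 0o600          # adjust metadata before storing
        with open(path, "rb") as f:
            tar.addfile(info, f)

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    member = tar.getmember("reports/q1.txt")
    stored_size = member.size
    stored_mode = member.mode

print(stored_size, oct(stored_mode))
```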
617
618.. method:: TarFile.close()
619
620   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
621   appended to the archive.
622
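The finishing zero blocks can be observed directly in the written bytes. A
minimal sketch, assuming the 512-byte block size used by the tar format; the
constant ``BLOCK`` is defined locally for the example.

```python
import io
import tarfile

BLOCK = 512  # size of one tar block in bytes

buffer = io.BytesIO()
tar = tarfile.open(fileobj=buffer, mode="w")
data = b"payload"
info = tarfile.TarInfo("file.bin")
info.size = len(data)
tar.addfile(info, io.BytesIO(data))
tar.close()  # writes the end-of-archive marker

raw = buffer.getvalue()
header_has_content = any(raw[:BLOCK])
ends_with_zero_blocks = raw[-2 * BLOCK:] == bytes(2 * BLOCK)

print(header_has_content, ends_with_zero_blocks, len(raw))
```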
623
624.. attribute:: TarFile.pax_headers
625
626   A dictionary containing key-value pairs of pax global headers.
627
628
629
630.. _tarinfo-objects:
631
632TarInfo Objects
633---------------
634
635A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
636from storing all required attributes of a file (like file type, size, time,
637permissions, owner etc.), it provides some useful methods to determine its type.
638It does *not* contain the file's data itself.
639
640:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
641:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
642:meth:`~TarFile.gettarinfo`.
643
644Modifying the objects returned by :meth:`~!TarFile.getmember` or
645:meth:`~!TarFile.getmembers` will affect all subsequent
646operations on the archive.
647For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
648call the :meth:`~TarInfo.replace` method to create a modified copy in one step.
649
650Several attributes can be set to ``None`` to indicate that a piece of metadata
651is unused or unknown.
Different :class:`TarFile` methods handle ``None`` differently:
653
654- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will
655  ignore the corresponding metadata, leaving it set to a default.
656- :meth:`~TarFile.addfile` will fail.
657- :meth:`~TarFile.list` will print a placeholder string.
658
659
660.. versionchanged:: 3.11.4
661   Added :meth:`~TarInfo.replace` and handling of ``None``.
662
663
664.. class:: TarInfo(name="")
665
666   Create a :class:`TarInfo` object.
667
668
669.. classmethod:: TarInfo.frombuf(buf, encoding, errors)
670
   Create and return a :class:`TarInfo` object from the bytes buffer *buf*.
672
673   Raises :exc:`HeaderError` if the buffer is invalid.
674
675
676.. classmethod:: TarInfo.fromtarfile(tarfile)
677
678   Read the next member from the :class:`TarFile` object *tarfile* and return it as
679   a :class:`TarInfo` object.
680
681
682.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')
683
   Create a bytes buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.
686
687   .. versionchanged:: 3.2
688      Use ``'surrogateescape'`` as the default for the *errors* argument.
689
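A sketch of a round trip through :meth:`~TarInfo.tobuf` and
:meth:`~TarInfo.frombuf`, assuming the ustar format so that the header fits
in a single 512-byte block. The second call shows :exc:`HeaderError` for an
invalid buffer.

```python
import tarfile

info = tarfile.TarInfo("hello.txt")
info.size = 5

# Serialize the header, then parse it back.
header = info.tobuf(format=tarfile.USTAR_FORMAT)
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block of arbitrary bytes is not a valid header.
try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
except tarfile.HeaderError:
    invalid_rejected = True
else:
    invalid_rejected = False

print(len(header), parsed.name, parsed.size, invalid_rejected)
```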
690
691A ``TarInfo`` object has the following public data attributes:
692
693
694.. attribute:: TarInfo.name
695   :type: str
696
697   Name of the archive member.
698
699
700.. attribute:: TarInfo.size
701   :type: int
702
703   Size in bytes.
704
705
706.. attribute:: TarInfo.mtime
707   :type: int | float
708
709   Time of last modification in seconds since the :ref:`epoch <epoch>`,
710   as in :attr:`os.stat_result.st_mtime`.
711
712   .. versionchanged:: 3.11.4
713
714      Can be set to ``None`` for :meth:`~TarFile.extract` and
715      :meth:`~TarFile.extractall`, causing extraction to skip applying this
716      attribute.
717
718.. attribute:: TarInfo.mode
719   :type: int
720
721   Permission bits, as for :func:`os.chmod`.
722
723   .. versionchanged:: 3.11.4
724
725      Can be set to ``None`` for :meth:`~TarFile.extract` and
726      :meth:`~TarFile.extractall`, causing extraction to skip applying this
727      attribute.
728
729.. attribute:: TarInfo.type
730
731   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
732   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
733   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
734   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
735   more conveniently, use the ``is*()`` methods below.
736
737
738.. attribute:: TarInfo.linkname
739   :type: str
740
   Name of the link target, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.
743
744
745.. attribute:: TarInfo.uid
746   :type: int
747
748   User ID of the user who originally stored this member.
749
750   .. versionchanged:: 3.11.4
751
752      Can be set to ``None`` for :meth:`~TarFile.extract` and
753      :meth:`~TarFile.extractall`, causing extraction to skip applying this
754      attribute.
755
756.. attribute:: TarInfo.gid
757   :type: int
758
759   Group ID of the user who originally stored this member.
760
761   .. versionchanged:: 3.11.4
762
763      Can be set to ``None`` for :meth:`~TarFile.extract` and
764      :meth:`~TarFile.extractall`, causing extraction to skip applying this
765      attribute.
766
767.. attribute:: TarInfo.uname
768   :type: str
769
770   User name.
771
772   .. versionchanged:: 3.11.4
773
774      Can be set to ``None`` for :meth:`~TarFile.extract` and
775      :meth:`~TarFile.extractall`, causing extraction to skip applying this
776      attribute.
777
778.. attribute:: TarInfo.gname
779   :type: str
780
781   Group name.
782
783   .. versionchanged:: 3.11.4
784
785      Can be set to ``None`` for :meth:`~TarFile.extract` and
786      :meth:`~TarFile.extractall`, causing extraction to skip applying this
787      attribute.
788
789.. attribute:: TarInfo.pax_headers
790   :type: dict
791
792   A dictionary containing key-value pairs of an associated pax extended header.
793
794.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=...,
795                            uid=..., gid=..., uname=..., gname=...,
796                            deep=True)
797
798   .. versionadded:: 3.11.4
799
800   Return a *new* copy of the :class:`!TarInfo` object with the given attributes
801   changed. For example, to return a ``TarInfo`` with the group name set to
802   ``'staff'``, use::
803
804       new_tarinfo = old_tarinfo.replace(gname='staff')
805
806   By default, a deep copy is made.
807   If *deep* is false, the copy is shallow, i.e. ``pax_headers``
808   and any custom attributes are shared with the original ``TarInfo`` object.
809
810A :class:`TarInfo` object also provides some convenient query methods:
811
812
813.. method:: TarInfo.isfile()
814
   Return :const:`True` if the :class:`TarInfo` object is a regular file.
816
817
818.. method:: TarInfo.isreg()
819
820   Same as :meth:`isfile`.
821
822
823.. method:: TarInfo.isdir()
824
825   Return :const:`True` if it is a directory.
826
827
828.. method:: TarInfo.issym()
829
830   Return :const:`True` if it is a symbolic link.
831
832
833.. method:: TarInfo.islnk()
834
835   Return :const:`True` if it is a hard link.
836
837
838.. method:: TarInfo.ischr()
839
840   Return :const:`True` if it is a character device.
841
842
843.. method:: TarInfo.isblk()
844
845   Return :const:`True` if it is a block device.
846
847
848.. method:: TarInfo.isfifo()
849
850   Return :const:`True` if it is a FIFO.
851
852
853.. method:: TarInfo.isdev()
854
855   Return :const:`True` if it is one of character device, block device or FIFO.
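
The query methods can be combined into a small classifier. The sketch below
is one possible way to do this; the ``describe`` helper and the member names
are hypothetical, and the archive is built in memory so the example is
self-contained:

```python
import io
import tarfile

def describe(member):
    """Return a short label for a TarInfo, using its is*() methods."""
    if member.isreg():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.islnk():
        return "hard link"
    if member.isdev():
        return "device or FIFO"
    return "other"

# Build a small archive in memory: a directory, a file and a symlink.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    directory = tarfile.TarInfo("pkg")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

    data = b"print('hello')\n"
    module = tarfile.TarInfo("pkg/main.py")
    module.size = len(data)
    tar.addfile(module, io.BytesIO(data))

    link = tarfile.TarInfo("pkg/latest.py")
    link.type = tarfile.SYMTYPE
    link.linkname = "main.py"
    tar.addfile(link)

# Read the archive back and label every member.
buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    labels = {member.name: describe(member) for member in tar}

for name, label in labels.items():
    print(f"{name}: {label}")
```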
856
857
858.. _tarfile-extraction-filter:
859
860Extraction filters
861------------------
862
863.. versionadded:: 3.11.4
864
The *tar* format is designed to capture all details of a UNIX-like filesystem,
which makes it very powerful.
Unfortunately, these features make it easy to create tar files that have
unintended -- and possibly malicious -- effects when extracted.
For example, extracting a tar file can overwrite arbitrary files in various
ways (e.g. by using absolute paths, ``..`` path components, or symlinks that
affect later members).
872
873In most cases, the full functionality is not needed.
874Therefore, *tarfile* supports extraction filters: a mechanism to limit
875functionality, and thus mitigate some of the security issues.
876
877.. seealso::
878
879   :pep:`706`
880      Contains further motivation and rationale behind the design.
881
882The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall`
883can be:
884
885* the string ``'fully_trusted'``: Honor all metadata as specified in the
886  archive.
887  Should be used if the user trusts the archive completely, or implements
888  their own complex verification.
889
890* the string ``'tar'``: Honor most *tar*-specific features (i.e. features of
891  UNIX-like filesystems), but block features that are very likely to be
892  surprising or malicious. See :func:`tar_filter` for details.
893
894* the string ``'data'``: Ignore or block most features specific to UNIX-like
895  filesystems. Intended for extracting cross-platform data archives.
896  See :func:`data_filter` for details.
897
898* ``None`` (default): Use :attr:`TarFile.extraction_filter`.
899
900  If that is also ``None`` (the default), the ``'fully_trusted'``
901  filter will be used (for compatibility with earlier versions of Python).
902
903  In Python 3.12, the default will emit a ``DeprecationWarning``.
904
905  In Python 3.14, the ``'data'`` filter will become the default instead.
906  It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.
907
908* A callable which will be called for each extracted member with a
909  :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
910  path to where the archive is extracted (i.e. the same path is used for all
911  members)::
912
913      filter(/, member: TarInfo, path: str) -> TarInfo | None
914
915  The callable is called just before each member is extracted, so it can
916  take the current state of the disk into account.
917  It can:
918
919  - return a :class:`TarInfo` object which will be used instead of the metadata
920    in the archive, or
921  - return ``None``, in which case the member will be skipped, or
922  - raise an exception to abort the operation or skip the member,
923    depending on :attr:`~TarFile.errorlevel`.
924    Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave
925    the archive partially extracted. It does not attempt to clean up.
926
927Default named filters
928~~~~~~~~~~~~~~~~~~~~~
929
930The pre-defined, named filters are available as functions, so they can be
931reused in custom filters:
932
933.. function:: fully_trusted_filter(/, member, path)
934
935   Return *member* unchanged.
936
937   This implements the ``'fully_trusted'`` filter.
938
939.. function:: tar_filter(/, member, path)
940
941  Implements the ``'tar'`` filter.
942
943  - Strip leading slashes (``/`` and :attr:`os.sep`) from filenames.
944  - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute
945    paths (in case the name is absolute
946    even after stripping slashes, e.g. ``C:/foo`` on Windows).
947    This raises :class:`~tarfile.AbsolutePathError`.
948  - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute
949    path (after following symlinks) would end up outside the destination.
950    This raises :class:`~tarfile.OutsideDestinationError`.
951  - Clear high mode bits (setuid, setgid, sticky) and group/other write bits
952    (:attr:`~stat.S_IWGRP`|:attr:`~stat.S_IWOTH`).
953
954  Return the modified ``TarInfo`` member.
955
956.. function:: data_filter(/, member, path)
957
958  Implements the ``'data'`` filter.
959  In addition to what ``tar_filter`` does:
960
961  - Normalize link targets (:attr:`TarInfo.linkname`) using
962    :func:`os.path.normpath`.
963    Note that this removes internal ``..`` components, which may change the
964    meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
965    symbolic links.
966
967  - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
968    that link to absolute paths, or ones that link outside the destination.
969
970    This raises :class:`~tarfile.AbsoluteLinkError` or
971    :class:`~tarfile.LinkOutsideDestinationError`.
972
973    Note that such files are refused even on platforms that do not support
974    symbolic links.
975
976  - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files
977    (including pipes).
978    This raises :class:`~tarfile.SpecialFileError`.
979
980  - For regular files, including hard links:
981
982    - Set the owner read and write permissions
983      (:attr:`~stat.S_IRUSR`|:attr:`~stat.S_IWUSR`).
    - Remove the group and other execute permissions
      (:attr:`~stat.S_IXGRP`|:attr:`~stat.S_IXOTH`)
      if the owner does not have it (:attr:`~stat.S_IXUSR`).
987
988  - For other files (directories), set ``mode`` to ``None``, so
989    that extraction methods skip applying permission bits.
990  - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``)
991    to ``None``, so that extraction methods skip setting it.
992
993  Return the modified ``TarInfo`` member.
994
995  .. versionchanged:: next
996
997     Link targets are now normalized.
998
999
1000.. _tarfile-extraction-refuse:
1001
1002Filter errors
1003~~~~~~~~~~~~~
1004
1005When a filter refuses to extract a file, it will raise an appropriate exception,
1006a subclass of :class:`~tarfile.FilterError`.
1007This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more.
1008With ``errorlevel=0`` the error will be logged and the member will be skipped,
1009but extraction will continue.
1010
1011
1012Hints for further verification
1013~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1014
1015Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted
1016files without prior inspection.
1017Among other issues, the pre-defined filters do not prevent denial-of-service
1018attacks. Users should do additional checks.
1019
1020Here is an incomplete list of things to consider:
1021
1022* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
1023  to prevent e.g. exploiting pre-existing links, and to make it easier to
1024  clean up after a failed extraction.
1025* Disallow symbolic links if you do not need the functionality.
1026* When working with untrusted data, use external (e.g. OS-level) limits on
1027  disk, memory and CPU usage.
1028* Check filenames against an allow-list of characters
1029  (to filter out control characters, confusables, foreign path separators,
1030  etc.).
1031* Check that filenames have expected extensions (discouraging files that
1032  execute when you “click on them”, or extension-less files like Windows special device names).
1033* Limit the number of extracted files, total size of extracted data,
1034  filename length (including symlink length), and size of individual files.
1035* Check for files that would be shadowed on case-insensitive filesystems.
1036
1037Also note that:
1038
1039* Tar files may contain multiple versions of the same file.
1040  Later ones are expected to overwrite any earlier ones.
1041  This feature is crucial to allow updating tape archives, but can be abused
1042  maliciously.
1043* *tarfile* does not protect against issues with “live” data,
1044  e.g. an attacker tinkering with the destination (or source) directory while
1045  extraction (or archiving) is in progress.
1046
1047
1048Supporting older Python versions
1049~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1050
1051Extraction filters were added to Python 3.12, and are backported to older
1052versions as security updates.
1053To check whether the feature is available, use e.g.
1054``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.
1055
1056The following examples show how to support Python versions with and without
1057the feature.
1058Note that setting ``extraction_filter`` will affect any subsequent operations.
1059
1060* Fully trusted archive::
1061
1062    my_tarfile.extraction_filter = (lambda member, path: member)
1063    my_tarfile.extractall()
1064
1065* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior
1066  (``'fully_trusted'``) if this feature is not available::
1067
1068    my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
1069                                           (lambda member, path: member))
1070    my_tarfile.extractall()
1071
1072* Use the ``'data'`` filter; *fail* if it is not available::
1073
1074    my_tarfile.extractall(filter=tarfile.data_filter)
1075
1076  or::
1077
1078    my_tarfile.extraction_filter = tarfile.data_filter
1079    my_tarfile.extractall()
1080
1081* Use the ``'data'`` filter; *warn* if it is not available::
1082
1083   if hasattr(tarfile, 'data_filter'):
1084       my_tarfile.extractall(filter='data')
1085   else:
1086       # remove this when no longer needed
1087       warn_the_user('Extracting may be unsafe; consider updating Python')
1088       my_tarfile.extractall()
1089
1090
1091Stateful extraction filter example
1092~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1093
1094While *tarfile*'s extraction methods take a simple *filter* callable,
1095custom filters may be more complex objects with an internal state.
1096It may be useful to write these as context managers, to be used like this::
1097
1098    with StatefulFilter() as filter_func:
1099        tar.extractall(path, filter=filter_func)
1100
1101Such a filter can be written as, for example::
1102
1103    class StatefulFilter:
1104        def __init__(self):
1105            self.file_count = 0
1106
1107        def __enter__(self):
1108            return self
1109
1110        def __call__(self, member, path):
1111            self.file_count += 1
1112            return member
1113
1114        def __exit__(self, *exc_info):
1115            print(f'{self.file_count} files extracted')
1116
1117
1118.. _tarfile-commandline:
1119.. program:: tarfile
1120
1121
1122Command-Line Interface
1123----------------------
1124
1125.. versionadded:: 3.4
1126
1127The :mod:`tarfile` module provides a simple command-line interface to interact
1128with tar archives.
1129
1130If you want to create a new tar archive, specify its name after the :option:`-c`
1131option and then list the filename(s) that should be included:
1132
1133.. code-block:: shell-session
1134
1135    $ python -m tarfile -c monty.tar  spam.txt eggs.txt
1136
1137Passing a directory is also acceptable:
1138
1139.. code-block:: shell-session
1140
1141    $ python -m tarfile -c monty.tar life-of-brian_1979/
1142
1143If you want to extract a tar archive into the current directory, use
1144the :option:`-e` option:
1145
1146.. code-block:: shell-session
1147
1148    $ python -m tarfile -e monty.tar
1149
1150You can also extract a tar archive into a different directory by passing the
1151directory's name:
1152
1153.. code-block:: shell-session
1154
1155    $ python -m tarfile -e monty.tar  other-dir/
1156
1157For a list of the files in a tar archive, use the :option:`-l` option:
1158
1159.. code-block:: shell-session
1160
1161    $ python -m tarfile -l monty.tar
1162
1163
1164Command-line options
1165~~~~~~~~~~~~~~~~~~~~
1166
1167.. cmdoption:: -l <tarfile>
1168               --list <tarfile>
1169
1170   List files in a tarfile.
1171
1172.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
1173               --create <tarfile> <source1> ... <sourceN>
1174
1175   Create tarfile from source files.
1176
1177.. cmdoption:: -e <tarfile> [<output_dir>]
1178               --extract <tarfile> [<output_dir>]
1179
1180   Extract tarfile into the current directory if *output_dir* is not specified.
1181
1182.. cmdoption:: -t <tarfile>
1183               --test <tarfile>
1184
1185   Test whether the tarfile is valid or not.
1186
1187.. cmdoption:: -v, --verbose
1188
1189   Verbose output.
1190
1191.. cmdoption:: --filter <filtername>
1192
1193   Specifies the *filter* for ``--extract``.
1194   See :ref:`tarfile-extraction-filter` for details.
1195   Only string names are accepted (that is, ``fully_trusted``, ``tar``,
1196   and ``data``).
1197
1198   .. versionadded:: 3.11.4
1199
1200.. _tar-examples:
1201
1202Examples
1203--------
1204
1205How to extract an entire tar archive to the current working directory::
1206
   import tarfile
   with tarfile.open("sample.tar.gz") as tar:
       # Pass an explicit filter; 'data' is intended for plain data archives.
       tar.extractall(filter='data')
1211
1212How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
1213a generator function instead of a list::
1214
   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   with tarfile.open("sample.tar.gz") as tar:
       tar.extractall(members=py_files(tar))
1226
1227How to create an uncompressed tar archive from a list of filenames::
1228
1229   import tarfile
1230   tar = tarfile.open("sample.tar", "w")
1231   for name in ["foo", "bar", "quux"]:
1232       tar.add(name)
1233   tar.close()
1234
1235The same example using the :keyword:`with` statement::
1236
1237    import tarfile
1238    with tarfile.open("sample.tar", "w") as tar:
1239        for name in ["foo", "bar", "quux"]:
1240            tar.add(name)
1241
1242How to read a gzip compressed tar archive and display some member information::
1243
   import tarfile
   with tarfile.open("sample.tar.gz", "r:gz") as tar:
       for tarinfo in tar:
           print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
           if tarinfo.isreg():
               print("a regular file.")
           elif tarinfo.isdir():
               print("a directory.")
           else:
               print("something else.")
1255
1256How to create an archive and reset the user information using the *filter*
1257parameter in :meth:`TarFile.add`::
1258
    import tarfile

    def reset(tarinfo):
        # Replace the owner information with root before the member is added.
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo

    with tarfile.open("sample.tar.gz", "w:gz") as tar:
        tar.add("foo", filter=reset)
1267
1268
1269.. _tar-formats:
1270
1271Supported tar formats
1272---------------------
1273
1274There are three tar formats that can be created with the :mod:`tarfile` module:
1275
* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  of at most 256 characters (and only when the path can be split at a ``/``
  into a prefix of up to 155 and a name of up to 100 characters) and linknames
  up to 100 characters. The maximum file size is 8 GiB. This is an old and
  limited but widely supported format.
1280
* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.
1285
1286* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
1287  format with virtually no limits. It supports long filenames and linknames, large
1288  files and stores pathnames in a portable way. Modern tar implementations,
1289  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
1290  features; some old or unmaintained libraries may not, but should treat
1291  *pax* archives as if they were in the universally supported *ustar* format.
1292  It is the current default format for new archives.
1293
  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  extended headers only affect the subsequent file header, while global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.
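
The practical difference between the formats can be seen with a name that has
no ``/`` and is longer than ustar allows. The sketch below is illustrative
only; the ``write_archive`` helper, the member name and the ``comment`` header
are assumptions made for the example:

```python
import io
import tarfile

# A single path component that is too long for a ustar header.
long_name = "a" * 300 + ".txt"

def write_archive(tar_format):
    """Write one empty member named long_name and return the archive bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tar_format) as tar:
        info = tarfile.TarInfo(long_name)
        # Extra key-value pair; only the pax format can store it.
        info.pax_headers = {"comment": "stored in an extended header"}
        tar.addfile(info)
    return buffer.getvalue()

# The ustar format cannot represent the name and refuses to write it.
try:
    write_archive(tarfile.USTAR_FORMAT)
    ustar_result = "written"
except ValueError as exc:
    ustar_result = f"refused: {exc}"

# The pax format stores the full name in an extended header.
pax_bytes = write_archive(tarfile.PAX_FORMAT)
with tarfile.open(fileobj=io.BytesIO(pax_bytes)) as tar:
    member = tar.getmembers()[0]

print(ustar_result)
print(member.name == long_name, member.pax_headers.get("comment"))
```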
1299
1300There are some more variants of the tar format which can be read, but not
1301created:
1302
* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.
1307
1308* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
1309  pax format, but is not compatible.
1310
1311.. _tar-unicode:
1312
1313Unicode issues
1314--------------
1315
1316The tar format was originally conceived to make backups on tape drives with the
1317main focus on preserving file system information. Nowadays tar archives are
1318commonly used for file distribution and exchanging archives over networks. One
1319problem of the original format (which is the basis of all other formats) is
1320that there is no concept of supporting different character encodings. For
1321example, an ordinary tar archive created on a *UTF-8* system cannot be read
1322correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
1323metadata (like filenames, linknames, user/group names) will appear damaged.
1324Unfortunately, there is no way to autodetect the encoding of an archive. The
1325pax format was designed to solve this problem. It stores non-ASCII metadata
1326using the universal character encoding *UTF-8*.
1327
1328The details of character conversion in :mod:`tarfile` are controlled by the
1329*encoding* and *errors* keyword arguments of the :class:`TarFile` class.
1330
1331*encoding* defines the character encoding to use for the metadata in the
1332archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
1333as a fallback. Depending on whether the archive is read or written, the
1334metadata must be either decoded or encoded. If *encoding* is not set
1335appropriately, this conversion may fail.
1336
The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.
1341
1342For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
1343because all the metadata is stored using *UTF-8*. *encoding* is only used in
1344the rare cases when binary pax headers are decoded or when strings with
1345surrogate characters are stored.
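
The effect of a mismatched *encoding* can be reproduced in memory. In this
sketch the ``roundtrip`` helper and the member name are hypothetical; it writes
a non-ASCII name in one encoding and reads it back in another, once with
:const:`GNU_FORMAT` and once with :const:`PAX_FORMAT`:

```python
import io
import tarfile

def roundtrip(tar_format, write_encoding, read_encoding):
    """Store one non-ASCII member name and return the name read back."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tar_format,
                      encoding=write_encoding) as tar:
        tar.addfile(tarfile.TarInfo("café.txt"))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, encoding=read_encoding) as tar:
        return tar.getnames()[0]

# GNU format: the name is stored as raw bytes in the writer's encoding.
gnu_mismatch = roundtrip(tarfile.GNU_FORMAT, "utf-8", "latin-1")
gnu_match = roundtrip(tarfile.GNU_FORMAT, "utf-8", "utf-8")

# pax format: the non-ASCII name is stored as UTF-8 in an extended header.
pax_mismatch = roundtrip(tarfile.PAX_FORMAT, "utf-8", "latin-1")

print(repr(gnu_mismatch), repr(gnu_match), repr(pax_mismatch))
```

With the GNU format the name only survives when reader and writer agree on
the encoding, whereas the pax archive yields the original name even though the
reader passes a different *encoding*.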
1346