:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this.  If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
   keyword argument *preset* to specify the compression level of the file.
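
   The following is a minimal sketch, not part of the module itself: it assumes
   an in-memory :class:`io.BytesIO` buffer in place of a file on disk, and the
   member name ``hello.txt`` is made up. It writes a gzip-compressed archive
   through *fileobj* with an explicit *compresslevel*, then reads it back with
   mode ``'r'`` so that the compression is detected automatically:

   .. code-block:: python

      import io
      import tarfile

      data = b"hello, tar\n"
      buf = io.BytesIO()

      # Write a gzip-compressed archive into the buffer; compresslevel is
      # handed to the gzip layer (the default would be 9).
      with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=6) as tar:
          info = tarfile.TarInfo("hello.txt")
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))

      # Rewind: fileobj is expected to be at position 0 when reading.
      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:  # "r" detects gzip itself
          names = tar.getnames()
          content = tar.extractfile("hello.txt").read()

      print(names, content)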

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks.  No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`.  The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
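
   A minimal sketch of the stream interface, assuming an in-memory buffer stands
   in for a pipe, socket or tape (the member names are made up). Because no
   seeking backwards is done, each member's data is read while the iteration is
   positioned on that member:

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()

      # 'w|gz' writes a gzip-compressed stream of tar blocks without seeking.
      with tarfile.open(fileobj=buf, mode="w|gz") as tar:
          for name, payload in [("a.txt", b"first"), ("b.txt", b"second")]:
              info = tarfile.TarInfo(name)
              info.size = len(payload)
              tar.addfile(info, io.BytesIO(payload))

      buf.seek(0)
      contents = {}
      # 'r|*' reads the stream sequentially and detects the compression.
      with tarfile.open(fileobj=buf, mode="r|*") as tar:
          for member in tar:  # strictly in archive order
              f = tar.extractfile(member)
              if f is not None:  # only regular files and links have data
                  contents[member.name] = f.read()

      print(contents)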

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. class:: TarFile
   :noindex:

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.

   .. versionchanged:: 3.9
      Support for file and file-like objects.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
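
Since all of these exceptions derive from :exc:`TarError`, callers can catch
that single class. A minimal sketch, assuming Python 3.9 or later (so that
:func:`is_tarfile` accepts a file-like object); the input bytes are arbitrary:

.. code-block:: python

   import io
   import tarfile

   garbage = io.BytesIO(b"this is not a tar archive")

   # is_tarfile() reports unreadable input as False instead of raising.
   looks_like_tar = tarfile.is_tarfile(garbage)

   garbage.seek(0)
   error_type = None
   try:
       # Mode 'r' tries every supported compression before giving up.
       tarfile.open(fileobj=garbage)
   except tarfile.TarError as exc:  # also catches subclasses such as ReadError
       error_type = type(exc).__name__

   print(looks_like_tar, error_type)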


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, format will be automatically detected, even
   if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.
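
   For instance, two archives joined byte for byte (as ``cat a.tar b.tar`` would
   produce) are only read up to the end-of-archive zero blocks of the first one
   unless *ignore_zeros* is set. A minimal in-memory sketch; the
   ``make_archive`` helper and the member names are hypothetical:

   .. code-block:: python

      import io
      import tarfile

      def make_archive(name, payload):
          """Hypothetical helper: return the bytes of a one-member tar archive."""
          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w") as tar:
              info = tarfile.TarInfo(name)
              info.size = len(payload)
              tar.addfile(info, io.BytesIO(payload))
          return buf.getvalue()

      joined = make_archive("a.txt", b"A") + make_archive("b.txt", b"B")

      # By default the zero blocks that end the first archive end the listing.
      with tarfile.open(fileobj=io.BytesIO(joined), mode="r:") as tar:
          default_names = tar.getnames()

      # With ignore_zeros=True the zero blocks are skipped and reading goes on.
      with tarfile.open(fileobj=io.BytesIO(joined), mode="r:",
                        ignore_zeros=True) as tar:
          all_names = tar.getnames()

      print(default_names, all_names)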

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
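
   A minimal sketch of *pax_headers*, using an in-memory archive; the
   ``comment`` key and its value are arbitrary examples. Reading the archive back
   exposes the global header through :attr:`TarFile.pax_headers`:

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                        pax_headers={"comment": "built by a script"}) as tar:
          # Add one ordinary member after the global header.
          data = b"hi"
          info = tarfile.TarInfo("readme.txt")
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          global_headers = dict(tar.pax_headers)

      print(global_headers)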

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.
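
   A minimal sketch of this behaviour, storing the same (made-up) name twice in
   an in-memory archive:

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          for payload in (b"old version", b"new version"):
              info = tarfile.TarInfo("config.ini")
              info.size = len(payload)
              tar.addfile(info, io.BytesIO(payload))

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          names = tar.getnames()                # both copies are listed
          latest = tar.getmember("config.ini")  # the last occurrence wins
          latest_data = tar.extractfile(latest).read()

      print(names, latest_data)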


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
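
   One way to inspect an archive before extraction is to pass a *members*
   iterable that drops suspicious entries. The ``safe_members`` helper below is
   hypothetical and deliberately minimal: it keeps only regular files and
   directories whose resolved path stays inside the destination, and it is not a
   complete defence against every malicious archive:

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      def safe_members(tar, dest):
          """Hypothetical helper: yield members that stay inside *dest*."""
          root = os.path.realpath(dest)
          for member in tar.getmembers():
              if not (member.isfile() or member.isdir()):
                  continue  # skip links, devices and FIFOs in this sketch
              target = os.path.realpath(os.path.join(root, member.name))
              if os.path.commonpath([root, target]) != root:
                  continue  # an absolute name or ".." would escape *dest*
              yield member

      # Build a test archive with one harmless and one escaping name.
      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          for name in ("good.txt", "../evil.txt"):
              info = tarfile.TarInfo(name)
              info.size = 4
              tar.addfile(info, io.BytesIO(b"data"))
      buf.seek(0)

      with tempfile.TemporaryDirectory() as outer:
          dest = os.path.join(outer, "out")
          os.mkdir(dest)
          with tarfile.open(fileobj=buf, mode="r") as tar:
              tar.extractall(path=dest, members=safe_members(tar, dest))
          extracted = sorted(os.listdir(dest))
          escaped = os.path.exists(os.path.join(outer, "evil.txt"))

      print(extracted, escaped)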

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be
   a filename or a :class:`TarInfo` object. If *member* is a regular file or
   a link, an :class:`io.BufferedReader` object is returned. For all other
   existing members, :const:`None` is returned. If *member* does not appear
   in the archive, :exc:`KeyError` is raised.
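
   A minimal sketch that reads member data straight into memory without
   extracting anything to disk; the archive is built in memory and the member
   names are made up:

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          folder = tarfile.TarInfo("docs")
          folder.type = tarfile.DIRTYPE
          tar.addfile(folder)  # directories carry no data
          data = b"remember the milk\n"
          info = tarfile.TarInfo("docs/notes.txt")
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))

      buf.seek(0)
      missing = None
      with tarfile.open(fileobj=buf, mode="r") as tar:
          with tar.extractfile("docs/notes.txt") as f:  # regular file: a reader
              text = f.read().decode("utf-8")
          dir_result = tar.extractfile("docs")          # directory: None
          try:
              tar.extractfile("missing.txt")
          except KeyError:
              missing = "KeyError"

      print(repr(text), dir_result, missing)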

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it should be a function that takes a :class:`TarInfo`
   object argument and returns the changed :class:`TarInfo` object. If it
   instead returns :const:`None`, the :class:`TarInfo` object will be excluded
   from the archive. See :ref:`tar-examples` for an example.
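
   A *filter* that returns :const:`None` prunes entries. A minimal sketch,
   assuming a throwaway source directory created with :mod:`tempfile`; the file
   names, the ``project`` archive name and the ``skip_bytecode`` filter are
   hypothetical:

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      def skip_bytecode(tarinfo):
          """Hypothetical filter: drop compiled files, keep everything else."""
          if tarinfo.name.endswith(".pyc"):
              return None  # excluded from the archive
          return tarinfo

      buf = io.BytesIO()
      with tempfile.TemporaryDirectory() as src:
          for name in ("app.py", "app.pyc"):
              with open(os.path.join(src, name), "w") as f:
                  f.write("# placeholder\n")
          with tarfile.open(fileobj=buf, mode="w") as tar:
              tar.add(src, arcname="project", filter=skip_bytecode)

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          names = tar.getnames()

      print(names)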

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and ``tarinfo.size`` bytes are read from it
   and added to the archive.  You can create :class:`TarInfo` objects directly, or
   by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file.  The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`.  If given, *arcname* specifies an
   alternative name for the file in the archive; otherwise, the name is taken
   from *fileobj*’s :attr:`~io.FileIO.name` attribute, or the *name* argument.
   The name should be a text string.

   You can modify some of the :class:`TarInfo`’s attributes before you add it
   using :meth:`addfile`. If the file object is not an ordinary file object
   positioned at the beginning of the file, attributes such as
   :attr:`~TarInfo.size` may need modifying.  This is the case for objects such
   as :class:`~gzip.GzipFile`. The :attr:`~TarInfo.name` may also be modified,
   in which case *arcname* could be a dummy string.
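
   A minimal sketch of that workflow: build the :class:`TarInfo` from an open
   file with :meth:`gettarinfo`, adjust some attributes, then pass both to
   :meth:`addfile`. The temporary file, the archive name ``renamed.txt`` and the
   ``builder`` user and group names are arbitrary choices:

   .. code-block:: python

      import io
      import tarfile
      import tempfile

      payload = b"some bytes on disk\n"
      buf = io.BytesIO()

      with tempfile.NamedTemporaryFile() as src, \
              tarfile.open(fileobj=buf, mode="w") as tar:
          src.write(payload)
          src.flush()  # make the size visible to os.fstat()
          src.seek(0)  # addfile() reads tarinfo.size bytes from here
          info = tar.gettarinfo(fileobj=src, arcname="renamed.txt")
          info.uname = info.gname = "builder"  # override the looked-up names
          tar.addfile(info, src)

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          member = tar.getmember("renamed.txt")
          stored = tar.extractfile(member).read()

      print(member.size, member.uname, stored)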

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from the :class:`bytes` buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a :class:`bytes` buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification, in seconds since the epoch.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the link target, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.
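
The query methods work on any :class:`TarInfo`, including one created by hand.
A minimal sketch (the names are made up) that also round-trips a header through
:meth:`TarInfo.tobuf` and :meth:`TarInfo.frombuf`:

.. code-block:: python

   import tarfile

   link = tarfile.TarInfo("current")
   link.type = tarfile.SYMTYPE
   link.linkname = "release-1.0"

   fifo = tarfile.TarInfo("pipe")
   fifo.type = tarfile.FIFOTYPE

   # Serialise the header to bytes and parse it back.
   header = link.tobuf()
   parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

   print(len(header), parsed.name, parsed.issym(), parsed.linkname)
   print(fifo.isfifo(), fifo.isdev(), fifo.isfile())

With the default :const:`PAX_FORMAT` and only short ASCII fields, no extended
header is needed, so the buffer should be a single 512-byte header block.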


.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into *output_dir*, or into the current directory if
   *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally-supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  extended headers only affect the subsequent file header, while global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.
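
Name length is where these formats differ most visibly in practice. In the
minimal sketch below (the 150-character name and the ``store`` helper are
hypothetical), :const:`USTAR_FORMAT` rejects a long name that contains no ``/``
at which it could be split, raising :exc:`ValueError`, while :const:`PAX_FORMAT`
records the name in an extended header:

.. code-block:: python

   import io
   import tarfile

   long_name = "x" * 150  # longer than the 100-byte ustar name field

   def store(name, fmt):
       """Hypothetical helper: archive one empty member, return its name read back."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
           tar.addfile(tarfile.TarInfo(name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, mode="r") as tar:
           return tar.getnames()[0]

   try:
       store(long_name, tarfile.USTAR_FORMAT)
       ustar_result = "stored"
   except ValueError:
       ustar_result = "rejected"  # ustar cannot represent this name

   pax_result = store(long_name, tarfile.PAX_FORMAT)
   print(ustar_result, pax_result == long_name)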

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters; there is no user/group name information. Some archives have
  miscalculated header checksums for fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible with it.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :data:`ENCODING`: ``'utf-8'`` on Windows, the
value returned by :func:`sys.getfilesystemencoding` otherwise. Depending on
whether the archive is read or written, the metadata must be either decoded or
encoded. If *encoding* is not set appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

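
A minimal sketch of the difference, assuming an archive written with UTF-8 and
read by someone who (wrongly) assumes Latin-1; the ``roundtrip_name`` helper and
the file name are hypothetical:

.. code-block:: python

   import io
   import tarfile

   def roundtrip_name(name, fmt, write_encoding, read_encoding):
       """Hypothetical helper: store an empty member, return its name as read back."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=fmt,
                         encoding=write_encoding) as tar:
           tar.addfile(tarfile.TarInfo(name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, mode="r", encoding=read_encoding) as tar:
           return tar.getnames()[0]

   # ustar: the name is stored as raw UTF-8 bytes, then decoded as Latin-1.
   ustar_name = roundtrip_name("café.txt", tarfile.USTAR_FORMAT, "utf-8", "latin-1")
   # pax: the non-ASCII name goes into a UTF-8 extended header, so the
   # reader's encoding argument is not used for it.
   pax_name = roundtrip_name("café.txt", tarfile.PAX_FORMAT, "utf-8", "latin-1")

   print(ustar_name)  # cafÃ©.txt
   print(pax_name)    # café.txt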