:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable for opening a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks.  No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`.  The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

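As a rough illustration of the colon-separated modes listed for
:func:`tarfile.open`, the sketch below writes a gzip-compressed archive, reads
it back with transparent compression detection, and shows what happens with an
unsuitable read mode and with exclusive creation. The file names and the
temporary directory are assumptions made only for this example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.txt")          # hypothetical input file
    archive = os.path.join(tmp, "sample.tar.gz")  # hypothetical archive name
    with open(src, "w") as f:
        f.write("hello\n")

    # 'w:gz' creates (or overwrites) a gzip-compressed archive.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="hello.txt")

    # Plain 'r' (same as 'r:*') detects the compression automatically.
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

    # 'r:' insists on an uncompressed archive, so it raises ReadError here.
    try:
        tarfile.open(archive, "r:")
        wrong_mode_failed = False
    except tarfile.ReadError:
        wrong_mode_failed = True

    # 'x:gz' is exclusive creation: the archive already exists, so it fails.
    try:
        tarfile.open(archive, "x:gz")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True
```

Using ``'r'`` for reading is usually the simplest choice, because the caller
does not need to know how the archive was compressed.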

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

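A minimal sketch of :func:`is_tarfile`, assuming two throwaway files in a
temporary directory (both names are hypothetical): one real archive and one
plain text file.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    text = os.path.join(tmp, "notes.txt")       # hypothetical plain file
    archive = os.path.join(tmp, "sample.tar")   # hypothetical archive

    with open(text, "w") as f:
        f.write("just some text\n")
    with tarfile.open(archive, "w") as tar:
        tar.add(text, arcname="notes.txt")

    # is_tarfile() tries to open the file and reports whether that worked.
    archive_ok = tarfile.is_tarfile(archive)
    text_ok = tarfile.is_tarfile(text)
```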

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

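Because every exception above derives from :exc:`TarError`, a single handler
can cover all of them. The following sketch, which uses an in-memory buffer
with made-up contents, shows one way this might look in practice.

```python
import io
import tarfile

# Hypothetical input: some bytes that are clearly not a tar archive.
not_an_archive = io.BytesIO(b"this is not a tar archive")

try:
    tarfile.open(fileobj=not_an_archive, mode="r")
    caught = None
except tarfile.TarError as exc:
    # The concrete exception here is ReadError, a TarError subclass.
    caught = exc

specific = (
    tarfile.ReadError,
    tarfile.CompressionError,
    tarfile.StreamError,
    tarfile.ExtractError,
    tarfile.HeaderError,
)
all_derive_from_tarerror = all(issubclass(e, tarfile.TarError) for e in specific)
```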

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.

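One practical difference between the three format constants is how they deal
with long member names (see :ref:`tar-formats`). The sketch below builds
headers for a deliberately extreme, made-up name with :meth:`TarInfo.tobuf`; it
is an illustration under that assumption, not a complete comparison of the
formats.

```python
import tarfile

# A single path component of 300 characters cannot be split into the
# ustar prefix and name fields, so it is too long for that format.
info = tarfile.TarInfo("x" * 300)

try:
    info.tobuf(format=tarfile.USTAR_FORMAT)
    ustar_ok = True
except ValueError:
    ustar_ok = False

# GNU tar stores the name in an extra "longname" pseudo member ...
gnu_header = info.tobuf(format=tarfile.GNU_FORMAT)
# ... while pax puts it into an extended header record called "path".
pax_header = info.tobuf(format=tarfile.PAX_FORMAT)
```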

.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, the format will be automatically detected,
   even if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

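Two of the constructor arguments described above are easy to try out with
in-memory buffers: *fileobj*, which stays open after the :class:`TarFile` is
closed, and *ignore_zeros*, which lets reading continue past the end-of-archive
marker of concatenated archives. The helper name ``make_archive`` and the
member names are hypothetical and exist only for this sketch.

```python
import io
import tarfile

def make_archive(member_name):
    """Return the bytes of an uncompressed archive with one empty member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(tarfile.TarInfo(member_name))
    return buf.getvalue()

# Two complete archives glued together, as `cat a.tar b.tar` would produce.
data = make_archive("first.txt") + make_archive("second.txt")

buf = io.BytesIO(data)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    # Default behaviour: the first empty block ends the archive.
    default_names = tar.getnames()

# The caller-supplied file object is left open by TarFile.close().
fileobj_still_open = not buf.closed

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:", ignore_zeros=True) as tar:
    # Empty blocks are skipped, so members of both archives are found.
    all_names = tar.getnames()
```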

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

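The lookup methods above can be compared on a small in-memory archive in which
one name is stored twice. This is only a sketch: ``add_bytes`` is a
hypothetical helper and the member names are invented.

```python
import io
import tarfile

def add_bytes(tar, name, payload):
    """Hypothetical helper: store *payload* under *name* in *tar*."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

old = b"version=1\n"
new = b"version=2 (newer)\n"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "config.ini", old)
    add_bytes(tar, "readme.txt", b"hello\n")
    add_bytes(tar, "config.ini", new)       # same name stored a second time

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()                  # archive order, duplicates kept
    members = tar.getmembers()              # same order as getnames()
    latest = tar.getmember("config.ini")    # the last occurrence wins
    try:
        tar.getmember("missing.txt")
        missing_raises = False
    except KeyError:
        missing_raises = True
```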

.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object when the
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

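The "prior inspection" mentioned in the warning for :meth:`extractall` could
take the form of a generator passed as *members*. The following is a minimal
sketch under stated assumptions, not a complete defence: ``safe_members`` is a
hypothetical helper that is not part of :mod:`tarfile`, and it simply skips
links, devices and any member whose resolved path would leave the destination.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield only regular files and directories that stay inside *dest*.

    Hypothetical helper, not part of tarfile. Links and device files are
    skipped as well, which is stricter than many use cases need.
    """
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = os.path.commonpath([root, target]) == root
        if inside and (member.isfile() or member.isdir()):
            yield member

# Build a small in-memory archive with one harmless and one hostile name.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("ok.txt", "../evil.txt"):
        payload = b"data\n"
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tempfile.TemporaryDirectory() as dest, tarfile.open(fileobj=buf) as tar:
    accepted = [m.name for m in safe_members(tar, dest)]
    tar.extractall(path=dest, members=safe_members(tar, dest))
    extracted = sorted(os.listdir(dest))
```

Rejected members are silently dropped here; a real application might prefer to
raise an error so that a suspicious archive does not go unnoticed.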

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

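To read a member's data without writing anything to disk,
:meth:`extractfile` can be used roughly as follows. The archive is built in
memory and its member names are made up for the example.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    payload = b"first line\nsecond line\n"
    note = tarfile.TarInfo("docs/note.txt")
    note.size = len(payload)
    tar.addfile(note, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    # Regular file: a readable binary file object is returned.
    with tar.extractfile("docs/note.txt") as member_file:
        first_line = member_file.readline()
    # Directory: there is no data to read, so the result is None.
    dir_result = tar.extractfile("docs")
```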

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it should be a function that takes a :class:`TarInfo`
   object argument and returns the changed :class:`TarInfo` object. If it
   instead returns :const:`None` the :class:`TarInfo` object will be excluded
   from the archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.

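Besides modifying members, a *filter* function for :meth:`add` can exclude them
by returning :const:`None`. The sketch below assumes a small, invented
directory layout in a temporary directory; ``skip_bytecode`` is a hypothetical
filter written for this example.

```python
import os
import tarfile
import tempfile

def skip_bytecode(tarinfo):
    """Hypothetical filter: drop ``.pyc`` files, keep everything else."""
    if tarinfo.name.endswith(".pyc"):
        return None          # returning None excludes the member
    return tarinfo

with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "project")       # hypothetical layout
    os.mkdir(project)
    for filename in ("main.py", "main.pyc", "README"):
        with open(os.path.join(project, filename), "w") as f:
            f.write("placeholder\n")

    archive = os.path.join(tmp, "project.tar")
    with tarfile.open(archive, "w") as tar:
        # arcname replaces the temporary path with a stable name.
        tar.add(project, arcname="project", filter=skip_bytecode)

    with tarfile.open(archive) as tar:
        names = tar.getnames()
```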

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file.  The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`.  If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument.  The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

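The combination of :meth:`gettarinfo` and :meth:`addfile` might be used as
sketched below: the metadata is taken from an existing file, adjusted, and only
then written to the archive. The file name, the archive name and the chosen
attribute values are assumptions for illustration.

```python
import io
import os
import tarfile
import tempfile

content = b"quarterly numbers\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")      # hypothetical file
    with open(path, "wb") as f:
        f.write(content)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Metadata comes from os.stat(); arcname sets the stored name.
        info = tar.gettarinfo(path, arcname="reports/2020-q1.txt")
        # Adjust attributes before the header is written.
        info.uid = info.gid = 0
        info.uname = info.gname = "root"
        info.mtime = 0
        with open(path, "rb") as f:
            tar.addfile(info, f)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    stored = tar.getmember("reports/2020-q1.txt")
```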

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

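A header can be round-tripped through :meth:`TarInfo.tobuf` and
:meth:`TarInfo.frombuf` without touching any archive. The sketch below assumes
a short ASCII member name, so that the ustar header fits into a single block,
and uses an arbitrary invalid block to trigger :exc:`HeaderError`.

```python
import tarfile

info = tarfile.TarInfo("hello.txt")   # hypothetical member name

# With a short ASCII name the ustar header is exactly one 512-byte block.
header = info.tobuf(format=tarfile.USTAR_FORMAT)

# frombuf() parses such a block back into a TarInfo object.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block whose checksum field is not valid is rejected with HeaderError.
try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
    invalid_rejected = False
except tarfile.HeaderError:
    invalid_rejected = True
```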

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

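The query methods are usually more readable than comparing
:attr:`TarInfo.type` against the type constants by hand. As a small sketch,
the hypothetical helper ``describe`` below classifies a few hand-made
:class:`TarInfo` objects; the member names are invented.

```python
import tarfile

regular = tarfile.TarInfo("data.bin")          # REGTYPE is the default type

folder = tarfile.TarInfo("assets")
folder.type = tarfile.DIRTYPE

link = tarfile.TarInfo("latest")
link.type = tarfile.SYMTYPE
link.linkname = "data.bin"                     # target of the symbolic link

pipe = tarfile.TarInfo("events")
pipe.type = tarfile.FIFOTYPE

def describe(info):
    """Hypothetical helper mapping a TarInfo to a short label."""
    if info.isfile():
        return "file"
    if info.isdir():
        return "directory"
    if info.issym():
        return "symlink -> " + info.linkname
    if info.isdev():
        return "device or FIFO"
    return "other"

labels = [describe(i) for i in (regular, folder, link, pipe)]
```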

.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")  # mode 'r' detects the compression
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile

    def reset(tarinfo):
        # Store every member as if it were owned by root.
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo

    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()

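The stream modes (``'w|gz'``, ``'r|*'`` and so on) have no example above, so
here is a hedged sketch of how they might be used. An in-memory buffer stands
in for a pipe or socket, and the member names are invented. The last step
illustrates the "no random access" limitation: going back to an earlier member
raises :exc:`StreamError`.

```python
import io
import tarfile

# Write a gzip-compressed stream into a buffer standing in for a pipe.
sink = io.BytesIO()
with tarfile.open(fileobj=sink, mode="w|gz") as tar:
    for name, payload in (("a.txt", b"alpha\n"), ("b.txt", b"beta\n")):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Read it back strictly front to back; 'r|*' detects the compression.
source = io.BytesIO(sink.getvalue())
contents = {}
first = None
with tarfile.open(fileobj=source, mode="r|*") as tar:
    for member in tar:
        if first is None:
            first = member
        # A member's data has to be consumed before moving on.
        contents[member.name] = tar.extractfile(member).read()

    # The stream cannot be rewound to a member that was already passed.
    try:
        tar.extractfile(first).read()
        rewind_failed = False
    except tarfile.StreamError:
        rewind_failed = True
```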

.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally-supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  extended headers only affect the subsequent file header, whereas global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

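The effect of *encoding* can be observed with a non-ASCII member name. In the
sketch below, the same name is written once in GNU format and once in pax
format using ``'iso8859-1'``, and then read back with different encodings. The
helper functions ``write_archive`` and ``first_name`` as well as the member
name are assumptions made for this illustration.

```python
import io
import tarfile

name = "météo.txt"   # hypothetical member name with non-ASCII characters

def write_archive(fmt, encoding):
    """Hypothetical helper: archive one empty member and return the bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()

def first_name(data, encoding):
    """Hypothetical helper: return the first member name of an archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:", encoding=encoding) as tar:
        return tar.getnames()[0]

gnu = write_archive(tarfile.GNU_FORMAT, "iso8859-1")
pax = write_archive(tarfile.PAX_FORMAT, "iso8859-1")

gnu_right = first_name(gnu, "iso8859-1")   # matching encoding: name is intact
gnu_wrong = first_name(gnu, "utf-8")       # mismatch: bytes become surrogates
pax_any = first_name(pax, "utf-8")         # pax stored the name as UTF-8
```

With the default ``'surrogateescape'`` error handler the mismatched read does
not fail; the undecodable bytes are kept as lone surrogates, so the name looks
damaged but can still be written back unchanged.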