:mod:`tarfile` --- Read and write tar archive files
===================================================
3
4.. module:: tarfile
5   :synopsis: Read and write tar-format archive files.
6
7.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
8.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>
9
10**Source code:** :source:`Lib/tarfile.py`
11
12--------------
13
14The :mod:`tarfile` module makes it possible to read and write tar
15archives, including those using gzip, bz2 and lzma compression.
16Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
17higher-level functions in :ref:`shutil <archiving-operations>`.
18
19Some facts and figures:
20
21* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
22  if the respective modules are available.
23
24* read/write support for the POSIX.1-1988 (ustar) format.
25
26* read/write support for the GNU tar format including *longname* and *longlink*
27  extensions, read-only support for all variants of the *sparse* extension
28  including restoration of sparse files.
29
30* read/write support for the POSIX.1-2001 (pax) format.
31
32* handles directories, regular files, hardlinks, symbolic links, fifos,
33  character devices and block devices and is able to acquire and restore file
34  information like timestamp, access permissions and owner.
35
36.. versionchanged:: 3.3
37   Added support for :mod:`lzma` compression.
38
39
40.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)
41
42   Return a :class:`TarFile` object for the pathname *name*. For detailed
43   information on :class:`TarFile` objects and the keyword arguments that are
44   allowed, see :ref:`tarfile-objects`.
45
   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:
48
   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+
92
93   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
94   is not suitable to open a certain (compressed) file for reading,
95   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this.  If a
96   compression method is not supported, :exc:`CompressionError` is raised.
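
   As a minimal sketch of the mode rules above, the following builds a small
   gzip-compressed archive in memory and opens it twice. The in-memory buffer
   and the member name ``hello.txt`` are assumptions made for illustration.

   .. code-block:: python

      import io
      import tarfile

      # Build a small gzip-compressed archive in memory (illustrative data).
      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w:gz") as tar:
          payload = b"hello"
          info = tarfile.TarInfo(name="hello.txt")
          info.size = len(payload)
          tar.addfile(info, io.BytesIO(payload))

      # 'r:' insists on an uncompressed archive, so ReadError is raised.
      buf.seek(0)
      try:
          tarfile.open(fileobj=buf, mode="r:")
          rejected = False
      except tarfile.ReadError:
          rejected = True

      # 'r' (the same as 'r:*') detects the compression automatically.
      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          names = tar.getnames()

   Unless the compression is known in advance and must be enforced, ``'r'``
   is usually the safer choice.
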
97
98   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
99   opened in binary mode for *name*. It is supposed to be at position 0.
100
101   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
102   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
103   *compresslevel* (default ``9``) to specify the compression level of the file.
104
105   For special purposes, there is a second format for *mode*:
106   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
107   object that processes its data as a stream of blocks.  No random seeking will
108   be done on the file. If given, *fileobj* may be any object that has a
109   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
110   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
111   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
112   device. However, such a :class:`TarFile` object is limited in that it does
113   not allow random access, see :ref:`tar-examples`.  The currently
114   possible modes:
115
116   +-------------+--------------------------------------------+
117   | Mode        | Action                                     |
118   +=============+============================================+
119   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
120   |             | with transparent compression.              |
121   +-------------+--------------------------------------------+
122   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
123   |             | for reading.                               |
124   +-------------+--------------------------------------------+
125   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
126   |             | reading.                                   |
127   +-------------+--------------------------------------------+
128   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
129   |             | reading.                                   |
130   +-------------+--------------------------------------------+
131   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
132   |             | reading.                                   |
133   +-------------+--------------------------------------------+
134   | ``'w|'``    | Open an uncompressed *stream* for writing. |
135   +-------------+--------------------------------------------+
136   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
137   |             | writing.                                   |
138   +-------------+--------------------------------------------+
139   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
140   |             | writing.                                   |
141   +-------------+--------------------------------------------+
142   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
143   |             | writing.                                   |
144   +-------------+--------------------------------------------+
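
   The stream variant can be sketched as a round trip through an in-memory
   buffer, which stands in here for a socket or pipe. The member names and
   payloads are made up for the example.

   .. code-block:: python

      import io
      import tarfile

      # Write a gzip-compressed stream; the target only needs a write() method.
      sink = io.BytesIO()
      with tarfile.open(fileobj=sink, mode="w|gz") as tar:
          for name, payload in [("a.txt", b"alpha"), ("b.txt", b"beta")]:
              info = tarfile.TarInfo(name=name)
              info.size = len(payload)
              tar.addfile(info, io.BytesIO(payload))

      # Read the stream back strictly front to back, one member at a time.
      source = io.BytesIO(sink.getvalue())
      contents = {}
      with tarfile.open(fileobj=source, mode="r|*") as tar:
          for member in tar:
              contents[member.name] = tar.extractfile(member).read()

   Each member is consumed before the loop moves on to the next one, because
   a stream cannot seek backwards.
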
145
146   .. versionchanged:: 3.5
147      The ``'x'`` (exclusive creation) mode was added.
148
149   .. versionchanged:: 3.6
150      The *name* parameter accepts a :term:`path-like object`.
151
152
153.. class:: TarFile
154
155   Class for reading and writing tar archives. Do not use this class directly:
156   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.
157
158
159.. function:: is_tarfile(name)
160
   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
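
   A short sketch of how the check might be used. Both file names are
   hypothetical and live in a temporary directory.

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      with tempfile.TemporaryDirectory() as tmp:
          archive = os.path.join(tmp, "sample.tar")   # hypothetical names
          plain = os.path.join(tmp, "notes.txt")

          # A real archive with a single empty member.
          with tarfile.open(archive, "w") as tar:
              tar.addfile(tarfile.TarInfo(name="empty.txt"), io.BytesIO())

          # An ordinary text file that is not an archive.
          with open(plain, "w") as f:
              f.write("just some text\n")

          archive_ok = tarfile.is_tarfile(archive)
          plain_ok = tarfile.is_tarfile(plain)
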
163
164
165The :mod:`tarfile` module defines the following exceptions:
166
167
168.. exception:: TarError
169
170   Base class for all :mod:`tarfile` exceptions.
171
172
173.. exception:: ReadError
174
   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.
177
178
179.. exception:: CompressionError
180
181   Is raised when a compression method is not supported or when the data cannot be
182   decoded properly.
183
184
185.. exception:: StreamError
186
187   Is raised for the limitations that are typical for stream-like :class:`TarFile`
188   objects.
189
190
191.. exception:: ExtractError
192
193   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
194   :attr:`TarFile.errorlevel`\ ``== 2``.
195
196
197.. exception:: HeaderError
198
199   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
200
201
202The following constants are available at the module level:
203
204.. data:: ENCODING
205
206   The default character encoding: ``'utf-8'`` on Windows, the value returned by
207   :func:`sys.getfilesystemencoding` otherwise.
208
209
210Each of the following constants defines a tar archive format that the
211:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
212details.
213
214
215.. data:: USTAR_FORMAT
216
217   POSIX.1-1988 (ustar) format.
218
219
220.. data:: GNU_FORMAT
221
222   GNU tar format.
223
224
225.. data:: PAX_FORMAT
226
227   POSIX.1-2001 (pax) format.
228
229
230.. data:: DEFAULT_FORMAT
231
232   The default format for creating archives. This is currently :const:`GNU_FORMAT`.
233
234
235.. seealso::
236
237   Module :mod:`zipfile`
238      Documentation of the :mod:`zipfile` standard module.
239
240   :ref:`archiving-operations`
241      Documentation of the higher-level archiving facilities provided by the
242      standard :mod:`shutil` module.
243
244   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
245      Documentation for tar archive files, including GNU tar extensions.
246
247
248.. _tarfile-objects:
249
250TarFile Objects
251---------------
252
253The :class:`TarFile` object provides an interface to a tar archive. A tar
254archive is a sequence of blocks. An archive member (a stored file) is made up of
255a header block followed by data blocks. It is possible to store a file in a tar
256archive several times. Each archive member is represented by a :class:`TarInfo`
257object, see :ref:`tarinfo-objects` for details.
258
259A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
260statement. It will automatically be closed when the block is completed. Please
261note that in the event of an exception an archive opened for writing will not
262be finalized; only the internally used file object will be closed. See the
263:ref:`tar-examples` section for a use case.
264
265.. versionadded:: 3.2
266   Added support for the context management protocol.
267
268.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)
269
270   All following arguments are optional and can be accessed as instance attributes
271   as well.
272
273   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
274   It can be omitted if *fileobj* is given.
275   In this case, the file object's :attr:`name` attribute is used if it exists.
276
277   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
278   data to an existing file, ``'w'`` to create a new file overwriting an existing
279   one, or ``'x'`` to create a new file only if it does not already exist.
280
281   If *fileobj* is given, it is used for reading or writing data. If it can be
282   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
283   from position 0.
284
285   .. note::
286
      *fileobj* is not closed when :class:`TarFile` is closed.
288
289   *format* controls the archive format. It must be one of the constants
290   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
291   defined at module level.
292
293   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
294   with a different one.
295
296   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
297   is :const:`True`, add the content of the target files to the archive. This has no
298   effect on systems that do not support symbolic links.
299
300   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
301   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
302   as possible. This is only useful for reading concatenated or damaged archives.
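
   The effect of *ignore_zeros* on concatenated archives can be sketched as
   follows. ``make_archive`` is a hypothetical helper written for this
   example, not part of the module.

   .. code-block:: python

      import io
      import tarfile

      def make_archive(name, payload):
          """Return the bytes of a one-member uncompressed archive."""
          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w") as tar:
              info = tarfile.TarInfo(name=name)
              info.size = len(payload)
              tar.addfile(info, io.BytesIO(payload))
          return buf.getvalue()

      # Two complete archives glued together, as "cat a.tar b.tar" would do.
      combined = make_archive("first.txt", b"1") + make_archive("second.txt", b"2")

      # By default, reading stops at the zero blocks ending the first archive.
      with tarfile.open(fileobj=io.BytesIO(combined), mode="r:") as tar:
          default_names = tar.getnames()

      # With ignore_zeros=True the empty blocks are skipped.
      with tarfile.open(fileobj=io.BytesIO(combined), mode="r:",
                        ignore_zeros=True) as tar:
          all_names = tar.getnames()
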
303
304   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
305   messages). The messages are written to ``sys.stderr``.
306
307   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
308   Nevertheless, they appear as error messages in the debug output, when debugging
309   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError`
310   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
311   exceptions as well.
312
313   The *encoding* and *errors* arguments define the character encoding to be
314   used for reading or writing the archive and how conversion errors are going
315   to be handled. The default settings will work for most users.
316   See section :ref:`tar-unicode` for in-depth information.
317
318   The *pax_headers* argument is an optional dictionary of strings which
319   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
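
   A minimal sketch of writing a pax global header and reading it back
   through the :attr:`pax_headers` attribute. The ``comment`` key, its value
   and the member name are assumptions chosen for illustration.

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                        pax_headers={"comment": "nightly build"}) as tar:
          info = tarfile.TarInfo(name="data.bin")
          info.size = 4
          tar.addfile(info, io.BytesIO(b"\x00\x01\x02\x03"))

      # The global header is picked up while the archive is being read.
      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r:") as tar:
          tar.getmembers()
          global_headers = dict(tar.pax_headers)
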
320
321   .. versionchanged:: 3.2
322      Use ``'surrogateescape'`` as the default for the *errors* argument.
323
324   .. versionchanged:: 3.5
325      The ``'x'`` (exclusive creation) mode was added.
326
327   .. versionchanged:: 3.6
328      The *name* parameter accepts a :term:`path-like object`.
329
330
331.. classmethod:: TarFile.open(...)
332
333   Alternative constructor. The :func:`tarfile.open` function is actually a
334   shortcut to this classmethod.
335
336
337.. method:: TarFile.getmember(name)
338
   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.
341
342   .. note::
343
344      If a member occurs more than once in the archive, its last occurrence is assumed
345      to be the most up-to-date version.
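
   The following sketch stores the same name twice and then looks it up.
   The member name ``config.ini`` and the payloads are illustrative only.

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          for payload in (b"old", b"newer"):
              info = tarfile.TarInfo(name="config.ini")
              info.size = len(payload)
              tar.addfile(info, io.BytesIO(payload))

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r:") as tar:
          names = tar.getnames()                      # both occurrences
          latest = tar.extractfile(tar.getmember("config.ini")).read()
          try:
              tar.getmember("missing.ini")
              found_missing = True
          except KeyError:
              found_missing = False
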
346
347
348.. method:: TarFile.getmembers()
349
350   Return the members of the archive as a list of :class:`TarInfo` objects. The
351   list has the same order as the members in the archive.
352
353
354.. method:: TarFile.getnames()
355
356   Return the members as a list of their names. It has the same order as the list
357   returned by :meth:`getmembers`.
358
359
360.. method:: TarFile.list(verbose=True, *, members=None)
361
362   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
363   only the names of the members are printed. If it is :const:`True`, output
364   similar to that of :program:`ls -l` is produced. If optional *members* is
365   given, it must be a subset of the list returned by :meth:`getmembers`.
366
367   .. versionchanged:: 3.5
368      Added the *members* parameter.
369
370
371.. method:: TarFile.next()
372
   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.
376
377
378.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)
379
380   Extract all members from the archive to the current working directory or
381   directory *path*. If optional *members* is given, it must be a subset of the
382   list returned by :meth:`getmembers`. Directory information like owner,
383   modification time and permissions are set after all members have been extracted.
384   This is done to work around two problems: A directory's modification time is
385   reset each time a file is created in it. And, if a directory's permissions do
386   not allow writing, extracting files to it will fail.
387
388   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
389   are used to set the owner/group for the extracted files. Otherwise, the named
390   values from the tarfile are used.
391
392   .. warning::
393
394      Never extract archives from untrusted sources without prior inspection.
395      It is possible that files are created outside of *path*, e.g. members
396      that have absolute filenames starting with ``"/"`` or filenames with two
397      dots ``".."``.
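
   One possible form of the prior inspection mentioned in the warning is a
   generator passed as *members*. ``safe_members`` is a hypothetical helper,
   not part of the module. It keeps only regular files and directories whose
   resolved path stays inside the destination, and it is a sketch rather
   than a complete security check.

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      def safe_members(tar, dest):
          """Yield only members whose target path stays inside *dest*."""
          root = os.path.realpath(dest)
          for member in tar.getmembers():
              target = os.path.realpath(os.path.join(root, member.name))
              try:
                  inside = os.path.commonpath([root, target]) == root
              except ValueError:      # e.g. paths on different Windows drives
                  inside = False
              # Links and device nodes are skipped entirely in this sketch.
              if inside and (member.isfile() or member.isdir()):
                  yield member

      # Demonstration archive with one harmless and two suspicious names.
      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          for name in ("ok.txt", "../evil.txt", "/abs.txt"):
              tar.addfile(tarfile.TarInfo(name=name))

      buf.seek(0)
      with tempfile.TemporaryDirectory() as dest, tarfile.open(fileobj=buf) as tar:
          kept = [m.name for m in safe_members(tar, dest)]
          tar.extractall(path=dest, members=safe_members(tar, dest))
          extracted = sorted(os.listdir(dest))
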
398
399   .. versionchanged:: 3.5
400      Added the *numeric_owner* parameter.
401
402   .. versionchanged:: 3.6
403      The *path* parameter accepts a :term:`path-like object`.
404
405
406.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)
407
408   Extract a member from the archive to the current working directory, using its
409   full name. Its file information is extracted as accurately as possible. *member*
410   may be a filename or a :class:`TarInfo` object. You can specify a different
411   directory using *path*. *path* may be a :term:`path-like object`.
412   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.
413
414   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
415   are used to set the owner/group for the extracted files. Otherwise, the named
416   values from the tarfile are used.
417
418   .. note::
419
420      The :meth:`extract` method does not take care of several extraction issues.
421      In most cases you should consider using the :meth:`extractall` method.
422
423   .. warning::
424
425      See the warning for :meth:`extractall`.
426
427   .. versionchanged:: 3.2
428      Added the *set_attrs* parameter.
429
430   .. versionchanged:: 3.5
431      Added the *numeric_owner* parameter.
432
433   .. versionchanged:: 3.6
434      The *path* parameter accepts a :term:`path-like object`.
435
436
437.. method:: TarFile.extractfile(member)
438
439   Extract a member from the archive as a file object. *member* may be a filename
440   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
441   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
442   returned.
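
   A brief sketch of both outcomes, using an archive built in memory. The
   directory and file names are assumptions for the example.

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          folder = tarfile.TarInfo(name="docs")
          folder.type = tarfile.DIRTYPE
          tar.addfile(folder)

          text = b"line one\nline two\n"
          info = tarfile.TarInfo(name="docs/readme.txt")
          info.size = len(text)
          tar.addfile(info, io.BytesIO(text))

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r:") as tar:
          dir_obj = tar.extractfile("docs")           # a directory: None
          with tar.extractfile("docs/readme.txt") as reader:
              lines = reader.read().decode("ascii").splitlines()
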
443
444   .. versionchanged:: 3.3
445      Return an :class:`io.BufferedReader` object.
446
447
448.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)
449
450   Add the file *name* to the archive. *name* may be any type of file
451   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
452   alternative name for the file in the archive. Directories are added
453   recursively by default. This can be avoided by setting *recursive* to
454   :const:`False`. Recursion adds entries in sorted order.
455   If *filter* is given, it
456   should be a function that takes a :class:`TarInfo` object argument and
457   returns the changed :class:`TarInfo` object. If it instead returns
458   :const:`None` the :class:`TarInfo` object will be excluded from the
459   archive. See :ref:`tar-examples` for an example.
460
461   .. versionchanged:: 3.2
462      Added the *filter* parameter.
463
464   .. versionchanged:: 3.7
465      Recursion adds entries in sorted order.
466
467
468.. method:: TarFile.addfile(tarinfo, fileobj=None)
469
470   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
471   it should be a :term:`binary file`, and
472   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
473   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.
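
   Because exactly ``tarinfo.size`` bytes are read, the size has to be set
   before the call when a :class:`TarInfo` object is created directly. A
   minimal sketch, in which ``add_bytes`` is a hypothetical helper:

   .. code-block:: python

      import io
      import tarfile
      import time

      def add_bytes(tar, name, data):
          """Store *data* under *name* in the open archive *tar*."""
          info = tarfile.TarInfo(name=name)
          info.size = len(data)          # addfile() reads exactly this many bytes
          info.mtime = int(time.time())
          tar.addfile(info, io.BytesIO(data))
          return info

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          add_bytes(tar, "greeting.txt", b"hello, world\n")

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r:") as tar:
          stored = tar.extractfile("greeting.txt").read()
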
474
475
476.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
477
478   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
479   equivalent on an existing file.  The file is either named by *name*, or
480   specified as a :term:`file object` *fileobj* with a file descriptor.
481   *name* may be a :term:`path-like object`.  If
482   given, *arcname* specifies an alternative name for the file in the
483   archive, otherwise, the name is taken from *fileobj*’s
484   :attr:`~io.FileIO.name` attribute, or the *name* argument.  The name
485   should be a text string.
486
487   You can modify
488   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
489   If the file object is not an ordinary file object positioned at the
490   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
491   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
492   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
493   could be a dummy string.
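
   The typical sequence (stat the file, adjust the metadata, then add it)
   might look like this. The file name, the archive name and the replacement
   owner are all made up for the example.

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      with tempfile.TemporaryDirectory() as tmp:
          path = os.path.join(tmp, "report.txt")      # hypothetical file
          with open(path, "wb") as f:
              f.write(b"quarterly numbers\n")

          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w") as tar:
              info = tar.gettarinfo(path, arcname="reports/q1.txt")
              # Adjust the metadata before the header is written.
              info.mtime = 0
              info.uname = info.gname = "nobody"
              with open(path, "rb") as f:
                  tar.addfile(info, f)

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r:") as tar:
          member = tar.getmember("reports/q1.txt")
          summary = (member.size, member.mtime, member.uname)
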
494
495   .. versionchanged:: 3.6
496      The *name* parameter accepts a :term:`path-like object`.
497
498
499.. method:: TarFile.close()
500
501   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
502   appended to the archive.
503
504
505.. attribute:: TarFile.pax_headers
506
507   A dictionary containing key-value pairs of pax global headers.
508
509
510
511.. _tarinfo-objects:
512
513TarInfo Objects
514---------------
515
516A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
517from storing all required attributes of a file (like file type, size, time,
518permissions, owner etc.), it provides some useful methods to determine its type.
519It does *not* contain the file's data itself.
520
521:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
522:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.
523
524
525.. class:: TarInfo(name="")
526
527   Create a :class:`TarInfo` object.
528
529
530.. classmethod:: TarInfo.frombuf(buf, encoding, errors)
531
532   Create and return a :class:`TarInfo` object from string buffer *buf*.
533
534   Raises :exc:`HeaderError` if the buffer is invalid.
535
536
537.. classmethod:: TarInfo.fromtarfile(tarfile)
538
539   Read the next member from the :class:`TarFile` object *tarfile* and return it as
540   a :class:`TarInfo` object.
541
542
543.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')
544
545   Create a string buffer from a :class:`TarInfo` object. For information on the
546   arguments see the constructor of the :class:`TarFile` class.
547
548   .. versionchanged:: 3.2
549      Use ``'surrogateescape'`` as the default for the *errors* argument.
550
551
552A ``TarInfo`` object has the following public data attributes:
553
554
555.. attribute:: TarInfo.name
556
557   Name of the archive member.
558
559
560.. attribute:: TarInfo.size
561
562   Size in bytes.
563
564
565.. attribute:: TarInfo.mtime
566
567   Time of last modification.
568
569
570.. attribute:: TarInfo.mode
571
572   Permission bits.
573
574
575.. attribute:: TarInfo.type
576
577   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
578   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
579   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
580   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
581   more conveniently, use the ``is*()`` methods below.
582
583
584.. attribute:: TarInfo.linkname
585
   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.
588
589
590.. attribute:: TarInfo.uid
591
592   User ID of the user who originally stored this member.
593
594
595.. attribute:: TarInfo.gid
596
597   Group ID of the user who originally stored this member.
598
599
600.. attribute:: TarInfo.uname
601
602   User name.
603
604
605.. attribute:: TarInfo.gname
606
607   Group name.
608
609
610.. attribute:: TarInfo.pax_headers
611
612   A dictionary containing key-value pairs of an associated pax extended header.
613
614
615A :class:`TarInfo` object also provides some convenient query methods:
616
617
618.. method:: TarInfo.isfile()
619
   Return :const:`True` if the :class:`TarInfo` object is a regular file.
621
622
623.. method:: TarInfo.isreg()
624
625   Same as :meth:`isfile`.
626
627
628.. method:: TarInfo.isdir()
629
630   Return :const:`True` if it is a directory.
631
632
633.. method:: TarInfo.issym()
634
635   Return :const:`True` if it is a symbolic link.
636
637
638.. method:: TarInfo.islnk()
639
640   Return :const:`True` if it is a hard link.
641
642
643.. method:: TarInfo.ischr()
644
645   Return :const:`True` if it is a character device.
646
647
648.. method:: TarInfo.isblk()
649
650   Return :const:`True` if it is a block device.
651
652
653.. method:: TarInfo.isfifo()
654
655   Return :const:`True` if it is a FIFO.
656
657
658.. method:: TarInfo.isdev()
659
   Return :const:`True` if it is a character device, a block device or a FIFO.
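
The query methods can be combined into a small classifier. ``describe`` is a
hypothetical helper, and the :class:`TarInfo` objects are created by hand only
to exercise each branch.

.. code-block:: python

   import tarfile

   def describe(member):
       """Return a short label for the type of a TarInfo object."""
       if member.isfile():
           return "file"
       if member.isdir():
           return "directory"
       if member.issym():
           return "symbolic link"
       if member.islnk():
           return "hard link"
       if member.isdev():
           return "device or FIFO"
       return "other"

   labels = []
   for kind in (tarfile.REGTYPE, tarfile.DIRTYPE, tarfile.SYMTYPE,
                tarfile.LNKTYPE, tarfile.FIFOTYPE):
       info = tarfile.TarInfo(name="example")
       info.type = kind
       labels.append(describe(info))
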
661
662
663.. _tarfile-commandline:
664.. program:: tarfile
665
666Command-Line Interface
667----------------------
668
669.. versionadded:: 3.4
670
671The :mod:`tarfile` module provides a simple command-line interface to interact
672with tar archives.
673
674If you want to create a new tar archive, specify its name after the :option:`-c`
675option and then list the filename(s) that should be included:
676
677.. code-block:: shell-session
678
679    $ python -m tarfile -c monty.tar  spam.txt eggs.txt
680
681Passing a directory is also acceptable:
682
683.. code-block:: shell-session
684
685    $ python -m tarfile -c monty.tar life-of-brian_1979/
686
687If you want to extract a tar archive into the current directory, use
688the :option:`-e` option:
689
690.. code-block:: shell-session
691
692    $ python -m tarfile -e monty.tar
693
694You can also extract a tar archive into a different directory by passing the
695directory's name:
696
697.. code-block:: shell-session
698
699    $ python -m tarfile -e monty.tar  other-dir/
700
701For a list of the files in a tar archive, use the :option:`-l` option:
702
703.. code-block:: shell-session
704
705    $ python -m tarfile -l monty.tar
706
707
708Command-line options
709~~~~~~~~~~~~~~~~~~~~
710
711.. cmdoption:: -l <tarfile>
712               --list <tarfile>
713
714   List files in a tarfile.
715
716.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
717               --create <tarfile> <source1> ... <sourceN>
718
719   Create tarfile from source files.
720
721.. cmdoption:: -e <tarfile> [<output_dir>]
722               --extract <tarfile> [<output_dir>]
723
724   Extract tarfile into the current directory if *output_dir* is not specified.
725
726.. cmdoption:: -t <tarfile>
727               --test <tarfile>
728
729   Test whether the tarfile is valid or not.
730
731.. cmdoption:: -v, --verbose
732
733   Verbose output.
734
735.. _tar-examples:
736
737Examples
738--------
739
How to extract an entire tar archive to the current working directory::

   import tarfile

   # Mode 'r' (the default) detects the compression automatically.
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()
746
How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()
761
How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()
769
The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)
776
How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the description on the same line, separated by a space.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()
790
How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile

    def reset(tarinfo):
        # Store every member as owned by root, whoever owns it on disk.
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo

    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()
802
803
804.. _tar-formats:
805
806Supported tar formats
807---------------------
808
809There are three tar formats that can be created with the :mod:`tarfile` module:
810
811* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
812  up to a length of at best 256 characters and linknames up to 100 characters. The
813  maximum file size is 8 GiB. This is an old and limited but widely
814  supported format.
815
* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.
820
821* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
822  format with virtually no limits. It supports long filenames and linknames, large
823  files and stores pathnames in a portable way. However, not all tar
824  implementations today are able to handle pax archives properly.
825
  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers affect only the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.
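
The filename limits of the three writable formats can be checked with a short
experiment. ``try_format`` and the artificial 363-character name are
assumptions made for this sketch.

.. code-block:: python

   import io
   import tarfile

   # Four 90-character components: too long for the ustar name/prefix fields.
   long_name = "/".join(["d" * 90] * 4)

   def try_format(fmt):
       """Return True if a member named long_name can be stored using fmt."""
       info = tarfile.TarInfo(name=long_name)
       try:
           with tarfile.open(fileobj=io.BytesIO(), mode="w", format=fmt) as tar:
               tar.addfile(info)
           return True
       except ValueError:
           # Raised when the name does not fit into the header.
           return False

   results = {
       "ustar": try_format(tarfile.USTAR_FORMAT),
       "gnu": try_format(tarfile.GNU_FORMAT),
       "pax": try_format(tarfile.PAX_FORMAT),
   }

Only the ustar attempt is expected to fail here, since the GNU and pax formats
store long names in additional headers.
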
831
832There are some more variants of the tar format which can be read, but not
833created:
834
* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.
839
840* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
841  pax format, but is not compatible.
842
843.. _tar-unicode:
844
845Unicode issues
846--------------
847
848The tar format was originally conceived to make backups on tape drives with the
849main focus on preserving file system information. Nowadays tar archives are
850commonly used for file distribution and exchanging archives over networks. One
851problem of the original format (which is the basis of all other formats) is
852that there is no concept of supporting different character encodings. For
853example, an ordinary tar archive created on a *UTF-8* system cannot be read
854correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
855metadata (like filenames, linknames, user/group names) will appear damaged.
856Unfortunately, there is no way to autodetect the encoding of an archive. The
857pax format was designed to solve this problem. It stores non-ASCII metadata
858using the universal character encoding *UTF-8*.
859
860The details of character conversion in :mod:`tarfile` are controlled by the
861*encoding* and *errors* keyword arguments of the :class:`TarFile` class.
862
863*encoding* defines the character encoding to use for the metadata in the
864archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
865as a fallback. Depending on whether the archive is read or written, the
866metadata must be either decoded or encoded. If *encoding* is not set
867appropriately, this conversion may fail.
868
869The *errors* argument defines how characters are treated that cannot be
870converted. Possible values are listed in section :ref:`error-handlers`.
871The default scheme is ``'surrogateescape'`` which Python also uses for its
872file system calls, see :ref:`os-filenames`.
873
874In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
875because all the metadata is stored using *UTF-8*. *encoding* is only used in
876the rare cases when binary pax headers are decoded or when strings with
877surrogate characters are stored.
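
The interplay of *encoding* and the default ``'surrogateescape'`` error
handler can be sketched as follows. The file name and the choice of Latin-1
are assumptions for the example, and ``read_name`` is a hypothetical helper.

.. code-block:: python

   import io
   import tarfile

   name = "caf\u00e9.txt"  # contains one non-ASCII character

   # Store the name using Latin-1 in a GNU format archive.
   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                     encoding="iso8859-1") as tar:
       tar.addfile(tarfile.TarInfo(name=name))

   def read_name(encoding):
       """Return the first member name, decoded with the given encoding."""
       with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r:",
                         encoding=encoding) as tar:
           return tar.getnames()[0]

   matching = read_name("iso8859-1")    # decoded correctly
   mismatched = read_name("utf-8")      # undecodable byte becomes a surrogate

With the wrong encoding the name is not lost: the undecodable byte is kept as
a lone surrogate, so it can be encoded back to the original bytes.
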
878