:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise an :exc:`FileExistsError` exception   |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise an :exc:`FileExistsError` exception   |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise an :exc:`FileExistsError` exception   |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise an :exc:`FileExistsError` exception   |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

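   As a sketch of how these mode strings behave, the following writes a
   gzip-compressed archive into an in-memory :class:`io.BytesIO` buffer (passed
   as *fileobj*) with a non-default *compresslevel*, then reads it back with
   plain ``'r'``, which detects the compression by itself. The helper
   ``make_gz_archive`` and the member name ``hello.txt`` are made up for the
   example.

   .. code-block:: python

      import io
      import tarfile

      def make_gz_archive(files, compresslevel=6):
          """Return gzip-compressed tar bytes holding *files* (a name -> bytes dict)."""
          buf = io.BytesIO()
          # 'w:gz' accepts compresslevel (default 9); a lower level trades size for speed.
          with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=compresslevel) as tar:
              for name, data in files.items():
                  info = tarfile.TarInfo(name)
                  info.size = len(data)
                  tar.addfile(info, io.BytesIO(data))
          # Closing the TarFile finishes the gzip stream but leaves buf open.
          return buf.getvalue()

      payload = make_gz_archive({"hello.txt": b"hello world\n"})

      # 'r' (the same as 'r:*') detects the gzip compression transparently.
      with tarfile.open(fileobj=io.BytesIO(payload), mode="r") as tar:
          names = tar.getnames()
      print(names)  # ['hello.txt']

      # Naming the wrong compression method makes opening fail with ReadError.
      try:
          tarfile.open(fileobj=io.BytesIO(payload), mode="r:bz2")
      except tarfile.ReadError as exc:
          print("ReadError:", exc)
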
   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with, for example, ``sys.stdin``, a socket :term:`file object`
   or a tape device. However, such a :class:`TarFile` object is limited in that
   it does not allow random access; see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

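   A minimal sketch of the stream modes: ``'w|gz'`` writes blocks strictly in
   sequence and ``'r|*'`` reads them back in the same order. An
   :class:`io.BytesIO` buffer stands in for a pipe or socket here, and the
   member names are made up. Because no seeking backwards is possible, each
   member's data is read while iterating, before moving on to the next member.

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      # 'w|gz' writes a gzip-compressed stream of tar blocks without seeking.
      with tarfile.open(fileobj=buf, mode="w|gz") as tar:
          for name, data in [("a.txt", b"first\n"), ("b.txt", b"second\n")]:
              info = tarfile.TarInfo(name)
              info.size = len(data)
              tar.addfile(info, io.BytesIO(data))

      buf.seek(0)
      contents = {}
      # 'r|*' reads the stream sequentially and detects the compression method.
      with tarfile.open(fileobj=buf, mode="r|*") as tar:
          for member in tar:
              # Read each member as it comes by; seeking backwards would raise StreamError.
              contents[member.name] = tar.extractfile(member).read()

      print(contents)
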
   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

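   For illustration, a small check on two temporary files whose names are
   arbitrary: a freshly written archive is recognised, while a plain text file
   is not. *name* is passed as a filesystem path here.

   .. code-block:: python

      import os
      import tarfile
      import tempfile

      with tempfile.TemporaryDirectory() as tmp:
          text_file = os.path.join(tmp, "notes.txt")
          archive = os.path.join(tmp, "notes.tar")
          with open(text_file, "w") as f:
              f.write("not an archive\n")
          with tarfile.open(archive, "w") as tar:
              tar.add(text_file, arcname="notes.txt")

          archive_ok = tarfile.is_tarfile(archive)   # True
          text_ok = tarfile.is_tarfile(text_file)    # False: no valid tar header

      print(archive_ok, text_ok)
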

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

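All of the exceptions above derive from :exc:`TarError`, so catching that one
class covers every :mod:`tarfile`-specific failure. A quick sketch that checks
the hierarchy and triggers :exc:`HeaderError` by handing
:meth:`TarInfo.frombuf` a 512-byte block of arbitrary garbage (the exact
subclass raised is an implementation detail):

.. code-block:: python

   import tarfile

   # Every tarfile-specific exception is a subclass of TarError.
   subclasses = (tarfile.ReadError, tarfile.CompressionError, tarfile.StreamError,
                 tarfile.ExtractError, tarfile.HeaderError)
   assert all(issubclass(exc, tarfile.TarError) for exc in subclasses)

   # 512 bytes of b"x" cannot be parsed as a tar header block.
   try:
       tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
       caught = None
   except tarfile.HeaderError as exc:
       caught = exc

   print(type(caught).__name__, caught)
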

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   They still appear as error messages in the debug output when debugging is
   enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

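   To illustrate *ignore_zeros*: concatenating two tar files (as
   ``cat a.tar b.tar`` would) leaves the first archive's end-of-archive zero
   blocks in the middle of the data. A minimal sketch, using a hypothetical
   helper ``tar_bytes`` that builds a one-member archive in memory:

   .. code-block:: python

      import io
      import tarfile

      def tar_bytes(name, data):
          """Return an uncompressed tar archive (as bytes) holding one member."""
          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w") as tar:
              info = tarfile.TarInfo(name)
              info.size = len(data)
              tar.addfile(info, io.BytesIO(data))
          return buf.getvalue()

      combined = tar_bytes("a.txt", b"A") + tar_bytes("b.txt", b"B")

      # By default reading stops at the first run of zero blocks.
      with tarfile.open(fileobj=io.BytesIO(combined)) as tar:
          first_only = tar.getnames()

      # ignore_zeros=True skips the zero blocks and keeps looking for members.
      with tarfile.open(fileobj=io.BytesIO(combined), ignore_zeros=True) as tar:
          both = tar.getnames()

      print(first_only)  # ['a.txt']
      print(both)        # ['a.txt', 'b.txt']
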
.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.

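   A sketch of that rule, using an in-memory archive that stores a made-up
   ``config.txt`` twice: :meth:`getnames` lists both copies, :meth:`getmember`
   returns the later one, and an unknown name raises :exc:`KeyError`.

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          # Store config.txt twice; the second copy is the newer version.
          for data in (b"old", b"new"):
              info = tarfile.TarInfo("config.txt")
              info.size = len(data)
              tar.addfile(info, io.BytesIO(data))

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          names = tar.getnames()  # ['config.txt', 'config.txt']
          latest = tar.extractfile(tar.getmember("config.txt")).read()  # b'new'
          try:
              tar.getmember("missing.txt")
              missing = False
          except KeyError:
              missing = True
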

.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set only after all members have been
   extracted. This works around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files into it would fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      Files may be created outside of *path*, for example by members that
      have absolute filenames starting with ``"/"`` or filenames containing two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

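   One way to act on this warning is to pass *members* a generator that drops
   entries whose destination would fall outside *path*. The sketch below is not
   a complete defence: the helper ``safe_members`` is hypothetical, and it only
   checks member names, not the targets of links. The archive contents are
   invented for the example.

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      skipped = []

      def safe_members(tar, dest):
          """Yield only members whose resolved path stays inside *dest*.

          Hypothetical helper: it inspects member names only, not link targets.
          """
          dest = os.path.realpath(dest)
          for member in tar.getmembers():
              target = os.path.realpath(os.path.join(dest, member.name))
              if os.path.commonpath([dest, target]) == dest:
                  yield member
              else:
                  skipped.append(member.name)

      # Build a test archive with one harmless and one path-traversal member.
      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          for name in ("good.txt", "../evil.txt"):
              info = tarfile.TarInfo(name)
              info.size = 2
              tar.addfile(info, io.BytesIO(b"hi"))

      buf.seek(0)
      with tempfile.TemporaryDirectory() as tmp:
          dest = os.path.join(tmp, "out")
          with tarfile.open(fileobj=buf, mode="r") as tar:
              tar.extractall(path=dest, members=safe_members(tar, dest))
          extracted = sorted(os.listdir(dest))

      print(extracted, skipped)  # ['good.txt'] ['../evil.txt']
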

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues,
      such as setting directory attributes only after all members have been
      extracted. In most cases you should consider using the :meth:`extractall`
      method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

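   A small sketch of reading member data straight into memory, with made-up
   member names: a regular file yields a readable file object, while a
   directory member yields :const:`None`.

   .. code-block:: python

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          folder = tarfile.TarInfo("docs")
          folder.type = tarfile.DIRTYPE
          tar.addfile(folder)

          data = b"line one\nline two\n"
          info = tarfile.TarInfo("docs/readme.txt")
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          dir_result = tar.extractfile("docs")  # None: directories carry no data
          with tar.extractfile("docs/readme.txt") as f:
              lines = f.read().splitlines()     # [b'line one', b'line two']
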

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified, it must be passed as a keyword
   argument. It should be a function that takes a :class:`TarInfo` object
   argument and returns the changed :class:`TarInfo` object. If it instead
   returns :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated; use the *filter* parameter
      instead.

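   Since *exclude* is deprecated, the same effect can be had with a *filter*
   that returns :const:`None`. A sketch assuming a throwaway directory ``pkg``
   containing ``mod.py`` and ``mod.pyc``, both created just for the example; the
   callback name ``skip_pyc`` is hypothetical.

   .. code-block:: python

      import os
      import tarfile
      import tempfile

      def skip_pyc(tarinfo):
          """Filter callback: drop compiled files, pass everything else through."""
          if tarinfo.name.endswith(".pyc"):
              return None  # returning None excludes this member
          return tarinfo

      with tempfile.TemporaryDirectory() as tmp:
          src = os.path.join(tmp, "pkg")
          os.mkdir(src)
          for filename in ("mod.py", "mod.pyc"):
              with open(os.path.join(src, filename), "w") as f:
                  f.write("# placeholder\n")

          archive = os.path.join(tmp, "pkg.tar")
          with tarfile.open(archive, "w") as tar:
              # Directories are added recursively; the filter sees every entry.
              tar.add(src, arcname="pkg", filter=skip_pyc)

          with tarfile.open(archive) as tar:
              names = sorted(tar.getnames())

      print(names)  # ['pkg', 'pkg/mod.py']
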

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and ``tarinfo.size`` bytes are read from it
   and added to the archive. You can create :class:`TarInfo` objects directly, or
   by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor. If
   given, *arcname* specifies an alternative name for the file in the
   archive; otherwise, the name is taken from *fileobj*'s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify some of the :class:`TarInfo`'s attributes before you add it
   using :meth:`addfile`. If the file object is not an ordinary file object
   positioned at the beginning of the file, attributes such as
   :attr:`~TarInfo.size` may need modifying. This is the case for objects such
   as :class:`~gzip.GzipFile`. The :attr:`~TarInfo.name` may also be modified,
   in which case *arcname* could be a dummy string.

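   A sketch of the :meth:`gettarinfo` / :meth:`addfile` pairing: build the
   header from a real file, override a few attributes, then copy the data. The
   file name, the archive name ``reports/q1.txt`` and the ownership values are
   arbitrary choices for the example.

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      with tempfile.TemporaryDirectory() as tmp:
          path = os.path.join(tmp, "report.txt")
          with open(path, "wb") as f:
              f.write(b"quarterly numbers\n")

          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w") as tar:
              # Build the header from os.stat() of the file, under a new archive name.
              info = tar.gettarinfo(path, arcname="reports/q1.txt")
              # Adjust some attributes before the header is written.
              info.uid = info.gid = 0
              info.uname = info.gname = "root"
              info.mode = 0o644
              # addfile() copies info.size bytes from the open file.
              with open(path, "rb") as f:
                  tar.addfile(info, f)

      buf.seek(0)
      with tarfile.open(fileobj=buf) as tar:
          member = tar.getmember("reports/q1.txt")
          summary = (member.size, member.uname, oct(member.mode))

      print(summary)  # (18, 'root', '0o644')
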

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from the bytes buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a bytes buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

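   A minimal round trip between :meth:`tobuf` and :meth:`frombuf`, using
   arbitrary metadata values. With a short name and :const:`USTAR_FORMAT` the
   header fits in a single 512-byte block.

   .. code-block:: python

      import tarfile

      info = tarfile.TarInfo("hello.txt")
      info.size = 5
      info.mtime = 1000000000  # seconds since the epoch
      info.mode = 0o600

      header = info.tobuf(tarfile.USTAR_FORMAT, "utf-8", "surrogateescape")
      copy = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")

      print(len(header))                                       # 512
      print(copy.name, copy.size, copy.mtime, oct(copy.mode))  # hello.txt 5 1000000000 0o600
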

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification, in seconds since the epoch.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

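The query methods above make it easy to branch on a member's type without
comparing :attr:`~TarInfo.type` against the constants directly. A sketch with a
hypothetical ``describe`` helper and an in-memory archive whose members are
invented for the example:

.. code-block:: python

   import io
   import tarfile

   def describe(info):
       """Return a short label for *info* using the TarInfo.is*() methods."""
       if info.isreg():
           return "file"
       if info.isdir():
           return "directory"
       if info.issym():
           return "symlink -> " + info.linkname
       if info.islnk():
           return "hard link -> " + info.linkname
       if info.isdev():
           return "device or FIFO"
       return "other"

   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w") as tar:
       entries = [("data", tarfile.DIRTYPE, ""),
                  ("data/a.txt", tarfile.REGTYPE, ""),
                  ("latest", tarfile.SYMTYPE, "data/a.txt"),
                  ("pipe", tarfile.FIFOTYPE, "")]
       for name, kind, target in entries:
           info = tarfile.TarInfo(name)
           info.type = kind
           info.linkname = target
           tar.addfile(info)  # no data: every entry has size 0

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tar:
       labels = [(member.name, describe(member)) for member in tar]

   for name, label in labels:
       print(name, label)
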

.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps a space before the type description printed below.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()

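How to append files to an archive that may not exist yet, using mode ``'a'``.
This sketch works in a temporary directory with made-up file names so that it
is self-contained:

.. code-block:: python

   import os
   import tarfile
   import tempfile

   with tempfile.TemporaryDirectory() as tmp:
       archive = os.path.join(tmp, "log.tar")
       for i in range(3):
           entry = os.path.join(tmp, "entry%d.txt" % i)
           with open(entry, "w") as f:
               f.write("entry %d\n" % i)
           # 'a' creates log.tar on the first pass and appends afterwards.
           with tarfile.open(archive, "a") as tar:
               tar.add(entry, arcname=os.path.basename(entry))

       with tarfile.open(archive) as tar:
           names = tar.getnames()

   print(names)  # ['entry0.txt', 'entry1.txt', 'entry2.txt']
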

.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

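To see the name limits in practice, the sketch below stores a member whose
made-up 300-character name contains no ``/``, so a ustar header has no way to
split it into prefix and name fields. The GNU and pax formats keep the full
name, while writing it as :const:`USTAR_FORMAT` raises :exc:`ValueError` in
current CPython. The helper ``stored_names`` is hypothetical.

.. code-block:: python

   import io
   import tarfile

   long_name = "x" * 300  # no "/" in it, so ustar cannot split it into prefix/name

   def stored_names(fmt):
       """Write one empty member called long_name in format fmt; return the names read back."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
           tar.addfile(tarfile.TarInfo(long_name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, mode="r") as tar:
           return tar.getnames()

   gnu_ok = stored_names(tarfile.GNU_FORMAT) == [long_name]
   pax_ok = stored_names(tarfile.PAX_FORMAT) == [long_name]

   try:
       stored_names(tarfile.USTAR_FORMAT)
       ustar_error = None
   except ValueError as exc:
       ustar_error = exc  # the name does not fit in a ustar header

   print(gnu_ok, pax_ok, ustar_error)
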
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

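The effect of *encoding* can be sketched with a non-ASCII file name
(``café.txt`` is just an example, and the helper ``roundtrip`` is
hypothetical). A GNU format archive stores the name as raw bytes in the
writer's encoding, so a reader that assumes a different encoding sees a
garbled name. A pax archive records the name in *UTF-8* and reads back
correctly regardless of the reader's *encoding*:

.. code-block:: python

   import io
   import tarfile

   name = "café.txt"

   def roundtrip(fmt, write_encoding, read_encoding):
       """Write a single empty member called *name*, then return the name read back."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=fmt,
                         encoding=write_encoding) as tar:
           tar.addfile(tarfile.TarInfo(name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, encoding=read_encoding) as tar:
           return tar.getnames()[0]

   # GNU format: UTF-8 bytes decoded as Latin-1 come out garbled.
   gnu_name = roundtrip(tarfile.GNU_FORMAT, "utf-8", "latin-1")
   # pax format: the non-ASCII name lives in a UTF-8 pax header.
   pax_name = roundtrip(tarfile.PAX_FORMAT, "utf-8", "latin-1")

   print(gnu_name)  # cafÃ©.txt
   print(pax_name)  # café.txt
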