:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this.  If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* (default ``9``) to
   specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks.  No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`.  The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

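The two mode syntaxes accepted by :func:`tarfile.open` can be compared with a
minimal sketch that keeps the archive in an in-memory buffer. The member name
``hello.txt`` and its payload are invented for illustration, and ``io.BytesIO``
merely stands in for a real file, socket or pipe:

```python
import io
import tarfile

# Build a small gzip-compressed archive in memory ('w:gz' = write + gzip).
payload = b"hello tar\n"
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w:gz")
info = tarfile.TarInfo(name="hello.txt")   # hypothetical member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# 'r:*' detects the compression transparently and allows random access.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r:*")
names = tar.getnames()
tar.close()

# 'r|*' reads the same bytes as a forward-only stream of blocks.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r|*")
stream_names = [member.name for member in tar]
tar.close()
```

Both reads see the same member. The stream variant is the one to reach for when
the underlying object cannot seek.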

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

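As a rough illustration of :func:`is_tarfile`, the sketch below checks one real
archive and one plain text file. The file names and the temporary directory are
assumptions made for the example only:

```python
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "notes.txt")        # hypothetical file
archive_path = os.path.join(workdir, "sample.tar")    # hypothetical archive

with open(text_path, "w") as handle:
    handle.write("this is not a tar archive\n")

# Store the text file in a small uncompressed archive.
tar = tarfile.open(archive_path, "w")
tar.add(text_path, arcname="notes.txt")
tar.close()

archive_ok = tarfile.is_tarfile(archive_path)   # a readable tar archive
text_ok = tarfile.is_tarfile(text_path)         # ordinary text, not an archive

shutil.rmtree(workdir)   # remove the scratch directory again
```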

.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been removed in Python 3.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.

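A short sketch of how :exc:`ReadError` can surface in practice: an uncompressed
archive is opened with the explicit ``'r:gz'`` mode, which fails, and then with
plain ``'r'``, which detects the format by itself. The in-memory buffer is an
assumption of the example, not something the module requires:

```python
import io
import tarfile

# Create an empty, uncompressed archive in memory.
buf = io.BytesIO()
tarfile.open(fileobj=buf, mode="w").close()

# Forcing gzip on data that is not gzip-compressed raises ReadError.
buf.seek(0)
try:
    tarfile.open(fileobj=buf, mode="r:gz")
except tarfile.ReadError:
    outcome = "ReadError"
else:
    outcome = "opened"

# Mode 'r' tries the available readers and picks the one that fits.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
member_count = len(tar.getmembers())
tar.close()
```

Since :exc:`ReadError` derives from :exc:`TarError`, catching the base class
would handle this case as well.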

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6

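The effect of *ignore_zeros* is easiest to see on two archives that have simply
been concatenated. The following is a minimal sketch; the helper
``make_archive`` and the member names are hypothetical and exist only for this
example:

```python
import io
import tarfile

def make_archive(member_name):
    """Hypothetical helper: return the bytes of a one-member archive."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w")
    tar.addfile(tarfile.TarInfo(name=member_name))   # empty regular file
    tar.close()
    return buf.getvalue()

# Two complete archives glued together, e.g. with `cat a.tar b.tar`.
combined = make_archive("a.txt") + make_archive("b.txt")

# By default, reading stops at the zero blocks that end the first archive.
tar = tarfile.open(fileobj=io.BytesIO(combined), mode="r")
default_names = tar.getnames()
tar.close()

# With ignore_zeros=True the zero blocks are skipped and reading continues.
tar = tarfile.open(fileobj=io.BytesIO(combined), mode="r", ignore_zeros=True)
all_names = tar.getnames()
tar.close()
```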

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

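The lookup rules above, including the "last occurrence wins" behaviour of
:meth:`getmember`, can be sketched as follows. ``add_bytes`` is a hypothetical
helper and the member names are made up:

```python
import io
import tarfile

def add_bytes(tar, name, data):
    """Hypothetical helper: store *data* under *name* in an open archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
add_bytes(tar, "config.ini", b"version=1\n")
add_bytes(tar, "readme.txt", b"hello\n")
add_bytes(tar, "config.ini", b"version=22\n")   # same name stored a second time
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
names = tar.getnames()                          # archive order, duplicates kept
latest_size = tar.getmember("config.ini").size  # refers to the last occurrence
try:
    tar.getmember("missing.txt")
    missing_raises = False
except KeyError:
    missing_raises = True
tar.close()
```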

.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5

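One way to act on the warning above is to pass :meth:`extractall` a filtered
*members* iterable. The sketch below is an illustration only, not a complete
defence: ``safe_members`` is a hypothetical helper, the destination path is made
up, and cases such as device files are not considered:

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield members whose resolved path stays inside *dest* (hypothetical helper).

    Links are skipped as well, because their targets may point elsewhere.
    """
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = target == root or target.startswith(root + os.sep)
        if inside and not (member.issym() or member.islnk()):
            yield member

# Build a test archive with one harmless and two suspicious member names.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name in ["ok.txt", "../evil.txt", "/abs.txt"]:
    tar.addfile(tarfile.TarInfo(name=name))   # empty regular files
tar.close()

dest = os.path.join(tempfile.gettempdir(), "unpack-demo")   # not created here
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
kept = [member.name for member in safe_members(tar, dest)]
tar.close()
```

With such a helper, the extraction call would become
``tar.extractall(path=dest, members=safe_members(tar, dest))``.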

.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only.  It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

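A minimal sketch of :meth:`extractfile`, reading a member's contents without
writing anything to disk. The archive is built in memory and the member names
``notes.txt`` and ``logs`` are assumptions of the example:

```python
import io
import tarfile

payload = b"first line\nsecond line\n"

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
info = tarfile.TarInfo(name="notes.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
directory = tarfile.TarInfo(name="logs")
directory.type = tarfile.DIRTYPE          # a directory entry carries no data
tar.addfile(directory)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
member_file = tar.extractfile("notes.txt")   # read-only file-like object
lines = member_file.readlines()
member_file.close()
directory_file = tar.extractfile("logs")     # neither a file nor a link
tar.close()
```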

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.  For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

   .. note::
      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid confusion about the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file.  The file is either named by *name*, or
   specified as a file object *fileobj* with a file descriptor.  If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~file.name` attribute, or the *name* argument.

   You can modify some of the :class:`TarInfo`’s attributes before you add it
   using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

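The interplay of :meth:`gettarinfo` and :meth:`addfile` might look like the
sketch below: the metadata is taken from a real file, adjusted, and then stored
together with the file's data. The paths, the archive name ``docs/report.txt``
and the owner name ``nobody`` are illustrative assumptions:

```python
import io
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "report.txt")   # hypothetical source file
with open(source_path, "wb") as handle:
    handle.write(b"quarterly numbers\n")

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")

# Take size, mode, owner etc. from os.stat(), then adjust before storing.
info = tar.gettarinfo(source_path, arcname="docs/report.txt")
info.mtime = 0
info.uname = info.gname = "nobody"
with open(source_path, "rb") as handle:   # binary mode, see the note above
    tar.addfile(info, handle)
tar.close()

# Read the member back to confirm what was stored.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
stored = tar.getmember("docs/report.txt")
stored_size, stored_mtime, stored_uname = stored.size, stored.mtime, stored.uname
tar.close()

shutil.rmtree(workdir)   # remove the scratch directory again
```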

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
       if tarinfo.isreg():
           print "a regular file."
       elif tarinfo.isdir():
           print "a directory."
       else:
           print "something else."
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

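The filename limits of the three writable formats can be probed with a small
experiment. This is a sketch under stated assumptions: ``round_trip`` is a
hypothetical helper, the long member name is artificial, and the archives are
kept in memory:

```python
import io
import tarfile

# An artificial member name that is well over 256 characters long.
long_name = "/".join(["directory"] * 30) + "/file.txt"

def round_trip(fmt):
    """Write one empty member named long_name using *fmt*, then read it back."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w", format=fmt)
    tar.addfile(tarfile.TarInfo(name=long_name))
    tar.close()
    buf.seek(0)
    tar = tarfile.open(fileobj=buf, mode="r")
    names = tar.getnames()
    tar.close()
    return names

# ustar cannot represent the name and refuses to write the header.
try:
    round_trip(tarfile.USTAR_FORMAT)
    ustar_result = "stored"
except ValueError:
    ustar_result = "name too long"

# GNU (longname extension) and pax (extended header) both preserve it.
gnu_names = round_trip(tarfile.GNU_FORMAT)
pax_names = round_trip(tarfile.PAX_FORMAT)
```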

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged.  Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.