:mod:`!lzma` --- Compression using the LZMA algorithm
=====================================================

.. module:: lzma
   :synopsis: A Python wrapper for the liblzma compression library.

.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

.. versionadded:: 3.3

**Source code:** :source:`Lib/lzma.py`

--------------

This module provides classes and convenience functions for compressing and
decompressing data using the LZMA compression algorithm. Also included is a file
interface supporting the ``.xz`` and legacy ``.lzma`` file formats used by the
:program:`xz` utility, as well as raw compressed streams.

The interface provided by this module is very similar to that of the :mod:`bz2`
module. Note that :class:`LZMAFile` and :class:`bz2.BZ2File` are *not*
thread-safe, so if you need to use a single :class:`LZMAFile` instance
from multiple threads, it is necessary to protect it with a lock.

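As a minimal sketch of the locking advice above, the following shares one
:class:`LZMAFile` between several writer threads and guards every call with a
:class:`threading.Lock`. The helper ``write_record`` and the in-memory
``buffer`` are illustrative names, not part of the module.

```python
import io
import lzma
import threading

buffer = io.BytesIO()      # stands in for a real file on disk
lock = threading.Lock()    # guards every call on the shared LZMAFile


def write_record(lzf, record):
    """Write one record while holding the lock (hypothetical helper)."""
    with lock:
        lzf.write(record)


with lzma.LZMAFile(buffer, "w") as lzf:
    threads = [
        threading.Thread(target=write_record, args=(lzf, b"record %d\n" % i))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Thread scheduling decides the write order, so sort before comparing.
lines = sorted(lzma.decompress(buffer.getvalue()).splitlines())
```

The lock only serializes individual calls; if several writes must stay
together, hold the lock around the whole group.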

.. exception:: LZMAError

   This exception is raised when an error occurs during compression or
   decompression, or while initializing the compressor/decompressor state.

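A short sketch of handling :exc:`LZMAError`, assuming the caller prefers a
``None`` result over an exception for input that is not valid compressed data.
The helper name ``try_decompress`` is hypothetical.

```python
import lzma


def try_decompress(blob):
    """Return the decompressed bytes, or None if blob cannot be decoded."""
    try:
        return lzma.decompress(blob)
    except lzma.LZMAError:
        return None


good = lzma.compress(b"payload")
ok_result = try_decompress(good)
bad_result = try_decompress(b"definitely not an xz stream")
```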

Reading and writing compressed files
------------------------------------

.. function:: open(filename, mode="rb", *, format=None, check=-1, preset=None, filters=None, encoding=None, errors=None, newline=None)

   Open an LZMA-compressed file in binary or text mode, returning a :term:`file
   object`.

   The *filename* argument can be either an actual file name (given as a
   :class:`str`, :class:`bytes` or :term:`path-like <path-like object>` object), in
   which case the named file is opened, or it can be an existing file object
   to read from or write to.

   The *mode* argument can be any of ``"r"``, ``"rb"``, ``"w"``, ``"wb"``,
   ``"x"``, ``"xb"``, ``"a"`` or ``"ab"`` for binary mode, or ``"rt"``,
   ``"wt"``, ``"xt"``, or ``"at"`` for text mode. The default is ``"rb"``.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   For binary mode, this function is equivalent to the :class:`LZMAFile`
   constructor: ``LZMAFile(filename, mode, ...)``. In this case, the *encoding*,
   *errors* and *newline* arguments must not be provided.

   For text mode, a :class:`LZMAFile` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionchanged:: 3.4
      Added support for the ``"x"``, ``"xb"`` and ``"xt"`` modes.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

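The text modes can be sketched as follows. The file name ``notes.txt.xz`` and
the temporary directory are assumptions made only so that the example cleans up
after itself.

```python
import lzma
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt.xz")  # hypothetical file name

    # "wt" wraps the LZMAFile in an io.TextIOWrapper, so str is written.
    with lzma.open(path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # "rt" decodes back to str; iterating yields one line at a time.
    with lzma.open(path, "rt", encoding="utf-8") as f:
        text_lines = [line.rstrip("\n") for line in f]
```

In binary mode the same calls would take and return :class:`bytes`, and the
*encoding* argument would have to be left out.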

.. class:: LZMAFile(filename=None, mode="r", *, format=None, check=-1, preset=None, filters=None)

   Open an LZMA-compressed file in binary mode.

   An :class:`LZMAFile` can wrap an already-open :term:`file object`, or operate
   directly on a named file. The *filename* argument specifies either the file
   object to wrap, or the name of the file to open (as a :class:`str`,
   :class:`bytes` or :term:`path-like <path-like object>` object). When wrapping an
   existing file object, the wrapped file will not be closed when the
   :class:`LZMAFile` is closed.

   The *mode* argument can be ``"r"`` for reading (default), ``"w"`` for
   overwriting, ``"x"`` for exclusive creation, or ``"a"`` for appending. These
   can equivalently be given as ``"rb"``, ``"wb"``, ``"xb"`` and ``"ab"``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``.

   When opening a file for reading, the input file may be the concatenation of
   multiple separate compressed streams. These are transparently decoded as a
   single logical stream.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   :class:`LZMAFile` supports all the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`~io.BufferedIOBase.detach`
   and :meth:`~io.IOBase.truncate`.
   Iteration and the :keyword:`with` statement are supported.

   The following method and attributes are also provided:

   .. method:: peek(size=-1)

      Return buffered data without advancing the file position. At least one
      byte of data will be returned, unless EOF has been reached. The exact
      number of bytes returned is unspecified (the *size* argument is ignored).

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`LZMAFile`, it may change the position of the underlying
         file object (e.g. if the :class:`LZMAFile` was constructed by passing a
         file object for *filename*).

   .. attribute:: mode

      ``'rb'`` for reading and ``'wb'`` for writing.

      .. versionadded:: 3.13

   .. attribute:: name

      The lzma file name.  Equivalent to the :attr:`~io.FileIO.name`
      attribute of the underlying :term:`file object`.

      .. versionadded:: 3.13


   .. versionchanged:: 3.4
      Added support for the ``"x"`` and ``"xb"`` modes.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

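The following sketch illustrates three points from the description of
:class:`LZMAFile`: wrapping an existing file object, the append-like behavior
of ``"w"`` in that case, and transparent reading of concatenated streams. An
:class:`io.BytesIO` named ``raw`` stands in for a real file.

```python
import io
import lzma

raw = io.BytesIO()  # stands in for an already-open binary file

with lzma.LZMAFile(raw, "w") as f:
    f.write(b"stream one\n")

# With a file object, "w" does not truncate: a second stream is appended.
with lzma.LZMAFile(raw, "w") as f:
    f.write(b"stream two\n")

raw.seek(0)
with lzma.LZMAFile(raw) as f:
    head = f.peek()      # buffered data; the LZMAFile position is unchanged
    contents = f.read()  # both streams, decoded as one logical stream

still_open = not raw.closed  # the wrapped object is left open
```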

Compressing and decompressing data in memory
--------------------------------------------

.. class:: LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Create a compressor object, which can be used to compress data incrementally.

   For a more convenient way of compressing a single chunk of data, see
   :func:`compress`.

   The *format* argument specifies what container format should be used.
   Possible values are:

   * :const:`FORMAT_XZ`: The ``.xz`` container format.
      This is the default format.

   * :const:`FORMAT_ALONE`: The legacy ``.lzma`` container format.
      This format is more limited than ``.xz`` -- it does not support integrity
      checks or multiple filters.

   * :const:`FORMAT_RAW`: A raw data stream, not using any container format.
      This format specifier does not support integrity checks, and requires that
      you always specify a custom filter chain (for both compression and
      decompression). Additionally, data compressed in this manner cannot be
      decompressed using :const:`FORMAT_AUTO` (see :class:`LZMADecompressor`).

   The *check* argument specifies the type of integrity check to include in the
   compressed data. This check is used when decompressing, to ensure that the
   data has not been corrupted. Possible values are:

   * :const:`CHECK_NONE`: No integrity check.
     This is the default (and the only acceptable value) for
     :const:`FORMAT_ALONE` and :const:`FORMAT_RAW`.

   * :const:`CHECK_CRC32`: 32-bit Cyclic Redundancy Check.

   * :const:`CHECK_CRC64`: 64-bit Cyclic Redundancy Check.
     This is the default for :const:`FORMAT_XZ`.

   * :const:`CHECK_SHA256`: 256-bit Secure Hash Algorithm.

   If the specified check is not supported, an :class:`LZMAError` is raised.

   The compression settings can be specified either as a preset compression
   level (with the *preset* argument), or in detail as a custom filter chain
   (with the *filters* argument).

   The *preset* argument (if provided) should be an integer between ``0`` and
   ``9`` (inclusive), optionally OR-ed with the constant
   :const:`PRESET_EXTREME`. If neither *preset* nor *filters* are given, the
   default behavior is to use :const:`PRESET_DEFAULT` (preset level ``6``).
   Higher presets produce smaller output, but make the compression process
   slower.

   .. note::

      In addition to being more CPU-intensive, compression with higher presets
      also requires much more memory (and produces output that needs more memory
      to decompress). With preset ``9`` for example, the overhead for an
      :class:`LZMACompressor` object can be as high as 800 MiB. For this reason,
      it is generally best to stick with the default preset.

   The *filters* argument (if provided) should be a filter chain specifier.
   See :ref:`filter-chain-specs` for details.

   .. method:: compress(data)

      Compress *data* (a :class:`bytes` object), returning a :class:`bytes`
      object containing compressed data for at least part of the input. Some of
      *data* may be buffered internally, for use in later calls to
      :meth:`compress` and :meth:`flush`. The returned data should be
      concatenated with the output of any previous calls to :meth:`compress`.

   .. method:: flush()

      Finish the compression process, returning a :class:`bytes` object
      containing any data stored in the compressor's internal buffers.

      The compressor cannot be used after this method has been called.

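A minimal sketch of passing explicit *check* and *preset* values to
:class:`LZMACompressor`. The preset ``3 | PRESET_EXTREME`` and the sample
chunks are arbitrary choices for illustration; :const:`CHECK_CRC32` is used
because it is always supported.

```python
import lzma

compressor = lzma.LZMACompressor(
    format=lzma.FORMAT_XZ,
    check=lzma.CHECK_CRC32,
    preset=3 | lzma.PRESET_EXTREME,  # level 3 OR-ed with the extreme flag
)

chunks = [b"alpha " * 100, b"beta " * 100]
parts = [compressor.compress(chunk) for chunk in chunks]
parts.append(compressor.flush())  # the compressor is unusable after this
blob = b"".join(parts)

# Decompress again and read back which integrity check the stream carries.
decompressor = lzma.LZMADecompressor()
restored = decompressor.decompress(blob)
stream_check = decompressor.check
```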

.. class:: LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None)

   Create a decompressor object, which can be used to decompress data
   incrementally.

   For a more convenient way of decompressing an entire compressed stream at
   once, see :func:`decompress`.

   The *format* argument specifies the container format that should be used. The
   default is :const:`FORMAT_AUTO`, which can decompress both ``.xz`` and
   ``.lzma`` files. Other possible values are :const:`FORMAT_XZ`,
   :const:`FORMAT_ALONE`, and :const:`FORMAT_RAW`.

   The *memlimit* argument specifies a limit (in bytes) on the amount of memory
   that the decompressor can use. When this argument is used, decompression will
   fail with an :class:`LZMAError` if it is not possible to decompress the input
   within the given memory limit.

   The *filters* argument specifies the filter chain that was used to create
   the stream being decompressed. This argument is required if *format* is
   :const:`FORMAT_RAW`, but should not be used for other formats.
   See :ref:`filter-chain-specs` for more information about filter chains.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`LZMAFile`. To
      decompress a multi-stream input with :class:`LZMADecompressor`, you must
      create a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`.  Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: check

      The ID of the integrity check used by the input stream. This may be
      :const:`CHECK_UNKNOWN` until enough of the input has been decoded to
      determine what integrity check it uses.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      Before the end of the stream is reached, this will be ``b""``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5
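
The interaction between *max_length*, :attr:`needs_input` and :attr:`eof` can
be sketched as a loop that never holds more than a fixed amount of output at
once. The 1024-byte limit and the names ``pending`` and ``pieces`` are
assumptions chosen for the example.

```python
import lzma

payload = lzma.compress(b"x" * 10_000)

decomp = lzma.LZMADecompressor()
pieces = []
pending = payload  # compressed input not yet handed to the decompressor
while not decomp.eof:
    piece = decomp.decompress(pending, max_length=1024)
    pieces.append(piece)
    pending = b""  # input is buffered internally; drain it with b""
    if decomp.needs_input and not decomp.eof:
        # All input was consumed without reaching the end-of-stream marker.
        raise EOFError("compressed stream ended early")

output = b"".join(pieces)
```

When the compressed data arrives in chunks, the same loop would fetch the next
chunk whenever :attr:`needs_input` is ``True`` instead of raising.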

.. function:: compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Compress *data* (a :class:`bytes` object), returning the compressed data as a
   :class:`bytes` object.

   See :class:`LZMACompressor` above for a description of the *format*, *check*,
   *preset* and *filters* arguments.

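As a brief illustration of the *format* argument, the sketch below compresses
the same data into the ``.xz`` and legacy ``.lzma`` containers and relies on
the default :const:`FORMAT_AUTO` to decode both. The sample text is arbitrary.

```python
import lzma

text = b"The quick brown fox jumps over the lazy dog. " * 20

xz_blob = lzma.compress(text)                               # FORMAT_XZ default
alone_blob = lzma.compress(text, format=lzma.FORMAT_ALONE)  # legacy .lzma

# FORMAT_AUTO (the decompress() default) recognizes both containers.
from_xz = lzma.decompress(xz_blob)
from_alone = lzma.decompress(alone_blob)

# Every .xz stream starts with the same six-byte magic number.
xz_magic = xz_blob[:6]
```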

.. function:: decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None)

   Decompress *data* (a :class:`bytes` object), returning the uncompressed data
   as a :class:`bytes` object.

   If *data* is the concatenation of multiple distinct compressed streams,
   decompress all of these streams, and return the concatenation of the results.

   See :class:`LZMADecompressor` above for a description of the *format*,
   *memlimit* and *filters* arguments.

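The difference in multi-stream handling between :func:`decompress` and
:class:`LZMADecompressor` can be sketched as follows. The two short payloads
are placeholders.

```python
import lzma

first = lzma.compress(b"one ")
second = lzma.compress(b"two")
combined = first + second

# The function decodes every stream and joins the results.
joined = lzma.decompress(combined)

# A decompressor object stops at the end of the first stream and keeps
# the remaining bytes in unused_data.
d1 = lzma.LZMADecompressor()
part1 = d1.decompress(combined)
leftover = d1.unused_data

# A fresh decompressor is needed for the next stream.
d2 = lzma.LZMADecompressor()
part2 = d2.decompress(leftover)
```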

Miscellaneous
-------------

.. function:: is_check_supported(check)

   Return ``True`` if the given integrity check is supported on this system.

   :const:`CHECK_NONE` and :const:`CHECK_CRC32` are always supported.
   :const:`CHECK_CRC64` and :const:`CHECK_SHA256` may be unavailable if you are
   using a version of :program:`liblzma` that was compiled with a limited
   feature set.

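One possible use of :func:`is_check_supported` is to pick the strongest check
that the local :program:`liblzma` provides. The preference order in
``candidates`` is an assumption of this sketch, not a recommendation from the
module.

```python
import lzma

# Ordered from most to least preferred; CHECK_CRC32 is always supported,
# so next() is guaranteed to find a match.
candidates = [lzma.CHECK_SHA256, lzma.CHECK_CRC64, lzma.CHECK_CRC32]
preferred = next(c for c in candidates if lzma.is_check_supported(c))

blob = lzma.compress(b"integrity matters", check=preferred)

decompressor = lzma.LZMADecompressor()
restored = decompressor.decompress(blob)
```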

.. _filter-chain-specs:

Specifying custom filter chains
-------------------------------

A filter chain specifier is a sequence of dictionaries, where each dictionary
contains the ID and options for a single filter. Each dictionary must contain
the key ``"id"``, and may contain additional keys to specify filter-dependent
options. Valid filter IDs are as follows:

* Compression filters:

  * :const:`FILTER_LZMA1` (for use with :const:`FORMAT_ALONE`)
  * :const:`FILTER_LZMA2` (for use with :const:`FORMAT_XZ` and :const:`FORMAT_RAW`)

* Delta filter:

  * :const:`FILTER_DELTA`

* Branch-Call-Jump (BCJ) filters:

  * :const:`FILTER_X86`
  * :const:`FILTER_IA64`
  * :const:`FILTER_ARM`
  * :const:`FILTER_ARMTHUMB`
  * :const:`FILTER_POWERPC`
  * :const:`FILTER_SPARC`

A filter chain can consist of up to 4 filters, and cannot be empty. The last
filter in the chain must be a compression filter, and any other filters must be
delta or BCJ filters.
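
A minimal sketch of a two-filter chain used with :const:`FORMAT_RAW`, where the
same chain has to be supplied for compression and for decompression. The
``dist`` value and the sample data are arbitrary.

```python
import lzma

# Delta first, compression filter last, as the chain rules require.
raw_filters = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

samples = bytes(range(256)) * 16

raw_blob = lzma.compress(samples, format=lzma.FORMAT_RAW, filters=raw_filters)

# A raw stream stores no header, so the chain must be passed again.
roundtrip = lzma.decompress(
    raw_blob, format=lzma.FORMAT_RAW, filters=raw_filters
)
```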

Compression filters support the following options (specified as additional
entries in the dictionary representing the filter):

* ``preset``: A compression preset to use as a source of default values for
  options that are not specified explicitly.
* ``dict_size``: Dictionary size in bytes. This should be between 4 KiB and
  1.5 GiB (inclusive).
* ``lc``: Number of literal context bits.
* ``lp``: Number of literal position bits. The sum ``lc + lp`` must be at
  most 4.
* ``pb``: Number of position bits; must be at most 4.
* ``mode``: :const:`MODE_FAST` or :const:`MODE_NORMAL`.
* ``nice_len``: What should be considered a "nice length" for a match.
  This should be 273 or less.
* ``mf``: What match finder to use -- :const:`MF_HC3`, :const:`MF_HC4`,
  :const:`MF_BT2`, :const:`MF_BT3`, or :const:`MF_BT4`.
* ``depth``: Maximum search depth used by match finder. 0 (default) means to
  select automatically based on other filter options.
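
The options above might be combined as in the following sketch. The specific
values (a 1 MiB dictionary, ``nice_len`` of 64, and so on) are illustrative
assumptions that stay within the documented limits, not tuned settings.

```python
import lzma

lzma2_options = {
    "id": lzma.FILTER_LZMA2,
    "preset": 6,             # source of defaults for anything not set below
    "dict_size": 1 << 20,    # 1 MiB, within the 4 KiB to 1.5 GiB range
    "lc": 3,                 # lc + lp must be at most 4
    "lp": 0,
    "pb": 2,                 # at most 4
    "mode": lzma.MODE_NORMAL,
    "nice_len": 64,          # 273 or less
    "mf": lzma.MF_BT4,
    "depth": 0,              # 0 lets the library choose
}

document = b"header,value\n" + b"row,12345\n" * 500
tuned_blob = lzma.compress(
    document, format=lzma.FORMAT_XZ, filters=[lzma2_options]
)

# The .xz container records the filter chain, so no filters are needed here.
tuned_roundtrip = lzma.decompress(tuned_blob)
```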

The delta filter stores the differences between bytes, producing more repetitive
input for the compressor in certain circumstances. It supports one option,
``dist``. This indicates the distance between bytes to be subtracted. The
default is 1, i.e. take the differences between adjacent bytes.
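
The idea behind the delta filter can be modeled in pure Python. This is a
conceptual sketch only: ``delta_encode`` and ``delta_decode`` are hypothetical
helpers written for this illustration, and the filter inside
:program:`liblzma` is implemented separately in C.

```python
def delta_encode(data, dist=1):
    """Replace each byte with its difference from the byte dist positions back."""
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        previous = data[i - dist] if i >= dist else 0
        out[i] = (byte - previous) & 0xFF  # differences wrap modulo 256
    return bytes(out)


def delta_decode(data, dist=1):
    """Invert delta_encode by adding back the already-decoded earlier byte."""
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        previous = out[i - dist] if i >= dist else 0
        out[i] = (byte + previous) & 0xFF
    return bytes(out)


ramp = bytes(range(10, 20))   # steadily increasing values
encoded = delta_encode(ramp)  # first byte, then a run of identical differences
```

A steadily rising sequence turns into a run of identical bytes, which is the
kind of repetition a compression filter can exploit.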

The BCJ filters are intended to be applied to machine code. They convert
relative branches, calls and jumps in the code to use absolute addressing, with
the aim of increasing the redundancy that can be exploited by the compressor.
These filters support one option, ``start_offset``. This specifies the address
that should be mapped to the beginning of the input data. The default is 0.

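A sketch of placing a BCJ filter in front of the compression filter. The byte
pattern in ``code_like`` merely imitates x86 ``call`` instructions and is an
assumption of the example; real input would be an executable or object file.

```python
import lzma

bcj_filters = [
    {"id": lzma.FILTER_X86, "start_offset": 0},  # 0 is the documented default
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

# 0xE8 is the x86 opcode for a relative call; 0x90 is a no-op.
code_like = bytes([0xE8, 0x10, 0x00, 0x00, 0x00, 0x90]) * 200

packed = lzma.compress(code_like, format=lzma.FORMAT_XZ, filters=bcj_filters)

# The filter is reversed automatically when the .xz stream is decoded.
unpacked = lzma.decompress(packed)
```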

Examples
--------

Reading in a compressed file::

   import lzma
   with lzma.open("file.xz") as f:
       file_content = f.read()

Creating a compressed file::

   import lzma
   data = b"Insert Data Here"
   with lzma.open("file.xz", "w") as f:
       f.write(data)

Compressing data in memory::

   import lzma
   data_in = b"Insert Data Here"
   data_out = lzma.compress(data_in)

Incremental compression::

   import lzma
   lzc = lzma.LZMACompressor()
   out1 = lzc.compress(b"Some data\n")
   out2 = lzc.compress(b"Another piece of data\n")
   out3 = lzc.compress(b"Even more data\n")
   out4 = lzc.flush()
   # Concatenate all the partial results:
   result = b"".join([out1, out2, out3, out4])

Writing compressed data to an already-open file::

   import lzma
   with open("file.xz", "wb") as f:
       f.write(b"This data will not be compressed\n")
       with lzma.open(f, "w") as lzf:
           lzf.write(b"This *will* be compressed\n")
       f.write(b"Not compressed\n")

Creating a compressed file using a custom filter chain::

   import lzma
   my_filters = [
       {"id": lzma.FILTER_DELTA, "dist": 5},
       {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
   ]
   with lzma.open("file.xz", "w", filters=my_filters) as f:
       f.write(b"blah blah blah")