:mod:`!lzma` --- Compression using the LZMA algorithm
=====================================================

.. module:: lzma
   :synopsis: A Python wrapper for the liblzma compression library.

.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

.. versionadded:: 3.3

**Source code:** :source:`Lib/lzma.py`

--------------

This module provides classes and convenience functions for compressing and
decompressing data using the LZMA compression algorithm. Also included is a file
interface supporting the ``.xz`` and legacy ``.lzma`` file formats used by the
:program:`xz` utility, as well as raw compressed streams.

The interface provided by this module is very similar to that of the :mod:`bz2`
module. Note that :class:`LZMAFile` and :class:`bz2.BZ2File` are *not*
thread-safe, so if you need to use a single :class:`LZMAFile` instance
from multiple threads, it is necessary to protect it with a lock.


.. exception:: LZMAError

   This exception is raised when an error occurs during compression or
   decompression, or while initializing the compressor/decompressor state.


Reading and writing compressed files
------------------------------------

.. function:: open(filename, mode="rb", *, format=None, check=-1, preset=None, filters=None, encoding=None, errors=None, newline=None)

   Open an LZMA-compressed file in binary or text mode, returning a :term:`file
   object`.

   The *filename* argument can be either an actual file name (given as a
   :class:`str`, :class:`bytes` or :term:`path-like <path-like object>` object), in
   which case the named file is opened, or it can be an existing file object
   to read from or write to.

   The *mode* argument can be any of ``"r"``, ``"rb"``, ``"w"``, ``"wb"``,
   ``"x"``, ``"xb"``, ``"a"`` or ``"ab"`` for binary mode, or ``"rt"``,
   ``"wt"``, ``"xt"``, or ``"at"`` for text mode. The default is ``"rb"``.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   For binary mode, this function is equivalent to the :class:`LZMAFile`
   constructor: ``LZMAFile(filename, mode, ...)``. In this case, the *encoding*,
   *errors* and *newline* arguments must not be provided.

   For text mode, a :class:`LZMAFile` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionchanged:: 3.4
      Added support for the ``"x"``, ``"xb"`` and ``"xt"`` modes.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.


.. class:: LZMAFile(filename=None, mode="r", *, format=None, check=-1, preset=None, filters=None)

   Open an LZMA-compressed file in binary mode.

   An :class:`LZMAFile` can wrap an already-open :term:`file object`, or operate
   directly on a named file. The *filename* argument specifies either the file
   object to wrap, or the name of the file to open (as a :class:`str`,
   :class:`bytes` or :term:`path-like <path-like object>` object). When wrapping an
   existing file object, the wrapped file will not be closed when the
   :class:`LZMAFile` is closed.

   The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for
   overwriting, ``"x"`` for exclusive creation, or ``"a"`` for appending. These
   can equivalently be given as ``"rb"``, ``"wb"``, ``"xb"`` and ``"ab"``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``.

   When opening a file for reading, the input file may be the concatenation of
   multiple separate compressed streams. These are transparently decoded as a
   single logical stream.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   :class:`LZMAFile` supports all the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`~io.BufferedIOBase.detach`
   and :meth:`~io.IOBase.truncate`.
   Iteration and the :keyword:`with` statement are supported.

   The following method and attributes are also provided:

   .. method:: peek(size=-1)

      Return buffered data without advancing the file position. At least one
      byte of data will be returned, unless EOF has been reached. The exact
      number of bytes returned is unspecified (the *size* argument is ignored).

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`LZMAFile`, it may change the position of the underlying
         file object (e.g. if the :class:`LZMAFile` was constructed by passing a
         file object for *filename*).

   .. attribute:: mode

      ``'rb'`` for reading and ``'wb'`` for writing.

      .. versionadded:: 3.13

   .. attribute:: name

      The lzma file name. Equivalent to the :attr:`~io.FileIO.name`
      attribute of the underlying :term:`file object`.

      .. versionadded:: 3.13


   .. versionchanged:: 3.4
      Added support for the ``"x"`` and ``"xb"`` modes.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.
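
As a rough illustration of text mode and of multi-stream reading, the sketch
below writes one line in ``"wt"`` mode, appends a second compressed stream in
``"at"`` mode, and then reads both back as a single logical stream. The scratch
path built with :mod:`tempfile` and the variable names are assumptions made for
the example only::

    import lzma
    import os
    import tempfile

    # Hypothetical scratch location; any writable path would do.
    path = os.path.join(tempfile.mkdtemp(), "example.xz")

    # Text mode wraps the LZMAFile in an io.TextIOWrapper.
    with lzma.open(path, "wt", encoding="utf-8") as f:
        f.write("first line\n")

    # Appending adds a second compressed stream after the first one.
    with lzma.open(path, "at", encoding="utf-8") as f:
        f.write("second line\n")

    # On reading, the two streams are decoded as one logical stream.
    with lzma.open(path, "rt", encoding="utf-8") as f:
        lines = f.readlines()

Here ``lines`` should hold both lines in the order they were written, even
though they live in separate compressed streams on disk.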


Compressing and decompressing data in memory
--------------------------------------------

.. class:: LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Create a compressor object, which can be used to compress data incrementally.

   For a more convenient way of compressing a single chunk of data, see
   :func:`compress`.

   The *format* argument specifies what container format should be used.
   Possible values are:

   * :const:`FORMAT_XZ`: The ``.xz`` container format.
     This is the default format.

   * :const:`FORMAT_ALONE`: The legacy ``.lzma`` container format.
     This format is more limited than ``.xz`` -- it does not support integrity
     checks or multiple filters.

   * :const:`FORMAT_RAW`: A raw data stream, not using any container format.
     This format specifier does not support integrity checks, and requires that
     you always specify a custom filter chain (for both compression and
     decompression). Additionally, data compressed in this manner cannot be
     decompressed using :const:`FORMAT_AUTO` (see :class:`LZMADecompressor`).

   The *check* argument specifies the type of integrity check to include in the
   compressed data. This check is used when decompressing, to ensure that the
   data has not been corrupted. Possible values are:

   * :const:`CHECK_NONE`: No integrity check.
     This is the default (and the only acceptable value) for
     :const:`FORMAT_ALONE` and :const:`FORMAT_RAW`.

   * :const:`CHECK_CRC32`: 32-bit Cyclic Redundancy Check.

   * :const:`CHECK_CRC64`: 64-bit Cyclic Redundancy Check.
     This is the default for :const:`FORMAT_XZ`.

   * :const:`CHECK_SHA256`: 256-bit Secure Hash Algorithm.

   If the specified check is not supported, an :class:`LZMAError` is raised.

   The compression settings can be specified either as a preset compression
   level (with the *preset* argument), or in detail as a custom filter chain
   (with the *filters* argument).

   The *preset* argument (if provided) should be an integer between ``0`` and
   ``9`` (inclusive), optionally OR-ed with the constant
   :const:`PRESET_EXTREME`. If neither *preset* nor *filters* are given, the
   default behavior is to use :const:`PRESET_DEFAULT` (preset level ``6``).
   Higher presets produce smaller output, but make the compression process
   slower.

   .. note::

      In addition to being more CPU-intensive, compression with higher presets
      also requires much more memory (and produces output that needs more memory
      to decompress). With preset ``9`` for example, the overhead for an
      :class:`LZMACompressor` object can be as high as 800 MiB. For this reason,
      it is generally best to stick with the default preset.

   The *filters* argument (if provided) should be a filter chain specifier.
   See :ref:`filter-chain-specs` for details.

   .. method:: compress(data)

      Compress *data* (a :class:`bytes` object), returning a :class:`bytes`
      object containing compressed data for at least part of the input. Some of
      *data* may be buffered internally, for use in later calls to
      :meth:`compress` and :meth:`flush`. The returned data should be
      concatenated with the output of any previous calls to :meth:`compress`.

   .. method:: flush()

      Finish the compression process, returning a :class:`bytes` object
      containing any data stored in the compressor's internal buffers.

      The compressor cannot be used after this method has been called.


.. class:: LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None)

   Create a decompressor object, which can be used to decompress data
   incrementally.

   For a more convenient way of decompressing an entire compressed stream at
   once, see :func:`decompress`.

   The *format* argument specifies the container format that should be used. The
   default is :const:`FORMAT_AUTO`, which can decompress both ``.xz`` and
   ``.lzma`` files. Other possible values are :const:`FORMAT_XZ`,
   :const:`FORMAT_ALONE`, and :const:`FORMAT_RAW`.

   The *memlimit* argument specifies a limit (in bytes) on the amount of memory
   that the decompressor can use. When this argument is used, decompression will
   fail with an :class:`LZMAError` if it is not possible to decompress the input
   within the given memory limit.

   The *filters* argument specifies the filter chain that was used to create
   the stream being decompressed. This argument is required if *format* is
   :const:`FORMAT_RAW`, but should not be used for other formats.
   See :ref:`filter-chain-specs` for more information about filter chains.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`LZMAFile`. To
      decompress a multi-stream input with :class:`LZMADecompressor`, you must
      create a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`. Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: check

      The ID of the integrity check used by the input stream. This may be
      :const:`CHECK_UNKNOWN` until enough of the input has been decoded to
      determine what integrity check it uses.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      Before the end of the stream is reached, this will be ``b""``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5

.. function:: compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Compress *data* (a :class:`bytes` object), returning the compressed data as a
   :class:`bytes` object.

   See :class:`LZMACompressor` above for a description of the *format*, *check*,
   *preset* and *filters* arguments.


.. function:: decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None)

   Decompress *data* (a :class:`bytes` object), returning the uncompressed data
   as a :class:`bytes` object.

   If *data* is the concatenation of multiple distinct compressed streams,
   decompress all of these streams, and return the concatenation of the results.

   See :class:`LZMADecompressor` above for a description of the *format*,
   *memlimit* and *filters* arguments.


Miscellaneous
-------------

.. function:: is_check_supported(check)

   Return ``True`` if the given integrity check is supported on this system.

   :const:`CHECK_NONE` and :const:`CHECK_CRC32` are always supported.
   :const:`CHECK_CRC64` and :const:`CHECK_SHA256` may be unavailable if you are
   using a version of :program:`liblzma` that was compiled with a limited
   feature set.


.. _filter-chain-specs:

Specifying custom filter chains
-------------------------------

A filter chain specifier is a sequence of dictionaries, where each dictionary
contains the ID and options for a single filter. Each dictionary must contain
the key ``"id"``, and may contain additional keys to specify filter-dependent
options. Valid filter IDs are as follows:

* Compression filters:

  * :const:`FILTER_LZMA1` (for use with :const:`FORMAT_ALONE`)
  * :const:`FILTER_LZMA2` (for use with :const:`FORMAT_XZ` and :const:`FORMAT_RAW`)

* Delta filter:

  * :const:`FILTER_DELTA`

* Branch-Call-Jump (BCJ) filters:

  * :const:`FILTER_X86`
  * :const:`FILTER_IA64`
  * :const:`FILTER_ARM`
  * :const:`FILTER_ARMTHUMB`
  * :const:`FILTER_POWERPC`
  * :const:`FILTER_SPARC`

A filter chain can consist of up to 4 filters, and cannot be empty. The last
filter in the chain must be a compression filter, and any other filters must be
delta or BCJ filters.

Compression filters support the following options (specified as additional
entries in the dictionary representing the filter):

* ``preset``: A compression preset to use as a source of default values for
  options that are not specified explicitly.
* ``dict_size``: Dictionary size in bytes. This should be between 4 KiB and
  1.5 GiB (inclusive).
* ``lc``: Number of literal context bits.
* ``lp``: Number of literal position bits. The sum ``lc + lp`` must be at
  most 4.
* ``pb``: Number of position bits; must be at most 4.
* ``mode``: :const:`MODE_FAST` or :const:`MODE_NORMAL`.
* ``nice_len``: What should be considered a "nice length" for a match.
  This should be 273 or less.
* ``mf``: What match finder to use -- :const:`MF_HC3`, :const:`MF_HC4`,
  :const:`MF_BT2`, :const:`MF_BT3`, or :const:`MF_BT4`.
* ``depth``: Maximum search depth used by match finder. 0 (default) means to
  select automatically based on other filter options.

The delta filter stores the differences between bytes, producing more repetitive
input for the compressor in certain circumstances. It supports one option,
``dist``. This indicates the distance between bytes to be subtracted. The
default is 1, i.e. take the differences between adjacent bytes.

The BCJ filters are intended to be applied to machine code. They convert
relative branches, calls and jumps in the code to use absolute addressing, with
the aim of increasing the redundancy that can be exploited by the compressor.
These filters support one option, ``start_offset``. This specifies the address
that should be mapped to the beginning of the input data. The default is 0.
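
A minimal sketch of how a filter chain might be used with a raw stream is shown
below. Because :const:`FORMAT_RAW` has no container to record the filters, the
same chain is passed to both :func:`compress` and :func:`decompress`. The
particular chain (a delta filter with ``dist`` of 4 followed by LZMA2 at
preset 6) and the sample data are illustrative choices, not recommendations::

    import lzma

    # Illustrative chain: the last filter must be a compression filter.
    filters = [
        {"id": lzma.FILTER_DELTA, "dist": 4},
        {"id": lzma.FILTER_LZMA2, "preset": 6},
    ]

    # Sample data that steps by a constant amount at a distance of 4 bytes.
    data = bytes(range(256)) * 64

    # A raw stream carries no header, so the chain is needed on both sides.
    packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
    unpacked = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=filters)

Passing a different chain when decompressing would be expected to fail or
produce wrong output, since nothing in the raw stream identifies the filters
that were applied.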


Examples
--------

Reading in a compressed file::

    import lzma
    with lzma.open("file.xz") as f:
        file_content = f.read()

Creating a compressed file::

    import lzma
    data = b"Insert Data Here"
    with lzma.open("file.xz", "w") as f:
        f.write(data)

Compressing data in memory::

    import lzma
    data_in = b"Insert Data Here"
    data_out = lzma.compress(data_in)

Incremental compression::

    import lzma
    lzc = lzma.LZMACompressor()
    out1 = lzc.compress(b"Some data\n")
    out2 = lzc.compress(b"Another piece of data\n")
    out3 = lzc.compress(b"Even more data\n")
    out4 = lzc.flush()
    # Concatenate all the partial results:
    result = b"".join([out1, out2, out3, out4])

Writing compressed data to an already-open file::

    import lzma
    with open("file.xz", "wb") as f:
        f.write(b"This data will not be compressed\n")
        with lzma.open(f, "w") as lzf:
            lzf.write(b"This *will* be compressed\n")
        f.write(b"Not compressed\n")

Creating a compressed file using a custom filter chain::

    import lzma
    my_filters = [
        {"id": lzma.FILTER_DELTA, "dist": 5},
        {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
    ]
    with lzma.open("file.xz", "w", filters=my_filters) as f:
        f.write(b"blah blah blah")
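
Incremental decompression with bounded output could look roughly like the
following sketch. It relies on the *max_length* parameter and the
``needs_input`` and ``eof`` attributes of :class:`LZMADecompressor`; the helper
``iter_decompressed`` and its *feed_size* parameter are hypothetical names
introduced for this example, and only a single compressed stream is handled::

    import lzma

    def iter_decompressed(compressed, max_length=4096, feed_size=64):
        # Hypothetical helper: yield chunks of at most max_length bytes.
        lzd = lzma.LZMADecompressor()
        pos = 0
        while not lzd.eof:
            if lzd.needs_input:
                if pos >= len(compressed):
                    raise EOFError("input ended before the end of the stream")
                block = compressed[pos:pos + feed_size]
                pos += len(block)
            else:
                # Output is still buffered; request it without new input.
                block = b""
            chunk = lzd.decompress(block, max_length)
            if chunk:
                yield chunk

    original = b"Insert Data Here\n" * 10000
    chunks = list(iter_decompressed(lzma.compress(original)))
    restored = b"".join(chunks)

Capping each call with *max_length* keeps memory use predictable when a small
compressed input expands to a very large output.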