:mod:`lzma` --- Compression using the LZMA algorithm
====================================================

.. module:: lzma
   :synopsis: A Python wrapper for the liblzma compression library.

.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

.. versionadded:: 3.3

**Source code:** :source:`Lib/lzma.py`

--------------

This module provides classes and convenience functions for compressing and
decompressing data using the LZMA compression algorithm. Also included is a file
interface supporting the ``.xz`` and legacy ``.lzma`` file formats used by the
:program:`xz` utility, as well as raw compressed streams.

The interface provided by this module is very similar to that of the :mod:`bz2`
module. However, note that :class:`LZMAFile` is *not* thread-safe, unlike
:class:`bz2.BZ2File`, so if you need to use a single :class:`LZMAFile` instance
from multiple threads, it is necessary to protect it with a lock.


.. exception:: LZMAError

   This exception is raised when an error occurs during compression or
   decompression, or while initializing the compressor/decompressor state.


Reading and writing compressed files
------------------------------------

.. function:: open(filename, mode="rb", \*, format=None, check=-1, preset=None, filters=None, encoding=None, errors=None, newline=None)

   Open an LZMA-compressed file in binary or text mode, returning a :term:`file
   object`.

   The *filename* argument can be either an actual file name (given as a
   :class:`str`, :class:`bytes` or :term:`path-like <path-like object>` object), in
   which case the named file is opened, or it can be an existing file object
   to read from or write to.

   The *mode* argument can be any of ``"r"``, ``"rb"``, ``"w"``, ``"wb"``,
   ``"x"``, ``"xb"``, ``"a"`` or ``"ab"`` for binary mode, or ``"rt"``,
   ``"wt"``, ``"xt"``, or ``"at"`` for text mode.
   The default is ``"rb"``.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   For binary mode, this function is equivalent to the :class:`LZMAFile`
   constructor: ``LZMAFile(filename, mode, ...)``. In this case, the *encoding*,
   *errors* and *newline* arguments must not be provided.

   For text mode, a :class:`LZMAFile` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionchanged:: 3.4
      Added support for the ``"x"``, ``"xb"`` and ``"xt"`` modes.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.


.. class:: LZMAFile(filename=None, mode="r", \*, format=None, check=-1, preset=None, filters=None)

   Open an LZMA-compressed file in binary mode.

   An :class:`LZMAFile` can wrap an already-open :term:`file object`, or operate
   directly on a named file. The *filename* argument specifies either the file
   object to wrap, or the name of the file to open (as a :class:`str`,
   :class:`bytes` or :term:`path-like <path-like object>` object). When wrapping an
   existing file object, the wrapped file will not be closed when the
   :class:`LZMAFile` is closed.

   The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for
   overwriting, ``"x"`` for exclusive creation, or ``"a"`` for appending. These
   can equivalently be given as ``"rb"``, ``"wb"``, ``"xb"`` and ``"ab"``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``.

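
   A minimal sketch of the file-object case described above, using an in-memory
   :class:`io.BytesIO` buffer as a stand-in for a real file. The names
   ``header``, ``payload`` and ``buf`` are illustrative only and are not part of
   the module::

      import io
      import lzma

      header = b"MAGIC"            # hypothetical uncompressed prefix
      payload = b"payload bytes"

      buf = io.BytesIO()
      buf.write(header)            # the position is now just past the header

      # Mode "w" on a file object does not truncate it: compressed output
      # starts at the current position, so the header is preserved.
      with lzma.LZMAFile(buf, "w") as lzf:
          lzf.write(payload)

      # The wrapped object is still open after the LZMAFile has been closed.
      raw = buf.getvalue()
      restored = lzma.decompress(raw[len(header):])

   Here ``raw`` begins with the untouched header, and ``restored`` holds the
   original payload.
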

   When opening a file for reading, the input file may be the concatenation of
   multiple separate compressed streams. These are transparently decoded as a
   single logical stream.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   :class:`LZMAFile` supports all the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
   Iteration and the :keyword:`with` statement are supported.

   The following method is also provided:

   .. method:: peek(size=-1)

      Return buffered data without advancing the file position. At least one
      byte of data will be returned, unless EOF has been reached. The exact
      number of bytes returned is unspecified (the *size* argument is ignored).

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`LZMAFile`, it may change the position of the underlying
         file object (e.g. if the :class:`LZMAFile` was constructed by passing a
         file object for *filename*).

   .. versionchanged:: 3.4
      Added support for the ``"x"`` and ``"xb"`` modes.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.


Compressing and decompressing data in memory
--------------------------------------------

.. class:: LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Create a compressor object, which can be used to compress data incrementally.

   For a more convenient way of compressing a single chunk of data, see
   :func:`compress`.

   The *format* argument specifies what container format should be used.
   Possible values are:

   * :const:`FORMAT_XZ`: The ``.xz`` container format.
     This is the default format.

   * :const:`FORMAT_ALONE`: The legacy ``.lzma`` container format.
     This format is more limited than ``.xz`` -- it does not support integrity
     checks or multiple filters.

   * :const:`FORMAT_RAW`: A raw data stream, not using any container format.
     This format specifier does not support integrity checks, and requires that
     you always specify a custom filter chain (for both compression and
     decompression). Additionally, data compressed in this manner cannot be
     decompressed using :const:`FORMAT_AUTO` (see :class:`LZMADecompressor`).

   The *check* argument specifies the type of integrity check to include in the
   compressed data. This check is used when decompressing, to ensure that the
   data has not been corrupted. Possible values are:

   * :const:`CHECK_NONE`: No integrity check.
     This is the default (and the only acceptable value) for
     :const:`FORMAT_ALONE` and :const:`FORMAT_RAW`.

   * :const:`CHECK_CRC32`: 32-bit Cyclic Redundancy Check.

   * :const:`CHECK_CRC64`: 64-bit Cyclic Redundancy Check.
     This is the default for :const:`FORMAT_XZ`.

   * :const:`CHECK_SHA256`: 256-bit Secure Hash Algorithm.

   If the specified check is not supported, an :class:`LZMAError` is raised.

   The compression settings can be specified either as a preset compression
   level (with the *preset* argument), or in detail as a custom filter chain
   (with the *filters* argument).

   The *preset* argument (if provided) should be an integer between ``0`` and
   ``9`` (inclusive), optionally OR-ed with the constant
   :const:`PRESET_EXTREME`. If neither *preset* nor *filters* are given, the
   default behavior is to use :const:`PRESET_DEFAULT` (preset level ``6``).
   Higher presets produce smaller output, but make the compression process
   slower.

   .. note::

      In addition to being more CPU-intensive, compression with higher presets
      also requires much more memory (and produces output that needs more memory
      to decompress). With preset ``9`` for example, the overhead for an
      :class:`LZMACompressor` object can be as high as 800 MiB. For this reason,
      it is generally best to stick with the default preset.

   The *filters* argument (if provided) should be a filter chain specifier.
   See :ref:`filter-chain-specs` for details.

   .. method:: compress(data)

      Compress *data* (a :class:`bytes` object), returning a :class:`bytes`
      object containing compressed data for at least part of the input. Some of
      *data* may be buffered internally, for use in later calls to
      :meth:`compress` and :meth:`flush`. The returned data should be
      concatenated with the output of any previous calls to :meth:`compress`.

   .. method:: flush()

      Finish the compression process, returning a :class:`bytes` object
      containing any data stored in the compressor's internal buffers.

      The compressor cannot be used after this method has been called.


.. class:: LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None)

   Create a decompressor object, which can be used to decompress data
   incrementally.

   For a more convenient way of decompressing an entire compressed stream at
   once, see :func:`decompress`.

   The *format* argument specifies the container format that should be used. The
   default is :const:`FORMAT_AUTO`, which can decompress both ``.xz`` and
   ``.lzma`` files. Other possible values are :const:`FORMAT_XZ`,
   :const:`FORMAT_ALONE`, and :const:`FORMAT_RAW`.

   The *memlimit* argument specifies a limit (in bytes) on the amount of memory
   that the decompressor can use.
   When this argument is used, decompression will
   fail with an :class:`LZMAError` if it is not possible to decompress the input
   within the given memory limit.

   The *filters* argument specifies the filter chain that was used to create
   the stream being decompressed. This argument is required if *format* is
   :const:`FORMAT_RAW`, but should not be used for other formats.
   See :ref:`filter-chain-specs` for more information about filter chains.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`LZMAFile`. To
      decompress a multi-stream input with :class:`LZMADecompressor`, you must
      create a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`. Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: check

      The ID of the integrity check used by the input stream. This may be
      :const:`CHECK_UNKNOWN` until enough of the input has been decoded to
      determine what integrity check it uses.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      Before the end of the stream is reached, this will be ``b""``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5

.. function:: compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Compress *data* (a :class:`bytes` object), returning the compressed data as a
   :class:`bytes` object.

   See :class:`LZMACompressor` above for a description of the *format*, *check*,
   *preset* and *filters* arguments.


.. function:: decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None)

   Decompress *data* (a :class:`bytes` object), returning the uncompressed data
   as a :class:`bytes` object.

   If *data* is the concatenation of multiple distinct compressed streams,
   decompress all of these streams, and return the concatenation of the results.

   See :class:`LZMADecompressor` above for a description of the *format*,
   *memlimit* and *filters* arguments.


Miscellaneous
-------------

.. function:: is_check_supported(check)

   Return ``True`` if the given integrity check is supported on this system.

   :const:`CHECK_NONE` and :const:`CHECK_CRC32` are always supported.
   :const:`CHECK_CRC64` and :const:`CHECK_SHA256` may be unavailable if you are
   using a version of :program:`liblzma` that was compiled with a limited
   feature set.


.. _filter-chain-specs:

Specifying custom filter chains
-------------------------------

A filter chain specifier is a sequence of dictionaries, where each dictionary
contains the ID and options for a single filter. Each dictionary must contain
the key ``"id"``, and may contain additional keys to specify filter-dependent
options. Valid filter IDs are as follows:

* Compression filters:

  * :const:`FILTER_LZMA1` (for use with :const:`FORMAT_ALONE`)
  * :const:`FILTER_LZMA2` (for use with :const:`FORMAT_XZ` and :const:`FORMAT_RAW`)

* Delta filter:

  * :const:`FILTER_DELTA`

* Branch-Call-Jump (BCJ) filters:

  * :const:`FILTER_X86`
  * :const:`FILTER_IA64`
  * :const:`FILTER_ARM`
  * :const:`FILTER_ARMTHUMB`
  * :const:`FILTER_POWERPC`
  * :const:`FILTER_SPARC`

A filter chain can consist of up to 4 filters, and cannot be empty. The last
filter in the chain must be a compression filter, and any other filters must be
delta or BCJ filters.

Compression filters support the following options (specified as additional
entries in the dictionary representing the filter):

   * ``preset``: A compression preset to use as a source of default values for
     options that are not specified explicitly.
   * ``dict_size``: Dictionary size in bytes. This should be between 4 KiB and
     1.5 GiB (inclusive).
   * ``lc``: Number of literal context bits.
   * ``lp``: Number of literal position bits. The sum ``lc + lp`` must be at
     most 4.
   * ``pb``: Number of position bits; must be at most 4.
   * ``mode``: :const:`MODE_FAST` or :const:`MODE_NORMAL`.
   * ``nice_len``: What should be considered a "nice length" for a match.
     This should be 273 or less.
   * ``mf``: What match finder to use -- :const:`MF_HC3`, :const:`MF_HC4`,
     :const:`MF_BT2`, :const:`MF_BT3`, or :const:`MF_BT4`.
   * ``depth``: Maximum search depth used by match finder.
     0 (default) means to
     select automatically based on other filter options.

The delta filter stores the differences between bytes, producing more repetitive
input for the compressor in certain circumstances. It supports one option,
``dist``. This indicates the distance between bytes to be subtracted. The
default is 1, i.e. take the differences between adjacent bytes.

The BCJ filters are intended to be applied to machine code. They convert
relative branches, calls and jumps in the code to use absolute addressing, with
the aim of increasing the redundancy that can be exploited by the compressor.
These filters support one option, ``start_offset``. This specifies the address
that should be mapped to the beginning of the input data. The default is 0.


Examples
--------

Reading in a compressed file::

   import lzma
   with lzma.open("file.xz") as f:
       file_content = f.read()

Creating a compressed file::

   import lzma
   data = b"Insert Data Here"
   with lzma.open("file.xz", "w") as f:
       f.write(data)

Compressing data in memory::

   import lzma
   data_in = b"Insert Data Here"
   data_out = lzma.compress(data_in)

Incremental compression::

   import lzma
   lzc = lzma.LZMACompressor()
   out1 = lzc.compress(b"Some data\n")
   out2 = lzc.compress(b"Another piece of data\n")
   out3 = lzc.compress(b"Even more data\n")
   out4 = lzc.flush()
   # Concatenate all the partial results:
   result = b"".join([out1, out2, out3, out4])

Writing compressed data to an already-open file::

   import lzma
   with open("file.xz", "wb") as f:
       f.write(b"This data will not be compressed\n")
       with lzma.open(f, "w") as lzf:
           lzf.write(b"This *will* be compressed\n")
       f.write(b"Not compressed\n")

Creating a compressed file using a custom filter chain::

   import lzma
   my_filters = [
       {"id": lzma.FILTER_DELTA, "dist": 5},
       {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
   ]
   with lzma.open("file.xz", "w", filters=my_filters) as f:
       f.write(b"blah blah blah")
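
Decompressing a multi-stream input with :class:`LZMADecompressor`, creating a
new decompressor for each stream. This is only a sketch: the helper name
``decompress_all_streams`` is hypothetical and not part of the module, and it
assumes the input consists solely of complete streams::

   import lzma

   def decompress_all_streams(data):
       """Decompress the concatenation of one or more compressed streams."""
       results = []
       while data:
           # A decompressor handles a single stream, so start a new one
           # for each stream found in the input.
           decomp = lzma.LZMADecompressor()
           results.append(decomp.decompress(data))
           if not decomp.eof:
               raise lzma.LZMAError("input ended before the end-of-stream marker")
           # Whatever follows the finished stream is the next stream, if any.
           data = decomp.unused_data
       return b"".join(results)

   two_streams = lzma.compress(b"first ") + lzma.compress(b"second")
   combined = decompress_all_streams(two_streams)

For well-formed input this should give the same result as calling
:func:`decompress` on the whole buffer.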