:mod:`bz2` --- Support for :program:`bzip2` compression
=======================================================

.. module:: bz2
   :synopsis: Interfaces for bzip2 compression and decompression.

.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

**Source code:** :source:`Lib/bz2.py`

--------------

This module provides a comprehensive interface for compressing and
decompressing data using the bzip2 compression algorithm.

The :mod:`bz2` module contains:

* The :func:`.open` function and :class:`BZ2File` class for reading and
  writing compressed files.
* The :class:`BZ2Compressor` and :class:`BZ2Decompressor` classes for
  incremental (de)compression.
* The :func:`compress` and :func:`decompress` functions for one-shot
  (de)compression.

All of the classes in this module may safely be accessed from multiple threads.


(De)compression of files
------------------------

.. function:: open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)

   Open a bzip2-compressed file in binary or text mode, returning a :term:`file
   object`.

   As with the constructor for :class:`BZ2File`, the *filename* argument can be
   an actual filename (a :class:`str` or :class:`bytes` object), or an existing
   file object to read from or write to.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'w'``, ``'wb'``,
   ``'x'``, ``'xb'``, ``'a'`` or ``'ab'`` for binary mode, or ``'rt'``,
   ``'wt'``, ``'xt'``, or ``'at'`` for text mode. The default is ``'rb'``.

   The *compresslevel* argument is an integer from 1 to 9, as for the
   :class:`BZ2File` constructor.

   For binary mode, this function is equivalent to the :class:`BZ2File`
   constructor: ``BZ2File(filename, mode, compresslevel=compresslevel)``. In
   this case, the *encoding*, *errors* and *newline* arguments must not be
   provided.

   For text mode, a :class:`BZ2File` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.


.. class:: BZ2File(filename, mode='r', *, compresslevel=9)

   Open a bzip2-compressed file in binary mode.

   If *filename* is a :class:`str` or :class:`bytes` object, open the named file
   directly. Otherwise, *filename* should be a :term:`file object`, which will
   be used to read or write the compressed data.

   The *mode* argument can be either ``'r'`` for reading (default), ``'w'`` for
   overwriting, ``'x'`` for exclusive creation, or ``'a'`` for appending. These
   can equivalently be given as ``'rb'``, ``'wb'``, ``'xb'`` and ``'ab'``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``.

   If *mode* is ``'w'`` or ``'a'``, *compresslevel* can be an integer between
   ``1`` and ``9`` specifying the level of compression: ``1`` produces the
   least compression, and ``9`` (default) produces the most compression.

   If *mode* is ``'r'``, the input file may be the concatenation of multiple
   compressed streams.

   :class:`BZ2File` provides all of the members specified by the
   :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
   Iteration and the :keyword:`with` statement are supported.

   :class:`BZ2File` also provides the following method:

   .. method:: peek([n])

      Return buffered data without advancing the file position. At least one
      byte of data will be returned (unless at EOF). The exact number of bytes
      returned is unspecified.

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`BZ2File`, it may change the position of the underlying file
         object (e.g. if the :class:`BZ2File` was constructed by passing a file
         object for *filename*).

      .. versionadded:: 3.3


   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.

   .. versionchanged:: 3.3
      The :meth:`fileno`, :meth:`readable`, :meth:`seekable`, :meth:`writable`,
      :meth:`read1` and :meth:`readinto` methods were added.

   .. versionchanged:: 3.3
      Support was added for *filename* being a :term:`file object` instead of an
      actual filename.

   .. versionchanged:: 3.3
      The ``'a'`` (append) mode was added, along with support for reading
      multi-stream files.

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

   .. versionchanged:: 3.9
      The *buffering* parameter has been removed. It was ignored and deprecated
      since Python 3.0. Pass an open file object to control how the file is
      opened.

      The *compresslevel* parameter became keyword-only.


Incremental (de)compression
---------------------------

.. class:: BZ2Compressor(compresslevel=9)

   Create a new compressor object. This object may be used to compress data
   incrementally. For one-shot compression, use the :func:`compress` function
   instead.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   .. method:: compress(data)

      Provide data to the compressor object. Returns a chunk of compressed data
      if possible, or an empty byte string otherwise.

      When you have finished providing data to the compressor, call the
      :meth:`flush` method to finish the compression process.


   .. method:: flush()

      Finish the compression process. Returns the compressed data left in
      internal buffers.

      The compressor object may not be used after this method has been called.


.. class:: BZ2Decompressor()

   Create a new decompressor object. This object may be used to decompress data
   incrementally. For one-shot decompression, use the :func:`decompress` function
   instead.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`BZ2File`. If
      you need to decompress a multi-stream input with :class:`BZ2Decompressor`,
      you must use a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`. Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

      .. versionadded:: 3.3


   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      If this attribute is accessed before the end of the stream has been
      reached, its value will be ``b''``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5


One-shot (de)compression
------------------------

.. function:: compress(data, compresslevel=9)

   Compress *data*, a :term:`bytes-like object <bytes-like object>`.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   For incremental compression, use a :class:`BZ2Compressor` instead.


.. function:: decompress(data)

   Decompress *data*, a :term:`bytes-like object <bytes-like object>`.

   If *data* is the concatenation of multiple compressed streams, decompress
   all of the streams.

   For incremental decompression, use a :class:`BZ2Decompressor` instead.

   .. versionchanged:: 3.3
      Support for multi-stream inputs was added.

.. _bz2-usage-examples:

Examples of usage
-----------------

Below are some examples of typical usage of the :mod:`bz2` module.

Using :func:`compress` and :func:`decompress` to demonstrate round-trip compression:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> c = bz2.compress(data)
    >>> len(data) / len(c)  # Data compression ratio
    1.513595166163142
    >>> d = bz2.decompress(c)
    >>> data == d  # Check equality to original object after round-trip
    True

Using :class:`BZ2Compressor` for incremental compression:

    >>> import bz2
    >>> def gen_data(chunks=10, chunksize=1000):
    ...     """Yield incremental blocks of chunksize bytes."""
    ...     for _ in range(chunks):
    ...         yield b"z" * chunksize
    ...
    >>> comp = bz2.BZ2Compressor()
    >>> out = b""
    >>> for chunk in gen_data():
    ...     # Provide data to the compressor object
    ...     out = out + comp.compress(chunk)
    ...
    >>> # Finish the compression process.  Call this once you have
    >>> # finished providing data to the compressor.
    >>> out = out + comp.flush()

The example above uses a very "nonrandom" stream of data
(a stream of ``b"z"`` chunks).  Random data tends to compress poorly,
while ordered, repetitive data usually yields a high compression ratio.

Writing and reading a bzip2-compressed file in binary mode:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> with bz2.open("myfile.bz2", "wb") as f:
    ...     # Write compressed data to file
    ...     unused = f.write(data)
    >>> with bz2.open("myfile.bz2", "rb") as f:
    ...     # Decompress data from file
    ...     content = f.read()
    >>> content == data  # Check equality to original object after round-trip
    True
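
Using :class:`BZ2Decompressor` for incremental decompression of a multi-stream
input. This is a minimal sketch that combines the *max_length* parameter with
the ``needs_input``, ``eof`` and ``unused_data`` attributes described for
:class:`BZ2Decompressor`. ``decompress_streams`` is a hypothetical helper
written for this example, not part of the :mod:`bz2` module. The chunk size and
*max_length* values are arbitrary, and the input is assumed to be well formed,
with no trailing data after the last stream:

.. code-block:: python

   import bz2


   def decompress_streams(chunks, max_length=4096):
       """Yield decompressed pieces from an iterable of compressed chunks.

       Hypothetical helper for illustration only.  Each yielded piece is at
       most max_length bytes long and may be empty.
       """
       decomp = bz2.BZ2Decompressor()
       for chunk in chunks:
           data = chunk
           while data:
               yield decomp.decompress(data, max_length)
               # Output was capped by max_length: pass b"" to get the rest.
               while not decomp.eof and not decomp.needs_input:
                   yield decomp.decompress(b"", max_length)
               if decomp.eof:
                   # Bytes after the end-of-stream marker belong to the next
                   # stream, which needs a new decompressor object.
                   data = decomp.unused_data
                   decomp = bz2.BZ2Decompressor()
               else:
                   # The current chunk is used up; wait for the next one.
                   data = b""


   # Two concatenated streams, fed in small pieces as if read from a socket.
   payload = bz2.compress(b"first stream ") + bz2.compress(b"second stream")
   pieces = (payload[i:i + 16] for i in range(0, len(payload), 16))
   result = b"".join(decompress_streams(pieces, max_length=8))
   assert result == b"first stream second stream"

Draining the pending output with ``b""`` before supplying more input keeps every
returned piece within *max_length* bytes, which can be useful when the
decompressed size is not known in advance.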