:mod:`!bz2` --- Support for :program:`bzip2` compression
========================================================

.. module:: bz2
   :synopsis: Interfaces for bzip2 compression and decompression.

.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

**Source code:** :source:`Lib/bz2.py`

--------------

This module provides a comprehensive interface for compressing and
decompressing data using the bzip2 compression algorithm.

The :mod:`bz2` module contains:

* The :func:`.open` function and :class:`BZ2File` class for reading and
  writing compressed files.
* The :class:`BZ2Compressor` and :class:`BZ2Decompressor` classes for
  incremental (de)compression.
* The :func:`compress` and :func:`decompress` functions for one-shot
  (de)compression.


(De)compression of files
------------------------

.. function:: open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)

   Open a bzip2-compressed file in binary or text mode, returning a :term:`file
   object`.

   As with the constructor for :class:`BZ2File`, the *filename* argument can be
   an actual filename (a :class:`str` or :class:`bytes` object), or an existing
   file object to read from or write to.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'w'``, ``'wb'``,
   ``'x'``, ``'xb'``, ``'a'`` or ``'ab'`` for binary mode, or ``'rt'``,
   ``'wt'``, ``'xt'``, or ``'at'`` for text mode. The default is ``'rb'``.

   The *compresslevel* argument is an integer from 1 to 9, as for the
   :class:`BZ2File` constructor.

   For binary mode, this function is equivalent to the :class:`BZ2File`
   constructor: ``BZ2File(filename, mode, compresslevel=compresslevel)``. In
   this case, the *encoding*, *errors* and *newline* arguments must not be
   provided.

   For text mode, a :class:`BZ2File` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.
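
The following minimal sketch shows text mode. It writes a few lines through :func:`.open` with an explicit *encoding* and then reads them back. The temporary directory and the file name ``notes.txt.bz2`` are for illustration only.

```python
import bz2
import os
import tempfile

lines = ["first line\n", "second line with non-ASCII: caf\u00e9\n"]

with tempfile.TemporaryDirectory() as tmpdir:
    # Illustrative file name inside a throwaway directory.
    path = os.path.join(tmpdir, "notes.txt.bz2")

    # "wt" wraps a BZ2File in an io.TextIOWrapper that encodes to UTF-8.
    with bz2.open(path, "wt", encoding="utf-8") as f:
        f.writelines(lines)

    # "rt" decodes the decompressed bytes back into str objects.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        read_back = f.readlines()

    # The file on disk holds bzip2 data, not plain text.
    with open(path, "rb") as raw:
        magic = raw.read(3)

print(read_back == lines)  # True
print(magic)  # b'BZh'
```

The text layer is an ordinary :class:`io.TextIOWrapper`, so the default newline translation applies in both directions. As a result, the lines read back match the lines written on every platform.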


.. class:: BZ2File(filename, mode='r', *, compresslevel=9)

   Open a bzip2-compressed file in binary mode.

   If *filename* is a :class:`str` or :class:`bytes` object (or another
   :term:`path-like object`), open the named file directly. Otherwise,
   *filename* should be a :term:`file object`, which will be used to read or
   write the compressed data.

   The *mode* argument can be ``'r'`` for reading (default), ``'w'`` for
   overwriting, ``'x'`` for exclusive creation, or ``'a'`` for appending. These
   can equivalently be given as ``'rb'``, ``'wb'``, ``'xb'`` and ``'ab'``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``'w'`` does not truncate the file and is instead equivalent to ``'a'``.

   If *mode* is ``'w'``, ``'x'`` or ``'a'``, *compresslevel* can be an integer
   between ``1`` and ``9`` specifying the level of compression: ``1`` produces
   the least compression, and ``9`` (default) produces the most compression.

   If *mode* is ``'r'``, the input file may be the concatenation of multiple
   compressed streams.

   :class:`BZ2File` provides all of the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`~io.BufferedIOBase.detach`
   and :meth:`~io.IOBase.truncate`.
   Iteration and the :keyword:`with` statement are supported.

   :class:`BZ2File` also provides the following methods and attributes:

   .. method:: peek([n])

      Return buffered data without advancing the file position. At least one
      byte of data will be returned (unless at EOF). The exact number of bytes
      returned is unspecified.

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`BZ2File`, it may change the position of the underlying file
         object (e.g. if the :class:`BZ2File` was constructed by passing a file
         object for *filename*).

      .. versionadded:: 3.3

   .. method:: fileno()

      Return the file descriptor for the underlying file.

      .. versionadded:: 3.3

   .. method:: readable()

      Return whether the file was opened for reading.

      .. versionadded:: 3.3

   .. method:: seekable()

      Return whether the file supports seeking.

      .. versionadded:: 3.3

   .. method:: writable()

      Return whether the file was opened for writing.

      .. versionadded:: 3.3

   .. method:: read1(size=-1)

      Read up to *size* uncompressed bytes, while trying to avoid
      making multiple reads from the underlying stream. Reads up to a
      buffer's worth of data if *size* is negative.

      Returns ``b''`` if the file is at EOF.

      .. versionadded:: 3.3

   .. method:: readinto(b)

      Read bytes into *b*.

      Returns the number of bytes read (0 for EOF).

      .. versionadded:: 3.3

   .. attribute:: mode

      ``'rb'`` for reading and ``'wb'`` for writing.

      .. versionadded:: 3.13

   .. attribute:: name

      The bzip2 file name.  Equivalent to the :attr:`~io.FileIO.name`
      attribute of the underlying :term:`file object`.

      .. versionadded:: 3.13


   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.

   .. versionchanged:: 3.3
      Support was added for *filename* being a :term:`file object` instead of an
      actual filename.

      The ``'a'`` (append) mode was added, along with support for reading
      multi-stream files.

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

   .. versionchanged:: 3.9
      The *buffering* parameter has been removed. It was ignored and deprecated
      since Python 3.0. Pass an open file object to control how the file is
      opened.

      The *compresslevel* parameter became keyword-only.

   .. versionchanged:: 3.10
      This class is not thread-safe when there are multiple simultaneous
      readers or writers. The equivalent classes in :mod:`gzip` and
      :mod:`lzma` have never been thread-safe either.
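
The following sketch checks several of the points above, using an in-memory :class:`io.BytesIO` in place of a real file. It passes a file object as *filename*, which means ``'w'`` appends instead of truncating. It then reads the result back as a multi-stream file and confirms that ``peek()`` leaves the position unchanged. The stream contents are arbitrary.

```python
import bz2
import io

buf = io.BytesIO()

# Passing a file object: closing the BZ2File leaves buf open.
with bz2.BZ2File(buf, "w") as f:
    f.write(b"first stream\n")

# With a file object, "w" does not truncate, so a second stream is appended.
with bz2.BZ2File(buf, "w", compresslevel=1) as f:
    f.write(b"second stream\n")

buf.seek(0)
with bz2.BZ2File(buf) as f:  # mode defaults to "r"
    head = f.peek(1)[:1]  # look ahead without moving the file position
    data = f.read()       # reads across both concatenated streams

print(head)        # b'f'
print(data)        # b'first stream\nsecond stream\n'
print(buf.closed)  # False
```

*filename* was a file object in each case, so closing each :class:`BZ2File` did not close ``buf``.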


Incremental (de)compression
---------------------------

.. class:: BZ2Compressor(compresslevel=9)

   Create a new compressor object. This object may be used to compress data
   incrementally. For one-shot compression, use the :func:`compress` function
   instead.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   .. method:: compress(data)

      Provide data to the compressor object. Returns a chunk of compressed data
      if possible, or an empty byte string otherwise.

      When you have finished providing data to the compressor, call the
      :meth:`flush` method to finish the compression process.


   .. method:: flush()

      Finish the compression process. Returns the compressed data left in
      internal buffers.

      The compressor object may not be used after this method has been called.
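
The following sketch streams compression from one binary file object to another, assuming the input arrives in fixed-size chunks. ``compress_stream`` is a hypothetical helper and is not part of :mod:`bz2`. :class:`io.BytesIO` stands in for real files.

```python
import bz2
import io

def compress_stream(src, dst, chunk_size=64 * 1024, compresslevel=9):
    """Hypothetical helper: copy src to dst, bzip2-compressing on the way.

    src and dst are binary file objects; at most chunk_size bytes of
    uncompressed input are read at a time.
    """
    compressor = bz2.BZ2Compressor(compresslevel)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # compress() may return b"" while the compressor buffers input.
        dst.write(compressor.compress(chunk))
    # flush() returns the remaining data, ending the stream.
    dst.write(compressor.flush())

source = io.BytesIO(b"line of text\n" * 10_000)
target = io.BytesIO()
compress_stream(source, target, chunk_size=4096)

print(bz2.decompress(target.getvalue()) == source.getvalue())  # True
```

The helper creates a new :class:`BZ2Compressor` on every call because a compressor cannot be used again after ``flush()`` has been called.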


.. class:: BZ2Decompressor()

   Create a new decompressor object. This object may be used to decompress data
   incrementally. For one-shot decompression, use the :func:`decompress` function
   instead.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`BZ2File`. If
      you need to decompress a multi-stream input with :class:`BZ2Decompressor`,
      you must use a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`.  Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

      .. versionadded:: 3.3


   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      If this attribute is accessed before the end of the stream has been
      reached, its value will be ``b''``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5
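
A multi-stream input needs one :class:`BZ2Decompressor` per stream. The following minimal sketch shows that loop, using :attr:`~BZ2Decompressor.eof` and :attr:`~BZ2Decompressor.unused_data` to find where each stream ends. ``decompress_all_streams`` is a hypothetical helper. Unlike :func:`decompress`, it does not skip trailing bytes that are not a valid bzip2 stream.

```python
import bz2

def decompress_all_streams(data):
    """Hypothetical helper: decompress concatenated bzip2 streams.

    A fresh BZ2Decompressor handles each stream; the bytes left over
    after one stream ends (unused_data) are fed to the next one.
    """
    parts = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        parts.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("input ended before the end-of-stream marker")
        # Everything after this stream's end marker starts the next stream.
        data = decompressor.unused_data
    return b"".join(parts)

payload = bz2.compress(b"spam, ") + bz2.compress(b"eggs")
print(decompress_all_streams(payload))  # b'spam, eggs'
```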


One-shot (de)compression
------------------------

.. function:: compress(data, compresslevel=9)

   Compress *data*, a :term:`bytes-like object`.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   For incremental compression, use a :class:`BZ2Compressor` instead.
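
*compresslevel* selects the bzip2 block size, which is roughly 100 kB times the level. The chosen level is recorded in the fourth byte of the stream header. The following quick sketch checks the header and the round trip for a few levels. The sample data is arbitrary, and the printed sizes are not shown because they depend on the input.

```python
import bz2

data = b"The quick brown fox jumps over the lazy dog. " * 1000

for level in (1, 5, 9):
    compressed = bz2.compress(data, compresslevel=level)
    # Every bzip2 stream starts with b"BZh" followed by the level digit.
    print(level, compressed[:4], len(compressed))
    assert bz2.decompress(compressed) == data
```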


.. function:: decompress(data)

   Decompress *data*, a :term:`bytes-like object`.

   If *data* is the concatenation of multiple compressed streams, decompress
   all of the streams.

   For incremental decompression, use a :class:`BZ2Decompressor` instead.

   .. versionchanged:: 3.3
      Support for multi-stream inputs was added.
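
The following short sketch shows the multi-stream behavior described above. It also shows what happens when the input stops before the end-of-stream marker. In the current pure-Python implementation in :source:`Lib/bz2.py`, :func:`decompress` raises :exc:`ValueError` in that case. Treat that detail as an implementation observation rather than a documented guarantee.

```python
import bz2

# Two independently compressed streams, concatenated back to back
# (the same layout that ``cat a.bz2 b.bz2`` produces).
part1 = bz2.compress(b"first stream, ")
part2 = bz2.compress(b"second stream")
payload = part1 + part2

# decompress() continues past the first end-of-stream marker.
print(bz2.decompress(payload))  # b'first stream, second stream'

# Any bytes-like object is accepted, for example a bytearray.
print(bz2.decompress(bytearray(payload)) == bz2.decompress(payload))  # True

# Dropping the last bytes removes part1's end-of-stream marker.
try:
    bz2.decompress(part1[:-10])
    truncated_rejected = False
except ValueError:
    truncated_rejected = True
print(truncated_rejected)
```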

.. _bz2-usage-examples:

Examples of usage
-----------------

Below are some examples of typical usage of the :mod:`bz2` module.

Using :func:`compress` and :func:`decompress` to demonstrate round-trip compression:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> c = bz2.compress(data)
    >>> len(data) / len(c)  # Data compression ratio
    1.513595166163142
    >>> d = bz2.decompress(c)
    >>> data == d  # Check equality to original object after round-trip
    True

Using :class:`BZ2Compressor` for incremental compression:

    >>> import bz2
    >>> def gen_data(chunks=10, chunksize=1000):
    ...     """Yield incremental blocks of chunksize bytes."""
    ...     for _ in range(chunks):
    ...         yield b"z" * chunksize
    ...
    >>> comp = bz2.BZ2Compressor()
    >>> out = b""
    >>> for chunk in gen_data():
    ...     # Provide data to the compressor object
    ...     out = out + comp.compress(chunk)
    ...
    >>> # Finish the compression process.  Call this once you have
    >>> # finished providing data to the compressor.
    >>> out = out + comp.flush()

The example above uses a very "nonrandom" stream of data
(a stream of ``b"z"`` chunks).  Random data tends to compress poorly,
while ordered, repetitive data usually yields a high compression ratio.
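
The following sketch makes the claim above concrete. It compresses 100,000 repetitive bytes and 100,000 random bytes from :func:`os.urandom`. The exact sizes vary between runs and versions, so none are shown here. The repetitive input shrinks to a tiny fraction of its size, while the random input typically grows slightly.

```python
import bz2
import os

size = 100_000
repetitive = b"z" * size
random_bytes = os.urandom(size)  # effectively incompressible; differs per run

for label, payload in (("repetitive", repetitive), ("random", random_bytes)):
    compressed = bz2.compress(payload)
    print(f"{label:>10}: {size} -> {len(compressed)} bytes "
          f"(ratio {size / len(compressed):.2f})")
```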

Writing and reading a bzip2-compressed file in binary mode:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> with bz2.open("myfile.bz2", "wb") as f:
    ...     # Write compressed data to file
    ...     unused = f.write(data)
    ...
    >>> with bz2.open("myfile.bz2", "rb") as f:
    ...     # Decompress data from file
    ...     content = f.read()
    ...
    >>> content == data  # Check equality to original object after round-trip
    True
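
The *max_length* parameter and the :attr:`~BZ2Decompressor.needs_input` attribute of :class:`BZ2Decompressor` can be combined to cap how much decompressed data each call produces. This matters for input that expands enormously. The following minimal sketch assumes the whole compressed payload is already in memory and holds a single stream. ``iter_decompress`` is a hypothetical helper and is not part of :mod:`bz2`.

```python
import bz2

def iter_decompress(compressed, max_chunk=8192):
    """Hypothetical helper: yield at most max_chunk decompressed bytes at a time.

    Handles a single stream; any bytes after its end are left in
    decompressor.unused_data and ignored here.
    """
    decompressor = bz2.BZ2Decompressor()
    pending = compressed
    while not decompressor.eof:
        if decompressor.needs_input and not pending:
            raise ValueError("input ended before the end-of-stream marker")
        chunk = decompressor.decompress(pending, max_chunk)
        # The decompressor keeps any input it has not consumed yet, so
        # later calls pass b"" to drain the remaining output.
        pending = b""
        if chunk:
            yield chunk

payload = bz2.compress(b"\x00" * 1_000_000)  # compresses very well
sizes = [len(chunk) for chunk in iter_decompress(payload, max_chunk=65536)]
print(max(sizes) <= 65536, sum(sizes))  # True 1000000
```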

.. testcleanup::

   import os
   os.remove("myfile.bz2")
