:mod:`pickle` --- Python object serialization
=============================================

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.

.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@python.org>

**Source code:** :source:`Lib/pickle.py`

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

--------------

The :mod:`pickle` module implements binary protocols for serializing and
de-serializing a Python object structure.  *"Pickling"* is the process
whereby a Python object hierarchy is converted into a byte stream, and
*"unpickling"* is the inverse operation, whereby a byte stream
(from a :term:`binary file` or :term:`bytes-like object`) is converted
back into an object hierarchy.  Pickling (and unpickling) is alternatively
known as "serialization", "marshalling," [#]_ or "flattening"; however, to
avoid confusion, the terms used here are "pickling" and "unpickling".

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data.  Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

Comparison with ``marshal``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing.  Recursive
  objects are objects that contain references to themselves.  These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter.  Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized.  :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy.  Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases, provided that a compatible pickle protocol is chosen
  and that the pickling and unpickling code handles Python 2 to Python 3 type
  differences when the data crosses that language boundary.
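
The object-sharing and recursion behaviour described in the first point can be
checked directly.  The following is a minimal sketch; the variable names are
illustrative only.

```python
import pickle

# One list referenced twice from the same container.
shared = [1, 2, 3]
container = {"a": shared, "b": shared}

restored = pickle.loads(pickle.dumps(container))

# The restored dict still holds a single list object under both keys.
same_object = restored["a"] is restored["b"]

# A list that contains itself is a recursive object.
recursive = []
recursive.append(recursive)
restored_recursive = pickle.loads(pickle.dumps(recursive))
points_to_itself = restored_recursive[0] is restored_recursive
```

Both ``same_object`` and ``points_to_itself`` are true after the round trip,
so mutating ``restored["a"]`` is also visible through ``restored["b"]``.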

Comparison with ``json``
^^^^^^^^^^^^^^^^^^^^^^^^

There are fundamental differences between the pickle protocols and
`JSON (JavaScript Object Notation) <http://json.org>`_:

* JSON is a text serialization format (it outputs unicode text, although
  most of the time it is then encoded to ``utf-8``), while pickle is
  a binary serialization format;

* JSON is human-readable, while pickle is not;

* JSON is interoperable and widely used outside of the Python ecosystem,
  while pickle is Python-specific;

* JSON, by default, can only represent a subset of the Python built-in
  types, and no custom classes; pickle can represent an extremely large
  number of Python types (many of them automatically, by clever usage
  of Python's introspection facilities; complex cases can be tackled by
  implementing :ref:`specific object APIs <pickle-inst>`).
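
The difference in supported types can be seen with a small experiment.  This
sketch uses a dictionary holding a set and a bytes object, two built-in types
that the default JSON encoder does not handle; the data is made up for
illustration.

```python
import json
import pickle

data = {"tags": {"a", "b"}, "raw": b"\x00\x01"}

# JSON has no set or bytes type, so the default encoder rejects them.
try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False

# pickle round-trips the same structure unchanged.
pickled = pickle.dumps(data)
restored = pickle.loads(pickled)

# json produces text, pickle produces bytes.
json_text = json.dumps({"key": 1})
```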

.. seealso::
   The :mod:`json` module: a standard library module allowing JSON
   serialization and deserialization.


.. _pickle-protocols:

Data stream format
------------------

.. index::
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
JSON or XDR (which can't represent pointer sharing); however it means that
non-Python programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a relatively compact binary
representation.  If you need optimal size characteristics, you can efficiently
:doc:`compress <archiving>` pickled data.
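
As a rough sketch of compressing pickled data, the example below uses
:mod:`zlib`; any of the standard compression modules could be substituted, and
the sample records are invented for illustration.

```python
import pickle
import zlib

records = [{"id": i, "name": "item"} for i in range(1000)]

raw = pickle.dumps(records)
packed = zlib.compress(raw)

# Decompress first, then unpickle.
restored = pickle.loads(zlib.decompress(packed))
```

For repetitive data such as this, ``packed`` is considerably smaller than
``raw``; the gain depends on the data being pickled.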

The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.  :mod:`pickletools` source code has extensive
comments about opcodes used by pickle protocols.
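
For instance, :func:`pickletools.genops` can list the opcodes in a pickle.
This is a small sketch for exploration; :func:`pickletools.dis` prints a fuller
symbolic disassembly of the same data.

```python
import pickle
import pickletools

data = pickle.dumps([1, 2], protocol=2)

# genops() yields (opcode, argument, position) triples, one per opcode.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
```

The list starts with ``PROTO`` and ends with ``STOP``.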

There are currently 5 different protocols which can be used for pickling.
The higher the protocol used, the more recent the version of Python needed
to read the pickle produced.

* Protocol version 0 is the original "human-readable" protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is an old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3.  It provides much more
  efficient pickling of :term:`new-style class`\es.  Refer to :pep:`307` for
  information about improvements brought by protocol 2.

* Protocol version 3 was added in Python 3.0.  It has explicit support for
  :class:`bytes` objects and cannot be unpickled by Python 2.x.  This is
  the default protocol, and the recommended protocol when compatibility with
  other Python 3 versions is required.

* Protocol version 4 was added in Python 3.4.  It adds support for very large
  objects, pickling more kinds of objects, and some data format
  optimizations.  Refer to :pep:`3154` for information about improvements
  brought by protocol 4.
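
The sketch below pickles one object with every protocol the running
interpreter supports and inspects the result.  The header check relies on the
``PROTO`` opcode (byte ``0x80``) that streams written with protocol 2 or newer
begin with.

```python
import pickle

obj = {"key": [1, 2, 3]}

# Pickle the same object with every protocol this interpreter supports.
pickles = {proto: pickle.dumps(obj, protocol=proto)
           for proto in range(pickle.HIGHEST_PROTOCOL + 1)}

# Protocol 0 output for this object consists of ASCII bytes only.
proto0_is_ascii = all(byte < 128 for byte in pickles[0])

# From protocol 2 on, the stream starts with the PROTO opcode (0x80)
# followed by the protocol number.
header = pickles[2][:2]

# Unpickling needs no protocol argument: it is detected from the data.
all_equal = all(pickle.loads(data) == obj for data in pickles.values())
```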

.. note::
   Serialization is a more primitive notion than persistence; although
   :mod:`pickle` reads and writes file objects, it does not handle the issue of
   naming persistent objects, nor the (even more complicated) issue of concurrent
   access to persistent objects.  The :mod:`pickle` module can transform a complex
   object into a byte stream and it can transform the byte stream into an object
   with the same internal structure.  Perhaps the most obvious thing to do with
   these byte streams is to write them onto a file, but it is also conceivable to
   send them across a network or store them in a database.  The :mod:`shelve`
   module provides a simple interface to pickle and unpickle objects on
   DBM-style database files.


Module Interface
----------------

To serialize an object hierarchy, you simply call the :func:`dumps` function.
Similarly, to de-serialize a data stream, you call the :func:`loads` function.
However, if you want more control over serialization and de-serialization,
you can create a :class:`Pickler` or an :class:`Unpickler` object, respectively.

The :mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   An integer, the highest :ref:`protocol version <pickle-protocols>`
   available.  This value can be passed as a *protocol* value to functions
   :func:`dump` and :func:`dumps` as well as the :class:`Pickler`
   constructor.

.. data:: DEFAULT_PROTOCOL

   An integer, the default :ref:`protocol version <pickle-protocols>` used
   for pickling.  May be less than :data:`HIGHEST_PROTOCOL`.  Currently the
   default protocol is 3, a new protocol designed for Python 3.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)

   Write a pickled representation of *obj* to the open :term:`file object` *file*.
   This is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument, an integer, tells the pickler to use
   the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
   If not specified, the default is :data:`DEFAULT_PROTOCOL`.  If a negative
   number is specified, :data:`HIGHEST_PROTOCOL` is selected.

   The *file* argument must have a write() method that accepts a single bytes
   argument.  It can thus be an on-disk file opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

.. function:: dumps(obj, protocol=None, \*, fix_imports=True)

   Return the pickled representation of the object as a :class:`bytes` object,
   instead of writing it to a file.

   Arguments *protocol* and *fix_imports* have the same meaning as in
   :func:`dump`.

.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object representation from the open :term:`file object`
   *file* and return the reconstituted object hierarchy specified therein.
   This is equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.  Bytes past the pickled object's
   representation are ignored.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments.  Both
   methods should return bytes.  Thus *file* can be an on-disk file opened for
   binary reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams
   generated by Python 2.  If *fix_imports* is true, pickle will try to map
   the old Python 2 names to the new names used in Python 3.  The *encoding*
   and *errors* tell pickle how to decode 8-bit string instances pickled by
   Python 2; these default to 'ASCII' and 'strict', respectively.  The
   *encoding* can be 'bytes' to read these 8-bit string instances as bytes
   objects.
   Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
   instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
   :class:`~datetime.time` pickled by Python 2.

.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.  Bytes past the pickled object's
   representation are ignored.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams
   generated by Python 2.  If *fix_imports* is true, pickle will try to map
   the old Python 2 names to the new names used in Python 3.  The *encoding*
   and *errors* tell pickle how to decode 8-bit string instances pickled by
   Python 2; these default to 'ASCII' and 'strict', respectively.  The
   *encoding* can be 'bytes' to read these 8-bit string instances as bytes
   objects.
   Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
   instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
   :class:`~datetime.time` pickled by Python 2.
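
The four functions above can be exercised as follows.  This is a minimal
sketch: an :class:`io.BytesIO` buffer stands in for a file opened in binary
mode, and the sample data is illustrative.

```python
import io
import pickle

config = {"name": "example", "values": [1, 2, 3]}

# dumps()/loads() work with bytes objects in memory.
data = pickle.dumps(config, protocol=pickle.HIGHEST_PROTOCOL)
from_bytes = pickle.loads(data)

# dump()/load() work with binary file objects.
buffer = io.BytesIO()
pickle.dump(config, buffer)
pickle.dump([4, 5], buffer)      # a second pickle in the same stream
buffer.seek(0)
first = pickle.load(buffer)      # each load() call reads one pickle
second = pickle.load(buffer)
```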


The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions.  It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as a data
   corruption or a security violation.  It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
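
A short sketch of handling these exceptions follows.  The helper names
``try_loads`` and ``try_dumps`` are hypothetical, and because the exact
exception raised for an unpicklable object can differ between Python versions,
``try_dumps`` deliberately catches more than :exc:`PicklingError`.

```python
import pickle

def try_loads(data):
    """Return the unpickled object, or the name of the exception raised."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, EOFError) as exc:
        return type(exc).__name__

truncated = try_loads(b"")         # empty input: the stream ends too early
invalid = try_loads(b"\x00")       # first byte is not a valid opcode

def try_dumps(obj):
    """Return True if obj could be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

lambda_ok = try_dumps(lambda: None)    # lambdas cannot be pickled
```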


The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file, protocol=None, \*, fix_imports=True)

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument, an integer, tells the pickler to use
   the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
   If not specified, the default is :data:`DEFAULT_PROTOCOL`.  If a negative
   number is specified, :data:`HIGHEST_PROTOCOL` is selected.

   The *file* argument must have a write() method that accepts a single bytes
   argument.  It can thus be an on-disk file opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default.  This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual.  Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*.  The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`.  Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: dispatch_table

      A pickler object's dispatch table is a registry of *reduction
      functions* of the kind which can be declared using
      :func:`copyreg.pickle`.  It is a mapping whose keys are classes
      and whose values are reduction functions.  A reduction function
      takes a single argument of the associated class and should
      conform to the same interface as a :meth:`__reduce__`
      method.

      By default, a pickler object will not have a
      :attr:`dispatch_table` attribute, and it will instead use the
      global dispatch table managed by the :mod:`copyreg` module.
      However, to customize the pickling for a specific pickler object
      one can set the :attr:`dispatch_table` attribute to a dict-like
      object.  Alternatively, if a subclass of :class:`Pickler` has a
      :attr:`dispatch_table` attribute then this will be used as the
      default dispatch table for instances of that class.

      See :ref:`pickle-dispatch` for usage examples.

      .. versionadded:: 3.3

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value.  The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes.  It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.


.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments.  Both
   methods should return bytes.  Thus *file* can be an on-disk file object
   opened for binary reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams
   generated by Python 2.  If *fix_imports* is true, pickle will try to map
   the old Python 2 names to the new names used in Python 3.  The *encoding*
   and *errors* tell pickle how to decode 8-bit string instances pickled by
   Python 2; these default to 'ASCII' and 'strict', respectively.  The
   *encoding* can be 'bytes' to read these 8-bit string instances as bytes
   objects.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.  Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*.  If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects.  Note,
      unlike its name suggests, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.
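
As an illustration of overriding :meth:`Unpickler.find_class`, the sketch below
resolves only globals named in an allow-list.  The names ``ALLOWED``,
``RestrictedUnpickler`` and ``restricted_loads`` are hypothetical, and an
allow-list of this kind reduces risk but does not by itself make unpickling
untrusted data safe.

```python
import collections
import io
import pickle

# Hypothetical allow-list for this sketch: (module, name) pairs.
ALLOWED = {("builtins", "set"), ("builtins", "frozenset")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve globals that are explicitly allowed.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but with the allow-list applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps({1, 2, 3}))

# A pickle that refers to a global outside the allow-list is rejected.
try:
    restricted_loads(pickle.dumps(collections.OrderedDict))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```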


.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module (using :keyword:`def`, not
  :keyword:`lambda`)

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable  (see section :ref:`pickle-inst` for
  details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file.  Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RecursionError` will
be raised in this case.  You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. [#]_  This means that only the function name is
pickled, along with the name of the module the function is defined in.  Neither
the function's code, nor any of its function attributes are pickled.  Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_
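
Pickling by reference can be observed with a built-in function.  In this
sketch, :func:`math.sqrt` is chosen only because it lives in an importable
standard library module.

```python
import math
import pickle

data = pickle.dumps(math.sqrt)

# The stream records the module and function names, not the function's code.
has_module_name = b"math" in data
has_function_name = b"sqrt" in data

# Unpickling looks the name up again and returns the very same object.
restored = pickle.loads(data)
same_function = restored is math.sqrt
```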

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply.  Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them.  Only the instance data are pickled.  This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class.  If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-inst:

Pickling Class Instances
------------------------

.. currentmodule:: None

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable.  By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked.  The default behaviour first creates an uninitialized
instance and then restores the saved attributes.  The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj
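
That :meth:`__init__` is skipped on unpickling can be confirmed with a class
that counts its own initializations.  This is a sketch; ``Tracked`` is a
hypothetical class defined at module level so that it can be pickled.

```python
import pickle

class Tracked:
    init_calls = 0          # class attribute: counts __init__ invocations

    def __init__(self, value):
        Tracked.init_calls += 1
        self.value = value

original = Tracked(42)
restored = pickle.loads(pickle.dumps(original))
```

After the round trip ``restored.value`` is ``42``, while
``Tracked.init_calls`` is still ``1``: only the construction of ``original``
ran :meth:`__init__`.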

Classes can alter the default behaviour by providing one or several special
methods:

.. method:: object.__getnewargs_ex__()

   In protocols 2 and newer, classes that implement the
   :meth:`__getnewargs_ex__` method can dictate the values passed to the
   :meth:`__new__` method upon unpickling.  The method must return a pair
   ``(args, kwargs)`` where *args* is a tuple of positional arguments
   and *kwargs* a dictionary of named arguments for constructing the
   object.  Those will be passed to the :meth:`__new__` method upon
   unpickling.

   You should implement this method if the :meth:`__new__` method of your
   class requires keyword-only arguments.  Otherwise, it is recommended for
   compatibility to implement :meth:`__getnewargs__`.

   .. versionchanged:: 3.6
      :meth:`__getnewargs_ex__` is now used in protocols 2 and 3.


.. method:: object.__getnewargs__()

   This method serves a similar purpose as :meth:`__getnewargs_ex__`, but
   supports only positional arguments.  It must return a tuple of arguments
   ``args`` which will be passed to the :meth:`__new__` method upon unpickling.

   :meth:`__getnewargs__` will not be called if :meth:`__getnewargs_ex__` is
   defined.

   .. versionchanged:: 3.6
      Before Python 3.6, :meth:`__getnewargs__` was called instead of
      :meth:`__getnewargs_ex__` in protocols 2 and 3.
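
A minimal sketch of :meth:`__getnewargs_ex__` follows, using a hypothetical
``Point`` class whose :meth:`__new__` accepts keyword-only arguments.  Protocol
4 is requested explicitly so that the example does not depend on the default
protocol.

```python
import pickle

class Point:
    """Point whose __new__ takes keyword-only arguments."""

    def __new__(cls, *, x, y):
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self

    def __getnewargs_ex__(self):
        # (args, kwargs) handed to __new__ when unpickling.
        return (), {"x": self.x, "y": self.y}

point = Point(x=1, y=2)
restored = pickle.loads(pickle.dumps(point, protocol=4))
```

Without :meth:`__getnewargs_ex__`, unpickling would call ``Point.__new__``
with no arguments and fail because ``x`` and ``y`` are required.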


.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the returned object
   is pickled as the contents for the instance, instead of the contents of the
   instance's dictionary.  If the :meth:`__getstate__` method is absent, the
   instance's :attr:`~object.__dict__` is pickled as usual.


.. method:: object.__setstate__(state)

   Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
   the unpickled state.  In that case, there is no requirement for the state
   object to be a dictionary.  Otherwise, the pickled state must be a dictionary
   and its items are assigned to the new instance's dictionary.

   .. note::

      If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
      method will not be called upon unpickling.


Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.
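
As a brief sketch of the two methods working together, the hypothetical
``Counter`` class below holds a :class:`threading.Lock`, which cannot be
pickled.  The lock is left out of the saved state and recreated on unpickling.

```python
import pickle
import threading

class Counter:
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()    # locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dictionary and leave the lock out.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then create a fresh lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()

counter = Counter()
counter.count = 5
restored = pickle.loads(pickle.dumps(counter))
```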

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance.  In case those methods rely on some internal invariant being
   true, the type should implement :meth:`__getnewargs__` or
   :meth:`__getnewargs_ex__` to establish such an invariant; otherwise,
   neither :meth:`__new__` nor :meth:`__init__` will be called.

.. index:: pair: copy; protocol

As we shall see, pickle does not directly use the methods described above.  In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method.  The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone.  For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs_ex__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible.  We will show, however, cases where
using :meth:`__reduce__` is the only option or leads to more efficient pickling
or both.

.. method:: object.__reduce__()

   The interface is currently defined as follows.  The :meth:`__reduce__` method
   takes no argument and shall return either a string or preferably a tuple (the
   returned object is often referred to as the "reduce value").

   If a string is returned, the string should be interpreted as the name of a
   global variable.  It should be the object's local name relative to its
   module; the pickle module searches the module namespace to determine the
   object's module.  This behaviour is typically useful for singletons.

   When a tuple is returned, it must be between two and five items long.
   Optional items can either be omitted, or ``None`` can be provided as their
   value.  The semantics of each item are in order:

   .. XXX Mention __newobj__ special-case?

   * A callable object that will be called to create the initial version of the
     object.

   * A tuple of arguments for the callable object.  An empty tuple must be given
     if the callable does not accept any argument.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as previously described.  If the object has no
     such method, then the value must be a dictionary and it will be added to
     the object's :attr:`~object.__dict__` attribute.

   * Optionally, an iterator (and not a sequence) yielding successive items.
     These items will be appended to the object either using
     ``obj.append(item)`` or, in batch, using ``obj.extend(list_of_items)``.
     This is primarily used for list subclasses, but may be used by other
     classes as long as they have :meth:`append` and :meth:`extend` methods with
     the appropriate signature.  (Whether :meth:`append` or :meth:`extend` is
     used depends on which pickle protocol version is used as well as the number
     of items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive key-value
     pairs.  These items will be stored to the object using ``obj[key] =
     value``.  This is primarily used for dictionary subclasses, but may be used
     by other classes as long as they implement :meth:`__setitem__`.
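
The simplest reduce value is a two-item tuple of a callable and its arguments.
The sketch below uses a hypothetical ``Interval`` class; because the callable
is the class itself, :meth:`__init__` does run when the object is unpickled.

```python
import pickle

class Interval:
    def __init__(self, low, high):
        if low > high:
            raise ValueError("low must not exceed high")
        self.low = low
        self.high = high

    def __reduce__(self):
        # (callable, args): unpickling calls Interval(low, high), so
        # __init__ runs again and the validation is repeated.
        return (Interval, (self.low, self.high))

reduce_value = Interval(1, 5).__reduce__()
restored = pickle.loads(pickle.dumps(Interval(1, 5)))
```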


.. method:: object.__reduce_ex__(protocol)

   Alternatively, a :meth:`__reduce_ex__` method may be defined.  The only
   difference is this method should take a single integer argument, the protocol
   version.  When defined, pickle will prefer it over the :meth:`__reduce__`
   method.  In addition, :meth:`__reduce__` automatically becomes a synonym for
   the extended version.  The main use for this method is to provide
   backwards-compatible reduce values for older Python releases.

.. currentmodule:: pickle

.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream.  Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user defined methods on the
pickler and unpickler, :meth:`~Pickler.persistent_id` and
:meth:`~Unpickler.persistent_load` respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent ID for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent ID string is returned, the pickler will pickle that object,
along with a marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`~Unpickler.persistent_load` method that takes a persistent ID object and
returns the referenced object.

Here is a comprehensive example presenting how persistent IDs can be used to
pickle external objects by reference.

.. literalinclude:: ../includes/dbpickle.py
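
A much smaller, in-memory sketch of the same mechanism follows.  The
``REGISTRY`` dictionary and the two subclasses are hypothetical stand-ins for
an external store, and the tuple-shaped persistent ID assumes a binary
protocol (anything newer than protocol 0).

```python
import io
import pickle

# Hypothetical external store: objects that live outside the pickle.
REGISTRY = {"conn-1": object()}

class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent ID for registry objects, None for the rest.
        for key, value in REGISTRY.items():
            if value is obj:
                return ("registry", key)
        return None

class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        kind, key = pid
        if kind == "registry" and key in REGISTRY:
            return REGISTRY[key]
        raise pickle.UnpicklingError("unsupported persistent ID: %r" % (pid,))

buffer = io.BytesIO()
RegistryPickler(buffer).dump({"handle": REGISTRY["conn-1"], "n": 3})
buffer.seek(0)
restored = RegistryUnpickler(buffer).load()
```

Here ``restored["handle"]`` is the registry object itself rather than a copy,
because only its persistent ID was written to the stream.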
671
672.. _pickle-dispatch:
673
Dispatch Tables
^^^^^^^^^^^^^^^

If one wants to customize pickling of some classes without disturbing
any other code which depends on pickling, then one can create a
pickler with a private dispatch table.

The global dispatch table managed by the :mod:`copyreg` module is
available as :data:`copyreg.dispatch_table`.  Therefore, one may
choose to use a modified copy of :data:`copyreg.dispatch_table` as a
private dispatch table.

For example ::

   f = io.BytesIO()
   p = pickle.Pickler(f)
   p.dispatch_table = copyreg.dispatch_table.copy()
   p.dispatch_table[SomeClass] = reduce_SomeClass

creates an instance of :class:`pickle.Pickler` with a private dispatch
table which handles the ``SomeClass`` class specially.  Alternatively,
the code ::

   class MyPickler(pickle.Pickler):
       dispatch_table = copyreg.dispatch_table.copy()
       dispatch_table[SomeClass] = reduce_SomeClass
   f = io.BytesIO()
   p = MyPickler(f)

does the same, but all instances of ``MyPickler`` will by default
share the same dispatch table.  The equivalent code using the
:mod:`copyreg` module is ::

   copyreg.pickle(SomeClass, reduce_SomeClass)
   f = io.BytesIO()
   p = pickle.Pickler(f)
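
A self-contained variant of the first form is sketched below.
:class:`fractions.Fraction` merely stands in for ``SomeClass``, and
``reduce_fraction`` is a hypothetical reduction function written for this
illustration; it returns a callable and the arguments needed to rebuild the
object. ::

   import copyreg
   import io
   import pickle
   from fractions import Fraction

   calls = []

   def reduce_fraction(frac):
       # Record the call so the effect of the private table is observable.
       calls.append(frac)
       return Fraction, (frac.numerator, frac.denominator)

   f = io.BytesIO()
   p = pickle.Pickler(f)
   p.dispatch_table = copyreg.dispatch_table.copy()
   p.dispatch_table[Fraction] = reduce_fraction
   p.dump(Fraction(3, 4))

   restored = pickle.loads(f.getvalue())
   # The global table was only copied, so other picklers are unaffected.
   untouched = Fraction not in copyreg.dispatch_table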

.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`!readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data.  For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code.  Just consider what
this hand-crafted pickle data stream does when loaded::

    >>> import pickle
    >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
    hello world
    0

In this example, the unpickler imports the :func:`os.system` function and then
calls it with the string argument "echo hello world".  Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`.  Despite what its name suggests,
:meth:`Unpickler.find_class` is called whenever a global (i.e., a class or
a function) is requested.  Thus it is possible to either completely forbid
globals or restrict them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

    >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
    [1, 2, range(0, 15)]
    >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
    Traceback (most recent call last):
      ...
    pickle.UnpicklingError: global 'os.system' is forbidden
    >>> restricted_loads(b'cbuiltins\neval\n'
    ...                  b'(S\'getattr(__import__("os"), "system")'
    ...                  b'("echo hello world")\'\ntR.')
    Traceback (most recent call last):
      ...
    pickle.UnpicklingError: global 'builtins.eval' is forbidden

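The other option mentioned above, forbidding globals altogether, can be
sketched in the same way.  ``NoGlobalsUnpickler`` and ``strict_loads`` are
hypothetical names chosen for this illustration. ::

   import io
   import pickle

   class NoGlobalsUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Refuse every global lookup, whatever it names.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def strict_loads(s):
       """Helper function analogous to pickle.loads()."""
       return NoGlobalsUnpickler(io.BytesIO(s)).load()

   # Containers of plain ints, strings and floats need no globals.
   plain = strict_loads(pickle.dumps([1, "two", {"three": 3.0}]))

   # A range object is pickled via a reference to builtins.range,
   # so it is rejected.
   try:
       strict_loads(pickle.dumps(range(3)))
   except pickle.UnpicklingError:
       rejected = True
   else:
       rejected = False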

.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled.  Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


Performance
-----------

Recent versions of the pickle protocol (protocol 2 and later) feature
efficient binary encodings for several common features and built-in types.
Also, the :mod:`pickle` module has a transparent optimizer written in C.

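One rough way to observe the effect of the binary encodings is to compare
pickle sizes across protocols.  The sample data below is arbitrary, and the
exact byte counts depend on the Python version, so none are shown. ::

   import pickle

   data = list(range(1000))

   text_size = len(pickle.dumps(data, protocol=0))
   binary_size = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))

   # For this data the binary encoding is the smaller of the two.
   smaller = binary_size < text_size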

.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False}
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)

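If pickle size matters, two standard-library helpers may be worth trying:
:func:`pickletools.optimize` returns an equivalent pickle with unused ``PUT``
opcodes eliminated, and :mod:`gzip` compresses the resulting bytes.  The
sketch below repeats the sample data from the first example and only shows
that both forms round-trip; the sizes obtained vary with the data and the
Python version. ::

   import gzip
   import pickle
   import pickletools

   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False}
   }

   raw = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

   # Drop PUT opcodes that nothing refers to.
   optimized = pickletools.optimize(raw)

   # Compress the bytes; decompress them again before unpickling.
   compressed = gzip.compress(optimized)
   restored = pickle.loads(gzip.decompress(compressed))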

.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module

.. [#] This is why :keyword:`lambda` functions cannot be pickled:  all
    :keyword:`!lambda` functions share the same name:  ``<lambda>``.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact
   that persistent IDs, in protocol 0, are delimited by the newline
   character.  Therefore, if any kind of newline character occurs in
   a persistent ID, the resulting pickle will become unreadable.
