:mod:`pickle` --- Python object serialization
=============================================

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.

.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@python.org>

**Source code:** :source:`Lib/pickle.py`

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

--------------

The :mod:`pickle` module implements binary protocols for serializing and
de-serializing a Python object structure.  *"Pickling"* is the process
whereby a Python object hierarchy is converted into a byte stream, and
*"unpickling"* is the inverse operation, whereby a byte stream
(from a :term:`binary file` or :term:`bytes-like object`) is converted
back into an object hierarchy.  Pickling (and unpickling) is alternatively
known as "serialization", "marshalling," [#]_ or "flattening"; however, to
avoid confusion, the terms used here are "pickling" and "unpickling".

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data.  Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

Comparison with ``marshal``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing.  Recursive
  objects are objects that contain references to themselves.  These are not
  handled by :mod:`marshal`, and in fact, attempting to marshal recursive objects
  will crash your Python interpreter.  Object sharing happens when there are
  multiple references to the same object in different places in the object
  hierarchy being serialized.  :mod:`pickle` stores such objects only once, and
  ensures that all other references point to the master copy.  Shared objects
  remain shared, which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases provided a compatible pickle protocol is chosen and
  pickling and unpickling code deals with Python 2 to Python 3 type differences
  if your data is crossing that unique, breaking-change language boundary.
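
The object tracking described in the first point above can be observed with a
short round trip.  The following is a minimal sketch, not part of the module
itself; the variable names are illustrative only.

```python
import pickle

# A recursive object: the list contains a reference to itself.
recursive = []
recursive.append(recursive)

# A shared object: the same inner list is referenced under two keys.
shared = [1, 2, 3]
container = {"a": shared, "b": shared}

restored_recursive = pickle.loads(pickle.dumps(recursive))
restored_container = pickle.loads(pickle.dumps(container))

# The self-reference survives the round trip.
is_still_recursive = restored_recursive[0] is restored_recursive

# Both keys still refer to one list object, so a mutation made through
# one key is visible through the other.
restored_container["a"].append(4)
is_still_shared = restored_container["a"] is restored_container["b"]
```

Because the inner list is stored only once, the restored dictionary holds one
list referenced twice rather than two equal copies.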

Comparison with ``json``
^^^^^^^^^^^^^^^^^^^^^^^^

There are fundamental differences between the pickle protocols and
`JSON (JavaScript Object Notation) <http://json.org>`_:

* JSON is a text serialization format (it outputs Unicode text, although
  most of the time it is then encoded to ``utf-8``), while pickle is
  a binary serialization format;

* JSON is human-readable, while pickle is not;

* JSON is interoperable and widely used outside of the Python ecosystem,
  while pickle is Python-specific;

* JSON, by default, can only represent a subset of the Python built-in
  types, and no custom classes; pickle can represent an extremely large
  number of Python types (many of them automatically, by clever usage
  of Python's introspection facilities; complex cases can be tackled by
  implementing :ref:`specific object APIs <pickle-inst>`).

.. seealso::
   The :mod:`json` module: a standard library module allowing JSON
   serialization and deserialization.


.. _pickle-protocols:

Data stream format
------------------

.. index::
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
JSON or XDR (which can't represent pointer sharing); however, it means that
non-Python programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a relatively compact binary
representation.  If you need optimal size characteristics, you can efficiently
:doc:`compress <archiving>` pickled data.

The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.  :mod:`pickletools` source code has extensive
comments about opcodes used by pickle protocols.

There are currently 5 different protocols which can be used for pickling.
The higher the protocol used, the more recent the version of Python needed
to read the pickle produced.

* Protocol version 0 is the original "human-readable" protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is an old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3.  It provides much more
  efficient pickling of :term:`new-style class`\es.  Refer to :pep:`307` for
  information about improvements brought by protocol 2.

* Protocol version 3 was added in Python 3.0.  It has explicit support for
  :class:`bytes` objects and cannot be unpickled by Python 2.x.  This is
  the default protocol, and the recommended protocol when compatibility with
  other Python 3 versions is required.

* Protocol version 4 was added in Python 3.4.  It adds support for very large
  objects, pickling more kinds of objects, and some data format
  optimizations.  Refer to :pep:`3154` for information about improvements
  brought by protocol 4.

.. note::
   Serialization is a more primitive notion than persistence; although
   :mod:`pickle` reads and writes file objects, it does not handle the issue of
   naming persistent objects, nor the (even more complicated) issue of concurrent
   access to persistent objects.  The :mod:`pickle` module can transform a complex
   object into a byte stream and it can transform the byte stream into an object
   with the same internal structure.  Perhaps the most obvious thing to do with
   these byte streams is to write them onto a file, but it is also conceivable to
   send them across a network or store them in a database.  The :mod:`shelve`
   module provides a simple interface to pickle and unpickle objects on
   DBM-style database files.


Module Interface
----------------

To serialize an object hierarchy, you simply call the :func:`dumps` function.
Similarly, to de-serialize a data stream, you call the :func:`loads` function.
However, if you want more control over serialization and de-serialization,
you can create a :class:`Pickler` or an :class:`Unpickler` object, respectively.

The :mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   An integer, the highest :ref:`protocol version <pickle-protocols>`
   available.  This value can be passed as a *protocol* value to functions
   :func:`dump` and :func:`dumps` as well as the :class:`Pickler`
   constructor.

.. data:: DEFAULT_PROTOCOL

   An integer, the default :ref:`protocol version <pickle-protocols>` used
   for pickling.  May be less than :data:`HIGHEST_PROTOCOL`.  Currently the
   default protocol is 3, a new protocol designed for Python 3.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)

   Write a pickled representation of *obj* to the open :term:`file object` *file*.
   This is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument, an integer, tells the pickler to use
   the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
   If not specified, the default is :data:`DEFAULT_PROTOCOL`.  If a negative
   number is specified, :data:`HIGHEST_PROTOCOL` is selected.

   The *file* argument must have a ``write()`` method that accepts a single
   bytes argument.  It can thus be an on-disk file opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

.. function:: dumps(obj, protocol=None, \*, fix_imports=True)

   Return the pickled representation of the object as a :class:`bytes` object,
   instead of writing it to a file.

   Arguments *protocol* and *fix_imports* have the same meaning as in
   :func:`dump`.

.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object representation from the open :term:`file object`
   *file* and return the reconstituted object hierarchy specified therein.
   This is equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.  Bytes past the pickled object's
   representation are ignored.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments.
   Both methods should return bytes.  Thus *file* can be an on-disk file opened
   for binary reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.  If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3.  The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively.  The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.
   Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
   instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
   :class:`~datetime.time` pickled by Python 2.

.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.  Bytes past the pickled object's
   representation are ignored.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.  If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3.  The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively.  The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.
   Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
   instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
   :class:`~datetime.time` pickled by Python 2.


The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions.  It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as a data
   corruption or a security violation.  It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
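
The functions and exceptions above can be combined as in the following minimal
sketch.  The sample data and variable names are illustrative, and the exact
exception raised for damaged input may differ between Python versions, which
is why more than one exception type is caught.

```python
import io
import pickle

data = {"name": "example", "values": [1, 2, 3]}

# dumps() and loads() work with bytes objects.
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
from_bytes = pickle.loads(payload)

# dump() and load() work with binary file objects; io.BytesIO stands in
# for an on-disk file opened in binary mode.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
from_file = pickle.load(buffer)

# A lambda is pickled by name and cannot be looked up, so pickling fails.
try:
    pickle.dumps(lambda x: x)
    pickling_failed = False
except (pickle.PicklingError, AttributeError):
    pickling_failed = True

# Truncated input cannot be unpickled.
try:
    pickle.loads(payload[:-1])
    unpickling_failed = False
except (pickle.UnpicklingError, EOFError):
    unpickling_failed = True
```

Catching :exc:`PickleError` alone is not sufficient in general, since
unpickling may raise other exceptions as noted above.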


The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file, protocol=None, \*, fix_imports=True)

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument, an integer, tells the pickler to use
   the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
   If not specified, the default is :data:`DEFAULT_PROTOCOL`.  If a negative
   number is specified, :data:`HIGHEST_PROTOCOL` is selected.

   The *file* argument must have a ``write()`` method that accepts a single
   bytes argument.  It can thus be an on-disk file opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default.  This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual.  Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*.  The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`.  Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: dispatch_table

      A pickler object's dispatch table is a registry of *reduction
      functions* of the kind which can be declared using
      :func:`copyreg.pickle`.  It is a mapping whose keys are classes
      and whose values are reduction functions.
      A reduction function
      takes a single argument of the associated class and should
      conform to the same interface as a :meth:`__reduce__`
      method.

      By default, a pickler object will not have a
      :attr:`dispatch_table` attribute, and it will instead use the
      global dispatch table managed by the :mod:`copyreg` module.
      However, to customize the pickling for a specific pickler object
      one can set the :attr:`dispatch_table` attribute to a dict-like
      object.  Alternatively, if a subclass of :class:`Pickler` has a
      :attr:`dispatch_table` attribute then this will be used as the
      default dispatch table for instances of that class.

      See :ref:`pickle-dispatch` for usage examples.

      .. versionadded:: 3.3

   .. attribute:: fast

      Deprecated.  Enable fast mode if set to a true value.  The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes.  It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.


.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments.
   Both methods should return bytes.  Thus *file* can be an on-disk file object
   opened for binary reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.
   If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3.  The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively.  The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.  Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*.  If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects.  Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks.  Refer to
      :ref:`pickle-restrict` for details.


.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module (using :keyword:`def`, not
  :keyword:`lambda`)

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for
  details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file.  Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RecursionError` will
be raised in this case.  You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. [#]_  This means that only the function name is
pickled, along with the name of the module the function is defined in.  Neither
the function's code, nor any of its function attributes are pickled.  Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply.
Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them.  Only the instance data are pickled.  This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class.  If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-inst:

Pickling Class Instances
------------------------

.. currentmodule:: None

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable.  By
default, pickle will retrieve the class and the attributes of an instance via
introspection.  When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked.  The default behaviour first creates an uninitialized
instance and then restores the saved attributes.  The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

Classes can alter the default behaviour by providing one or several special
methods:

.. method:: object.__getnewargs_ex__()

   In protocols 2 and newer, classes that implement the
   :meth:`__getnewargs_ex__` method can dictate the values passed to the
   :meth:`__new__` method upon unpickling.  The method must return a pair
   ``(args, kwargs)`` where *args* is a tuple of positional arguments
   and *kwargs* a dictionary of named arguments for constructing the
   object.  Those will be passed to the :meth:`__new__` method upon
   unpickling.

   You should implement this method if the :meth:`__new__` method of your
   class requires keyword-only arguments.  Otherwise, it is recommended for
   compatibility to implement :meth:`__getnewargs__`.

   .. versionchanged:: 3.6
      :meth:`__getnewargs_ex__` is now used in protocols 2 and 3.


.. method:: object.__getnewargs__()

   This method serves a similar purpose as :meth:`__getnewargs_ex__`, but
   supports only positional arguments.  It must return a tuple of arguments
   ``args`` which will be passed to the :meth:`__new__` method upon unpickling.

   :meth:`__getnewargs__` will not be called if :meth:`__getnewargs_ex__` is
   defined.

   .. versionchanged:: 3.6
      Before Python 3.6, :meth:`__getnewargs__` was called instead of
      :meth:`__getnewargs_ex__` in protocols 2 and 3.


.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the returned object
   is pickled as the contents for the instance, instead of the contents of the
   instance's dictionary.  If the :meth:`__getstate__` method is absent, the
   instance's :attr:`~object.__dict__` is pickled as usual.


.. method:: object.__setstate__(state)

   Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
   the unpickled state.  In that case, there is no requirement for the state
   object to be a dictionary.
   Otherwise, the pickled state must be a dictionary
   and its items are assigned to the new instance's dictionary.

   .. note::

      If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
      method will not be called upon unpickling.


Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance.  In case those methods rely on some internal invariant being
   true, the type should implement :meth:`__getnewargs__` or
   :meth:`__getnewargs_ex__` to establish such an invariant; otherwise,
   neither :meth:`__new__` nor :meth:`__init__` will be called.

.. index:: pair: copy; protocol

As we shall see, pickle does not directly use the methods described above.  In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method.  The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone.  For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs_ex__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible.  We will show, however, cases where
using :meth:`__reduce__` is the only option or leads to more efficient pickling
or both.

.. method:: object.__reduce__()

   The interface is currently defined as follows.  The :meth:`__reduce__` method
   takes no argument and shall return either a string or preferably a tuple (the
   returned object is often referred to as the "reduce value").

   If a string is returned, the string should be interpreted as the name of a
   global variable.
   It should be the object's local name relative to its
   module; the pickle module searches the module namespace to determine the
   object's module.  This behaviour is typically useful for singletons.

   When a tuple is returned, it must be between two and five items long.
   Optional items can either be omitted, or ``None`` can be provided as their
   value.  The semantics of each item are in order:

   .. XXX Mention __newobj__ special-case?

   * A callable object that will be called to create the initial version of the
     object.

   * A tuple of arguments for the callable object.  An empty tuple must be given
     if the callable does not accept any argument.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as previously described.  If the object has no
     such method, then the value must be a dictionary and it will be added to
     the object's :attr:`~object.__dict__` attribute.

   * Optionally, an iterator (and not a sequence) yielding successive items.
     These items will be appended to the object either using
     ``obj.append(item)`` or, in batch, using ``obj.extend(list_of_items)``.
     This is primarily used for list subclasses, but may be used by other
     classes as long as they have :meth:`append` and :meth:`extend` methods with
     the appropriate signature.  (Whether :meth:`append` or :meth:`extend` is
     used depends on which pickle protocol version is used as well as the number
     of items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive key-value
     pairs.  These items will be stored to the object using ``obj[key] =
     value``.  This is primarily used for dictionary subclasses, but may be used
     by other classes as long as they implement :meth:`__setitem__`.


.. method:: object.__reduce_ex__(protocol)

   Alternatively, a :meth:`__reduce_ex__` method may be defined.
   The only
   difference is that this method should take a single integer argument, the
   protocol version.  When defined, pickle will prefer it over the
   :meth:`__reduce__` method.  In addition, :meth:`__reduce__` automatically
   becomes a synonym for the extended version.  The main use for this method is
   to provide backwards-compatible reduce values for older Python releases.

.. currentmodule:: pickle

.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream.  Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user-defined methods on the
pickler and unpickler, :meth:`~Pickler.persistent_id` and
:meth:`~Unpickler.persistent_load` respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent ID for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent ID string is returned, the pickler will pickle that object,
along with a marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`~Unpickler.persistent_load` method that takes a persistent ID object and
returns the referenced object.
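
The two hooks described above can be sketched in a few lines.  This is a
minimal illustration under stated assumptions: ``EXTERNAL_STORE``,
``ExternalRef``, ``RefPickler`` and ``RefUnpickler`` are hypothetical names
invented for this sketch, and a plain dictionary stands in for whatever
external storage an application would use.

```python
import io
import pickle

# Hypothetical external storage: objects that live outside the pickle stream.
EXTERNAL_STORE = {"config-1": {"debug": True}}


class ExternalRef:
    """Hypothetical marker for an object kept in EXTERNAL_STORE."""

    def __init__(self, key):
        self.key = key


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent ID for external references; returning None
        # tells the pickler to pickle the object as usual.
        if isinstance(obj, ExternalRef):
            return ("ExternalRef", obj.key)
        return None


class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the persistent ID back to the referenced object.
        kind, key = pid
        if kind == "ExternalRef" and key in EXTERNAL_STORE:
            return EXTERNAL_STORE[key]
        raise pickle.UnpicklingError("unsupported persistent object: %r" % (pid,))


buffer = io.BytesIO()
RefPickler(buffer).dump(["inline data", ExternalRef("config-1")])
buffer.seek(0)
restored = RefUnpickler(buffer).load()
```

In this sketch only the key travels in the pickle stream; the referenced
dictionary is looked up again at load time rather than copied.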

Here is a comprehensive example presenting how persistent ID can be used to
pickle external objects by reference.

.. literalinclude:: ../includes/dbpickle.py

.. _pickle-dispatch:

Dispatch Tables
^^^^^^^^^^^^^^^

If one wants to customize pickling of some classes without disturbing
any other code which depends on pickling, then one can create a
pickler with a private dispatch table.

The global dispatch table managed by the :mod:`copyreg` module is
available as :data:`copyreg.dispatch_table`.  Therefore, one may
choose to use a modified copy of :data:`copyreg.dispatch_table` as a
private dispatch table.

For example ::

   import copyreg
   import io
   import pickle

   f = io.BytesIO()
   p = pickle.Pickler(f)
   p.dispatch_table = copyreg.dispatch_table.copy()
   p.dispatch_table[SomeClass] = reduce_SomeClass

creates an instance of :class:`pickle.Pickler` with a private dispatch
table which handles the ``SomeClass`` class specially.  Alternatively,
the code ::

   class MyPickler(pickle.Pickler):
       dispatch_table = copyreg.dispatch_table.copy()
       dispatch_table[SomeClass] = reduce_SomeClass
   f = io.BytesIO()
   p = MyPickler(f)

does the same, but all instances of ``MyPickler`` will by default
share the same dispatch table.  The equivalent code using the
:mod:`copyreg` module is ::

   copyreg.pickle(SomeClass, reduce_SomeClass)
   f = io.BytesIO()
   p = pickle.Pickler(f)

.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`!readline` method is called.  If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved.
When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data.
For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Despite what its name suggests,
:meth:`Unpickler.find_class` is called whenever a global (i.e., a class or
a function) is requested. Thus it is possible to either completely forbid
globals or restrict them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


Performance
-----------

Recent versions of the pickle protocol (from protocol 2 and upwards) feature
efficient binary encodings for several common features and built-in types.
Also, the :mod:`pickle` module has a transparent optimizer written in C.


.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False}
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)


.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] This is why :keyword:`lambda` functions cannot be pickled: all
   :keyword:`!lambda` functions share the same name: ``<lambda>``.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact
   that persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore if any kind of newline character occurs in
   persistent IDs, the resulting pickle will become unreadable.