:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling", [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.
  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*.
   This is equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor. Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo". The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive objects
      are pickled by reference and not by value. This method is useful when re-using
      picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary. So to clear
         the memo for a :mod:`pickle` module pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply use
         :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects. This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream. See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol`
  for details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in.
Neither the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked. If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* containing the
   arguments to be passed to the class constructor (:meth:`__init__` for
   example). The :meth:`__getinitargs__` method is called at pickle time; the
   tuple it returns is incorporated in the pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2. Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings). Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the return state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary. If there is no :meth:`__getstate__` method, the
   instance's :attr:`~object.__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state.
   [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary. If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. note::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it. One alternative is for the object to implement a
   :meth:`__reduce__` method. If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal. The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value.
   The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time. The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object. The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment. Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's
     :attr:`~object.__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items. These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature. (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``. These items
     will be pickled and stored to the object using ``obj[key] = value``. This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`. This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility). The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled, is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`~Pickler.persistent_id` attribute of the pickler object and the
:attr:`~Unpickler.persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent id for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent id string is returned, the pickler will pickle that string,
along with a marker so that the unpickler will recognize the string as a
persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`~Unpickler.persistent_load` function that takes a persistent id string
and returns the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`~Unpickler.persistent_load`
attribute can also be set to a Python list, in which case, when the unpickler
reaches a persistent id, the persistent id string will simply be appended to
this list. This functionality exists so that a pickle data stream can be
"sniffed" for object references without actually instantiating all the objects
in a pickle.
[#]_ Setting :attr:`~Unpickler.persistent_load` to a list is usually used in
conjunction with the :meth:`~Unpickler.noload` method on the Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second line
will be the name of the instance's class. It then looks up the class, possibly
importing the module and digging out the attribute, then it appends what it
finds to the unpickler's stack. Later on, this class will be assigned to the
:attr:`__class__` attribute of an empty class, as a way of magically creating an
instance without calling its class's :meth:`__init__`. Your job (should you
choose to accept it) would be to have :meth:`load_global` push onto the
unpickler's stack a known safe version of any class you deem safe to unpickle.
It is up to you to produce such a class. Or you could raise an error if you
want to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`~Unpickler.find_global`
attribute to a function or ``None``. If it is ``None`` then any attempts to
unpickle instances will raise an :exc:`UnpicklingError`. If it is a function,
then it should accept a module name and a class name, and return the
corresponding class object.
It is responsible for looking up the class and 686performing any necessary imports, and it may raise an error to prevent 687instances of the class from being unpickled. 688 689The moral of the story is that you should be really careful about the source of 690the strings your application unpickles. 691 692 693.. _pickle-example: 694 695Example 696------- 697 698For the simplest code, use the :func:`dump` and :func:`load` functions. Note 699that a self-referencing list is pickled and restored correctly. :: 700 701 import pickle 702 703 data1 = {'a': [1, 2.0, 3, 4+6j], 704 'b': ('string', u'Unicode string'), 705 'c': None} 706 707 selfref_list = [1, 2, 3] 708 selfref_list.append(selfref_list) 709 710 output = open('data.pkl', 'wb') 711 712 # Pickle dictionary using protocol 0. 713 pickle.dump(data1, output) 714 715 # Pickle the list using the highest protocol available. 716 pickle.dump(selfref_list, output, -1) 717 718 output.close() 719 720The following example reads the resulting pickled data. When reading a 721pickle-containing file, you should open the file in binary mode because you 722can't be sure if the ASCII or binary format was used. :: 723 724 import pprint, pickle 725 726 pkl_file = open('data.pkl', 'rb') 727 728 data1 = pickle.load(pkl_file) 729 pprint.pprint(data1) 730 731 data2 = pickle.load(pkl_file) 732 pprint.pprint(data2) 733 734 pkl_file.close() 735 736Here's a larger example that shows how to modify pickling behavior for a class. 737The :class:`TextReader` class opens a text file, and returns the line number and 738line contents each time its :meth:`!readline` method is called. If a 739:class:`TextReader` instance is pickled, all attributes *except* the file object 740member are saved. When the instance is unpickled, the file is reopened, and 741reading resumes from the last location. The :meth:`__setstate__` and 742:meth:`__getstate__` methods are used to implement this behavior. 
::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, state):
           fh = open(state['file'])     # reopen file
           count = state['lineno']      # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count -= 1
           self.__dict__.update(state)  # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4: """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.
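
As a rough illustration of the unpickler customization discussed earlier in
this chapter, the following sketch subclasses :class:`Unpickler` from the
pure-Python :mod:`pickle` module and rejects every global that is not on an
allow-list. It is one possible approach, not the only one. It overrides
:meth:`find_class`, the lookup helper that :meth:`load_global` calls in the
module source, instead of :meth:`load_global` itself; that choice is an
assumption of this sketch. The names ``RestrictedUnpickler``, ``SAFE_GLOBALS``
and ``restricted_loads`` are hypothetical and exist only for this example.

```python
import io
import os
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to load any global not on an allow-list."""

    # Hypothetical allow-list of (module name, attribute name) pairs.
    SAFE_GLOBALS = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        # Called whenever the pickle stream names a class or function.
        if (module, name) not in self.SAFE_GLOBALS:
            raise pickle.UnpicklingError(
                "global %s.%s is forbidden" % (module, name))
        # Defer to the normal import-and-getattr lookup for allowed names.
        return pickle.Unpickler.find_class(self, module, name)


def restricted_loads(data):
    """Unpickle a byte string using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers need no global lookup, so they load normally.
plain = restricted_loads(pickle.dumps([1, 2, {"a": None}]))

# A pickle that refers to a function outside the allow-list is rejected.
try:
    restricted_loads(pickle.dumps(os.system))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

With :mod:`cPickle`, where subclassing is not available, the comparable hook
is the ``find_global`` attribute described above. Even with such a filter in
place, the warning about untrusted data still applies.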

:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` use the
same format, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; instead, a reference to it is pickled and the :class:`Unpickler`
   will return the old value, not the modified one. There are two problems here:
   (1) detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError`, but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However, it is guaranteed that they will always be able to read each
   other's data streams.