.. highlight:: c

.. _new-types-topics:

*****************************************
Defining Extension Types: Assorted Topics
*****************************************

.. _dnt-type-methods:

This section aims to give a quick fly-by on the various type methods you can
implement and what they do.

Here is the definition of :c:type:`PyTypeObject`, with some fields only used in
:ref:`debug builds <debug-build>` omitted:

.. literalinclude:: ../includes/typestruct.h


Now that's a *lot* of methods.  Don't worry too much though -- if you have
a type you want to define, the chances are very good that you will only
implement a handful of these.

As you probably expect by now, we're going to go over this and give more
information about the various handlers.  We won't go in the order they are
defined in the structure, because there is a lot of historical baggage that
impacts the ordering of the fields.  It's often easiest to find an example
that includes the fields you need and then change the values to suit your new
type. ::

   const char *tp_name; /* For printing */

The name of the type -- as mentioned in the previous chapter, this will appear in
various places, almost entirely for diagnostic purposes. Try to choose something
that will be helpful in such a situation! ::

   Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

These fields tell the runtime how much memory to allocate when new objects of
this type are created.  Python has some built-in support for variable length
structures (think: strings, tuples) which is where the :c:member:`~PyTypeObject.tp_itemsize` field
comes in.  This will be dealt with later. ::

   const char *tp_doc;

Here you can put a string (or its address) that you want returned when the
Python script references ``obj.__doc__`` to retrieve the doc string.
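As a rough sketch of how the three fields above might be filled in, the
following declares a small instance structure and a matching static type.
The names ``ExampleObject``, ``ExampleType`` and ``example.Example`` are
illustrative assumptions and are not part of this chapter's running example::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical instance layout: the standard header plus one C field. */
   typedef struct {
       PyObject_HEAD
       int size;
   } ExampleObject;

   static PyTypeObject ExampleType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "example.Example",           /* used in diagnostics */
       .tp_basicsize = sizeof(ExampleObject),  /* bytes per instance */
       .tp_itemsize = 0,                       /* 0 for fixed-size objects */
       .tp_flags = Py_TPFLAGS_DEFAULT,
       .tp_doc = PyDoc_STR("Example objects hold a single integer."),
       .tp_new = PyType_GenericNew,
   };

The type still has to be passed to :c:func:`PyType_Ready` before instances can
be created.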

Now we come to the basic type methods -- the ones most extension types will
implement.


Finalization and De-allocation
------------------------------

.. index::
   single: object; deallocation
   single: deallocation, object
   single: object; finalization
   single: finalization, of objects

::

   destructor tp_dealloc;

This function is called when the reference count of the instance of your type is
reduced to zero and the Python interpreter wants to reclaim it.  If your type
has memory to free or other clean-up to perform, you can put it here.  The
object itself needs to be freed here as well.  Here is an example of this
function::

   static void
   newdatatype_dealloc(newdatatypeobject *obj)
   {
       free(obj->obj_UnderlyingDatatypePtr);
       Py_TYPE(obj)->tp_free((PyObject *)obj);
   }

If your type supports garbage collection, the destructor should call
:c:func:`PyObject_GC_UnTrack` before clearing any member fields::

   static void
   newdatatype_dealloc(newdatatypeobject *obj)
   {
       PyObject_GC_UnTrack(obj);
       Py_CLEAR(obj->other_obj);
       ...
       Py_TYPE(obj)->tp_free((PyObject *)obj);
   }

.. index::
   single: PyErr_Fetch (C function)
   single: PyErr_Restore (C function)

One important requirement of the deallocator function is that it leaves any
pending exceptions alone.  This is important since deallocators are frequently
called as the interpreter unwinds the Python stack; when the stack is unwound
due to an exception (rather than normal returns), nothing is done to protect the
deallocators from seeing that an exception has already been set.  Any action
a deallocator performs that may cause additional Python code to be executed
may detect that an exception has been set.  This can lead to misleading
errors from the interpreter.  The proper way to protect against this is to save
a pending exception before performing the unsafe action, and to restore it when
done.  This can be done using the :c:func:`PyErr_Fetch` and
:c:func:`PyErr_Restore` functions::

   static void
   my_dealloc(PyObject *obj)
   {
       MyObject *self = (MyObject *) obj;
       PyObject *cbresult;

       if (self->my_callback != NULL) {
           PyObject *err_type, *err_value, *err_traceback;

           /* This saves the current exception state */
           PyErr_Fetch(&err_type, &err_value, &err_traceback);

           cbresult = PyObject_CallNoArgs(self->my_callback);
           if (cbresult == NULL)
               PyErr_WriteUnraisable(self->my_callback);
           else
               Py_DECREF(cbresult);

           /* This restores the saved exception state */
           PyErr_Restore(err_type, err_value, err_traceback);

           Py_DECREF(self->my_callback);
       }
       Py_TYPE(obj)->tp_free((PyObject*)self);
   }

.. note::
   There are limitations to what you can safely do in a deallocator function.
   First, if your type supports garbage collection (using :c:member:`~PyTypeObject.tp_traverse`
   and/or :c:member:`~PyTypeObject.tp_clear`), some of the object's members may already have been
   cleared or finalized by the time :c:member:`~PyTypeObject.tp_dealloc` is called.  Second, in
   :c:member:`~PyTypeObject.tp_dealloc`, your object is in an unstable state: its reference
   count is equal to zero.  Any call to a non-trivial object or API (as in the
   example above) might end up calling :c:member:`~PyTypeObject.tp_dealloc` again, causing a
   double free and a crash.

   Starting with Python 3.4, it is recommended not to put any complex
   finalization code in :c:member:`~PyTypeObject.tp_dealloc`, and instead use the new
   :c:member:`~PyTypeObject.tp_finalize` type method.

   .. seealso::
      :pep:`442` explains the new finalization scheme.
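The following is a minimal sketch of that recommendation, not a definitive
implementation.  It assumes a ``MyObject`` layout with a single ``my_callback``
member and a hypothetical type name ``example.My``.  The exception state is
saved with :c:func:`PyErr_GetRaisedException` and restored with
:c:func:`PyErr_SetRaisedException`, which are available from Python 3.12 on;
older versions would use the :c:func:`PyErr_Fetch` pair instead::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical object that owns an optional callback. */
   typedef struct {
       PyObject_HEAD
       PyObject *my_callback;
   } MyObject;

   static void
   my_finalize(PyObject *obj)
   {
       MyObject *self = (MyObject *) obj;

       if (self->my_callback == NULL) {
           return;
       }

       /* Save the pending exception, if any. */
       PyObject *exc = PyErr_GetRaisedException();

       PyObject *cbresult = PyObject_CallNoArgs(self->my_callback);
       if (cbresult == NULL) {
           PyErr_WriteUnraisable(self->my_callback);
       }
       else {
           Py_DECREF(cbresult);
       }

       /* Restore the saved exception; this call steals the reference. */
       PyErr_SetRaisedException(exc);
   }

   static void
   my_dealloc(PyObject *obj)
   {
       MyObject *self = (MyObject *) obj;

       /* Run tp_finalize first; a negative result means the object
          was resurrected and must not be freed. */
       if (PyObject_CallFinalizerFromDealloc(obj) < 0) {
           return;
       }
       Py_CLEAR(self->my_callback);
       Py_TYPE(obj)->tp_free(obj);
   }

   static PyTypeObject MyType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "example.My",
       .tp_basicsize = sizeof(MyObject),
       .tp_flags = Py_TPFLAGS_DEFAULT,
       .tp_new = PyType_GenericNew,
       .tp_dealloc = my_dealloc,
       .tp_finalize = my_finalize,
   };

With this split, the deallocator only releases resources, while the code that
may run arbitrary Python lives in the finalizer.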

.. index::
   single: string; object representation
   pair: built-in function; repr

Object Presentation
-------------------

In Python, there are two ways to generate a textual representation of an object:
the :func:`repr` function, and the :func:`str` function.  (The :func:`print`
function just calls :func:`str`.)  These handlers are both optional.

::

   reprfunc tp_repr;
   reprfunc tp_str;

The :c:member:`~PyTypeObject.tp_repr` handler should return a string object containing a
representation of the instance for which it is called.  Here is a simple
example::

   static PyObject *
   newdatatype_repr(newdatatypeobject *obj)
   {
       return PyUnicode_FromFormat("Repr-ified_newdatatype{{size:%d}}",
                                   obj->obj_UnderlyingDatatypePtr->size);
   }

If no :c:member:`~PyTypeObject.tp_repr` handler is specified, the interpreter will supply a
representation that uses the type's :c:member:`~PyTypeObject.tp_name` and a uniquely identifying
value for the object.

The :c:member:`~PyTypeObject.tp_str` handler is to :func:`str` what the :c:member:`~PyTypeObject.tp_repr` handler
described above is to :func:`repr`; that is, it is called when Python code calls
:func:`str` on an instance of your object.  Its implementation is very similar
to the :c:member:`~PyTypeObject.tp_repr` function, but the resulting string is intended for human
consumption.  If :c:member:`~PyTypeObject.tp_str` is not specified, the :c:member:`~PyTypeObject.tp_repr` handler is
used instead.

Here is a simple example::

   static PyObject *
   newdatatype_str(newdatatypeobject *obj)
   {
       return PyUnicode_FromFormat("Stringified_newdatatype{{size:%d}}",
                                   obj->obj_UnderlyingDatatypePtr->size);
   }



Attribute Management
--------------------

For every object which can support attributes, the corresponding type must
provide the functions that control how the attributes are resolved.  There needs
to be a function which can retrieve attributes (if any are defined), and another
to set attributes (if setting attributes is allowed).  Removing an attribute is
a special case, for which the new value passed to the handler is ``NULL``.

Python supports two pairs of attribute handlers; a type that supports attributes
only needs to implement the functions for one pair.  The difference is that one
pair takes the name of the attribute as a :c:expr:`char\*`, while the other
accepts a :c:expr:`PyObject*`.  Each type can use whichever pair makes more
sense for the implementation's convenience. ::

   getattrfunc  tp_getattr;        /* char * version */
   setattrfunc  tp_setattr;
   /* ... */
   getattrofunc tp_getattro;       /* PyObject * version */
   setattrofunc tp_setattro;

If accessing attributes of an object is always a simple operation (this will be
explained shortly), there are generic implementations which can be used to
provide the :c:expr:`PyObject*` version of the attribute management functions.
The actual need for type-specific attribute handlers almost completely
disappeared starting with Python 2.2, though there are many examples which have
not been updated to use the generic mechanism that is now available.


.. _generic-attribute-management:

Generic Attribute Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most extension types only use *simple* attributes.  So, what makes the
attributes simple?  There are only a couple of conditions that must be met:

#. The names of the attributes must be known when :c:func:`PyType_Ready` is
   called.

#. No special processing is needed to record that an attribute was looked up or
   set, nor do actions need to be taken based on the value.

Note that this list does not place any restrictions on the values of the
attributes, when the values are computed, or how relevant data is stored.

When :c:func:`PyType_Ready` is called, it uses three tables referenced by the
type object to create :term:`descriptor`\s which are placed in the dictionary of the
type object.  Each descriptor controls access to one attribute of the instance
object.  Each of the tables is optional; if all three are ``NULL``, instances of
the type will only have attributes that are inherited from their base type, and
the type should leave the :c:member:`~PyTypeObject.tp_getattro` and :c:member:`~PyTypeObject.tp_setattro` fields ``NULL`` as
well, allowing the base type to handle attributes.

The tables are declared as three fields of the type object::

   struct PyMethodDef *tp_methods;
   struct PyMemberDef *tp_members;
   struct PyGetSetDef *tp_getset;

If :c:member:`~PyTypeObject.tp_methods` is not ``NULL``, it must refer to an array of
:c:type:`PyMethodDef` structures.  Each entry in the table is an instance of this
structure::

   typedef struct PyMethodDef {
       const char  *ml_name;       /* method name */
       PyCFunction  ml_meth;       /* implementation function */
       int          ml_flags;      /* flags */
       const char  *ml_doc;        /* docstring */
   } PyMethodDef;

One entry should be defined for each method provided by the type; no entries are
needed for methods inherited from a base type.  One additional entry is needed
at the end; it is a sentinel that marks the end of the array.  The
:c:member:`~PyMethodDef.ml_name` field of the sentinel must be ``NULL``.
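For illustration, a method table with a single entry and its sentinel might
look like the sketch below.  ``ExampleObject``, ``Example_get_size`` and
``Example_methods`` are hypothetical names chosen for this example::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical instance layout. */
   typedef struct {
       PyObject_HEAD
       int size;
   } ExampleObject;

   /* METH_NOARGS functions still receive a second, unused argument. */
   static PyObject *
   Example_get_size(PyObject *self, PyObject *Py_UNUSED(ignored))
   {
       return PyLong_FromLong(((ExampleObject *) self)->size);
   }

   static PyMethodDef Example_methods[] = {
       {"get_size", Example_get_size, METH_NOARGS,
        PyDoc_STR("Return the stored size.")},
       {NULL, NULL, 0, NULL}   /* sentinel: ml_name is NULL */
   };

The table would then be attached to the type with
``.tp_methods = Example_methods``.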

The second table is used to define attributes which map directly to data stored
in the instance.  A variety of primitive C types are supported, and access may
be read-only or read-write.  The structures in the table are defined as::

   typedef struct PyMemberDef {
       const char *name;
       int         type;
       Py_ssize_t  offset;
       int         flags;
       const char *doc;
   } PyMemberDef;

For each entry in the table, a :term:`descriptor` will be constructed and added to the
type which will be able to extract a value from the instance structure.  The
:c:member:`~PyMemberDef.type` field should contain a type code like :c:macro:`Py_T_INT` or
:c:macro:`Py_T_DOUBLE`; the value will be used to determine how to
convert Python values to and from C values.  The :c:member:`~PyMemberDef.flags` field is used to
store flags which control how the attribute can be accessed: you can set it to
:c:macro:`Py_READONLY` to prevent Python code from setting it.

An interesting advantage of using the :c:member:`~PyTypeObject.tp_members` table to build
descriptors that are used at runtime is that any attribute defined this way can
have an associated doc string simply by providing the text in the table.  An
application can use the introspection API to retrieve the descriptor from the
class object, and get the doc string using its :attr:`~type.__doc__` attribute.

As with the :c:member:`~PyTypeObject.tp_methods` table, a sentinel entry with a :c:member:`~PyMemberDef.name` value
of ``NULL`` is required.
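A member table could be sketched as follows, assuming a hypothetical
``ExampleObject`` with one read-write and one read-only field.  The
``Py_T_*`` and ``Py_READONLY`` names are provided by ``Python.h`` from
Python 3.12 on; earlier versions spell them ``T_INT``, ``T_DOUBLE`` and
``READONLY`` and need ``structmember.h``::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <stddef.h>   /* offsetof */

   /* Hypothetical instance layout. */
   typedef struct {
       PyObject_HEAD
       int size;
       double ratio;
   } ExampleObject;

   static PyMemberDef Example_members[] = {
       {"size", Py_T_INT, offsetof(ExampleObject, size), 0,
        PyDoc_STR("number of stored items (read-write)")},
       {"ratio", Py_T_DOUBLE, offsetof(ExampleObject, ratio), Py_READONLY,
        PyDoc_STR("fill ratio (read-only from Python)")},
       {NULL}   /* sentinel: name is NULL */
   };

Using ``offsetof`` rather than a hand-computed number keeps the table correct
when fields are added to or reordered in the structure.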

.. XXX Descriptors need to be explained in more detail somewhere, but not here.

   Descriptor objects have two handler functions which correspond to the
   \member{tp_getattro} and \member{tp_setattro} handlers.  The
   \method{__get__()} handler is a function which is passed the descriptor,
   instance, and type objects, and returns the value of the attribute, or it
   returns \NULL{} and sets an exception.  The \method{__set__()} handler is
   passed the descriptor, instance, type, and new value;


Type-specific Attribute Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For simplicity, only the :c:expr:`char\*` version will be demonstrated here; the
type of the name parameter is the only difference between the :c:expr:`char\*`
and :c:expr:`PyObject*` flavors of the interface. This example effectively does
the same thing as the generic example above, but does not use the generic
support added in Python 2.2.  It explains how the handler functions are
called, so that if you do need to extend their functionality, you'll understand
what needs to be done.

The :c:member:`~PyTypeObject.tp_getattr` handler is called when the object requires an attribute
look-up.  It is called in the same situations where the :meth:`~object.__getattr__`
method of a class would be called.

Here is an example::

   static PyObject *
   newdatatype_getattr(newdatatypeobject *obj, char *name)
   {
       if (strcmp(name, "data") == 0)
       {
           return PyLong_FromLong(obj->data);
       }

       PyErr_Format(PyExc_AttributeError,
                    "'%.100s' object has no attribute '%.400s'",
                    Py_TYPE(obj)->tp_name, name);
       return NULL;
   }

The :c:member:`~PyTypeObject.tp_setattr` handler is called when the :meth:`~object.__setattr__` or
:meth:`~object.__delattr__` method of a class instance would be called.  When an
attribute should be deleted, the third parameter will be ``NULL``.  Here is an
example that simply raises an exception; if this were really all you wanted, the
:c:member:`~PyTypeObject.tp_setattr` handler should be set to ``NULL``. ::

   static int
   newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
   {
       PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name);
       return -1;
   }
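A handler that actually stores a value has to tell assignment and deletion
apart.  The sketch below is one possible shape, assuming a hypothetical
``ExampleObject`` with a single ``long data`` field; the choice to reject
deletion is an assumption made for this example::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <string.h>

   /* Hypothetical instance layout. */
   typedef struct {
       PyObject_HEAD
       long data;
   } ExampleObject;

   static int
   Example_setattr(PyObject *op, char *name, PyObject *v)
   {
       ExampleObject *self = (ExampleObject *) op;

       if (strcmp(name, "data") != 0) {
           PyErr_Format(PyExc_AttributeError,
                        "'%.100s' object has no attribute '%.400s'",
                        Py_TYPE(op)->tp_name, name);
           return -1;
       }
       if (v == NULL) {
           /* A NULL value means "del obj.data". */
           PyErr_SetString(PyExc_AttributeError, "cannot delete 'data'");
           return -1;
       }

       long value = PyLong_AsLong(v);
       if (value == -1 && PyErr_Occurred()) {
           return -1;   /* not an integer, or out of range */
       }
       self->data = value;
       return 0;
   }

The handler returns ``0`` on success and ``-1`` with an exception set on
failure.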

Object Comparison
-----------------

::

   richcmpfunc tp_richcompare;

The :c:member:`~PyTypeObject.tp_richcompare` handler is called when comparisons are needed.  It is
analogous to the :ref:`rich comparison methods <richcmpfuncs>`, like
:meth:`!__lt__`, and also called by :c:func:`PyObject_RichCompare` and
:c:func:`PyObject_RichCompareBool`.

This function is called with two Python objects and the operator as arguments,
where the operator is one of ``Py_EQ``, ``Py_NE``, ``Py_LE``, ``Py_GE``,
``Py_LT`` or ``Py_GT``.  It should compare the two objects with respect to the
specified operator and return ``Py_True`` or ``Py_False`` if the comparison is
successful, ``Py_NotImplemented`` to indicate that comparison is not
implemented and the other object's comparison method should be tried, or ``NULL``
if an exception was set.

Here is a sample implementation, for a datatype that is considered equal if the
size of an internal pointer is equal::

   static PyObject *
   newdatatype_richcmp(newdatatypeobject *obj1, newdatatypeobject *obj2, int op)
   {
       PyObject *result;
       int c, size1, size2;

       /* code to make sure that both arguments are of type
          newdatatype omitted */

       size1 = obj1->obj_UnderlyingDatatypePtr->size;
       size2 = obj2->obj_UnderlyingDatatypePtr->size;

       switch (op) {
       case Py_LT: c = size1 <  size2; break;
       case Py_LE: c = size1 <= size2; break;
       case Py_EQ: c = size1 == size2; break;
       case Py_NE: c = size1 != size2; break;
       case Py_GT: c = size1 >  size2; break;
       case Py_GE: c = size1 >= size2; break;
       default: Py_UNREACHABLE();  /* op is always one of the six above */
       }
       result = c ? Py_True : Py_False;
       Py_INCREF(result);
       return result;
   }
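The type check omitted above, and the switch itself, can be written more
compactly.  The following sketch assumes a hypothetical ``ExampleObject`` that
stores its size directly; it returns ``Py_NotImplemented`` for foreign operands
and uses the :c:macro:`Py_RETURN_RICHCOMPARE` macro (available since
Python 3.7) for the comparison::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical instance layout. */
   typedef struct {
       PyObject_HEAD
       int size;
   } ExampleObject;

   static PyTypeObject ExampleType;   /* forward declaration */

   static PyObject *
   Example_richcompare(PyObject *a, PyObject *b, int op)
   {
       /* Let Python try the other operand's comparison method. */
       if (!PyObject_TypeCheck(a, &ExampleType) ||
           !PyObject_TypeCheck(b, &ExampleType)) {
           Py_RETURN_NOTIMPLEMENTED;
       }

       int size1 = ((ExampleObject *) a)->size;
       int size2 = ((ExampleObject *) b)->size;

       /* Switches over the six operators and returns a new reference
          to Py_True or Py_False. */
       Py_RETURN_RICHCOMPARE(size1, size2, op);
   }

   static PyTypeObject ExampleType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "example.Example",
       .tp_basicsize = sizeof(ExampleObject),
       .tp_flags = Py_TPFLAGS_DEFAULT,
       .tp_new = PyType_GenericNew,
       .tp_richcompare = Example_richcompare,
   };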


Abstract Protocol Support
-------------------------

Python supports a variety of *abstract* 'protocols'; the specific interfaces
provided to use these protocols are documented in :ref:`abstract`.


A number of these abstract interfaces were defined early in the development of
the Python implementation.  In particular, the number, mapping, and sequence
protocols have been part of Python since the beginning.  Other protocols have
been added over time.  For protocols which depend on several handler routines
from the type implementation, the older protocols have been defined as optional
blocks of handlers referenced by the type object.  For newer protocols there are
additional slots in the main type object, with a flag bit being set to indicate
that the slots are present and should be checked by the interpreter.  (The flag
bit does not indicate that the slot values are non-``NULL``. The flag may be set
to indicate the presence of a slot, but a slot may still be unfilled.) ::

   PyNumberMethods   *tp_as_number;
   PySequenceMethods *tp_as_sequence;
   PyMappingMethods  *tp_as_mapping;

If you wish your object to be able to act like a number, a sequence, or a
mapping object, then you place the address of a structure that implements the C
type :c:type:`PyNumberMethods`, :c:type:`PySequenceMethods`, or
:c:type:`PyMappingMethods`, respectively. It is up to you to fill in this
structure with appropriate values. You can find examples of the use of each of
these in the :file:`Objects` directory of the Python source distribution. ::

   hashfunc tp_hash;

This function, if you choose to provide it, should return a hash number for an
instance of your data type. Here is a simple example::

   static Py_hash_t
   newdatatype_hash(newdatatypeobject *obj)
   {
       Py_hash_t result;
       result = obj->some_size + 32767 * obj->some_number;
       if (result == -1)
          result = -2;
       return result;
   }

:c:type:`Py_hash_t` is a signed integer type with a platform-varying width.
Returning ``-1`` from :c:member:`~PyTypeObject.tp_hash` indicates an error,
which is why you should be careful to avoid returning it when hash computation
is successful, as seen above.

::

   ternaryfunc tp_call;

This function is called when an instance of your data type is "called".  For
example, if ``obj1`` is an instance of your data type and the Python script
contains ``obj1('hello')``, the :c:member:`~PyTypeObject.tp_call` handler is invoked.

This function takes three arguments:

#. *self* is the instance of the data type which is the subject of the call.
   If the call is ``obj1('hello')``, then *self* is ``obj1``.

#. *args* is a tuple containing the arguments to the call.  You can use
   :c:func:`PyArg_ParseTuple` to extract the arguments.

#. *kwds* is a dictionary of keyword arguments that were passed. If this is
   non-``NULL`` and you support keyword arguments, use
   :c:func:`PyArg_ParseTupleAndKeywords` to extract the arguments.  If you
   do not want to support keyword arguments and this is non-``NULL``, raise a
   :exc:`TypeError` with a message saying that keyword arguments are not supported.

Here is a toy ``tp_call`` implementation::

   static PyObject *
   newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *kwds)
   {
       PyObject *result;
       const char *arg1;
       const char *arg2;
       const char *arg3;

       if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
           return NULL;
       }
       result = PyUnicode_FromFormat(
           "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n",
           obj->obj_UnderlyingDatatypePtr->size,
           arg1, arg2, arg3);
       return result;
   }
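The toy implementation silently ignores *kwds*.  A handler that rejects
keyword arguments, as described in the list above, might look like this
sketch; ``Example_call`` and its error message are hypothetical::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   static PyObject *
   Example_call(PyObject *self, PyObject *args, PyObject *kwds)
   {
       const char *text;

       /* kwds is NULL or an empty dict when no keywords were passed. */
       if (kwds != NULL && PyDict_Size(kwds) != 0) {
           PyErr_SetString(PyExc_TypeError,
                           "this object takes no keyword arguments");
           return NULL;
       }
       if (!PyArg_ParseTuple(args, "s:call", &text)) {
           return NULL;
       }
       return PyUnicode_FromFormat("called with [%s]", text);
   }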

::

   /* Iterators */
   getiterfunc tp_iter;
   iternextfunc tp_iternext;

These functions provide support for the iterator protocol.  Both handlers
take exactly one parameter, the instance for which they are being called,
and return a new reference.  In the case of an error, they should set an
exception and return ``NULL``.  :c:member:`~PyTypeObject.tp_iter` corresponds
to the Python :meth:`~object.__iter__` method, while :c:member:`~PyTypeObject.tp_iternext`
corresponds to the Python :meth:`~iterator.__next__` method.

Any :term:`iterable` object must implement the :c:member:`~PyTypeObject.tp_iter`
handler, which must return an :term:`iterator` object.  Here the same guidelines
apply as for Python classes:

* For collections (such as lists and tuples) which can support multiple
  independent iterators, a new iterator should be created and returned by
  each call to :c:member:`~PyTypeObject.tp_iter`.
* Objects which can only be iterated over once (usually due to side effects of
  iteration, such as file objects) can implement :c:member:`~PyTypeObject.tp_iter`
  by returning a new reference to themselves -- and should also therefore
  implement the :c:member:`~PyTypeObject.tp_iternext`  handler.

Any :term:`iterator` object should implement both :c:member:`~PyTypeObject.tp_iter`
and :c:member:`~PyTypeObject.tp_iternext`.  An iterator's
:c:member:`~PyTypeObject.tp_iter` handler should return a new reference
to the iterator.  Its :c:member:`~PyTypeObject.tp_iternext` handler should
return a new reference to the next object in the iteration, if there is one.
If the iteration has reached the end, :c:member:`~PyTypeObject.tp_iternext`
may return ``NULL`` without setting an exception, or it may set
:exc:`StopIteration` *in addition* to returning ``NULL``; avoiding
the exception can yield slightly better performance.  If an actual error
occurs, :c:member:`~PyTypeObject.tp_iternext` should always set an exception
and return ``NULL``.
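As an illustration of these rules, here is a minimal sketch of an iterator
type that counts down to one.  ``CountdownObject`` and ``CountdownType`` are
hypothetical; :c:func:`PyObject_SelfIter` is used as a ready-made
:c:member:`~PyTypeObject.tp_iter` handler that returns a new reference to the
iterator itself::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   typedef struct {
       PyObject_HEAD
       long remaining;   /* hypothetical state: values left to produce */
   } CountdownObject;

   static PyObject *
   Countdown_iternext(PyObject *op)
   {
       CountdownObject *self = (CountdownObject *) op;

       if (self->remaining <= 0) {
           /* Exhausted: NULL with no exception set signals the end. */
           return NULL;
       }
       /* Return the current value, then step towards zero. */
       return PyLong_FromLong(self->remaining--);
   }

   static PyTypeObject CountdownType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "example.Countdown",
       .tp_basicsize = sizeof(CountdownObject),
       .tp_flags = Py_TPFLAGS_DEFAULT,
       .tp_new = PyType_GenericNew,
       .tp_iter = PyObject_SelfIter,
       .tp_iternext = Countdown_iternext,
   };

Returning ``NULL`` without setting :exc:`StopIteration` takes the cheaper of
the two end-of-iteration paths described above.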


.. _weakref-support:

Weak Reference Support
----------------------

One of the goals of Python's weak reference implementation is to allow any type
to participate in the weak reference mechanism without incurring the overhead on
performance-critical objects (such as numbers).

.. seealso::
   Documentation for the :mod:`weakref` module.

For an object to be weakly referenceable, the extension type must set the
``Py_TPFLAGS_MANAGED_WEAKREF`` bit of the :c:member:`~PyTypeObject.tp_flags`
field. The legacy :c:member:`~PyTypeObject.tp_weaklistoffset` field should
be left as zero.

Concretely, here is how the statically declared type object would look::

   static PyTypeObject TrivialType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       /* ... other members omitted for brevity ... */
       .tp_flags = Py_TPFLAGS_MANAGED_WEAKREF | ...,
   };


The only further addition is that ``tp_dealloc`` needs to clear any weak
references (by calling :c:func:`PyObject_ClearWeakRefs`)::

   static void
   Trivial_dealloc(TrivialObject *self)
   {
       /* Clear weakrefs first before calling any destructors */
       PyObject_ClearWeakRefs((PyObject *) self);
       /* ... remainder of destruction code omitted for brevity ... */
       Py_TYPE(self)->tp_free((PyObject *) self);
   }


More Suggestions
----------------

In order to learn how to implement any specific method for your new data type,
get the :term:`CPython` source code.  Go to the :file:`Objects` directory,
then search the C source files for ``tp_`` plus the function you want
(for example, ``tp_richcompare``).  You will find examples of the function
you want to implement.

When you need to verify that an object is a concrete instance of the type you
are implementing, use the :c:func:`PyObject_TypeCheck` function.  A sample of
its use might be something like the following::

   if (!PyObject_TypeCheck(some_object, &MyType)) {
       PyErr_SetString(PyExc_TypeError, "arg #1 not a mything");
       return NULL;
   }

.. seealso::
   Download CPython source releases.
      https://www.python.org/downloads/source/

   The CPython project on GitHub, where the CPython source code is developed.
      https://github.com/python/cpython