.. highlight:: c

.. _new-types-topics:

*****************************************
Defining Extension Types: Assorted Topics
*****************************************

.. _dnt-type-methods:

This section aims to give a quick fly-by on the various type methods you can
implement and what they do.

Here is the definition of :c:type:`PyTypeObject`, with some fields only used in
debug builds omitted:

.. literalinclude:: ../includes/typestruct.h


Now that's a *lot* of methods.  Don't worry too much though -- if you have
a type you want to define, the chances are very good that you will only
implement a handful of these.

As you probably expect by now, we're going to go over this and give more
information about the various handlers.  We won't go in the order they are
defined in the structure, because there is a lot of historical baggage that
impacts the ordering of the fields.  It's often easiest to find an example
that includes the fields you need and then change the values to suit your new
type. ::

   const char *tp_name; /* For printing */

The name of the type -- as mentioned in the previous chapter, this will appear in
various places, almost entirely for diagnostic purposes. Try to choose something
that will be helpful in such a situation! ::

   Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

These fields tell the runtime how much memory to allocate when new objects of
this type are created.  Python has some built-in support for variable length
structures (think: strings, tuples) which is where the :c:member:`~PyTypeObject.tp_itemsize` field
comes in.  This will be dealt with later. ::

   const char *tp_doc;

Here you can put a string (or its address) that you want returned when the
Python script references ``obj.__doc__`` to retrieve the doc string.

Now we come to the basic type methods -- the ones most extension types will
implement.


Finalization and De-allocation
------------------------------

.. index::
   single: object; deallocation
   single: deallocation, object
   single: object; finalization
   single: finalization, of objects

::

   destructor tp_dealloc;

This function is called when the reference count of the instance of your type is
reduced to zero and the Python interpreter wants to reclaim it.  If your type
has memory to free or other clean-up to perform, you can put it here.  The
object itself needs to be freed here as well.  Here is an example of this
function::

   static void
   newdatatype_dealloc(newdatatypeobject *obj)
   {
       free(obj->obj_UnderlyingDatatypePtr);
       Py_TYPE(obj)->tp_free(obj);
   }

.. index::
   single: PyErr_Fetch()
   single: PyErr_Restore()

One important requirement of the deallocator function is that it leaves any
pending exceptions alone.  This is important since deallocators are frequently
called as the interpreter unwinds the Python stack; when the stack is unwound
due to an exception (rather than normal returns), nothing is done to protect the
deallocators from seeing that an exception has already been set.  Any actions
which a deallocator performs which may cause additional Python code to be
executed may detect that an exception has been set.  This can lead to misleading
errors from the interpreter.  The proper way to protect against this is to save
a pending exception before performing the unsafe action, and restore it when
done.  This can be done using the :c:func:`PyErr_Fetch` and
:c:func:`PyErr_Restore` functions::

   static void
   my_dealloc(PyObject *obj)
   {
       MyObject *self = (MyObject *) obj;
       PyObject *cbresult;

       if (self->my_callback != NULL) {
           PyObject *err_type, *err_value, *err_traceback;

           /* This saves the current exception state */
           PyErr_Fetch(&err_type, &err_value, &err_traceback);

           cbresult = PyObject_CallObject(self->my_callback, NULL);
           if (cbresult == NULL)
               PyErr_WriteUnraisable(self->my_callback);
           else
               Py_DECREF(cbresult);

           /* This restores the saved exception state */
           PyErr_Restore(err_type, err_value, err_traceback);

           Py_DECREF(self->my_callback);
       }
       Py_TYPE(obj)->tp_free((PyObject*)self);
   }

.. note::
   There are limitations to what you can safely do in a deallocator function.
   First, if your type supports garbage collection (using :c:member:`~PyTypeObject.tp_traverse`
   and/or :c:member:`~PyTypeObject.tp_clear`), some of the object's members may have been
   cleared or finalized by the time :c:member:`~PyTypeObject.tp_dealloc` is called.  Second, in
   :c:member:`~PyTypeObject.tp_dealloc`, your object is in an unstable state: its reference
   count is equal to zero.  Any call to a non-trivial object or API (as in the
   example above) might end up calling :c:member:`~PyTypeObject.tp_dealloc` again, causing a
   double free and a crash.

   Starting with Python 3.4, it is recommended not to put any complex
   finalization code in :c:member:`~PyTypeObject.tp_dealloc`, and instead use the new
   :c:member:`~PyTypeObject.tp_finalize` type method.

   .. seealso::
      :pep:`442` explains the new finalization scheme.

.. index::
   single: string; object representation
   builtin: repr

Object Presentation
-------------------

In Python, there are two ways to generate a textual representation of an object:
the :func:`repr` function, and the :func:`str` function.  (The :func:`print`
function just calls :func:`str`.)  These handlers are both optional.

::

   reprfunc tp_repr;
   reprfunc tp_str;

The :c:member:`~PyTypeObject.tp_repr` handler should return a string object containing a
representation of the instance for which it is called.  Here is a simple
example::

   static PyObject *
   newdatatype_repr(newdatatypeobject * obj)
   {
       return PyUnicode_FromFormat("Repr-ified_newdatatype{{size:%d}}",
                                   obj->obj_UnderlyingDatatypePtr->size);
   }

If no :c:member:`~PyTypeObject.tp_repr` handler is specified, the interpreter will supply a
representation that uses the type's :c:member:`~PyTypeObject.tp_name` and a uniquely-identifying
value for the object.

The :c:member:`~PyTypeObject.tp_str` handler is to :func:`str` what the :c:member:`~PyTypeObject.tp_repr` handler
described above is to :func:`repr`; that is, it is called when Python code calls
:func:`str` on an instance of your object.  Its implementation is very similar
to the :c:member:`~PyTypeObject.tp_repr` function, but the resulting string is intended for human
consumption.  If :c:member:`~PyTypeObject.tp_str` is not specified, the :c:member:`~PyTypeObject.tp_repr` handler is
used instead.

Here is a simple example::

   static PyObject *
   newdatatype_str(newdatatypeobject * obj)
   {
       return PyUnicode_FromFormat("Stringified_newdatatype{{size:%d}}",
                                   obj->obj_UnderlyingDatatypePtr->size);
   }
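
The fallback from ``tp_str`` to ``tp_repr`` can be pictured with a small model
in plain C.  This is only a sketch that uses no Python headers: ``toy_type``,
``toy_formatter``, ``toy_str`` and the two formatter functions are hypothetical
stand-ins chosen for illustration, not CPython names, and the real interpreter
logic is more involved.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a formatting slot such as tp_repr or tp_str. */
typedef int (*toy_formatter)(int size, char *buf, size_t buflen);

/* Hypothetical stand-in for the two presentation slots of a type object. */
typedef struct {
    toy_formatter repr;   /* plays the role of tp_repr */
    toy_formatter str;    /* plays the role of tp_str; may be NULL */
} toy_type;

static int
toy_repr_impl(int size, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "<toy size=%d>", size);
}

static int
toy_str_impl(int size, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "toy of size %d", size);
}

/* Use the str slot when it is filled in; otherwise fall back to repr. */
static int
toy_str(const toy_type *type, int size, char *buf, size_t buflen)
{
    toy_formatter f = (type->str != NULL) ? type->str : type->repr;
    return f(size, buf, buflen);
}
```

A type that fills in only the ``repr`` member gets the same text from both
entry points, which mirrors what happens when ``tp_str`` is left unset.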



Attribute Management
--------------------

For every object which can support attributes, the corresponding type must
provide the functions that control how the attributes are resolved.  There needs
to be a function which can retrieve attributes (if any are defined), and another
to set attributes (if setting attributes is allowed).  Removing an attribute is
a special case, for which the new value passed to the handler is ``NULL``.

Python supports two pairs of attribute handlers; a type that supports attributes
only needs to implement the functions for one pair.  The difference is that one
pair takes the name of the attribute as a :c:type:`char\*`, while the other
accepts a :c:type:`PyObject\*`.  Each type can use whichever pair makes more
sense for the implementation's convenience. ::

   getattrfunc  tp_getattr;        /* char * version */
   setattrfunc  tp_setattr;
   /* ... */
   getattrofunc tp_getattro;       /* PyObject * version */
   setattrofunc tp_setattro;

If accessing attributes of an object is always a simple operation (this will be
explained shortly), there are generic implementations which can be used to
provide the :c:type:`PyObject\*` version of the attribute management functions.
The actual need for type-specific attribute handlers almost completely
disappeared starting with Python 2.2, though there are many examples which have
not been updated to use some of the new generic mechanism that is available.


.. _generic-attribute-management:

Generic Attribute Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most extension types only use *simple* attributes.  So, what makes the
attributes simple?  There are only a couple of conditions that must be met:

#. The name of the attributes must be known when :c:func:`PyType_Ready` is
   called.

#. No special processing is needed to record that an attribute was looked up or
   set, nor do actions need to be taken based on the value.

Note that this list does not place any restrictions on the values of the
attributes, when the values are computed, or how relevant data is stored.

When :c:func:`PyType_Ready` is called, it uses three tables referenced by the
type object to create :term:`descriptor`\s which are placed in the dictionary of the
type object.  Each descriptor controls access to one attribute of the instance
object.  Each of the tables is optional; if all three are ``NULL``, instances of
the type will only have attributes that are inherited from their base type, and
the type should leave the :c:member:`~PyTypeObject.tp_getattro` and :c:member:`~PyTypeObject.tp_setattro` fields ``NULL`` as
well, allowing the base type to handle attributes.

The tables are declared as three fields of the type object::

   struct PyMethodDef *tp_methods;
   struct PyMemberDef *tp_members;
   struct PyGetSetDef *tp_getset;

If :c:member:`~PyTypeObject.tp_methods` is not ``NULL``, it must refer to an array of
:c:type:`PyMethodDef` structures.  Each entry in the table is an instance of this
structure::

   typedef struct PyMethodDef {
       const char  *ml_name;       /* method name */
       PyCFunction  ml_meth;       /* implementation function */
       int          ml_flags;      /* flags */
       const char  *ml_doc;        /* docstring */
   } PyMethodDef;

One entry should be defined for each method provided by the type; no entries are
needed for methods inherited from a base type.  One additional entry is needed
at the end; it is a sentinel that marks the end of the array.  The
:attr:`ml_name` field of the sentinel must be ``NULL``.
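
The sentinel convention can be illustrated without any Python headers.  The
following is a minimal sketch, assuming a simplified table layout:
``toy_methoddef``, ``toy_methods`` and ``toy_find_method`` are hypothetical
names invented for this example, and the entry carries only a name and a
function pointer rather than the full set of :c:type:`PyMethodDef` fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for PyMethodDef. */
typedef struct {
    const char *name;        /* method name; NULL marks the sentinel */
    int (*meth)(int);        /* implementation function */
} toy_methoddef;

static int toy_double(int x) { return 2 * x; }
static int toy_negate(int x) { return -x; }

static const toy_methoddef toy_methods[] = {
    {"double", toy_double},
    {"negate", toy_negate},
    {NULL, NULL}             /* sentinel: no length field is needed */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
static const toy_methoddef *
toy_find_method(const toy_methoddef *table, const char *name)
{
    for (const toy_methoddef *def = table; def->name != NULL; def++) {
        if (strcmp(def->name, name) == 0) {
            return def;
        }
    }
    return NULL;
}
```

Because the loop stops only at the entry whose name is ``NULL``, forgetting the
sentinel would let a walk like this run past the end of the array.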

The second table is used to define attributes which map directly to data stored
in the instance.  A variety of primitive C types are supported, and access may
be read-only or read-write.  The structures in the table are defined as::

   typedef struct PyMemberDef {
       const char *name;
       int         type;
       Py_ssize_t  offset;
       int         flags;
       const char *doc;
   } PyMemberDef;

For each entry in the table, a :term:`descriptor` will be constructed and added to the
type which will be able to extract a value from the instance structure.  The
:attr:`type` field should contain one of the type codes defined in the
:file:`structmember.h` header; the value will be used to determine how to
convert Python values to and from C values.  The :attr:`flags` field is used to
store flags which control how the attribute can be accessed.

The following flag constants are defined in :file:`structmember.h`; they may be
combined using bitwise-OR.

+---------------------------+----------------------------------------------+
| Constant                  | Meaning                                      |
+===========================+==============================================+
| :const:`READONLY`         | Never writable.                              |
+---------------------------+----------------------------------------------+
| :const:`READ_RESTRICTED`  | Not readable in restricted mode.             |
+---------------------------+----------------------------------------------+
| :const:`WRITE_RESTRICTED` | Not writable in restricted mode.             |
+---------------------------+----------------------------------------------+
| :const:`RESTRICTED`       | Not readable or writable in restricted mode. |
+---------------------------+----------------------------------------------+

.. index::
   single: READONLY
   single: READ_RESTRICTED
   single: WRITE_RESTRICTED
   single: RESTRICTED

An interesting advantage of using the :c:member:`~PyTypeObject.tp_members` table to build
descriptors that are used at runtime is that any attribute defined this way can
have an associated doc string simply by providing the text in the table.  An
application can use the introspection API to retrieve the descriptor from the
class object, and get the doc string using its :attr:`__doc__` attribute.

As with the :c:member:`~PyTypeObject.tp_methods` table, a sentinel entry with a :attr:`name` value
of ``NULL`` is required.
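
How a member table lets generic code reach a field through a stored offset can
be modelled in plain C.  This is a sketch under simplifying assumptions, not
the interpreter's implementation: ``toy_object``, ``toy_memberdef``,
``toy_get_member`` and ``toy_set_member`` are hypothetical names, only ``int``
fields are handled, and a single ``readonly`` flag stands in for the flag
constants listed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical instance layout. */
typedef struct {
    int width;
    int height;
} toy_object;

/* Hypothetical, simplified stand-in for PyMemberDef (int members only). */
typedef struct {
    const char *name;    /* attribute name; NULL marks the sentinel */
    size_t offset;       /* byte offset of the field in toy_object */
    int readonly;        /* non-zero if the attribute may not be written */
} toy_memberdef;

static const toy_memberdef toy_members[] = {
    {"width",  offsetof(toy_object, width),  0},
    {"height", offsetof(toy_object, height), 1},
    {NULL, 0, 0}         /* sentinel */
};

/* Read the named int member into *out; return 0 on success, -1 if unknown. */
static int
toy_get_member(const toy_object *obj, const char *name, int *out)
{
    for (const toy_memberdef *def = toy_members; def->name != NULL; def++) {
        if (strcmp(def->name, name) == 0) {
            memcpy(out, (const char *)obj + def->offset, sizeof *out);
            return 0;
        }
    }
    return -1;
}

/* Write the named int member; return -1 if it is unknown or read-only. */
static int
toy_set_member(toy_object *obj, const char *name, int value)
{
    for (const toy_memberdef *def = toy_members; def->name != NULL; def++) {
        if (strcmp(def->name, name) == 0) {
            if (def->readonly) {
                return -1;
            }
            memcpy((char *)obj + def->offset, &value, sizeof value);
            return 0;
        }
    }
    return -1;
}
```

The accessor functions never mention ``width`` or ``height`` directly; all
knowledge of the instance layout lives in the table, which is the property
that makes the generic mechanism possible.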

.. XXX Descriptors need to be explained in more detail somewhere, but not here.

   Descriptor objects have two handler functions which correspond to the
   \member{tp_getattro} and \member{tp_setattro} handlers.  The
   \method{__get__()} handler is a function which is passed the descriptor,
   instance, and type objects, and returns the value of the attribute, or it
   returns \NULL{} and sets an exception.  The \method{__set__()} handler is
   passed the descriptor, instance, type, and new value;


Type-specific Attribute Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For simplicity, only the :c:type:`char\*` version will be demonstrated here; the
type of the name parameter is the only difference between the :c:type:`char\*`
and :c:type:`PyObject\*` flavors of the interface. This example effectively does
the same thing as the generic example above, but does not use the generic
support added in Python 2.2.  It explains how the handler functions are
called, so that if you do need to extend their functionality, you'll understand
what needs to be done.

The :c:member:`~PyTypeObject.tp_getattr` handler is called when the object requires an attribute
look-up.  It is called in the same situations where the :meth:`__getattr__`
method of a class would be called.

Here is an example::

   static PyObject *
   newdatatype_getattr(newdatatypeobject *obj, char *name)
   {
       if (strcmp(name, "data") == 0)
       {
           return PyLong_FromLong(obj->data);
       }

       PyErr_Format(PyExc_AttributeError,
                    "'%.50s' object has no attribute '%.400s'",
                    Py_TYPE(obj)->tp_name, name);
       return NULL;
   }

The :c:member:`~PyTypeObject.tp_setattr` handler is called when the :meth:`__setattr__` or
:meth:`__delattr__` method of a class instance would be called.  When an
attribute should be deleted, the third parameter will be ``NULL``.  Here is an
example that simply raises an exception; if this were really all you wanted, the
:c:member:`~PyTypeObject.tp_setattr` handler should be set to ``NULL``. ::

   static int
   newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
   {
       PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name);
       return -1;
   }
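
The convention that a ``NULL`` value means "delete" can be sketched with a
plain C model that needs no Python headers.  Everything here is hypothetical
and chosen for illustration: ``toy_record`` is an instance with one optional
integer attribute, and ``toy_setattr`` takes a pointer to an ``int`` where a
real handler would receive a :c:type:`PyObject\*`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical instance with one optional integer attribute, "data". */
typedef struct {
    int has_data;   /* non-zero while the attribute exists */
    int data;
} toy_record;

/*
 * Model of a setattr-style handler: a NULL value means "delete the
 * attribute".  Returns 0 on success and -1 on failure, the same
 * return convention the tp_setattr example above uses.
 */
static int
toy_setattr(toy_record *obj, const char *name, const int *value)
{
    if (strcmp(name, "data") != 0) {
        return -1;              /* unknown attribute */
    }
    if (value == NULL) {
        if (!obj->has_data) {
            return -1;          /* nothing to delete */
        }
        obj->has_data = 0;
        return 0;
    }
    obj->data = *value;
    obj->has_data = 1;
    return 0;
}
```

A real handler would also set an exception before returning ``-1``; the model
leaves that out because it has no exception state to record.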

Object Comparison
-----------------

::

   richcmpfunc tp_richcompare;

The :c:member:`~PyTypeObject.tp_richcompare` handler is called when comparisons are needed.  It is
analogous to the :ref:`rich comparison methods <richcmpfuncs>`, like
:meth:`__lt__`, and also called by :c:func:`PyObject_RichCompare` and
:c:func:`PyObject_RichCompareBool`.

This function is called with two Python objects and the operator as arguments,
where the operator is one of ``Py_EQ``, ``Py_NE``, ``Py_LE``, ``Py_GE``,
``Py_LT`` or ``Py_GT``.  It should compare the two objects with respect to the
specified operator and return ``Py_True`` or ``Py_False`` if the comparison is
successful, ``Py_NotImplemented`` to indicate that comparison is not
implemented and the other object's comparison method should be tried, or ``NULL``
if an exception was set.

Here is a sample implementation, for a datatype whose instances are considered
equal if the ``size`` fields of their underlying data are equal::

   static PyObject *
   newdatatype_richcmp(PyObject *obj1, PyObject *obj2, int op)
   {
       PyObject *result;
       int c, size1, size2;

       /* code to make sure that both arguments are of type
          newdatatype omitted */

       size1 = ((newdatatypeobject *) obj1)->obj_UnderlyingDatatypePtr->size;
       size2 = ((newdatatypeobject *) obj2)->obj_UnderlyingDatatypePtr->size;

       switch (op) {
       case Py_LT: c = size1 <  size2; break;
       case Py_LE: c = size1 <= size2; break;
       case Py_EQ: c = size1 == size2; break;
       case Py_NE: c = size1 != size2; break;
       case Py_GT: c = size1 >  size2; break;
       case Py_GE: c = size1 >= size2; break;
       default:
           /* not reached for the six valid operators */
           PyErr_BadArgument();
           return NULL;
       }
       result = c ? Py_True : Py_False;
       Py_INCREF(result);
       return result;
   }


Abstract Protocol Support
-------------------------

Python supports a variety of *abstract* 'protocols;' the specific interfaces
provided to use these interfaces are documented in :ref:`abstract`.


A number of these abstract interfaces were defined early in the development of
the Python implementation.  In particular, the number, mapping, and sequence
protocols have been part of Python since the beginning.  Other protocols have
been added over time.  For protocols which depend on several handler routines
from the type implementation, the older protocols have been defined as optional
blocks of handlers referenced by the type object.  For newer protocols there are
additional slots in the main type object, with a flag bit being set to indicate
that the slots are present and should be checked by the interpreter.  (The flag
bit does not indicate that the slot values are non-``NULL``. The flag may be set
to indicate the presence of a slot, but a slot may still be unfilled.) ::

   PyNumberMethods   *tp_as_number;
   PySequenceMethods *tp_as_sequence;
   PyMappingMethods  *tp_as_mapping;

If you wish your object to be able to act like a number, a sequence, or a
mapping object, then you place the address of a structure that implements the C
type :c:type:`PyNumberMethods`, :c:type:`PySequenceMethods`, or
:c:type:`PyMappingMethods`, respectively. It is up to you to fill in this
structure with appropriate values. You can find examples of the use of each of
these in the :file:`Objects` directory of the Python source distribution. ::

   hashfunc tp_hash;

This function, if you choose to provide it, should return a hash number for an
instance of your data type. Here is a simple example::

   static Py_hash_t
   newdatatype_hash(newdatatypeobject *obj)
   {
       Py_hash_t result;
       result = obj->some_size + 32767 * obj->some_number;
       if (result == -1)
           result = -2;
       return result;
   }

:c:type:`Py_hash_t` is a signed integer type with a platform-varying width.
Returning ``-1`` from :c:member:`~PyTypeObject.tp_hash` indicates an error,
which is why you should be careful to avoid returning it when hash computation
is successful, as seen above.

::

   ternaryfunc tp_call;

This function is called when an instance of your data type is "called", for
example, if ``obj1`` is an instance of your data type and the Python script
contains ``obj1('hello')``, the :c:member:`~PyTypeObject.tp_call` handler is invoked.

This function takes three arguments:

#. *self* is the instance of the data type which is the subject of the call.
   If the call is ``obj1('hello')``, then *self* is ``obj1``.

#. *args* is a tuple containing the arguments to the call.  You can use
   :c:func:`PyArg_ParseTuple` to extract the arguments.

#. *kwds* is a dictionary of keyword arguments that were passed. If this is
   non-``NULL`` and you support keyword arguments, use
   :c:func:`PyArg_ParseTupleAndKeywords` to extract the arguments.  If you
   do not want to support keyword arguments and this is non-``NULL``, raise a
   :exc:`TypeError` with a message saying that keyword arguments are not supported.

Here is a toy ``tp_call`` implementation::

   static PyObject *
   newdatatype_call(newdatatypeobject *self, PyObject *args, PyObject *kwds)
   {
       PyObject *result;
       const char *arg1;
       const char *arg2;
       const char *arg3;

       if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
           return NULL;
       }
       result = PyUnicode_FromFormat(
           "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n",
           self->obj_UnderlyingDatatypePtr->size,
           arg1, arg2, arg3);
       return result;
   }

::

   /* Iterators */
   getiterfunc tp_iter;
   iternextfunc tp_iternext;

These functions provide support for the iterator protocol.  Both handlers
take exactly one parameter, the instance for which they are being called,
and return a new reference.  In the case of an error, they should set an
exception and return ``NULL``.  :c:member:`~PyTypeObject.tp_iter` corresponds
to the Python :meth:`__iter__` method, while :c:member:`~PyTypeObject.tp_iternext`
corresponds to the Python :meth:`~iterator.__next__` method.

Any :term:`iterable` object must implement the :c:member:`~PyTypeObject.tp_iter`
handler, which must return an :term:`iterator` object.  Here the same guidelines
apply as for Python classes:

* For collections (such as lists and tuples) which can support multiple
  independent iterators, a new iterator should be created and returned by
  each call to :c:member:`~PyTypeObject.tp_iter`.
* Objects which can only be iterated over once (usually due to side effects of
  iteration, such as file objects) can implement :c:member:`~PyTypeObject.tp_iter`
  by returning a new reference to themselves -- and should also therefore
  implement the :c:member:`~PyTypeObject.tp_iternext`  handler.

Any :term:`iterator` object should implement both :c:member:`~PyTypeObject.tp_iter`
and :c:member:`~PyTypeObject.tp_iternext`.  An iterator's
:c:member:`~PyTypeObject.tp_iter` handler should return a new reference
to the iterator.  Its :c:member:`~PyTypeObject.tp_iternext` handler should
return a new reference to the next object in the iteration, if there is one.
If the iteration has reached the end, :c:member:`~PyTypeObject.tp_iternext`
may return ``NULL`` without setting an exception, or it may set
:exc:`StopIteration` *in addition* to returning ``NULL``; avoiding
the exception can yield slightly better performance.  If an actual error
occurs, :c:member:`~PyTypeObject.tp_iternext` should always set an exception
and return ``NULL``.
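
The two meanings of a ``NULL`` return from a next-item handler can be modelled
in plain C without Python headers.  This is a rough sketch only:
``toy_range_iter``, ``toy_iternext`` and ``toy_sum`` are hypothetical names,
an ``error`` flag stands in for "an exception is set", and a ``broken`` flag
exists purely to simulate a failure.

```c
#include <stddef.h>

/* Hypothetical iterator over the integers current, current+1, ..., stop-1. */
typedef struct {
    int current;
    int stop;
    int value;      /* storage for the item most recently returned */
    int error;      /* stand-in for "an exception is set" */
    int broken;     /* when non-zero, the next call fails */
} toy_range_iter;

/*
 * Model of a tp_iternext-style handler.  Returns a pointer to the next
 * value, or NULL either at the end of the iteration (error stays 0) or
 * on failure (error is set to 1).
 */
static const int *
toy_iternext(toy_range_iter *it)
{
    if (it->broken) {
        it->error = 1;          /* real error: flag it and return NULL */
        return NULL;
    }
    if (it->current >= it->stop) {
        return NULL;            /* exhausted: NULL without an error */
    }
    it->value = it->current++;
    return &it->value;
}

/* Consume the iterator; return 0 and store the sum, or -1 on error. */
static int
toy_sum(toy_range_iter *it, int *sum)
{
    const int *item;

    *sum = 0;
    while ((item = toy_iternext(it)) != NULL) {
        *sum += *item;
    }
    /* NULL ended the loop; the flag tells exhaustion apart from failure. */
    return it->error ? -1 : 0;
}
```

The consumer has to check the error indicator after the loop, because the
``NULL`` return alone does not say whether the iteration finished normally.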


.. _weakref-support:

Weak Reference Support
----------------------

One of the goals of Python's weak reference implementation is to allow any type
to participate in the weak reference mechanism without incurring the overhead on
performance-critical objects (such as numbers).

.. seealso::
   Documentation for the :mod:`weakref` module.

For an object to be weakly referenceable, the extension type must do two things:

#. Include a :c:type:`PyObject\*` field in the C object structure dedicated to
   the weak reference mechanism.  The object's constructor should leave it
   ``NULL`` (which is automatic when using the default
   :c:member:`~PyTypeObject.tp_alloc`).

#. Set the :c:member:`~PyTypeObject.tp_weaklistoffset` type member
   to the offset of the aforementioned field in the C object structure,
   so that the interpreter knows how to access and modify that field.

Concretely, here is how a trivial object structure would be augmented
with the required field::

   typedef struct {
       PyObject_HEAD
       PyObject *weakreflist;  /* List of weak references */
   } TrivialObject;

And the corresponding member in the statically-declared type object::

   static PyTypeObject TrivialType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       /* ... other members omitted for brevity ... */
       .tp_weaklistoffset = offsetof(TrivialObject, weakreflist),
   };

The only further addition is that ``tp_dealloc`` needs to clear any weak
references (by calling :c:func:`PyObject_ClearWeakRefs`) if the field is
non-``NULL``::

   static void
   Trivial_dealloc(TrivialObject *self)
   {
       /* Clear weakrefs first before calling any destructors */
       if (self->weakreflist != NULL)
           PyObject_ClearWeakRefs((PyObject *) self);
       /* ... remainder of destruction code omitted for brevity ... */
       Py_TYPE(self)->tp_free((PyObject *) self);
   }


More Suggestions
----------------

In order to learn how to implement any specific method for your new data type,
get the :term:`CPython` source code.  Go to the :file:`Objects` directory,
then search the C source files for ``tp_`` plus the function you want
(for example, ``tp_richcompare``).  You will find examples of the function
you want to implement.

When you need to verify that an object is a concrete instance of the type you
are implementing, use the :c:func:`PyObject_TypeCheck` function.  A sample of
its use might be something like the following::

   if (!PyObject_TypeCheck(some_object, &MyType)) {
       PyErr_SetString(PyExc_TypeError, "arg #1 not a mything");
       return NULL;
   }

.. seealso::
   Download CPython source releases.
      https://www.python.org/downloads/source/

   The CPython project on GitHub, where the CPython source code is developed.
      https://github.com/python/cpython
617   Download CPython source releases.
618      https://www.python.org/downloads/source/
619
620   The CPython project on GitHub, where the CPython source code is developed.
621      https://github.com/python/cpython
622