+++++++++++++++++++++++++++++++++++++++++++
 Building Hybrid Systems with Boost.Python
+++++++++++++++++++++++++++++++++++++++++++
4
5:Author: David Abrahams
6:Contact: dave@boost-consulting.com
7:organization: `Boost Consulting`_
8:date: 2003-05-14
9
10:Author: Ralf W. Grosse-Kunstleve
11
12:copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved
13
14.. contents:: Table of Contents
15
16.. _`Boost Consulting`: http://www.boost-consulting.com
17
18==========
19 Abstract
20==========
21
22Boost.Python is an open source C++ library which provides a concise
23IDL-like interface for binding C++ classes and functions to
24Python. Leveraging the full power of C++ compile-time introspection
25and of recently developed metaprogramming techniques, this is achieved
26entirely in pure C++, without introducing a new syntax.
27Boost.Python's rich set of features and high-level interface make it
28possible to engineer packages from the ground up as hybrid systems,
29giving programmers easy and coherent access to both the efficient
30compile-time polymorphism of C++ and the extremely convenient run-time
31polymorphism of Python.
32
33==============
34 Introduction
35==============
36
37Python and C++ are in many ways as different as two languages could
38be: while C++ is usually compiled to machine-code, Python is
39interpreted.  Python's dynamic type system is often cited as the
40foundation of its flexibility, while in C++ static typing is the
41cornerstone of its efficiency. C++ has an intricate and difficult
42compile-time meta-language, while in Python, practically everything
43happens at runtime.
44
45Yet for many programmers, these very differences mean that Python and
46C++ complement one another perfectly.  Performance bottlenecks in
47Python programs can be rewritten in C++ for maximal speed, and
48authors of powerful C++ libraries choose Python as a middleware
49language for its flexible system integration capabilities.
50Furthermore, the surface differences mask some strong similarities:
51
52* 'C'-family control structures (if, while, for...)
53
54* Support for object-orientation, functional programming, and generic
55  programming (these are both *multi-paradigm* programming languages.)
56
57* Comprehensive operator overloading facilities, recognizing the
58  importance of syntactic variability for readability and
59  expressivity.
60
61* High-level concepts such as collections and iterators.
62
63* High-level encapsulation facilities (C++: namespaces, Python: modules)
64  to support the design of re-usable libraries.
65
66* Exception-handling for effective management of error conditions.
67
* C++ idioms in common use, such as handle/body classes and
  reference-counted smart pointers, mirror Python reference semantics.
70
71Given Python's rich 'C' interoperability API, it should in principle
72be possible to expose C++ type and function interfaces to Python with
73an analogous interface to their C++ counterparts.  However, the
74facilities provided by Python alone for integration with C++ are
75relatively meager.  Compared to C++ and Python, 'C' has only very
76rudimentary abstraction facilities, and support for exception-handling
77is completely missing.  'C' extension module writers are required to
78manually manage Python reference counts, which is both annoyingly
79tedious and extremely error-prone. Traditional extension modules also
80tend to contain a great deal of boilerplate code repetition which
81makes them difficult to maintain, especially when wrapping an evolving
82API.
83
These limitations have led to the development of a variety of wrapping
85systems.  SWIG_ is probably the most popular package for the
86integration of C/C++ and Python. A more recent development is SIP_,
87which was specifically designed for interfacing Python with the Qt_
88graphical user interface library.  Both SWIG and SIP introduce their
89own specialized languages for customizing inter-language bindings.
90This has certain advantages, but having to deal with three different
91languages (Python, C/C++ and the interface language) also introduces
92practical and mental difficulties.  The CXX_ package demonstrates an
93interesting alternative.  It shows that at least some parts of
94Python's 'C' API can be wrapped and presented through a much more
95user-friendly C++ interface. However, unlike SWIG and SIP, CXX does
96not include support for wrapping C++ classes as new Python types.
97
98The features and goals of Boost.Python_ overlap significantly with
99many of these other systems.  That said, Boost.Python attempts to
100maximize convenience and flexibility without introducing a separate
101wrapping language.  Instead, it presents the user with a high-level
102C++ interface for wrapping C++ classes and functions, managing much of
103the complexity behind-the-scenes with static metaprogramming.
104Boost.Python also goes beyond the scope of earlier systems by
105providing:
106
107* Support for C++ virtual functions that can be overridden in Python.
108
109* Comprehensive lifetime management facilities for low-level C++
110  pointers and references.
111
112* Support for organizing extensions as Python packages,
113  with a central registry for inter-language type conversions.
114
115* A safe and convenient mechanism for tying into Python's powerful
116  serialization engine (pickle).
117
118* Coherence with the rules for handling C++ lvalues and rvalues that
119  can only come from a deep understanding of both the Python and C++
120  type systems.
121
122The key insight that sparked the development of Boost.Python is that
123much of the boilerplate code in traditional extension modules could be
124eliminated using C++ compile-time introspection.  Each argument of a
125wrapped C++ function must be extracted from a Python object using a
126procedure that depends on the argument type.  Similarly the function's
127return type determines how the return value will be converted from C++
128to Python.  Of course argument and return types are part of each
129function's type, and this is exactly the source from which
130Boost.Python deduces most of the information required.
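
The deduction step can be illustrated with a minimal sketch in standard C++ (C++11 assumed). This is not Boost.Python's implementation: strings play the role of Python objects, and ``from_text``, ``to_text``, ``call_with_text``, ``twice``, and ``half`` are hypothetical names introduced only for this illustration. The point is that the argument type ``A`` and return type ``R`` are read off the function's type, so the caller never states which conversions to apply.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a from-Python converter: parse a T from text.
template <class T>
T from_text(std::string const& text)
{
    std::istringstream in(text);
    T value = T();
    in >> value;
    return value;
}

// Hypothetical stand-in for a to-Python converter: format a T as text.
template <class T>
std::string to_text(T const& value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// R and A are deduced from the type of f, so the caller never has to
// say which conversions are needed.
template <class R, class A>
std::string call_with_text(R (*f)(A), std::string const& arg)
{
    return to_text<R>(f(from_text<A>(arg)));
}

int twice(int x) { return 2 * x; }
double half(double x) { return x / 2; }

// Usage: the same wrapper selects int or double conversions by itself.
void demo()
{
    assert(call_with_text(twice, "21") == "42");
    assert(call_with_text(half, "3") == "1.5");
}
```

A real wrapping layer has to handle any number of arguments, references, and user-defined types; the sketch covers only unary functions taking and returning streamable values.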
131
132This approach leads to *user guided wrapping*: as much information is
133extracted directly from the source code to be wrapped as is possible
134within the framework of pure C++, and some additional information is
135supplied explicitly by the user.  Mostly the guidance is mechanical
136and little real intervention is required.  Because the interface
137specification is written in the same full-featured language as the
138code being exposed, the user has unprecedented power available when
139she does need to take control.
140
141.. _Python: http://www.python.org/
142.. _SWIG: http://www.swig.org/
143.. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php
144.. _Qt: http://www.trolltech.com/
145.. _CXX: http://cxx.sourceforge.net/
146.. _Boost.Python: http://www.boost.org/libs/python/doc
147
148===========================
149 Boost.Python Design Goals
150===========================
151
152The primary goal of Boost.Python is to allow users to expose C++
153classes and functions to Python using nothing more than a C++
154compiler.  In broad strokes, the user experience should be one of
155directly manipulating C++ objects from Python.
156
157However, it's also important not to translate all interfaces *too*
158literally: the idioms of each language must be respected.  For
159example, though C++ and Python both have an iterator concept, they are
160expressed very differently.  Boost.Python has to be able to bridge the
161interface gap.
162
163It must be possible to insulate Python users from crashes resulting
164from trivial misuses of C++ interfaces, such as accessing
already-deleted objects.  By the same token, the library should
insulate C++ users from the low-level Python 'C' API, replacing
167error-prone 'C' interfaces like manual reference-count management and
168raw ``PyObject`` pointers with more-robust alternatives.
169
170Support for component-based development is crucial, so that C++ types
171exposed in one extension module can be passed to functions exposed in
172another without loss of crucial information like C++ inheritance
173relationships.
174
175Finally, all wrapping must be *non-intrusive*, without modifying or
176even seeing the original C++ source code.  Existing C++ libraries have
177to be wrappable by third parties who only have access to header files
178and binaries.
179
180==========================
181 Hello Boost.Python World
182==========================
183
184And now for a preview of Boost.Python, and how it improves on the raw
185facilities offered by Python. Here's a function we might want to
186expose::
187
    #include <stdexcept>

    char const* greet(unsigned x)
    {
       static char const* const msgs[] = { "hello", "Boost.Python", "world!" };

       if (x > 2)
           throw std::range_error("greet: index out of range");

       return msgs[x];
    }
197
198To wrap this function in standard C++ using the Python 'C' API, we'd
199need something like this::
200
    #include <Python.h>

    extern "C" // all Python interactions use 'C' linkage and calling convention
    {
        // Wrapper to handle argument/result conversion and checking
        PyObject* greet_wrap(PyObject* self, PyObject* args)
        {
             int x;
             if (PyArg_ParseTuple(args, "i", &x))    // extract/check arguments
             {
                 char const* result = greet(x);      // invoke wrapped function
                 return PyString_FromString(result); // convert result to Python
             }
             return 0;                               // error occurred
        }

        // Table of wrapped functions to be exposed by the module
        static PyMethodDef methods[] = {
            { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
            , { NULL, NULL, 0, NULL } // sentinel
        };

        // module initialization function (Python 2 naming: init<module name>)
        DL_EXPORT(void) inithello()
        {
            (void) Py_InitModule("hello", methods); // add the methods to the module
        }
    }
227
228Now here's the wrapping code we'd use to expose it with Boost.Python::
229
230    #include <boost/python.hpp>
231    using namespace boost::python;
232    BOOST_PYTHON_MODULE(hello)
233    {
234        def("greet", greet, "return one of 3 parts of a greeting");
235    }
236
237and here it is in action::
238
239    >>> import hello
240    >>> for x in range(3):
241    ...     print hello.greet(x)
242    ...
243    hello
244    Boost.Python
245    world!
246
247Aside from the fact that the 'C' API version is much more verbose,
248it's worth noting a few things that it doesn't handle correctly:
249
250* The original function accepts an unsigned integer, and the Python
251  'C' API only gives us a way of extracting signed integers. The
252  Boost.Python version will raise a Python exception if we try to pass
253  a negative number to ``hello.greet``, but the other one will proceed
  to do whatever the C++ implementation does when converting a
  negative integer to unsigned (usually wrapping to some very large
256  number), and pass the incorrect translation on to the wrapped
257  function.
258
259* That brings us to the second problem: if the C++ ``greet()``
260  function is called with a number greater than 2, it will throw an
261  exception.  Typically, if a C++ exception propagates across the
262  boundary with code generated by a 'C' compiler, it will cause a
263  crash.  As you can see in the first version, there's no C++
264  scaffolding there to prevent this from happening.  Functions wrapped
265  by Boost.Python automatically include an exception-handling layer
266  which protects Python users by translating unhandled C++ exceptions
267  into a corresponding Python exception.
268
269* A slightly more-subtle limitation is that the argument conversion
270  used in the Python 'C' API case can only get that integer ``x`` in
271  *one way*.  PyArg_ParseTuple can't convert Python ``long`` objects
272  (arbitrary-precision integers) which happen to fit in an ``unsigned
273  int`` but not in a ``signed long``, nor will it ever handle a
274  wrapped C++ class with a user-defined implicit ``operator unsigned
275  int()`` conversion. Boost.Python's dynamic type conversion
276  registry allows users to add arbitrary conversion methods.
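
The first two problems can be made concrete with a short sketch in standard C++ (C++11 assumed). It does not use the Python 'C' API or Boost.Python; ``greet_guarded`` is a hypothetical function showing the kind of range check and exception-handling layer that a hand-written wrapper would need and that the 'C' API version above lacks.

```cpp
#include <cassert>
#include <exception>
#include <limits>
#include <stdexcept>
#include <string>

// The function from the example above.
char const* greet(unsigned x)
{
    static char const* const msgs[] = { "hello", "Boost.Python", "world!" };
    if (x > 2)
        throw std::range_error("greet: index out of range");
    return msgs[x];
}

// Hypothetical guarded entry point.  It reports failure through its return
// value, as a 'C'-style API would, so no C++ exception leaves the function.
bool greet_guarded(long long x, std::string& result, std::string& error)
{
    try
    {
        // Reject values that would silently wrap when converted to unsigned.
        if (x < 0 || x > static_cast<long long>(std::numeric_limits<unsigned>::max()))
        {
            error = "argument does not fit in unsigned int";
            return false;
        }
        result = greet(static_cast<unsigned>(x));
        return true;
    }
    catch (std::exception const& e)
    {
        error = e.what();  // translate the exception into an error report
        return false;
    }
    catch (...)
    {
        error = "unknown C++ exception";
        return false;
    }
}

void demo()
{
    // Without a check, -1 converts to the largest unsigned value.
    assert(static_cast<unsigned>(-1) == std::numeric_limits<unsigned>::max());

    std::string result, error;
    assert(greet_guarded(1, result, error) && result == "Boost.Python");
    assert(!greet_guarded(-1, result, error));
    assert(!greet_guarded(3, result, error) && error == "greet: index out of range");
}
```

Writing this scaffolding by hand for every wrapped function is exactly the kind of boilerplate the library is meant to remove.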
277
278==================
279 Library Overview
280==================
281
This section outlines some of the library's major features.  Except as
necessary to avoid confusion, details of library implementation are
omitted.
285
286------------------
287 Exposing Classes
288------------------
289
290C++ classes and structs are exposed with a similarly-terse interface.
291Given::
292
293    struct World
294    {
295        void set(std::string msg) { this->msg = msg; }
296        std::string greet() { return msg; }
297        std::string msg;
298    };
299
300The following code will expose it in our extension module::
301
    #include <boost/python.hpp>
    using namespace boost::python;

    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World")
            .def("greet", &World::greet)
            .def("set", &World::set)
        ;
    }
310
Although this code has a certain pythonic familiarity, people
sometimes find the syntax a bit confusing because it doesn't look like
most of the C++ code they're used to. All the same, this is just
standard C++.  Because of their flexible syntax and operator
overloading, C++ and Python are great for defining domain-specific
(sub)languages (DSLs), and that's what we've done in Boost.Python.
To break it down::
318
319    class_<World>("World")
320
321constructs an unnamed object of type ``class_<World>`` and passes
322``"World"`` to its constructor.  This creates a new-style Python class
323called ``World`` in the extension module, and associates it with the
324C++ type ``World`` in the Boost.Python type conversion registry.  We
325might have also written::
326
327    class_<World> w("World");
328
329but that would've been more verbose, since we'd have to name ``w``
330again to invoke its ``def()`` member function::
331
332        w.def("greet", &World::greet)
333
334There's nothing special about the location of the dot for member
335access in the original example: C++ allows any amount of whitespace on
336either side of a token, and placing the dot at the beginning of each
337line allows us to chain as many successive calls to member functions
338as we like with a uniform syntax.  The other key fact that allows
339chaining is that ``class_<>`` member functions all return a reference
340to ``*this``.
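
The chaining mechanism itself needs nothing from Boost.Python. The following sketch in standard C++ (C++11 assumed) uses a hypothetical ``method_list`` class, which only records names, to show how returning a reference to ``*this`` makes the leading-dot style work on an unnamed temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a chainable interface; not Boost.Python's class_.
class method_list
{
 public:
    explicit method_list(std::string class_name)
        : class_name_(std::move(class_name)) {}

    // Returning *this by reference is what makes chaining possible.
    method_list& def(std::string method_name)
    {
        names_.push_back(std::move(method_name));
        return *this;
    }

    std::string const& class_name() const { return class_name_; }
    std::vector<std::string> const& names() const { return names_; }

 private:
    std::string class_name_;
    std::vector<std::string> names_;
};

void demo()
{
    // Chained form on an unnamed temporary, dot at the start of each line.
    std::size_t chained = method_list("World")
        .def("greet")
        .def("set")
        .names().size();

    // Equivalent form with a named object.
    method_list w("World");
    w.def("greet");
    w.def("set");

    assert(chained == 2);
    assert(w.names().size() == chained);
    assert(w.class_name() == "World");
}
```

The temporary in the chained form lives until the end of the full expression, which is why every call in the chain can safely operate on it.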
341
342So the example is equivalent to::
343
344    class_<World> w("World");
345    w.def("greet", &World::greet);
346    w.def("set", &World::set);
347
348It's occasionally useful to be able to break down the components of a
349Boost.Python class wrapper in this way, but the rest of this article
350will stick to the terse syntax.
351
352For completeness, here's the wrapped class in use: ::
353
354    >>> import hello
355    >>> planet = hello.World()
356    >>> planet.set('howdy')
357    >>> planet.greet()
358    'howdy'
359
360Constructors
361============
362
363Since our ``World`` class is just a plain ``struct``, it has an
364implicit no-argument (nullary) constructor.  Boost.Python exposes the
365nullary constructor by default, which is why we were able to write: ::
366
367  >>> planet = hello.World()
368
369However, well-designed classes in any language may require constructor
arguments in order to establish their invariants.  Unlike in Python,
where ``__init__`` is just a specially-named method, constructors in
C++ cannot be handled like ordinary member functions.  In
373particular, we can't take their address: ``&World::World`` is an
374error.  The library provides a different interface for specifying
375constructors.  Given::
376
377    struct World
378    {
379        World(std::string msg); // added constructor
380        ...
381
382we can modify our wrapping code as follows::
383
384    class_<World>("World", init<std::string>())
385        ...
386
Of course, a C++ class may have additional constructors, and we can
expose those as well by passing more instances of ``init<...>`` to
``def()``::
390
391    class_<World>("World", init<std::string>())
392        .def(init<double, double>())
393        ...
394
395Boost.Python allows wrapped functions, member functions, and
396constructors to be overloaded to mirror C++ overloading.
397
398Data Members and Properties
399===========================
400
401Any publicly-accessible data members in a C++ class can be easily
402exposed as either ``readonly`` or ``readwrite`` attributes::
403
404    class_<World>("World", init<std::string>())
405        .def_readonly("msg", &World::msg)
406        ...
407
408and can be used directly in Python: ::
409
410    >>> planet = hello.World('howdy')
411    >>> planet.msg
412    'howdy'
413
414This does *not* result in adding attributes to the ``World`` instance
415``__dict__``, which can result in substantial memory savings when
416wrapping large data structures.  In fact, no instance ``__dict__``
417will be created at all unless attributes are explicitly added from
418Python. Boost.Python owes this capability to the new Python 2.2 type
419system, in particular the descriptor interface and ``property`` type.
420
421In C++, publicly-accessible data members are considered a sign of poor
422design because they break encapsulation, and style guides usually
423dictate the use of "getter" and "setter" functions instead.  In
424Python, however, ``__getattr__``, ``__setattr__``, and since 2.2,
425``property`` mean that attribute access is just one more
426well-encapsulated syntactic tool at the programmer's disposal.
427Boost.Python bridges this idiomatic gap by making Python ``property``
428creation directly available to users.  If ``msg`` were private, we
could still expose it as an attribute in Python as follows::
430
431    class_<World>("World", init<std::string>())
432        .add_property("msg", &World::greet, &World::set)
433        ...
434
435The example above mirrors the familiar usage of properties in Python
4362.2+: ::
437
    >>> class World(object):
    ...     def __init__(self, msg):
    ...         self.__msg = msg
    ...     def greet(self):
    ...         return self.__msg
    ...     def set(self, msg):
    ...         self.__msg = msg
    ...     msg = property(greet, set)
446
447Operator Overloading
448====================
449
450The ability to write arithmetic operators for user-defined types has
451been a major factor in the success of both languages for numerical
452computation, and the success of packages like NumPy_ attests to the
453power of exposing operators in extension modules.  Boost.Python
454provides a concise mechanism for wrapping operator overloads. The
455example below shows a fragment from a wrapper for the Boost rational
456number library::
457
    class_<rational<int> >("rational_int")
      .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
      .def("numerator", &rational<int>::numerator)
      .def("denominator", &rational<int>::denominator)
      .def(-self)        // __neg__ (unary minus)
      .def(self + self)  // __add__ (homogeneous)
      .def(self * self)  // __mul__
      .def(self + int()) // __add__ (heterogeneous)
      .def(int() + self) // __radd__
      ...
468
469The magic is performed using a simplified application of "expression
470templates" [VELD1995]_, a technique originally developed for
471optimization of high-performance matrix algebra expressions.  The
472essence is that instead of performing the computation immediately,
473operators are overloaded to construct a type *representing* the
474computation.  In matrix algebra, dramatic optimizations are often
475available when the structure of an entire expression can be taken into
476account, rather than evaluating each operation "greedily".
477Boost.Python uses the same technique to build an appropriate Python
478method object based on expressions involving ``self``.
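
A stripped-down sketch of the idea in standard C++ (C++11 assumed) follows. It is not the library's implementation: ``self_t``, the ``*_expr`` types, and ``method_name`` are hypothetical names, and the sketch only maps an expression to the name of the Python special method it stands for, rather than building a callable method object.

```cpp
#include <cassert>
#include <string>

struct self_t {};                            // placeholder, like the article's self
struct neg_expr {};                          // represents  -self
template <class Other> struct add_expr {};   // represents  self + Other
template <class Other> struct radd_expr {};  // represents  Other + self

const self_t self = self_t();

// The operators perform no arithmetic; they only return a type that
// records which operation was written.
inline neg_expr operator-(self_t) { return neg_expr(); }
inline add_expr<self_t> operator+(self_t, self_t) { return add_expr<self_t>(); }
template <class T> add_expr<T> operator+(self_t, T const&) { return add_expr<T>(); }
template <class T> radd_expr<T> operator+(T const&, self_t) { return radd_expr<T>(); }

// Hypothetical mapping from expression type to Python special method name.
inline std::string method_name(neg_expr) { return "__neg__"; }
template <class T> std::string method_name(add_expr<T>) { return "__add__"; }
template <class T> std::string method_name(radd_expr<T>) { return "__radd__"; }

void demo()
{
    assert(method_name(-self) == "__neg__");
    assert(method_name(self + self) == "__add__");
    assert(method_name(self + int()) == "__add__");
    assert(method_name(int() + self) == "__radd__");
}
```

Because the operand types are part of the expression's type (``add_expr<int>`` versus ``add_expr<self_t>``), a ``def()`` overload receiving such an object has enough compile-time information to generate the matching wrapper.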
479
480.. _NumPy: http://www.pfdubois.com/numpy/
481
482Inheritance
483===========
484
485C++ inheritance relationships can be represented to Boost.Python by adding
486an optional ``bases<...>`` argument to the ``class_<...>`` template
487parameter list as follows::
488
489     class_<Derived, bases<Base1,Base2> >("Derived")
490          ...
491
492This has two effects:
493
1. When the ``class_<...>`` is created, Python type objects
   corresponding to ``Base1`` and ``Base2`` are looked up in
   Boost.Python's registry, and are used as bases for the new Python
   ``Derived`` type object, so methods exposed for the Python ``Base1``
   and ``Base2`` types are automatically members of the ``Derived``
   type.  Because the registry is global, this works correctly even if
   ``Derived`` is exposed in a different module from either of its
   bases.

2. C++ conversions from ``Derived`` to its bases are added to the
   Boost.Python registry.  Thus wrapped C++ methods expecting (a
   pointer or reference to) an object of either base type can be
   called with an object wrapping a ``Derived`` instance.  Wrapped
   member functions of class ``T`` are treated as though they have an
   implicit first argument of ``T&``, so these conversions are
   necessary to allow the base class methods to be called for derived
   objects.
511
Of course it's possible to derive new Python classes from wrapped C++
classes.  Because Boost.Python uses the new-style class
system, that works very much as for the Python built-in types.  There
is one significant detail in which it differs: the built-in types
generally establish their invariants in their ``__new__`` function, so
that derived classes do not need to call ``__init__`` on the base
class before invoking its methods: ::
519
520    >>> class L(list):
521    ...      def __init__(self):
522    ...          pass
523    ...
524    >>> L().reverse()
525    >>>
526
527Because C++ object construction is a one-step operation, C++ instance
528data cannot be constructed until the arguments are available, in the
529``__init__`` function: ::
530
531    >>> class D(SomeBoostPythonClass):
532    ...      def __init__(self):
533    ...          pass
534    ...
535    >>> D().some_boost_python_method()
536    Traceback (most recent call last):
537      File "<stdin>", line 1, in ?
538    TypeError: bad argument type for built-in operation
539
540This happened because Boost.Python couldn't find instance data of type
541``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__``
542function masked construction of the base class.  It could be corrected
543by either removing ``D``'s ``__init__`` function or having it call
544``SomeBoostPythonClass.__init__(...)`` explicitly.
545
546Virtual Functions
547=================
548
549Deriving new types in Python from extension classes is not very
550interesting unless they can be used polymorphically from C++.  In
551other words, Python method implementations should appear to override
552the implementation of C++ virtual functions when called *through base
553class pointers/references from C++*.  Since the only way to alter the
554behavior of a virtual function is to override it in a derived class,
555the user must build a special derived class to dispatch a polymorphic
556class' virtual functions::
557
    #include <string>

    //
    // interface to wrap:
    //
    class Base
    {
     public:
        // const, so that calls_f can invoke it through a const reference
        virtual int f(std::string x) const { return 42; }
        virtual ~Base();
    };

    int calls_f(Base const& b, std::string x) { return b.f(x); }

    //
    // Wrapping Code
    //
    #include <boost/python.hpp>
    using namespace boost::python;

    // Dispatcher class
    struct BaseWrap : Base
    {
        // Store a pointer to the Python object
        BaseWrap(PyObject* self_) : self(self_) {}
        PyObject* self;

        // Default implementation, for when f is not overridden
        int f_default(std::string x) const { return this->Base::f(x); }
        // Dispatch implementation
        int f(std::string x) const { return call_method<int>(self, "f", x); }
    };

    ...
        def("calls_f", calls_f);
        class_<Base, BaseWrap>("Base")
            .def("f", &Base::f, &BaseWrap::f_default)
            ;
593Now here's some Python code which demonstrates: ::
594
595    >>> class Derived(Base):
596    ...     def f(self, s):
597    ...          return len(s)
598    ...
599    >>> calls_f(Base(), 'foo')
600    42
601    >>> calls_f(Derived(), 'forty-two')
602    9
603
604Things to notice about the dispatcher class:
605
* The key element which allows overriding in Python is the
  ``call_method`` invocation, which uses the same global type
  conversion registry as the C++ function wrapping does to convert its
  arguments from C++ to Python and its return value from Python to C++.

* Any constructor signatures you wish to wrap must be replicated with
  an initial ``PyObject*`` argument.

* The dispatcher must store this argument so that it can be used to
  invoke ``call_method``.

* The ``f_default`` member function is needed when the function being
  exposed is not pure virtual; there's no other way ``Base::f`` can be
  called on an object of type ``BaseWrap``, since it overrides ``f``.
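
The shape of the dispatcher can be tried out without a Python interpreter. In the sketch below (standard C++11), the hypothetical ``script_object`` stands in for the Python instance and a ``std::function`` stands in for a Python-level override of ``f``. It is a simplification: the real ``call_method`` looks the method up by name on the Python object, whereas here the fallback to ``f_default`` is written out explicitly.

```cpp
#include <cassert>
#include <functional>
#include <string>

class Base
{
 public:
    virtual int f(std::string) const { return 42; }
    virtual ~Base() {}
};

int calls_f(Base const& b, std::string x) { return b.f(x); }

// Hypothetical stand-in for a Python instance that may override "f".
struct script_object
{
    std::function<int(std::string)> f_override;
};

// Dispatcher: a C++ subclass that forwards virtual calls to the script side.
struct BaseWrap : Base
{
    explicit BaseWrap(script_object* self_) : self(self_) {}
    script_object* self;

    // Default implementation, for when f is not overridden.
    int f_default(std::string x) const { return this->Base::f(x); }

    // Dispatch implementation.
    int f(std::string x) const override
    {
        if (self->f_override)
            return self->f_override(x);
        return f_default(x);
    }
};

void demo()
{
    script_object plain;                  // no override installed
    BaseWrap b(&plain);
    assert(calls_f(b, "foo") == 42);      // falls back to Base::f

    script_object derived;
    derived.f_override = [](std::string s) { return static_cast<int>(s.size()); };
    BaseWrap d(&derived);
    assert(calls_f(d, "forty-two") == 9); // reaches the override through Base const&
}
```

The essential point carries over: ``calls_f`` knows only about ``Base``, yet the override supplied from outside C++ is the one that runs.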
620
621Deeper Reflection on the Horizon?
622=================================
623
624Admittedly, this formula is tedious to repeat, especially on a project
with many polymorphic classes.  That it is necessary reflects some
626limitations in C++'s compile-time introspection capabilities: there's
627no way to enumerate the members of a class and find out which are
628virtual functions.  At least one very promising project has been
629started to write a front-end which can generate these dispatchers (and
630other wrapping code) automatically from C++ headers.
631
632Pyste_ is being developed by Bruno da Silva de Oliveira.  It builds on
633GCC_XML_, which generates an XML version of GCC's internal program
634representation.  Since GCC is a highly-conformant C++ compiler, this
635ensures correct handling of the most-sophisticated template code and
636full access to the underlying type system.  In keeping with the
637Boost.Python philosophy, a Pyste interface description is neither
638intrusive on the code being wrapped, nor expressed in some unfamiliar
639language: instead it is a 100% pure Python script.  If Pyste is
640successful it will mark a move away from wrapping everything directly
641in C++ for many of our users.  It will also allow us the choice to
642shift some of the metaprogram code from C++ to Python.  We expect that
643soon, not only our users but the Boost.Python developers themselves
644will be "thinking hybrid" about their own code.
645
646.. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html
647.. _`Pyste`: http://www.boost.org/libs/python/pyste
648
649---------------
650 Serialization
651---------------
652
653*Serialization* is the process of converting objects in memory to a
654form that can be stored on disk or sent over a network connection. The
655serialized object (most often a plain string) can be retrieved and
656converted back to the original object. A good serialization system will
657automatically convert entire object hierarchies. Python's standard
658``pickle`` module is just such a system.  It leverages the language's strong
659runtime introspection facilities for serializing practically arbitrary
660user-defined objects. With a few simple and unintrusive provisions this
661powerful machinery can be extended to also work for wrapped C++ objects.
662Here is an example::
663
664    #include <string>
665
666    struct World
667    {
668        World(std::string a_msg) : msg(a_msg) {}
669        std::string greet() const { return msg; }
670        std::string msg;
671    };
672
673    #include <boost/python.hpp>
674    using namespace boost::python;
675
676    struct World_picklers : pickle_suite
677    {
678      static tuple
679      getinitargs(World const& w) { return make_tuple(w.greet()); }
680    };
681
682    BOOST_PYTHON_MODULE(hello)
683    {
684        class_<World>("World", init<std::string>())
685            .def("greet", &World::greet)
686            .def_pickle(World_picklers())
687        ;
688    }
689
690Now let's create a ``World`` object and put it to rest on disk::
691
692    >>> import hello
693    >>> import pickle
694    >>> a_world = hello.World("howdy")
695    >>> pickle.dump(a_world, open("my_world", "w"))
696
697In a potentially *different script* on a potentially *different
698computer* with a potentially *different operating system*::
699
700    >>> import pickle
701    >>> resurrected_world = pickle.load(open("my_world", "r"))
702    >>> resurrected_world.greet()
703    'howdy'
704
705Of course the ``cPickle`` module can also be used for faster
706processing.
707
Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol
defined in the standard Python documentation. Like a ``__getinitargs__``
function in Python, the ``pickle_suite``'s ``getinitargs()`` is responsible for
creating the argument tuple that will be used to reconstruct the pickled
object.  The other elements of the Python pickling protocol,
``__getstate__`` and ``__setstate__``, can be optionally provided via C++
``getstate`` and ``setstate`` functions.  C++'s static type system allows the
library to ensure at compile-time that nonsensical combinations of
functions (e.g. ``getstate`` without ``setstate``) are not used.
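
The contract behind ``getinitargs`` is simply that the returned tuple, passed back to the constructor, yields an equivalent object. A minimal sketch in standard C++ (C++11 assumed) shows that round trip with ``std::tuple`` in place of a Python tuple; ``reconstruct`` is a hypothetical helper standing in for what unpickling does, and no actual serialization takes place.

```cpp
#include <cassert>
#include <string>
#include <tuple>

struct World
{
    explicit World(std::string a_msg) : msg(a_msg) {}
    std::string greet() const { return msg; }
    std::string msg;
};

// Counterpart of getinitargs: capture the constructor arguments.
std::tuple<std::string> getinitargs(World const& w)
{
    return std::make_tuple(w.greet());
}

// Hypothetical counterpart of unpickling: call the constructor again.
World reconstruct(std::tuple<std::string> const& args)
{
    return World(std::get<0>(args));
}

void demo()
{
    World original("howdy");
    World copy = reconstruct(getinitargs(original));
    assert(copy.greet() == "howdy");
}
```

When constructor arguments alone cannot restore an object, the remaining state has to travel separately, which is the role of the ``getstate`` and ``setstate`` pair.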
717
718Enabling serialization of more complex C++ objects requires a little
719more work than is shown in the example above. Fortunately the
720``object`` interface (see next section) greatly helps in keeping the
721code manageable.
722
723------------------
724 Object interface
725------------------
726
727Experienced 'C' language extension module authors will be familiar
728with the ubiquitous ``PyObject*``, manual reference-counting, and the
729need to remember which API calls return "new" (owned) references or
730"borrowed" (raw) references.  These constraints are not just
731cumbersome but also a major source of errors, especially in the
732presence of exceptions.
733
734Boost.Python provides a class ``object`` which automates reference
735counting and provides conversion to Python from C++ objects of
736arbitrary type.  This significantly reduces the learning effort for
737prospective extension module writers.
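
What "automates reference counting" means can be shown with a small sketch in standard C++. The ``counted`` struct is a hypothetical stand-in for a Python object with a reference count, and ``handle`` is an illustrative class, not Boost.Python's ``object``. Each ``handle`` owns exactly one reference, acquired in its constructors and released in its destructor, so the count stays correct even when a scope is left through an exception.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a reference-counted Python object.
struct counted
{
    int refs;
    counted() : refs(1) {}   // creation hands out one "new" reference
};

// Owns exactly one reference and releases it in the destructor.
class handle
{
 public:
    explicit handle(counted* owned) : p_(owned) {}              // adopt a new reference
    handle(handle const& other) : p_(other.p_) { ++p_->refs; }  // share: count goes up
    handle& operator=(handle other) { std::swap(p_, other.p_); return *this; }
    ~handle() { if (--p_->refs == 0) delete p_; }               // release: count goes down

    int refs() const { return p_->refs; }

 private:
    counted* p_;
};

void demo()
{
    handle a(new counted);
    assert(a.refs() == 1);
    {
        handle b = a;          // copying increments the count
        assert(a.refs() == 2);
    }                          // b's destructor decrements it
    assert(a.refs() == 1);
}                              // a's destructor frees the object
```

With manual counting, every early return and every exception path would need its own matching decrement; here the compiler inserts them.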
738
739Creating an ``object`` from any other type is extremely simple::
740
741    object s("hello, world");  // s manages a Python string
742
743``object`` has templated interactions with all other types, with
744automatic to-python conversions. It happens so naturally that it's
745easily overlooked::
746
747   object ten_Os = 10 * s[4]; // -> "oooooooooo"
748
749In the example above, ``4`` and ``10`` are converted to Python objects
750before the indexing and multiplication operations are invoked.
751
752The ``extract<T>`` class template can be used to convert Python objects
753to C++ types::
754
755    double x = extract<double>(o);
756
757If a conversion in either direction cannot be performed, an
758appropriate exception is thrown at runtime.

The ``object`` type is accompanied by a set of derived types
that mirror the Python built-in types such as ``list``, ``dict``,
``tuple``, etc. as much as possible. This enables convenient
manipulation of these high-level types from C++::

    dict d;
    d["some"] = "thing";
    d["lucky_number"] = 13;
    list l = d.keys();

This almost looks and works like regular Python code, but it is pure
C++.  Of course, we can also wrap C++ functions which accept or return
``object`` instances.

=================
 Thinking hybrid
=================

Because of the practical and mental difficulties of combining
programming languages, it is common to settle on a single language at
the outset of any development effort.  For many applications, performance
considerations dictate the use of a compiled language for the core
algorithms.  Unfortunately, due to the complexity of the static type
system, the price we pay for runtime performance is often a
significant increase in development time.  Experience shows that
writing maintainable C++ code usually takes longer and requires *far*
more hard-earned working experience than developing comparable Python
code.  Even when developers are comfortable working exclusively in
compiled languages, they often augment their systems with some type of
ad hoc scripting layer for the benefit of their users without ever
availing themselves of the same advantages.

Boost.Python enables us to *think hybrid*.  Python can be used for
rapidly prototyping a new application; its ease of use and the large
pool of standard libraries give us a head start on the way to a
working system.  If necessary, the working code can be used to
discover rate-limiting hotspots.  To maximize performance these can
be reimplemented in C++, together with the Boost.Python bindings
needed to tie them back into the existing higher-level procedure.

Of course, this *top-down* approach is less attractive if it is clear
from the start that many algorithms will eventually have to be
implemented in C++.  Fortunately, Boost.Python also enables us to
pursue a *bottom-up* approach.  We have used this approach very
successfully in the development of a toolbox for scientific
applications.  The toolbox started out mainly as a library of C++
classes with Boost.Python bindings, and for a while the growth was
mainly concentrated on the C++ parts.  However, as the toolbox is
becoming more complete, more and more newly added functionality can be
implemented in Python.

.. image:: images/python_cpp_mix.png

This figure shows the estimated ratio of newly added C++ and Python
code over time as new algorithms are implemented.  We expect this
ratio to level out near 70% Python.  Being able to solve new problems
mostly in Python rather than a more difficult statically typed
language is the return on our investment in Boost.Python.  The ability
to access all of our code from Python allows a broader group of
developers to use it in the rapid development of new applications.

=====================
 Development history
=====================

The first version of Boost.Python was developed in 2000 by Dave
Abrahams at Dragon Systems, where he was privileged to have Tim Peters
as a guide to "The Zen of Python".  One of Dave's jobs was to develop
a Python-based natural language processing system.  Since it was
eventually going to be targeting embedded hardware, it was always
assumed that the compute-intensive core would be rewritten in C++ to
optimize speed and memory footprint [#proto]_.  The project also wanted to
test all of its C++ code using Python test scripts [#test]_.  The only
tool we knew of for binding C++ and Python was SWIG_, and at the time
its handling of C++ was weak.  It would be false to claim any deep
insight into the possible advantages of Boost.Python's approach at
this point.  Dave's interest and expertise in fancy C++ template
tricks had just reached the point where he could do some real damage,
and Boost.Python emerged as it did because it filled a need and
because it seemed like a cool thing to try.

This early version was aimed at many of the same basic goals we've
described in this paper, differing most noticeably by having a
slightly more cumbersome syntax and by a lack of special support for
operator overloading, pickling, and component-based development.
These last three features were quickly added by Ullrich Koethe and
Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors arrived
on the scene to contribute enhancements like support for nested
modules and static member functions.

By early 2001 development had stabilized and few new features were
being added.  However, a disturbing new fact came to light: Ralf had
begun testing Boost.Python on pre-release versions of a compiler using
the EDG_ front-end, and the mechanism at the core of Boost.Python
responsible for handling conversions between Python and C++ types was
failing to compile.  As it turned out, we had been exploiting a very
common bug in the implementation of all the C++ compilers we had
tested.  We knew that as C++ compilers rapidly became more
standards-compliant, the library would begin failing on more
platforms.  Unfortunately, because the mechanism was so central to the
functioning of the library, fixing the problem looked very difficult.

Fortunately, later that year Lawrence Berkeley and, subsequently,
Lawrence Livermore National Labs contracted with `Boost Consulting`_
for support and development of Boost.Python, and there was a new
opportunity to address fundamental issues and ensure a future for the
library.  A redesign effort began with the low-level type conversion
architecture, building in standards-compliance and support for
component-based development (in contrast to version 1, where
conversions had to be explicitly imported and exported across module
boundaries).  A new analysis of the relationship between the Python
and C++ objects was done, resulting in more intuitive handling for
C++ lvalues and rvalues.

The emergence of a powerful new type system in Python 2.2 made the
choice of whether to maintain compatibility with Python 1.5.2 easy:
the opportunity to throw away a great deal of elaborate code for
emulating classic Python classes alone was too good to pass up.  In
addition, Python iterators and descriptors provided crucial and
elegant tools for representing similar C++ constructs.  The
development of the generalized ``object`` interface allowed us to
further shield C++ programmers from the dangers and syntactic burdens
of the Python 'C' API.  A great number of other features including C++
exception translation, improved support for overloaded functions, and
most significantly, CallPolicies for handling pointers and
references, were added during this period.

In October 2002, version 2 of Boost.Python was released.  Development
since then has concentrated on improved support for C++ runtime
polymorphism and smart pointers.  Peter Dimov's ingenious
``boost::shared_ptr`` design in particular has allowed us to give the
hybrid developer a consistent interface for moving objects back and
forth across the language barrier without loss of information.  At
first, we were concerned that the sophistication and complexity of the
Boost.Python v2 implementation might discourage contributors, but the
emergence of Pyste_ and several other significant feature
contributions have laid those fears to rest.  Daily questions on the
Python C++-sig and a backlog of desired improvements show that the
library is getting used.  To us, the future looks bright.

.. _`EDG`: http://www.edg.com

=============
 Conclusions
=============

Boost.Python achieves seamless interoperability between two rich and
complementary language environments.  Because it leverages template
metaprogramming to introspect about types and functions, the user
never has to learn a third syntax: the interface definitions are
written in concise and maintainable C++.  Also, the wrapping system
doesn't have to parse C++ headers or represent the type system: the
compiler does that work for us.

Computationally intensive tasks play to the strengths of C++ and are
often impossible to implement efficiently in pure Python, while jobs
like serialization that are trivial in Python can be very difficult in
pure C++.  Given the luxury of building a hybrid software system from
the ground up, we can approach design with new confidence and power.

===========
 Citations
===========

.. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report,
   Vol. 7 No. 5 June 1995, pp. 26-31.
   http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html

===========
 Footnotes
===========

.. [#proto] In retrospect, it seems that "thinking hybrid" from the
        ground up might have been better for the NLP system: the
        natural component boundaries defined by the pure Python
        prototype turned out to be inappropriate for getting the
        desired performance and memory footprint out of the C++ core,
        which eventually caused some redesign overhead on the Python
        side when the core was moved to C++.

.. [#test] We also have some reservations about driving all C++
        testing through a Python interface, unless that's the only way
        it will be ultimately used.  Any transition across language
        boundaries with such different object models can mask bugs.

.. [#feature] These features were expressed very differently in v1 of
        Boost.Python.