1<?xml version="1.0" encoding="utf-8" ?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4<head>
5<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
6<meta name="generator" content="Docutils 0.12: http://docutils.sourceforge.net/" />
7<title>Building Hybrid Systems with Boost.Python</title>
8<meta name="author" content="David Abrahams" />
9<meta name="organization" content="Boost Consulting" />
10<meta name="date" content="2003-05-14" />
11<meta name="author" content="Ralf W. Grosse-Kunstleve" />
12<meta name="copyright" content="Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved" />
13<link rel="stylesheet" href="rst.css" type="text/css" />
14</head>
15<body>
16<div class="document" id="building-hybrid-systems-with-boost-python">
17<h1 class="title">Building Hybrid Systems with Boost.Python</h1>
18<table class="docinfo" frame="void" rules="none">
19<col class="docinfo-name" />
20<col class="docinfo-content" />
21<tbody valign="top">
22<tr><th class="docinfo-name">Author:</th>
23<td>David Abrahams</td></tr>
24<tr><th class="docinfo-name">Contact:</th>
25<td><a class="first last reference external" href="mailto:dave&#64;boost-consulting.com">dave&#64;boost-consulting.com</a></td></tr>
26<tr><th class="docinfo-name">Organization:</th>
27<td><a class="first last reference external" href="http://www.boost-consulting.com">Boost Consulting</a></td></tr>
28<tr><th class="docinfo-name">Date:</th>
29<td>2003-05-14</td></tr>
30<tr><th class="docinfo-name">Author:</th>
31<td>Ralf W. Grosse-Kunstleve</td></tr>
32<tr><th class="docinfo-name">Copyright:</th>
33<td>Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved</td></tr>
34</tbody>
35</table>
36<div class="contents topic" id="table-of-contents">
37<p class="topic-title first">Table of Contents</p>
38<ul class="simple">
39<li><a class="reference internal" href="#abstract" id="id5">Abstract</a></li>
40<li><a class="reference internal" href="#introduction" id="id6">Introduction</a></li>
41<li><a class="reference internal" href="#boost-python-design-goals" id="id7">Boost.Python Design Goals</a></li>
42<li><a class="reference internal" href="#hello-boost-python-world" id="id8">Hello Boost.Python World</a></li>
43<li><a class="reference internal" href="#library-overview" id="id9">Library Overview</a><ul>
44<li><a class="reference internal" href="#exposing-classes" id="id10">Exposing Classes</a><ul>
45<li><a class="reference internal" href="#constructors" id="id11">Constructors</a></li>
46<li><a class="reference internal" href="#data-members-and-properties" id="id12">Data Members and Properties</a></li>
47<li><a class="reference internal" href="#operator-overloading" id="id13">Operator Overloading</a></li>
48<li><a class="reference internal" href="#inheritance" id="id14">Inheritance</a></li>
49<li><a class="reference internal" href="#virtual-functions" id="id15">Virtual Functions</a></li>
50<li><a class="reference internal" href="#deeper-reflection-on-the-horizon" id="id16">Deeper Reflection on the Horizon?</a></li>
51</ul>
52</li>
53<li><a class="reference internal" href="#serialization" id="id17">Serialization</a></li>
54<li><a class="reference internal" href="#object-interface" id="id18">Object interface</a></li>
55</ul>
56</li>
57<li><a class="reference internal" href="#thinking-hybrid" id="id19">Thinking hybrid</a></li>
58<li><a class="reference internal" href="#development-history" id="id20">Development history</a></li>
59<li><a class="reference internal" href="#conclusions" id="id21">Conclusions</a></li>
60<li><a class="reference internal" href="#citations" id="id22">Citations</a></li>
61<li><a class="reference internal" href="#footnotes" id="id23">Footnotes</a></li>
62</ul>
63</div>
64<div class="section" id="abstract">
65<h1><a class="toc-backref" href="#id5">Abstract</a></h1>
66<p>Boost.Python is an open source C++ library which provides a concise
67IDL-like interface for binding C++ classes and functions to
68Python. Leveraging the full power of C++ compile-time introspection
69and of recently developed metaprogramming techniques, this is achieved
70entirely in pure C++, without introducing a new syntax.
71Boost.Python's rich set of features and high-level interface make it
72possible to engineer packages from the ground up as hybrid systems,
73giving programmers easy and coherent access to both the efficient
74compile-time polymorphism of C++ and the extremely convenient run-time
75polymorphism of Python.</p>
76</div>
77<div class="section" id="introduction">
78<h1><a class="toc-backref" href="#id6">Introduction</a></h1>
79<p>Python and C++ are in many ways as different as two languages could
80be: while C++ is usually compiled to machine-code, Python is
81interpreted.  Python's dynamic type system is often cited as the
82foundation of its flexibility, while in C++ static typing is the
83cornerstone of its efficiency. C++ has an intricate and difficult
84compile-time meta-language, while in Python, practically everything
85happens at runtime.</p>
86<p>Yet for many programmers, these very differences mean that Python and
87C++ complement one another perfectly.  Performance bottlenecks in
88Python programs can be rewritten in C++ for maximal speed, and
89authors of powerful C++ libraries choose Python as a middleware
90language for its flexible system integration capabilities.
91Furthermore, the surface differences mask some strong similarities:</p>
92<ul class="simple">
93<li>'C'-family control structures (if, while, for...)</li>
94<li>Support for object-orientation, functional programming, and generic
95programming (these are both <em>multi-paradigm</em> programming languages.)</li>
96<li>Comprehensive operator overloading facilities, recognizing the
97importance of syntactic variability for readability and
98expressivity.</li>
99<li>High-level concepts such as collections and iterators.</li>
100<li>High-level encapsulation facilities (C++: namespaces, Python: modules)
101to support the design of re-usable libraries.</li>
102<li>Exception-handling for effective management of error conditions.</li>
<li>C++ idioms in common use, such as handle/body classes and
reference-counted smart pointers, mirror Python reference semantics.</li>
105</ul>
106<p>Given Python's rich 'C' interoperability API, it should in principle
107be possible to expose C++ type and function interfaces to Python with
108an analogous interface to their C++ counterparts.  However, the
109facilities provided by Python alone for integration with C++ are
110relatively meager.  Compared to C++ and Python, 'C' has only very
111rudimentary abstraction facilities, and support for exception-handling
112is completely missing.  'C' extension module writers are required to
113manually manage Python reference counts, which is both annoyingly
114tedious and extremely error-prone. Traditional extension modules also
115tend to contain a great deal of boilerplate code repetition which
116makes them difficult to maintain, especially when wrapping an evolving
117API.</p>
<p>These limitations have led to the development of a variety of wrapping
119systems.  <a class="reference external" href="http://www.swig.org/">SWIG</a> is probably the most popular package for the
120integration of C/C++ and Python. A more recent development is <a class="reference external" href="http://www.riverbankcomputing.co.uk/sip/index.php">SIP</a>,
121which was specifically designed for interfacing Python with the <a class="reference external" href="http://www.trolltech.com/">Qt</a>
122graphical user interface library.  Both SWIG and SIP introduce their
123own specialized languages for customizing inter-language bindings.
124This has certain advantages, but having to deal with three different
125languages (Python, C/C++ and the interface language) also introduces
126practical and mental difficulties.  The <a class="reference external" href="http://cxx.sourceforge.net/">CXX</a> package demonstrates an
127interesting alternative.  It shows that at least some parts of
128Python's 'C' API can be wrapped and presented through a much more
129user-friendly C++ interface. However, unlike SWIG and SIP, CXX does
130not include support for wrapping C++ classes as new Python types.</p>
<p>The features and goals of <a class="reference external" href="http://www.boost.org/libs/python/doc">Boost.Python</a> overlap significantly with
those of many of these other systems.  That said, Boost.Python attempts to
133maximize convenience and flexibility without introducing a separate
134wrapping language.  Instead, it presents the user with a high-level
135C++ interface for wrapping C++ classes and functions, managing much of
136the complexity behind-the-scenes with static metaprogramming.
137Boost.Python also goes beyond the scope of earlier systems by
138providing:</p>
139<ul class="simple">
140<li>Support for C++ virtual functions that can be overridden in Python.</li>
141<li>Comprehensive lifetime management facilities for low-level C++
142pointers and references.</li>
143<li>Support for organizing extensions as Python packages,
144with a central registry for inter-language type conversions.</li>
145<li>A safe and convenient mechanism for tying into Python's powerful
146serialization engine (pickle).</li>
147<li>Coherence with the rules for handling C++ lvalues and rvalues that
148can only come from a deep understanding of both the Python and C++
149type systems.</li>
150</ul>
151<p>The key insight that sparked the development of Boost.Python is that
152much of the boilerplate code in traditional extension modules could be
153eliminated using C++ compile-time introspection.  Each argument of a
154wrapped C++ function must be extracted from a Python object using a
155procedure that depends on the argument type.  Similarly the function's
156return type determines how the return value will be converted from C++
157to Python.  Of course argument and return types are part of each
158function's type, and this is exactly the source from which
159Boost.Python deduces most of the information required.</p>
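<p>The deduction described above can be sketched in standard C++.  The
trait below is only an illustration, not Boost.Python's actual
implementation: the names <tt class="docutils literal">signature_of</tt>, <tt class="docutils literal">arity_of</tt> and <tt class="docutils literal">scale</tt> are
hypothetical, and the sketch assumes C++11 variadic templates, which
postdate this article.</p>

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical trait (not part of Boost.Python): splits a function pointer
// type into its return type and argument types at compile time.
template <class F>
struct signature_of;

template <class R, class... Args>
struct signature_of<R (*)(Args...)>
{
    typedef R result_type;
    static constexpr std::size_t arity = sizeof...(Args);

    // Type of the N-th argument, counting from zero.
    template <std::size_t N>
    struct arg
    {
        typedef typename std::tuple_element<N, std::tuple<Args...> >::type type;
    };
};

// Hypothetical function standing in for one we might want to wrap.
std::string scale(unsigned count, double factor)
{
    return std::to_string(count * factor);
}

// The argument count is recovered from the function's type alone.
template <class F>
std::size_t arity_of(F)
{
    return signature_of<F>::arity;
}
```

<p>A wrapper generator built on such a trait could select one
from-Python converter per <tt class="docutils literal">arg&lt;N&gt;::type</tt> and one to-Python converter
for <tt class="docutils literal">result_type</tt>, without the user restating the signature.</p>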
<p>This approach leads to <em>user-guided wrapping</em>: as much information is
161extracted directly from the source code to be wrapped as is possible
162within the framework of pure C++, and some additional information is
163supplied explicitly by the user.  Mostly the guidance is mechanical
164and little real intervention is required.  Because the interface
165specification is written in the same full-featured language as the
166code being exposed, the user has unprecedented power available when
167she does need to take control.</p>
168</div>
169<div class="section" id="boost-python-design-goals">
170<h1><a class="toc-backref" href="#id7">Boost.Python Design Goals</a></h1>
171<p>The primary goal of Boost.Python is to allow users to expose C++
172classes and functions to Python using nothing more than a C++
173compiler.  In broad strokes, the user experience should be one of
174directly manipulating C++ objects from Python.</p>
175<p>However, it's also important not to translate all interfaces <em>too</em>
176literally: the idioms of each language must be respected.  For
177example, though C++ and Python both have an iterator concept, they are
178expressed very differently.  Boost.Python has to be able to bridge the
179interface gap.</p>
180<p>It must be possible to insulate Python users from crashes resulting
181from trivial misuses of C++ interfaces, such as accessing
182already-deleted objects.  By the same token the library should
insulate C++ users from the low-level Python 'C' API, replacing
184error-prone 'C' interfaces like manual reference-count management and
185raw <tt class="docutils literal">PyObject</tt> pointers with more-robust alternatives.</p>
186<p>Support for component-based development is crucial, so that C++ types
187exposed in one extension module can be passed to functions exposed in
188another without loss of crucial information like C++ inheritance
189relationships.</p>
190<p>Finally, all wrapping must be <em>non-intrusive</em>, without modifying or
191even seeing the original C++ source code.  Existing C++ libraries have
192to be wrappable by third parties who only have access to header files
193and binaries.</p>
194</div>
195<div class="section" id="hello-boost-python-world">
196<h1><a class="toc-backref" href="#id8">Hello Boost.Python World</a></h1>
197<p>And now for a preview of Boost.Python, and how it improves on the raw
198facilities offered by Python. Here's a function we might want to
199expose:</p>
200<pre class="literal-block">
201char const* greet(unsigned x)
202{
203   static char const* const msgs[] = { &quot;hello&quot;, &quot;Boost.Python&quot;, &quot;world!&quot; };
204
205   if (x &gt; 2)
206       throw std::range_error(&quot;greet: index out of range&quot;);
207
208   return msgs[x];
209}
210</pre>
211<p>To wrap this function in standard C++ using the Python 'C' API, we'd
212need something like this:</p>
213<pre class="literal-block">
#include &lt;Python.h&gt;

extern &quot;C&quot; // all Python interactions use 'C' linkage and calling convention
{
    // Wrapper to handle argument/result conversion and checking.
    // A METH_VARARGS function receives (self, args); self is unused here.
    PyObject* greet_wrap(PyObject* self, PyObject* args)
    {
         int x;
         if (PyArg_ParseTuple(args, &quot;i&quot;, &amp;x))    // extract/check arguments
         {
             char const* result = greet(x);      // invoke wrapped function
             return PyString_FromString(result); // convert result to Python
         }
         return 0;                               // error occurred
    }

    // Table of wrapped functions to be exposed by the module
    static PyMethodDef methods[] = {
        { &quot;greet&quot;, greet_wrap, METH_VARARGS, &quot;return one of 3 parts of a greeting&quot; }
        , { NULL, NULL, 0, NULL } // sentinel
    };

    // module initialization function; Python 2 looks for init&lt;module name&gt;
    DL_EXPORT(void) inithello()
    {
        (void) Py_InitModule(&quot;hello&quot;, methods); // add the methods to the module
    }
}
240</pre>
241<p>Now here's the wrapping code we'd use to expose it with Boost.Python:</p>
242<pre class="literal-block">
243#include &lt;boost/python.hpp&gt;
244using namespace boost::python;
245BOOST_PYTHON_MODULE(hello)
246{
247    def(&quot;greet&quot;, greet, &quot;return one of 3 parts of a greeting&quot;);
248}
249</pre>
250<p>and here it is in action:</p>
251<pre class="literal-block">
252&gt;&gt;&gt; import hello
253&gt;&gt;&gt; for x in range(3):
254...     print hello.greet(x)
255...
256hello
257Boost.Python
258world!
259</pre>
260<p>Aside from the fact that the 'C' API version is much more verbose,
261it's worth noting a few things that it doesn't handle correctly:</p>
262<ul class="simple">
263<li>The original function accepts an unsigned integer, and the Python
264'C' API only gives us a way of extracting signed integers. The
265Boost.Python version will raise a Python exception if we try to pass
266a negative number to <tt class="docutils literal">hello.greet</tt>, but the other one will proceed
to do whatever the C++ implementation does when converting a
negative integer to unsigned (usually wrapping to some very large
269number), and pass the incorrect translation on to the wrapped
270function.</li>
271<li>That brings us to the second problem: if the C++ <tt class="docutils literal">greet()</tt>
272function is called with a number greater than 2, it will throw an
273exception.  Typically, if a C++ exception propagates across the
274boundary with code generated by a 'C' compiler, it will cause a
275crash.  As you can see in the first version, there's no C++
276scaffolding there to prevent this from happening.  Functions wrapped
277by Boost.Python automatically include an exception-handling layer
278which protects Python users by translating unhandled C++ exceptions
279into a corresponding Python exception.</li>
280<li>A slightly more-subtle limitation is that the argument conversion
281used in the Python 'C' API case can only get that integer <tt class="docutils literal">x</tt> in
282<em>one way</em>.  PyArg_ParseTuple can't convert Python <tt class="docutils literal">long</tt> objects
283(arbitrary-precision integers) which happen to fit in an <tt class="docutils literal">unsigned
284int</tt> but not in a <tt class="docutils literal">signed long</tt>, nor will it ever handle a
285wrapped C++ class with a user-defined implicit <tt class="docutils literal">operator unsigned
286int()</tt> conversion. Boost.Python's dynamic type conversion
287registry allows users to add arbitrary conversion methods.</li>
288</ul>
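<p>The first two points can be sketched in plain C++.  The helpers
below are hypothetical (<tt class="docutils literal">checked_unsigned</tt> and <tt class="docutils literal">guarded_call</tt> are names
invented for this illustration, not Boost.Python functions); they show a
range-checked conversion and a layer that stops C++ exceptions at the
language boundary, with an error string standing in for a Python
exception.</p>

```cpp
#include <exception>
#include <functional>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: refuse values that do not fit in 'unsigned' instead
// of letting a plain cast wrap them around to a large number.
unsigned checked_unsigned(long value)
{
    if (value < 0 ||
        static_cast<unsigned long>(value) > std::numeric_limits<unsigned>::max())
    {
        throw std::out_of_range("value does not fit in unsigned");
    }
    return static_cast<unsigned>(value);
}

// Hypothetical guard: run 'body' and turn any escaping C++ exception into an
// error message, so nothing propagates into code compiled as 'C'.
bool guarded_call(std::function<void()> const& body, std::string& error)
{
    try
    {
        body();
        return true;
    }
    catch (std::exception const& e)
    {
        error = e.what();
    }
    catch (...)
    {
        error = "unknown C++ exception";
    }
    return false;
}
```

<p>A hand-written wrapper would need both pieces around every wrapped
call; the point of the comparison is that a wrapping library can supply
them once, for all functions.</p>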
289</div>
290<div class="section" id="library-overview">
291<h1><a class="toc-backref" href="#id9">Library Overview</a></h1>
292<p>This section outlines some of the library's major features.  Except as
necessary to avoid confusion, details of library implementation are
294omitted.</p>
295<div class="section" id="exposing-classes">
296<h2><a class="toc-backref" href="#id10">Exposing Classes</a></h2>
297<p>C++ classes and structs are exposed with a similarly-terse interface.
298Given:</p>
299<pre class="literal-block">
300struct World
301{
302    void set(std::string msg) { this-&gt;msg = msg; }
303    std::string greet() { return msg; }
304    std::string msg;
305};
306</pre>
307<p>The following code will expose it in our extension module:</p>
308<pre class="literal-block">
309#include &lt;boost/python.hpp&gt;
310BOOST_PYTHON_MODULE(hello)
311{
312    class_&lt;World&gt;(&quot;World&quot;)
313        .def(&quot;greet&quot;, &amp;World::greet)
314        .def(&quot;set&quot;, &amp;World::set)
315    ;
316}
317</pre>
<p>Although this code has a certain pythonic familiarity, people
sometimes find the syntax a bit confusing because it doesn't look like
most of the C++ code they're used to. All the same, this is just
standard C++.  Because of their flexible syntax and operator
overloading, C++ and Python are great for defining domain-specific
(sub)languages (DSLs), and that's what we've done in Boost.Python.
To break it down:</p>
325<pre class="literal-block">
326class_&lt;World&gt;(&quot;World&quot;)
327</pre>
328<p>constructs an unnamed object of type <tt class="docutils literal">class_&lt;World&gt;</tt> and passes
329<tt class="docutils literal">&quot;World&quot;</tt> to its constructor.  This creates a new-style Python class
330called <tt class="docutils literal">World</tt> in the extension module, and associates it with the
331C++ type <tt class="docutils literal">World</tt> in the Boost.Python type conversion registry.  We
332might have also written:</p>
333<pre class="literal-block">
334class_&lt;World&gt; w(&quot;World&quot;);
335</pre>
336<p>but that would've been more verbose, since we'd have to name <tt class="docutils literal">w</tt>
337again to invoke its <tt class="docutils literal">def()</tt> member function:</p>
338<pre class="literal-block">
339w.def(&quot;greet&quot;, &amp;World::greet)
340</pre>
341<p>There's nothing special about the location of the dot for member
342access in the original example: C++ allows any amount of whitespace on
343either side of a token, and placing the dot at the beginning of each
344line allows us to chain as many successive calls to member functions
345as we like with a uniform syntax.  The other key fact that allows
346chaining is that <tt class="docutils literal">class_&lt;&gt;</tt> member functions all return a reference
347to <tt class="docutils literal">*this</tt>.</p>
348<p>So the example is equivalent to:</p>
349<pre class="literal-block">
350class_&lt;World&gt; w(&quot;World&quot;);
351w.def(&quot;greet&quot;, &amp;World::greet);
352w.def(&quot;set&quot;, &amp;World::set);
353</pre>
354<p>It's occasionally useful to be able to break down the components of a
355Boost.Python class wrapper in this way, but the rest of this article
356will stick to the terse syntax.</p>
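<p>The chaining idiom itself needs nothing beyond ordinary C++.  As a
minimal sketch, the hypothetical <tt class="docutils literal">class_sketch</tt> below (a stand-in that
only records names, not the real <tt class="docutils literal">class_&lt;&gt;</tt>) returns a reference to
<tt class="docutils literal">*this</tt> from <tt class="docutils literal">def()</tt>, which is all that the leading-dot syntax
relies on.</p>

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for class_<>: it only records method names.
class class_sketch
{
 public:
    explicit class_sketch(std::string name) : name_(std::move(name)) {}

    // Returning *this by reference is what makes call chaining possible.
    class_sketch& def(std::string const& method_name)
    {
        methods_.push_back(method_name);
        return *this;
    }

    std::string const& name() const { return name_; }
    std::vector<std::string> const& methods() const { return methods_; }

 private:
    std::string name_;
    std::vector<std::string> methods_;
};

// Chained calls on an unnamed temporary, laid out one call per line.
std::vector<std::string> registered_methods()
{
    return class_sketch("World")
        .def("greet")
        .def("set")
        .methods();
}
```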
357<p>For completeness, here's the wrapped class in use:</p>
358<pre class="literal-block">
359&gt;&gt;&gt; import hello
360&gt;&gt;&gt; planet = hello.World()
361&gt;&gt;&gt; planet.set('howdy')
362&gt;&gt;&gt; planet.greet()
363'howdy'
364</pre>
365<div class="section" id="constructors">
366<h3><a class="toc-backref" href="#id11">Constructors</a></h3>
367<p>Since our <tt class="docutils literal">World</tt> class is just a plain <tt class="docutils literal">struct</tt>, it has an
368implicit no-argument (nullary) constructor.  Boost.Python exposes the
369nullary constructor by default, which is why we were able to write:</p>
370<pre class="literal-block">
371&gt;&gt;&gt; planet = hello.World()
372</pre>
373<p>However, well-designed classes in any language may require constructor
arguments in order to establish their invariants.  Unlike Python,
where <tt class="docutils literal">__init__</tt> is just a specially-named method, in C++
constructors cannot be handled like ordinary member functions.  In
377particular, we can't take their address: <tt class="docutils literal"><span class="pre">&amp;World::World</span></tt> is an
378error.  The library provides a different interface for specifying
379constructors.  Given:</p>
380<pre class="literal-block">
381struct World
382{
383    World(std::string msg); // added constructor
384    ...
385</pre>
386<p>we can modify our wrapping code as follows:</p>
387<pre class="literal-block">
388class_&lt;World&gt;(&quot;World&quot;, init&lt;std::string&gt;())
389    ...
390</pre>
<p>Of course, a C++ class may have additional constructors, and we can
392expose those as well by passing more instances of <tt class="docutils literal"><span class="pre">init&lt;...&gt;</span></tt> to
393<tt class="docutils literal">def()</tt>:</p>
394<pre class="literal-block">
395class_&lt;World&gt;(&quot;World&quot;, init&lt;std::string&gt;())
396    .def(init&lt;double, double&gt;())
397    ...
398</pre>
399<p>Boost.Python allows wrapped functions, member functions, and
400constructors to be overloaded to mirror C++ overloading.</p>
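<p>Why a type list can stand in for a constructor's address may be
clearer from a small sketch.  Everything here is hypothetical
(<tt class="docutils literal">init_sketch</tt> and <tt class="docutils literal">make_constructor</tt> are invented names, and the
<tt class="docutils literal">World</tt> below is a simplified variant); it assumes C++11 and is not
how Boost.Python is implemented.</p>

```cpp
#include <functional>
#include <string>

// Simplified, hypothetical variant of World with two constructors.
struct World
{
    explicit World(std::string m) : msg(m), total(0.0) {}
    World(double a, double b) : msg("from doubles"), total(a + b) {}

    std::string msg;
    double total;
};

// Hypothetical stand-in for init<...>: an empty type that carries a
// constructor signature as template arguments.
template <class... Args>
struct init_sketch {};

// A constructor's address cannot be taken, but naming its argument types is
// enough to generate a callable that invokes it.
template <class T, class... Args>
std::function<T(Args...)> make_constructor(init_sketch<Args...>)
{
    return [](Args... args) { return T(args...); };
}
```

<p>Each <tt class="docutils literal">init_sketch&lt;...&gt;</tt> instance selects one overload, so several
of them can coexist for the same class.</p>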
401</div>
402<div class="section" id="data-members-and-properties">
403<h3><a class="toc-backref" href="#id12">Data Members and Properties</a></h3>
404<p>Any publicly-accessible data members in a C++ class can be easily
405exposed as either <tt class="docutils literal">readonly</tt> or <tt class="docutils literal">readwrite</tt> attributes:</p>
406<pre class="literal-block">
407class_&lt;World&gt;(&quot;World&quot;, init&lt;std::string&gt;())
408    .def_readonly(&quot;msg&quot;, &amp;World::msg)
409    ...
410</pre>
411<p>and can be used directly in Python:</p>
412<pre class="literal-block">
413&gt;&gt;&gt; planet = hello.World('howdy')
414&gt;&gt;&gt; planet.msg
415'howdy'
416</pre>
417<p>This does <em>not</em> result in adding attributes to the <tt class="docutils literal">World</tt> instance
418<tt class="docutils literal">__dict__</tt>, which can result in substantial memory savings when
419wrapping large data structures.  In fact, no instance <tt class="docutils literal">__dict__</tt>
420will be created at all unless attributes are explicitly added from
421Python. Boost.Python owes this capability to the new Python 2.2 type
422system, in particular the descriptor interface and <tt class="docutils literal">property</tt> type.</p>
423<p>In C++, publicly-accessible data members are considered a sign of poor
424design because they break encapsulation, and style guides usually
425dictate the use of &quot;getter&quot; and &quot;setter&quot; functions instead.  In
426Python, however, <tt class="docutils literal">__getattr__</tt>, <tt class="docutils literal">__setattr__</tt>, and since 2.2,
427<tt class="docutils literal">property</tt> mean that attribute access is just one more
428well-encapsulated syntactic tool at the programmer's disposal.
429Boost.Python bridges this idiomatic gap by making Python <tt class="docutils literal">property</tt>
430creation directly available to users.  If <tt class="docutils literal">msg</tt> were private, we
could still expose it as an attribute in Python as follows:</p>
432<pre class="literal-block">
433class_&lt;World&gt;(&quot;World&quot;, init&lt;std::string&gt;())
434    .add_property(&quot;msg&quot;, &amp;World::greet, &amp;World::set)
435    ...
436</pre>
437<p>The example above mirrors the familiar usage of properties in Python
4382.2+:</p>
439<pre class="literal-block">
&gt;&gt;&gt; class World(object):
...     def __init__(self, msg):
...         self.__msg = msg
...     def greet(self):
...         return self.__msg
...     def set(self, msg):
...         self.__msg = msg
...     msg = property(greet, set)
448</pre>
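<p>On the C++ side, the getter/setter pairing can be pictured as a small
object holding two member function pointers.  The following is a sketch
under that assumption; <tt class="docutils literal">property_sketch</tt> and <tt class="docutils literal">make_property</tt> are
hypothetical names, and the real <tt class="docutils literal">add_property</tt> creates a Python
descriptor rather than a C++ object like this one.</p>

```cpp
#include <string>

// World with a private message, reachable only through greet() and set().
class World
{
 public:
    explicit World(std::string m) : msg_(m) {}
    void set(std::string m) { msg_ = m; }
    std::string greet() { return msg_; }

 private:
    std::string msg_;
};

// Hypothetical property: pairs a getter and a setter so that attribute-style
// reads and writes can be routed through them.
template <class C, class T>
struct property_sketch
{
    T (C::*getter)();
    void (C::*setter)(T);

    T get(C& object) const { return (object.*getter)(); }
    void set(C& object, T value) const { (object.*setter)(value); }
};

// The class and value types are deduced from the member function pointers.
template <class C, class T>
property_sketch<C, T> make_property(T (C::*getter)(), void (C::*setter)(T))
{
    property_sketch<C, T> result = { getter, setter };
    return result;
}
```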
449</div>
450<div class="section" id="operator-overloading">
451<h3><a class="toc-backref" href="#id13">Operator Overloading</a></h3>
452<p>The ability to write arithmetic operators for user-defined types has
453been a major factor in the success of both languages for numerical
454computation, and the success of packages like <a class="reference external" href="http://www.pfdubois.com/numpy/">NumPy</a> attests to the
455power of exposing operators in extension modules.  Boost.Python
456provides a concise mechanism for wrapping operator overloads. The
457example below shows a fragment from a wrapper for the Boost rational
458number library:</p>
459<pre class="literal-block">
class_&lt;rational&lt;int&gt; &gt;(&quot;rational_int&quot;)
  .def(init&lt;int, int&gt;()) // constructor, e.g. rational_int(3,4)
  .def(&quot;numerator&quot;, &amp;rational&lt;int&gt;::numerator)
  .def(&quot;denominator&quot;, &amp;rational&lt;int&gt;::denominator)
  .def(-self)        // __neg__ (unary minus)
  .def(self + self)  // __add__ (homogeneous)
  .def(self * self)  // __mul__
  .def(self + int()) // __add__ (heterogeneous)
  .def(int() + self) // __radd__
  ...
470</pre>
471<p>The magic is performed using a simplified application of &quot;expression
472templates&quot; <a class="citation-reference" href="#veld1995" id="id1">[VELD1995]</a>, a technique originally developed for
473optimization of high-performance matrix algebra expressions.  The
474essence is that instead of performing the computation immediately,
475operators are overloaded to construct a type <em>representing</em> the
476computation.  In matrix algebra, dramatic optimizations are often
477available when the structure of an entire expression can be taken into
478account, rather than evaluating each operation &quot;greedily&quot;.
479Boost.Python uses the same technique to build an appropriate Python
480method object based on expressions involving <tt class="docutils literal">self</tt>.</p>
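<p>A stripped-down version of the idea looks like this.  It is a sketch
only: <tt class="docutils literal">self_t</tt>, the tag types, <tt class="docutils literal">operator_expr</tt> and
<tt class="docutils literal">python_method_name</tt> are hypothetical names chosen for the
illustration, and the real library records considerably more than a
method name.</p>

```cpp
#include <string>

// Placeholder standing for "an instance of the wrapped class".
struct self_t {};
const self_t self = self_t();

// Tags naming the Python special method each expression maps to.
struct neg_tag  { static char const* name() { return "__neg__"; } };
struct add_tag  { static char const* name() { return "__add__"; } };
struct radd_tag { static char const* name() { return "__radd__"; } };

// The "expression": no arithmetic happens, the type alone records which
// operator was applied and to which operand types.
template <class Tag, class Left, class Right>
struct operator_expr {};

inline operator_expr<neg_tag, self_t, void> operator-(self_t)
{
    return operator_expr<neg_tag, self_t, void>();
}

inline operator_expr<add_tag, self_t, self_t> operator+(self_t, self_t)
{
    return operator_expr<add_tag, self_t, self_t>();
}

template <class Right>
operator_expr<add_tag, self_t, Right> operator+(self_t, Right const&)
{
    return operator_expr<add_tag, self_t, Right>();
}

template <class Left>
operator_expr<radd_tag, Left, self_t> operator+(Left const&, self_t)
{
    return operator_expr<radd_tag, Left, self_t>();
}

// A def()-like consumer can read the method name back out of the type.
template <class Tag, class Left, class Right>
std::string python_method_name(operator_expr<Tag, Left, Right>)
{
    return Tag::name();
}
```

<p>Because the operand types are part of the expression's type, a
consumer could also use them to pick the right C++ operator to call when
the Python method is eventually invoked.</p>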
481</div>
482<div class="section" id="inheritance">
483<h3><a class="toc-backref" href="#id14">Inheritance</a></h3>
484<p>C++ inheritance relationships can be represented to Boost.Python by adding
485an optional <tt class="docutils literal"><span class="pre">bases&lt;...&gt;</span></tt> argument to the <tt class="docutils literal"><span class="pre">class_&lt;...&gt;</span></tt> template
486parameter list as follows:</p>
487<pre class="literal-block">
488class_&lt;Derived, bases&lt;Base1,Base2&gt; &gt;(&quot;Derived&quot;)
489     ...
490</pre>
491<p>This has two effects:</p>
492<ol class="arabic simple">
493<li>When the <tt class="docutils literal"><span class="pre">class_&lt;...&gt;</span></tt> is created, Python type objects
494corresponding to <tt class="docutils literal">Base1</tt> and <tt class="docutils literal">Base2</tt> are looked up in
495Boost.Python's registry, and are used as bases for the new Python
496<tt class="docutils literal">Derived</tt> type object, so methods exposed for the Python <tt class="docutils literal">Base1</tt>
497and <tt class="docutils literal">Base2</tt> types are automatically members of the <tt class="docutils literal">Derived</tt>
498type.  Because the registry is global, this works correctly even if
499<tt class="docutils literal">Derived</tt> is exposed in a different module from either of its
500bases.</li>
501<li>C++ conversions from <tt class="docutils literal">Derived</tt> to its bases are added to the
502Boost.Python registry.  Thus wrapped C++ methods expecting (a
503pointer or reference to) an object of either base type can be
504called with an object wrapping a <tt class="docutils literal">Derived</tt> instance.  Wrapped
505member functions of class <tt class="docutils literal">T</tt> are treated as though they have an
506implicit first argument of <tt class="docutils literal">T&amp;</tt>, so these conversions are
necessary to allow the base class methods to be called for derived
508objects.</li>
509</ol>
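<p>The second effect can be illustrated with a toy registry of upcast
functions keyed by type.  This is a sketch under stated assumptions, not
the library's registry: <tt class="docutils literal">cast_registry</tt>, <tt class="docutils literal">register_upcast</tt> and
<tt class="docutils literal">upcast</tt> are hypothetical names, and the example types are simplified
so that they need no constructors.</p>

```cpp
#include <map>
#include <typeindex>
#include <typeinfo>
#include <utility>

typedef void* (*cast_function)(void*);
typedef std::pair<std::type_index, std::type_index> cast_key;

// Hypothetical global registry mapping (source type, target type) to a cast.
std::map<cast_key, cast_function>& cast_registry()
{
    static std::map<cast_key, cast_function> registry;
    return registry;
}

// Record how to convert a Derived* into a Base*.  With multiple inheritance
// the address may change, so the cast must be performed with full type
// information rather than by reinterpreting the pointer.
template <class Derived, class Base>
void register_upcast()
{
    cast_key key = std::make_pair(std::type_index(typeid(Derived)),
                                  std::type_index(typeid(Base)));
    cast_registry()[key] = [](void* p) -> void* {
        return static_cast<Base*>(static_cast<Derived*>(p));
    };
}

// Returns the adjusted pointer, or a null pointer if nothing is registered.
void* upcast(void* p, std::type_index from, std::type_index to)
{
    auto it = cast_registry().find(cast_key(from, to));
    return it == cast_registry().end() ? nullptr : it->second(p);
}

// Simplified example hierarchy.
struct Base1 { int a; };
struct Base2 { int b; };
struct Derived : Base1, Base2 {};
```

<p>Since the registry in this sketch is a single global object, a
conversion registered by one translation unit is visible to lookups made
from another, which mirrors the cross-module behavior described
above.</p>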
<p>Of course it's possible to derive new Python classes from wrapped C++
classes.  Because Boost.Python uses the new-style class
512system, that works very much as for the Python built-in types.  There
513is one significant detail in which it differs: the built-in types
514generally establish their invariants in their <tt class="docutils literal">__new__</tt> function, so
515that derived classes do not need to call <tt class="docutils literal">__init__</tt> on the base
class before invoking its methods:</p>
517<pre class="literal-block">
518&gt;&gt;&gt; class L(list):
519...      def __init__(self):
520...          pass
521...
522&gt;&gt;&gt; L().reverse()
523&gt;&gt;&gt;
524</pre>
525<p>Because C++ object construction is a one-step operation, C++ instance
526data cannot be constructed until the arguments are available, in the
527<tt class="docutils literal">__init__</tt> function:</p>
528<pre class="literal-block">
529&gt;&gt;&gt; class D(SomeBoostPythonClass):
530...      def __init__(self):
531...          pass
532...
533&gt;&gt;&gt; D().some_boost_python_method()
534Traceback (most recent call last):
535  File &quot;&lt;stdin&gt;&quot;, line 1, in ?
536TypeError: bad argument type for built-in operation
537</pre>
538<p>This happened because Boost.Python couldn't find instance data of type
539<tt class="docutils literal">SomeBoostPythonClass</tt> within the <tt class="docutils literal">D</tt> instance; <tt class="docutils literal">D</tt>'s <tt class="docutils literal">__init__</tt>
540function masked construction of the base class.  It could be corrected
541by either removing <tt class="docutils literal">D</tt>'s <tt class="docutils literal">__init__</tt> function or having it call
542<tt class="docutils literal"><span class="pre">SomeBoostPythonClass.__init__(...)</span></tt> explicitly.</p>
543</div>
544<div class="section" id="virtual-functions">
545<h3><a class="toc-backref" href="#id15">Virtual Functions</a></h3>
546<p>Deriving new types in Python from extension classes is not very
547interesting unless they can be used polymorphically from C++.  In
548other words, Python method implementations should appear to override
549the implementation of C++ virtual functions when called <em>through base
550class pointers/references from C++</em>.  Since the only way to alter the
551behavior of a virtual function is to override it in a derived class,
552the user must build a special derived class to dispatch a polymorphic
553class' virtual functions:</p>
554<pre class="literal-block">
//
// interface to wrap:
//
class Base
{
 public:
    virtual int f(std::string x) { return 42; }
    virtual ~Base();
};

// Takes a non-const reference because Base::f is a non-const member function
int calls_f(Base&amp; b, std::string x) { return b.f(x); }

//
// Wrapping Code
//

// Dispatcher class
struct BaseWrap : Base
{
    // Store a pointer to the Python object
    BaseWrap(PyObject* self_) : self(self_) {}
    PyObject* self;

    // Default implementation, for when f is not overridden
    int f_default(std::string x) { return this-&gt;Base::f(x); }
    // Dispatch implementation
    int f(std::string x) { return call_method&lt;int&gt;(self, &quot;f&quot;, x); }
};

...
    def(&quot;calls_f&quot;, calls_f);
    class_&lt;Base, BaseWrap&gt;(&quot;Base&quot;)
        .def(&quot;f&quot;, &amp;Base::f, &amp;BaseWrap::f_default)
        ;
589</pre>
590<p>Now here's some Python code which demonstrates:</p>
<pre class="literal-block">
&gt;&gt;&gt; class Derived(Base):
...     def f(self, s):
...          return len(s)
...
&gt;&gt;&gt; calls_f(Base(), 'foo')
42
&gt;&gt;&gt; calls_f(Derived(), 'forty-two')
9
</pre>
601<p>Things to notice about the dispatcher class:</p>
602<ul class="simple">
603<li>The key element which allows overriding in Python is the
604<tt class="docutils literal">call_method</tt> invocation, which uses the same global type
605conversion registry as the C++ function wrapping does to convert its
606arguments from C++ to Python and its return type from Python to C++.</li>
<li>Any constructor signatures you wish to wrap must be replicated with
an initial <tt class="docutils literal">PyObject*</tt> argument.</li>
<li>The dispatcher must store this argument so that it can be used to
invoke <tt class="docutils literal">call_method</tt>.</li>
611<li>The <tt class="docutils literal">f_default</tt> member function is needed when the function being
612exposed is not pure virtual; there's no other way <tt class="docutils literal"><span class="pre">Base::f</span></tt> can be
613called on an object of type <tt class="docutils literal">BaseWrap</tt>, since it overrides <tt class="docutils literal">f</tt>.</li>
614</ul>
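<p>The roles of the dispatcher's two member functions may be easier to see in a
toy model written entirely in Python.  This is only a sketch of the control
flow, not of how Boost.Python is implemented: <tt class="docutils literal">CppBase</tt>, <tt class="docutils literal">py_self</tt> and
<tt class="docutils literal">_cpp</tt> are hypothetical names, and <tt class="docutils literal">getattr</tt> stands in for <tt class="docutils literal">call_method</tt>.</p>

```python
class CppBase:
    """Hypothetical stand-in for the C++ class Base."""

    def f(self, x):
        return 42


class BaseWrap(CppBase):
    """Models the dispatcher: it keeps a back-reference to the Python object."""

    def __init__(self, py_self):
        self.py_self = py_self

    def f(self, x):
        # Models call_method: look the method up by name on the Python object.
        return getattr(self.py_self, "f")(x)

    def f_default(self, x):
        # Models this->Base::f(x): bypass the overriding dispatcher.
        return CppBase.f(self, x)


class Base:
    """Models the class that Python code sees and may derive from."""

    def __init__(self):
        self._cpp = BaseWrap(self)

    def f(self, x):
        # Bound to f_default; calling self._cpp.f here would recurse forever.
        return self._cpp.f_default(x)


class Derived(Base):
    def f(self, s):
        return len(s)


def calls_f(b, x):
    # Models the C++ function: a virtual call on the underlying C++ object.
    return b._cpp.f(x)


base_result = calls_f(Base(), "foo")
derived_result = calls_f(Derived(), "forty-two")
```

<p>In this model, the call made from the "C++ side" always goes through the
dispatcher's <tt class="docutils literal">f</tt>, which finds either the Python override or the method bound
to <tt class="docutils literal">f_default</tt>.  That second path is why a separate default implementation
is needed.</p>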
615</div>
616<div class="section" id="deeper-reflection-on-the-horizon">
617<h3><a class="toc-backref" href="#id16">Deeper Reflection on the Horizon?</a></h3>
<p>Admittedly, this formula is tedious to repeat, especially on a project
with many polymorphic classes.  That it is necessary reflects some
limitations in C++'s compile-time introspection capabilities: there's
no way to enumerate the members of a class and find out which are
virtual functions.  At least one very promising project has been
started to write a front-end which can generate these dispatchers (and
other wrapping code) automatically from C++ headers.</p>
<p><a class="reference external" href="http://www.boost.org/libs/python/pyste">Pyste</a> is being developed by Bruno da Silva de Oliveira.  It builds on
<a class="reference external" href="http://www.gccxml.org/HTML/Index.html">GCC_XML</a>, which generates an XML version of GCC's internal program
representation.  Since GCC is a highly conformant C++ compiler, this
ensures correct handling of the most sophisticated template code and
full access to the underlying type system.  In keeping with the
Boost.Python philosophy, a Pyste interface description is neither
intrusive on the code being wrapped, nor expressed in some unfamiliar
language: instead it is a 100% pure Python script.  If Pyste is
successful, it will mark a move away from wrapping everything directly
in C++ for many of our users.  It will also allow us the choice to
shift some of the metaprogram code from C++ to Python.  We expect that
soon, not only our users but the Boost.Python developers themselves
will be &quot;thinking hybrid&quot; about their own code.</p>
638</div>
639</div>
640<div class="section" id="serialization">
641<h2><a class="toc-backref" href="#id17">Serialization</a></h2>
642<p><em>Serialization</em> is the process of converting objects in memory to a
643form that can be stored on disk or sent over a network connection. The
644serialized object (most often a plain string) can be retrieved and
645converted back to the original object. A good serialization system will
646automatically convert entire object hierarchies. Python's standard
647<tt class="docutils literal">pickle</tt> module is just such a system.  It leverages the language's strong
648runtime introspection facilities for serializing practically arbitrary
649user-defined objects. With a few simple and unintrusive provisions this
650powerful machinery can be extended to also work for wrapped C++ objects.
651Here is an example:</p>
<pre class="literal-block">
#include &lt;string&gt;

struct World
{
    World(std::string a_msg) : msg(a_msg) {}
    std::string greet() const { return msg; }
    std::string msg;
};

#include &lt;boost/python.hpp&gt;
using namespace boost::python;

struct World_picklers : pickle_suite
{
  static tuple
  getinitargs(World const&amp; w) { return make_tuple(w.greet()); }
};

BOOST_PYTHON_MODULE(hello)
{
    class_&lt;World&gt;(&quot;World&quot;, init&lt;std::string&gt;())
        .def(&quot;greet&quot;, &amp;World::greet)
        .def_pickle(World_picklers())
    ;
}
</pre>
679<p>Now let's create a <tt class="docutils literal">World</tt> object and put it to rest on disk:</p>
<pre class="literal-block">
&gt;&gt;&gt; import hello
&gt;&gt;&gt; import pickle
&gt;&gt;&gt; a_world = hello.World(&quot;howdy&quot;)
&gt;&gt;&gt; pickle.dump(a_world, open(&quot;my_world&quot;, &quot;wb&quot;))
</pre>
<p>In a potentially <em>different script</em> on a potentially <em>different
computer</em> with a potentially <em>different operating system</em>:</p>
<pre class="literal-block">
&gt;&gt;&gt; import pickle
&gt;&gt;&gt; resurrected_world = pickle.load(open(&quot;my_world&quot;, &quot;rb&quot;))
&gt;&gt;&gt; resurrected_world.greet()
'howdy'
</pre>
694<p>Of course the <tt class="docutils literal">cPickle</tt> module can also be used for faster
695processing.</p>
<p>Boost.Python's <tt class="docutils literal">pickle_suite</tt> fully supports the <tt class="docutils literal">pickle</tt> protocol
defined in the standard Python documentation. Like a <tt class="docutils literal">__getinitargs__</tt>
function in Python, the <tt class="docutils literal">pickle_suite</tt>'s <tt class="docutils literal">getinitargs()</tt> is responsible for
creating the argument tuple that will be used to reconstruct the pickled
object.  The other elements of the Python pickling protocol,
<tt class="docutils literal">__getstate__</tt> and <tt class="docutils literal">__setstate__</tt>, can optionally be provided via C++
<tt class="docutils literal">getstate</tt> and <tt class="docutils literal">setstate</tt> functions.  C++'s static type system allows the
library to ensure at compile time that nonsensical combinations of
functions (e.g. <tt class="docutils literal">getstate</tt> without <tt class="docutils literal">setstate</tt>) are not used.</p>
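<p>The split between constructor arguments and additional state can be
sketched with a pure-Python analogue.  This is an illustration under stated
assumptions, not Boost.Python code: the <tt class="docutils literal">visits</tt> counter is a hypothetical
piece of extra state, the class uses <tt class="docutils literal">__reduce__</tt> to return the constructor
arguments (the role played by <tt class="docutils literal">getinitargs</tt>) together with the state (the
role played by <tt class="docutils literal">getstate</tt>), and <tt class="docutils literal">copy.copy</tt> is used to drive the hooks so that
the example runs without writing a file.</p>

```python
import copy


class World:
    def __init__(self, msg):
        self.msg = msg   # fully determined by the constructor argument
        self.visits = 0  # extra state the constructor cannot restore

    def greet(self):
        self.visits += 1
        return self.msg

    def __reduce__(self):
        # (callable, constructor arguments, extra state)
        return (World, (self.msg,), {"visits": self.visits})

    def __setstate__(self, state):
        self.visits = state["visits"]


original = World("howdy")
original.greet()

# copy.copy rebuilds the object through __reduce__ and __setstate__,
# the same hooks pickle would use.
clone = copy.copy(original)
```

<p>The clone is first constructed from the argument tuple and then handed the
saved state, which is the order in which a pickled object is rebuilt.</p>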
705<p>Enabling serialization of more complex C++ objects requires a little
706more work than is shown in the example above. Fortunately the
707<tt class="docutils literal">object</tt> interface (see next section) greatly helps in keeping the
708code manageable.</p>
709</div>
710<div class="section" id="object-interface">
711<h2><a class="toc-backref" href="#id18">Object interface</a></h2>
712<p>Experienced 'C' language extension module authors will be familiar
713with the ubiquitous <tt class="docutils literal">PyObject*</tt>, manual reference-counting, and the
714need to remember which API calls return &quot;new&quot; (owned) references or
715&quot;borrowed&quot; (raw) references.  These constraints are not just
716cumbersome but also a major source of errors, especially in the
717presence of exceptions.</p>
718<p>Boost.Python provides a class <tt class="docutils literal">object</tt> which automates reference
719counting and provides conversion to Python from C++ objects of
720arbitrary type.  This significantly reduces the learning effort for
721prospective extension module writers.</p>
722<p>Creating an <tt class="docutils literal">object</tt> from any other type is extremely simple:</p>
<pre class="literal-block">
object s(&quot;hello, world&quot;);  // s manages a Python string
</pre>
726<p><tt class="docutils literal">object</tt> has templated interactions with all other types, with
727automatic to-python conversions. It happens so naturally that it's
728easily overlooked:</p>
<pre class="literal-block">
object ten_Os = 10 * s[4]; // -&gt; &quot;oooooooooo&quot;
</pre>
732<p>In the example above, <tt class="docutils literal">4</tt> and <tt class="docutils literal">10</tt> are converted to Python objects
733before the indexing and multiplication operations are invoked.</p>
734<p>The <tt class="docutils literal">extract&lt;T&gt;</tt> class template can be used to convert Python objects
735to C++ types:</p>
<pre class="literal-block">
double x = extract&lt;double&gt;(o);
</pre>
739<p>If a conversion in either direction cannot be performed, an
740appropriate exception is thrown at runtime.</p>
741<p>The <tt class="docutils literal">object</tt> type is accompanied by a set of derived types
742that mirror the Python built-in types such as <tt class="docutils literal">list</tt>, <tt class="docutils literal">dict</tt>,
743<tt class="docutils literal">tuple</tt>, etc. as much as possible. This enables convenient
744manipulation of these high-level types from C++:</p>
<pre class="literal-block">
dict d;
d[&quot;some&quot;] = &quot;thing&quot;;
d[&quot;lucky_number&quot;] = 13;
list l = d.keys();
</pre>
751<p>This almost looks and works like regular Python code, but it is pure
752C++.  Of course we can wrap C++ functions which accept or return
753<tt class="docutils literal">object</tt> instances.</p>
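<p>For comparison, here is roughly what the C++ snippets in this section
correspond to in plain Python.  The variable names mirror the C++ ones;
<tt class="docutils literal">keys</tt> is a hypothetical name for the list called <tt class="docutils literal">l</tt> above.</p>

```python
# Counterpart of: object s("hello, world"); object ten_Os = 10 * s[4];
s = "hello, world"
ten_os = 10 * s[4]

# Counterpart of the dict example
d = {}
d["some"] = "thing"
d["lucky_number"] = 13
keys = list(d.keys())
```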
754</div>
755</div>
756<div class="section" id="thinking-hybrid">
757<h1><a class="toc-backref" href="#id19">Thinking hybrid</a></h1>
<p>Because of the practical and mental difficulties of combining
programming languages, it is common to settle on a single language at the
outset of any development effort.  For many applications, performance
761considerations dictate the use of a compiled language for the core
762algorithms.  Unfortunately, due to the complexity of the static type
763system, the price we pay for runtime performance is often a
764significant increase in development time.  Experience shows that
765writing maintainable C++ code usually takes longer and requires <em>far</em>
766more hard-earned working experience than developing comparable Python
767code.  Even when developers are comfortable working exclusively in
768compiled languages, they often augment their systems by some type of
769ad hoc scripting layer for the benefit of their users without ever
770availing themselves of the same advantages.</p>
771<p>Boost.Python enables us to <em>think hybrid</em>.  Python can be used for
772rapidly prototyping a new application; its ease of use and the large
773pool of standard libraries give us a head start on the way to a
774working system.  If necessary, the working code can be used to
775discover rate-limiting hotspots.  To maximize performance these can
776be reimplemented in C++, together with the Boost.Python bindings
777needed to tie them back into the existing higher-level procedure.</p>
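<p>One way to locate such hotspots in a working prototype is Python's standard
<tt class="docutils literal">cProfile</tt> module.  The sketch below is illustrative only: <tt class="docutils literal">slow_kernel</tt>,
<tt class="docutils literal">cheap_setup</tt> and <tt class="docutils literal">prototype</tt> are hypothetical functions standing in for a
real application.</p>

```python
import cProfile
import io
import pstats


def slow_kernel(n):
    # Deliberately naive numeric loop: the kind of code worth moving to C++.
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


def cheap_setup():
    return list(range(10))


def prototype():
    cheap_setup()
    return slow_kernel(200000)


profiler = cProfile.Profile()
profiler.enable()
result = prototype()
profiler.disable()

# Collect the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
```

<p>In a report like this, a function such as <tt class="docutils literal">slow_kernel</tt> that dominates the
cumulative time is a natural candidate for reimplementation in C++.</p>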
778<p>Of course, this <em>top-down</em> approach is less attractive if it is clear
779from the start that many algorithms will eventually have to be
780implemented in C++.  Fortunately Boost.Python also enables us to
781pursue a <em>bottom-up</em> approach.  We have used this approach very
782successfully in the development of a toolbox for scientific
783applications.  The toolbox started out mainly as a library of C++
784classes with Boost.Python bindings, and for a while the growth was
785mainly concentrated on the C++ parts.  However, as the toolbox is
786becoming more complete, more and more newly added functionality can be
787implemented in Python.</p>
788<img alt="images/python_cpp_mix.png" src="images/python_cpp_mix.png" />
789<p>This figure shows the estimated ratio of newly added C++ and Python
790code over time as new algorithms are implemented.  We expect this
791ratio to level out near 70% Python.  Being able to solve new problems
792mostly in Python rather than a more difficult statically typed
793language is the return on our investment in Boost.Python.  The ability
794to access all of our code from Python allows a broader group of
795developers to use it in the rapid development of new applications.</p>
796</div>
797<div class="section" id="development-history">
798<h1><a class="toc-backref" href="#id20">Development history</a></h1>
799<p>The first version of Boost.Python was developed in 2000 by Dave
800Abrahams at Dragon Systems, where he was privileged to have Tim Peters
801as a guide to &quot;The Zen of Python&quot;.  One of Dave's jobs was to develop
802a Python-based natural language processing system.  Since it was
803eventually going to be targeting embedded hardware, it was always
804assumed that the compute-intensive core would be rewritten in C++ to
805optimize speed and memory footprint<a class="footnote-reference" href="#proto" id="id2"><sup>1</sup></a>.  The project also wanted to
806test all of its C++ code using Python test scripts<a class="footnote-reference" href="#test" id="id3"><sup>2</sup></a>.  The only
807tool we knew of for binding C++ and Python was <a class="reference external" href="http://www.swig.org/">SWIG</a>, and at the time
808its handling of C++ was weak.  It would be false to claim any deep
809insight into the possible advantages of Boost.Python's approach at
810this point.  Dave's interest and expertise in fancy C++ template
811tricks had just reached the point where he could do some real damage,
812and Boost.Python emerged as it did because it filled a need and
813because it seemed like a cool thing to try.</p>
<p>This early version was aimed at many of the same basic goals we've
described in this paper, differing most noticeably by having a
slightly more cumbersome syntax and by lacking special support for
operator overloading, pickling, and component-based development.
818These last three features were quickly added by Ullrich Koethe and
819Ralf Grosse-Kunstleve<a class="footnote-reference" href="#feature" id="id4"><sup>3</sup></a>, and other enthusiastic contributors arrived
820on the scene to contribute enhancements like support for nested
821modules and static member functions.</p>
<p>By early 2001 development had stabilized and few new features were
being added; however, a disturbing new fact came to light: Ralf had
824begun testing Boost.Python on pre-release versions of a compiler using
825the <a class="reference external" href="http://www.edg.com">EDG</a> front-end, and the mechanism at the core of Boost.Python
826responsible for handling conversions between Python and C++ types was
827failing to compile.  As it turned out, we had been exploiting a very
828common bug in the implementation of all the C++ compilers we had
829tested.  We knew that as C++ compilers rapidly became more
830standards-compliant, the library would begin failing on more
831platforms.  Unfortunately, because the mechanism was so central to the
832functioning of the library, fixing the problem looked very difficult.</p>
<p>Fortunately, later that year Lawrence Berkeley and, subsequently, Lawrence
Livermore National Laboratories contracted with <a class="reference external" href="http://www.boost-consulting.com">Boost Consulting</a> for support
835and development of Boost.Python, and there was a new opportunity to
836address fundamental issues and ensure a future for the library.  A
837redesign effort began with the low level type conversion architecture,
838building in standards-compliance and support for component-based
839development (in contrast to version 1 where conversions had to be
840explicitly imported and exported across module boundaries).  A new
841analysis of the relationship between the Python and C++ objects was
842done, resulting in more intuitive handling for C++ lvalues and
843rvalues.</p>
844<p>The emergence of a powerful new type system in Python 2.2 made the
845choice of whether to maintain compatibility with Python 1.5.2 easy:
846the opportunity to throw away a great deal of elaborate code for
847emulating classic Python classes alone was too good to pass up.  In
848addition, Python iterators and descriptors provided crucial and
849elegant tools for representing similar C++ constructs.  The
850development of the generalized <tt class="docutils literal">object</tt> interface allowed us to
851further shield C++ programmers from the dangers and syntactic burdens
852of the Python 'C' API.  A great number of other features including C++
853exception translation, improved support for overloaded functions, and
854most significantly, CallPolicies for handling pointers and
855references, were added during this period.</p>
856<p>In October 2002, version 2 of Boost.Python was released.  Development
857since then has concentrated on improved support for C++ runtime
858polymorphism and smart pointers.  Peter Dimov's ingenious
859<tt class="docutils literal"><span class="pre">boost::shared_ptr</span></tt> design in particular has allowed us to give the
860hybrid developer a consistent interface for moving objects back and
861forth across the language barrier without loss of information.  At
862first, we were concerned that the sophistication and complexity of the
863Boost.Python v2 implementation might discourage contributors, but the
864emergence of <a class="reference external" href="http://www.boost.org/libs/python/pyste">Pyste</a> and several other significant feature
865contributions have laid those fears to rest.  Daily questions on the
866Python C++-sig and a backlog of desired improvements show that the
867library is getting used.  To us, the future looks bright.</p>
868</div>
869<div class="section" id="conclusions">
870<h1><a class="toc-backref" href="#id21">Conclusions</a></h1>
<p>Boost.Python achieves seamless interoperability between two rich and
complementary language environments.  Because it leverages template
873metaprogramming to introspect about types and functions, the user
874never has to learn a third syntax: the interface definitions are
875written in concise and maintainable C++.  Also, the wrapping system
876doesn't have to parse C++ headers or represent the type system: the
877compiler does that work for us.</p>
878<p>Computationally intensive tasks play to the strengths of C++ and are
879often impossible to implement efficiently in pure Python, while jobs
880like serialization that are trivial in Python can be very difficult in
881pure C++.  Given the luxury of building a hybrid software system from
882the ground up, we can approach design with new confidence and power.</p>
883</div>
884<div class="section" id="citations">
885<h1><a class="toc-backref" href="#id22">Citations</a></h1>
886<table class="docutils citation" frame="void" id="veld1995" rules="none">
887<colgroup><col class="label" /><col /></colgroup>
888<tbody valign="top">
889<tr><td class="label"><a class="fn-backref" href="#id1">[VELD1995]</a></td><td>T. Veldhuizen, &quot;Expression Templates,&quot; C++ Report,
890Vol. 7 No. 5 June 1995, pp. 26-31.
891<a class="reference external" href="http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html">http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html</a></td></tr>
892</tbody>
893</table>
894</div>
895<div class="section" id="footnotes">
896<h1><a class="toc-backref" href="#id23">Footnotes</a></h1>
897<table class="docutils footnote" frame="void" id="proto" rules="none">
898<colgroup><col class="label" /><col /></colgroup>
899<tbody valign="top">
900<tr><td class="label"><a class="fn-backref" href="#id2">[1]</a></td><td>In retrospect, it seems that &quot;thinking hybrid&quot; from the
901ground up might have been better for the NLP system: the
902natural component boundaries defined by the pure python
903prototype turned out to be inappropriate for getting the
904desired performance and memory footprint out of the C++ core,
905which eventually caused some redesign overhead on the Python
906side when the core was moved to C++.</td></tr>
907</tbody>
908</table>
909<table class="docutils footnote" frame="void" id="test" rules="none">
910<colgroup><col class="label" /><col /></colgroup>
911<tbody valign="top">
912<tr><td class="label"><a class="fn-backref" href="#id3">[2]</a></td><td>We also have some reservations about driving all C++
913testing through a Python interface, unless that's the only way
it will ultimately be used.  Any transition across language
boundaries with such different object models risks
masking bugs.</td></tr>
917</tbody>
918</table>
919<table class="docutils footnote" frame="void" id="feature" rules="none">
920<colgroup><col class="label" /><col /></colgroup>
921<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id4">[3]</a></td><td>These features were expressed very differently in v1 of
Boost.Python.</td></tr>
924</tbody>
925</table>
926</div>
927</div>
928</body>
929</html>