<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.12: http://docutils.sourceforge.net/" />
<title>Building Hybrid Systems with Boost.Python</title>
<meta name="author" content="David Abrahams" />
<meta name="organization" content="Boost Consulting" />
<meta name="date" content="2003-05-14" />
<meta name="author" content="Ralf W. Grosse-Kunstleve" />
<meta name="copyright" content="Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved" />
<link rel="stylesheet" href="rst.css" type="text/css" />
</head>
<body>
<div class="document" id="building-hybrid-systems-with-boost-python">
<h1 class="title">Building Hybrid Systems with Boost.Python</h1>
<table class="docinfo" frame="void" rules="none">
<col class="docinfo-name" />
<col class="docinfo-content" />
<tbody valign="top">
<tr><th class="docinfo-name">Author:</th>
<td>David Abrahams</td></tr>
<tr><th class="docinfo-name">Contact:</th>
<td><a class="first last reference external" href="mailto:dave@boost-consulting.com">dave@boost-consulting.com</a></td></tr>
<tr><th class="docinfo-name">Organization:</th>
<td><a class="first last reference external" href="http://www.boost-consulting.com">Boost Consulting</a></td></tr>
<tr><th class="docinfo-name">Date:</th>
<td>2003-05-14</td></tr>
<tr><th class="docinfo-name">Author:</th>
<td>Ralf W. Grosse-Kunstleve</td></tr>
<tr><th class="docinfo-name">Copyright:</th>
<td>Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003.
All rights reserved</td></tr>
</tbody>
</table>
<div class="contents topic" id="table-of-contents">
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#abstract" id="id5">Abstract</a></li>
<li><a class="reference internal" href="#introduction" id="id6">Introduction</a></li>
<li><a class="reference internal" href="#boost-python-design-goals" id="id7">Boost.Python Design Goals</a></li>
<li><a class="reference internal" href="#hello-boost-python-world" id="id8">Hello Boost.Python World</a></li>
<li><a class="reference internal" href="#library-overview" id="id9">Library Overview</a><ul>
<li><a class="reference internal" href="#exposing-classes" id="id10">Exposing Classes</a><ul>
<li><a class="reference internal" href="#constructors" id="id11">Constructors</a></li>
<li><a class="reference internal" href="#data-members-and-properties" id="id12">Data Members and Properties</a></li>
<li><a class="reference internal" href="#operator-overloading" id="id13">Operator Overloading</a></li>
<li><a class="reference internal" href="#inheritance" id="id14">Inheritance</a></li>
<li><a class="reference internal" href="#virtual-functions" id="id15">Virtual Functions</a></li>
<li><a class="reference internal" href="#deeper-reflection-on-the-horizon" id="id16">Deeper Reflection on the Horizon?</a></li>
</ul>
</li>
<li><a class="reference internal" href="#serialization" id="id17">Serialization</a></li>
<li><a class="reference internal" href="#object-interface" id="id18">Object interface</a></li>
</ul>
</li>
<li><a class="reference internal" href="#thinking-hybrid" id="id19">Thinking hybrid</a></li>
<li><a class="reference internal" href="#development-history" id="id20">Development history</a></li>
<li><a class="reference internal" href="#conclusions" id="id21">Conclusions</a></li>
<li><a class="reference internal" href="#citations" id="id22">Citations</a></li>
<li><a
class="reference internal" href="#footnotes" id="id23">Footnotes</a></li>
</ul>
</div>
<div class="section" id="abstract">
<h1><a class="toc-backref" href="#id5">Abstract</a></h1>
<p>Boost.Python is an open source C++ library which provides a concise
IDL-like interface for binding C++ classes and functions to
Python. Leveraging the full power of C++ compile-time introspection
and of recently developed metaprogramming techniques, this is achieved
entirely in pure C++, without introducing a new syntax.
Boost.Python's rich set of features and high-level interface make it
possible to engineer packages from the ground up as hybrid systems,
giving programmers easy and coherent access to both the efficient
compile-time polymorphism of C++ and the extremely convenient run-time
polymorphism of Python.</p>
</div>
<div class="section" id="introduction">
<h1><a class="toc-backref" href="#id6">Introduction</a></h1>
<p>Python and C++ are in many ways as different as two languages could
be: while C++ is usually compiled to machine-code, Python is
interpreted. Python's dynamic type system is often cited as the
foundation of its flexibility, while in C++ static typing is the
cornerstone of its efficiency. C++ has an intricate and difficult
compile-time meta-language, while in Python, practically everything
happens at runtime.</p>
<p>Yet for many programmers, these very differences mean that Python and
C++ complement one another perfectly. Performance bottlenecks in
Python programs can be rewritten in C++ for maximal speed, and
authors of powerful C++ libraries choose Python as a middleware
language for its flexible system integration capabilities.
Furthermore, the surface differences mask some strong similarities:</p>
<ul class="simple">
<li>'C'-family control structures (if, while, for...)</li>
<li>Support for object-orientation, functional programming, and generic
programming (these are both <em>multi-paradigm</em> programming languages.)</li>
<li>Comprehensive operator overloading facilities, recognizing the
importance of syntactic variability for readability and
expressivity.</li>
<li>High-level concepts such as collections and iterators.</li>
<li>High-level encapsulation facilities (C++: namespaces, Python: modules)
to support the design of re-usable libraries.</li>
<li>Exception-handling for effective management of error conditions.</li>
<li>C++ idioms in common use, such as handle/body classes and
reference-counted smart pointers, mirror Python reference semantics.</li>
</ul>
<p>Given Python's rich 'C' interoperability API, it should in principle
be possible to expose C++ type and function interfaces to Python with
an analogous interface to their C++ counterparts. However, the
facilities provided by Python alone for integration with C++ are
relatively meager. Compared to C++ and Python, 'C' has only very
rudimentary abstraction facilities, and support for exception-handling
is completely missing. 'C' extension module writers are required to
manually manage Python reference counts, which is both annoyingly
tedious and extremely error-prone. Traditional extension modules also
tend to contain a great deal of boilerplate code repetition which
makes them difficult to maintain, especially when wrapping an evolving
API.</p>
<p>These limitations have led to the development of a variety of wrapping
systems. <a class="reference external" href="http://www.swig.org/">SWIG</a> is probably the most popular package for the
integration of C/C++ and Python.
A more recent development is <a class="reference external" href="http://www.riverbankcomputing.co.uk/sip/index.php">SIP</a>,
which was specifically designed for interfacing Python with the <a class="reference external" href="http://www.trolltech.com/">Qt</a>
graphical user interface library. Both SWIG and SIP introduce their
own specialized languages for customizing inter-language bindings.
This has certain advantages, but having to deal with three different
languages (Python, C/C++ and the interface language) also introduces
practical and mental difficulties. The <a class="reference external" href="http://cxx.sourceforge.net/">CXX</a> package demonstrates an
interesting alternative. It shows that at least some parts of
Python's 'C' API can be wrapped and presented through a much more
user-friendly C++ interface. However, unlike SWIG and SIP, CXX does
not include support for wrapping C++ classes as new Python types.</p>
<p>The features and goals of <a class="reference external" href="http://www.boost.org/libs/python/doc">Boost.Python</a> overlap significantly with
many of these other systems. That said, Boost.Python attempts to
maximize convenience and flexibility without introducing a separate
wrapping language. Instead, it presents the user with a high-level
C++ interface for wrapping C++ classes and functions, managing much of
the complexity behind-the-scenes with static metaprogramming.
Boost.Python also goes beyond the scope of earlier systems by
providing:</p>
<ul class="simple">
<li>Support for C++ virtual functions that can be overridden in Python.</li>
<li>Comprehensive lifetime management facilities for low-level C++
pointers and references.</li>
<li>Support for organizing extensions as Python packages,
with a central registry for inter-language type conversions.</li>
<li>A safe and convenient mechanism for tying into Python's powerful
serialization engine (pickle).</li>
<li>Coherence with the rules for handling C++ lvalues and rvalues that
can only come from a deep understanding of both the Python and C++
type systems.</li>
</ul>
<p>The key insight that sparked the development of Boost.Python is that
much of the boilerplate code in traditional extension modules could be
eliminated using C++ compile-time introspection. Each argument of a
wrapped C++ function must be extracted from a Python object using a
procedure that depends on the argument type. Similarly, the function's
return type determines how the return value will be converted from C++
to Python. Of course, argument and return types are part of each
function's type, and this is exactly the source from which
Boost.Python deduces most of the information required.</p>
<p>This approach leads to <em>user guided wrapping</em>: as much information is
extracted directly from the source code to be wrapped as is possible
within the framework of pure C++, and some additional information is
supplied explicitly by the user. Mostly the guidance is mechanical
and little real intervention is required.
Because the interface
specification is written in the same full-featured language as the
code being exposed, the user has unprecedented power available when
she does need to take control.</p>
</div>
<div class="section" id="boost-python-design-goals">
<h1><a class="toc-backref" href="#id7">Boost.Python Design Goals</a></h1>
<p>The primary goal of Boost.Python is to allow users to expose C++
classes and functions to Python using nothing more than a C++
compiler. In broad strokes, the user experience should be one of
directly manipulating C++ objects from Python.</p>
<p>However, it's also important not to translate all interfaces <em>too</em>
literally: the idioms of each language must be respected. For
example, though C++ and Python both have an iterator concept, they are
expressed very differently. Boost.Python has to be able to bridge the
interface gap.</p>
<p>It must be possible to insulate Python users from crashes resulting
from trivial misuses of C++ interfaces, such as accessing
already-deleted objects. By the same token, the library should
insulate C++ users from the low-level Python 'C' API, replacing
error-prone 'C' interfaces like manual reference-count management and
raw <tt class="docutils literal">PyObject</tt> pointers with more-robust alternatives.</p>
<p>Support for component-based development is crucial, so that C++ types
exposed in one extension module can be passed to functions exposed in
another without loss of crucial information like C++ inheritance
relationships.</p>
<p>Finally, all wrapping must be <em>non-intrusive</em>, without modifying or
even seeing the original C++ source code.
Existing C++ libraries have
to be wrappable by third parties who only have access to header files
and binaries.</p>
</div>
<div class="section" id="hello-boost-python-world">
<h1><a class="toc-backref" href="#id8">Hello Boost.Python World</a></h1>
<p>And now for a preview of Boost.Python, and how it improves on the raw
facilities offered by Python. Here's a function we might want to
expose:</p>
<pre class="literal-block">
#include <stdexcept>

char const* greet(unsigned x)
{
    static char const* const msgs[] = { "hello", "Boost.Python", "world!" };

    if (x > 2)
        throw std::range_error("greet: index out of range");

    return msgs[x];
}
</pre>
<p>To wrap this function in standard C++ using the Python 'C' API, we'd
need something like this:</p>
<pre class="literal-block">
#include <Python.h>

extern "C" // all Python interactions use 'C' linkage and calling convention
{
    // Wrapper to handle argument/result conversion and checking.
    // METH_VARARGS functions receive (self, args).
    PyObject* greet_wrap(PyObject* self, PyObject* args)
    {
        int x;
        if (PyArg_ParseTuple(args, "i", &x))    // extract/check arguments
        {
            char const* result = greet(x);      // invoke wrapped function
            return PyString_FromString(result); // convert result to Python
        }
        return 0;                               // error occurred
    }

    // Table of wrapped functions to be exposed by the module
    static PyMethodDef methods[] = {
        { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
        , { NULL, NULL, 0, NULL } // sentinel
    };

    // Module initialization function; Python 2 looks up "init" + module name
    DL_EXPORT(void) inithello()
    {
        (void) Py_InitModule("hello", methods); // add the methods to the module
    }
}
</pre>
<p>Now here's the wrapping code we'd use to expose it with Boost.Python:</p>
<pre class="literal-block">
#include <boost/python.hpp>
using namespace boost::python;
BOOST_PYTHON_MODULE(hello)
{
    def("greet", greet, "return one of 3 parts of a greeting");
}
</pre>
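<p>As a rough illustration of the work that the single <tt class="docutils literal">def</tt> call takes off the
author's hands, the following standard-library-only sketch deduces the
argument and return types from the function pointer, rejects values
that do not fit the parameter type, and turns C++ exceptions into an
error result. The names <tt class="docutils literal">call_result</tt> and <tt class="docutils literal">checked_call</tt> are
hypothetical and are not part of Boost.Python; the real library
converts to and from Python objects through its conversion registry,
and the error strings here merely imitate Python exception names.</p>

```cpp
#include <exception>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>

// The function from the article, repeated so the sketch is self-contained.
char const* greet(unsigned x)
{
    static char const* const msgs[] = { "hello", "Boost.Python", "world!" };
    if (x > 2)
        throw std::range_error("greet: index out of range");
    return msgs[x];
}

// Hypothetical stand-in for a Python-level outcome: a value or an error.
struct call_result
{
    bool ok;
    std::string text; // converted return value, or the error message
};

// R and A are deduced from the type of f, so the wrapper can pick the
// right checks without being told anything beyond the function itself.
template <class R, class A>
call_result checked_call(R (*f)(A), long long arg)
{
    static_assert(std::is_integral<A>::value,
                  "this sketch handles integral parameters only");

    // Range check derived from the deduced parameter type.
    bool fits;
    if (arg < 0)
        fits = std::is_signed<A>::value
            && arg >= static_cast<long long>(std::numeric_limits<A>::min());
    else
        fits = static_cast<unsigned long long>(arg)
            <= static_cast<unsigned long long>(std::numeric_limits<A>::max());

    if (!fits)
        return call_result{false, "OverflowError: value does not fit the parameter type"};

    try
    {
        // Assumes R is convertible to std::string (true for char const*).
        return call_result{true, std::string(f(static_cast<A>(arg)))};
    }
    catch (std::exception const& e)
    {
        // Keep the exception from escaping into non-C++ code.
        return call_result{false, std::string("RuntimeError: ") + e.what()};
    }
}
```

<p>With these definitions, <tt class="docutils literal">checked_call(greet, 1)</tt> yields the text
<tt class="docutils literal">Boost.Python</tt>, while <tt class="docutils literal">checked_call(greet, <span class="pre">-1)</span></tt> and
<tt class="docutils literal">checked_call(greet, 7)</tt> both report an error rather than wrapping
around silently or letting an exception escape.</p>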
<p>Here is the Boost.Python version in action:</p>
<pre class="literal-block">
>>> import hello
>>> for x in range(3):
...     print hello.greet(x)
...
hello
Boost.Python
world!
</pre>
<p>Aside from the fact that the 'C' API version is much more verbose,
it's worth noting a few things that it doesn't handle correctly:</p>
<ul class="simple">
<li>The original function accepts an unsigned integer, and the Python
'C' API only gives us a way of extracting signed integers. The
Boost.Python version will raise a Python exception if we try to pass
a negative number to <tt class="docutils literal">hello.greet</tt>, but the other one will proceed
to do whatever the C++ implementation does when converting a
negative integer to unsigned (usually wrapping to some very large
number), and pass the incorrect translation on to the wrapped
function.</li>
<li>That brings us to the second problem: if the C++ <tt class="docutils literal">greet()</tt>
function is called with a number greater than 2, it will throw an
exception. Typically, if a C++ exception propagates across the
boundary with code generated by a 'C' compiler, it will cause a
crash. As you can see in the first version, there's no C++
scaffolding there to prevent this from happening. Functions wrapped
by Boost.Python automatically include an exception-handling layer
which protects Python users by translating unhandled C++ exceptions
into a corresponding Python exception.</li>
<li>A slightly more-subtle limitation is that the argument conversion
used in the Python 'C' API case can only get that integer <tt class="docutils literal">x</tt> in
<em>one way</em>.
PyArg_ParseTuple can't convert Python <tt class="docutils literal">long</tt> objects
(arbitrary-precision integers) which happen to fit in an <tt class="docutils literal">unsigned
int</tt> but not in a <tt class="docutils literal">signed long</tt>, nor will it ever handle a
wrapped C++ class with a user-defined implicit <tt class="docutils literal">operator unsigned
int()</tt> conversion. Boost.Python's dynamic type conversion
registry allows users to add arbitrary conversion methods.</li>
</ul>
</div>
<div class="section" id="library-overview">
<h1><a class="toc-backref" href="#id9">Library Overview</a></h1>
<p>This section outlines some of the library's major features. Except as
necessary to avoid confusion, details of library implementation are
omitted.</p>
<div class="section" id="exposing-classes">
<h2><a class="toc-backref" href="#id10">Exposing Classes</a></h2>
<p>C++ classes and structs are exposed with a similarly-terse interface.
Given:</p>
<pre class="literal-block">
struct World
{
    void set(std::string msg) { this->msg = msg; }
    std::string greet() { return msg; }
    std::string msg;
};
</pre>
<p>The following code will expose it in our extension module:</p>
<pre class="literal-block">
#include <boost/python.hpp>
using namespace boost::python;
BOOST_PYTHON_MODULE(hello)
{
    class_<World>("World")
        .def("greet", &World::greet)
        .def("set", &World::set)
    ;
}
</pre>
<p>Although this code has a certain pythonic familiarity, people
sometimes find the syntax a bit confusing because it doesn't look like
most of the C++ code they're used to. All the same, this is just
standard C++. Because of their flexible syntax and operator
overloading, C++ and Python are great for defining domain-specific
(sub)languages
(DSLs), and that's what we've done in Boost.Python.
To break it down:</p>
<pre class="literal-block">
class_<World>("World")
</pre>
<p>constructs an unnamed object of type <tt class="docutils literal">class_<World></tt> and passes
<tt class="docutils literal">"World"</tt> to its constructor. This creates a new-style Python class
called <tt class="docutils literal">World</tt> in the extension module, and associates it with the
C++ type <tt class="docutils literal">World</tt> in the Boost.Python type conversion registry. We
might have also written:</p>
<pre class="literal-block">
class_<World> w("World");
</pre>
<p>but that would've been more verbose, since we'd have to name <tt class="docutils literal">w</tt>
again to invoke its <tt class="docutils literal">def()</tt> member function:</p>
<pre class="literal-block">
w.def("greet", &World::greet)
</pre>
<p>There's nothing special about the location of the dot for member
access in the original example: C++ allows any amount of whitespace on
either side of a token, and placing the dot at the beginning of each
line allows us to chain as many successive calls to member functions
as we like with a uniform syntax.
The other key fact that allows
chaining is that <tt class="docutils literal">class_<></tt> member functions all return a reference
to <tt class="docutils literal">*this</tt>.</p>
<p>So the example is equivalent to:</p>
<pre class="literal-block">
class_<World> w("World");
w.def("greet", &World::greet);
w.def("set", &World::set);
</pre>
<p>It's occasionally useful to be able to break down the components of a
Boost.Python class wrapper in this way, but the rest of this article
will stick to the terse syntax.</p>
<p>For completeness, here's the wrapped class in use:</p>
<pre class="literal-block">
>>> import hello
>>> planet = hello.World()
>>> planet.set('howdy')
>>> planet.greet()
'howdy'
</pre>
<div class="section" id="constructors">
<h3><a class="toc-backref" href="#id11">Constructors</a></h3>
<p>Since our <tt class="docutils literal">World</tt> class is just a plain <tt class="docutils literal">struct</tt>, it has an
implicit no-argument (nullary) constructor. Boost.Python exposes the
nullary constructor by default, which is why we were able to write:</p>
<pre class="literal-block">
>>> planet = hello.World()
</pre>
<p>However, well-designed classes in any language may require constructor
arguments in order to establish their invariants. Unlike Python,
where <tt class="docutils literal">__init__</tt> is just a specially-named method, in C++
constructors cannot be handled like ordinary member functions. In
particular, we can't take their address: <tt class="docutils literal"><span class="pre">&World::World</span></tt> is an
error. The library provides a different interface for specifying
constructors. Given:</p>
<pre class="literal-block">
struct World
{
    World(std::string msg); // added constructor
    ...
</pre>
<p>we can modify our wrapping code as follows:</p>
<pre class="literal-block">
class_<World>("World", init<std::string>())
    ...
</pre>
<p>Of course, a C++ class may have additional constructors, and we can
expose those as well by passing more instances of <tt class="docutils literal"><span class="pre">init<...></span></tt> to
<tt class="docutils literal">def()</tt>:</p>
<pre class="literal-block">
class_<World>("World", init<std::string>())
    .def(init<double, double>())
    ...
</pre>
<p>Boost.Python allows wrapped functions, member functions, and
constructors to be overloaded to mirror C++ overloading.</p>
</div>
<div class="section" id="data-members-and-properties">
<h3><a class="toc-backref" href="#id12">Data Members and Properties</a></h3>
<p>Any publicly-accessible data members in a C++ class can be easily
exposed as either <tt class="docutils literal">readonly</tt> or <tt class="docutils literal">readwrite</tt> attributes:</p>
<pre class="literal-block">
class_<World>("World", init<std::string>())
    .def_readonly("msg", &World::msg)
    ...
</pre>
<p>and can be used directly in Python:</p>
<pre class="literal-block">
>>> planet = hello.World('howdy')
>>> planet.msg
'howdy'
</pre>
<p>This does <em>not</em> result in adding attributes to the <tt class="docutils literal">World</tt> instance
<tt class="docutils literal">__dict__</tt>, which can result in substantial memory savings when
wrapping large data structures. In fact, no instance <tt class="docutils literal">__dict__</tt>
will be created at all unless attributes are explicitly added from
Python.
Boost.Python owes this capability to the new Python 2.2 type
system, in particular the descriptor interface and <tt class="docutils literal">property</tt> type.</p>
<p>In C++, publicly-accessible data members are considered a sign of poor
design because they break encapsulation, and style guides usually
dictate the use of "getter" and "setter" functions instead. In
Python, however, <tt class="docutils literal">__getattr__</tt>, <tt class="docutils literal">__setattr__</tt>, and since 2.2,
<tt class="docutils literal">property</tt> mean that attribute access is just one more
well-encapsulated syntactic tool at the programmer's disposal.
Boost.Python bridges this idiomatic gap by making Python <tt class="docutils literal">property</tt>
creation directly available to users. If <tt class="docutils literal">msg</tt> were private, we
could still expose it as an attribute in Python as follows:</p>
<pre class="literal-block">
class_<World>("World", init<std::string>())
    .add_property("msg", &World::greet, &World::set)
    ...
</pre>
<p>The example above mirrors the familiar usage of properties in Python
2.2+:</p>
<pre class="literal-block">
>>> class World(object):
...     def __init__(self, msg):
...         self.__msg = msg
...     def greet(self):
...         return self.__msg
...     def set(self, msg):
...         self.__msg = msg
...     msg = property(greet, set)
</pre>
</div>
<div class="section" id="operator-overloading">
<h3><a class="toc-backref" href="#id13">Operator Overloading</a></h3>
<p>The ability to write arithmetic operators for user-defined types has
been a major factor in the success of both languages for numerical
computation, and the success of packages like <a class="reference external" href="http://www.pfdubois.com/numpy/">NumPy</a> attests to the
power of exposing operators in extension modules. Boost.Python
provides a concise mechanism for wrapping operator overloads.
The
example below shows a fragment from a wrapper for the Boost rational
number library:</p>
<pre class="literal-block">
class_<rational<int> >("rational_int")
    .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
    .def("numerator", &rational<int>::numerator)
    .def("denominator", &rational<int>::denominator)
    .def(-self)        // __neg__ (unary minus)
    .def(self + self)  // __add__ (homogeneous)
    .def(self * self)  // __mul__
    .def(self + int()) // __add__ (heterogeneous)
    .def(int() + self) // __radd__
    ...
</pre>
<p>The magic is performed using a simplified application of "expression
templates" <a class="citation-reference" href="#veld1995" id="id1">[VELD1995]</a>, a technique originally developed for
optimization of high-performance matrix algebra expressions. The
essence is that instead of performing the computation immediately,
operators are overloaded to construct a type <em>representing</em> the
computation. In matrix algebra, dramatic optimizations are often
available when the structure of an entire expression can be taken into
account, rather than evaluating each operation "greedily".
Boost.Python uses the same technique to build an appropriate Python
method object based on expressions involving <tt class="docutils literal">self</tt>.</p>
</div>
<div class="section" id="inheritance">
<h3><a class="toc-backref" href="#id14">Inheritance</a></h3>
<p>C++ inheritance relationships can be represented to Boost.Python by adding
an optional <tt class="docutils literal"><span class="pre">bases<...></span></tt> argument to the <tt class="docutils literal"><span class="pre">class_<...></span></tt> template
parameter list as follows:</p>
<pre class="literal-block">
class_<Derived, bases<Base1,Base2> >("Derived")
    ...
</pre>
<p>This has two effects:</p>
<ol class="arabic simple">
<li>When the <tt class="docutils literal"><span class="pre">class_<...></span></tt> is created, Python type objects
corresponding to <tt class="docutils literal">Base1</tt> and <tt class="docutils literal">Base2</tt> are looked up in
Boost.Python's registry, and are used as bases for the new Python
<tt class="docutils literal">Derived</tt> type object, so methods exposed for the Python <tt class="docutils literal">Base1</tt>
and <tt class="docutils literal">Base2</tt> types are automatically members of the <tt class="docutils literal">Derived</tt>
type. Because the registry is global, this works correctly even if
<tt class="docutils literal">Derived</tt> is exposed in a different module from either of its
bases.</li>
<li>C++ conversions from <tt class="docutils literal">Derived</tt> to its bases are added to the
Boost.Python registry. Thus wrapped C++ methods expecting (a
pointer or reference to) an object of either base type can be
called with an object wrapping a <tt class="docutils literal">Derived</tt> instance. Wrapped
member functions of class <tt class="docutils literal">T</tt> are treated as though they have an
implicit first argument of <tt class="docutils literal">T&</tt>, so these conversions are
necessary to allow the base class methods to be called for derived
objects.</li>
</ol>
<p>Of course it's possible to derive new Python classes from wrapped C++
class instances. Because Boost.Python uses the new-style class
system, that works very much as for the Python built-in types.
There
is one significant detail in which it differs: the built-in types
generally establish their invariants in their <tt class="docutils literal">__new__</tt> function, so
that derived classes do not need to call <tt class="docutils literal">__init__</tt> on the base
class before invoking its methods:</p>
<pre class="literal-block">
>>> class L(list):
...      def __init__(self):
...          pass
...
>>> L().reverse()
>>>
</pre>
<p>Because C++ object construction is a one-step operation, C++ instance
data cannot be constructed until the arguments are available, in the
<tt class="docutils literal">__init__</tt> function:</p>
<pre class="literal-block">
>>> class D(SomeBoostPythonClass):
...      def __init__(self):
...          pass
...
>>> D().some_boost_python_method()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
</pre>
<p>This happened because Boost.Python couldn't find instance data of type
<tt class="docutils literal">SomeBoostPythonClass</tt> within the <tt class="docutils literal">D</tt> instance; <tt class="docutils literal">D</tt>'s <tt class="docutils literal">__init__</tt>
function masked construction of the base class. It could be corrected
by either removing <tt class="docutils literal">D</tt>'s <tt class="docutils literal">__init__</tt> function or having it call
<tt class="docutils literal"><span class="pre">SomeBoostPythonClass.__init__(...)</span></tt> explicitly.</p>
</div>
<div class="section" id="virtual-functions">
<h3><a class="toc-backref" href="#id15">Virtual Functions</a></h3>
<p>Deriving new types in Python from extension classes is not very
interesting unless they can be used polymorphically from C++.
In
other words, Python method implementations should appear to override
the implementation of C++ virtual functions when called <em>through base
class pointers/references from C++</em>. Since the only way to alter the
behavior of a virtual function is to override it in a derived class,
the user must build a special derived class to dispatch a polymorphic
class' virtual functions:</p>
<pre class="literal-block">
//
// interface to wrap:
//
class Base
{
 public:
    virtual int f(std::string x) { return 42; }
    virtual ~Base() {}
};

// f is a non-const member, so b is taken by non-const reference
int calls_f(Base& b, std::string x) { return b.f(x); }

//
// Wrapping Code
//

// Dispatcher class
struct BaseWrap : Base
{
    // Store a pointer to the Python object
    BaseWrap(PyObject* self_) : self(self_) {}
    PyObject* self;

    // Default implementation, for when f is not overridden
    int f_default(std::string x) { return this->Base::f(x); }
    // Dispatch implementation
    int f(std::string x) { return call_method<int>(self, "f", x); }
};

...
    def("calls_f", calls_f);
    class_<Base, BaseWrap>("Base")
        .def("f", &Base::f, &BaseWrap::f_default)
        ;
</pre>
<p>Now here's some Python code which demonstrates:</p>
<pre class="literal-block">
>>> class Derived(Base):
...     def f(self, s):
...          return len(s)
...
>>> calls_f(Base(), 'foo')
42
>>> calls_f(Derived(), 'forty-two')
9
</pre>
<p>Things to notice about the dispatcher class:</p>
<ul class="simple">
<li>The key element which allows overriding in Python is the
<tt class="docutils literal">call_method</tt> invocation, which uses the same global type
conversion registry as the C++ function wrapping does to convert its
arguments from C++ to Python and its return type from Python to C++.</li>
<li>Any constructor signatures you wish to wrap must be replicated with
an initial <tt class="docutils literal">PyObject*</tt> argument.</li>
<li>The dispatcher must store this argument so that it can be used to
invoke <tt class="docutils literal">call_method</tt>.</li>
<li>The <tt class="docutils literal">f_default</tt> member function is needed when the function being
exposed is not pure virtual; there's no other way <tt class="docutils literal"><span class="pre">Base::f</span></tt> can be
called on an object of type <tt class="docutils literal">BaseWrap</tt>, since it overrides <tt class="docutils literal">f</tt>.</li>
</ul>
</div>
<div class="section" id="deeper-reflection-on-the-horizon">
<h3><a class="toc-backref" href="#id16">Deeper Reflection on the Horizon?</a></h3>
<p>Admittedly, this formula is tedious to repeat, especially on a project
with many polymorphic classes. That it is necessary reflects some
limitations in C++'s compile-time introspection capabilities: there's
no way to enumerate the members of a class and find out which are
virtual functions. At least one very promising project has been
started to write a front-end which can generate these dispatchers (and
other wrapping code) automatically from C++ headers.</p>
<p><a class="reference external" href="http://www.boost.org/libs/python/pyste">Pyste</a> is being developed by Bruno da Silva de Oliveira.
It builds on
<a class="reference external" href="http://www.gccxml.org/HTML/Index.html">GCC_XML</a>, which generates an XML version of GCC's internal program
representation. Since GCC is a highly conformant C++ compiler, this
ensures correct handling of the most sophisticated template code and
full access to the underlying type system. In keeping with the
Boost.Python philosophy, a Pyste interface description is neither
intrusive on the code being wrapped, nor expressed in some unfamiliar
language: instead it is a 100% pure Python script. If Pyste is
successful it will mark a move away from wrapping everything directly
in C++ for many of our users. It will also give us the option of
shifting some of the metaprogram code from C++ to Python. We expect that
soon, not only our users but the Boost.Python developers themselves
will be "thinking hybrid" about their own code.</p>
</div>
</div>
<div class="section" id="serialization">
<h2><a class="toc-backref" href="#id17">Serialization</a></h2>
<p><em>Serialization</em> is the process of converting objects in memory to a
form that can be stored on disk or sent over a network connection. The
serialized object (most often a plain string) can be retrieved and
converted back to the original object. A good serialization system will
automatically convert entire object hierarchies. Python's standard
<tt class="docutils literal">pickle</tt> module is just such a system. It leverages the language's strong
runtime introspection facilities for serializing practically arbitrary
user-defined objects. With a few simple and non-intrusive provisions this
powerful machinery can be extended to also work for wrapped C++ objects.
Here is an example:</p>
<pre class="literal-block">
#include &lt;string&gt;

struct World
{
    World(std::string a_msg) : msg(a_msg) {}
    std::string greet() const { return msg; }
    std::string msg;
};

#include &lt;boost/python.hpp&gt;
using namespace boost::python;

struct World_picklers : pickle_suite
{
    static tuple
    getinitargs(World const&amp; w) { return make_tuple(w.greet()); }
};

BOOST_PYTHON_MODULE(hello)
{
    class_&lt;World&gt;("World", init&lt;std::string&gt;())
        .def("greet", &amp;World::greet)
        .def_pickle(World_picklers())
    ;
}
</pre>
<p>Now let's create a <tt class="docutils literal">World</tt> object and put it to rest on disk:</p>
<pre class="literal-block">
&gt;&gt;&gt; import hello
&gt;&gt;&gt; import pickle
&gt;&gt;&gt; a_world = hello.World("howdy")
&gt;&gt;&gt; pickle.dump(a_world, open("my_world", "wb"))
</pre>
<p>In a potentially <em>different script</em> on a potentially <em>different
computer</em> with a potentially <em>different operating system</em>:</p>
<pre class="literal-block">
&gt;&gt;&gt; import pickle
&gt;&gt;&gt; resurrected_world = pickle.load(open("my_world", "rb"))
&gt;&gt;&gt; resurrected_world.greet()
'howdy'
</pre>
<p>Of course the <tt class="docutils literal">cPickle</tt> module can also be used for faster
processing.</p>
<p>Boost.Python's <tt class="docutils literal">pickle_suite</tt> fully supports the <tt class="docutils literal">pickle</tt> protocol
defined in the standard Python documentation. Like a __getinitargs__
function in Python, the pickle_suite's getinitargs() is responsible for
creating the argument tuple that will be used to reconstruct the pickled
object. The other elements of the Python pickling protocol,
__getstate__ and __setstate__, can be optionally provided via C++
getstate and setstate functions. C++'s static type system allows the
library to ensure at compile time that nonsensical combinations of
functions (e.g.
getstate without setstate) are not used.</p>
<p>Enabling serialization of more complex C++ objects requires a little
more work than is shown in the example above. Fortunately the
<tt class="docutils literal">object</tt> interface (see next section) greatly helps in keeping the
code manageable.</p>
</div>
<div class="section" id="object-interface">
<h2><a class="toc-backref" href="#id18">Object interface</a></h2>
<p>Experienced 'C' language extension module authors will be familiar
with the ubiquitous <tt class="docutils literal">PyObject*</tt>, manual reference-counting, and the
need to remember which API calls return "new" (owned) references or
"borrowed" (raw) references. These constraints are not just
cumbersome but also a major source of errors, especially in the
presence of exceptions.</p>
<p>Boost.Python provides a class <tt class="docutils literal">object</tt> which automates reference
counting and provides conversion to Python from C++ objects of
arbitrary type. This significantly reduces the learning effort for
prospective extension module writers.</p>
<p>Creating an <tt class="docutils literal">object</tt> from any other type is extremely simple:</p>
<pre class="literal-block">
object s("hello, world");  // s manages a Python string
</pre>
<p><tt class="docutils literal">object</tt> has templated interactions with all other types, with
automatic to-python conversions.
It happens so naturally that it's
easily overlooked:</p>
<pre class="literal-block">
object ten_Os = 10 * s[4]; // -&gt; "oooooooooo"
</pre>
<p>In the example above, <tt class="docutils literal">4</tt> and <tt class="docutils literal">10</tt> are converted to Python objects
before the indexing and multiplication operations are invoked.</p>
<p>The <tt class="docutils literal">extract&lt;T&gt;</tt> class template can be used to convert Python objects
to C++ types:</p>
<pre class="literal-block">
double x = extract&lt;double&gt;(o);
</pre>
<p>If a conversion in either direction cannot be performed, an
appropriate exception is thrown at runtime.</p>
<p>The <tt class="docutils literal">object</tt> type is accompanied by a set of derived types
that mirror the Python built-in types such as <tt class="docutils literal">list</tt>, <tt class="docutils literal">dict</tt>,
<tt class="docutils literal">tuple</tt>, etc. as much as possible. This enables convenient
manipulation of these high-level types from C++:</p>
<pre class="literal-block">
dict d;
d["some"] = "thing";
d["lucky_number"] = 13;
list l = d.keys();
</pre>
<p>This almost looks and works like regular Python code, but it is pure
C++. Of course we can wrap C++ functions which accept or return
<tt class="docutils literal">object</tt> instances.</p>
</div>
</div>
<div class="section" id="thinking-hybrid">
<h1><a class="toc-backref" href="#id19">Thinking hybrid</a></h1>
<p>Because of the practical and mental difficulties of combining
programming languages, it is common to settle on a single language at the
outset of any development effort. For many applications, performance
considerations dictate the use of a compiled language for the core
algorithms. Unfortunately, due to the complexity of the static type
system, the price we pay for runtime performance is often a
significant increase in development time.
Experience shows that
writing maintainable C++ code usually takes longer and requires <em>far</em>
more hard-earned working experience than developing comparable Python
code. Even when developers are comfortable working exclusively in
compiled languages, they often augment their systems with some type of
ad hoc scripting layer for the benefit of their users without ever
availing themselves of the same advantages.</p>
<p>Boost.Python enables us to <em>think hybrid</em>. Python can be used for
rapidly prototyping a new application; its ease of use and the large
pool of standard libraries give us a head start on the way to a
working system. If necessary, the working code can be used to
discover rate-limiting hotspots. To maximize performance these can
be reimplemented in C++, together with the Boost.Python bindings
needed to tie them back into the existing higher-level procedure.</p>
<p>Of course, this <em>top-down</em> approach is less attractive if it is clear
from the start that many algorithms will eventually have to be
implemented in C++. Fortunately Boost.Python also enables us to
pursue a <em>bottom-up</em> approach. We have used this approach very
successfully in the development of a toolbox for scientific
applications. The toolbox started out mainly as a library of C++
classes with Boost.Python bindings, and for a while the growth was
mainly concentrated on the C++ parts. However, as the toolbox is
becoming more complete, more and more newly added functionality can be
implemented in Python.</p>
<img alt="images/python_cpp_mix.png" src="images/python_cpp_mix.png" />
<p>This figure shows the estimated ratio of newly added C++ and Python
code over time as new algorithms are implemented. We expect this
ratio to level out near 70% Python.
Being able to solve new problems
mostly in Python rather than a more difficult statically typed
language is the return on our investment in Boost.Python. The ability
to access all of our code from Python allows a broader group of
developers to use it in the rapid development of new applications.</p>
</div>
<div class="section" id="development-history">
<h1><a class="toc-backref" href="#id20">Development history</a></h1>
<p>The first version of Boost.Python was developed in 2000 by Dave
Abrahams at Dragon Systems, where he was privileged to have Tim Peters
as a guide to "The Zen of Python". One of Dave's jobs was to develop
a Python-based natural language processing system. Since it was
eventually going to be targeting embedded hardware, it was always
assumed that the compute-intensive core would be rewritten in C++ to
optimize speed and memory footprint<a class="footnote-reference" href="#proto" id="id2"><sup>1</sup></a>. The project also wanted to
test all of its C++ code using Python test scripts<a class="footnote-reference" href="#test" id="id3"><sup>2</sup></a>. The only
tool we knew of for binding C++ and Python was <a class="reference external" href="http://www.swig.org/">SWIG</a>, and at the time
its handling of C++ was weak. It would be false to claim any deep
insight into the possible advantages of Boost.Python's approach at
this point. Dave's interest and expertise in fancy C++ template
tricks had just reached the point where he could do some real damage,
and Boost.Python emerged as it did because it filled a need and
because it seemed like a cool thing to try.</p>
<p>This early version was aimed at many of the same basic goals we've
described in this paper, differing most noticeably by having a
slightly more cumbersome syntax and by lacking special support for
operator overloading, pickling, and component-based development.
These last three features were quickly added by Ullrich Koethe and
Ralf Grosse-Kunstleve<a class="footnote-reference" href="#feature" id="id4"><sup>3</sup></a>, and other enthusiastic contributors arrived
on the scene to contribute enhancements like support for nested
modules and static member functions.</p>
<p>By early 2001 development had stabilized and few new features were
being added. However, a disturbing new fact came to light: Ralf had
begun testing Boost.Python on pre-release versions of a compiler using
the <a class="reference external" href="http://www.edg.com">EDG</a> front-end, and the mechanism at the core of Boost.Python
responsible for handling conversions between Python and C++ types was
failing to compile. As it turned out, we had been exploiting a very
common bug in the implementation of all the C++ compilers we had
tested. We knew that as C++ compilers rapidly became more
standards-compliant, the library would begin failing on more
platforms. Unfortunately, because the mechanism was so central to the
functioning of the library, fixing the problem looked very difficult.</p>
<p>Fortunately, later that year Lawrence Berkeley and later Lawrence
Livermore National Labs contracted with <a class="reference external" href="http://www.boost-consulting.com">Boost Consulting</a> for support
and development of Boost.Python, and there was a new opportunity to
address fundamental issues and ensure a future for the library. A
redesign effort began with the low level type conversion architecture,
building in standards-compliance and support for component-based
development (in contrast to version 1 where conversions had to be
explicitly imported and exported across module boundaries).
A new
analysis of the relationship between the Python and C++ objects was
done, resulting in more intuitive handling for C++ lvalues and
rvalues.</p>
<p>The emergence of a powerful new type system in Python 2.2 made the
choice of whether to maintain compatibility with Python 1.5.2 easy:
the opportunity to throw away a great deal of elaborate code for
emulating classic Python classes alone was too good to pass up. In
addition, Python iterators and descriptors provided crucial and
elegant tools for representing similar C++ constructs. The
development of the generalized <tt class="docutils literal">object</tt> interface allowed us to
further shield C++ programmers from the dangers and syntactic burdens
of the Python 'C' API. A great number of other features including C++
exception translation, improved support for overloaded functions, and
most significantly, CallPolicies for handling pointers and
references, were added during this period.</p>
<p>In October 2002, version 2 of Boost.Python was released. Development
since then has concentrated on improved support for C++ runtime
polymorphism and smart pointers. Peter Dimov's ingenious
<tt class="docutils literal"><span class="pre">boost::shared_ptr</span></tt> design in particular has allowed us to give the
hybrid developer a consistent interface for moving objects back and
forth across the language barrier without loss of information. At
first, we were concerned that the sophistication and complexity of the
Boost.Python v2 implementation might discourage contributors, but the
emergence of <a class="reference external" href="http://www.boost.org/libs/python/pyste">Pyste</a> and several other significant feature
contributions have laid those fears to rest. Daily questions on the
Python C++-sig and a backlog of desired improvements show that the
library is getting used.
To us, the future looks bright.</p>
</div>
<div class="section" id="conclusions">
<h1><a class="toc-backref" href="#id21">Conclusions</a></h1>
<p>Boost.Python achieves seamless interoperability between two rich and
complementary language environments. Because it leverages template
metaprogramming to introspect about types and functions, the user
never has to learn a third syntax: the interface definitions are
written in concise and maintainable C++. Also, the wrapping system
doesn't have to parse C++ headers or represent the type system: the
compiler does that work for us.</p>
<p>Computationally intensive tasks play to the strengths of C++ and are
often impossible to implement efficiently in pure Python, while jobs
like serialization that are trivial in Python can be very difficult in
pure C++. Given the luxury of building a hybrid software system from
the ground up, we can approach design with new confidence and power.</p>
</div>
<div class="section" id="citations">
<h1><a class="toc-backref" href="#id22">Citations</a></h1>
<table class="docutils citation" frame="void" id="veld1995" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[VELD1995]</a></td><td>T. Veldhuizen, "Expression Templates," C++ Report,
Vol. 7 No. 5 June 1995, pp. 26-31.
<a class="reference external" href="http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html">http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html</a></td></tr>
</tbody>
</table>
</div>
<div class="section" id="footnotes">
<h1><a class="toc-backref" href="#id23">Footnotes</a></h1>
<table class="docutils footnote" frame="void" id="proto" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id2">[1]</a></td><td>In retrospect, it seems that "thinking hybrid" from the
ground up might have been better for the NLP system: the
natural component boundaries defined by the pure Python
prototype turned out to be inappropriate for getting the
desired performance and memory footprint out of the C++ core,
which eventually caused some redesign overhead on the Python
side when the core was moved to C++.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="test" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id3">[2]</a></td><td>We also have some reservations about driving all C++
testing through a Python interface, unless that's the only way
it will be ultimately used. Any transition across language
boundaries with such different object models can
mask bugs.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="feature" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id4">[3]</a></td><td>These features were expressed very differently in v1 of
Boost.Python.</td></tr>
</tbody>
</table>
</div>
</div>
</body>
</html>