+++++++++++++++++++++++++++++++++++++++++++
 Building Hybrid Systems with Boost.Python
+++++++++++++++++++++++++++++++++++++++++++

:Author: David Abrahams
:Contact: dave@boost-consulting.com
:organization: `Boost Consulting`_
:date: 2003-05-14

:Author: Ralf W. Grosse-Kunstleve

:copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved

.. contents:: Table of Contents

.. _`Boost Consulting`: http://www.boost-consulting.com

==========
 Abstract
==========

Boost.Python is an open source C++ library which provides a concise
IDL-like interface for binding C++ classes and functions to
Python. Leveraging the full power of C++ compile-time introspection
and of recently developed metaprogramming techniques, this is achieved
entirely in pure C++, without introducing a new syntax.
Boost.Python's rich set of features and high-level interface make it
possible to engineer packages from the ground up as hybrid systems,
giving programmers easy and coherent access to both the efficient
compile-time polymorphism of C++ and the extremely convenient run-time
polymorphism of Python.

==============
 Introduction
==============

Python and C++ are in many ways as different as two languages could
be: while C++ is usually compiled to machine-code, Python is
interpreted. Python's dynamic type system is often cited as the
foundation of its flexibility, while in C++ static typing is the
cornerstone of its efficiency. C++ has an intricate and difficult
compile-time meta-language, while in Python, practically everything
happens at runtime.

Yet for many programmers, these very differences mean that Python and
C++ complement one another perfectly. Performance bottlenecks in
Python programs can be rewritten in C++ for maximal speed, and
authors of powerful C++ libraries choose Python as a middleware
language for its flexible system integration capabilities.
Furthermore, the surface differences mask some strong similarities:

* 'C'-family control structures (if, while, for...)

* Support for object-orientation, functional programming, and generic
  programming (these are both *multi-paradigm* programming languages.)

* Comprehensive operator overloading facilities, recognizing the
  importance of syntactic variability for readability and
  expressivity.

* High-level concepts such as collections and iterators.

* High-level encapsulation facilities (C++: namespaces, Python: modules)
  to support the design of re-usable libraries.

* Exception-handling for effective management of error conditions.

* C++ idioms in common use, such as handle/body classes and
  reference-counted smart pointers, mirror Python reference semantics.

Given Python's rich 'C' interoperability API, it should in principle
be possible to expose C++ type and function interfaces to Python with
an analogous interface to their C++ counterparts. However, the
facilities provided by Python alone for integration with C++ are
relatively meager. Compared to C++ and Python, 'C' has only very
rudimentary abstraction facilities, and support for exception-handling
is completely missing. 'C' extension module writers are required to
manually manage Python reference counts, which is both annoyingly
tedious and extremely error-prone. Traditional extension modules also
tend to contain a great deal of repeated boilerplate code, which
makes them difficult to maintain, especially when wrapping an evolving
API.

These limitations have led to the development of a variety of wrapping
systems. SWIG_ is probably the most popular package for the
integration of C/C++ and Python. A more recent development is SIP_,
which was specifically designed for interfacing Python with the Qt_
graphical user interface library.
Both SWIG and SIP introduce their
own specialized languages for customizing inter-language bindings.
This has certain advantages, but having to deal with three different
languages (Python, C/C++ and the interface language) also introduces
practical and mental difficulties. The CXX_ package demonstrates an
interesting alternative. It shows that at least some parts of
Python's 'C' API can be wrapped and presented through a much more
user-friendly C++ interface. However, unlike SWIG and SIP, CXX does
not include support for wrapping C++ classes as new Python types.

The features and goals of Boost.Python_ overlap significantly with
many of these other systems. That said, Boost.Python attempts to
maximize convenience and flexibility without introducing a separate
wrapping language. Instead, it presents the user with a high-level
C++ interface for wrapping C++ classes and functions, managing much of
the complexity behind the scenes with static metaprogramming.
Boost.Python also goes beyond the scope of earlier systems by
providing:

* Support for C++ virtual functions that can be overridden in Python.

* Comprehensive lifetime management facilities for low-level C++
  pointers and references.

* Support for organizing extensions as Python packages,
  with a central registry for inter-language type conversions.

* A safe and convenient mechanism for tying into Python's powerful
  serialization engine (pickle).

* Coherence with the rules for handling C++ lvalues and rvalues that
  can only come from a deep understanding of both the Python and C++
  type systems.

The key insight that sparked the development of Boost.Python is that
much of the boilerplate code in traditional extension modules could be
eliminated using C++ compile-time introspection.
Each argument of a
wrapped C++ function must be extracted from a Python object using a
procedure that depends on the argument type. Similarly, the function's
return type determines how the return value will be converted from C++
to Python. Of course, argument and return types are part of each
function's type, and this is exactly the source from which
Boost.Python deduces most of the information required.

This approach leads to *user guided wrapping*: as much information is
extracted directly from the source code to be wrapped as is possible
within the framework of pure C++, and some additional information is
supplied explicitly by the user. Mostly the guidance is mechanical
and little real intervention is required. Because the interface
specification is written in the same full-featured language as the
code being exposed, the user has unprecedented power available when
she does need to take control.

.. _Python: http://www.python.org/
.. _SWIG: http://www.swig.org/
.. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php
.. _Qt: http://www.trolltech.com/
.. _CXX: http://cxx.sourceforge.net/
.. _Boost.Python: http://www.boost.org/libs/python/doc

===========================
 Boost.Python Design Goals
===========================

The primary goal of Boost.Python is to allow users to expose C++
classes and functions to Python using nothing more than a C++
compiler. In broad strokes, the user experience should be one of
directly manipulating C++ objects from Python.

However, it's also important not to translate all interfaces *too*
literally: the idioms of each language must be respected. For
example, though C++ and Python both have an iterator concept, they are
expressed very differently. Boost.Python has to be able to bridge the
interface gap.
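
To make the iterator gap concrete, the following is a minimal sketch in
plain C++, not Boost.Python's own mechanism. A C++ range is a
``[begin, end)`` pair that the caller compares and advances, whereas a
Python iterator is a single object asked repeatedly for its next item
until it signals exhaustion. The names ``python_style_iterator`` and
``sum_via_next`` are hypothetical and exist only for this illustration;
a ``false`` return value stands in for Python's ``StopIteration``.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical adapter (not part of Boost.Python): presents a C++
// [begin, end) iterator pair through a Python-style "next" protocol.
template <class Iterator>
class python_style_iterator
{
public:
    typedef typename std::iterator_traits<Iterator>::value_type value_type;

    python_style_iterator(Iterator first, Iterator last)
        : current_(first), last_(last) {}

    // Stores the next element in `out` and returns true, or returns
    // false once the range is exhausted (the StopIteration stand-in).
    bool next(value_type& out)
    {
        if (current_ == last_)
            return false;
        out = *current_;
        ++current_;
        return true;
    }

private:
    Iterator current_;
    Iterator last_;
};

// Drains the adapter the way a Python for-loop would: keep asking
// for the next item until the iterator reports that it is done.
inline int sum_via_next(std::vector<int> const& values)
{
    python_style_iterator<std::vector<int>::const_iterator>
        it(values.begin(), values.end());
    int total = 0;
    int item = 0;
    while (it.next(item))
        total += item;
    return total;
}
```

The adapter owns both ends of the range, so the consumer never sees the
``end`` iterator; that is the essential difference a binding layer has
to paper over.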

It must be possible to insulate Python users from crashes resulting
from trivial misuses of C++ interfaces, such as accessing
already-deleted objects. By the same token, the library should
insulate C++ users from the low-level Python 'C' API, replacing
error-prone 'C' interfaces like manual reference-count management and
raw ``PyObject`` pointers with more-robust alternatives.

Support for component-based development is crucial, so that C++ types
exposed in one extension module can be passed to functions exposed in
another without loss of crucial information like C++ inheritance
relationships.

Finally, all wrapping must be *non-intrusive*, without modifying or
even seeing the original C++ source code. Existing C++ libraries have
to be wrappable by third parties who only have access to header files
and binaries.

==========================
 Hello Boost.Python World
==========================

And now for a preview of Boost.Python, and how it improves on the raw
facilities offered by Python. Here's a function we might want to
expose::

    #include <stdexcept>

    char const* greet(unsigned x)
    {
        static char const* const msgs[] = { "hello", "Boost.Python", "world!" };

        if (x > 2)
            throw std::range_error("greet: index out of range");

        return msgs[x];
    }

To wrap this function in standard C++ using the Python 'C' API, we'd
need something like this::

    extern "C" // all Python interactions use 'C' linkage and calling convention
    {
        // Wrapper to handle argument/result conversion and checking.
        // METH_VARARGS functions receive (self, args); self is unused here.
        PyObject* greet_wrap(PyObject* /*self*/, PyObject* args)
        {
            int x;
            if (PyArg_ParseTuple(args, "i", &x))    // extract/check arguments
            {
                char const* result = greet(x);      // invoke wrapped function
                return PyString_FromString(result); // convert result to Python
            }
            return 0;                               // error occurred
        }

        // Table of wrapped functions to be exposed by the module
        static PyMethodDef methods[] = {
            { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
            , { NULL, NULL, 0, NULL } // sentinel
        };

        // module initialization function; Python looks for init<modulename>
        DL_EXPORT(void) inithello()
        {
            (void) Py_InitModule("hello", methods); // add the methods to the module
        }
    }

Now here's the wrapping code we'd use to expose it with Boost.Python::

    #include <boost/python.hpp>
    using namespace boost::python;
    BOOST_PYTHON_MODULE(hello)
    {
        def("greet", greet, "return one of 3 parts of a greeting");
    }

and here it is in action::

    >>> import hello
    >>> for x in range(3):
    ...     print hello.greet(x)
    ...
    hello
    Boost.Python
    world!

Aside from the fact that the 'C' API version is much more verbose,
it's worth noting a few things that it doesn't handle correctly:

* The original function accepts an unsigned integer, and the Python
  'C' API only gives us a way of extracting signed integers. The
  Boost.Python version will raise a Python exception if we try to pass
  a negative number to ``hello.greet``, but the other one will proceed
  to do whatever the C++ implementation does when converting a
  negative integer to unsigned (usually wrapping to some very large
  number), and pass the incorrect translation on to the wrapped
  function.

* That brings us to the second problem: if the C++ ``greet()``
  function is called with a number greater than 2, it will throw an
  exception. Typically, if a C++ exception propagates across the
  boundary with code generated by a 'C' compiler, it will cause a
  crash. As you can see in the first version, there's no C++
  scaffolding there to prevent this from happening. Functions wrapped
  by Boost.Python automatically include an exception-handling layer
  which protects Python users by translating unhandled C++ exceptions
  into a corresponding Python exception.

* A slightly more-subtle limitation is that the argument conversion
  used in the Python 'C' API case can only get that integer ``x`` in
  *one way*. ``PyArg_ParseTuple`` can't convert Python ``long`` objects
  (arbitrary-precision integers) which happen to fit in an ``unsigned
  int`` but not in a ``signed long``, nor will it ever handle a
  wrapped C++ class with a user-defined implicit ``operator unsigned
  int()`` conversion. Boost.Python's dynamic type conversion
  registry allows users to add arbitrary conversion methods.

==================
 Library Overview
==================

This section outlines some of the library's major features. Except as
necessary to avoid confusion, details of library implementation are
omitted.

------------------
 Exposing Classes
------------------

C++ classes and structs are exposed with a similarly-terse interface.
Given::

    struct World
    {
        void set(std::string msg) { this->msg = msg; }
        std::string greet() { return msg; }
        std::string msg;
    };

The following code will expose it in our extension module::

    #include <boost/python.hpp>
    using namespace boost::python;
    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World")
            .def("greet", &World::greet)
            .def("set", &World::set)
        ;
    }

Although this code has a certain pythonic familiarity, people
sometimes find the syntax a bit confusing because it doesn't look like
most of the C++ code they're used to. All the same, this is just
standard C++. Because of their flexible syntax and operator
overloading, C++ and Python are great for defining domain-specific
(sub)languages
(DSLs), and that's what we've done in Boost.Python. To break it down::

    class_<World>("World")

constructs an unnamed object of type ``class_<World>`` and passes
``"World"`` to its constructor. This creates a new-style Python class
called ``World`` in the extension module, and associates it with the
C++ type ``World`` in the Boost.Python type conversion registry. We
might have also written::

    class_<World> w("World");

but that would've been more verbose, since we'd have to name ``w``
again to invoke its ``def()`` member function::

    w.def("greet", &World::greet)

There's nothing special about the location of the dot for member
access in the original example: C++ allows any amount of whitespace on
either side of a token, and placing the dot at the beginning of each
line allows us to chain as many successive calls to member functions
as we like with a uniform syntax. The other key fact that allows
chaining is that ``class_<>`` member functions all return a reference
to ``*this``.
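
The chaining idiom itself can be tried without Boost.Python. The
following is a minimal sketch in plain C++; ``class_sketch`` is a
hypothetical stand-in that merely records method names, and
``chained`` and ``unchained`` are helper names invented for this
illustration. It is not how ``class_<>`` is implemented, only a
demonstration that returning ``*this`` by reference is enough to make
the dotted syntax work.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for class_<>; it only records method names.
class class_sketch
{
public:
    explicit class_sketch(std::string const& name) : name_(name) {}

    // Returning *this by reference is what makes chained calls possible.
    class_sketch& def(std::string const& method_name)
    {
        methods_.push_back(method_name);
        return *this;
    }

    std::string const& name() const { return name_; }
    std::vector<std::string> const& methods() const { return methods_; }

private:
    std::string name_;
    std::vector<std::string> methods_;
};

// Chained form: one unnamed temporary, dots at the start of each line.
inline std::vector<std::string> chained()
{
    return class_sketch("World")
        .def("greet")
        .def("set")
        .methods();
}

// Named form: the same calls, one statement each.
inline std::vector<std::string> unchained()
{
    class_sketch w("World");
    w.def("greet");
    w.def("set");
    return w.methods();
}
```

Both helpers record the same two method names in the same order, which
is the sense in which the chained and named forms of the
``class_<World>`` wrapper are interchangeable.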

So the example is equivalent to::

    class_<World> w("World");
    w.def("greet", &World::greet);
    w.def("set", &World::set);

It's occasionally useful to be able to break down the components of a
Boost.Python class wrapper in this way, but the rest of this article
will stick to the terse syntax.

For completeness, here's the wrapped class in use: ::

    >>> import hello
    >>> planet = hello.World()
    >>> planet.set('howdy')
    >>> planet.greet()
    'howdy'

Constructors
============

Since our ``World`` class is just a plain ``struct``, it has an
implicit no-argument (nullary) constructor. Boost.Python exposes the
nullary constructor by default, which is why we were able to write: ::

    >>> planet = hello.World()

However, well-designed classes in any language may require constructor
arguments in order to establish their invariants. Unlike Python,
where ``__init__`` is just a specially-named method, in C++
constructors cannot be handled like ordinary member functions. In
particular, we can't take their address: ``&World::World`` is an
error. The library provides a different interface for specifying
constructors. Given::

    struct World
    {
        World(std::string msg); // added constructor
        ...

we can modify our wrapping code as follows::

    class_<World>("World", init<std::string>())
        ...

Of course, a C++ class may have additional constructors, and we can
expose those as well by passing more instances of ``init<...>`` to
``def()``::

    class_<World>("World", init<std::string>())
        .def(init<double, double>())
        ...

Boost.Python allows wrapped functions, member functions, and
constructors to be overloaded to mirror C++ overloading.
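
One way to see why a separate ``init<...>`` notation is needed is the
minimal sketch below, written in plain C++ under stated assumptions.
Since a constructor has no address, its signature is carried in the
template arguments of a tag type, and an ordinary function template,
which *does* have an address, performs the construction. The names
``init_sketch``, ``construct``, and ``Span`` are hypothetical and are
not part of Boost.Python, whose actual implementation is considerably
more general.

```cpp
#include <cassert>

// Hypothetical class with two overloaded constructors.
struct Span
{
    explicit Span(double width) : lo(0.0), hi(width) {}
    Span(double lo_, double hi_) : lo(lo_), hi(hi_) {}
    double lo, hi;
};

// Hypothetical tag playing the role of init<...>: it carries a
// constructor signature purely in its template arguments.
template <class A1, class A2 = void>
struct init_sketch {};

// One-argument construction, selected by init_sketch<A1>.
template <class T, class A1>
T construct(init_sketch<A1>, A1 a1) { return T(a1); }

// Two-argument overload, selected by init_sketch<A1, A2>.
template <class T, class A1, class A2>
T construct(init_sketch<A1, A2>, A1 a1, A2 a2) { return T(a1, a2); }

// Unlike a constructor, an instantiation of construct<> is an
// ordinary function, so a pointer to it can be stored and called.
typedef Span (*span_factory)(init_sketch<double>, double);

inline span_factory one_argument_factory()
{
    return &construct<Span, double>;
}
```

Each tag type selects exactly one overload, which is roughly how a
binding layer can keep several constructors apart and try each in turn
when Python calls the class.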

Data Members and Properties
===========================

Any publicly-accessible data members in a C++ class can be easily
exposed as either ``readonly`` or ``readwrite`` attributes::

    class_<World>("World", init<std::string>())
        .def_readonly("msg", &World::msg)
        ...

and can be used directly in Python: ::

    >>> planet = hello.World('howdy')
    >>> planet.msg
    'howdy'

This does *not* result in adding attributes to the ``World`` instance
``__dict__``, which can result in substantial memory savings when
wrapping large data structures. In fact, no instance ``__dict__``
will be created at all unless attributes are explicitly added from
Python. Boost.Python owes this capability to the new Python 2.2 type
system, in particular the descriptor interface and ``property`` type.

In C++, publicly-accessible data members are considered a sign of poor
design because they break encapsulation, and style guides usually
dictate the use of "getter" and "setter" functions instead. In
Python, however, ``__getattr__``, ``__setattr__``, and since 2.2,
``property`` mean that attribute access is just one more
well-encapsulated syntactic tool at the programmer's disposal.
Boost.Python bridges this idiomatic gap by making Python ``property``
creation directly available to users. If ``msg`` were private, we
could still expose it as an attribute in Python as follows::

    class_<World>("World", init<std::string>())
        .add_property("msg", &World::greet, &World::set)
        ...

The example above mirrors the familiar usage of properties in Python
2.2+: ::

    >>> class World(object):
    ...     def __init__(self, msg):
    ...         self.__msg = msg
    ...     def greet(self):
    ...         return self.__msg
    ...     def set(self, msg):
    ...         self.__msg = msg
    ...     msg = property(greet, set)

Operator Overloading
====================

The ability to write arithmetic operators for user-defined types has
been a major factor in the success of both languages for numerical
computation, and the success of packages like NumPy_ attests to the
power of exposing operators in extension modules. Boost.Python
provides a concise mechanism for wrapping operator overloads. The
example below shows a fragment from a wrapper for the Boost rational
number library::

    class_<rational<int> >("rational_int")
      .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
      .def("numerator", &rational<int>::numerator)
      .def("denominator", &rational<int>::denominator)
      .def(-self)        // __neg__ (unary minus)
      .def(self + self)  // __add__ (homogeneous)
      .def(self * self)  // __mul__
      .def(self + int()) // __add__ (heterogeneous)
      .def(int() + self) // __radd__
      ...

The magic is performed using a simplified application of "expression
templates" [VELD1995]_, a technique originally developed for
optimization of high-performance matrix algebra expressions. The
essence is that instead of performing the computation immediately,
operators are overloaded to construct a type *representing* the
computation. In matrix algebra, dramatic optimizations are often
available when the structure of an entire expression can be taken into
account, rather than evaluating each operation "greedily".
Boost.Python uses the same technique to build an appropriate Python
method object based on expressions involving ``self``.

.. _NumPy: http://www.pfdubois.com/numpy/

Inheritance
===========

C++ inheritance relationships can be represented to Boost.Python by adding
an optional ``bases<...>`` argument to the ``class_<...>`` template
parameter list as follows::

    class_<Derived, bases<Base1,Base2> >("Derived")
        ...

This has two effects:

1. When the ``class_<...>`` is created, Python type objects
   corresponding to ``Base1`` and ``Base2`` are looked up in
   Boost.Python's registry, and are used as bases for the new Python
   ``Derived`` type object, so methods exposed for the Python ``Base1``
   and ``Base2`` types are automatically members of the ``Derived``
   type. Because the registry is global, this works correctly even if
   ``Derived`` is exposed in a different module from either of its
   bases.

2. C++ conversions from ``Derived`` to its bases are added to the
   Boost.Python registry. Thus wrapped C++ methods expecting (a
   pointer or reference to) an object of either base type can be
   called with an object wrapping a ``Derived`` instance. Wrapped
   member functions of class ``T`` are treated as though they have an
   implicit first argument of ``T&``, so these conversions are
   necessary to allow the base class methods to be called for derived
   objects.

Of course it's possible to derive new Python classes from wrapped C++
classes. Because Boost.Python uses the new-style class
system, that works very much as for the Python built-in types. There
is one significant detail in which it differs: the built-in types
generally establish their invariants in their ``__new__`` function, so
that derived classes do not need to call ``__init__`` on the base
class before invoking its methods: ::

    >>> class L(list):
    ...      def __init__(self):
    ...          pass
    ...
    >>> L().reverse()
    >>>

Because C++ object construction is a one-step operation, C++ instance
data cannot be constructed until the arguments are available, in the
``__init__`` function: ::

    >>> class D(SomeBoostPythonClass):
    ...      def __init__(self):
    ...          pass
    ...
    >>> D().some_boost_python_method()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: bad argument type for built-in operation

This happened because Boost.Python couldn't find instance data of type
``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__``
function masked construction of the base class. It could be corrected
by either removing ``D``'s ``__init__`` function or having it call
``SomeBoostPythonClass.__init__(...)`` explicitly.

Virtual Functions
=================

Deriving new types in Python from extension classes is not very
interesting unless they can be used polymorphically from C++. In
other words, Python method implementations should appear to override
the implementation of C++ virtual functions when called *through base
class pointers/references from C++*. Since the only way to alter the
behavior of a virtual function is to override it in a derived class,
the user must build a special derived class to dispatch a polymorphic
class' virtual functions::

    //
    // interface to wrap:
    //
    class Base
    {
     public:
        virtual int f(std::string x) { return 42; }
        virtual ~Base();
    };

    // Takes a non-const reference because Base::f is a non-const member
    int calls_f(Base& b, std::string x) { return b.f(x); }

    //
    // Wrapping Code
    //

    // Dispatcher class
    struct BaseWrap : Base
    {
        // Store a pointer to the Python object
        BaseWrap(PyObject* self_) : self(self_) {}
        PyObject* self;

        // Default implementation, for when f is not overridden
        int f_default(std::string x) { return this->Base::f(x); }
        // Dispatch implementation
        int f(std::string x) { return call_method<int>(self, "f", x); }
    };

    ...
        def("calls_f", calls_f);
        class_<Base, BaseWrap>("Base")
            .def("f", &Base::f, &BaseWrap::f_default)
            ;

Now here's some Python code which demonstrates: ::

    >>> class Derived(Base):
    ...     def f(self, s):
    ...          return len(s)
    ...
    >>> calls_f(Base(), 'foo')
    42
    >>> calls_f(Derived(), 'forty-two')
    9

Things to notice about the dispatcher class:

* The key element which allows overriding in Python is the
  ``call_method`` invocation, which uses the same global type
  conversion registry as the C++ function wrapping does to convert its
  arguments from C++ to Python and its return type from Python to C++.

* Any constructor signatures you wish to wrap must be replicated with
  an initial ``PyObject*`` argument.

* The dispatcher must store this argument so that it can be used to
  invoke ``call_method``.

* The ``f_default`` member function is needed when the function being
  exposed is not pure virtual; there's no other way ``Base::f`` can be
  called on an object of type ``BaseWrap``, since it overrides ``f``.

Deeper Reflection on the Horizon?
=================================

Admittedly, this formula is tedious to repeat, especially on a project
with many polymorphic classes. That it is necessary reflects some
limitations in C++'s compile-time introspection capabilities: there's
no way to enumerate the members of a class and find out which are
virtual functions. At least one very promising project has been
started to write a front-end which can generate these dispatchers (and
other wrapping code) automatically from C++ headers.

Pyste_ is being developed by Bruno da Silva de Oliveira. It builds on
GCC_XML_, which generates an XML version of GCC's internal program
representation. Since GCC is a highly-conformant C++ compiler, this
ensures correct handling of the most-sophisticated template code and
full access to the underlying type system. In keeping with the
Boost.Python philosophy, a Pyste interface description is neither
intrusive on the code being wrapped, nor expressed in some unfamiliar
language: instead it is a 100% pure Python script.
If Pyste is
successful it will mark a move away from wrapping everything directly
in C++ for many of our users. It will also allow us the choice to
shift some of the metaprogram code from C++ to Python. We expect that
soon, not only our users but the Boost.Python developers themselves
will be "thinking hybrid" about their own code.

.. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html
.. _`Pyste`: http://www.boost.org/libs/python/pyste

---------------
 Serialization
---------------

*Serialization* is the process of converting objects in memory to a
form that can be stored on disk or sent over a network connection. The
serialized object (most often a plain string) can be retrieved and
converted back to the original object. A good serialization system will
automatically convert entire object hierarchies. Python's standard
``pickle`` module is just such a system. It leverages the language's strong
runtime introspection facilities for serializing practically arbitrary
user-defined objects. With a few simple and unintrusive provisions this
powerful machinery can be extended to also work for wrapped C++ objects.
Here is an example::

    #include <string>

    struct World
    {
        World(std::string a_msg) : msg(a_msg) {}
        std::string greet() const { return msg; }
        std::string msg;
    };

    #include <boost/python.hpp>
    using namespace boost::python;

    struct World_picklers : pickle_suite
    {
      static tuple
      getinitargs(World const& w) { return make_tuple(w.greet()); }
    };

    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World", init<std::string>())
            .def("greet", &World::greet)
            .def_pickle(World_picklers())
        ;
    }

Now let's create a ``World`` object and put it to rest on disk::

    >>> import hello
    >>> import pickle
    >>> a_world = hello.World("howdy")
    >>> pickle.dump(a_world, open("my_world", "w"))

In a potentially *different script* on a potentially *different
computer* with a potentially *different operating system*::

    >>> import pickle
    >>> resurrected_world = pickle.load(open("my_world", "r"))
    >>> resurrected_world.greet()
    'howdy'

Of course the ``cPickle`` module can also be used for faster
processing.

Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol
defined in the standard Python documentation. Like a ``__getinitargs__``
function in Python, the ``pickle_suite``'s ``getinitargs()`` is responsible for
creating the argument tuple that will be used to reconstruct the pickled
object. The other elements of the Python pickling protocol,
``__getstate__`` and ``__setstate__``, can be optionally provided via C++
``getstate`` and ``setstate`` functions. C++'s static type system allows the
library to ensure at compile-time that nonsensical combinations of
functions (e.g. ``getstate`` without ``setstate``) are not used.

Enabling serialization of more complex C++ objects requires a little
more work than is shown in the example above.
Fortunately the
``object`` interface (see next section) greatly helps in keeping the
code manageable.

------------------
 Object interface
------------------

Experienced 'C' language extension module authors will be familiar
with the ubiquitous ``PyObject*``, manual reference-counting, and the
need to remember which API calls return "new" (owned) references or
"borrowed" (raw) references. These constraints are not just
cumbersome but also a major source of errors, especially in the
presence of exceptions.

Boost.Python provides a class ``object`` which automates reference
counting and provides conversion to Python from C++ objects of
arbitrary type. This significantly reduces the learning effort for
prospective extension module writers.

Creating an ``object`` from any other type is extremely simple::

    object s("hello, world");  // s manages a Python string

``object`` has templated interactions with all other types, with
automatic to-python conversions. It happens so naturally that it's
easily overlooked::

    object ten_Os = 10 * s[4]; // -> "oooooooooo"

In the example above, ``4`` and ``10`` are converted to Python objects
before the indexing and multiplication operations are invoked.

The ``extract<T>`` class template can be used to convert Python objects
to C++ types::

    double x = extract<double>(o);

If a conversion in either direction cannot be performed, an
appropriate exception is thrown at runtime.

The ``object`` type is accompanied by a set of derived types
that mirror the Python built-in types such as ``list``, ``dict``,
``tuple``, etc. as much as possible. This enables convenient
manipulation of these high-level types from C++::

    dict d;
    d["some"] = "thing";
    d["lucky_number"] = 13;
    list l = d.keys();

This almost looks and works like regular Python code, but it is pure
C++.
Of course we can wrap C++ functions which accept or return
``object`` instances.

=================
 Thinking hybrid
=================

Because of the practical and mental difficulties of combining
programming languages, it is common to settle on a single language at the
outset of any development effort. For many applications, performance
considerations dictate the use of a compiled language for the core
algorithms. Unfortunately, due to the complexity of the static type
system, the price we pay for runtime performance is often a
significant increase in development time. Experience shows that
writing maintainable C++ code usually takes longer and requires *far*
more hard-earned working experience than developing comparable Python
code. Even when developers are comfortable working exclusively in
compiled languages, they often augment their systems with some type of
ad hoc scripting layer for the benefit of their users without ever
availing themselves of the same advantages.

Boost.Python enables us to *think hybrid*. Python can be used for
rapidly prototyping a new application; its ease of use and the large
pool of standard libraries give us a head start on the way to a
working system. If necessary, the working code can be used to
discover rate-limiting hotspots. To maximize performance these can
be reimplemented in C++, together with the Boost.Python bindings
needed to tie them back into the existing higher-level procedure.

Of course, this *top-down* approach is less attractive if it is clear
from the start that many algorithms will eventually have to be
implemented in C++. Fortunately Boost.Python also enables us to
pursue a *bottom-up* approach. We have used this approach very
successfully in the development of a toolbox for scientific
applications.
The toolbox started out mainly as a library of C++
classes with Boost.Python bindings, and for a while the growth was
mainly concentrated on the C++ parts. However, as the toolbox is
becoming more complete, more and more newly added functionality can be
implemented in Python.

.. image:: images/python_cpp_mix.png

This figure shows the estimated ratio of newly added C++ and Python
code over time as new algorithms are implemented. We expect this
ratio to level out near 70% Python. Being able to solve new problems
mostly in Python rather than a more difficult statically typed
language is the return on our investment in Boost.Python. The ability
to access all of our code from Python allows a broader group of
developers to use it in the rapid development of new applications.

=====================
 Development history
=====================

The first version of Boost.Python was developed in 2000 by Dave
Abrahams at Dragon Systems, where he was privileged to have Tim Peters
as a guide to "The Zen of Python". One of Dave's jobs was to develop
a Python-based natural language processing system. Since it was
eventually going to be targeting embedded hardware, it was always
assumed that the compute-intensive core would be rewritten in C++ to
optimize speed and memory footprint [#proto]_. The project also wanted to
test all of its C++ code using Python test scripts [#test]_. The only
tool we knew of for binding C++ and Python was SWIG_, and at the time
its handling of C++ was weak. It would be false to claim any deep
insight into the possible advantages of Boost.Python's approach at
this point. Dave's interest and expertise in fancy C++ template
tricks had just reached the point where he could do some real damage,
and Boost.Python emerged as it did because it filled a need and
because it seemed like a cool thing to try.

This early version was aimed at many of the same basic goals we've
described in this paper, differing most noticeably by having a
slightly more cumbersome syntax and by lacking special support for
operator overloading, pickling, and component-based development.
These last three features were quickly added by Ullrich Koethe and
Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors
arrived on the scene to contribute enhancements like support for nested
modules and static member functions.

By early 2001 development had stabilized and few new features were
being added. However, a disturbing new fact came to light: Ralf had
begun testing Boost.Python on pre-release versions of a compiler using
the EDG_ front-end, and the mechanism at the core of Boost.Python
responsible for handling conversions between Python and C++ types was
failing to compile. As it turned out, we had been exploiting a very
common bug in the implementation of all the C++ compilers we had
tested. We knew that as C++ compilers rapidly became more
standards-compliant, the library would begin failing on more
platforms. Unfortunately, because the mechanism was so central to the
functioning of the library, fixing the problem looked very difficult.

Fortunately, later that year Lawrence Berkeley and later Lawrence
Livermore National Labs contracted with `Boost Consulting`_ for support
and development of Boost.Python, and there was a new opportunity to
address fundamental issues and ensure a future for the library. A
redesign effort began with the low-level type conversion architecture,
building in standards-compliance and support for component-based
development (in contrast to version 1 where conversions had to be
explicitly imported and exported across module boundaries).
A new
analysis of the relationship between the Python and C++ objects was
done, resulting in more intuitive handling for C++ lvalues and
rvalues.

The emergence of a powerful new type system in Python 2.2 made the
choice of whether to maintain compatibility with Python 1.5.2 easy:
the opportunity to throw away a great deal of elaborate code for
emulating classic Python classes alone was too good to pass up. In
addition, Python iterators and descriptors provided crucial and
elegant tools for representing similar C++ constructs. The
development of the generalized ``object`` interface allowed us to
further shield C++ programmers from the dangers and syntactic burdens
of the Python 'C' API. A great number of other features including C++
exception translation, improved support for overloaded functions, and
most significantly, CallPolicies for handling pointers and
references, were added during this period.

In October 2002, version 2 of Boost.Python was released. Development
since then has concentrated on improved support for C++ runtime
polymorphism and smart pointers. Peter Dimov's ingenious
``boost::shared_ptr`` design in particular has allowed us to give the
hybrid developer a consistent interface for moving objects back and
forth across the language barrier without loss of information. At
first, we were concerned that the sophistication and complexity of the
Boost.Python v2 implementation might discourage contributors, but the
emergence of Pyste_ and several other significant feature
contributions has laid those fears to rest. Daily questions on the
Python C++-sig and a backlog of desired improvements show that the
library is getting used. To us, the future looks bright.

.. _`EDG`: http://www.edg.com

=============
 Conclusions
=============

Boost.Python achieves seamless interoperability between two rich and
complementary language environments.
Because it leverages template
metaprogramming to introspect types and functions, the user
never has to learn a third syntax: the interface definitions are
written in concise and maintainable C++. Also, the wrapping system
doesn't have to parse C++ headers or represent the type system: the
compiler does that work for us.

Computationally intensive tasks play to the strengths of C++ and are
often impossible to implement efficiently in pure Python, while jobs
like serialization that are trivial in Python can be very difficult in
pure C++. Given the luxury of building a hybrid software system from
the ground up, we can approach design with new confidence and power.

===========
 Citations
===========

.. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report,
   Vol. 7 No. 5 June 1995, pp. 26-31.
   http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html

===========
 Footnotes
===========

.. [#proto] In retrospect, it seems that "thinking hybrid" from the
   ground up might have been better for the NLP system: the
   natural component boundaries defined by the pure Python
   prototype turned out to be inappropriate for getting the
   desired performance and memory footprint out of the C++ core,
   which eventually caused some redesign overhead on the Python
   side when the core was moved to C++.

.. [#test] We also have some reservations about driving all C++
   testing through a Python interface, unless that's the only way
   it will be ultimately used. Any transition across language
   boundaries with such different object models can inevitably
   mask bugs.

.. [#feature] These features were expressed very differently in v1 of
   Boost.Python.