Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

Performance

Container definitions
Insertion tests
Processing tests

(Testing program at perf.cpp.)

We ran tests to measure the performance of the containers of Boost.PolyCollection in two scenarios:

As a comparison baseline we used containers and facilities from the standard library and Boost (details below). Tests were run in the following environments:

  • Baseline container: ptr_vector = boost::ptr_vector<base>
  • Polymorphic collection: base_collection = boost::base_collection<base>
  • Element types: T1 = derived1, T2 = derived2, T3 = derived3
struct base
{
  virtual ~base()=default;
  virtual int operator()(int)const=0;
};

struct derived1 final:base
{
  derived1(int n):n{n}{}
  virtual int operator()(int)const{return n;}

  int n;
};

struct derived2 final:base
{
  derived2(int n):n{n}{}
  virtual int operator()(int x)const{return x*n;}

  int unused,n;
};

struct derived3 final:base
{
  derived3(int n):n{n}{}
  virtual int operator()(int x)const{return x*x*n;}

  int unused,n;
};
  • Baseline container: func_vector = std::vector<std::function<int(int)>>
  • Polymorphic collection: function_collection = boost::function_collection<int(int)>
  • Element types: T1 = concrete1, T2 = concrete2, T3 = concrete3
struct concrete1
{
  concrete1(int n):n{n}{}
  int operator()(int)const{return n;}

  int n;
};

struct concrete2
{
  concrete2(int n):n{n}{}
  int operator()(int x)const{return x*n;}

  int unused,n;
};

struct concrete3
{
  concrete3(int n):n{n}{}
  int operator()(int x)const{return x*x*n;}

  int unused,n;
};
  • Baseline container: any_vector = std::vector<boost::type_erasure::any<concept_>>
  • Polymorphic collection: any_collection = boost::any_collection<concept_>
  • Element types: T1 = int, T2 = double, T3 = char
using concept_=boost::mpl::vector<
  boost::type_erasure::copy_constructible<>,
  boost::type_erasure::relaxed,
  boost::type_erasure::typeid_<>,
  boost::type_erasure::incrementable<>
>;

Tests measure the time taken to insert n elements (n between 102 and 107) from a source of values with types following the cyclic sequence

T1 T1 T2 T2 T3 T1 T1 T2 T2 T3 ···

No reserve operation is done before insertion. The figures show resulting times in nanoseconds/element. The horizontal axis is logarithmic.

Insertion, Visual Studio 2015 x86

Insertion, Visual Studio 2015 x64

Insertion, GCC 6.3 x64

Insertion, Clang 4.0 x64

Insertion, Visual Studio 2015 x86

Insertion, Visual Studio 2015 x64

Insertion, GCC 6.3 x64

Insertion, Clang 4.0 x64

Insertion, Visual Studio 2015 x86

Insertion, Visual Studio 2015 x64

Insertion, GCC 6.3 x64

Insertion, Clang 4.0 x64

Tests measure the time taken to traverse a container of size n (n between 102 and 107) and execute an operation on each of its elements. The operation for boost::base_collection and boost::function_collection (and the associated baseline containers) is defined as

struct for_each_callable
{
  for_each_callable():res{0}{}

  template<typename T>
  void operator()(T& x){
    res+=x(2);
  }

  int res;
};

whereas for boost::any_collection we use

struct for_each_incrementable
{
  for_each_incrementable():res{0}{}

  template<typename T>
  void operator()(T& x){
    ++x;
    ++res;
  }

  int res;
};

The baseline container is tested with three different setups:

  • Directly as initialized by the process described for the insertion tests. The sequence of types is complex enough that CPU's branch prediction mechanisms are not able to fully anticipate it [21]. As elements are ordered according to their construction time, certain degree of memory contiguity is expected.
  • With an extra post-insertion stage by which elements are sorted according to their typeid(). This increases branch prediction efficiency at the expense of having worse cache friendliness.
  • With an extra post-insertion stage that randomly shuffles all the elements in the container. This is the worst possible scenario both in terms of caching and branch prediction.

As for the polymorphic collection, three variations are measured:

  • With std::for_each (the same as the baseline container).
  • Using boost::poly_collection::for_each.
  • Using boost::poly_collection::for_each with type restitution of T1, T2 and T3.

The figures show resulting times in nanoseconds/element. The horizontal axis is logarithmic.

Processing, Visual Studio 2015 x86

Processing, Visual Studio 2015 x64

Processing, GCC 6.3 x64

Processing, Clang 4.0 x64

Processing, Visual Studio 2015 x86

Processing, Visual Studio 2015 x64

Processing, GCC 6.3 x64

Processing, Clang 4.0 x64

Processing, Visual Studio 2015 x86

Processing, Visual Studio 2015 x64

Processing, GCC 6.3 x64

Processing, Clang 4.0 x64



[21] This has been verified empirically: simpler cycles did indeed yield better execution times.


PrevUpHomeNext