Bench Template Library

****************************************
Introduction :

The aim of this project is to compare the performance
of available numerical libraries. The code is designed
to be as generic and modular as possible. Thus, adding new
numerical libraries or new numerical tests should
require minimal effort.


*****************************************

Installation :

BTL uses cmake / ctest:

1 - create a build directory:

  $ mkdir build
  $ cd build

2 - configure:

  $ ccmake ..

3 - run the bench using ctest:

  $ ctest -V

You can run the benchmarks only on libraries matching a given regular expression:
  ctest -V -R <regexp>
For instance:
  ctest -V -R eigen2

You can also select a given set of actions by defining the environment variable BTL_CONFIG this way:
  BTL_CONFIG="-a action1{:action2}*" ctest -V
An example:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata" ctest -V -R eigen2

Finally, if bench results already exist (the bench*.dat files), the new results are merged with them by keeping the best result for each matrix size. If you want to overwrite the previous ones, simply add the "--overwrite" option:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata --overwrite" ctest -V -R eigen2

4 - analyze the results: different data files (.dat) are produced in each library's directory under libs/.
    If gnuplot is available, choose a directory name in the data directory to store the results and type:
        $ cd data
        $ mkdir my_directory
        $ cp ../libs/*/*.dat my_directory
    Build the data utilities in this (data) directory:
        make
    Then you can look at the raw data:
        go_mean my_directory
    or smooth the data first:
        smooth_all.sh my_directory
        go_mean my_directory_smooth


*************************************************

Files and directories :

 generic_bench : all the bench sources common to all libraries

 actions : sources for the different action wrappers (axpy, matrix-matrix product) to be tested.

 libs/* : bench sources specific to each tested library.

 machine_dep : directory used to store machine-specific Makefile.in

 data : directory used to store gnuplot scripts and data analysis utilities

**************************************************

Principles : the code modularity is achieved by defining two concepts :

 ****** Action concept : This is a class defining which kind
        of test must be performed (e.g. a matrix_vector_product).
        An Action should define the following methods :

        *** Ctor using the size of the problem (matrix or vector size) as an argument
            Action action(size);
        *** initialize : this method initializes the calculation (e.g. initializes the matrix and vector arguments)
            action.initialize();
        *** calculate : this method actually launches the calculation to be benchmarked
            action.calculate();
        *** nb_op_base() : this method returns the complexity of the calculate method (allowing the mflops evaluation)
        *** name() : this method returns the name of the action (std::string)

 ****** Interface concept : This is a class or namespace defining how to use a given library and
        its specific containers (matrix and vector).
        Up to now, an interface should define the following types :

        *** real_type : kind of float to be used (float or double)
        *** stl_vector : must correspond to std::vector<real_type>
        *** stl_matrix : must correspond to std::vector<stl_vector>
        *** gene_vector : the vector type for this interface --> e.g. (real_type *) for the C_interface
        *** gene_matrix : the matrix type for this interface --> e.g. (gene_vector *) for the C_interface

        + the following common methods

        *** free_matrix(gene_matrix & A, int N) : deallocation of an N sized gene_matrix A
        *** free_vector(gene_vector & B) : deallocation of an N sized gene_vector B
        *** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl) : copy the content of an stl_matrix A_stl into a gene_matrix A.
            The allocation of A is done in this function.
        *** vector_from_stl(gene_vector & B, stl_vector & B_stl) : copy the content of an stl_vector B_stl into a gene_vector B.
            The allocation of B is done in this function.
        *** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl) : copy the content of a gene_matrix A into an stl_matrix A_stl.
            The size of A_stl must correspond to the size of A.
        *** vector_to_stl(gene_vector & A, stl_vector & A_stl) : copy the content of a gene_vector A into an stl_vector A_stl.
            The size of A_stl must correspond to the size of A.
        *** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source into cible (the target).
            Both source and cible must be sized NxN.
        *** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source into cible (the target).
            Both source and cible must be sized N.

        and the following methods, corresponding to the actions one wants to benchmark :

        *** matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N)
        *** matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N)
        *** ata_product(const gene_matrix & A, gene_matrix & X, int N)
        *** aat_product(const gene_matrix & A, gene_matrix & X, int N)
        *** axpy(real coef, const gene_vector & X, gene_vector & Y, int N)

 The bench algorithm (generic_bench/bench.hh) is templated with an action, itself templated with
 an interface. A typical main.cpp source stored in a given library directory libs/A_LIB
 looks like :

   bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

 This function produces an XY data file containing the measured mflops as a function of the size, for 50
 sizes between 10 and 1000.

 This algorithm can be adapted by providing a given Perf_Analyzer object, which determines how the time
 measurements must be done. For example, the X86_Perf_Analyzer uses the rdtsc assembly instruction and provides
 a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer,
 so

   bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

 is equivalent to

   bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

 If your system supports it, we suggest using a mixed implementation (X86_Perf_Analyzer+Portable_Perf_Analyzer):
 replace
   bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point);
 with
   bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point);
 in generic_bench/bench.hh
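
 To make the two concepts more concrete, here is a minimal sketch of how an Action and an
 Interface could fit together for axpy. It is an illustration only and is not taken from the
 BTL sources: the names sketch_interface, sketch_axpy_action, result() and run_once are
 hypothetical, the interface simply wraps std::vector, the interface-level name() is an
 assumption, and run_once is a crude std::chrono timer standing in for a real Perf_Analyzer.

```cpp
// Illustrative sketch only: sketch_interface, sketch_axpy_action, result()
// and run_once are hypothetical names, not part of BTL.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Interface concept: the types and methods an action needs from a library.
// Here the "library" is simply std::vector.
template <class real>
struct sketch_interface {
  typedef real real_type;
  typedef std::vector<real> stl_vector;
  typedef std::vector<real> gene_vector;

  // Assumption: the interface exposes a name used to label the results.
  static std::string name() { return "sketch"; }

  // Allocate B and copy the content of B_stl into it.
  static void vector_from_stl(gene_vector& B, stl_vector& B_stl) { B = B_stl; }

  // Copy B into B_stl, which must already have the size of B.
  static void vector_to_stl(gene_vector& B, stl_vector& B_stl) {
    assert(B_stl.size() == B.size());
    for (std::size_t i = 0; i < B.size(); ++i) B_stl[i] = B[i];
  }

  // Release the storage held by B.
  static void free_vector(gene_vector& B) { gene_vector().swap(B); }

  // Y = coef * X + Y
  static void axpy(real coef, const gene_vector& X, gene_vector& Y, int N) {
    for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
  }
};

// Action concept: one kind of test, templated on the interface.
template <class Interface>
class sketch_axpy_action {
  typedef typename Interface::real_type real;

 public:
  explicit sketch_axpy_action(int size) : size_(size), coef_(real(2)) {}
  ~sketch_axpy_action() {
    Interface::free_vector(X_);
    Interface::free_vector(Y_);
  }

  static std::string name() { return "axpy_" + Interface::name(); }

  // axpy performs one multiplication and one addition per element.
  double nb_op_base() const { return 2.0 * size_; }

  // Set up the operands (X filled with 1, Y filled with 3); not timed.
  void initialize() {
    typename Interface::stl_vector x_stl(size_, real(1));
    typename Interface::stl_vector y_stl(size_, real(3));
    Interface::vector_from_stl(X_, x_stl);
    Interface::vector_from_stl(Y_, y_stl);
  }

  // The operation to be benchmarked.
  void calculate() { Interface::axpy(coef_, X_, Y_, size_); }

  // Hypothetical accessor, only here so the result can be inspected.
  const typename Interface::gene_vector& result() const { return Y_; }

 private:
  int size_;
  real coef_;
  typename Interface::gene_vector X_, Y_;
};

// Crude stand-in for a perf analyzer: initializes the action, times a single
// calculate() call and returns the elapsed time in seconds.
template <class Action>
double run_once(Action& action) {
  action.initialize();
  const auto start = std::chrono::steady_clock::now();
  action.calculate();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

 With such a pair, sketch_axpy_action< sketch_interface<double> > action(1000); followed by
 run_once(action) gives one timing, and action.nb_op_base() divided by that time (and by 1e6)
 gives a rough mflops figure. A real analyzer would repeat the measurement, since a single
 call on a small size is far below the clock resolution.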