[template perf[name value] [value]]
[template para[text] '''<para>'''[text]'''</para>''']

[mathpart perf Performance]

[section:perf_over2 Performance Overview]
[performance_overview]
[endsect]

[section:interp Interpreting these Results]

In all of the following tables, the best performing
result in each row is assigned a relative value of "1" and shown
in bold, so a score of "2" means ['"twice as slow as the best
performing result".] Actual timings in nanoseconds per function call
are also shown in parentheses.  To make the results easier to read, they
are color-coded as follows: the best result and everything within 20% of
it is green, anything that's more than twice as slow as the best result is red,
and results in between are blue.

Results were obtained on a system
with an Intel Core i7 4710MQ with 16GB RAM and running
either Windows 8.1 or Xubuntu Linux.

[caution As usual with performance results these should be taken with a large pinch
of salt: relative performance is known to shift quite a bit depending
upon the architecture of the particular test system used.  Furthermore,
our performance results were obtained using our own test data:
these test values are designed to provide good coverage of our code and test
all the appropriate corner cases.  They do not necessarily represent
"typical" usage: whatever that may be!
]

[endsect] [/section:interp Interpreting these Results]

[section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

By far the most important thing you can do when using this library
is turn on your compiler's optimisation options.  As the following
table shows, the penalty for using the library in debug mode can be
quite large.  In addition, switching to 64-bit code gives a small but noticeable
improvement in performance, as does switching to a different compiler
(Intel C++ 15 in this example).

[table_Compiler_Option_Comparison_on_Windows_x64]

[endsect] [/section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

[section:tradoffs Trading Accuracy for Performance]

There are a number of [link policy Policies] that can be used to trade accuracy for performance:

* Internal promotion: by default functions with `float` arguments are evaluated at `double` precision
internally to ensure full precision in the result.  Similarly `double` precision functions are
evaluated at `long double` precision internally by default.  Changing these defaults can give a significant
speed advantage at the expense of accuracy.  Note also that evaluating using `float` internally may result in
numerical instability for some of the more complex algorithms, so we suggest you use this option with care.
* Target accuracy: just because you choose to evaluate at `double` precision doesn't mean you necessarily want
to target full 16-digit accuracy.  If you wish, you can change the default (full machine precision) to whatever
is "good enough" for your particular use case.

For example, suppose you want to evaluate `double` precision functions at `double` precision internally: you
can change the global default by passing `-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false` on the command line, or
at the point of call via something like this:

   double val = boost::math::erf(my_argument, boost::math::policies::make_policy(boost::math::policies::promote_double<false>()));

However, an easier option might be:

   #include <boost/math/special_functions.hpp> // Or any individual special function header

   namespace math{

   namespace precise{
      //
      // Define a Policy for accurate evaluation - this is the same as the default, unless
      // someone has changed the global defaults.
      //
      typedef boost::math::policies::policy<> accurate_policy;
      //
      // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS to declare
      // functions that use the above policy.  Note no trailing
      // ";" required on the macro call:
      //
      BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(accurate_policy)

   }

   namespace fast{
      //
      // Define a Policy for fast evaluation:
      //
      using namespace boost::math::policies;
      typedef policy<promote_double<false> > fast_policy;
      //
      // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
      //
      BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }

   }

And now one can call:

   math::precise::tgamma(x);

For the "accurate" version of tgamma, and:

   math::fast::tgamma(x);

For the faster version.

Had we wished to change the target precision (to 9 decimal places) as well as the evaluation type used, we might have done:

   namespace math{
   namespace fast{
      //
      // Define a Policy for fast evaluation:
      //
      using namespace boost::math::policies;
      typedef policy<promote_double<false>, digits10<9> > fast_policy;
      //
      // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
      //
      BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }
   }

One can do a similar thing with the distribution classes:

   #include <boost/math/distributions.hpp> // or any individual distribution header

   namespace math{ namespace fast{
      //
      // Define a policy for fastest possible evaluation:
      //
      using namespace boost::math::policies;
      typedef policy<promote_float<false> > fast_float_policy;
      //
      // Invoke BOOST_MATH_DECLARE_DISTRIBUTIONS
      //
      BOOST_MATH_DECLARE_DISTRIBUTIONS(float, fast_float_policy)

   }} // namespaces

   //
   // And use:
   //
   float p_val = cdf(math::fast::normal(1.0f, 3.0f), 0.25f);

Here's how these options change the relative performance of the distributions on Linux:

[table_Distribution_performance_comparison_for_different_performance_options_with_GNU_C_version_9_2_1_20191008_on_linux]

[endsect] [/section:tradoffs Trading Accuracy for Performance]

[section:multiprecision Cost of High-Precision Non-built-in Floating-point]

Using a user-defined floating-point type such as __multiprecision has a very high run-time cost.

To give some flavour of this:

[table:linpack_time Linpack Benchmark
[[floating-point type] [speed Mflops]]
[[double] [2727]]
[[__float128] [35]]
[[multiprecision::float128] [35]]
[[multiprecision::cpp_bin_float_quad] [6]]
]

[endsect] [/section:multiprecision Cost of High-Precision Non-built-in Floating-point]


[section:tuning Performance Tuning Macros]

There are a small number of performance tuning options
that are determined by configuration macros.  These should be set
in boost/math/tools/user.hpp, or else reported to the Boost-development
mailing list so that the appropriate option for a given compiler and
OS platform can be set automatically in our configuration setup.

[table
[[Macro][Meaning]]
[[BOOST_MATH_POLY_METHOD]
   [Determines how polynomials and most rational functions
   are evaluated.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.]]
[[BOOST_MATH_RATIONAL_METHOD]
   [Determines how symmetrical rational functions are evaluated: mostly
   this only affects how the Lanczos approximation is evaluated, and how
   the `evaluate_rational` function behaves.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.
   ]]
[[BOOST_MATH_MAX_POLY_ORDER]
   [The maximum order of polynomial or rational function that will
   be evaluated by a method other than 0 (a simple "for" loop).
   ]]
[[BOOST_MATH_INT_TABLE_TYPE(RT, IT)]
   [Many of the coefficients to the polynomials and rational functions
   used by this library are integers.
   Normally these are stored as tables
   of integers, but if mixed integer / floating point arithmetic is much
   slower than regular floating point arithmetic then they can be stored
   as tables of floating point values instead.  If mixed arithmetic is slow
   then add:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT

   to boost/math/tools/user.hpp, otherwise the default of:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT

   set in boost/math/config.hpp is fine, and may well result in smaller
   code.
   ]]
]

The values to which `BOOST_MATH_POLY_METHOD` and `BOOST_MATH_RATIONAL_METHOD`
may be set are as follows:

[table
[[Value][Effect]]
[[0][The polynomial or rational function is evaluated using Horner's
      method, and a simple for-loop.

      Note that if the order of the polynomial
      or rational function is a runtime parameter, or the order is
      greater than the value of `BOOST_MATH_MAX_POLY_ORDER`, then
      this method is always used, irrespective of the value
      of `BOOST_MATH_POLY_METHOD` or `BOOST_MATH_RATIONAL_METHOD`.]]
[[1][The polynomial or rational function is evaluated without
      the use of a loop, and using Horner's method.  This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`. ]]
[[2][The polynomial or rational function is evaluated without
      the use of a loop, and using a second order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
[[3][The polynomial or rational function is evaluated without
      the use of a loop, and using a second order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This differs from method "2" in that the code is carefully ordered
      to make the parallelisation more obvious to the compiler, rather than
      relying on the compiler's optimiser to spot the parallelisation
      opportunities.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
]

The performance test suite generates a report for your particular compiler showing which method is likely to work best;
the following tables show the results for MSVC-14.2 on Windows, and for GCC-9.2.1, Clang-9.0.0 and Intel C++ on Linux.  There's not much to choose between
the various methods, but generally loop-unrolled methods perform better.  Interestingly, ordering the code
to try and "second guess" possible optimizations seems not to be such a good idea (method 3 below).

[table_Polynomial_Method_Comparison_with_Microsoft_Visual_C_version_14_2_on_Windows_x64]

[table_Rational_Method_Comparison_with_Microsoft_Visual_C_version_14_2_on_Windows_x64]

[table_Polynomial_Method_Comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

[table_Rational_Method_Comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

[table_Polynomial_Method_Comparison_with_Clang_version_9_0_0_tags_RELEASE_900_final_on_linux]

[table_Rational_Method_Comparison_with_Clang_version_9_0_0_tags_RELEASE_900_final_on_linux]

[table_Polynomial_Method_Comparison_with_Intel_C_C_0x_mode_version_1910_on_linux]

[table_Rational_Method_Comparison_with_Intel_C_C_0x_mode_version_1910_on_linux]

[endsect] [/section:tuning Performance Tuning Macros]

[section:comp_compilers Comparing Different Compilers]

By running our performance test suite multiple times, we can compare the effect of different compilers: as
might be expected, the differences are generally small
compared to, say, disabling internal use of `long double`.
However, there are still gains to be made, particularly from some of the commercial offerings:

[table_Compiler_Comparison_on_Windows_x64]

[table_Compiler_Comparison_on_linux]

[endsect] [/section:comp_compilers Comparing Different Compilers]

[section:comparisons Comparisons to Other Open Source Libraries]

We've run our performance tests both for our own code, and against other
open source implementations of the same functions.  The results are
presented below to give you a rough idea of how they all compare.
In order to give a more-or-less level playing field our test data
was screened against all the libraries being tested, and any
unsupported domains removed, likewise for any test cases that gave large errors
or unexpected non-finite values.

[caution
You should exercise extreme caution when interpreting
these results: relative performance may vary by platform and by compiler option settings, and
the tests use data that gives good code coverage of /our/ code, but which may skew the
results towards the corner cases.  Finally, remember that different
libraries make different choices with regard to performance versus
numerical stability.
]

The first results compare standard library functions to Boost equivalents with MSVC-14.2:

[table_Library_Comparison_with_Microsoft_Visual_C_version_14_2_on_Windows_x64]

On Linux with GCC, we can also compare to the TR1 functions, and to GSL and RMath:

[table_Library_Comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

And finally we can compare the statistical distributions to GSL, RMath and DCDFLIB:

[table_Distribution_performance_comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

[endsect] [/section:comparisons Comparisons to Other Open Source Libraries]

[section:perf_test_app The Performance Test Applications]

Under ['boost-path]\/libs\/math\/reporting\/performance you will find
some reasonably comprehensive performance test applications for this library.

In order to generate the tables you will have seen in this documentation (or others
for your specific compiler) you need to invoke `bjam` in this directory, using a C++11
capable compiler.  Note that
results extend/overwrite whatever is already present in
['boost-path]\/libs\/math\/reporting\/performance\/doc\/performance_tables.qbk,
so you may want to delete this file before you begin so as to make a fresh start for
your particular system.

The programs produce results in Boost's Quickbook format which is not terribly
human readable.  If you configure your user-config.jam to be able to build Docbook
documentation, then you will also get a full summary of all the data in HTML format
in ['boost-path]\/libs\/math\/reporting\/performance\/html\/index.html.  Assuming
you're on a 'nix-like platform the procedure to do this is to first install the
`xsltproc`, `Docbook DTD`, and `Docbook XSL` packages.  Then:

* Copy ['boost-path]\/tools\/build\/example\/user-config.jam to your home directory.
* Add `using xsltproc ;` to the end of the file (note the space surrounding each token, including the final ";": this is important!).
This assumes that `xsltproc` is in your path.
* Add `using boostbook : path-to-xsl-stylesheets : path-to-dtd ;` to the end of the file.  The `path-to-dtd` should point
to version 4.2.x of the Docbook DTD, while `path-to-xsl-stylesheets` should point to the folder containing the latest XSLT stylesheets.
Both paths should use all forward slashes even on Windows.

At this point you should be able to run the tests and generate the HTML summary.  If GSL, RMath or libstdc++ are
present in the compiler's path they will be automatically tested.  For DCDFLIB you will need to place the C
source in ['boost-path]\/libs\/math\/reporting\/performance\/third_party\/dcdflib.

If you want to compare multiple compilers, or multiple options for one compiler, then you will
need to invoke `bjam` multiple times, once for each configuration.  Note that in order to test
multiple configurations of the same compiler, each has to be given a unique name in the test
program, otherwise they all edit the same table cells.  Suppose you want to test GCC with
and without the -ffast-math option: in this case bjam would be invoked first as:

   bjam toolset=gcc -a cxxflags=-std=gnu++11

This runs the tests using default optimization options (-O3).  We can then run again
using -ffast-math:

   bjam toolset=gcc -a cxxflags='-std=gnu++11 -ffast-math' define=COMPILER_NAME='"GCC with -ffast-math"'

In the command line above, the -a flag forces a full rebuild, and the preprocessor define COMPILER_NAME needs to be set
to a string literal describing the compiler configuration, hence the two layers of quotes: one for the command line, one for the
compiler.

[endsect] [/section:perf_test_app The Performance Test Applications]

[endmathpart] [/mathpart perf Performance]

[/
  Copyright 2006 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]