[template perf[name value] [value]]
[template para[text] '''<para>'''[text]'''</para>''']

[mathpart perf Performance]

[section:perf_over2 Performance Overview]
[performance_overview]
[endsect]

[section:interp Interpreting these Results]

In all of the following tables, the best performing
result in each row is assigned a relative value of "1" and shown
in bold, so a score of "2" means ['"twice as slow as the best
performing result"].  Actual timings in nanoseconds per function call
are also shown in parentheses.  To make the results easier to read, they
are color-coded as follows: the best result and everything within 20% of
it is green, anything more than twice as slow as the best result is red,
and results in between are blue.

Results were obtained on a system
with an Intel Core i7 4710MQ with 16GB of RAM, running
either Windows 8.1 or Xubuntu Linux.

[caution As usual with performance results, these should be taken with a large pinch
of salt: relative performance is known to shift quite a bit depending
upon the architecture of the particular test system used.  Furthermore,
our performance results were obtained using our own test data:
these test values are designed to provide good coverage of our code and test
all the appropriate corner cases.  They do not necessarily represent
"typical" usage: whatever that may be!
]

[endsect] [/section:interp Interpreting these Results]

[section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

By far the most important thing you can do when using this library
is turn on your compiler's optimisation options.  As the following
table shows, the penalty for using the library in debug mode can be
quite large.  In addition, switching to 64-bit code gives a small but noticeable
improvement in performance, as does switching to a different compiler
(Intel C++ 15 in this example).

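For example (an illustrative sketch only; the exact switches depend on your toolchain and project),
a typical optimised release build might be compiled with something like:

   g++ -O3 -std=c++14 my_app.cpp        # GCC/Clang: enable full optimisation
   cl /O2 /EHsc my_app.cpp              # MSVC: enable full optimisation
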
[table_Compiler_Option_Comparison_on_Windows_x64]

[endsect] [/section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

[section:tradoffs Trading Accuracy for Performance]

There are a number of [link policy Policies] that can be used to trade accuracy for performance:

* Internal promotion: by default, functions with `float` arguments are evaluated at `double` precision
internally to ensure full precision in the result.  Similarly, `double` precision functions are
evaluated at `long double` precision internally by default.  Changing these defaults can give a significant
speed advantage at the expense of accuracy.  Note also that evaluating using `float` internally may result in
numerical instability for some of the more complex algorithms, so we suggest you use this option with care.
* Target accuracy: just because you choose to evaluate at `double` precision doesn't mean you necessarily want
to target full 16-digit accuracy.  If you wish, you can change the default (full machine precision) to whatever
is "good enough" for your particular use case.

For example, suppose you want to evaluate `double` precision functions at `double` precision internally.  You
can change the global default by passing `-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false` on the command line, or
at the point of call via something like this:

   double val = boost::math::erf(my_argument, boost::math::policies::make_policy(boost::math::policies::promote_double<false>()));

However, an easier option might be:

   #include <boost/math/special_functions.hpp> // Or any individual special function header

   namespace math{

   namespace accurate{
   //
   // Define a Policy for accurate evaluation - this is the same as the default, unless
   // someone has changed the global defaults.
   //
   typedef boost::math::policies::policy<> accurate_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS to declare
   // functions that use the above policy.  Note no trailing
   // ";" required on the macro call:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(accurate_policy)


   }

   namespace fast{
   //
   // Define a Policy for fast evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_double<false> > fast_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }

   }

And now one can call:

   math::accurate::tgamma(x);

for the "accurate" version of tgamma, and:

   math::fast::tgamma(x);

for the faster version.

Had we wished to change the target precision (to 9 decimal places) as well as the evaluation type used, we might have done:

   namespace math{
   namespace fast{
   //
   // Define a Policy for fast evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_double<false>, digits10<9> > fast_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }
   }

One can do a similar thing with the distribution classes:

   #include <boost/math/distributions.hpp> // or any individual distribution header

   namespace math{ namespace fast{
   //
   // Define a policy for fastest possible evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_float<false> > fast_float_policy;
   //
   // Invoke BOOST_MATH_DECLARE_DISTRIBUTIONS
   //
   BOOST_MATH_DECLARE_DISTRIBUTIONS(float, fast_float_policy)

   }} // namespaces

   //
   // And use:
   //
   float p_val = cdf(math::fast::normal(1.0f, 3.0f), 0.25f);

Here's how these options change the relative performance of the distributions on Linux:

[table_Distribution_performance_comparison_for_different_performance_options_with_GNU_C_version_9_2_1_20191008_on_linux]

[endsect] [/section:tradoffs Trading Accuracy for Performance]

[section:multiprecision Cost of High-Precision Non-built-in Floating-point]

Using user-defined floating-point types such as __multiprecision has a very high run-time cost.

To give some flavour of this:

[table:linpack_time Linpack Benchmark
[[Floating-point type]                  [Speed (MFLOPS)]]
[[double]                               [2727]]
[[__float128]                           [35]]
[[multiprecision::float128]             [35]]
[[multiprecision::cpp_bin_float_quad]   [6]]
]

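To make the comparison concrete, here is a minimal sketch (assuming Boost.Multiprecision is
available; the argument value is arbitrary) of the same special-function call made with a
built-in `double` and with `cpp_bin_float_quad`; the second call runs the same algorithm in
software floating-point and is correspondingly slower:

   #include <boost/math/special_functions/erf.hpp>
   #include <boost/multiprecision/cpp_bin_float.hpp>

   // Built-in double: evaluated in hardware floating-point.
   double d = boost::math::erf(0.5);

   // Quad-precision software type: same algorithm, far higher cost per call.
   boost::multiprecision::cpp_bin_float_quad q =
      boost::math::erf(boost::multiprecision::cpp_bin_float_quad(0.5));
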
[endsect] [/section:multiprecision Cost of High-Precision Non-built-in Floating-point]


[section:tuning Performance Tuning Macros]

There are a small number of performance tuning options
that are determined by configuration macros.  These should be set
in boost/math/tools/user.hpp, or else reported to the Boost-development
mailing list so that the appropriate option for a given compiler and
OS platform can be set automatically in our configuration setup.

[table
[[Macro][Meaning]]
[[BOOST_MATH_POLY_METHOD]
   [Determines how polynomials and most rational functions
   are evaluated.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.]]
[[BOOST_MATH_RATIONAL_METHOD]
   [Determines how symmetrical rational functions are evaluated: mostly
   this only affects how the Lanczos approximation is evaluated, and how
   the `evaluate_rational` function behaves.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.
   ]]
[[BOOST_MATH_MAX_POLY_ORDER]
   [The maximum order of polynomial or rational function that will
   be evaluated by a method other than 0 (a simple "for" loop).
   ]]
[[BOOST_MATH_INT_TABLE_TYPE(RT, IT)]
   [Many of the coefficients of the polynomials and rational functions
   used by this library are integers.  Normally these are stored in tables
   as integers, but if mixed integer/floating-point arithmetic is much
   slower than regular floating-point arithmetic then they can be stored
   as tables of floating-point values instead.  If mixed arithmetic is slow
   then add:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT

   to boost/math/tools/user.hpp; otherwise the default of:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT

   set in boost/math/config.hpp is fine, and may well result in smaller
   code.
   ]]
]
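
For example, a hypothetical fragment added to boost/math/tools/user.hpp (the file and macro names
are those described above; the particular values chosen here are purely illustrative, and their
meanings are listed in the table that follows) might look like:

   // boost/math/tools/user.hpp - project-wide Boost.Math tuning overrides:
   #define BOOST_MATH_POLY_METHOD 2        // unrolled second-order Horner for polynomials
   #define BOOST_MATH_RATIONAL_METHOD 2    // likewise for rational approximations
   #define BOOST_MATH_MAX_POLY_ORDER 20    // fall back to a simple loop above this order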

The values to which `BOOST_MATH_POLY_METHOD` and `BOOST_MATH_RATIONAL_METHOD`
may be set are as follows:

[table
[[Value][Effect]]
[[0][The polynomial or rational function is evaluated using Horner's
      method, and a simple for-loop.

      Note that if the order of the polynomial
      or rational function is a runtime parameter, or the order is
      greater than the value of `BOOST_MATH_MAX_POLY_ORDER`, then
      this method is always used, irrespective of the value
      of `BOOST_MATH_POLY_METHOD` or `BOOST_MATH_RATIONAL_METHOD`.]]
[[1][The polynomial or rational function is evaluated without
      the use of a loop, and using Horner's method.  This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`. ]]
[[2][The polynomial or rational function is evaluated without
      the use of a loop, and using a second-order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
[[3][The polynomial or rational function is evaluated without
      the use of a loop, and using a second-order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This differs from method "2" in that the code is carefully ordered
      to make the parallelisation more obvious to the compiler, rather than
      relying on the compiler's optimiser to spot the parallelisation
      opportunities.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
]
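
To illustrate the difference between the looped and second-order schemes, here is a small
standalone sketch (purely illustrative, not the library's internal code) of plain Horner
evaluation versus a second-order Horner split for a degree-4 polynomial with coefficients
`a[0]` ... `a[4]`:

   // Plain Horner evaluation (what methods 0 and 1 amount to):
   double horner(const double (&a)[5], double x)
   {
      double result = a[4];
      for(int i = 3; i >= 0; --i)
         result = result * x + a[i];
      return result;
   }

   // Second-order Horner (the idea behind methods 2 and 3): split into even and
   // odd powers so the two chains can be evaluated in parallel, then combine.
   double horner_second_order(const double (&a)[5], double x)
   {
      double x2 = x * x;
      double even = (a[4] * x2 + a[2]) * x2 + a[0];
      double odd  = (a[3] * x2 + a[1]) * x;
      return even + odd;
   }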

The performance test suite generates a report for your particular compiler showing which method is likely to work best;
the following tables show the results for MSVC on Windows and for GCC, Clang and Intel C++ on Linux.  There's not much to choose between
the various methods, but generally the loop-unrolled methods perform better.  Interestingly, ordering the code
to try to "second guess" possible optimizations seems not to be such a good idea (method 3 below).

[table_Polynomial_Method_Comparison_with_Microsoft_Visual_C_version_14_2_on_Windows_x64]

[table_Rational_Method_Comparison_with_Microsoft_Visual_C_version_14_2_on_Windows_x64]

[table_Polynomial_Method_Comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

[table_Rational_Method_Comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

[table_Polynomial_Method_Comparison_with_Clang_version_9_0_0_tags_RELEASE_900_final_on_linux]

[table_Rational_Method_Comparison_with_Clang_version_9_0_0_tags_RELEASE_900_final_on_linux]

[table_Polynomial_Method_Comparison_with_Intel_C_C_0x_mode_version_1910_on_linux]

[table_Rational_Method_Comparison_with_Intel_C_C_0x_mode_version_1910_on_linux]

[endsect] [/section:tuning Performance Tuning Macros]

[section:comp_compilers Comparing Different Compilers]

By running our performance test suite multiple times, we can compare the effect of different compilers: as
might be expected, the differences are generally small compared to, say, disabling internal use of `long double`.
However, there are still gains to be made, particularly from some of the commercial offerings:

[table_Compiler_Comparison_on_Windows_x64]

[table_Compiler_Comparison_on_linux]

[endsect] [/section:comp_compilers Comparing Different Compilers]

[section:comparisons Comparisons to Other Open Source Libraries]

We've run our performance tests both for our own code and against other
open source implementations of the same functions.  The results are
presented below to give you a rough idea of how they all compare.
In order to give a more-or-less level playing field, our test data
was screened against all the libraries being tested, and any
unsupported domains were removed; likewise for any test cases that gave large errors
or unexpected non-finite values.

[caution
You should exercise extreme caution when interpreting
these results: relative performance may vary by platform and by compiler option settings,
and the tests use data that gives good code coverage of /our/ code, but which may skew the
results towards the corner cases.  Finally, remember that different
libraries make different choices with regard to performance versus
numerical stability.
]

The first results compare standard library functions to Boost equivalents with MSVC on Windows:

[table_Library_Comparison_with_Microsoft_Visual_C_version_14_2_on_Windows_x64]

On Linux with GCC, we can also compare to the TR1 functions, and to GSL and RMath:

[table_Library_Comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

And finally we can compare the statistical distributions to GSL, RMath and DCDFLIB:

[table_Distribution_performance_comparison_with_GNU_C_version_9_2_1_20191008_on_linux]

[endsect] [/section:comparisons Comparisons to Other Open Source Libraries]

[section:perf_test_app The Performance Test Applications]

Under ['boost-path]\/libs\/math\/reporting\/performance you will find
some reasonably comprehensive performance test applications for this library.

In order to generate the tables you will have seen in this documentation (or others
for your specific compiler), you need to invoke `bjam` in this directory, using a C++11-capable
compiler.  Note that
results extend/overwrite whatever is already present in
['boost-path]\/libs\/math\/reporting\/performance\/doc\/performance_tables.qbk, so
you may want to delete this file before you begin to make a fresh start for
your particular system.

The programs produce results in Boost's Quickbook format, which is not terribly
human-readable.  If you configure your user-config.jam to be able to build Docbook
documentation, then you will also get a full summary of all the data in HTML format
in ['boost-path]\/libs\/math\/reporting\/performance\/html\/index.html.  Assuming
you're on a 'nix-like platform, the procedure to do this is to first install the
`xsltproc`, `Docbook DTD`, and `Docbook XSL` packages.  Then:

* Copy ['boost-path]\/tools\/build\/example\/user-config.jam to your home directory.
* Add `using xsltproc ;` to the end of the file (note the spaces surrounding each token, including the final ";": this is important!)
This assumes that `xsltproc` is in your path.
* Add `using boostbook : path-to-xsl-stylesheets : path-to-dtd ;` to the end of the file.  The `path-to-dtd` should point
to version 4.2.x of the Docbook DTD, while `path-to-xsl-stylesheets` should point to the folder containing the latest XSLT stylesheets.
Both paths should use all forward slashes, even on Windows.  A combined example is shown after this list.
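
Put together, a user-config.jam fragment might look like the following sketch (the two paths are
placeholders for wherever the XSL stylesheets and the version 4.2.x DTD are installed on your system):

   # user-config.jam (in your home directory)
   using xsltproc ;
   using boostbook
      : /usr/share/xml/docbook/stylesheet/docbook-xsl
      : /usr/share/xml/docbook/schema/dtd/4.2
      ;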

At this point you should be able to run the tests and generate the HTML summary.  If GSL, RMath or libstdc++ are
present in the compiler's path, they will be tested automatically.  For DCDFLIB you will need to place the C
source in ['boost-path]\/libs\/math\/reporting\/performance\/third_party\/dcdflib.

If you want to compare multiple compilers, or multiple options for one compiler, then you will
need to invoke `bjam` multiple times, once for each compiler.  Note that in order to test
multiple configurations of the same compiler, each has to be given a unique name in the test
program, otherwise they all edit the same table cells.  Suppose you want to test GCC with
and without the -ffast-math option; in this case bjam would be invoked first as:

   bjam toolset=gcc -a cxxflags=-std=gnu++11

This runs the tests using the default optimization options (-O3); we can then run again
using -ffast-math:

   bjam toolset=gcc -a cxxflags='-std=gnu++11 -ffast-math' define=COMPILER_NAME='"GCC with -ffast-math"'

In the command line above, the -a flag forces a full rebuild, and the preprocessor define COMPILER_NAME needs to be set
to a string literal describing the compiler configuration, hence the double quotes: one for the command line, one for the
compiler.

[endsect] [/section:perf_test_app The Performance Test Applications]

[endmathpart] [/mathpart perf Performance]

[/
  Copyright 2006 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]