1[/ 2Copyright (c) 2019 Nick Thompson 3Use, modification and distribution are subject to the 4Boost Software License, Version 1.0. (See accompanying file 5LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) 6] 7 8[section:linear_regression Linear Regression] 9 10[heading Synopsis] 11 12``` 13#include <boost/math/statistics/linear_regression.hpp> 14 15namespace boost::math::statistics { 16 17template<typename RandomAccessContainer> 18std::pair<Real, Real> simple_ordinary_least_squares(RandomAccessContainer const & x, 19 RandomAccessContainer const & y); 20 21template<typename RandomAccessContainer> 22std::tuple<Real, Real, Real> simple_ordinary_least_squares_with_R_squared(RandomAccessContainer const & x, 23 RandomAccessContainer const & y); 24 25}}} 26``` 27 28[heading Background] 29 30A simple ordinary least squares finds the numbers /c/[sub 0] and /c/[sub 1] which minimizes the merit function 31 32[$../graphs/simple_ordinary_least_squares_merit.svg] 33 34The predictive model generated from the minima of this functional is /f/(/x/) = /c/[sub 0] + /c/[sub 1] /x/. 35 36It turns out that numerically, computing the numbers /c/[sub 0] and /c/[sub 1] is not /quite/ trivial. 37See [@https://www.johndcook.com/blog/2008/10/20/comparing-two-ways-to-fit-a-line-to-data/ here] for an explanation of some ways linear regression can go wrong. 38A better method of computing the model parameters uses one-pass, numerically stable methods to compute means, variances, and covariances, and then assembles the parameters from these. 39 40An example usage of the simple linear regression is given below: 41 42``` 43#include <vector> 44#include <iostream> 45#include <boost/math/statistics/linear_regression.hpp> 46 47int main() { 48 using boost::math::statistics::simple_ordinary_least_squares; 49 std::vector<double> x{1, 3, 7, 12}; 50 std::vector<double> y{8,13, 26, 35}; 51 52 auto [c0, c1] = simple_ordinary_least_squares(x, y); 53 std::cout << "f(x) = " << c0 << " + " << c1 << "*x" << "\n"; 54 std::cout << "f(2) = " << c0 + c1*2 << "\n"; 55} 56// Output: 57f(x) = 6.0742 + 2.50883*x 58f(2) = 11.0919 59``` 60 61If, in addition, you wish to assess how appropriate linear regression is for your data, you can calculate /R/[super 2] as well, via 62 63``` 64auto [c0, c1, R2] = simple_ordinary_least_squares_with_R_squared(x, y); 65``` 66 67It seems a number of definitions exist for /R/[super 2]; we use 68 69[$../graphs/r_squared_definition.svg] 70 71The fit is good if /R/[super 2] is close to 1. 72 73 74[heading Performance] 75 76There are two cases: When you want to compute /R/[super 2], and when you don't want to simultaneously compute /R/[super 2], although the cost of computing /R/[super 2] is not high: 77 78``` 79----------------------------------------------------------------------------------------------------------------- 80Benchmark Time Bytes/second 81----------------------------------------------------------------------------------------------------------------- 82BMSimpleOrdinaryLeastSquares<float>/8 51.9 ns 588.476M/s 83BMSimpleOrdinaryLeastSquares<float>/16 112 ns 546.087M/s 84BMSimpleOrdinaryLeastSquares<float>/32 277 ns 440.823M/s 85BMSimpleOrdinaryLeastSquares<float>/64 608 ns 401.659M/s 86BMSimpleOrdinaryLeastSquares<float>/128 1276 ns 382.622M/s 87BMSimpleOrdinaryLeastSquares<float>/256 2606 ns 374.748M/s 88BMSimpleOrdinaryLeastSquares<float>/512 5266 ns 370.868M/s 89BMSimpleOrdinaryLeastSquares<float>/1024 10601 ns 368.466M/s 90BMSimpleOrdinaryLeastSquares<float>/2048 21243 ns 367.775M/s 91BMSimpleOrdinaryLeastSquares<float>/4096 42502 ns 367.631M/s 92BMSimpleOrdinaryLeastSquares<float>/8192 85239 ns 366.618M/s 93BMSimpleOrdinaryLeastSquares<float>/16384 169736 ns 368.22M/s 94BMSimpleOrdinaryLeastSquares<float>/32768 340220 ns 367.409M/s 95BMSimpleOrdinaryLeastSquares<float>/65536 678907 ns 368.247M/s 96BMSimpleOrdinaryLeastSquares<float>/131072 1357145 ns 368.422M/s 97BMSimpleOrdinaryLeastSquares<float>/262144 2720207 ns 367.635M/s 98BMSimpleOrdinaryLeastSquares<float>/524288 5430141 ns 368.332M/s 99BMSimpleOrdinaryLeastSquares<float>/1048576 10896708 ns 367.095M/s 100BMSimpleOrdinaryLeastSquares<float>/2097152 21797935 ns 367.047M/s 101BMSimpleOrdinaryLeastSquares<float>/4194304 43723059 ns 365.944M/s 102BMSimpleOrdinaryLeastSquares<float>/8388608 87229180 ns 366.864M/s 103BMSimpleOrdinaryLeastSquares<float>/16777216 174988864 ns 365.74M/s 104BMSimpleOrdinaryLeastSquares<float>_BigO 10.42 N 105BMSimpleOrdinaryLeastSquares<double>/8 52.4 ns 1.13779G/s 106BMSimpleOrdinaryLeastSquares<double>/16 122 ns 1002.14M/s 107BMSimpleOrdinaryLeastSquares<double>/32 307 ns 795.253M/s 108BMSimpleOrdinaryLeastSquares<double>/64 685 ns 712.628M/s 109BMSimpleOrdinaryLeastSquares<double>/128 1445 ns 675.805M/s 110BMSimpleOrdinaryLeastSquares<double>/256 2966 ns 658.488M/s 111BMSimpleOrdinaryLeastSquares<double>/512 6062 ns 644.35M/s 112BMSimpleOrdinaryLeastSquares<double>/1024 12166 ns 642.173M/s 113BMSimpleOrdinaryLeastSquares<double>/2048 24336 ns 642.064M/s 114BMSimpleOrdinaryLeastSquares<double>/4096 48862 ns 639.567M/s 115BMSimpleOrdinaryLeastSquares<double>/8192 98008 ns 637.708M/s 116BMSimpleOrdinaryLeastSquares<double>/16384 196394 ns 636.481M/s 117BMSimpleOrdinaryLeastSquares<double>/32768 392810 ns 636.434M/s 118BMSimpleOrdinaryLeastSquares<double>/65536 783903 ns 637.859M/s 119BMSimpleOrdinaryLeastSquares<double>/131072 1556741 ns 642.378M/s 120BMSimpleOrdinaryLeastSquares<double>/262144 3121184 ns 640.792M/s 121BMSimpleOrdinaryLeastSquares<double>/524288 6265681 ns 638.404M/s 122BMSimpleOrdinaryLeastSquares<double>/1048576 12421627 ns 644.076M/s 123BMSimpleOrdinaryLeastSquares<double>/2097152 24907611 ns 642.417M/s 124BMSimpleOrdinaryLeastSquares<double>/4194304 49773317 ns 642.934M/s 125BMSimpleOrdinaryLeastSquares<double>/8388608 100034750 ns 639.79M/s 126BMSimpleOrdinaryLeastSquares<double>/16777216 199477349 ns 641.685M/s 127BMSimpleOrdinaryLeastSquares<double>_BigO 11.90 N 128BMSimpleOrdinaryLeastSquares<double>_RMS 0 % 129BMSimpleOrdinaryLeastSquaresWRSquared<float>/8 69.2 ns 441.314M/s 130BMSimpleOrdinaryLeastSquaresWRSquared<float>/16 147 ns 415.939M/s 131BMSimpleOrdinaryLeastSquaresWRSquared<float>/32 356 ns 342.815M/s 132BMSimpleOrdinaryLeastSquaresWRSquared<float>/64 736 ns 331.749M/s 133BMSimpleOrdinaryLeastSquaresWRSquared<float>/128 1494 ns 326.765M/s 134BMSimpleOrdinaryLeastSquaresWRSquared<float>/256 3161 ns 308.909M/s 135BMSimpleOrdinaryLeastSquaresWRSquared<float>/512 6343 ns 307.94M/s 136BMSimpleOrdinaryLeastSquaresWRSquared<float>/1024 12707 ns 307.409M/s 137BMSimpleOrdinaryLeastSquaresWRSquared<float>/2048 25390 ns 307.699M/s 138BMSimpleOrdinaryLeastSquaresWRSquared<float>/4096 50820 ns 307.455M/s 139BMSimpleOrdinaryLeastSquaresWRSquared<float>/8192 101738 ns 307.161M/s 140BMSimpleOrdinaryLeastSquaresWRSquared<float>/16384 203033 ns 307.835M/s 141BMSimpleOrdinaryLeastSquaresWRSquared<float>/32768 403366 ns 309.894M/s 142BMSimpleOrdinaryLeastSquaresWRSquared<float>/65536 767080 ns 325.911M/s 143BMSimpleOrdinaryLeastSquaresWRSquared<float>/131072 1515010 ns 330.034M/s 144BMSimpleOrdinaryLeastSquaresWRSquared<float>/262144 2965539 ns 337.21M/s 145BMSimpleOrdinaryLeastSquaresWRSquared<float>/524288 5774329 ns 346.372M/s 146BMSimpleOrdinaryLeastSquaresWRSquared<float>/1048576 11384267 ns 351.371M/s 147BMSimpleOrdinaryLeastSquaresWRSquared<float>/2097152 22899097 ns 349.406M/s 148BMSimpleOrdinaryLeastSquaresWRSquared<float>/4194304 45923903 ns 348.423M/s 149BMSimpleOrdinaryLeastSquaresWRSquared<float>/8388608 92138186 ns 347.306M/s 150BMSimpleOrdinaryLeastSquaresWRSquared<float>/16777216 183263471 ns 349.226M/s 151BMSimpleOrdinaryLeastSquaresWRSquared<float>_BigO 10.94 N 152BMSimpleOrdinaryLeastSquaresWRSquared<float>_RMS 1 % 153BMSimpleOrdinaryLeastSquaresWRSquared<double>/8 68.7 ns 887.806M/s 154BMSimpleOrdinaryLeastSquaresWRSquared<double>/16 166 ns 734.816M/s 155BMSimpleOrdinaryLeastSquaresWRSquared<double>/32 385 ns 633.918M/s 156BMSimpleOrdinaryLeastSquaresWRSquared<double>/64 812 ns 601.394M/s 157BMSimpleOrdinaryLeastSquaresWRSquared<double>/128 1774 ns 550.424M/s 158BMSimpleOrdinaryLeastSquaresWRSquared<double>/256 3554 ns 549.562M/s 159BMSimpleOrdinaryLeastSquaresWRSquared<double>/512 7151 ns 546.25M/s 160BMSimpleOrdinaryLeastSquaresWRSquared<double>/1024 14335 ns 545.006M/s 161BMSimpleOrdinaryLeastSquaresWRSquared<double>/2048 28608 ns 546.163M/s 162BMSimpleOrdinaryLeastSquaresWRSquared<double>/4096 57228 ns 546.067M/s 163BMSimpleOrdinaryLeastSquaresWRSquared<double>/8192 113732 ns 549.537M/s 164BMSimpleOrdinaryLeastSquaresWRSquared<double>/16384 227686 ns 549.004M/s 165BMSimpleOrdinaryLeastSquaresWRSquared<double>/32768 453989 ns 550.668M/s 166BMSimpleOrdinaryLeastSquaresWRSquared<double>/65536 901696 ns 554.517M/s 167BMSimpleOrdinaryLeastSquaresWRSquared<double>/131072 1771910 ns 564.365M/s 168BMSimpleOrdinaryLeastSquaresWRSquared<double>/262144 3430961 ns 582.933M/s 169BMSimpleOrdinaryLeastSquaresWRSquared<double>/524288 6751237 ns 592.511M/s 170BMSimpleOrdinaryLeastSquaresWRSquared<double>/1048576 13544819 ns 590.639M/s 171BMSimpleOrdinaryLeastSquaresWRSquared<double>/2097152 27331142 ns 585.422M/s 172BMSimpleOrdinaryLeastSquaresWRSquared<double>/4194304 54944987 ns 582.425M/s 173BMSimpleOrdinaryLeastSquaresWRSquared<double>/8388608 109574257 ns 584.087M/s 174BMSimpleOrdinaryLeastSquaresWRSquared<double>/16777216 221449209 ns 578.003M/s 175BMSimpleOrdinaryLeastSquaresWRSquared<double>_BigO 13.17 N 176BMSimpleOrdinaryLeastSquaresWRSquared<double>_RMS 1 % 177``` 178 179 180 181 182[endsect] 183[/section:linear_regression] 184