• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1[/
2Copyright (c) 2019 Nick Thompson
3Use, modification and distribution are subject to the
4Boost Software License, Version 1.0. (See accompanying file
5LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
6]
7
8[section:linear_regression Linear Regression]
9
10[heading Synopsis]
11
12```
13#include <boost/math/statistics/linear_regression.hpp>
14
15namespace boost::math::statistics {
16
17template<typename RandomAccessContainer>
18std::pair<Real, Real> simple_ordinary_least_squares(RandomAccessContainer const & x,
19                                                    RandomAccessContainer const & y);
20
21template<typename RandomAccessContainer>
22std::tuple<Real, Real, Real> simple_ordinary_least_squares_with_R_squared(RandomAccessContainer const & x,
23                                                                          RandomAccessContainer const & y);
24
25}}}
26```
27
28[heading Background]
29
30A simple ordinary least squares finds the numbers /c/[sub 0] and /c/[sub 1] which minimizes the merit function
31
32[$../graphs/simple_ordinary_least_squares_merit.svg]
33
34The predictive model generated from the minima of this functional is /f/(/x/) = /c/[sub 0] + /c/[sub 1] /x/.
35
36It turns out that numerically, computing the numbers /c/[sub 0] and /c/[sub 1] is not /quite/ trivial.
37See [@https://www.johndcook.com/blog/2008/10/20/comparing-two-ways-to-fit-a-line-to-data/ here] for an explanation of some ways linear regression can go wrong.
38A better method of computing the model parameters uses one-pass, numerically stable methods to compute means, variances, and covariances, and then assembles the parameters from these.
39
40An example usage of the simple linear regression is given below:
41
42```
43#include <vector>
44#include <iostream>
45#include <boost/math/statistics/linear_regression.hpp>
46
47int main() {
48    using boost::math::statistics::simple_ordinary_least_squares;
49    std::vector<double> x{1, 3, 7, 12};
50    std::vector<double> y{8,13, 26, 35};
51
52    auto [c0, c1] = simple_ordinary_least_squares(x, y);
53    std::cout << "f(x) = " << c0 << " + " << c1 << "*x" << "\n";
54    std::cout << "f(2) = " << c0 + c1*2 << "\n";
55}
56// Output:
57f(x) = 6.0742 + 2.50883*x
58f(2) = 11.0919
59```
60
61If, in addition, you wish to assess how appropriate linear regression is for your data, you can calculate /R/[super 2] as well, via
62
63```
64auto [c0, c1, R2] = simple_ordinary_least_squares_with_R_squared(x, y);
65```
66
67It seems a number of definitions exist for /R/[super 2]; we use
68
69[$../graphs/r_squared_definition.svg]
70
71The fit is good if /R/[super 2] is close to 1.
72
73
74[heading Performance]
75
76There are two cases: When you want to compute /R/[super 2], and when you don't want to simultaneously compute /R/[super 2], although the cost of computing /R/[super 2] is not high:
77
78```
79-----------------------------------------------------------------------------------------------------------------
80Benchmark                                                       Time  Bytes/second
81-----------------------------------------------------------------------------------------------------------------
82BMSimpleOrdinaryLeastSquares<float>/8                        51.9 ns  588.476M/s
83BMSimpleOrdinaryLeastSquares<float>/16                        112 ns  546.087M/s
84BMSimpleOrdinaryLeastSquares<float>/32                        277 ns  440.823M/s
85BMSimpleOrdinaryLeastSquares<float>/64                        608 ns  401.659M/s
86BMSimpleOrdinaryLeastSquares<float>/128                      1276 ns  382.622M/s
87BMSimpleOrdinaryLeastSquares<float>/256                      2606 ns  374.748M/s
88BMSimpleOrdinaryLeastSquares<float>/512                      5266 ns  370.868M/s
89BMSimpleOrdinaryLeastSquares<float>/1024                    10601 ns  368.466M/s
90BMSimpleOrdinaryLeastSquares<float>/2048                    21243 ns  367.775M/s
91BMSimpleOrdinaryLeastSquares<float>/4096                    42502 ns  367.631M/s
92BMSimpleOrdinaryLeastSquares<float>/8192                    85239 ns  366.618M/s
93BMSimpleOrdinaryLeastSquares<float>/16384                  169736 ns  368.22M/s
94BMSimpleOrdinaryLeastSquares<float>/32768                  340220 ns  367.409M/s
95BMSimpleOrdinaryLeastSquares<float>/65536                  678907 ns  368.247M/s
96BMSimpleOrdinaryLeastSquares<float>/131072                1357145 ns  368.422M/s
97BMSimpleOrdinaryLeastSquares<float>/262144                2720207 ns  367.635M/s
98BMSimpleOrdinaryLeastSquares<float>/524288                5430141 ns  368.332M/s
99BMSimpleOrdinaryLeastSquares<float>/1048576              10896708 ns  367.095M/s
100BMSimpleOrdinaryLeastSquares<float>/2097152              21797935 ns  367.047M/s
101BMSimpleOrdinaryLeastSquares<float>/4194304              43723059 ns  365.944M/s
102BMSimpleOrdinaryLeastSquares<float>/8388608              87229180 ns  366.864M/s
103BMSimpleOrdinaryLeastSquares<float>/16777216            174988864 ns  365.74M/s
104BMSimpleOrdinaryLeastSquares<float>_BigO                    10.42 N
105BMSimpleOrdinaryLeastSquares<double>/8                       52.4 ns 1.13779G/s
106BMSimpleOrdinaryLeastSquares<double>/16                       122 ns 1002.14M/s
107BMSimpleOrdinaryLeastSquares<double>/32                       307 ns 795.253M/s
108BMSimpleOrdinaryLeastSquares<double>/64                       685 ns 712.628M/s
109BMSimpleOrdinaryLeastSquares<double>/128                     1445 ns 675.805M/s
110BMSimpleOrdinaryLeastSquares<double>/256                     2966 ns 658.488M/s
111BMSimpleOrdinaryLeastSquares<double>/512                     6062 ns 644.35M/s
112BMSimpleOrdinaryLeastSquares<double>/1024                   12166 ns 642.173M/s
113BMSimpleOrdinaryLeastSquares<double>/2048                   24336 ns 642.064M/s
114BMSimpleOrdinaryLeastSquares<double>/4096                   48862 ns 639.567M/s
115BMSimpleOrdinaryLeastSquares<double>/8192                   98008 ns 637.708M/s
116BMSimpleOrdinaryLeastSquares<double>/16384                 196394 ns 636.481M/s
117BMSimpleOrdinaryLeastSquares<double>/32768                 392810 ns 636.434M/s
118BMSimpleOrdinaryLeastSquares<double>/65536                 783903 ns 637.859M/s
119BMSimpleOrdinaryLeastSquares<double>/131072               1556741 ns 642.378M/s
120BMSimpleOrdinaryLeastSquares<double>/262144               3121184 ns 640.792M/s
121BMSimpleOrdinaryLeastSquares<double>/524288               6265681 ns 638.404M/s
122BMSimpleOrdinaryLeastSquares<double>/1048576             12421627 ns 644.076M/s
123BMSimpleOrdinaryLeastSquares<double>/2097152             24907611 ns 642.417M/s
124BMSimpleOrdinaryLeastSquares<double>/4194304             49773317 ns 642.934M/s
125BMSimpleOrdinaryLeastSquares<double>/8388608            100034750 ns 639.79M/s
126BMSimpleOrdinaryLeastSquares<double>/16777216           199477349 ns 641.685M/s
127BMSimpleOrdinaryLeastSquares<double>_BigO                   11.90 N
128BMSimpleOrdinaryLeastSquares<double>_RMS                        0 %
129BMSimpleOrdinaryLeastSquaresWRSquared<float>/8               69.2 ns 441.314M/s
130BMSimpleOrdinaryLeastSquaresWRSquared<float>/16               147 ns 415.939M/s
131BMSimpleOrdinaryLeastSquaresWRSquared<float>/32               356 ns 342.815M/s
132BMSimpleOrdinaryLeastSquaresWRSquared<float>/64               736 ns 331.749M/s
133BMSimpleOrdinaryLeastSquaresWRSquared<float>/128             1494 ns 326.765M/s
134BMSimpleOrdinaryLeastSquaresWRSquared<float>/256             3161 ns 308.909M/s
135BMSimpleOrdinaryLeastSquaresWRSquared<float>/512             6343 ns 307.94M/s
136BMSimpleOrdinaryLeastSquaresWRSquared<float>/1024           12707 ns 307.409M/s
137BMSimpleOrdinaryLeastSquaresWRSquared<float>/2048           25390 ns 307.699M/s
138BMSimpleOrdinaryLeastSquaresWRSquared<float>/4096           50820 ns 307.455M/s
139BMSimpleOrdinaryLeastSquaresWRSquared<float>/8192          101738 ns 307.161M/s
140BMSimpleOrdinaryLeastSquaresWRSquared<float>/16384         203033 ns 307.835M/s
141BMSimpleOrdinaryLeastSquaresWRSquared<float>/32768         403366 ns 309.894M/s
142BMSimpleOrdinaryLeastSquaresWRSquared<float>/65536         767080 ns 325.911M/s
143BMSimpleOrdinaryLeastSquaresWRSquared<float>/131072       1515010 ns 330.034M/s
144BMSimpleOrdinaryLeastSquaresWRSquared<float>/262144       2965539 ns 337.21M/s
145BMSimpleOrdinaryLeastSquaresWRSquared<float>/524288       5774329 ns 346.372M/s
146BMSimpleOrdinaryLeastSquaresWRSquared<float>/1048576     11384267 ns 351.371M/s
147BMSimpleOrdinaryLeastSquaresWRSquared<float>/2097152     22899097 ns 349.406M/s
148BMSimpleOrdinaryLeastSquaresWRSquared<float>/4194304     45923903 ns 348.423M/s
149BMSimpleOrdinaryLeastSquaresWRSquared<float>/8388608     92138186 ns 347.306M/s
150BMSimpleOrdinaryLeastSquaresWRSquared<float>/16777216   183263471 ns 349.226M/s
151BMSimpleOrdinaryLeastSquaresWRSquared<float>_BigO           10.94 N
152BMSimpleOrdinaryLeastSquaresWRSquared<float>_RMS                1 %
153BMSimpleOrdinaryLeastSquaresWRSquared<double>/8              68.7 ns 887.806M/s
154BMSimpleOrdinaryLeastSquaresWRSquared<double>/16              166 ns 734.816M/s
155BMSimpleOrdinaryLeastSquaresWRSquared<double>/32              385 ns 633.918M/s
156BMSimpleOrdinaryLeastSquaresWRSquared<double>/64              812 ns 601.394M/s
157BMSimpleOrdinaryLeastSquaresWRSquared<double>/128            1774 ns 550.424M/s
158BMSimpleOrdinaryLeastSquaresWRSquared<double>/256            3554 ns 549.562M/s
159BMSimpleOrdinaryLeastSquaresWRSquared<double>/512            7151 ns 546.25M/s
160BMSimpleOrdinaryLeastSquaresWRSquared<double>/1024          14335 ns 545.006M/s
161BMSimpleOrdinaryLeastSquaresWRSquared<double>/2048          28608 ns 546.163M/s
162BMSimpleOrdinaryLeastSquaresWRSquared<double>/4096          57228 ns 546.067M/s
163BMSimpleOrdinaryLeastSquaresWRSquared<double>/8192         113732 ns 549.537M/s
164BMSimpleOrdinaryLeastSquaresWRSquared<double>/16384        227686 ns 549.004M/s
165BMSimpleOrdinaryLeastSquaresWRSquared<double>/32768        453989 ns 550.668M/s
166BMSimpleOrdinaryLeastSquaresWRSquared<double>/65536        901696 ns 554.517M/s
167BMSimpleOrdinaryLeastSquaresWRSquared<double>/131072      1771910 ns 564.365M/s
168BMSimpleOrdinaryLeastSquaresWRSquared<double>/262144      3430961 ns 582.933M/s
169BMSimpleOrdinaryLeastSquaresWRSquared<double>/524288      6751237 ns 592.511M/s
170BMSimpleOrdinaryLeastSquaresWRSquared<double>/1048576    13544819 ns 590.639M/s
171BMSimpleOrdinaryLeastSquaresWRSquared<double>/2097152    27331142 ns 585.422M/s
172BMSimpleOrdinaryLeastSquaresWRSquared<double>/4194304    54944987 ns 582.425M/s
173BMSimpleOrdinaryLeastSquaresWRSquared<double>/8388608   109574257 ns 584.087M/s
174BMSimpleOrdinaryLeastSquaresWRSquared<double>/16777216  221449209 ns 578.003M/s
175BMSimpleOrdinaryLeastSquaresWRSquared<double>_BigO          13.17 N
176BMSimpleOrdinaryLeastSquaresWRSquared<double>_RMS               1 %
177```
178
179
180
181
182[endsect]
183[/section:linear_regression]
184