[/
Copyright (c) 2019 Nick Thompson
Use, modification and distribution are subject to the
Boost Software License, Version 1.0. (See accompanying file
LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

[section:anderson_darling The Anderson-Darling Test]

[heading Synopsis]

```
#include <boost/math/statistics/anderson_darling.hpp>

namespace boost{ namespace math { namespace statistics {

template<class RandomAccessContainer>
auto anderson_darling_normality_statistic(RandomAccessContainer const & v,
                                          typename RandomAccessContainer::value_type mu = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN(),
                                          typename RandomAccessContainer::value_type sd = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN());

}}}
```

[heading Background]

The Anderson-Darling test for normality asks whether a given sequence of numbers is drawn from a normal distribution by computing an integral over the empirical cumulative distribution function.
The test statistic /A/[super 2] is given by

[$../graphs/anderson_darling_definition.svg]

where /F/[sub /n/] is the empirical cumulative distribution function of the data and /F/ is the CDF of the hypothesized normal distribution.
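
For readers who cannot view the image, the statistic is commonly written in the form below.
This is the standard textbook form, given here on the assumption that it matches the figure; it is not a transcription of the figure itself.

```latex
A^2 = n \int_{-\infty}^{\infty} \frac{\left(F_n(x) - F(x)\right)^2}{F(x)\,\left(1 - F(x)\right)} \, \mathrm{d}F(x)
```

In this form the factor of n multiplies the integral, so dividing the statistic by n leaves the integral alone.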

The value returned by the routine is /A/[super 2].

If /A/[super 2]\/n converges to zero as /n/ goes to infinity, then the test supports the hypothesis that the data are normally distributed.

If /A/[super 2]\/n converges to a finite positive value as /n/ goes to infinity, then the test does not support the hypothesis.

An example usage is demonstrated below:

```
#include <algorithm>
#include <vector>
#include <random>
#include <iostream>
#include <boost/math/statistics/anderson_darling.hpp>
using boost::math::statistics::anderson_darling_normality_statistic;

int main() {
    std::random_device rd;
    std::normal_distribution<double> dis(0, 1);
    std::vector<double> v(8192);
    for (auto & x : v) { x = dis(rd); }
    // The routine requires sorted input:
    std::sort(v.begin(), v.end());
    double presumed_mean = 0;
    // The samples above were drawn with standard deviation 1; zero is not a valid hypothesis:
    double presumed_standard_deviation = 1;
    double Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
    std::cout << "A^2/n = " << Asq/v.size() << "\n";
    // Should be small, e.g. 5.39e-05
    // Now use an incorrect hypothesis:
    presumed_mean = 4;
    Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
    std::cout << "A^2/n = " << Asq/v.size() << "\n";
    // Should be somewhat large, e.g. 7.41
}
```

The Anderson-Darling normality test requires sorted data.
If the data are not sorted, an exception is thrown.
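
One way to avoid tripping over this requirement is to check or establish the ordering before calling the routine.
The following is a minimal sketch using only the standard library; `require_sorted` and `sorted_copy` are hypothetical helpers, not part of Boost.Math, and the use of `std::domain_error` is an assumption, since the text above does not say which exception type the library throws.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of Boost.Math: rejects unsorted input up front.
// std::domain_error is an assumption made for this sketch.
template<class RandomAccessContainer>
void require_sorted(RandomAccessContainer const & v)
{
    if (!std::is_sorted(v.begin(), v.end())) {
        throw std::domain_error("input must be sorted in ascending order");
    }
}

// Hypothetical helper: returns a sorted copy so the caller's data is left untouched.
template<class RandomAccessContainer>
RandomAccessContainer sorted_copy(RandomAccessContainer v)
{
    std::sort(v.begin(), v.end());
    return v;
}
```

Passing `sorted_copy(v)` to the statistic costs an extra copy and an O(n log n) sort, but leaves the original ordering of `v` available to the caller.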

If you simply wish to know whether the data are normally distributed, rather than whether they are normally distributed with a presumed mean and standard deviation,
then you can call the function without the final two arguments, and the mean and standard deviation will be estimated from the data themselves:

```
double Asq = anderson_darling_normality_statistic(v);
```
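
To make the estimation step concrete, the sketch below computes a sample mean and a sample standard deviation with the standard library alone.
Both function names are hypothetical, and the choice of the Bessel-corrected (n - 1) denominator is an assumption; this page does not state which estimator the library uses internally.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the mean estimate; not Boost.Math code.
// Assumes v is non-empty.
double estimate_mean(std::vector<double> const & v)
{
    double sum = 0;
    for (double x : v) { sum += x; }
    return sum / static_cast<double>(v.size());
}

// Hypothetical stand-in for the spread estimate; not Boost.Math code.
// Assumption: Bessel-corrected (n - 1) denominator; requires at least two samples.
double estimate_standard_deviation(std::vector<double> const & v)
{
    double const mu = estimate_mean(v);
    double sum_sq = 0;
    for (double x : v) { sum_sq += (x - mu)*(x - mu); }
    return std::sqrt(sum_sq / static_cast<double>(v.size() - 1));
}
```

Passing these two values as the second and third arguments makes the estimation explicit, though whether that reproduces the one-argument call exactly depends on the estimator the library actually uses.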

The following graph demonstrates the convergence of the test statistic.
Each data point represents a vector of length /n/ filled with normally distributed data.
The test statistic is computed over this vector and divided by /n/, and the natural logarithm of the result is plotted.
This exhibits the (admittedly slow) convergence of the integral to zero when the hypothesis is true.

[$../graphs/anderson_darling_simulation.svg]

[heading Performance]

```
---------------------------------------------------------------
Benchmark                                              Time
---------------------------------------------------------------
AndersonDarlingNormalityTest<float>/8                224 ns    bytes_per_second=136.509M/s
AndersonDarlingNormalityTest<float>/16               435 ns    bytes_per_second=140.254M/s
AndersonDarlingNormalityTest<float>/32               898 ns    bytes_per_second=135.995M/s
AndersonDarlingNormalityTest<float>/64              1773 ns    bytes_per_second=137.675M/s
AndersonDarlingNormalityTest<float>/128             3455 ns    bytes_per_second=141.338M/s
AndersonDarlingNormalityTest<float>/256             7001 ns    bytes_per_second=139.488M/s
AndersonDarlingNormalityTest<float>/512            13996 ns    bytes_per_second=139.551M/s
AndersonDarlingNormalityTest<float>/1024           28129 ns    bytes_per_second=138.868M/s
AndersonDarlingNormalityTest<float>/2048           55723 ns    bytes_per_second=140.206M/s
AndersonDarlingNormalityTest<float>/4096          112008 ns    bytes_per_second=139.501M/s
AndersonDarlingNormalityTest<float>/8192          224643 ns    bytes_per_second=139.11M/s
AndersonDarlingNormalityTest<float>/16384         450320 ns    bytes_per_second=138.791M/s
AndersonDarlingNormalityTest<float>/32768         896409 ns    bytes_per_second=139.45M/s
AndersonDarlingNormalityTest<float>/65536        1797800 ns    bytes_per_second=139.058M/s
AndersonDarlingNormalityTest<float>/131072       3604995 ns    bytes_per_second=138.698M/s
AndersonDarlingNormalityTest<float>/262144       7235625 ns    bytes_per_second=138.207M/s
AndersonDarlingNormalityTest<float>/524288      14502815 ns    bytes_per_second=137.904M/s
AndersonDarlingNormalityTest<float>/1048576     29058087 ns    bytes_per_second=137.659M/s
AndersonDarlingNormalityTest<float>/2097152     58470439 ns    bytes_per_second=136.824M/s
AndersonDarlingNormalityTest<float>/4194304    117476365 ns    bytes_per_second=136.201M/s
AndersonDarlingNormalityTest<float>/8388608    239887895 ns    bytes_per_second=133.397M/s
AndersonDarlingNormalityTest<float>/16777216   488787211 ns    bytes_per_second=130.94M/s
AndersonDarlingNormalityTest<float>_BigO           28.96 N         28.96 N
AndersonDarlingNormalityTest<double>/8               470 ns    bytes_per_second=129.733M/s
AndersonDarlingNormalityTest<double>/16              911 ns    bytes_per_second=133.989M/s
AndersonDarlingNormalityTest<double>/32             1773 ns    bytes_per_second=137.723M/s
AndersonDarlingNormalityTest<double>/64             3368 ns    bytes_per_second=144.966M/s
AndersonDarlingNormalityTest<double>/128            6627 ns    bytes_per_second=147.357M/s
AndersonDarlingNormalityTest<double>/256           12458 ns    bytes_per_second=156.777M/s
AndersonDarlingNormalityTest<double>/512           23060 ns    bytes_per_second=169.395M/s
AndersonDarlingNormalityTest<double>/1024          44529 ns    bytes_per_second=175.45M/s
AndersonDarlingNormalityTest<double>/2048          88735 ns    bytes_per_second=176.087M/s
AndersonDarlingNormalityTest<double>/4096         175583 ns    bytes_per_second=177.978M/s
AndersonDarlingNormalityTest<double>/8192         348042 ns    bytes_per_second=179.577M/s
AndersonDarlingNormalityTest<double>/16384        701439 ns    bytes_per_second=178.206M/s
AndersonDarlingNormalityTest<double>/32768       1394597 ns    bytes_per_second=179.262M/s
AndersonDarlingNormalityTest<double>/65536       2777943 ns    bytes_per_second=179.994M/s
AndersonDarlingNormalityTest<double>/131072      5571455 ns    bytes_per_second=179.487M/s
AndersonDarlingNormalityTest<double>/262144     11161456 ns    bytes_per_second=179.193M/s
AndersonDarlingNormalityTest<double>/524288     22048950 ns    bytes_per_second=181.417M/s
AndersonDarlingNormalityTest<double>/1048576    44094409 ns    bytes_per_second=181.429M/s
AndersonDarlingNormalityTest<double>/2097152    88300185 ns    bytes_per_second=181.199M/s
AndersonDarlingNormalityTest<double>/4194304   176140378 ns    bytes_per_second=181.678M/s
AndersonDarlingNormalityTest<double>/8388608   352102955 ns    bytes_per_second=181.769M/s
AndersonDarlingNormalityTest<double>/16777216  706160246 ns    bytes_per_second=181.267M/s
AndersonDarlingNormalityTest<double>_BigO          42.06 N
```
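
The throughput column can be related to the element count and the time column with a little arithmetic.
The helper below is a hypothetical reading aid, not part of Boost.Math or of the benchmark itself, and it assumes that "M/s" means 2^20 bytes per second, which is the interpretation that appears to fit the figures.

```cpp
#include <cstddef>

// Hypothetical helper for reading the table above.
// Assumption: "M/s" in the table means 2^20 bytes per second.
double mebibytes_per_second(std::size_t n, std::size_t bytes_per_element, double nanoseconds)
{
    double const bytes = static_cast<double>(n) * static_cast<double>(bytes_per_element);
    double const seconds = nanoseconds * 1e-9;
    return bytes / seconds / (1024.0 * 1024.0);
}
```

For the largest single-precision run, `mebibytes_per_second(16777216, 4, 488787211.0)` evaluates to roughly 130.9, in line with the reported 130.94M/s.
The `_BigO` rows read the same way: about 29 ns per element for `float` and about 42 ns per element for `double`.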

[heading Caveats]

Some authors, including [@https://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm NIST], give the following definition of the Anderson-Darling test statistic:

[$../graphs/alternative_anderson_darling_definition.svg]

This is an approximation to the quadrature sum we use as our definition.
Boost.Math /does not compute this quantity/.
(However, with a sufficiently large amount of data the two definitions seem to agree to two digits, so it is unclear how important the distinction is in practice.)
Our computation of the Anderson-Darling test statistic agrees with Mathematica.
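
For comparison, the sum-based form can be sketched with the standard library alone.
The code below follows the formula as stated in the NIST handbook, namely minus n, minus one over n times the sum over i from 1 to n of (2i - 1) times the quantity ln F(Y_i) + ln(1 - F(Y_(n+1-i))), where the Y_i are the sorted data.
The function name is hypothetical, and this is emphatically not the routine Boost.Math provides.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the sum-based statistic from the NIST handbook; this is NOT the
// quantity Boost.Math returns. The function name is hypothetical.
// Requires: v sorted ascending and non-empty, sd > 0.
double nist_anderson_darling_statistic(std::vector<double> const & v, double mu, double sd)
{
    std::size_t const n = v.size();
    double const scale = 1.0 / (sd * std::sqrt(2.0));
    double sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Normal CDF at v[i], written with erfc for accuracy in the lower tail.
        double const F_i = 0.5 * std::erfc(-(v[i] - mu) * scale);
        // Survival function 1 - F at the mirrored order statistic v[n-1-i].
        double const S_mirror = 0.5 * std::erfc((v[n - 1 - i] - mu) * scale);
        // Zero-based i gives weight 2i + 1, which is 2k - 1 for one-based k.
        sum += (2.0 * static_cast<double>(i) + 1.0) * (std::log(F_i) + std::log(S_mirror));
    }
    return -static_cast<double>(n) - sum / static_cast<double>(n);
}
```

Running this next to `anderson_darling_normality_statistic` on the same sorted vector is one way to see for yourself how closely the two definitions agree on a given data set.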

[endsect]
[/section:anderson_darling]