[/
Copyright (c) 2019 Nick Thompson
Use, modification and distribution are subject to the
Boost Software License, Version 1.0. (See accompanying file
LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

[section:anderson_darling The Anderson-Darling Test]

[heading Synopsis]

```
#include <boost/math/statistics/anderson_darling.hpp>

namespace boost{ namespace math { namespace statistics {

template<class RandomAccessContainer>
auto anderson_darling_normality_statistic(RandomAccessContainer const & v,
                                          typename RandomAccessContainer::value_type mu = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN(),
                                          typename RandomAccessContainer::value_type sd = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN());

}}}
```

[heading Background]

The Anderson-Darling test for normality asks whether a given sequence of numbers is drawn from a normal distribution by computing an integral over the empirical cumulative distribution function.
The test statistic /A/[super 2] is given by

[$../graphs/anderson_darling_definition.svg]

where /F/[sub /n/] is the empirical cumulative distribution and /F/ is the CDF of the normal distribution.

The value returned by the routine is /A/[super 2].

If /A/[super 2]\/n converges to zero as /n/ goes to infinity, then the hypothesis that the data is normally distributed is supported by the test.

If /A/[super 2]\/n converges to a finite positive value as /n/ goes to infinity, then the hypothesis is not supported by the test.
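
To make the two functions in the definition concrete, the following is a minimal sketch of the empirical distribution /F/[sub /n/] and the normal CDF /F/, using only the standard library.
The names `normal_cdf` and `empirical_cdf` are hypothetical helpers chosen for illustration; they are not part of Boost.Math and are not how the library evaluates the statistic.

```
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: CDF of a normal distribution with mean mu and
// standard deviation sd, written in terms of the complementary error function.
inline double normal_cdf(double x, double mu, double sd) {
    assert(sd > 0);
    return 0.5 * std::erfc(-(x - mu) / (sd * std::sqrt(2.0)));
}

// Hypothetical helper: empirical CDF of sorted data,
// i.e. the fraction of samples that are <= x.
inline double empirical_cdf(std::vector<double> const & sorted, double x) {
    assert(!sorted.empty());
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    // upper_bound returns the first element strictly greater than x.
    auto it = std::upper_bound(sorted.begin(), sorted.end(), x);
    return static_cast<double>(it - sorted.begin()) / sorted.size();
}
```

When the data really are drawn from the presumed normal distribution, `empirical_cdf(v, x)` approaches `normal_cdf(x, mu, sd)` at every `x` as the sample grows, which is why the weighted squared difference between the two is a natural measure of fit.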

An example usage is demonstrated below:

```
#include <algorithm>
#include <vector>
#include <random>
#include <iostream>
#include <boost/math/statistics/anderson_darling.hpp>
using boost::math::statistics::anderson_darling_normality_statistic;
std::random_device rd;
std::normal_distribution<double> dis(0, 1);
std::vector<double> v(8192);
for (auto & x : v) { x = dis(rd); }
// The statistic requires sorted data:
std::sort(v.begin(), v.end());
// The presumed parameters match the distribution the data were drawn from:
double presumed_mean = 0;
double presumed_standard_deviation = 1;
double Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
std::cout << "A^2/n = " << Asq/v.size() << "\n";
// Prints e.g. 5.39e-05; should be small . . .
// Now use an incorrect hypothesis:
presumed_mean = 4;
Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
std::cout << "A^2/n = " << Asq/v.size() << "\n";
// Prints e.g. 7.41; should be somewhat large . . .
```

The Anderson-Darling normality test requires sorted data.
If the data are not sorted, an exception is thrown.

If you simply wish to know whether or not data is normally distributed, and not whether it is normally distributed with a presumed mean and standard deviation,
then you can call the function without the final two arguments, and the mean and standard deviation will be estimated from the data themselves:

```
double Asq = anderson_darling_normality_statistic(v);
```

The following graph demonstrates the convergence of the test statistic.
Each data point represents a vector of length /n/ which is filled with normally distributed data.
The test statistic is computed over this vector and divided by /n/, and the natural logarithm of the result is plotted.
This exhibits the (admittedly slow) convergence of the integral to zero when the hypothesis is true.
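
The experiment behind such a graph can be sketched without Boost as follows.
This is only an illustration under stated assumptions: `ad_sum_statistic` is a hypothetical stand-in that uses the summation formula given by some authors (see the Caveats), which is /not/ the quantity Boost.Math computes, and `log_scaled_statistic` is a hypothetical helper that produces one data point, ln(/A/[super 2]\/n), for a seeded sample of length /n/.

```
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Stand-in statistic (NOT Boost.Math's quadrature): the summation formula
//   A^2 = -n - (1/n) * sum_{i=1}^{n} (2i-1) * [ln F(y_i) + ln(1 - F(y_{n+1-i}))]
// for sorted data y_1 <= ... <= y_n and a presumed normal CDF F.
inline double ad_sum_statistic(std::vector<double> const & sorted, double mu, double sd) {
    assert(!sorted.empty() && sd > 0);
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::size_t const n = sorted.size();
    double const scale = sd * std::sqrt(2.0);
    double sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // ln F(y_{i+1}) with F written via erfc:
        double log_cdf = std::log(0.5 * std::erfc(-(sorted[i] - mu) / scale));
        // ln(1 - F(y_{n-i})), using erfc directly to avoid cancellation in 1 - F:
        double log_sf = std::log(0.5 * std::erfc((sorted[n - 1 - i] - mu) / scale));
        sum += (2.0 * i + 1.0) * (log_cdf + log_sf);
    }
    return -static_cast<double>(n) - sum / n;
}

// Hypothetical helper: one data point of the convergence plot, ln(A^2/n),
// for n standard normal samples drawn from a seeded generator.
inline double log_scaled_statistic(std::size_t n, unsigned seed) {
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> dis(0, 1);
    std::vector<double> v(n);
    for (auto & x : v) { x = dis(gen); }
    std::sort(v.begin(), v.end());
    return std::log(ad_sum_statistic(v, 0.0, 1.0) / n);
}
```

Calling `log_scaled_statistic` for increasing powers of two and plotting the results against /n/ would give a picture of the same kind; the exact values depend on the seed and on the standard library's implementation of `std::normal_distribution`.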

[$../graphs/anderson_darling_simulation.svg]


[heading Performance]

```
---------------------------------------------------------------
Benchmark                                               Time
---------------------------------------------------------------
AndersonDarlingNormalityTest<float>/8                224 ns bytes_per_second=136.509M/s
AndersonDarlingNormalityTest<float>/16               435 ns bytes_per_second=140.254M/s
AndersonDarlingNormalityTest<float>/32               898 ns bytes_per_second=135.995M/s
AndersonDarlingNormalityTest<float>/64              1773 ns bytes_per_second=137.675M/s
AndersonDarlingNormalityTest<float>/128             3455 ns bytes_per_second=141.338M/s
AndersonDarlingNormalityTest<float>/256             7001 ns bytes_per_second=139.488M/s
AndersonDarlingNormalityTest<float>/512            13996 ns bytes_per_second=139.551M/s
AndersonDarlingNormalityTest<float>/1024           28129 ns bytes_per_second=138.868M/s
AndersonDarlingNormalityTest<float>/2048           55723 ns bytes_per_second=140.206M/s
AndersonDarlingNormalityTest<float>/4096          112008 ns bytes_per_second=139.501M/s
AndersonDarlingNormalityTest<float>/8192          224643 ns bytes_per_second=139.11M/s
AndersonDarlingNormalityTest<float>/16384         450320 ns bytes_per_second=138.791M/s
AndersonDarlingNormalityTest<float>/32768         896409 ns bytes_per_second=139.45M/s
AndersonDarlingNormalityTest<float>/65536        1797800 ns bytes_per_second=139.058M/s
AndersonDarlingNormalityTest<float>/131072       3604995 ns bytes_per_second=138.698M/s
AndersonDarlingNormalityTest<float>/262144       7235625 ns bytes_per_second=138.207M/s
AndersonDarlingNormalityTest<float>/524288      14502815 ns bytes_per_second=137.904M/s
AndersonDarlingNormalityTest<float>/1048576     29058087 ns bytes_per_second=137.659M/s
AndersonDarlingNormalityTest<float>/2097152     58470439 ns bytes_per_second=136.824M/s
AndersonDarlingNormalityTest<float>/4194304    117476365 ns bytes_per_second=136.201M/s
AndersonDarlingNormalityTest<float>/8388608    239887895 ns bytes_per_second=133.397M/s
AndersonDarlingNormalityTest<float>/16777216   488787211 ns bytes_per_second=130.94M/s
AndersonDarlingNormalityTest<float>_BigO            28.96 N      28.96 N
AndersonDarlingNormalityTest<double>/8               470 ns bytes_per_second=129.733M/s
AndersonDarlingNormalityTest<double>/16              911 ns bytes_per_second=133.989M/s
AndersonDarlingNormalityTest<double>/32             1773 ns bytes_per_second=137.723M/s
AndersonDarlingNormalityTest<double>/64             3368 ns bytes_per_second=144.966M/s
AndersonDarlingNormalityTest<double>/128            6627 ns bytes_per_second=147.357M/s
AndersonDarlingNormalityTest<double>/256           12458 ns bytes_per_second=156.777M/s
AndersonDarlingNormalityTest<double>/512           23060 ns bytes_per_second=169.395M/s
AndersonDarlingNormalityTest<double>/1024          44529 ns bytes_per_second=175.45M/s
AndersonDarlingNormalityTest<double>/2048          88735 ns bytes_per_second=176.087M/s
AndersonDarlingNormalityTest<double>/4096         175583 ns bytes_per_second=177.978M/s
AndersonDarlingNormalityTest<double>/8192         348042 ns bytes_per_second=179.577M/s
AndersonDarlingNormalityTest<double>/16384        701439 ns bytes_per_second=178.206M/s
AndersonDarlingNormalityTest<double>/32768       1394597 ns bytes_per_second=179.262M/s
AndersonDarlingNormalityTest<double>/65536       2777943 ns bytes_per_second=179.994M/s
AndersonDarlingNormalityTest<double>/131072      5571455 ns bytes_per_second=179.487M/s
AndersonDarlingNormalityTest<double>/262144     11161456 ns bytes_per_second=179.193M/s
AndersonDarlingNormalityTest<double>/524288     22048950 ns bytes_per_second=181.417M/s
AndersonDarlingNormalityTest<double>/1048576    44094409 ns bytes_per_second=181.429M/s
AndersonDarlingNormalityTest<double>/2097152    88300185 ns bytes_per_second=181.199M/s
AndersonDarlingNormalityTest<double>/4194304   176140378 ns bytes_per_second=181.678M/s
AndersonDarlingNormalityTest<double>/8388608   352102955 ns bytes_per_second=181.769M/s
AndersonDarlingNormalityTest<double>/16777216  706160246 ns bytes_per_second=181.267M/s
AndersonDarlingNormalityTest<double>_BigO           42.06 N
```

[heading Caveats]

Some authors, including [@https://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm NIST], give the following definition of the Anderson-Darling test statistic:

[$../graphs/alternative_anderson_darling_definition.svg]

This is an approximation to the quadrature sum we use as our definition.
Boost.Math /does not compute this quantity/.
(However, with a sufficiently large amount of data the two definitions seem to agree to two digits, so the importance of making a clear distinction between the two is unclear.)
Our computation of the Anderson-Darling test statistic agrees with Mathematica.

[endsect]
[/section:anderson_darling]