1[/ 2Copyright (c) 2019 Nick Thompson 3Use, modification and distribution are subject to the 4Boost Software License, Version 1.0. (See accompanying file 5LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) 6] 7 8[section:empirical_cdf Empirical Cumulative Distribution Function] 9 10[heading Synopsis] 11 12``` 13#include <boost/math/distributions/empirical_cumulative_distribution_function.hpp> 14 15namespace boost{ namespace math{ 16 17template <class RandomAccessContainer> 18class empirical_cumulative_distribution_function 19{ 20public: 21 using Real = typename RandomAccessContainer::value_type; 22 empirical_cumulative_distribution_function(RandomAccessContainer && v, bool sorted = false); 23 24 auto operator()(Real t) const; 25 26 RandomAccessContainer&& return_data(); 27}; 28 29}} 30``` 31 32[heading Empirical Cumulative Distribution Function] 33 34The empirical cumulative distribution function is a step function constructed from observed data which converges to the true cumulative distribution function in the limit of infinite data. 35This function is a basic building block of hypothesis testing workflows that attempt to answer the question "does my data come from a given distribution?" 36These tests require computing quadratures over some function of the empirical CDF and the supposed CDF to create a distance measurement, and hence it is occasionally useful to construct a continuous callable from the data. 37 38An example usage is demonstrated below: 39 40``` 41#include <vector> 42#include <random> 43#include <boost/math/distributions/empirical_cumulative_distribution_function.hpp> 44using boost::math::empirical_cumulative_distribution_function; 45std::random_device rd; 46std::mt19937 gen{rd()}; 47std::normal_distribution<double> dis(0, 1); 48size_t n = 128; 49std::vector<double> v(n); 50for (size_t i = 0; i < n; ++i) { 51 v[i] = dis(gen); 52} 53 54auto ecdf = empirical_cumulative_distribution_function(std::move(v)); 55std::cout << "ecdf(0.0) = " << ecdf(0.0) << "\n"; 56// should print approximately 0.5 . . . 57``` 58 59The empirical distribution function requires sorted data. 60By default, the constructor sorts it for you at O(Nlog(N)) cost. 61 62If your data is already sorted, you can specify this and the constructor simply moves your data into the class: 63 64``` 65std::sort(v.begin(), v.end()); 66auto ecdf = empirical_cumulative_distribution_function(std::move(v), /* already sorted = */ true); 67``` 68 69If you want your data back after being done with the object, use 70 71``` 72v = ecdf.return_data(); 73``` 74 75This operation invalidates `ecdf`; it can no longer be used. 76 77The call operator complexity is O(log(N)), as it requires a call to `std::upper_bound`. 78 79Works with both integer and floating point types. 80If the input data consists of integers, the output of the call operator is a double. Requires C++17. 81 82[$../graphs/empiricial_cumulative_distribution_gauss.svg] 83 84[$../graphs/empiricial_cumulative_distribution_uniform.svg] 85 86 87[heading Performance] 88 89``` 90------------------------------------------------------ 91Benchmark Time 92------------------------------------------------------ 93ECDFConstructorSorted<double>/8 4.52 ns 94ECDFConstructorSorted<double>/16 5.20 ns 95ECDFConstructorSorted<double>/32 5.22 ns 96ECDFConstructorSorted<double>/64 7.37 ns 97ECDFConstructorSorted<double>/128 7.16 ns 98ECDFConstructorSorted<double>/256 8.97 ns 99ECDFConstructorSorted<double>/512 8.44 ns 100ECDFConstructorSorted<double>/1024 9.07 ns 101ECDFConstructorSorted<double>/2048 11.4 ns 102ECDFConstructorSorted<double>/4096 12.6 ns 103ECDFConstructorSorted<double>/8192 11.4 ns 104ECDFConstructorSorted<double>/16384 16.0 ns 105ECDFConstructorSorted<double>/32768 17.0 ns 106ECDFConstructorSorted<double>/65536 19.5 ns 107ECDFConstructorSorted<double>/131072 15.8 ns 108ECDFConstructorSorted<double>/262144 17.9 ns 109ECDFConstructorSorted<double>/524288 26.7 ns 110ECDFConstructorSorted<double>/1048576 29.5 ns 111ECDFConstructorSorted<double>/2097152 31.8 ns 112ECDFConstructorSorted<double>/4194304 32.8 ns 113ECDFConstructorSorted<double>/8388608 35.4 ns 114ECDFConstructorSorted<double>/16777216 30.4 ns 115ECDFConstructorSorted<double>_BigO 1.27 lgN 116ECDFConstructorSorted<double>_RMS 20 % 117ECDFConstructorUnsorted<double>/8 155 ns 118ECDFConstructorUnsorted<double>/64 2095 ns 119ECDFConstructorUnsorted<double>/512 22212 ns 120ECDFConstructorUnsorted<double>/4096 220821 ns 121ECDFConstructorUnsorted<double>/32768 1996380 ns 122ECDFConstructorUnsorted<double>/262144 18916039 ns 123ECDFConstructorUnsorted<double>/2097152 194250013 ns 124ECDFConstructorUnsorted<double>/16777216 2281469424 ns 125ECDFConstructorUnsorted<double>_BigO 5.65 NlgN 126ECDFConstructorUnsorted<double>_RMS 6 % 127Shuffle<double>/8 82.4 ns 128Shuffle<double>/64 731 ns 129Shuffle<double>/512 5876 ns 130Shuffle<double>/4096 46864 ns 131Shuffle<double>/32768 385265 ns 132Shuffle<double>/262144 4663866 ns 133Shuffle<double>/2097152 54686332 ns 134Shuffle<double>/16777216 875309099 ns 135Shuffle<double>_BigO 2.16 NlgN 136Shuffle<double>_RMS 12 % 137ECDFEvaluation<double>/8 48.6 ns 138ECDFEvaluation<double>/64 61.3 ns 139ECDFEvaluation<double>/512 85.1 ns 140ECDFEvaluation<double>/4096 105 ns 141ECDFEvaluation<double>/32768 131 ns 142ECDFEvaluation<double>/262144 196 ns 143ECDFEvaluation<double>/2097152 391 ns 144ECDFEvaluation<double>/16777216 715 ns 145ECDFEvaluation<double>_BigO 18.19 lgN 146ECDFEvaluation<double>_RMS 60 % 147``` 148 149The call to the unsorted constructor is in fact a little faster than indicated, as the data must be shuffled after being sorted in the benchmark. 150This is itself a fairly expensive operation. 151 152[endsect] 153[/section:empirical_cdf] 154