• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1[/
2Copyright (c) 2019 Nick Thompson
3Use, modification and distribution are subject to the
4Boost Software License, Version 1.0. (See accompanying file
5LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
6]
7
8[section:empirical_cdf Empirical Cumulative Distribution Function]
9
10[heading Synopsis]
11
12```
13#include <boost/math/distributions/empirical_cumulative_distribution_function.hpp>
14
15namespace boost{ namespace math{
16
17template <class RandomAccessContainer>
18class empirical_cumulative_distribution_function
19{
20public:
21    using Real = typename RandomAccessContainer::value_type;
22    empirical_cumulative_distribution_function(RandomAccessContainer && v, bool sorted = false);
23
24    auto operator()(Real t) const;
25
26    RandomAccessContainer&& return_data();
27};
28
29}}
30```
31
32[heading Empirical Cumulative Distribution Function]
33
34The empirical cumulative distribution function is a step function constructed from observed data which converges to the true cumulative distribution function in the limit of infinite data.
35This function is a basic building block of hypothesis testing workflows that attempt to answer the question "does my data come from a given distribution?"
36These tests require computing quadratures over some function of the empirical CDF and the supposed CDF to create a distance measurement, and hence it is occasionally useful to construct a continuous callable from the data.
37
38An example usage is demonstrated below:
39
40```
41#include <vector>
42#include <random>
43#include <boost/math/distributions/empirical_cumulative_distribution_function.hpp>
44using boost::math::empirical_cumulative_distribution_function;
45std::random_device rd;
46std::mt19937 gen{rd()};
47std::normal_distribution<double> dis(0, 1);
48size_t n = 128;
49std::vector<double> v(n);
50for (size_t i = 0; i < n; ++i) {
51  v[i] = dis(gen);
52}
53
54auto ecdf = empirical_cumulative_distribution_function(std::move(v));
55std::cout << "ecdf(0.0) = " << ecdf(0.0) << "\n";
56// should print approximately 0.5 . . .
57```
58
59The empirical distribution function requires sorted data.
60By default, the constructor sorts it for you at O(Nlog(N)) cost.
61
62If your data is already sorted, you can specify this and the constructor simply moves your data into the class:
63
64```
65std::sort(v.begin(), v.end());
66auto ecdf = empirical_cumulative_distribution_function(std::move(v), /* already sorted = */ true);
67```
68
69If you want your data back after being done with the object, use
70
71```
72v = ecdf.return_data();
73```
74
75This operation invalidates `ecdf`; it can no longer be used.
76
77The call operator complexity is O(log(N)), as it requires a call to `std::upper_bound`.
78
79Works with both integer and floating point types.
80If the input data consists of integers, the output of the call operator is a double. Requires C++17.
81
82[$../graphs/empiricial_cumulative_distribution_gauss.svg]
83
84[$../graphs/empiricial_cumulative_distribution_uniform.svg]
85
86
87[heading Performance]
88
89```
90------------------------------------------------------
91Benchmark                                         Time
92------------------------------------------------------
93ECDFConstructorSorted<double>/8                4.52 ns
94ECDFConstructorSorted<double>/16               5.20 ns
95ECDFConstructorSorted<double>/32               5.22 ns
96ECDFConstructorSorted<double>/64               7.37 ns
97ECDFConstructorSorted<double>/128              7.16 ns
98ECDFConstructorSorted<double>/256              8.97 ns
99ECDFConstructorSorted<double>/512              8.44 ns
100ECDFConstructorSorted<double>/1024             9.07 ns
101ECDFConstructorSorted<double>/2048             11.4 ns
102ECDFConstructorSorted<double>/4096             12.6 ns
103ECDFConstructorSorted<double>/8192             11.4 ns
104ECDFConstructorSorted<double>/16384            16.0 ns
105ECDFConstructorSorted<double>/32768            17.0 ns
106ECDFConstructorSorted<double>/65536            19.5 ns
107ECDFConstructorSorted<double>/131072           15.8 ns
108ECDFConstructorSorted<double>/262144           17.9 ns
109ECDFConstructorSorted<double>/524288           26.7 ns
110ECDFConstructorSorted<double>/1048576          29.5 ns
111ECDFConstructorSorted<double>/2097152          31.8 ns
112ECDFConstructorSorted<double>/4194304          32.8 ns
113ECDFConstructorSorted<double>/8388608          35.4 ns
114ECDFConstructorSorted<double>/16777216         30.4 ns
115ECDFConstructorSorted<double>_BigO             1.27 lgN
116ECDFConstructorSorted<double>_RMS                20 %
117ECDFConstructorUnsorted<double>/8               155 ns
118ECDFConstructorUnsorted<double>/64             2095 ns
119ECDFConstructorUnsorted<double>/512           22212 ns
120ECDFConstructorUnsorted<double>/4096         220821 ns
121ECDFConstructorUnsorted<double>/32768       1996380 ns
122ECDFConstructorUnsorted<double>/262144     18916039 ns
123ECDFConstructorUnsorted<double>/2097152   194250013 ns
124ECDFConstructorUnsorted<double>/16777216 2281469424 ns
125ECDFConstructorUnsorted<double>_BigO           5.65 NlgN
126ECDFConstructorUnsorted<double>_RMS               6 %
127Shuffle<double>/8                              82.4 ns
128Shuffle<double>/64                              731 ns
129Shuffle<double>/512                            5876 ns
130Shuffle<double>/4096                          46864 ns
131Shuffle<double>/32768                        385265 ns
132Shuffle<double>/262144                      4663866 ns
133Shuffle<double>/2097152                    54686332 ns
134Shuffle<double>/16777216                  875309099 ns
135Shuffle<double>_BigO                           2.16 NlgN
136Shuffle<double>_RMS                              12 %
137ECDFEvaluation<double>/8                       48.6 ns
138ECDFEvaluation<double>/64                      61.3 ns
139ECDFEvaluation<double>/512                     85.1 ns
140ECDFEvaluation<double>/4096                     105 ns
141ECDFEvaluation<double>/32768                    131 ns
142ECDFEvaluation<double>/262144                   196 ns
143ECDFEvaluation<double>/2097152                  391 ns
144ECDFEvaluation<double>/16777216                 715 ns
145ECDFEvaluation<double>_BigO                   18.19 lgN
146ECDFEvaluation<double>_RMS                       60 %
147```
148
149The call to the unsorted constructor is in fact a little faster than indicated, as the data must be shuffled after being sorted in the benchmark.
150This is itself a fairly expensive operation.
151
152[endsect]
153[/section:empirical_cdf]
154