1[section:geometric_dist Geometric Distribution] 2 3``#include <boost/math/distributions/geometric.hpp>`` 4 5 namespace boost{ namespace math{ 6 7 template <class RealType = double, 8 class ``__Policy`` = ``__policy_class`` > 9 class geometric_distribution; 10 11 typedef geometric_distribution<> geometric; 12 13 template <class RealType, class ``__Policy``> 14 class geometric_distribution 15 { 16 public: 17 typedef RealType value_type; 18 typedef Policy policy_type; 19 // Constructor from success_fraction: 20 geometric_distribution(RealType p); 21 22 // Parameter accessors: 23 RealType success_fraction() const; 24 RealType successes() const; 25 26 // Bounds on success fraction: 27 static RealType find_lower_bound_on_p( 28 RealType trials, 29 RealType successes, 30 RealType probability); // alpha 31 static RealType find_upper_bound_on_p( 32 RealType trials, 33 RealType successes, 34 RealType probability); // alpha 35 36 // Estimate min/max number of trials: 37 static RealType find_minimum_number_of_trials( 38 RealType k, // Number of failures. 39 RealType p, // Success fraction. 40 RealType probability); // Probability threshold alpha. 41 static RealType find_maximum_number_of_trials( 42 RealType k, // Number of failures. 43 RealType p, // Success fraction. 44 RealType probability); // Probability threshold alpha. 45 }; 46 47 }} // namespaces 48 49The class type `geometric_distribution` represents a 50[@http://en.wikipedia.org/wiki/geometric_distribution geometric distribution]: 51it is used when there are exactly two mutually exclusive outcomes of a 52[@http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trial]: 53these outcomes are labelled "success" and "failure". 54 55For [@http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials] 56each with success fraction /p/, the geometric distribution gives 57the probability of observing /k/ trials (failures, events, occurrences, or arrivals) 58before the first success. 59 60[note For this implementation, the set of trials *includes zero* 61(unlike another definition where the set of trials starts at one, sometimes named /shifted/).] 62The geometric distribution assumes that success_fraction /p/ is fixed for all /k/ trials. 63 64The probability that there are /k/ failures before the first success 65 66[expression Pr(Y=/k/) = (1-/p/)[super /k/] /p/] 67 68For example, when throwing a 6-face dice the success probability /p/ = 1/6 = 0.1666[recur]. 69Throwing repeatedly until a /three/ appears, 70the probability distribution of the number of times /not-a-three/ is thrown is geometric. 71 72Geometric distribution has the Probability Density Function PDF: 73 74[expression (1-/p/)[super /k/] /p/] 75 76The following graph illustrates how the PDF and CDF vary for three examples 77of the success fraction /p/, 78(when considering the geometric distribution as a continuous function), 79 80[graph geometric_pdf_2] 81 82[graph geometric_cdf_2] 83 84and as discrete. 85 86[graph geometric_pdf_discrete] 87 88[graph geometric_cdf_discrete] 89 90 91[h4 Related Distributions] 92 93The geometric distribution is a special case of 94the __negative_binomial_distrib with successes parameter /r/ = 1, 95so only one first and only success is required : thus by definition 96__spaces `geometric(p) == negative_binomial(1, p)` 97 98 negative_binomial_distribution(RealType r, RealType success_fraction); 99 negative_binomial nb(1, success_fraction); 100 geometric g(success_fraction); 101 ASSERT(pdf(nb, 1) == pdf(g, 1)); 102 103This implementation uses real numbers for the computation throughout 104(because it uses the *real-valued* power and exponential functions). 105So to obtain a conventional strictly-discrete geometric distribution 106you must ensure that an integer value is provided for the number of trials 107(random variable) /k/, 108and take integer values (floor or ceil functions) from functions that return 109a number of successes. 110 111[discrete_quantile_warning geometric] 112 113[h4 Member Functions] 114 115[h5 Constructor] 116 117 geometric_distribution(RealType p); 118 119Constructor: /p/ or success_fraction is the probability of success of a single trial. 120 121Requires: `0 <= p <= 1`. 122 123[h5 Accessors] 124 125 RealType success_fraction() const; // successes / trials (0 <= p <= 1) 126 127Returns the success_fraction parameter /p/ from which this distribution was constructed. 128 129 RealType successes() const; // required successes always one, 130 // included for compatibility with negative binomial distribution 131 // with successes r == 1. 132 133Returns unity. 134 135The following functions are equivalent to those provided for the negative binomial, 136with successes = 1, but are provided here for completeness. 137 138The best method of calculation for the following functions is disputed: 139see __binomial_distrib and __negative_binomial_distrib for more discussion. 140 141[h5 Lower Bound on success_fraction Parameter ['p]] 142 143 static RealType find_lower_bound_on_p( 144 RealType failures, 145 RealType probability) // (0 <= alpha <= 1), 0.05 equivalent to 95% confidence. 146 147Returns a *lower bound* on the success fraction: 148 149[variablelist 150[[failures][The total number of failures before the 1st success.]] 151[[alpha][The largest acceptable probability that the true value of 152 the success fraction is [*less than] the value returned.]] 153] 154 155For example, if you observe /k/ failures from /n/ trials 156the best estimate for the success fraction is simply 1/['n], but if you 157want to be 95% sure that the true value is [*greater than] some value, 158['p[sub min]], then: 159 160 p``[sub min]`` = geometric_distribution<RealType>:: 161 find_lower_bound_on_p(failures, 0.05); 162 163[link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_conf See negative_binomial confidence interval example.] 164 165This function uses the Clopper-Pearson method of computing the lower bound on the 166success fraction, whilst many texts refer to this method as giving an "exact" 167result in practice it produces an interval that guarantees ['at least] the 168coverage required, and may produce pessimistic estimates for some combinations 169of /failures/ and /successes/. See: 170 171[@http://www.ucs.louisiana.edu/~kxk4695/Discrete_new.pdf 172Yong Cai and K. Krishnamoorthy, A Simple Improved Inferential Method for Some Discrete Distributions. 173Computational statistics and data analysis, 2005, vol. 48, no3, 605-621]. 174 175[h5 Upper Bound on success_fraction Parameter p] 176 177 static RealType find_upper_bound_on_p( 178 RealType trials, 179 RealType alpha); // (0 <= alpha <= 1), 0.05 equivalent to 95% confidence. 180 181Returns an *upper bound* on the success fraction: 182 183[variablelist 184[[trials][The total number of trials conducted.]] 185[[alpha][The largest acceptable probability that the true value of 186 the success fraction is [*greater than] the value returned.]] 187] 188 189For example, if you observe /k/ successes from /n/ trials the 190best estimate for the success fraction is simply ['k/n], but if you 191want to be 95% sure that the true value is [*less than] some value, 192['p[sub max]], then: 193 194 p``[sub max]`` = geometric_distribution<RealType>::find_upper_bound_on_p( 195 k, 0.05); 196 197[link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_conf See negative binomial confidence interval example.] 198 199This function uses the Clopper-Pearson method of computing the lower bound on the 200success fraction, whilst many texts refer to this method as giving an "exact" 201result in practice it produces an interval that guarantees ['at least] the 202coverage required, and may produce pessimistic estimates for some combinations 203of /failures/ and /successes/. See: 204 205[@http://www.ucs.louisiana.edu/~kxk4695/Discrete_new.pdf 206Yong Cai and K. Krishnamoorthy, A Simple Improved Inferential Method for Some Discrete Distributions. 207Computational statistics and data analysis, 2005, vol. 48, no3, 605-621]. 208 209[h5 Estimating Number of Trials to Ensure at Least a Certain Number of Failures] 210 211 static RealType find_minimum_number_of_trials( 212 RealType k, // number of failures. 213 RealType p, // success fraction. 214 RealType alpha); // probability threshold (0.05 equivalent to 95%). 215 216This functions estimates the number of trials required to achieve a certain 217probability that [*more than ['k] failures will be observed]. 218 219[variablelist 220[[k][The target number of failures to be observed.]] 221[[p][The probability of ['success] for each trial.]] 222[[alpha][The maximum acceptable ['risk] that only ['k] failures or fewer will be observed.]] 223] 224 225For example: 226 227 geometric_distribution<RealType>::find_minimum_number_of_trials(10, 0.5, 0.05); 228 229Returns the smallest number of trials we must conduct to be 95% (1-0.05) sure 230of seeing 10 failures that occur with frequency one half. 231 232[link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_size_eg Worked Example.] 233 234This function uses numeric inversion of the geometric distribution 235to obtain the result: another interpretation of the result is that it finds 236the number of trials (failures) that will lead to an /alpha/ probability 237of observing /k/ failures or fewer. 238 239[h5 Estimating Number of Trials to Ensure a Maximum Number of Failures or Less] 240 241 static RealType find_maximum_number_of_trials( 242 RealType k, // number of failures. 243 RealType p, // success fraction. 244 RealType alpha); // probability threshold (0.05 equivalent to 95%). 245 246This functions estimates the maximum number of trials we can conduct and achieve 247a certain probability that [*k failures or fewer will be observed]. 248 249[variablelist 250[[k][The maximum number of failures to be observed.]] 251[[p][The probability of ['success] for each trial.]] 252[[alpha][The maximum acceptable ['risk] that more than ['k] failures will be observed.]] 253] 254 255For example: 256 257 geometric_distribution<RealType>::find_maximum_number_of_trials(0, 1.0-1.0/1000000, 0.05); 258 259Returns the largest number of trials we can conduct and still be 95% sure 260of seeing no failures that occur with frequency one in one million. 261 262This function uses numeric inversion of the geometric distribution 263to obtain the result: another interpretation of the result, is that it finds 264the number of trials that will lead to an /alpha/ probability 265of observing more than k failures. 266 267[h4 Non-member Accessors] 268 269All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions] 270that are generic to all distributions are supported: __usual_accessors. 271 272However it's worth taking a moment to define what these actually mean in 273the context of this distribution: 274 275[table Meaning of the non-member accessors. 276[[Function][Meaning]] 277[[__pdf] 278 [The probability of obtaining [*exactly k failures] from /k/ trials 279 with success fraction p. For example: 280 281``pdf(geometric(p), k)``]] 282[[__cdf] 283 [The probability of obtaining [*k failures or fewer] from /k/ trials 284 with success fraction p and success on the last trial. For example: 285 286``cdf(geometric(p), k)``]] 287[[__ccdf] 288 [The probability of obtaining [*more than k failures] from /k/ trials 289 with success fraction p and success on the last trial. For example: 290 291``cdf(complement(geometric(p), k))``]] 292[[__quantile] 293 [The [*greatest] number of failures /k/ expected to be observed from /k/ trials 294 with success fraction /p/, at probability /P/. Note that the value returned 295 is a real-number, and not an integer. Depending on the use case you may 296 want to take either the floor or ceiling of the real result. For example: 297``quantile(geometric(p), P)``]] 298[[__quantile_c] 299 [The [*smallest] number of failures /k/ expected to be observed from /k/ trials 300 with success fraction /p/, at probability /P/. Note that the value returned 301 is a real-number, and not an integer. Depending on the use case you may 302 want to take either the floor or ceiling of the real result. For example: 303 ``quantile(complement(geometric(p), P))``]] 304] 305 306[h4 Accuracy] 307 308This distribution is implemented using the pow and exp functions, so most results 309are accurate within a few epsilon for the RealType. 310For extreme values of `double` /p/, for example 0.9999999999, 311accuracy can fall significantly, for example to 10 decimal digits (from 16). 312 313[h4 Implementation] 314 315In the following table, /p/ is the probability that any one trial will 316be successful (the success fraction), /k/ is the number of failures, 317/p/ is the probability and /q = 1-p/, 318/x/ is the given probability to estimate 319the expected number of failures using the quantile. 320 321[table 322[[Function][Implementation Notes]] 323[[pdf][pdf = p * pow(q, k)]] 324[[cdf][cdf = 1 - q[super k=1]]] 325[[cdf complement][exp(log1p(-p) * (k+1))]] 326[[quantile][k = log1p(-x) / log1p(-p) -1]] 327[[quantile from the complement][k = log(x) / log1p(-p) -1]] 328[[mean][(1-p)/p]] 329[[variance][(1-p)/p[sup2]]] 330[[mode][0]] 331[[skewness][(2-p)/[sqrt]q]] 332[[kurtosis][9+p[sup2]/q]] 333[[kurtosis excess][6 +p[sup2]/q]] 334[[parameter estimation member functions][See __negative_binomial_distrib]] 335[[`find_lower_bound_on_p`][See __negative_binomial_distrib]] 336[[`find_upper_bound_on_p`][See __negative_binomial_distrib]] 337[[`find_minimum_number_of_trials`][See __negative_binomial_distrib]] 338[[`find_maximum_number_of_trials`][See __negative_binomial_distrib]] 339] 340 341[endsect] [/section:geometric_dist geometric] 342 343[/ geometric.qbk 344 Copyright 2010 John Maddock and Paul A. Bristow. 345 Distributed under the Boost Software License, Version 1.0. 346 (See accompanying file LICENSE_1_0.txt or copy at 347 http://www.boost.org/LICENSE_1_0.txt). 348] 349 350