1[section:binomial_dist Binomial Distribution] 2 3``#include <boost/math/distributions/binomial.hpp>`` 4 5 namespace boost{ namespace math{ 6 7 template <class RealType = double, 8 class ``__Policy`` = ``__policy_class`` > 9 class binomial_distribution; 10 11 typedef binomial_distribution<> binomial; 12 13 template <class RealType, class ``__Policy``> 14 class binomial_distribution 15 { 16 public: 17 typedef RealType value_type; 18 typedef Policy policy_type; 19 20 static const ``['unspecified-type]`` clopper_pearson_exact_interval; 21 static const ``['unspecified-type]`` jeffreys_prior_interval; 22 23 // construct: 24 binomial_distribution(RealType n, RealType p); 25 26 // parameter access:: 27 RealType success_fraction() const; 28 RealType trials() const; 29 30 // Bounds on success fraction: 31 static RealType find_lower_bound_on_p( 32 RealType trials, 33 RealType successes, 34 RealType probability, 35 ``['unspecified-type]`` method = clopper_pearson_exact_interval); 36 static RealType find_upper_bound_on_p( 37 RealType trials, 38 RealType successes, 39 RealType probability, 40 ``['unspecified-type]`` method = clopper_pearson_exact_interval); 41 42 // estimate min/max number of trials: 43 static RealType find_minimum_number_of_trials( 44 RealType k, // number of events 45 RealType p, // success fraction 46 RealType alpha); // risk level 47 48 static RealType find_maximum_number_of_trials( 49 RealType k, // number of events 50 RealType p, // success fraction 51 RealType alpha); // risk level 52 }; 53 54 }} // namespaces 55 56The class type `binomial_distribution` represents a 57[@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]: 58it is used when there are exactly two mutually 59exclusive outcomes of a trial. These outcomes are labelled 60"success" and "failure". The 61__binomial_distrib is used to obtain 62the probability of observing k successes in N trials, with the 63probability of success on a single trial denoted by p. The 64binomial distribution assumes that p is fixed for all trials. 65 66[note The random variable for the binomial distribution is the number of successes, 67(the number of trials is a fixed property of the distribution) 68whereas for the negative binomial, 69the random variable is the number of trials, for a fixed number of successes.] 70 71The PDF for the binomial distribution is given by: 72 73[equation binomial_ref2] 74 75The following two graphs illustrate how the PDF changes depending 76upon the distributions parameters, first we'll keep the success 77fraction /p/ fixed at 0.5, and vary the sample size: 78 79[graph binomial_pdf_1] 80 81Alternatively, we can keep the sample size fixed at N=20 and 82vary the success fraction /p/: 83 84[graph binomial_pdf_2] 85 86[discrete_quantile_warning Binomial] 87 88[h4 Member Functions] 89 90[h5 Construct] 91 92 binomial_distribution(RealType n, RealType p); 93 94Constructor: /n/ is the total number of trials, /p/ is the 95probability of success of a single trial. 96 97Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error. 98 99[h5 Accessors] 100 101 RealType success_fraction() const; 102 103Returns the parameter /p/ from which this distribution was constructed. 104 105 RealType trials() const; 106 107Returns the parameter /n/ from which this distribution was constructed. 108 109[h5 Lower Bound on the Success Fraction] 110 111 static RealType find_lower_bound_on_p( 112 RealType trials, 113 RealType successes, 114 RealType alpha, 115 ``['unspecified-type]`` method = clopper_pearson_exact_interval); 116 117Returns a lower bound on the success fraction: 118 119[variablelist 120[[trials][The total number of trials conducted.]] 121[[successes][The number of successes that occurred.]] 122[[alpha][The largest acceptable probability that the true value of 123 the success fraction is [*less than] the value returned.]] 124[[method][An optional parameter that specifies the method to be used 125 to compute the interval (See below).]] 126] 127 128For example, if you observe /k/ successes from /n/ trials the 129best estimate for the success fraction is simply ['k/n], but if you 130want to be 95% sure that the true value is [*greater than] some value, 131['p[sub min]], then: 132 133 p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p(n, k, 0.05); 134 135[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.] 136 137There are currently two possible values available for the /method/ 138optional parameter: /clopper_pearson_exact_interval/ 139or /jeffreys_prior_interval/. These constants are both members of 140class template `binomial_distribution`, so usage is for example: 141 142 p = binomial_distribution<RealType>::find_lower_bound_on_p( 143 n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval); 144 145The default method if this parameter is not specified is the Clopper Pearson 146"exact" interval. This produces an interval that guarantees at least 147`100(1-alpha)%` coverage, but which is known to be overly conservative, 148sometimes producing intervals with much greater than the requested coverage. 149 150The alternative calculation method produces a non-informative 151Jeffreys Prior interval. It produces `100(1-alpha)%` coverage only 152['in the average case], though is typically very close to the requested 153coverage level. It is one of the main methods of calculation recommended 154in the review by Brown, Cai and DasGupta. 155 156Please note that the "textbook" calculation method using 157a normal approximation (the Wald interval) is deliberately 158not provided: it is known to produce consistently poor results, 159even when the sample size is surprisingly large. 160Refer to Brown, Cai and DasGupta for a full explanation. Many other methods 161of calculation are available, and may be more appropriate for specific 162situations. Unfortunately there appears to be no consensus amongst 163statisticians as to which is "best": refer to the discussion at the end of 164Brown, Cai and DasGupta for examples. 165 166The two methods provided here were chosen principally because they 167can be used for both one and two sided intervals. 168See also: 169 170Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001), 171Interval Estimation for a Binomial Proportion, 172Statistical Science, Vol. 16, No. 2, 101-133. 173 174T. Tony Cai (2005), 175One-sided confidence intervals in discrete distributions, 176Journal of Statistical Planning and Inference 131, 63-88. 177 178Agresti, A. and Coull, B. A. (1998). Approximate is better than 179"exact" for interval estimation of binomial proportions. Amer. 180Statist. 52 119-126. 181 182Clopper, C. J. and Pearson, E. S. (1934). The use of confidence 183or fiducial limits illustrated in the case of the binomial. 184Biometrika 26 404-413. 185 186[h5 Upper Bound on the Success Fraction] 187 188 static RealType find_upper_bound_on_p( 189 RealType trials, 190 RealType successes, 191 RealType alpha, 192 ``['unspecified-type]`` method = clopper_pearson_exact_interval); 193 194Returns an upper bound on the success fraction: 195 196[variablelist 197[[trials][The total number of trials conducted.]] 198[[successes][The number of successes that occurred.]] 199[[alpha][The largest acceptable probability that the true value of 200 the success fraction is [*greater than] the value returned.]] 201[[method][An optional parameter that specifies the method to be used 202 to compute the interval. Refer to the documentation for 203 `find_upper_bound_on_p` above for the meaning of the 204 method options.]] 205] 206 207For example, if you observe /k/ successes from /n/ trials the 208best estimate for the success fraction is simply ['k/n], but if you 209want to be 95% sure that the true value is [*less than] some value, 210['p[sub max]], then: 211 212 p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p(n, k, 0.05); 213 214[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.] 215 216[note 217In order to obtain a two sided bound on the success fraction, you 218call both `find_lower_bound_on_p` *and* `find_upper_bound_on_p` 219each with the same arguments. 220 221If the desired risk level 222that the true success fraction lies outside the bounds is [alpha], 223then you pass [alpha]/2 to these functions. 224 225So for example a two sided 95% confidence interval would be obtained 226by passing [alpha] = 0.025 to each of the functions. 227 228[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.] 229] 230 231 232[h5 Estimating the Number of Trials Required for a Certain Number of Successes] 233 234 static RealType find_minimum_number_of_trials( 235 RealType k, // number of events 236 RealType p, // success fraction 237 RealType alpha); // probability threshold 238 239This function estimates the minimum number of trials required to ensure that 240more than k events is observed with a level of risk /alpha/ that k or 241fewer events occur. 242 243[variablelist 244[[k][The number of success observed.]] 245[[p][The probability of success for each trial.]] 246[[alpha][The maximum acceptable probability that k events or fewer will be observed.]] 247] 248 249For example: 250 251 binomial_distribution<RealType>::find_number_of_trials(10, 0.5, 0.05); 252 253Returns the smallest number of trials we must conduct to be 95% sure 254of seeing 10 events that occur with frequency one half. 255 256[h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes] 257 258 static RealType find_maximum_number_of_trials( 259 RealType k, // number of events 260 RealType p, // success fraction 261 RealType alpha); // probability threshold 262 263This function estimates the maximum number of trials we can conduct 264to ensure that k successes or fewer are observed, with a risk /alpha/ 265that more than k occur. 266 267[variablelist 268[[k][The number of success observed.]] 269[[p][The probability of success for each trial.]] 270[[alpha][The maximum acceptable probability that more than k events will be observed.]] 271] 272 273For example: 274 275 binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05); 276 277Returns the largest number of trials we can conduct and still be 95% certain 278of not observing any events that occur with one in a million frequency. 279This is typically used in failure analysis. 280 281[link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.] 282 283[h4 Non-member Accessors] 284 285All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions] 286that are generic to all distributions are supported: __usual_accessors. 287 288The domain for the random variable /k/ is `0 <= k <= N`, otherwise a 289__domain_error is returned. 290 291It's worth taking a moment to define what these accessors actually mean in 292the context of this distribution: 293 294[table Meaning of the non-member accessors 295[[Function][Meaning]] 296[[__pdf] 297 [The probability of obtaining [*exactly k successes] from n trials 298 with success fraction p. For example: 299 300`pdf(binomial(n, p), k)`]] 301[[__cdf] 302 [The probability of obtaining [*k successes or fewer] from n trials 303 with success fraction p. For example: 304 305`cdf(binomial(n, p), k)`]] 306[[__ccdf] 307 [The probability of obtaining [*more than k successes] from n trials 308 with success fraction p. For example: 309 310`cdf(complement(binomial(n, p), k))`]] 311[[__quantile] 312 [Given a binomial distribution with ['n] trials, success fraction ['p] and probability ['P], 313 finds the largest number of successes ['k] whose CDF is less than ['P]. 314 It is strongly recommended that you read the tutorial 315 [link math_toolkit.pol_tutorial.understand_dis_quant Understanding Quantiles of Discrete Distributions] before 316 using the quantile function.]] 317[[__quantile_c] 318 [Given a binomial distribution with ['n] trials, success fraction ['p] and probability ['Q], 319 finds the smallest number of successes ['k] whose CDF is greater than ['1-Q]. 320 It is strongly recommended that you read the tutorial 321 [link math_toolkit.pol_tutorial.understand_dis_quant Understanding Quantiles of Discrete Distributions] before 322 using the quantile function.]] 323] 324 325[h4 Examples] 326 327Various [link math_toolkit.stat_tut.weg.binom_eg worked examples] 328are available illustrating the use of the binomial distribution. 329 330[h4 Accuracy] 331 332This distribution is implemented using the 333incomplete beta functions __ibeta and __ibetac, 334please refer to these functions for information on accuracy. 335 336[h4 Implementation] 337 338In the following table /p/ is the probability that one trial will 339be successful (the success fraction), /n/ is the number of trials, 340/k/ is the number of successes, /p/ is the probability and /q = 1-p/. 341 342[table 343[[Function][Implementation Notes]] 344[[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k ] is the binomial 345 coefficient of a and b, then we have: 346 347[equation binomial_ref1] 348 349Which can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)` 350 351The function __ibeta_derivative is used here, since it has already 352 been optimised for the lowest possible error - indeed this is really 353 just a thin wrapper around part of the internals of the incomplete 354 beta function. 355 356There are also various special cases: refer to the code for details. 357 ]] 358[[cdf][Using the relation: 359 360`` 361p = I[sub 1-p](n - k, k + 1) 362 = 1 - I[sub p](k + 1, n - k) 363 = __ibetac(k + 1, n - k, p)`` 364 365There are also various special cases: refer to the code for details. 366]] 367[[cdf complement][Using the relation: q = __ibeta(k + 1, n - k, p) 368 369There are also various special cases: refer to the code for details. ]] 370[[quantile][Since the cdf is non-linear in variate /k/ none of the inverse 371 incomplete beta functions can be used here. Instead the quantile 372 is found numerically using a derivative free method 373 (__root_finding_TOMS748).]] 374[[quantile from the complement][Found numerically as above.]] 375[[mean][ `p * n` ]] 376[[variance][ `p * n * (1-p)` ]] 377[[mode][`floor(p * (n + 1))`]] 378[[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]] 379[[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]] 380[[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]] 381[[parameter estimation][The member functions `find_upper_bound_on_p` 382 `find_lower_bound_on_p` and `find_number_of_trials` are 383 implemented in terms of the inverse incomplete beta functions 384 __ibetac_inv, __ibeta_inv, and __ibetac_invb respectively]] 385] 386 387[h4 References] 388 389* [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource]. 390* [@http://en.wikipedia.org/wiki/Beta_distribution Wikipedia binomial distribution]. 391* [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Exploratory Data Analysis]. 392 393[endsect] [/section:binomial_dist Binomial] 394 395[/ binomial.qbk 396 Copyright 2006 John Maddock and Paul A. Bristow. 397 Distributed under the Boost Software License, Version 1.0. 398 (See accompanying file LICENSE_1_0.txt or copy at 399 http://www.boost.org/LICENSE_1_0.txt). 400] 401