Confidence Intervals for proportions and parts per million

The BIS.Net Team

The following discussion is directed at proportions but applies equally to parts per million (ppm), since a ppm value is simply a proportion multiplied by 1,000,000 (e.g. a proportion of 0.0005 is 500 ppm).

Proportions are based on an underlying discrete scale.

Consider a sample of size 10 drawn from a Normal population. Ignoring measurement precision, the sample average can take an infinite number of possible values around the expected average, e.g. 10.0000000001 or 10.0100000000.

Now consider a sample of size 10 drawn from a lot of clear marbles that contains some red marbles. The objective of the sampling is to measure the proportion of red marbles. In this instance the scale is discrete, not continuous: each draw can yield only a red or a clear marble. For a sample size of 10 there are therefore only 11 possible outcomes, not an infinite number: R0 C10, R1 C9, R2 C8, R3 C7, R4 C6, R5 C5, R6 C4, R7 C3, R8 C2, R9 C1 and R10 C0, where R stands for red marble, C for clear marble, and the numbers give how many of each were obtained. For example, R1 C9 means one red marble and nine clear marbles.

Hence only the following proportions of red marbles are possible in a sample of 10: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.
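
The 11 outcomes and their probabilities can be enumerated directly. The minimal Python sketch below assumes the binomial model (each draw is red with a fixed probability p, independently of the others, as when sampling with replacement or from a very large lot) and uses a hypothetical lot with 30% red marbles; the function name `outcome_distribution` is illustrative only.

```python
from math import comb

def outcome_distribution(n: int, p: float) -> list[tuple[int, float, float]]:
    """Return (red count, sample proportion, probability) for every possible outcome.

    Assumes the binomial model: each of the n draws is red with probability p,
    independently of the other draws.
    """
    return [(x, x / n, comb(n, x) * p**x * (1 - p) ** (n - x)) for x in range(n + 1)]

# Hypothetical lot in which 30% of the marbles are red.
table = outcome_distribution(10, 0.3)
for reds, proportion, prob in table:
    print(f"R{reds} C{10 - reds}: proportion {proportion:.1f}, probability {prob:.4f}")
```

Only these 11 sample proportions can ever be observed, and their probabilities sum to 1.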

Because of this discreteness, the actual coverage of a confidence interval (the probability that it contains the true population proportion) will generally differ from the specified coverage. If the analyst specifies 95%, the actual coverage may be only 90%, leading to wrong conclusions.

Historically, the Wald interval has been the most popular choice, and it is still found in most elementary statistics textbooks and in software relying on mainstream statistics.

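This interval is calculated as follows, where Z(alpha/2) is the value of the standard normal distribution with an upper-tail probability of alpha/2 (about 1.96 for a 95% interval):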

LL= sample proportion - Z(alpha/2)*Sqrt((sample proportion)*(1-sample proportion)/sample size)

UL= sample proportion + Z(alpha/2)*Sqrt((sample proportion)*(1-sample proportion)/sample size)

It works reasonably well for large sample sizes and can be modified to give some improvement, but in general the actual probability that the interval contains the population proportion is less than the specified confidence level.
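
Because there are only n + 1 possible outcomes, the actual coverage of any interval method can be computed exactly rather than simulated: add up the binomial probabilities of the outcomes whose interval contains the true proportion. The Python sketch below does this for the Wald formula above. `wald_interval` and `actual_coverage` are hypothetical helper names, and the bounds are deliberately not clipped to [0, 1] so that they match the formula as written.

```python
from math import comb, sqrt
from statistics import NormalDist

def wald_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wald interval: p_hat -/+ Z(alpha/2) * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def actual_coverage(interval, n: int, p: float, alpha: float = 0.05) -> float:
    """Exact probability that `interval` contains the true proportion p.

    Sums the binomial probability of every outcome x whose interval covers p,
    so no simulation is needed.
    """
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n, alpha)
        if lower <= p <= upper:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

for p in (0.05, 0.1, 0.3, 0.5):
    coverage = actual_coverage(wald_interval, 10, p)
    print(f"n=10, p={p}: actual coverage of the nominal 95% Wald interval = {coverage:.3f}")
```

For n = 10 and a true proportion of 0.1, the nominal 95% Wald interval contains the truth only about 65% of the time, largely because an outcome of zero red marbles gives the zero-width interval [0, 0].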

With modern computing power, so-called ‘exact’ methods such as the Clopper–Pearson interval have become a popular alternative. However, the term ‘exact’ is misleading. The method is exact only in the sense that it is based on the cumulative probabilities of the binomial distribution (i.e. the correct distribution rather than an approximation). The discontinuous nature of the binomial distribution prevents any interval from having exact coverage for all population proportions.
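
The Clopper–Pearson limits are the proportions at which the binomial tail probabilities equal alpha/2. The sketch below assumes only the Python standard library and finds those proportions by simple bisection; this is a swapped-in numerical choice, since the same limits can also be computed from beta-distribution quantiles. The helper names are hypothetical. The interval is conservative: its actual coverage never falls below the nominal level, at the cost of being wider than necessary.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Find a root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    # Lower limit: the p at which P(X >= x) = alpha/2 (0 when x = 0).
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: the p at which P(X <= x) = alpha/2 (1 when x = n).
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

print(clopper_pearson(2, 10))
```

For 2 red marbles in 10 this gives roughly (0.025, 0.556).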

According to our own simulations and those performed by others, both the Wilson score method and the likelihood ratio-based method provide much better coverage.
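
The Wilson score interval has a closed form: it is the set of proportions p for which (p_hat - p)^2 / (p(1 - p)/n) does not exceed Z(alpha/2)^2, where p_hat is the sample proportion. A minimal Python sketch of that closed form, with a hypothetical function name, is:

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wilson score interval for x successes (e.g. red marbles) in n trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p_hat = x / n
    denom = 1 + z**2 / n
    # The centre is pulled from p_hat towards 0.5, more strongly for small n.
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width

print(wilson_interval(2, 10))
print(wilson_interval(0, 10))
```

For 2 red marbles in 10 it gives roughly (0.057, 0.510). Unlike the Wald interval, it stays within [0, 1] and does not collapse to zero width when no red marbles are observed.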


The likelihood ratio-based method is more complex to solve. Its limits are the values of the population proportion for which the likelihood-ratio statistic, computed from the sample proportion Pi^ and the sample size n, equals the chi-square value for one degree of freedom at the chosen level of significance (e.g. 0.05). Finding these values requires an iterative method.
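
Written out in its standard form, assuming the usual binomial likelihood and writing π for the unknown population proportion (our notation), the condition is shown below. The lower and upper limits are its two solutions in π, with 0 ln 0 taken as 0; when π̂ is 0 or 1, the corresponding limit is simply 0 or 1. For α = 0.05 the right-hand side is approximately 3.841.

```latex
2n\left[\hat{\pi}\,\ln\frac{\hat{\pi}}{\pi} + (1-\hat{\pi})\,\ln\frac{1-\hat{\pi}}{1-\pi}\right] = \chi^2_{1,\alpha}
```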


Although more complex to solve, the likelihood method has, according to our simulations, provided the best coverage overall.
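
The iterative solution can be sketched in Python with a bisection search on each side of the sample proportion, since the likelihood-ratio statistic decreases towards zero as the candidate proportion approaches the sample proportion and increases away from it. This is an illustrative sketch, not the BIS.Net implementation: the function names are hypothetical, and the chi-square critical value is obtained as the square of the normal quantile, which holds for one degree of freedom.

```python
from math import log
from statistics import NormalDist

def lr_statistic(pi: float, x: int, n: int) -> float:
    """2 * [log-likelihood at pi_hat - log-likelihood at pi] for x successes in n."""
    def term(count: int, observed: float, candidate: float) -> float:
        # count * ln(observed / candidate), using the convention 0 * ln(0) = 0
        return 0.0 if count == 0 else count * log(observed / candidate)

    pi_hat = x / n
    return 2 * (term(x, pi_hat, pi) + term(n - x, 1 - pi_hat, 1 - pi))

def find_crossing(g, inside: float, outside: float, iterations: int = 100) -> float:
    """Bisect between `inside` (where g < 0) and the boundary `outside` (where g > 0).

    The boundary itself is never evaluated, which avoids log(0).
    """
    for _ in range(iterations):
        mid = (inside + outside) / 2
        if g(mid) < 0:
            inside = mid
        else:
            outside = mid
    return (inside + outside) / 2

def likelihood_ratio_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    # Chi-square critical value for 1 degree of freedom = (normal quantile)^2, about 3.841 for alpha = 0.05.
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    pi_hat = x / n

    def g(pi: float) -> float:
        return lr_statistic(pi, x, n) - crit

    lower = 0.0 if x == 0 else find_crossing(g, pi_hat, 0.0)
    upper = 1.0 if x == n else find_crossing(g, pi_hat, 1.0)
    return lower, upper

print(likelihood_ratio_interval(2, 10))
```

The printed pair is the approximate 95% likelihood-ratio interval for 2 red marbles in a sample of 10.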

The BIS.Net Inferences App uses machine power to obtain confidence intervals for proportions with both the Wilson score and the likelihood ratio-based methods. However, the analyst is warned that even these two methods do not guarantee coverage. Although their overall coverage is superior, there are many situations where the coverage is not reliable and can be out by as much as 3%. Because the data are discrete, it is not possible to guarantee an actual coverage equal to the specified coverage.
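
Whether a method guarantees its nominal coverage can be checked exactly for a given sample size. The sketch below, under the same binomial assumptions and with hypothetical helper names, computes the exact coverage of the nominal 95% Wilson interval for n = 10 over a grid of true proportions from 0.01 to 0.99; at some of these proportions the coverage falls below 95%.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95%, about 1.96

def wilson_interval(x: int, n: int, z: float = Z) -> tuple[float, float]:
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width

def actual_coverage(n: int, p: float) -> float:
    """Exact probability that the Wilson interval contains p, summed over all n + 1 outcomes."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = wilson_interval(x, n)
        if lower <= p <= upper:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

n = 10
grid = [i / 100 for i in range(1, 100)]
coverages = {p: actual_coverage(n, p) for p in grid}
worst_p = min(coverages, key=coverages.get)
print(f"lowest coverage for n={n}: {coverages[worst_p]:.3f} at p={worst_p}")
print(f"mean coverage over the grid: {sum(coverages.values()) / len(grid):.3f}")
```

Replacing `wilson_interval` with another method shows how its coverage varies with the true proportion in the same way.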

The BIS.Net Inferences App therefore also includes a ‘Safe’ interval. Its coverage is guaranteed to be above 99%, so the maximum error is 1%, considerably less than that of all the other alternatives proposed by various scholars.

The Safe option is arguably the only workable guaranteed solution, but it requires a change of thinking about the 95% confidence coefficient. There is nothing sacrosanct about the value 95. Over the years it has become entrenched in classrooms and textbooks and is now accepted as being ‘correct’. Historically, 95% was chosen as a reasonable compromise between being exceedingly cautious and being foolish when drawing conclusions. For most applications one can argue that 99% is a more responsible level, especially for medical applications. As a reference point, quality professionals use 99.7% (three-sigma) levels for process control limits, not 95%, because 95% limits would result in too many wrong conclusions.

We recommend using the ‘Safe Value’ as a worst-case benchmark.
