TECHNOLOGY OVERVIEW

# Confidence Intervals for proportions and parts per million

The following discussion addresses proportions, but it applies equally to parts per million (ppm), because a ppm value is simply a proportion multiplied by 1,000,000.

Proportions are based on an underlying discrete scale, not a continuous one as is the case for the mean of a sample from a Normal distribution.

Consider a sample of size 10 drawn from a Normal population. Ignoring measurement precision, the sample average can take any of an infinite number of values around the expected average, e.g. 10.0000000001 or 10.0100000000.

Now consider a sample of size 10 drawn from a lot of clear marbles that contains some red marbles. The objective of the sampling is to measure the proportion of red marbles. In this instance we have a discrete scale, not a continuous one: each draw can yield only a red or a clear marble. For a sample size of 10 there are only 11 possible outcomes, not an infinite number. These are R0 C10, R1 C9, R2 C8, R3 C7, R4 C6, R5 C5, R6 C4, R7 C3, R8 C2, R9 C1 and R10 C0, where R stands for red marble, C for clear marble, and each number is the count of marbles of that colour. E.g. R1 C9 means one red marble and nine clear marbles.

Hence, only the following proportions of red marbles are possible in a sample of 10: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.
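The marble example can be sketched in a few lines of Python. This is only an illustration: the true proportion `p_true = 0.3` is an assumed value, and the binomial probabilities assume that the lot is large relative to the sample (or that marbles are drawn with replacement).

```python
from math import comb

n = 10  # sample size from the marble example

# Every possible count of red marbles gives exactly one sample proportion.
possible_proportions = [k / n for k in range(n + 1)]
print(possible_proportions)


def binomial_pmf(k, n, p):
    """Probability of exactly k red marbles in n draws when the true proportion is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_true = 0.3  # assumed true proportion of red marbles, for illustration only
for k in range(n + 1):
    prob = binomial_pmf(k, n, p_true)
    print(f"R{k} C{n - k}: proportion {k / n:.1f}, probability {prob:.4f}")
```

Whatever value is assumed for the true proportion, the sample can only ever report one of these 11 proportions.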

Due to the discreteness of the observations, the actual coverage of the confidence interval will differ from the specified coverage. If the analyst specifies 95%, the actual coverage may be only 90%, which can lead to wrong conclusions.

Historically, the Wald interval has been the most popular, and it still appears in most elementary statistics textbooks and in software relying on mainstream statistics.

This interval is calculated as

LL = p - Z(alpha/2) * sqrt(p * (1 - p) / n)

UL = p + Z(alpha/2) * sqrt(p * (1 - p) / n)

where LL and UL are the lower and upper limits, p is the sample proportion, n is the sample size, and Z(alpha/2) is the standard Normal quantile for a confidence level of 1 - alpha (about 1.96 for 95%).
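As a minimal sketch, the Wald limits can be computed with the Python standard library alone. The function name `wald_interval` and the example of 3 red marbles in 10 are chosen here for illustration. The limits are not truncated to the range 0 to 1, since the formula itself does not do so.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Return (LL, UL) of the Wald interval for a proportion."""
    p_hat = successes / n
    # Z(alpha/2): standard Normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Illustrative data: 3 red marbles in a sample of 10.
lower, upper = wald_interval(3, 10)
print(f"proportion: {lower:.4f} to {upper:.4f}")
# The same limits in parts per million.
print(f"ppm: {lower * 1e6:.0f} to {upper * 1e6:.0f}")

# With no red marbles the interval collapses to a single point at 0.
print(wald_interval(0, 10))
```

The last call shows one known weakness: when the sample contains no red marbles (or only red marbles), the estimated standard error is zero and the interval has no width.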

It works reasonably well for large sample sizes, and it can be modified to improve its behaviour, but in general the actual probability that the interval contains the population proportion is less than the specified confidence level.
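The shortfall in coverage can be checked directly, because a sample of size n has only n + 1 possible outcomes. The sketch below sums the binomial probability of every outcome whose Wald interval contains the population proportion. The helper names and the chosen values of the population proportion are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Return (LL, UL) of the Wald interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wald_coverage(p, n, confidence=0.95):
    """Probability that the Wald interval from a Binomial(n, p) sample contains p."""
    coverage = 0.0
    for k in range(n + 1):
        lower, upper = wald_interval(k, n, confidence)
        if lower <= p <= upper:
            # Add the chance of observing exactly k successes.
            coverage += comb(n, k) * p**k * (1 - p) ** (n - k)
    return coverage


# Illustrative population proportions and sample sizes.
for n in (10, 100):
    for p in (0.05, 0.1, 0.3, 0.5):
        print(f"n={n:3d}  p={p:.2f}  actual coverage={wald_coverage(p, n):.3f}")
```

For a sample of 10 and a population proportion of 0.1, this calculation gives an actual coverage of roughly 65% against a specified 95%.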

More recently, modern computing power has made ‘exact’ methods, such as the Clopper–Pearson interval, a practical alternative. However, the term exact is misleading. The method is exact in the sense that it is based on the cumulative probabilities of the binomial distribution (i.e. the correct distribution, not an approximation). The discontinuous nature of the binomial distribution still prevents any interval from having exact coverage for all population proportions.
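One way to sketch the Clopper–Pearson limits without a statistics library is to solve the two binomial tail equations by bisection. This is a minimal illustration, not necessarily how any particular software package implements the method; the function names are chosen here for the example.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, tol=1e-12):
    """Find p in [0, 1] with f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, confidence=0.95):
    """Return (LL, UL) of the Clopper-Pearson interval for a proportion."""
    alpha = 1 - confidence
    k = successes
    # Lower limit: the p for which P(X >= k) equals alpha/2 (0 when k is 0).
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: the p for which P(X <= k) equals alpha/2 (1 when k is n).
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Illustrative data: 3 red marbles in a sample of 10.
print(clopper_pearson(3, 10))
# No red marbles: the interval still has a non-zero upper limit.
print(clopper_pearson(0, 10))
```

Unlike the Wald formula, this approach still returns an interval with positive width when the sample contains no red marbles.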

The free online BIS.Net Analyst version offers both the exact and the Wald methods for calculating confidence intervals on a proportion.

Use the BIS.Net Inferences App for machine-powered technology to obtain more reliable results.

## Analytics as a Service (AaaS) for Quality

Drive quality improvement through actionable insights using analytics you can trust! Use up to 200 analytics tools downloadable through a suite of Apps!

• Augmented with machine-powered smarts
• Always updated with the latest tools and features
• No licensing or fixed subscriptions - Pay ONLY for the analysis you run, from 20 US cents per analysis, billed monthly! Set a budget so you don't exceed it!