# Sample size for confidence intervals on the proportion (and ppm) using machine-powered algorithms

Knowing the proportion of some occurrence among all possible occurrences is important in countless applications. A politician would like to know what proportion of voters in his electorate will vote for him. A market research manager would like to know what proportion of customers are highly satisfied with after-sales service. A quality assurance manager would like to know what proportion of faulty products are returned under warranty claims. A medical researcher would like to know what proportion of patients reacted favourably to a new pain-relief medication.

In all these applications a sample must be taken, because the population is too large to survey in full. The question is how large the sample needs to be to provide the required precision. Consider a politician who has conducted a mini-survey of 30 people and found that 54% of constituents would vote for him. He would be very foolish to conclude from a sample this small that most voters in his electorate will vote for him: with only 30 responses, it is quite possible that 64% of voters would actually vote for his opponent. The 54% is simply how the numbers fell in this particular sample of 30.
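
To put a number on that uncertainty, the short Python sketch below computes an approximate 95% interval for a sample proportion of 0.54 from 30 respondents. It uses the Wilson score interval that this article recommends, with z = 1.96 assumed for 95% confidence; the function name `wilson_interval` is our own, for illustration only.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives ~95% confidence)."""
    denom = 1 + z**2 / n
    # Centre of the interval is pulled slightly towards 0.5
    centre = (p_hat + z**2 / (2 * n)) / denom
    # Half-width of the interval, i.e. the margin of error
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


low, high = wilson_interval(0.54, 30)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

The interval runs from roughly 37% to 70%, far too wide to support any conclusion that a majority of the electorate will vote for him.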

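The classical approach to calculating a confidence interval for a proportion is the Wald interval, obtained by inverting the Wald test statistic. The interval is:

$$\hat{\pi} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$

where $\hat{\pi}$ is the sample proportion, $n$ is the sample size and $z_{1-\alpha/2}$ is the standard normal quantile for the chosen confidence level (1.96 for 95% confidence).
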
This formula is often used to compute the sample size by solving for n.
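
Setting the Wald margin of error $E = z\sqrt{\hat{\pi}(1-\hat{\pi})/n}$ and solving for $n$ gives $n = z^2\hat{\pi}(1-\hat{\pi})/E^2$, rounded up to the next whole sample. A minimal Python sketch of this textbook calculation follows, assuming a planning value of 0.5 for $\hat{\pi}$ and z = 1.96 for 95% confidence; the function name is hypothetical.

```python
import math


def wald_sample_size(margin: float, p_planning: float = 0.5, z: float = 1.96) -> int:
    """Smallest n for which the Wald margin z*sqrt(p(1-p)/n) is <= margin."""
    n_exact = z**2 * p_planning * (1 - p_planning) / margin**2
    # Round up: a fractional sample cannot be taken
    return math.ceil(n_exact)


print(wald_sample_size(0.05))  # 385
```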

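Confidence intervals calculated this way tend to be unreliable: their actual coverage is often less than the specified confidence level, particularly for small samples or for proportions close to 0 or 1.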
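
The coverage shortfall can be checked without simulation. For a given $n$ and true proportion $p$, the actual coverage is the sum of the binomial probabilities of every outcome $x$ whose interval contains $p$. A Python sketch under those definitions follows; the helper names are our own and z = 1.96 is assumed for a nominal 95%.

```python
import math


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def wald_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wald interval for x successes out of n trials."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wald_coverage(n: int, p: float, z: float = 1.96) -> float:
    """Exact probability that the Wald interval contains the true proportion p."""
    total = 0.0
    for x in range(n + 1):
        low, high = wald_interval(x, n, z)
        if low <= p <= high:
            total += binom_pmf(x, n, p)
    return total


print(f"{wald_coverage(30, 0.1):.3f}")
```

For $n = 30$ and a true proportion of 0.1, the coverage comes out at roughly 0.81, well short of the nominal 0.95.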

An alternative is an exact method such as the Clopper-Pearson method. However, although the term "exact" is used, the method is exact only in the sense that it uses the binomial distribution instead of a normal approximation. Its coverage is not exact, as shown by our own simulations and by several academic papers, including work by Agresti and Coull. In practice it is conservative: actual coverage is at or above the specified level, which makes the intervals wider than necessary.
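
The Clopper-Pearson limits are the values of $p$ at which the binomial tail probabilities equal $\alpha/2$. The sketch below finds them by bisection using only the Python standard library (statistics packages normally use beta-distribution quantiles instead). It then computes exact coverage by summing the binomial probabilities of the outcomes whose interval contains $p$. All function names are our own, for illustration.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def solve_increasing(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    # Lower limit: p at which P(X >= x) = alpha/2 (0 when x = 0)
    if x == 0:
        lower = 0.0
    else:
        lower = solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2, 0.0, 1.0)
    # Upper limit: p at which P(X <= x) = alpha/2 (1 when x = n)
    if x == n:
        upper = 1.0
    else:
        upper = solve_increasing(lambda p: alpha / 2 - binom_cdf(x, n, p), 0.0, 1.0)
    return lower, upper


def cp_coverage(n: int, p: float, alpha: float = 0.05) -> float:
    """Exact probability that the Clopper-Pearson interval contains p."""
    total = 0.0
    for x in range(n + 1):
        low, high = clopper_pearson(x, n, alpha)
        if low <= p <= high:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


print(f"{cp_coverage(30, 0.1):.3f}")
```

Because the Clopper-Pearson interval guarantees coverage of at least the nominal level, the printed value is at or above 0.95. The cost is intervals, and therefore sample sizes, that are larger than they need to be.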

Our simulation research has shown that the unmodified Wilson score interval has the best coverage across all the scenarios simulated, so the BIS.Net Sample Size app uses only the Wilson score method. (Pay-as-you-go apps use only technology proven to work best.) Analysts who wish to use classical methods can do so for free at BISNETAnalyst.com.

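The Wilson score interval is calculated using the following expression:

$$\frac{\hat{\pi} + \frac{z^2}{2n}}{1 + \frac{z^2}{n}} \pm \frac{z}{1 + \frac{z^2}{n}}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n} + \frac{z^2}{4n^2}}$$

where $\hat{\pi}$ is the sample proportion, $n$ is the sample size, $z$ is the standard normal quantile for the chosen confidence level, and the term to the right of the $\pm$ sign is the margin of error.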

Although this expression can be solved numerically for $n$ given a required margin of error, the app uses a far more efficient machine-powered algorithm that provides an extremely fast and accurate solution.
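
For readers who want to reproduce the numbers, a straightforward numerical approach (not the app's own algorithm) is to search for the smallest integer $n$ whose Wilson margin of error does not exceed the target. The Python sketch below relies on the margin shrinking as $n$ grows and uses doubling followed by integer bisection. The function names and the default z = 1.96 (95% confidence) are our own assumptions.

```python
import math


def wilson_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width (margin of error) of the Wilson score interval."""
    return (z / (1 + z**2 / n)) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))


def wilson_sample_size(margin: float, p_hat: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose Wilson margin of error is <= margin."""
    lo, hi = 1, 1
    # Double hi until the requirement is met (the margin decreases as n grows)
    while wilson_margin(p_hat, hi, z) > margin:
        hi *= 2
    # Invariant: the answer lies in [lo, hi]; narrow to the smallest valid n
    while lo < hi:
        mid = (lo + hi) // 2
        if wilson_margin(p_hat, mid, z) <= margin:
            hi = mid
        else:
            lo = mid + 1
    return hi


print(wilson_sample_size(0.05))  # 381
```

As a cross-check, at $\hat{\pi} = 0.5$ the Wilson margin simplifies to $z / (2\sqrt{n + z^2})$, so $n = \lceil z^2/(4E^2) - z^2 \rceil = \lceil 384.16 - 3.84 \rceil = 381$ for $E = 0.05$.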

Because $\hat{\pi}$, the sample estimate of the population proportion, is unknown when the sample size is being planned, BIS.Net Sample Size uses the worst case, $\hat{\pi} = 0.5$, when computing sample size so that all cases are covered. In practice, the margin of error will therefore be smaller than specified unless the sample proportion turns out to be exactly 0.5.
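
The worst-case argument can be checked directly. The sketch below assumes 95% confidence (z = 1.96) and fixes $n = 381$, the Wilson sample size for a 0.05 margin at $\hat{\pi} = 0.5$. It then prints the margin of error that would result for several other sample proportions; `wilson_margin` is our own helper name.

```python
import math


def wilson_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width (margin of error) of the Wilson score interval."""
    return (z / (1 + z**2 / n)) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))


n = 381  # planned for a 0.05 margin under the worst case p_hat = 0.5
for p_hat in (0.05, 0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p_hat = {p_hat:.2f}: margin = {wilson_margin(p_hat, n):.4f}")
```

Every other sample proportion gives a smaller margin than the one at 0.5 (just under 0.05). This is because $\hat{\pi}$ enters the margin only through $\hat{\pi}(1-\hat{\pi})$, which peaks at 0.5.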

The same principles apply to parts per million (ppm), which the app converts internally to a proportion when calculating sample size. For example, 300 ppm corresponds to a proportion of 300/1,000,000 = 0.0003.
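
The conversion is a single division by one million. Below is a minimal Python sketch with hypothetical helper names, so that a sample-size routine written for proportions can be fed ppm figures.

```python
def ppm_to_proportion(ppm: float) -> float:
    """Convert parts per million to a proportion (1 ppm = 1/1,000,000)."""
    return ppm / 1_000_000


def proportion_to_ppm(proportion: float) -> float:
    """Convert a proportion back to parts per million."""
    return proportion * 1_000_000


print(ppm_to_proportion(300))  # 0.0003
```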

## Analytics as a Service (AaaS) for Quality

Drive quality improvement through actionable insights using analytics you can trust! Use up to 200 analytics tools downloadable through a suite of Apps!

• Augmented with machine-powered smarts
• Always updated with the latest tools and features
• No licencing or fixed subscriptions: pay ONLY for the analyses you run, from 20 US cents per analysis, billed monthly! Set a budget so you don't exceed it!