Sample size for confidence intervals on the standard deviation

The BIS.Net Team

The biggest enemy of quality and cost is variability.

For example, in manufacturing, the greater the variability, the greater the chance of producing non-conforming product.

For a politician, the greater the age variability in the electorate, the harder it is to satisfy all constituents’ needs. This in turn provides openings for opposition parties. Similarly, for the marketing manager, the greater the variability in disposable income, the smaller the income-dependent target markets become, making target marketing less efficient.

For family doctors, the greater the variation in key health variables, such as glucose levels or blood pressure, the more difficult these are to manage.

Variability must be quantified to make better decisions in all walks of life. Doctors who do not take natural blood pressure variation into account risk prescribing medication when it is not required, or failing to prescribe it when it is needed. Marketing managers who do not know the variability in disposable income in their target market will not be able to develop optimum product mixes targeted at customers with different disposable incomes. Quality managers without an estimate of variability will not be able to perform a process capability analysis.

Variability, in the form of the standard deviation, is required for many significance tests. The standard deviation puts sample results in perspective. For example, a difference of $500 per month in average disposable income between one region and another is of greater importance if the standard deviation of disposable income is $50 than if it is $1000.
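The arithmetic behind this example can be sketched in a few lines of Python. The function name difference_in_sd_units is hypothetical, and dividing the difference by the standard deviation is only a rough gauge of importance, not a significance test.

```python
def difference_in_sd_units(difference, sd):
    """Express a difference between two averages as a multiple of the
    standard deviation (a rough gauge only, not a significance test)."""
    return difference / sd


difference = 500.0  # dollars per month, from the example in the text
for sd in (50.0, 1000.0):
    units = difference_in_sd_units(difference, sd)
    print(f"Sd = ${sd:,.0f}: the difference is {units:g} standard deviations")
```

With a standard deviation of $50 the difference spans 10 standard deviations; with $1000 it spans only half of one.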

Confidence intervals on the mean and on the difference between two means all require knowledge of the standard deviation, as do variables sampling plans.

It is thus important to estimate the standard deviation, and the estimate must be reliable. Unfortunately, it is not possible to specify a margin of error for the standard deviation, for two reasons.

  • The margin of error concept assumes that the confidence interval is placed symmetrically about the sample statistic, for example sample average ± margin of error. This is not possible for the standard deviation, because its confidence interval extends further above the sample standard deviation than below it.
  • The width of the confidence interval for the standard deviation depends on the sample standard deviation, which is not known in advance.

Hence a different approach is required.

Consider the confidence interval for the standard deviation, which is obtained by taking the square root of the limits of the confidence interval for the variance.

The confidence interval for the variance of a normally distributed data set is

(n-1)*Sd^2/ChiSq(alpha/2,n-1) to (n-1)*Sd^2/ChiSq(1-alpha/2,n-1)

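Where n is the sample size, Sd is the sample standard deviation, and alpha is the level of significance. The percent confidence level is equal to 100*(1-alpha). ChiSq(p,n-1) is the chi-square value with n-1 degrees of freedom that is exceeded with probability p, so ChiSq(alpha/2,n-1) is the larger of the two values and gives the lower limit.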

By taking the square root of the limits of the confidence interval on the variance, we obtain the confidence interval on the standard deviation. Its range (upper limit minus lower limit) is

Range= Sd*Sqrt((n-1)/ChiSq(1-alpha/2,n-1))- Sd*Sqrt((n-1)/ChiSq(alpha/2,n-1))
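As a rough illustration of the interval and range above, the sketch below computes them in Python using only the standard library. The helper names (chi2_cdf, chi2_upper, sd_confidence_interval) and the example figures (Sd = 10, n = 10) are assumptions made for this illustration. ChiSq(p, df) is taken to be the chi-square value exceeded with probability p. In practice, the inverse chi-square function of a statistics library would normally replace the hand-written helper.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, via the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi2_upper(p, df):
    """Chi-square value exceeded with probability p (found by bisection)."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sd_confidence_interval(sd, n, alpha=0.05):
    """Lower and upper confidence limits for the standard deviation."""
    df = n - 1
    lower = sd * math.sqrt(df / chi2_upper(alpha / 2.0, df))
    upper = sd * math.sqrt(df / chi2_upper(1.0 - alpha / 2.0, df))
    return lower, upper


sd, n = 10.0, 10  # hypothetical sample standard deviation and sample size
lower, upper = sd_confidence_interval(sd, n)
print(f"95% interval for the standard deviation: {lower:.2f} to {upper:.2f}")
print(f"Range: {upper - lower:.2f}")
print(f"Distance below Sd: {sd - lower:.2f}, above Sd: {upper - sd:.2f}")
```

The last line of output shows the asymmetry noted earlier: the upper limit lies further from the sample standard deviation than the lower limit does.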

If we divide the expression by Sd, then Sd cancels, leaving the range expressed as a fraction of the sample standard deviation

Range= Sqrt((n-1)/ChiSq(1-alpha/2,n-1))- Sqrt((n-1)/ChiSq(alpha/2,n-1))

This expression does not depend on the sample Sd, which is unknown prior to sampling.

100*Range is thus the width of the confidence interval expressed as a percentage of the sample standard deviation, whatever that standard deviation turns out to be.
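To give a feel for how this percentage shrinks as the sample grows, the following sketch tabulates it for a few sample sizes at 95% confidence. It uses only the Python standard library. The function names (chi2_cdf, chi2_upper, relative_range) and the chosen sample sizes are assumptions for illustration, with ChiSq(p, df) again taken as the value exceeded with probability p.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, via the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi2_upper(p, df):
    """Chi-square value exceeded with probability p (found by bisection)."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def relative_range(n, alpha=0.05):
    """Confidence interval range divided by the sample standard deviation."""
    df = n - 1
    upper = math.sqrt(df / chi2_upper(1.0 - alpha / 2.0, df))
    lower = math.sqrt(df / chi2_upper(alpha / 2.0, df))
    return upper - lower


for n in (10, 30, 100, 500):  # illustrative sample sizes
    print(f"n = {n:4d}: range is {100.0 * relative_range(n):6.1f}% of the sample Sd")
```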

Although this can be related only loosely to the population standard deviation, it provides a rational criterion for computing a sample size that yields a confidence interval of reasonable width. A confidence interval range that is, say, 10% of the sample standard deviation is better than one that is 50% of it, and certainly better than a range that depends on whatever arbitrary sample size is chosen.

The final issue is solving the above expression for the sample size. There is no closed-form equation that can be solved directly for n, because the sample size also enters through the degrees of freedom of the chi-square values, so the required sample size has to be found by search. BIS.Net Analyst provides a solution using a combination of machine power and machine learning: through a learning process, the algorithm knows where to look, and where not to look, for the required sample size.
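For readers who want to experiment, a plain numerical search is one way to find such a sample size. The sketch below is not the BIS.Net Analyst algorithm. It assumes that the relative range shrinks steadily as n grows, doubles n until the target is met, and then bisects. All function names, the target of 0.5 (a range equal to 50% of the sample Sd) and the n_max cap are assumptions for illustration.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, via the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi2_upper(p, df):
    """Chi-square value exceeded with probability p (found by bisection)."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def relative_range(n, alpha=0.05):
    """Confidence interval range divided by the sample standard deviation."""
    df = n - 1
    upper = math.sqrt(df / chi2_upper(1.0 - alpha / 2.0, df))
    lower = math.sqrt(df / chi2_upper(alpha / 2.0, df))
    return upper - lower


def required_sample_size(target, alpha=0.05, n_max=100_000):
    """Smallest n (found by doubling, then bisection) whose relative range
    does not exceed target; assumes the range decreases as n grows."""
    hi = 2
    while relative_range(hi, alpha) > target:
        hi *= 2
        if hi > n_max:
            raise ValueError("target not reached within n_max")
    lo = hi // 2  # either 1, or a sample size known to miss the target
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if relative_range(mid, alpha) > target:
            lo = mid
        else:
            hi = mid
    return hi


target = 0.5  # hypothetical criterion: range no wider than 50% of the sample Sd
n = required_sample_size(target)
print(f"Sample size: {n}, relative range: {relative_range(n):.3f}")
```

Tighter targets call for much larger samples, so the cap n_max keeps the search from running indefinitely when the target is unrealistic.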
