Hypothesis Testing

The BIS.Net Team

There are many applications requiring hypothesis testing. It is a great tool for politicians.

A medical researcher may test the response time of a drug on several rats. Based on the sample response times, the researcher needs to know whether the mean response time has changed from 10 seconds. This must be done statistically, taking variability into account.

Consider wishing to know whether the current average fill volume for a beverage is above the declared volume, say 100 ml. The QA manager may take a sample of 10 cans, measure the average volume and compare it to the declared volume. If the sample average is greater than the declared volume, a naive QA manager would declare that the process is filling above the declared volume. A wise QA manager knows that variability must be taken into account and would thus perform a hypothesis test.

A hypothesis test requires specifying a null hypothesis and an alternative hypothesis. The null hypothesis is usually the status quo. The alternative is the opposite of the status quo. For the drug experiment the status quo would be that the drug has no effect, so the response time remains 10 seconds. For the declared volume test the status quo means that we manufacture to the declared value.

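Thus, for the drug experiment

H0: μ = 10

And for the declared volume test

H0: μ = 100

where H0 is an abbreviation for the null hypothesis and μ is the true (population) mean: the mean response time in seconds for the drug experiment, and the mean fill volume in ml for the declared volume test.

The alternative hypothesis for the drug experiment is

Ha: μ <> 10

And for the declared volume test

Ha: μ > 100

where Ha is an abbreviation for the alternative hypothesis.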

The null hypothesis is always an equality, and the alternative can be <, > or <>. Hypothesis testing involves determining the probability of obtaining the sample results if the null hypothesis were true, taking sampling error into account. For example, if the sample average volume were 102, what is the probability, based on the variability of the process, that a sample result of 102 or more could have been obtained if the process average were equal to the declared volume? If the probability is low, we may conclude that the process average is indeed above 100.
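The fill volume question can be worked through numerically. The following is only a minimal sketch: it assumes the process standard deviation is known and equal to 3 ml (a made-up figure, not given in this article) and that sample averages are approximately normally distributed, so a simple z test applies. The function name is also just illustrative.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(sample_mean, null_mean, sigma, n):
    """Probability of a sample mean this large or larger if the true mean is null_mean."""
    standard_error = sigma / sqrt(n)              # variability of the sample average
    z = (sample_mean - null_mean) / standard_error
    return 1.0 - NormalDist().cdf(z)              # area in the upper tail


# 102, 100 and the sample of 10 cans come from the text.
# sigma = 3 ml is an ASSUMED process standard deviation.
p = upper_tail_p_value(sample_mean=102, null_mean=100, sigma=3, n=10)
print(f"p = {p:.4f}")
```

Under these assumptions the probability comes out a little under 2%, so a sample average of 102 would be unlikely if the process were really filling at 100. With a larger assumed standard deviation the same sample average would be much less convincing, which is exactly why variability must be taken into account.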

The probability below which the analyst decides that the result is significant is called the level of significance; 5% is a common choice.

Hypothesis tests are related to confidence intervals. For example, if the confidence interval for the average response time for the drug administered to the rats is 9 to 11, then, because the null hypothesis value of 10 falls inside the interval, the conclusion would be that there is no evidence that the response time has changed. If, on the other hand, the confidence interval is 12 to 15, it does not contain the value of 10, and hence there is good evidence that the drug is having an effect.
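The confidence interval check for the drug experiment might be sketched as follows. The standard deviation of 2 seconds, the sample size of 16 rats and the two sample averages are hypothetical values chosen for illustration, and the interval uses the normal distribution with a known standard deviation, which is a simplifying assumption.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided interval for the mean, assuming a known standard deviation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def rejects_null(interval, null_value):
    """True when the null hypothesis value lies outside the interval."""
    low, high = interval
    return not (low <= null_value <= high)


# HYPOTHETICAL inputs: sigma = 2 seconds, n = 16 rats.
no_change = mean_confidence_interval(10.0, sigma=2, n=16)   # roughly 9 to 11
changed = mean_confidence_interval(13.5, sigma=2, n=16)     # roughly 12.5 to 14.5

print(no_change, rejects_null(no_change, 10))
print(changed, rejects_null(changed, 10))
```

The first interval contains 10, so there is no evidence of a change; the second does not, so the null hypothesis would be rejected.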

For the declared volume test, if the confidence interval were 97 to 99, then because 100 lies above the interval there is no evidence that the process average is above 100, which is what the QA manager needs to know. Of course, there is evidence that the process is filling below the declared volume.

Hypothesis tests can thus be carried out with confidence intervals. However, hypothesis testing provides p values, which provide more information than confidence intervals. The p value is the probability of obtaining sample results at least as extreme as those observed if the null hypothesis holds. Using a 95% confidence interval for hypothesis testing will let the analyst know whether this probability is less than 5%, but not by how much. The p value provides the ‘exact’ probability.
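To illustrate the extra information in a p value, the sketch below compares two hypothetical outcomes of the drug experiment. In both cases a 95% confidence interval would exclude 10, yet the p values differ by orders of magnitude. As before, the standard deviation of 2 seconds, the 16 rats and the sample averages are assumed values, and a z test with known standard deviation is used for simplicity.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """Probability of a sample mean at least this far from null_mean, in either direction."""
    z = abs(sample_mean - null_mean) / (sigma / sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(z))


# HYPOTHETICAL inputs: sigma = 2 seconds, n = 16 rats.
borderline = two_sided_p_value(11.0, 10, sigma=2, n=16)   # z = 2
strong = two_sided_p_value(12.0, 10, sigma=2, n=16)       # z = 4

print(f"borderline: p = {borderline:.4f}")
print(f"strong:     p = {strong:.6f}")
```

Both results are significant at the 5% level, but the first only just, while the second leaves very little room for doubt. A confidence interval check alone would report the same verdict for both.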

As with confidence intervals, probabilities apply to long-term, repeated experiments. For a single instance of the above experiments we can only state that the drug has or does not have an effect, or that the volume is or is not above the declared volume. However, hypothesis testing gives us some assurance that we do not draw the wrong conclusion, just no guarantee. If hypothesis testing is used across many applications, we can be far more confident that the right decisions are made than with the naïve, unscientific approach of using only a sample statistic.
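The long-term meaning of these probabilities can be seen in a small simulation. The sketch below assumes a process that really does fill at 100 ml with normally distributed volumes and an assumed standard deviation of 3 ml, repeats the 10-can test many times at a 5% level of significance, and counts how often the test wrongly concludes that the process is above 100. All parameter values other than 100 and the sample of 10 are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_alarm_rate(trials=10_000, n=10, null_mean=100.0, sigma=3.0,
                     alpha=0.05, seed=1):
    """Share of repeated tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    critical_z = NormalDist().inv_cdf(1 - alpha)   # one-sided cut-off
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw one sample of n cans from a process that is exactly on target.
        sample = [rng.gauss(null_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - null_mean) / standard_error
        if z > critical_z:
            rejections += 1
    return rejections / trials


rate = false_alarm_rate()
print(f"share of wrong 'above 100' conclusions: {rate:.3f}")
```

The share should come out close to 5%. No single test tells us whether its conclusion is right, but over many tests the rate of wrong conclusions is controlled.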

Even if used repeatedly, the actual p value must be seen only as an approximation. For example, even though a calculated p value may be .05, the true value may be as high as .1 or as low as .002. There are many reasons for this. One reason is that the data may not be normally distributed; another is that, for proportions, the data are discrete. These problems are discussed under sections for specific confidence intervals in the Knowledge-Center which can be accessed from the links below.
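The effect of non-normal data can also be sketched by simulation. Here the data are drawn from a skewed (exponential) distribution, chosen purely as an illustration, with a true mean of 1, and a one-sided z test at a nominal 5% level is applied to samples of 10 in each direction. If the nominal probabilities were exact, both tails would reject about 5% of the time.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def tail_rejection_rates(trials=20_000, n=10, alpha=0.05, seed=1):
    """Actual rejection rates of nominal one-sided tests when H0 is true
    but the data are skewed rather than normal."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha)
    standard_error = 1.0 / sqrt(n)    # an exponential with mean 1 has sd 1
    upper = lower = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        z = (fmean(sample) - 1.0) / standard_error
        upper += z > critical_z       # test of Ha: mean > 1
        lower += z < -critical_z      # test of Ha: mean < 1
    return upper / trials, lower / trials


upper_rate, lower_rate = tail_rejection_rates()
print(f"upper tail: {upper_rate:.3f}, lower tail: {lower_rate:.3f}")
```

With this skewed distribution the upper-tail test rejects more often than 5% and the lower-tail test considerably less often, so a reported p value of .05 does not correspond to a true probability of .05 in either direction.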

Hypothesis testing assumes random sampling. Sampling is rarely truly random.

A wise analyst will thus use fuzzy thinking and assume the decision world is not precise. A wise decision maker will use hypothesis testing or confidence intervals instead of point estimates. A wise decision maker will use the p value or specified level of significance only as a guide, and not make the same mistake that point estimate decision makers make, i.e. assume that the numbers are exact.