We are often asked how large a sample is needed when clients work with us to design a study. That question is best answered with two other questions:

1. How much budget do you have?

2. How much certainty do you want in the results?

The first question seems a bit facetious, but it’s not. Budget is a finite entity, and using it all up on sample often precludes using it on further projects. The second question is more relevant, and that’s what we discuss here.

**What is a confidence interval?**

That’s the crux of sample size. A confidence interval is the range around a survey result within which the true population value is likely to fall. Testing at a 95% confidence level means that if we repeated the survey many times, about 95 of every 100 such intervals would contain the value we’d get by asking the question of the entire population, i.e. the value that’s “true” out there in the world.

In our business, a confidence level of 95% is usually adequate. We sometimes relax this requirement, but only when we are dealing with smaller sample sizes. Relaxing the confidence level doesn’t come without a price, however: instead of expecting the interval to capture the population value 95 times out of 100, we can only expect it to do so 90 times out of 100. That may be too much uncertainty for some business decisions based on our research.
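To make the trade-off concrete, here is a minimal sketch of how the margin of error (the half-width of a confidence interval for a proportion) shrinks when we drop from a 95% to a 90% confidence level. The survey figures (50% preference, n = 400) are hypothetical.

```python
import math

def margin_of_error(p_hat, n, z):
    """Half-width of a confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 50% of n = 400 respondents prefer a product.
moe_95 = margin_of_error(0.5, 400, 1.96)   # z for a 95% confidence level
moe_90 = margin_of_error(0.5, 400, 1.645)  # z for a 90% confidence level

print(f"95% level: +/-{moe_95:.3f}")  # about +/-4.9 points
print(f"90% level: +/-{moe_90:.3f}")  # about +/-4.1 points
```

The narrower 90% interval looks more precise, but that precision is bought with exactly the extra uncertainty described above: the interval misses the true value twice as often.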

**Why a minimum of n = 30 when stat testing?**

We’ve all had to do it: footnoting a slide with “small base, use with caution” when we’re comparing results across small sample sizes.

What is magic about n = 30? One of the underlying premises of standard stat testing is that the sample means we measure are “normally” distributed. Normality has a precise technical definition: the results trace a bell curve, with 50 percent lying on either side of the mean, about 68 percent lying within one standard deviation of the mean, about 95 percent lying within two standard deviations, and so on. Thanks to the central limit theorem, the distribution of sample means approaches this normal shape as samples grow, even when individual responses are not normal at all, and n = 30 is the conventional rule of thumb for when that approximation becomes reliable.

With fewer than 30 in our sample, we can’t count on the distribution of the sample mean being approximately normal, so the regular stat testing rules do not apply.
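A quick simulation can illustrate the idea. Here we draw individual responses from a deliberately skewed (exponential) population with a true mean of 1.0, then look at the means of many hypothetical samples of n = 30; those sample means cluster symmetrically around 1.0 even though no individual response is normally distributed.

```python
import random
import statistics

random.seed(42)

def sample_mean(n):
    # One hypothetical survey: the mean of n skewed (exponential) responses.
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Repeat the "survey" 5,000 times and collect the sample means.
means = [sample_mean(30) for _ in range(5000)]

# The sample means pile up close to the population mean of 1.0,
# with a spread far smaller than that of individual responses.
print(round(statistics.mean(means), 2))
print(round(statistics.pstdev(means), 2))
```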

**When we test for statistical significance, what exactly are we measuring?**

That’s a more profound question than it appears at first glance. In technical terms, stat testing is used to accept or reject the null hypothesis. For example, consider someone who is testing a new product, Product A, against the existing product, the Control. We help them develop a questionnaire based on their research questions.

Each question presents a research hypothesis, as follows:

*The likelihood to purchase Product A is greater than the likelihood to purchase the Control.*

Stat testing considers and quantifies the reverse of this, the **null hypothesis:**

*The likelihood to purchase Product A is NOT greater than the likelihood to purchase the Control.*

Based on our stat testing, which depends on sample size and confidence level, let’s say the difference in purchase likelihood is “statistically significant.” This means that we can reject the null hypothesis, and the higher intent to purchase Product A is real out there in the world.

Note that stat testing is designed to reject (or fail to reject) the null hypothesis, never to directly accept or reject its positively framed cousin, the research hypothesis.
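As a sketch of what the test itself might look like, here is a one-sided two-proportion z-test on purchase intent. The counts (180 of 300 would buy Product A, 150 of 300 would buy the Control) are hypothetical, chosen only to show the mechanics.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """One-sided two-proportion z-test: is group 1's rate greater than group 2's?"""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical data: 180 of 300 respondents would buy Product A,
# 150 of 300 would buy the Control.
z = two_proportion_z(180, 300, 150, 300)

# At a 95% confidence level (one-sided), the critical value is 1.645;
# a z-statistic above that lets us reject the null hypothesis.
print(round(z, 2), z > 1.645)
```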

**How do we interpret results that are not statistically significant?**

Let’s assume for a minute that our results were not statistically significant. Did Product A still win? If we say “yes,” we risk committing what’s known as a Type I error: rejecting the null hypothesis when in fact the null hypothesis was true.

The other type of possible error is called a Type II error: failing to reject the null hypothesis when in fact it’s false. There is always some risk of both types of error, so we have to be careful to recognize when they might occur.
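The Type I error rate is exactly what the confidence level controls, and a simulation can show it. Below, both “products” have an identical 50% purchase intent, so the null hypothesis is true by construction; any significant result is a false positive, and at a two-sided 95% level (|z| > 1.96) these should occur about 5% of the time. All figures here are hypothetical.

```python
import math
import random

random.seed(1)

def z_stat(x1, n1, x2, n2):
    """Two-proportion z-statistic with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Simulate surveys where the null hypothesis is TRUE: both products
# have the same 50% purchase intent, n = 200 per cell.
trials, false_positives, n = 2000, 0, 200
for _ in range(trials):
    a = sum(random.random() < 0.5 for _ in range(n))
    b = sum(random.random() < 0.5 for _ in range(n))
    if abs(z_stat(a, n, b, n)) > 1.96:  # "significant" by mistake
        false_positives += 1

rate = false_positives / trials
print(f"Type I error rate: {rate:.3f}")  # close to the expected 0.05
```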