Sampling Distributions and the Central Limit Theorem
Until now, we have worked with raw data and individual samples. In statistics, we often want to understand how sample results behave when we repeat the sampling process many times.
This lesson introduces two powerful ideas: sampling distributions and the Central Limit Theorem (CLT).
What Is a Sampling Distribution?
A sampling distribution is the distribution of a statistic (such as the mean or proportion) calculated from repeated samples of the same size taken from a population.
Instead of looking at individual data points, we look at how a statistic behaves across many samples.
Simple Intuition
Imagine taking many samples of 30 students from a university and calculating the average height for each sample.
Each sample will usually produce a slightly different mean. The distribution of these sample means is the sampling distribution.
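This idea can be simulated directly. The following is a minimal sketch. It assumes a hypothetical population of 10,000 heights drawn from a normal distribution with mean 170 cm and standard deviation 10 cm. The helper `sample_means` is illustrative and is not a library function.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, rng):
    """Hypothetical helper: draw repeated samples and return the mean of each one."""
    means = []
    for _ in range(num_samples):
        sample = rng.sample(population, sample_size)  # one sample, drawn without replacement
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(42)
# Hypothetical population: 10,000 student heights in cm
population = [rng.gauss(170, 10) for _ in range(10_000)]

# 1,000 samples of 30 students each; the list of their means is the sampling distribution
means = sample_means(population, sample_size=30, num_samples=1_000, rng=rng)
print([round(m, 1) for m in means[:5]])  # each sample gives a slightly different mean
```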
Why Sampling Distributions Matter
- They explain why sample statistics vary
- They help estimate population parameters
- They form the basis of confidence intervals and hypothesis testing
Sampling Distribution of the Mean
One of the most important sampling distributions is the sampling distribution of the sample mean.
It has two key properties:
- The mean of the sampling distribution equals the population mean
- The spread of the sampling distribution (called the standard error) decreases as the sample size increases
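In symbols, let μ be the population mean, σ the population standard deviation, and n the sample size. The two properties are commonly written as follows. The standard-error formula assumes independent observations, for example sampling with replacement or sampling from a population much larger than the sample. The numbers in the second line are illustrative only.

```latex
\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

\text{Example: } \sigma = 12,\ n = 36 \;\Rightarrow\; \sigma_{\bar{x}} = \frac{12}{\sqrt{36}} = 2;
\quad n = 144 \;\Rightarrow\; \sigma_{\bar{x}} = \frac{12}{\sqrt{144}} = 1
```

Quadrupling the sample size halves the standard error, because the spread shrinks with √n rather than with n.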
Numerical Example
Suppose the population mean test score is 70.
If we repeatedly take samples and compute their means:
- Some sample means will be slightly above 70
- Some will be slightly below 70
On average, the sample means will center around 70.
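A small simulation shows this pattern. This sketch makes several assumptions for illustration. It uses a hypothetical population of 5,000 normally distributed test scores with standard deviation 12, shifted so that the population mean is exactly 70. It then draws samples of 25 scores each.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population of 5,000 test scores, shifted so its mean is exactly 70
raw = [rng.gauss(70, 12) for _ in range(5_000)]
shift = 70 - statistics.fmean(raw)
population = [score + shift for score in raw]

# Take 500 samples of 25 scores each and record each sample mean
means = [statistics.fmean(rng.sample(population, 25)) for _ in range(500)]

# Count how many sample means land above and below the population mean
above = sum(m > 70 for m in means)
below = sum(m < 70 for m in means)
print(f"above 70: {above}, below 70: {below}")
print(f"average of sample means: {statistics.fmean(means):.2f}")
```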
What Is the Central Limit Theorem (CLT)?
The Central Limit Theorem states that:
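As the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the population’s shape (provided the population has a finite variance).
The approximation is usually considered adequate once the sample size is sufficiently large. A common rule of thumb is n ≥ 30, but strongly skewed populations may need larger samples.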
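This claim can be checked by simulation. The following minimal sketch assumes an exponential population, a standard right-skewed shape chosen here for illustration. A single exponential draw has a skewness of about 2. The skewness of the mean of n draws is 2/√n, so the measured skewness of the sample means should fall toward 0 (symmetric) as n grows.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

rng = random.Random(1)
results = {}
for n in (1, 5, 30, 100):
    # Each entry is the mean of n draws from a right-skewed exponential population
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(4_000)]
    results[n] = skewness(means)
    print(f"n = {n:3d}: skewness of sample means = {results[n]:.2f}")
```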
Why the CLT Is Powerful
- It allows us to use normal distribution methods
- It works even if the original data is skewed
- It makes statistical inference possible
Real-World Example
Customer wait times at a service center may be skewed.
If we repeatedly take groups of customers and compute each group’s average wait time, the distribution of those averages will be approximately normal, provided each group is large enough.
This allows businesses to make reliable decisions using probability models.
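The following sketch shows how such a probability model might be used. All of its inputs are hypothetical: individual wait times are exponential with a mean of 5 minutes, groups contain 50 customers, and the question is how often a group's average wait exceeds 6 minutes. It compares a direct simulation with the normal approximation that the CLT justifies, using Python's `statistics.NormalDist`.

```python
import random
import statistics

MEAN_WAIT = 5.0   # hypothetical population mean wait, in minutes
GROUP_SIZE = 50   # hypothetical number of customers per group
THRESHOLD = 6.0   # question: how often does a group's average wait exceed 6 minutes?

rng = random.Random(2024)

# Simulate many groups; individual waits are exponential (right-skewed),
# and for an exponential distribution the standard deviation equals the mean.
group_means = [
    statistics.fmean(rng.expovariate(1 / MEAN_WAIT) for _ in range(GROUP_SIZE))
    for _ in range(5_000)
]
simulated = sum(m > THRESHOLD for m in group_means) / len(group_means)

# CLT approximation: the group average is roughly Normal(mean, sd / sqrt(n))
standard_error = MEAN_WAIT / GROUP_SIZE ** 0.5
approx = 1 - statistics.NormalDist(MEAN_WAIT, standard_error).cdf(THRESHOLD)

print(f"simulated P(avg > {THRESHOLD}): {simulated:.3f}")
print(f"normal approximation:        {approx:.3f}")
```

Under these assumptions the two probabilities should be close. Any small gap reflects the skew that remains in averages of 50 exponential waits.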
Effect of Sample Size
| Sample Size | Sampling Distribution Shape | Variability |
|---|---|---|
| Small | May not be normal | High |
| Large | Approximately normal | Low |
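The Variability column can be checked numerically. The sketch below assumes a hypothetical population made up of rolls of a fair six-sided die. For n = 5, 30, and 100, it compares the observed spread of 3,000 sample means with the σ/√n prediction. Rolls are drawn with replacement so that observations are independent.

```python
import random
import statistics

rng = random.Random(3)
faces = range(1, 7)                 # hypothetical population: rolls of a fair die
sigma = statistics.pstdev(faces)    # population standard deviation, about 1.71

rows = []
for n in (5, 30, 100):
    # Each sample: n independent rolls (sampling with replacement)
    means = [statistics.fmean(rng.choices(faces, k=n)) for _ in range(3_000)]
    observed = statistics.stdev(means)
    predicted = sigma / n ** 0.5
    rows.append((n, observed, predicted))
    print(f"n = {n:3d}: observed spread = {observed:.3f}, sigma/sqrt(n) = {predicted:.3f}")
```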
Common Misunderstandings
- The CLT does not say the data itself becomes normal; individual observations keep the population’s shape
- It applies to the distribution of sample means
- Larger samples give more reliable estimates
Quick Check
Does the Central Limit Theorem require the population to be normally distributed?
No. The CLT works regardless of the population’s shape, as long as the sample size is large enough (and the population variance is finite).
Practice Quiz
Question 1:
What does a sampling distribution describe?
The distribution of a statistic calculated from many samples.
Question 2:
What happens to variability as sample size increases?
Variability decreases.
Question 3:
What shape does the sampling distribution approach under the CLT?
Normal distribution.
Mini Practice
A population has a highly skewed income distribution. Samples of size 40 are taken repeatedly.
- What can you say about the distribution of the sample means?
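The distribution of sample means will be approximately normal, because n = 40 exceeds the common n ≥ 30 guideline. With extremely skewed data, some skew may remain, so larger samples give a better approximation.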
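The exercise can be explored with a simulation. The following minimal sketch assumes a hypothetical lognormal income population, a common right-skewed shape whose parameters are chosen purely for illustration. It uses the gap between mean and median as a simple symmetry check: in a right-skewed distribution the mean sits well above the median, while in a roughly symmetric one the two are close.

```python
import random
import statistics

rng = random.Random(11)
SAMPLE_SIZE = 40  # matches the exercise

# Hypothetical right-skewed income population (lognormal, parameters chosen for illustration)
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(20_000)]

# Draw repeated samples of 40 incomes (with replacement) and record each sample mean
sample_means = [statistics.fmean(rng.choices(incomes, k=SAMPLE_SIZE)) for _ in range(4_000)]

# Mean/median ratio: well above 1 for right-skewed data, close to 1 for symmetric data
raw_ratio = statistics.fmean(incomes) / statistics.median(incomes)
means_ratio = statistics.fmean(sample_means) / statistics.median(sample_means)
print(f"incomes:      mean/median = {raw_ratio:.2f}")
print(f"sample means: mean/median = {means_ratio:.2f}")
```

With these illustrative parameters, the incomes should show a large gap between mean and median, while the sample means should show almost none.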
What’s Next
In the next lesson, we will study Point Estimates and Margin of Error, which help quantify uncertainty in estimates.