Statistics Lesson 21 – CLT | Dataplexa

Sampling Distributions and the Central Limit Theorem

Until now, we have worked with raw data and individual samples. In statistics, we often want to understand how sample results behave when we repeat the sampling process many times.

This lesson introduces two powerful ideas: sampling distributions and the Central Limit Theorem (CLT).


What Is a Sampling Distribution?

A sampling distribution is the distribution of a statistic (such as the mean or proportion) calculated from repeated samples of the same size taken from a population.

Instead of looking at individual data points, we look at how a statistic behaves across many samples.


Simple Intuition

Imagine taking many samples of 30 students from a university and calculating the average height for each sample.

Each sample will typically have a slightly different mean. The distribution of these sample means is the sampling distribution.
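To make the thought experiment concrete, here is a minimal simulation sketch. The population (20,000 heights drawn from a normal distribution with mean 170 cm and standard deviation 10 cm) and the helper name sample_means are assumptions made up for illustration, not data from a real university.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 20,000 students (assumed values).
population = rng.normal(loc=170.0, scale=10.0, size=20_000)

def sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples random samples and return the mean of each one."""
    means = np.empty(num_samples)
    for i in range(num_samples):
        # One sample of students, drawn without replacement.
        sample = rng.choice(population, size=sample_size, replace=False)
        means[i] = sample.mean()
    return means

means = sample_means(population, sample_size=30, num_samples=1_000, rng=rng)

print("first five sample means:", np.round(means[:5], 2))
print("mean of sample means:   ", round(means.mean(), 2))
print("population mean:        ", round(population.mean(), 2))
```

The array means holds 1,000 values, one per sample. A histogram of that array would be a picture of the sampling distribution of the mean for samples of 30.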


Why Sampling Distributions Matter

  • They explain why sample statistics vary
  • They help estimate population parameters
  • They form the basis of confidence intervals and hypothesis testing

Sampling Distribution of the Mean

One of the most important sampling distributions is the sampling distribution of the sample mean.

It has two key properties:

  • The mean of the sampling distribution equals the population mean
  • The spread of the sampling distribution, called the standard error, decreases as the sample size increases
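In symbols, these two properties can be written as follows. The notation is an assumption introduced here: μ and σ are the population mean and standard deviation, n is the sample size, and the observations are taken to be independent draws from the population.

```latex
\[
\mu_{\bar{X}} = \mu,
\qquad
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
\]
```

Because the sample size sits under a square root, quadrupling n only halves the spread of the sample means.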

Numerical Example

Suppose the population mean test score is 70.

If we repeatedly take samples and compute their means:

  • Some sample means will be slightly above 70
  • Some will be slightly below 70

On average, the sample means will center around 70.
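The example above can be sketched in a short simulation. Only the population mean of 70 comes from the example; the normal shape, the standard deviation of 12, and the sample size of 25 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the run is repeatable

POPULATION_MEAN = 70.0   # from the example above
POPULATION_SD = 12.0     # assumed for illustration
SAMPLE_SIZE = 25         # assumed for illustration
NUM_SAMPLES = 5_000

# Each row is one sample of test scores; the row mean is that sample's mean.
samples = rng.normal(POPULATION_MEAN, POPULATION_SD, size=(NUM_SAMPLES, SAMPLE_SIZE))
score_means = samples.mean(axis=1)

# Fraction of sample means that land above and below the population mean.
share_above = (score_means > POPULATION_MEAN).mean()
share_below = (score_means < POPULATION_MEAN).mean()

print(f"share of sample means above 70: {share_above:.2f}")
print(f"share of sample means below 70: {share_below:.2f}")
print(f"average of all sample means:    {score_means.mean():.2f}")
```

Individual sample means miss 70 in both directions, but their overall average should land very close to it.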


What Is the Central Limit Theorem (CLT)?

The Central Limit Theorem states that:

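As the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the population’s shape, provided the observations are independent and the population has a finite variance.

The approximation is usually good once the sample size is sufficiently large. A common rule of thumb is n ≥ 30, although strongly skewed populations may need larger samples.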
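For readers who want the statement in symbols, one standard form of the theorem is sketched below. The notation is an assumption introduced here: the observations are independent and identically distributed with mean μ and finite standard deviation σ, and the arrow denotes convergence in distribution (the \xrightarrow command assumes the amsmath package).

```latex
\[
Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
\]
```

In words: once the sample mean is centered at μ and scaled by σ/√n, its distribution gets closer and closer to the standard normal.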


Why the CLT Is Powerful

  • It allows us to use normal-distribution methods for sample means
  • It works even if the original data is skewed
  • It justifies many common inference procedures, such as confidence intervals and tests for a mean

Real-World Example

Customer wait times at a service center may be skewed.

If we repeatedly take groups of customers and compute the average wait time for each group, the distribution of those averages will be approximately normal, provided the groups are reasonably large.

This allows businesses to make reliable decisions using probability models.
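A rough way to see this is to measure skewness before and after averaging. The sketch below assumes wait times follow an exponential distribution with a mean of 5 minutes and that each group has 40 customers; these values and the helper name skewness are hypothetical, chosen only to produce a clearly skewed population.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: the average cubed z-score (near 0 for a symmetric shape)."""
    z = (values - values.mean()) / values.std()
    return float((z ** 3).mean())

MEAN_WAIT_MINUTES = 5.0   # assumed for illustration
GROUP_SIZE = 40           # assumed for illustration
NUM_GROUPS = 10_000

# Each row is one group of customers; each entry is one customer's wait time.
waits = rng.exponential(scale=MEAN_WAIT_MINUTES, size=(NUM_GROUPS, GROUP_SIZE))
group_means = waits.mean(axis=1)

print(f"skewness of individual waits: {skewness(waits.ravel()):.2f}")
print(f"skewness of group averages:   {skewness(group_means):.2f}")
```

For an exponential population the theoretical skewness is 2, while for averages of 40 such values it is 2/√40, roughly 0.32. The individual wait times stay skewed; only the distribution of the averages moves toward a symmetric, bell-like shape.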


Effect of Sample Size

Sample Size    Sampling Distribution Shape    Variability
Small          May not be normal              High
Large          Approximately normal           Low
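The variability column can be checked numerically. The sketch below assumes a skewed (exponential) population with standard deviation 1 and compares the observed spread of the sample means with the standard-error formula σ/√n, where σ is the population standard deviation and n is the sample size. The sample sizes and the number of repetitions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # fixed seed so the run is repeatable

POPULATION_SD = 1.0   # an exponential population with scale 1 has SD 1
NUM_SAMPLES = 4_000   # number of repeated samples per sample size

observed = {}
for n in (5, 30, 100, 500):
    # Draw NUM_SAMPLES samples of size n and take the mean of each one.
    means = rng.exponential(scale=1.0, size=(NUM_SAMPLES, n)).mean(axis=1)
    observed[n] = means.std()
    predicted = POPULATION_SD / np.sqrt(n)
    print(f"n={n:>3}  observed spread={observed[n]:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

The observed spread should shrink as n grows and track the formula closely, which is the "High" to "Low" pattern in the table.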

Common Misunderstandings

  • The CLT does not say that the data itself becomes normal
  • It applies to the distribution of sample means, not to individual observations
  • Larger samples give more reliable estimates, but they do not change the shape of the population

Quick Check

Does the Central Limit Theorem require the population to be normally distributed?


Practice Quiz

Question 1:
What does a sampling distribution describe?


Question 2:
What happens to variability as sample size increases?


Question 3:
What shape does the sampling distribution approach under the CLT?


Mini Practice

A population has a highly skewed income distribution. Samples of size 40 are taken repeatedly.

  • What can you say about the distribution of the sample means?

What’s Next

In the next lesson, we will study Point Estimates and Margin of Error, which help quantify uncertainty in estimates.