Statistics Lesson 37 – Goodness of Fit | Dataplexa

Chi-Square Goodness of Fit

In the previous lesson, we used the Chi-Square test of independence to study the relationship between two categorical variables.

In this lesson, we focus on a different question:

“Does observed data match an expected distribution?”

The Chi-Square Goodness of Fit test helps answer this question.

What Is the Chi-Square Goodness of Fit Test?

The Chi-Square Goodness of Fit test compares:

Observed frequencies from data
Expected frequencies from a known or assumed distribution

It tests whether the differences between the observed and expected frequencies are larger than random sampling variation would plausibly produce.

When Do We Use This Test?

This test is appropriate when:

There is one categorical variable
We know or assume expected proportions
The data consist of frequency counts, not percentages or averages
Observations are independent

Examples of Use

Testing if a die is fair
Checking if customer choices match expected percentages
Verifying if defects follow a claimed distribution

Setting Up the Hypotheses

Hypothesis	Statement
H₀	The observed distribution matches the expected distribution
H₁	The observed distribution does not match the expected distribution

Observed vs Expected Frequencies

Observed frequencies come directly from data.

Expected frequencies are calculated using:

Expected Frequency = Total × Expected Proportion

Step-by-Step Example

A company claims that customers choose three subscription plans in the following proportions:

Basic: 50%
Standard: 30%
Premium: 20%

A sample of 200 customers gives the observed counts below. Each expected count is 200 multiplied by the claimed proportion:

Plan	Observed	Expected
Basic	90	100
Standard	70	60
Premium	40	40
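
The Expected column can be reproduced directly from the formula above. The following is a minimal Python sketch; the names claimed_percent, sample_size, and expected_counts are illustrative and not part of the lesson. Proportions are stored as whole percentages to keep the arithmetic exact.

```python
# Claimed plan shares in percent, taken from the company's claim above.
claimed_percent = {"Basic": 50, "Standard": 30, "Premium": 20}
sample_size = 200  # number of customers in the sample


def expected_counts(percent_by_category, total):
    """Expected frequency = total x expected proportion, per category."""
    return {name: total * pct / 100 for name, pct in percent_by_category.items()}


expected = expected_counts(claimed_percent, sample_size)
print(expected)  # {'Basic': 100.0, 'Standard': 60.0, 'Premium': 40.0}
```

The expected counts always add up to the sample size, which is a quick sanity check before moving on.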

Chi-Square Statistic Concept

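The chi-square statistic measures how far observed frequencies deviate from expected frequencies. For each category, the difference between the observed and expected count is squared and divided by the expected count, and the results are added across all categories.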

Larger deviations produce larger chi-square values.
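
Written in symbols and applied to the subscription example, the calculation looks like this. The notation is an assumption for illustration: O_i and E_i stand for the observed and expected counts in category i, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
       = \frac{(90 - 100)^2}{100} + \frac{(70 - 60)^2}{60} + \frac{(40 - 40)^2}{40}
       = 1 + 1.667 + 0
       \approx 2.667
```

The Premium plan contributes nothing because its observed count equals its expected count.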

Degrees of Freedom

For the goodness of fit test:

Degrees of freedom = Number of categories − 1

In this example:

Degrees of freedom = 3 − 1 = 2

Decision Rule

Using the p-value approach, where α is the chosen significance level:

If p-value ≤ α → Reject H₀
If p-value > α → Fail to reject H₀
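
The decision rule can be applied to the subscription example with a short Python sketch. Two things here are assumptions rather than part of the lesson: the significance level α = 0.05, and the function names. The p-value uses a closed form that is valid only when the degrees of freedom equal 2, as in this example; for other cases a statistics library such as SciPy (scipy.stats.chisquare) would normally be used instead.

```python
import math

observed = [90, 70, 40]   # Basic, Standard, Premium (sample of 200)
expected = [100, 60, 40]  # 200 x claimed proportions
alpha = 0.05              # assumed significance level, not given in the lesson


def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def p_value_df2(statistic):
    """Upper-tail chi-square probability; this closed form holds only for df = 2."""
    return math.exp(-statistic / 2)


statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1
assert df == 2  # the closed form above is only valid in this case
p_value = p_value_df2(statistic)
decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"

print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
print(decision)
```

With these numbers the p-value is about 0.264, which is above 0.05, so the sample gives no strong evidence against the company's claimed proportions.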

Interpretation

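Rejecting the null hypothesis means the sample provides statistically significant evidence that the observed distribution differs from the expected distribution.

Failing to reject means there is no strong evidence against the claimed distribution. It does not prove that the claimed distribution is correct.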

Goodness of Fit vs Independence

Aspect	Goodness of Fit	Independence
Variables	One categorical variable	Two categorical variables
Purpose	Match to expected distribution	Check relationship
Table type	One-way table	Contingency table

Common Mistakes to Avoid

Using percentages instead of counts
Ignoring low expected frequencies
Confusing this test with the test of independence
Assuming rejection means the model is useless
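
One way to guard against the low-expected-frequency mistake is to check the expected counts before running the test. The sketch below assumes the common rule of thumb that every category should have an expected count of at least 5; the threshold, the function name, and the smaller sample size of 20 are assumptions for illustration, not values from the lesson.

```python
MIN_EXPECTED = 5  # rule-of-thumb threshold, an assumption rather than a fixed law


def low_expected_categories(percent_by_category, total, minimum=MIN_EXPECTED):
    """Return categories whose expected count (total x proportion) is below minimum."""
    return [
        name
        for name, pct in percent_by_category.items()
        if total * pct / 100 < minimum
    ]


claimed_percent = {"Basic": 50, "Standard": 30, "Premium": 20}

print(low_expected_categories(claimed_percent, 200))  # lesson's sample size: []
print(low_expected_categories(claimed_percent, 20))   # hypothetical small sample: ['Premium']
```

When a category is flagged, typical remedies are collecting more data or merging sparse categories before testing.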

Quick Check

How many categorical variables are used in this test?

Practice Quiz

Question 1:
What does the goodness of fit test compare?

Question 2:
What are the degrees of freedom based on?

Question 3:
Does this test prove the expected distribution is correct?

Mini Practice

A lottery claims that numbers 1–5 are equally likely.

What test would you use?
What would rejecting H₀ imply?

What’s Next

In the next lesson, we will study One-Way ANOVA, which compares means across multiple groups.
