Chi-Square Goodness of Fit
In the previous lesson, we used the Chi-Square test to study the relationship between two categorical variables.
In this lesson, we focus on a different question:
“Does observed data match an expected distribution?”
The Chi-Square Goodness of Fit test helps answer this question.
What Is the Chi-Square Goodness of Fit Test?
The Chi-Square Goodness of Fit test compares:
- Observed frequencies from data
- Expected frequencies from a known or assumed distribution
It tests whether the observed data fits the expected pattern.
When Do We Use This Test?
This test is appropriate when:
- There is one categorical variable
- We know or assume expected proportions
- The data are frequency counts (not percentages or averages)
- Observations are independent
Examples of Use
- Testing if a die is fair
- Checking if customer choices match expected percentages
- Verifying if defects follow a claimed distribution
Setting Up the Hypotheses
| Hypothesis | Statement |
|---|---|
| H₀ | The observed distribution matches the expected distribution |
| H₁ | The observed distribution does not match the expected distribution |
Observed vs Expected Frequencies
Observed frequencies come directly from data.
Expected frequencies are calculated using:
Expected Frequency = Total × Expected Proportion
Step-by-Step Example
A company claims that customers choose three subscription plans in the following proportions:
- Basic: 50%
- Standard: 30%
- Premium: 20%
A sample of 200 customers gives the observed counts below. Each expected count is 200 times the claimed proportion:
| Plan | Observed | Expected |
|---|---|---|
| Basic | 90 | 100 |
| Standard | 70 | 60 |
| Premium | 40 | 40 |
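The Expected column follows from Expected Frequency = Total × Expected Proportion. A minimal Python sketch using the proportions and sample size from this example (the variable names are illustrative, not part of the lesson):

```python
# Claimed proportions and sample size from the subscription example.
claimed = {"Basic": 0.50, "Standard": 0.30, "Premium": 0.20}
total = 200

# Expected Frequency = Total × Expected Proportion
expected = {plan: total * p for plan, p in claimed.items()}

for plan, count in expected.items():
    print(f"{plan}: {count:.0f}")
```

The expected counts add up to the sample size, just as the observed counts do.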
Chi-Square Statistic Concept
The chi-square statistic measures how far observed frequencies deviate from expected frequencies:
χ² = Σ (Observed − Expected)² / Expected
The sum runs over all categories. Larger deviations produce larger chi-square values.
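For the subscription example, the calculation can be sketched in Python using the counts from the table above (the function name `chi_square_statistic` is illustrative):

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Counts for Basic, Standard, Premium from the example table.
observed = [90, 70, 40]
expected = [100, 60, 40]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 3))  # 1.0 + 1.667 + 0.0 = 2.667
```

Basic and Standard each miss their expected count by 10 customers, but Standard contributes more to the statistic because its expected count is smaller.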
Degrees of Freedom
For the goodness of fit test, when the expected proportions are specified in advance rather than estimated from the data:
Degrees of freedom = Number of categories − 1
In this example:
3 − 1 = 2
Decision Rule
Using the p-value approach, where α is the chosen significance level (commonly 0.05):
- If p-value ≤ α → Reject H₀
- If p-value > α → Fail to reject H₀
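To apply this rule to the subscription example, the statistic has to be converted to a p-value. The sketch below assumes α = 0.05 (the lesson does not fix a value for this example) and uses the fact that, for 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(−χ²/2). Other degrees of freedom need a statistics library, for example SciPy's `scipy.stats.chi2.sf`.

```python
import math

# Counts for Basic, Standard, Premium from the example table.
observed = [90, 70, 40]
expected = [100, 60, 40]
alpha = 0.05  # assumed significance level

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# The closed form below is valid only for df == 2.
assert df == 2
p_value = math.exp(-chi2 / 2)

decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}: {decision}")
```

With p ≈ 0.264, which is greater than 0.05, the sample does not provide strong evidence against the company's claimed proportions.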
Interpretation
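Rejecting the null hypothesis means the data provide evidence that the observed distribution differs from the expected distribution.
Failing to reject means there is not enough evidence against the claimed distribution. It does not prove that the claimed distribution is correct.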
Goodness of Fit vs Independence
| Aspect | Goodness of Fit | Independence |
|---|---|---|
| Variables | One categorical variable | Two categorical variables |
| Purpose | Match to expected distribution | Check relationship |
| Table type | One-way table | Contingency table |
Common Mistakes to Avoid
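- Entering percentages or proportions instead of raw counts
- Ignoring low expected frequencies (a common rule of thumb is that every expected count should be at least 5)
- Confusing this test with the test of independence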
- Assuming rejection means the model is useless
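The first two mistakes can be caught with a quick check before running the test. A minimal sketch, assuming the common rule of thumb that every expected count should be at least 5 (the threshold and the helper name `check_inputs` are assumptions, not from the lesson):

```python
def check_inputs(observed, expected, min_expected=5):
    """Return a list of warnings about goodness-of-fit inputs."""
    warnings = []
    # Fractional values suggest proportions were entered instead of counts.
    if any(o < 0 or o != int(o) for o in observed):
        warnings.append("observed values are not whole-number counts")
    # Observed and expected totals should agree.
    if abs(sum(observed) - sum(expected)) > 1e-9:
        warnings.append("observed and expected totals differ")
    # Small expected counts make the chi-square approximation unreliable.
    if any(e < min_expected for e in expected):
        warnings.append(f"some expected counts are below {min_expected}")
    return warnings

print(check_inputs([90, 70, 40], [100, 60, 40]))  # []
print(check_inputs([0.45, 0.35, 0.20], [0.5, 0.3, 0.2]))
```

The second call passes proportions rather than counts, so it triggers warnings, while the counts from the subscription example pass cleanly.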
Quick Check
How many categorical variables are used in this test?
One categorical variable.
Practice Quiz
Question 1:
What does the goodness of fit test compare?
Observed frequencies to expected frequencies.
Question 2:
What are degrees of freedom based on?
Number of categories minus one.
Question 3:
Does this test prove the expected distribution is correct?
No. It only checks whether the data are consistent with it.
Mini Practice
A lottery claims that numbers 1–5 are equally likely.
- What test would you use?
- What would rejecting H₀ imply?
The Chi-Square Goodness of Fit test. Rejecting H₀ implies there is evidence that the numbers are not equally likely.
What’s Next
In the next lesson, we will study One-Way ANOVA, which compares means across multiple groups.