Statistics Lesson 37 – Goodness of Fit | Dataplexa

Chi-Square Goodness of Fit

In the previous lesson, we used the Chi-Square test to study the relationship between two categorical variables.

In this lesson, we focus on a different question:

“Does observed data match an expected distribution?”

The Chi-Square Goodness of Fit test helps answer this question.


What Is the Chi-Square Goodness of Fit Test?

The Chi-Square Goodness of Fit test compares:

  • Observed frequencies from data
  • Expected frequencies from a known or assumed distribution

It tests whether the observed data fits the expected pattern.


When Do We Use This Test?

This test is appropriate when:

  • There is one categorical variable
  • We know or assume expected proportions
  • Data consists of frequency counts
  • Observations are independent

Examples of Use

  • Testing if a die is fair
  • Checking if customer choices match expected percentages
  • Verifying if defects follow a claimed distribution

Setting Up the Hypotheses

Hypothesis    Statement
H₀            The observed distribution matches the expected distribution
H₁            The observed distribution does not match the expected distribution

Observed vs Expected Frequencies

Observed frequencies come directly from data.

Expected frequencies are calculated using:

Expected Frequency = Total × Expected Proportion


Step-by-Step Example

A company claims that customers choose three subscription plans in the following proportions:

  • Basic: 50%
  • Standard: 30%
  • Premium: 20%

A sample of 200 customers gives the following results:

Plan        Observed    Expected
Basic       90          100
Standard    70          60
Premium     40          40
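The Expected column can be reproduced from the claimed proportions and the sample size. The following is a minimal sketch; the variable names (claimed, observed, expected) are our own choices for illustration and are not part of the lesson.

```python
# Claimed proportions and observed counts from the subscription-plan example.
claimed = {"Basic": 0.50, "Standard": 0.30, "Premium": 0.20}
observed = {"Basic": 90, "Standard": 70, "Premium": 40}

total = sum(observed.values())  # 200 customers in the sample

# Expected Frequency = Total x Expected Proportion
expected = {plan: total * p for plan, p in claimed.items()}

# Print one row per plan: name, observed count, expected count.
for plan in claimed:
    print(f"{plan:<10}{observed[plan]:>6}{expected[plan]:>8.0f}")
```

Note that the expected counts add up to the same total as the observed counts, which is a quick sanity check before running the test.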

Chi-Square Statistic Concept

The chi-square statistic measures how far observed frequencies deviate from expected frequencies.

Larger deviations produce larger chi-square values.
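The lesson does not write the formula out, so here it is in its standard (Pearson) form: χ² = Σ (O − E)² / E, where O is an observed count, E is the matching expected count, and the sum runs over all categories. As a rough sketch, the statistic for the subscription-plan example can be computed like this (the list names are illustrative):

```python
# Observed and expected counts for Basic, Standard, Premium.
observed = [90, 70, 40]
expected = [100, 60, 40]

# Each category contributes (O - E)^2 / E to the statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print(contributions)          # per-category contributions
print(round(chi_square, 3))   # 2.667
```

The Premium plan contributes nothing because its observed count equals its expected count; the Standard plan contributes the most because the same gap of 10 is measured against a smaller expected count.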


Degrees of Freedom

For the goodness of fit test, when the expected proportions are fully specified in advance:

Degrees of freedom = Number of categories − 1

In this example:

3 − 1 = 2


Decision Rule

Using the p-value approach:

  • If p-value ≤ α → Reject H₀
  • If p-value > α → Fail to reject H₀
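To see the decision rule in action, the sketch below applies it to the subscription-plan example. Two assumptions are ours, not the lesson's: the significance level α = 0.05, and the shortcut used for the p-value. With exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(−χ²/2); this closed form does not hold for other degrees of freedom.

```python
import math

# Counts from the subscription-plan example.
observed = [90, 70, 40]
expected = [100, 60, 40]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # number of categories - 1 = 2
alpha = 0.05             # assumed significance level (not given in the lesson)

# Closed form valid only for df = 2: P(X >= x) = exp(-x / 2).
assert df == 2
p_value = math.exp(-chi_square / 2)

decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"chi-square = {chi_square:.3f}, p-value = {p_value:.4f} -> {decision}")
```

Here the p-value is about 0.26, well above 0.05, so we fail to reject H₀. For other degrees of freedom, a statistics library is the usual route; for example, scipy.stats.chisquare takes observed and expected counts and returns the statistic and p-value.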

Interpretation

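Rejecting the null hypothesis means the data provide statistically significant evidence that the observed distribution differs from the expected distribution.

Failing to reject means there is no strong evidence against the claimed distribution. It does not prove that the claimed distribution is correct.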


Goodness of Fit vs Independence

Aspect        Goodness of Fit                   Independence
Variables     One categorical variable          Two categorical variables
Purpose       Match to expected distribution    Check relationship
Table type    One-way table                     Contingency table

Common Mistakes to Avoid

  • Using percentages instead of counts
  • Ignoring low expected frequencies (a common rule of thumb is that every expected count should be at least 5)
  • Confusing this test with the test of independence
  • Assuming rejection means the model is useless
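The low-expected-frequency mistake is easy to guard against before running the test. The sketch below flags categories whose expected count is too small. The helper name low_expected_cells, the threshold of 5 (a widely used rule of thumb, not a figure from this lesson), and the sample of 20 customers are all hypothetical choices for illustration.

```python
def low_expected_cells(total, proportions, minimum=5):
    """Return the categories whose expected count is below `minimum`."""
    return {
        name: total * p
        for name, p in proportions.items()
        if total * p < minimum
    }

claimed = {"Basic": 0.50, "Standard": 0.30, "Premium": 0.20}

# With the lesson's sample of 200, every expected count is large enough.
print(low_expected_cells(200, claimed))

# With a hypothetical sample of only 20, Premium expects 20 x 0.20 = 4.
print(low_expected_cells(20, claimed))
```

When a category is flagged, common remedies are to collect more data or to merge small categories before testing.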

Quick Check

How many categorical variables are used in this test?


Practice Quiz

Question 1:
What does the goodness of fit test compare?


Question 2:
What are degrees of freedom based on?


Question 3:
Does this test prove the expected distribution is correct?


Mini Practice

A lottery claims that numbers 1–5 are equally likely.

  • What test would you use?
  • What would rejecting H₀ imply?

What’s Next

In the next lesson, we will study One-Way ANOVA, which compares means across multiple groups.