Statistics Lesson 36 – Chi-Square Test | Dataplexa

Chi-Square Test for Independence

Up to this point, we have worked mostly with numerical summaries (means, proportions, regression).

Many real-world questions, however, involve categorical variables.

The Chi-Square Test for Independence helps us determine whether two categorical variables are related or independent.


What Does “Independence” Mean?

Two categorical variables are independent if the distribution of one variable is the same for every value of the other.

If knowing the value of one variable gives information about the other, then the variables are dependent.


When Do We Use the Chi-Square Test for Independence?

This test is used when:

  • Both variables are categorical
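  • The data are counts organized in a contingency table
  • Observations are independent of one another (each subject is counted in exactly one cell)
  • Expected frequencies are sufficiently large (a common rule of thumb is at least 5 in every cell)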

Contingency Table

A contingency table summarizes the joint distribution of two categorical variables.

Example: Relationship between gender and product preference.

          Product A   Product B   Total
Male             40          60     100
Female           70          30     100
Total           110          90     200
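
As a minimal sketch, the table above can be stored as a nested list and its margins recomputed in Python. The variable names (`observed`, `row_totals`, `col_totals`, `grand_total`) are illustrative choices, not part of the lesson.

```python
# Observed counts from the example table.
# Rows: Male, Female. Columns: Product A, Product B.
observed = [
    [40, 60],  # Male
    [70, 30],  # Female
]

# Marginal totals: sum across each row, down each column, and over the whole table.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

print(row_totals)   # [100, 100]
print(col_totals)   # [110, 90]
print(grand_total)  # 200
```

Only the raw counts are stored; the totals are derived from them, which avoids the margins drifting out of sync with the cells.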

Setting Up the Hypotheses

Hypothesis   Statement
H₀           The two variables are independent
H₁           The two variables are dependent

Expected Frequencies

The chi-square test compares:

  • Observed frequencies
  • Expected frequencies (the counts we would expect if the variables were independent)

The expected frequency for each cell is calculated as:

(Row Total × Column Total) ÷ Grand Total


Example: Expected Frequency

Expected number of males choosing Product A:

(100 × 110) ÷ 200 = 55
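
The same formula can be applied to every cell at once. The sketch below assumes the table is a nested list of counts; the helper name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(observed):
    """Return the table of counts expected if the two variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # (Row Total x Column Total) / Grand Total, for each cell.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[40, 60], [70, 30]]  # gender vs. product preference example
expected = expected_frequencies(observed)
print(expected)  # [[55.0, 45.0], [55.0, 45.0]]
```

Because both rows have the same total (100), males and females get the same expected counts: 55 for Product A and 45 for Product B.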


The Chi-Square Statistic

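The chi-square statistic measures how far the observed counts are from the expected counts, relative to the expected counts, summed over every cell of the table:

χ² = Σ (Observed − Expected)² ÷ Expected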

Large differences → stronger evidence against independence.
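
A minimal sketch of that calculation, assuming the usual definition of the statistic (the squared difference between observed and expected counts, divided by the expected count, summed over all cells). The function name `chi_square_statistic` is illustrative, and the expected counts are those of the gender and product example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[40, 60], [70, 30]]
expected = [[55.0, 45.0], [55.0, 45.0]]  # from (Row Total x Column Total) / Grand Total

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 18.18
```

Every cell here differs from its expected count by 15, so each contributes 225 ÷ 55 or 225 ÷ 45 to the total.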


Degrees of Freedom

Degrees of freedom for the test are:

(rows − 1) × (columns − 1)

For a 2 × 2 table:

(2 − 1) × (2 − 1) = 1
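
The rule depends only on the shape of the table, not on the counts in it. A short sketch, with a hypothetical helper name and a made-up 2 × 3 table used purely to show a second shape:

```python
def degrees_of_freedom(table):
    """(rows - 1) x (columns - 1) for a rectangular contingency table."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom([[40, 60], [70, 30]]))          # 1
print(degrees_of_freedom([[10, 20, 30], [15, 25, 35]]))  # 2 (hypothetical 2 x 3 table)
```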


Decision Rule

Using the p-value approach:

  • If p-value ≤ α → Reject H₀
  • If p-value > α → Fail to reject H₀
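
The sketch below applies this rule to the gender and product example. It is deliberately limited to 1 degree of freedom (a 2 × 2 table), where the p-value has a closed form using only the standard library. The function name `p_value_df1` and the choice α = 0.05 are assumptions made for illustration.

```python
import math


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    # With df = 1, a chi-square variable is the square of a standard normal,
    # so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


chi2 = 18.18   # statistic for the gender/product example (rounded)
alpha = 0.05   # significance level, assumed here

p_value = p_value_df1(chi2)
decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(decision)  # Reject H0
```

For larger tables the closed form above no longer applies, and a statistics library is the usual route. SciPy's `scipy.stats.chi2_contingency`, for instance, returns the statistic, p-value, degrees of freedom, and expected counts together; note that for 2 × 2 tables it applies a continuity correction by default, so its result differs from the uncorrected statistic unless `correction=False` is passed.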

Interpretation in Plain Language

If we reject the null hypothesis, we conclude that there is statistically significant evidence that the two categorical variables are related.

If we fail to reject it, we conclude that the data do not provide sufficient evidence of a relationship. This is not proof that the variables are independent.


Real-World Example

A company studies whether customer satisfaction depends on the type of subscription plan.

A chi-square test helps determine whether satisfaction levels differ by plan type.


Common Mistakes to Avoid

  • Using the test for numerical data
  • Ignoring low expected frequencies
  • Confusing association with causation (rejecting independence does not show that one variable causes the other)
  • Using percentages instead of counts
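
One of these mistakes, ignoring low expected frequencies, is easy to guard against in code. The sketch below flags cells whose expected count falls under a threshold. The default of 5 is a common rule of thumb rather than something stated in this lesson, and both the helper name `low_expected_cells` and the small table are hypothetical.

```python
def low_expected_cells(observed, minimum=5):
    """Return (row, column, expected) for each cell whose expected count is below `minimum`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand_total
            if expected < minimum:
                flagged.append((i, j, expected))
    return flagged


# The gender/product example has no low expected counts.
print(low_expected_cells([[40, 60], [70, 30]]))  # []

# A small, made-up table where three of the four expected counts fall below 5.
small_table = [[3, 2], [1, 14]]
print(low_expected_cells(small_table))  # [(0, 0, 1.0), (0, 1, 4.0), (1, 0, 3.0)]
```

The check takes raw counts as input, which also guards against the last mistake in the list: percentages discard the sample size that the statistic depends on.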

Quick Check

What does rejecting the null hypothesis mean in this test?


Practice Quiz

Question 1:
What type of data does the chi-square test for independence use?


Question 2:
What are expected frequencies based on?


Question 3:
Does this test prove causation?


Mini Practice

A school studies whether study method (online vs classroom) is related to pass/fail outcomes.

  • What test should be used?
  • What does rejecting H₀ imply?

What’s Next

In the next lesson, we will study the Chi-Square Goodness of Fit test, which compares observed counts to those expected under a hypothesized distribution.