Statistics Lesson 36 – Chi-Square Test | Dataplexa

Chi-Square Test for Independence

Up to this point, we have worked mostly with numerical summaries such as means, proportions, and regression.

Many real-world questions, however, involve categorical variables.

The Chi-Square Test for Independence helps us determine whether two categorical variables are related or independent.

What Does “Independence” Mean?

Two categorical variables are independent if the distribution of one variable does not depend on the other.

If knowing the value of one variable gives information about the other, then the variables are dependent.

When Do We Use the Chi-Square Test for Independence?

This test is used when:

Both variables are categorical
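The data are counts organized in a contingency table
Observations are independent (each observation falls into exactly one cell)
Expected frequencies are sufficiently large (a common rule of thumb is at least 5 in every cell)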

Contingency Table

A contingency table summarizes the joint distribution of two categorical variables.

Example: Relationship between gender and product preference.

Gender	Product A	Product B	Total
Male	40	60	100
Female	70	30	100
Total	110	90	200

Setting Up the Hypotheses

Hypothesis	Statement
H₀	The two variables are independent
H₁	The two variables are dependent

Expected Frequencies

The chi-square test compares:

Observed frequencies
Expected frequencies (if variables were independent)

The expected frequency for each cell is calculated as:

(Row Total × Column Total) ÷ Grand Total

Example: Expected Frequency

Expected number of males choosing Product A:

(100 × 110) ÷ 200 = 55
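The same rule applies to every cell, not just the first one. The snippet below is a minimal sketch in plain Python that applies (Row Total × Column Total) ÷ Grand Total to the whole gender and product table. The function name `expected_frequencies` and the list-of-lists layout are illustrative choices, not part of the lesson.

```python
# Observed counts from the gender / product preference table above.
observed = [
    [40, 60],  # Male:   Product A, Product B
    [70, 30],  # Female: Product A, Product B
]


def expected_frequencies(table):
    """Return expected counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Because both rows have the same total (100), both rows get the same expected counts: 55 for Product A and 45 for Product B.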

The Chi-Square Statistic

The chi-square statistic measures how far the observed counts are from the expected counts, summed over every cell of the table.

Large differences → stronger evidence against independence.
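In symbols, the standard (Pearson) form of the statistic adds up, for each cell, the squared gap between the observed and expected count divided by the expected count. The notation O and E with row index i and column index j is introduced here for illustration. The numbers come from the gender and product table, with expected counts obtained from (Row Total × Column Total) ÷ Grand Total.

```latex
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
       = \frac{(40-55)^2}{55} + \frac{(60-45)^2}{45}
       + \frac{(70-55)^2}{55} + \frac{(30-45)^2}{45}
       \approx 18.18
```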

Degrees of Freedom

Degrees of freedom for the test are:

(rows − 1) × (columns − 1)

For a 2 × 2 table:

(2 − 1) × (2 − 1) = 1

Decision Rule

Using the p-value approach:

If p-value ≤ α → Reject H₀
If p-value > α → Fail to reject H₀
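To show how the decision rule plays out, here is a minimal sketch that runs the whole test on the gender and product table using only the Python standard library. Several things are assumptions made for illustration: the function names, the significance level α = 0.05, and the p-value shortcut, which is valid only when there is exactly 1 degree of freedom (as in a 2 × 2 table). No continuity correction is applied.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom.

    With 1 degree of freedom the statistic is a squared standard normal,
    so P(X > x) = erfc(sqrt(x / 2)). Do not use this for larger tables.
    """
    return math.erfc(math.sqrt(statistic / 2))


observed = [[40, 60], [70, 30]]
alpha = 0.05  # assumed significance level

statistic, dof = chi_square_independence(observed)
assert dof == 1  # the shortcut below only applies to a 2 x 2 table
p_value = p_value_one_dof(statistic)

decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"chi-square = {statistic:.2f}, df = {dof}, p-value = {p_value:.6f}")
print(decision)
```

For this table the p-value is far below 0.05, so the sketch rejects H₀. For larger tables a statistics library is the practical choice. For example, `scipy.stats.chi2_contingency` handles any table size, but it applies a continuity correction to 2 × 2 tables by default, so its output can differ from the uncorrected value computed here.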

Interpretation in Plain Language

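If we reject the null hypothesis, we conclude that there is statistically significant evidence of an association between the two categorical variables.

If we fail to reject it, we conclude that the data do not provide enough evidence of a relationship. This does not prove that the variables are independent.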

Real-World Example

A company studies whether customer satisfaction depends on the type of subscription plan.

A chi-square test helps determine whether satisfaction levels differ by plan type.

Common Mistakes to Avoid

Using the test for numerical data
Ignoring low expected frequencies
Confusing association with causation
Using percentages instead of counts
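The last mistake in the list is easy to demonstrate. The sketch below feeds the same gender and product table to the statistic twice: once as raw counts and once as percentages of the grand total. The helper name `chi_square_statistic` is hypothetical. Because the statistic scales with the number of observations, replacing 200 observations by percentages that sum to 100 halves it.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of values."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return sum(
        (value - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i, row in enumerate(table)
        for j, value in enumerate(row)
    )


counts = [[40, 60], [70, 30]]  # 200 observations
grand_total = sum(sum(row) for row in counts)

# Same table expressed as percentages of the grand total (sums to 100).
percentages = [[100 * value / grand_total for value in row] for row in counts]

stat_counts = chi_square_statistic(counts)
stat_percentages = chi_square_statistic(percentages)

print(f"from counts:      {stat_counts:.2f}")
print(f"from percentages: {stat_percentages:.2f}")
```

The pattern in the table is identical in both cases, yet the statistic from percentages is only half as large, which would understate the evidence against independence. The test needs the actual counts.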

Quick Check

What does rejecting the null hypothesis mean in this test?

Practice Quiz

Question 1:
What type of data does the chi-square test for independence use?

Question 2:
What are expected frequencies based on?

Question 3:
Does this test prove causation?

Mini Practice

A school studies whether study method (online vs classroom) is related to pass/fail outcomes.

What test should be used?
What does rejecting H₀ imply?

What’s Next

In the next lesson, we will study Chi-Square Goodness of Fit, which compares observed data to an expected distribution.
