Chi-Square Test for Independence
Up to this point, we have worked mostly with numerical data (means, proportions, regression).
Many real-world questions, however, involve categorical variables.
The Chi-Square Test for Independence helps us determine whether two categorical variables are related or independent.
What Does “Independence” Mean?
Two categorical variables are independent if the distribution of one variable does not depend on the other.
If knowing the value of one variable gives information about the other, then the variables are dependent.
When Do We Use the Chi-Square Test for Independence?
This test is used when:
- Both variables are categorical
- Data is organized in a contingency table
- Observations are independent
- Expected frequencies are sufficiently large
Contingency Table
A contingency table summarizes the joint distribution of two categorical variables.
Example: Relationship between gender and product preference.
| Product A | Product B | Total | |
|---|---|---|---|
| Male | 40 | 60 | 100 |
| Female | 70 | 30 | 100 |
| Total | 110 | 90 | 200 |
Setting Up the Hypotheses
| Hypothesis | Statement |
|---|---|
| H₀ | The two variables are independent |
| H₁ | The two variables are dependent |
Expected Frequencies
The chi-square test compares:
- Observed frequencies
- Expected frequencies (if variables were independent)
The expected frequency for each cell is calculated as:
(Row Total × Column Total) ÷ Grand Total
Example: Expected Frequency
Expected number of males choosing Product A:
(100 × 110) ÷ 200 = 55
The Chi-Square Statistic
The chi-square statistic measures how far observed counts differ from expected counts.
Large differences → stronger evidence against independence.
Degrees of Freedom
Degrees of freedom for the test are:
(rows − 1) × (columns − 1)
For a 2 × 2 table:
(2 − 1) × (2 − 1) = 1
Decision Rule
Using the p-value approach:
- If p-value ≤ α → Reject H₀
- If p-value > α → Fail to reject H₀
Interpretation in Plain Language
If we reject the null hypothesis, we conclude that the two categorical variables are statistically related.
If we fail to reject it, we conclude there is no evidence of a relationship.
Real-World Example
A company studies whether customer satisfaction depends on the type of subscription plan.
A chi-square test helps determine whether satisfaction levels differ by plan type.
Common Mistakes to Avoid
- Using the test for numerical data
- Ignoring low expected frequencies
- Confusing independence with causation
- Using percentages instead of counts
Quick Check
What does rejecting the null hypothesis mean in this test?
The two categorical variables are dependent.
Practice Quiz
Question 1:
What type of data does the chi-square test for independence use?
Categorical data.
Question 2:
What are expected frequencies based on?
Assumption that variables are independent.
Question 3:
Does this test prove causation?
No. It only tests association.
Mini Practice
A school studies whether study method (online vs classroom) is related to pass/fail outcomes.
- What test should be used?
- What does rejecting H₀ imply?
Chi-square test for independence. Rejecting H₀ implies study method and outcome are related.
What’s Next
In the next lesson, we will study Chi-Square Goodness of Fit, which compares observed data to an expected distribution.