Correlation: Pearson and Spearman
In earlier lessons, we compared means using hypothesis tests. Now we move to a different question:
“How are two variables related to each other?”
Correlation measures the strength and direction of a relationship between two variables.
What Is Correlation?
Correlation quantifies how two numerical variables move together.
- Positive correlation → variables increase together
- Negative correlation → one increases while the other decreases
- No correlation → no clear relationship
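Correlation coefficients always lie between −1 and +1: the sign gives the direction of the relationship, and the distance from 0 gives its strength.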
Pearson Correlation
The Pearson correlation coefficient measures the linear relationship between two continuous numerical variables.
It is denoted by r.
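To make the definition concrete, here is a minimal sketch of one way to compute r by hand: sum the products of each variable's deviations from its mean, then divide by the square root of the product of the summed squared deviations. The function name `pearson_r` and the study-hours data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r for two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its variable's mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: how the deviations vary together
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    # Denominator: spread of each variable on its own
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 74]
r = pearson_r(hours, scores)
print(round(r, 3))
```

Because the scores rise almost in step with the hours, the result is close to +1. In practice a library routine would normally be used instead of a hand-written function; the sketch only shows what such a routine computes.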
When Is Pearson Correlation Appropriate?
- Both variables are numerical
- The relationship is approximately linear
- No strong outliers are present
- Data are measured on an interval or ratio scale
Interpreting Pearson r
| Value of r | Interpretation |
|---|---|
| +1 | Perfect positive linear relationship |
| 0 | No linear relationship (a non-linear relationship may still exist) |
| −1 | Perfect negative linear relationship |
Numerical Example (Pearson)
Suppose we measure hours studied and exam scores:
- Correlation r = 0.88
This indicates a strong positive linear relationship: as study hours increase, exam scores tend to increase.
Spearman Rank Correlation
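The Spearman correlation measures the strength and direction of the monotonic relationship between two variables: one that consistently increases or consistently decreases, though not necessarily at a constant rate.
It is commonly denoted by ρ (rho) or r_s.
Instead of using raw values, it uses their ranks, which makes it equivalent to the Pearson correlation computed on the ranked data.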
When Is Spearman Correlation Appropriate?
- The data contain outliers
- The relationship is monotonic but not linear
- At least one variable is ordinal
- The normality assumption behind significance tests for Pearson r is violated
Key Difference in Intuition
Pearson asks:
“Do the values change together in a straight-line pattern?”
Spearman asks:
“Do the ranks move together consistently?”
Comparison: Pearson vs Spearman
| Aspect | Pearson | Spearman |
|---|---|---|
| Data type | Numerical | Ordinal or numerical |
| Relationship | Linear | Monotonic |
| Sensitive to outliers | Yes | Much less |
| Uses ranks | No | Yes |
Real-World Example
Consider the relationship between:
- Salary and years of experience
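If salary rises consistently with experience but not at a constant rate, the Spearman coefficient may be larger than the Pearson coefficient, because Spearman rewards any consistent ordering while Pearson rewards only a straight-line pattern.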
Correlation Does NOT Mean Causation
A strong correlation does not imply that one variable causes the other.
Example:
- Ice cream sales and drowning incidents
Both increase in summer because a third factor, hot weather, drives each of them; neither one causes the other.
Quick Check
Which correlation method is more robust to outliers?
Spearman correlation.
Practice Quiz
Question 1:
Which correlation measures linear relationships?
Pearson correlation.
Question 2:
Which correlation uses ranks instead of raw values?
Spearman correlation.
Question 3:
Can correlation prove cause-and-effect?
No. Correlation does not imply causation.
Mini Practice
You are studying the relationship between:
- Customer satisfaction ratings (1 to 5)
- Repeat purchase frequency
Which correlation method is more appropriate, and why?
Spearman correlation, because satisfaction ratings are ordinal: the gap between ratings of 1 and 2 is not necessarily the same as the gap between 4 and 5, so ranks are more meaningful than raw values.
What’s Next
In the next lesson, we will move from correlation to Simple Linear Regression, where we model and predict relationships.