Statistics Lesson 31 – Correlation | Dataplexa

Correlation: Pearson and Spearman

In earlier lessons, we compared means using hypothesis tests. Now we move to a different question:

“How are two variables related to each other?”

Correlation measures the strength and direction of a relationship between two variables.


What Is Correlation?

Correlation quantifies how two numerical variables move together.

  • Positive correlation → both variables tend to increase together
  • Negative correlation → one variable tends to increase while the other decreases
  • Zero (or near-zero) correlation → no consistent linear or monotonic pattern

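The correlation coefficients covered in this lesson always lie between −1 and +1. The sign gives the direction of the relationship, and the magnitude gives its strength.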


Pearson Correlation

The Pearson correlation coefficient measures the linear relationship between two continuous numerical variables.

It is denoted by r.


When Is Pearson Correlation Appropriate?

  • Both variables are numerical, measured on an interval or ratio scale
  • The relationship is approximately linear
  • No strong outliers are present

Interpreting Pearson r

| Value of r | Interpretation |
|---|---|
| +1 | Perfect positive linear relationship |
| 0 | No linear relationship |
| −1 | Perfect negative linear relationship |
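
To make these three reference values concrete, here is a minimal sketch that computes Pearson r directly from its definition. The function name `pearson_r` and the small data sets are illustrative assumptions, not part of the lesson's original material.

```python
import math

def pearson_r(x, y):
    """Pearson r = sum((xi - mx)(yi - my)) / sqrt(sum((xi - mx)^2) * sum((yi - my)^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # points on a rising line
r_down = pearson_r(x, [-3 * v + 5 for v in x])  # points on a falling line
r_curve = pearson_r(x, [v ** 2 for v in x])     # symmetric U-shape

print(r_up, r_down, r_curve)
```

The U-shaped case is worth noting: y is completely determined by x, yet r is 0, because r only captures the straight-line part of a relationship.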

Numerical Example (Pearson)

Suppose we measure hours studied and exam scores:

  • Correlation r = 0.88

This indicates a strong positive linear relationship: as study hours increase, exam scores tend to increase.


Spearman Rank Correlation

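The Spearman correlation measures the strength and direction of a monotonic relationship between two variables: whether one variable consistently increases (or consistently decreases) as the other increases, without requiring a straight line.

Instead of using raw values, it uses their ranks. It is the Pearson correlation computed on the ranked data.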
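
One common way to compute Spearman's coefficient is to rank each variable and then apply the Pearson formula to the ranks. The sketch below follows that approach, assuming tied values share their average rank. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation from its definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

tied_ranks = average_ranks([10, 20, 20, 30])  # the two 20s share rank 2.5

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # always increasing, but not along a straight line
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(tied_ranks, rho, r)
```

Because y always increases with x, the ranks match exactly and the Spearman value is 1, while the Pearson value on the raw numbers falls below 1.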


When Is Spearman Correlation Appropriate?

  • Data contains outliers
  • Relationship is not linear but monotonic
  • Variables are ordinal
  • The data are clearly non-normal (for example, heavily skewed), which undermines the usual significance tests for Pearson r

Key Difference in Intuition

Pearson asks:

“Do the values change together in a straight-line pattern?”

Spearman asks:

“Do the ranks move together consistently?”


Comparison: Pearson vs Spearman

| Aspect | Pearson | Spearman |
|---|---|---|
| Data type | Numerical | Ordinal or numerical |
| Relationship | Linear | Monotonic |
| Sensitive to outliers | Yes | Much less |
| Uses ranks | No | Yes |

Real-World Example

Consider the relationship between:

  • Salary and years of experience

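If salary rises consistently with experience but not at a constant rate, the Spearman correlation may be higher than the Pearson correlation, because Spearman only requires the ordering to be consistent, not the points to fall on a straight line.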


Correlation Does NOT Mean Causation

A strong correlation does not imply that one variable causes the other.

Example:

  • Ice cream sales and drowning incidents

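Both increase in summer because hot weather drives each of them: more people buy ice cream and more people swim. Temperature is a third (confounding) variable, and neither of the two causes the other.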


Quick Check

Which correlation method is more robust to outliers?


Practice Quiz

Question 1:
Which correlation measures linear relationships?


Question 2:
Which correlation uses ranks instead of raw values?


Question 3:
Can correlation prove cause-and-effect?


Mini Practice

You are studying the relationship between:

  • Customer satisfaction ratings (1 to 5)
  • Repeat purchase frequency

Which correlation method is more appropriate, and why?


What’s Next

In the next lesson, we will move from correlation to Simple Linear Regression, where we model and predict relationships.