Scatterplots and Correlation
So far, we have focused on understanding and summarizing single variables. In many real-world problems, however, we want to understand relationships between two variables.
Scatterplots and correlation help us explore how variables move together.
What Is a Scatterplot?
A scatterplot displays the relationship between two numerical variables. Each point on the plot represents one observation.
One variable is plotted on the horizontal axis (X-axis), and the other on the vertical axis (Y-axis).
Simple Example
Suppose we record the number of hours studied and the exam score for students.
| Hours Studied | Exam Score |
|---|---|
| 2 | 55 |
| 4 | 65 |
| 6 | 75 |
| 8 | 85 |
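A scatterplot of these data would show points rising from left to right. Here the points fall exactly on a straight line, because every additional 2 hours of study goes with 10 more points on the exam.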
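To see the rising pattern without any plotting library, the table can be drawn as a rough text scatterplot. This is only a sketch: the helper `text_scatter` is a hypothetical name introduced here, and it spaces points by rank rather than to scale. In practice a plotting library such as matplotlib would normally be used instead.

```python
hours = [2, 4, 6, 8]
scores = [55, 65, 75, 85]


def text_scatter(xs, ys):
    """Return a rough text scatterplot (hypothetical helper, not to scale).

    One row per distinct y value (highest first) and one column per
    distinct x value (lowest first). Each observation is drawn as '*'.
    """
    x_levels = sorted(set(xs))
    y_levels = sorted(set(ys), reverse=True)
    points = set(zip(xs, ys))
    rows = []
    for y in y_levels:
        cells = "".join(" * " if (x, y) in points else "   " for x in x_levels)
        rows.append(f"{y:>3} |{cells}")
    # Horizontal axis with the x values underneath.
    rows.append("    +" + "---" * len(x_levels))
    rows.append("     " + "".join(f"{x:^3}" for x in x_levels))
    return "\n".join(rows)


plot = text_scatter(hours, scores)
print(plot)
```

Each `*` is one student. Reading the printed rows from bottom to top, the stars move to the right, which is the "rising from left to right" pattern.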
Patterns in Scatterplots
When analyzing a scatterplot, we usually look for:
- Direction – upward, downward, or no pattern
- Strength – how closely points follow a pattern
- Form – linear or curved
- Outliers – unusual points
Positive and Negative Relationships
If points tend to rise as we move from left to right, the relationship is positive.
If points tend to fall as we move from left to right, the relationship is negative.
Real-World Examples
- Positive: Hours studied vs exam score
- Negative: Speed vs travel time for a fixed distance
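The negative example can be checked with simple arithmetic, since travel time equals distance divided by speed. The sketch below assumes a fixed distance of 120 km and a handful of speeds; both are illustrative values, not data from this lesson.

```python
DISTANCE_KM = 120  # hypothetical fixed distance

speeds_kmh = [40, 60, 80, 120]  # hypothetical speeds
times_h = [DISTANCE_KM / s for s in speeds_kmh]

for s, t in zip(speeds_kmh, times_h):
    print(f"{s:>3} km/h -> {t:.2f} h")

# Every step up in speed gives a shorter time: a negative relationship.
is_negative = all(t2 < t1 for t1, t2 in zip(times_h, times_h[1:]))
print("time falls as speed rises:", is_negative)
```

Note that the times do not fall by equal amounts for equal increases in speed, so this negative relationship is curved rather than linear.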
What Is Correlation?
Correlation measures the strength and direction of the linear relationship between two numerical variables.
The correlation coefficient is usually denoted by r.
Its value always lies between −1 and +1:
- −1 → Perfect negative correlation
- 0 → No linear correlation
- +1 → Perfect positive correlation
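This lesson does not state a formula for r. Assuming r refers to the Pearson correlation coefficient, which is the most common choice, it can be written for n paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$ as:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when x and y tend to be above or below their means together, and negative when one tends to be above its mean while the other is below. The denominator rescales the result so that it always falls between −1 and +1.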
Interpreting Correlation Values
| Correlation (r) | Interpretation |
|---|---|
| Close to +1 | Strong positive relationship |
| Close to −1 | Strong negative relationship |
| Close to 0 | Weak or no linear relationship |
Numerical Example
If the correlation between study hours and exam scores is 0.85, it indicates a strong positive relationship.
This means that as study hours increase, exam scores tend to increase as well.
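The value 0.85 above is illustrative. As a minimal sketch of how such a number is obtained, assuming r is the Pearson correlation coefficient, the function below (`pearson_r` is a name introduced here) computes it for the four students in the earlier table. Because those points lie exactly on a straight line, the result is a perfect +1 rather than 0.85; real data are usually noisier.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [2, 4, 6, 8]
scores = [55, 65, 75, 85]

r = pearson_r(hours, scores)
print(f"r = {r:.2f}")  # the points are perfectly linear, so r is 1.00
```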
Correlation Does Not Imply Causation
A strong correlation does not, by itself, show that one variable causes the other.
Classic Example
There may be a strong correlation between ice cream sales and sunglasses sales.
This does not mean ice cream causes people to buy sunglasses. A more plausible explanation is a third factor, sunny weather, which drives both.
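A small simulation can make this concrete. In the sketch below, all numbers are invented for illustration: both sales figures are computed from hours of sunshine plus a little noise, and neither one appears in the other's formula. The correlation between them (assuming Pearson's r, via the hypothetical helper `pearson_r`) still comes out close to +1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily data, invented for illustration only.
sunshine_hours = [1, 3, 5, 7, 9, 11]
noise_ice = [3, -2, 1, -3, 2, -1]
noise_sun = [-1, 2, -2, 1, -1, 1]

# Each series depends only on sunshine; neither depends on the other.
ice_cream_sales = [20 + 10 * s + e for s, e in zip(sunshine_hours, noise_ice)]
sunglasses_sales = [5 + 4 * s + e for s, e in zip(sunshine_hours, noise_sun)]

r = pearson_r(ice_cream_sales, sunglasses_sales)
print(f"r(ice cream, sunglasses) = {r:.2f}")  # high, despite no causal link
```

The strong correlation here is produced entirely by the shared dependence on sunshine, which is why a high r alone cannot establish causation.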
When Scatterplots Are Useful
- Exploring relationships between variables
- Detecting trends
- Identifying outliers
- Checking assumptions before regression
Quick Check
What does a correlation value close to 0 indicate?
There is little or no linear relationship between the variables.
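The word "linear" matters in this answer. As a sketch, assuming r is Pearson's coefficient (computed by the hypothetical helper `pearson_r`), the data below follow the curve y = x² exactly, yet r is 0 because the upward and downward halves of the curve cancel out.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative data: y is completely determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")  # 0.00, even though y depends entirely on x
```

This is one reason to look at the scatterplot before relying on r: the plot would show the curve immediately, while the coefficient alone suggests no relationship.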
Practice Quiz
Question 1:
Which plot is best for showing relationships between two numerical variables?
Scatterplot.
Question 2:
If r = −0.9, what kind of relationship exists?
Strong negative relationship.
Question 3:
Does correlation always indicate causation?
No. Correlation does not imply causation.
Mini Practice
A researcher studies the relationship between exercise time and resting heart rate.
- What type of relationship would you expect?
- Would a scatterplot be useful?
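A negative relationship is expected: people who exercise more tend to have lower resting heart rates. Yes, a scatterplot would be useful, because it would show the direction, strength, and form of the relationship and reveal any outliers.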
What’s Next
In the next lesson, we will explore Sampling Distributions and the Central Limit Theorem, which explain why sample statistics behave predictably.