Simple Linear Regression
In the previous lesson, we learned how correlation measures the strength and direction of a linear relationship between two variables.
Correlation tells us how two variables move together, but it does not give us an equation for predicting one from the other.
Simple linear regression takes the next step: it builds a mathematical model to describe and predict relationships.
What Is Simple Linear Regression?
Simple linear regression models the relationship between:
- One independent variable (X)
- One dependent variable (Y)
The goal is to explain how changes in X are associated with changes in Y.
The Regression Equation
The simple linear regression model is written as:
Y = a + bX
- a = intercept
- b = slope
- X = independent variable
- Y = predicted dependent variable
Understanding the Intercept (a)
The intercept is the predicted value of Y when X equals zero.
It has a practical meaning only when X = 0 is realistic and close to the observed data. Otherwise, it is simply the point where the line crosses the Y-axis and should not be interpreted on its own.
Understanding the Slope (b)
The slope tells us the expected change in Y for a one-unit increase in X.
If b is positive, Y increases as X increases. If b is negative, Y decreases as X increases.
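The one-unit interpretation follows directly from the equation. When we compare the predictions at X + 1 and at X, the intercept cancels and only the slope remains:

```latex
\bigl(a + b(X + 1)\bigr) - \bigl(a + bX\bigr) = b
```

So with b = 2 the prediction rises by 2 for each extra unit of X, and with b = −2 it falls by 2.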
Real-World Interpretation
Suppose we model the relationship between:
- X = hours studied
- Y = exam score
If the regression equation is:
Y = 40 + 5X
This means:
- Each additional hour of study is associated with a 5-point increase in the expected score
- A student who studies 0 hours is predicted to score 40
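The two interpretations can be checked by evaluating the example equation at a few study times. This is a minimal sketch. The function name `predict_score` is hypothetical, and the coefficients are the 40 and 5 from the equation above.

```python
def predict_score(hours: float, intercept: float = 40.0, slope: float = 5.0) -> float:
    """Predicted exam score from the example line Y = 40 + 5X."""
    return intercept + slope * hours


# X = 0 returns the intercept; each extra hour adds the slope (5 points).
for hours in [0, 1, 2, 5]:
    print(hours, predict_score(hours))
```

The difference between consecutive predictions is always the slope, whatever starting value of X you choose.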
How the Best-Fit Line Is Chosen
The regression line is chosen using the least squares method.
This method minimizes the sum of the squared vertical distances between observed values and predicted values.
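In simple terms, it finds the straight line whose predictions sit closest to the observed data. Squaring the errors prevents positive and negative errors from canceling out and penalizes large misses more heavily.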
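For reference, the least-squares line has a closed-form solution. Here (x_i, y_i) are the n observed data points, and x̄ and ȳ are the sample means of X and Y. This notation is used only in this block. Choosing a and b as follows minimizes the sum of squared errors (SSE):

```latex
\mathrm{SSE}(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2

b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

One consequence of a = ȳ − b·x̄ is that the fitted line always passes through the point (x̄, ȳ).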
Numerical Example
Consider the following data:
| Hours Studied (X) | Score (Y) |
|---|---|
| 2 | 50 |
| 4 | 60 |
| 6 | 72 |
| 8 | 85 |
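Fitting a least-squares line to this data gives an intercept of 37.5 and a slope of 5.85. Rounded to whole numbers, the regression line is approximately:
Y = 38 + 6X
If a student studies 5 hours:
Predicted score ≈ 38 + 6(5) = 68
With the unrounded coefficients, the prediction is 37.5 + 5.85(5) = 66.75, so rounding the coefficients shifts this prediction by 1.25 points.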
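The least-squares formulas can be applied to the table with a short Python sketch that uses only built-in arithmetic. The function name `least_squares_fit` is hypothetical. For this data it gives an intercept of 37.5 and a slope of 5.85, which round to the 38 and 6 used in the equation.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # How X and Y vary together around their means, and how X varies on its own
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


hours = [2, 4, 6, 8]
scores = [50, 60, 72, 85]
a, b = least_squares_fit(hours, scores)
print(f"Y = {a:.2f} + {b:.2f}X")
print(f"Predicted score at 5 hours: {a + b * 5:.2f}")
```

Because 5 hours is the mean of the X values, the unrounded prediction there equals the mean score, 66.75.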
Regression vs Correlation
| Aspect | Correlation | Regression |
|---|---|---|
| Purpose | Measure strength and direction of a linear relationship | Model the relationship and predict Y from X |
| Direction | Symmetric (X and Y are interchangeable) | Directional (X → Y) |
| Prediction | No | Yes |
Limitations of Simple Linear Regression
- Only models linear relationships
- Sensitive to outliers
- Does not imply causation
- Requires careful interpretation, especially when predicting outside the range of X values in the data (extrapolation)
Quick Check
What does the slope represent in a regression model?
The expected change in Y for a one-unit increase in X.
Practice Quiz
Question 1:
What is the purpose of simple linear regression?
To model and predict the relationship between two variables.
Question 2:
If b = −3, what does this indicate?
The predicted value of Y decreases by 3 units for every 1-unit increase in X.
Question 3:
Does regression prove causation?
No. Regression shows association, not causation.
Mini Practice
A company models advertising spend (X) and sales revenue (Y) using the equation:
Y = 10,000 + 2,000X
- What does the slope mean?
- Predict revenue when X = 3
Each additional unit of ad spend is associated with an increase of 2,000 in predicted revenue.
Predicted revenue = 10,000 + 2,000(3) = 16,000.
What’s Next
In the next lesson, we will learn how to interpret regression output, including coefficients and R-squared.