Regression Residuals and Assumptions
In the previous lesson, we learned how to interpret regression output. However, a regression model is only reliable if certain assumptions are satisfied.
This lesson explains how to check those assumptions using residuals.
What Are Residuals?
A residual is the difference between an observed value and the value predicted by the regression model.
Residual = Observed Y − Predicted Y
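Residuals show how well the model fits individual data points. A positive residual means the model under-predicted that observation; a negative residual means it over-predicted.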
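The definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper names `fit_line` and `residuals` and the hours/scores data are made up for this example and are not part of the lesson.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys):
    """Residual = observed y minus predicted y, for every data point."""
    intercept, slope = fit_line(xs, ys)
    predicted = [intercept + slope * x for x in xs]
    return [y - y_hat for y, y_hat in zip(ys, predicted)]


# Made-up illustrative data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

res = residuals(hours, scores)
for x, y, e in zip(hours, scores, res):
    print(f"x={x}  observed={y}  residual={e:+.2f}")
```

When the fitted line includes an intercept, the least squares residuals always sum to zero (up to rounding), so the total says nothing about fit quality. What matters is how the individual residuals are arranged.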
Why Residuals Matter
If a regression model is appropriate:
- Residuals should scatter randomly around zero
- No clear patterns (such as curves, trends, or funnels) should appear
- The spread of the residuals should be roughly the same across the range of X
Patterns in residuals indicate problems with the model.
Key Assumptions of Simple Linear Regression
| Assumption | Meaning |
|---|---|
| Linearity | Relationship between X and Y is linear |
| Independence | Observations are independent |
| Homoscedasticity | Constant variance of residuals |
| Normality | Residuals are approximately normal |
Checking Linearity
To check linearity, we examine a plot of residuals versus X or residuals versus predicted values.
If the model is appropriate, the residuals scatter randomly around zero.
A curved pattern suggests a nonlinear relationship.
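To see what a curved pattern looks like numerically, the sketch below fits a straight line to data that actually follow a curve and prints the sign of each residual in order of X. The data are synthetic and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data: y equals x squared, so a straight line is the wrong shape
xs = list(range(1, 10))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# One character per observation, ordered by x
signs = "".join("+" if e > 0 else "-" for e in res)
print(signs)  # prints ++-----++
```

The residuals are positive at both ends and negative in the middle, which is the U shape a residual plot would show. Randomly scattered residuals would mix plus and minus signs with no such long runs.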
Checking Independence
Independence means that the residual for one observation tells us nothing about the residual for another.
Violations often occur in:
- Time series data
- Sequential measurements
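When residuals are correlated, the usual standard errors are typically too small, so p-values and confidence intervals can look more precise than they really are.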
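One common numeric check for time-ordered residuals, not covered in this lesson, is the Durbin-Watson statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below is a minimal hand-written version; the function name and both residual series are made up for illustration.

```python
def durbin_watson(res):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den


# Made-up residuals, listed in time order
drifting = [-3, -2, -2, -1, 0, 1, 2, 2, 3]   # neighbours resemble each other
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # neighbours flip sign every step

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)

print(f"drifting:    {dw_drifting:.2f}")
print(f"alternating: {dw_alternating:.2f}")
```

The statistic ranges from 0 to 4. Values near 2 are consistent with no first-order autocorrelation, values toward 0 suggest positive autocorrelation (as in the drifting series), and values toward 4 suggest negative autocorrelation (as in the alternating series).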
Checking Homoscedasticity
Homoscedasticity means the spread of residuals is roughly constant across all values of X.
A funnel-shaped pattern indicates heteroscedasticity, which violates the assumption.
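A rough way to put a number on a funnel shape is to compare the spread of the residuals at high X with the spread at low X. The sketch below is an informal check, not a formal test; `spread_ratio` is a hypothetical helper and the residual values are made up.

```python
import statistics


def spread_ratio(xs, res):
    """Std. dev. of residuals in the upper half of X divided by the lower half."""
    ordered = [e for _, e in sorted(zip(xs, res))]  # residuals sorted by x
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return statistics.pstdev(upper) / statistics.pstdev(lower)


xs = [1, 2, 3, 4, 5, 6, 7, 8]

# Made-up residuals: one set widens with x (a funnel), the other stays even
funnel = [0.5, -0.5, 1, -1, 2, -2, 4, -4]
even = [1, -1, 1, -1, 1, -1, 1, -1]

funnel_ratio = spread_ratio(xs, funnel)
even_ratio = spread_ratio(xs, even)

print(f"funnel: {funnel_ratio:.1f}")
print(f"even:   {even_ratio:.1f}")
```

A ratio near 1 is consistent with constant variance, while a ratio well above or below 1 points to heteroscedasticity. With small samples the ratio is noisy, so it is best read alongside the residual plot rather than instead of it.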
Checking Normality of Residuals
Residuals do not need to be perfectly normal, but they should be approximately symmetric and bell-shaped.
This is often checked using:
- Histograms
- Q–Q plots
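The points behind a Q-Q plot can be computed directly: sort the standardized residuals and pair each with the quantile a standard normal distribution would give at the same rank. The sketch below uses only the standard library. The residual values are made up, `qq_points` is a hypothetical helper, and the plotting positions `(i - 0.5) / n` are one common convention among several.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(res):
    """Pair each theoretical normal quantile with a sorted standardized residual."""
    n = len(res)
    m, s = mean(res), pstdev(res)
    sample = sorted((e - m) / s for e in res)
    # Plotting positions (i - 0.5) / n; other conventions exist
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Made-up residuals for illustration
res = [-2.1, -1.3, -0.8, -0.4, -0.1, 0.2, 0.5, 0.9, 1.2, 1.9]

points = qq_points(res)
for q, z in points:
    print(f"theoretical={q:+.2f}  sample={z:+.2f}")
```

If the residuals are approximately normal, the two columns track each other closely and the plotted points fall near a straight line. Large gaps at the ends indicate heavy tails or skewness.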
Real-World Example
Suppose we model advertising spend and sales.
If the residuals spread out more as spending increases, the size of the model’s errors grows with the level of spending, which is heteroscedasticity.
In such cases, a transformation or different model may be needed.
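As one possible illustration, the sketch below fits a line to synthetic advertising data on the original scale and again after taking logarithms of both variables. A log transformation is only one option, and it assumes all values are positive. The data are invented (sales are about 10 times spend, off by 20% in either direction), and the helpers `fit_line`, `residuals`, and `spread_ratio` are hypothetical.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed y minus predicted y for every data point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, res):
    """Std. dev. of residuals in the upper half of X divided by the lower half."""
    ordered = [e for _, e in sorted(zip(xs, res))]
    half = len(ordered) // 2
    return statistics.pstdev(ordered[-half:]) / statistics.pstdev(ordered[:half])


# Synthetic data, not real figures: errors are proportional to the spend level
spend = [1, 2, 4, 8, 16, 32, 64, 128]
factor = [1.2, 0.8, 1.2, 0.8, 1.2, 0.8, 1.2, 0.8]
sales = [10 * s * f for s, f in zip(spend, factor)]

raw_ratio = spread_ratio(spend, residuals(spend, sales))

log_spend = [math.log(s) for s in spend]
log_sales = [math.log(y) for y in sales]
log_ratio = spread_ratio(log_spend, residuals(log_spend, log_sales))

print(f"raw scale spread ratio: {raw_ratio:.1f}")
print(f"log scale spread ratio: {log_ratio:.1f}")
```

On the raw scale the residuals at high spend are far more spread out than those at low spend. On the log scale the two halves have a similar spread, because a proportional error becomes an additive one after taking logs.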
Consequences of Violating Assumptions
- Biased standard errors
- Incorrect p-values
- Unreliable confidence intervals
- Misleading predictions
Common Mistakes
- Ignoring residual plots
- Assuming regression always works
- Focusing only on R²
- Overlooking outliers
Quick Check
What does a random residual pattern indicate?
That the regression model is likely appropriate.
Practice Quiz
Question 1:
What does heteroscedasticity mean?
The variance of residuals is not constant.
Question 2:
Do residuals need to be perfectly normal?
No, they should be approximately normal.
Question 3:
Which assumption is most affected by time-ordered data?
Independence.
Mini Practice
A regression model shows a clear curve in the residual plot.
- Which assumption is violated?
- What might you consider doing?
Linearity is violated. A nonlinear model or transformation may be needed.
What’s Next
In the next lesson, we will extend regression to Multiple Linear Regression, where more than one predictor is used.