Statistics Lesson 34 – Regression Assumptions | Dataplexa

Regression Residuals and Assumptions

In the previous lesson, we learned how to interpret regression output. However, a regression model is only reliable if certain assumptions are satisfied.

This lesson explains how to check those assumptions using residuals.


What Are Residuals?

A residual is the difference between an observed value and the value predicted by the regression model.

Residual = Observed Y − Predicted Y

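Residuals show how well the model fits individual data points. A positive residual means the model under-predicted that point; a negative residual means it over-predicted.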
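
As a minimal sketch of the formula above, the following Python fits a least-squares line by hand and computes one residual per data point. The data values and the function names (fit_line, residuals) are made up for illustration and are not part of the lesson.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y):
    """Residual = observed y minus predicted y, one value per data point."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


res = residuals(x, y)
for xi, yi, ei in zip(x, y, res):
    # predicted value is recovered as observed minus residual
    print(f"x={xi:.1f}  observed={yi:.1f}  predicted={yi - ei:.2f}  residual={ei:+.2f}")
```

For a line fitted with an intercept, the residuals always sum to zero (up to rounding), so it is their pattern, not their total, that carries diagnostic information.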


Why Residuals Matter

If a regression model is appropriate:

  • Residuals should behave randomly
  • No clear patterns should appear
  • Errors should be evenly spread

Patterns in residuals indicate problems with the model.


Key Assumptions of Simple Linear Regression

Assumption          Meaning
Linearity           The relationship between X and Y is linear
Independence        Observations are independent
Homoscedasticity    Residuals have constant variance
Normality           Residuals are approximately normal

Checking Linearity

To check linearity, we examine a plot of residuals versus X or residuals versus predicted values.

If the model is appropriate, residuals scatter randomly around zero.

A curved pattern suggests a nonlinear relationship.
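
The curved pattern can also be seen numerically, without drawing a plot. The sketch below fits a straight line to deliberately curved, made-up data (y equal to x squared) and prints the sign of each residual in order of increasing X. The helper name ols_residuals is an assumption for this example.

```python
def ols_residuals(x, y):
    """Residuals (observed - predicted) from a least-squares straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Made-up data with a deliberately curved relationship: y = x squared.
x = [float(i) for i in range(1, 10)]
y = [a ** 2 for a in x]

res = ols_residuals(x, y)

# Residual signs in order of increasing x. Long runs of the same sign
# are the numeric footprint of a curve in the residual plot.
signs = "".join("+" if e > 0 else "-" for e in res)
sign_changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
print(signs, "sign changes:", sign_changes)
```

Here the signs come out as ++-----++ with only two sign changes: the line under-predicts at both ends and over-predicts in the middle. Residuals that scatter randomly would switch sign far more often.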


Checking Independence

Independence means that the residual for one observation tells you nothing about the residual for any other observation.

Violations often occur in:

  • Time series data
  • Sequential measurements

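When residuals are correlated, the standard errors reported by the model are usually wrong (often too small), so p-values and confidence intervals can look more convincing than they should.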
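
One common numeric check for time-ordered residuals is the Durbin-Watson statistic. It is not covered in this lesson and is shown here only as an illustrative sketch. It ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values near 0 suggest that neighbouring residuals are similar, and values near 4 suggest that they alternate. Both residual lists below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    # Sum of squared differences between consecutive residuals ...
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    # ... divided by the sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals, in the order the observations were collected.
trending = [1.0, 0.8, 0.6, 0.2, -0.2, -0.6, -0.8, -1.0]      # neighbours are similar
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbours flip sign

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(f"trending:    {dw_trending:.2f}")
print(f"alternating: {dw_alternating:.2f}")
```

The first series gives a value well below 2 and the second a value well above 2, so both would be flagged as violating independence.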


Checking Homoscedasticity

Homoscedasticity means the spread of residuals is roughly constant across all values of X.

A funnel-shaped pattern indicates heteroscedasticity, which violates the assumption.
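
A rough way to quantify a funnel shape is to compare the size of the residuals for small X with their size for large X. The sketch below does this with made-up data whose noise grows in proportion to X. The helper names ols_residuals and spread_ratio are assumptions for this example, and the ratio is an informal diagnostic, not a formal test.

```python
def ols_residuals(x, y):
    """Residuals (observed - predicted) from a least-squares straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def spread_ratio(x, residuals):
    """Mean squared residual in the upper half of x divided by the lower half."""
    paired = sorted(zip(x, residuals))
    half = len(paired) // 2
    lower = [e ** 2 for _, e in paired[:half]]
    upper = [e ** 2 for _, e in paired[half:]]
    return (sum(upper) / len(upper)) / (sum(lower) / len(lower))


# Made-up data: the noise alternates in sign and grows in proportion to x.
x = [float(i) for i in range(1, 13)]
y = [2.0 * a + (0.2 * a if i % 2 else -0.2 * a) for i, a in enumerate(x)]

ratio = spread_ratio(x, ols_residuals(x, y))
print(f"spread ratio (upper half / lower half): {ratio:.1f}")
```

A ratio close to 1 is what constant variance would produce. A ratio far above 1, as with this data, corresponds to a funnel that widens as X increases.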


Checking Normality of Residuals

Residuals do not need to be perfectly normal, but they should be approximately symmetric and bell-shaped.

This is often checked using:

  • Histograms
  • Q–Q plots
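
The idea behind a Q–Q plot can be sketched in a few lines: sort the residuals and pair each one with the standard normal quantile it would be expected to match. If the residuals are roughly normal, the pairs fall close to a straight line, so their correlation is close to 1. The residual values below are hypothetical, and the plotting positions (i − 0.5) / n are one common convention assumed here.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with a theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def correlation(pairs):
    """Pearson correlation between the two coordinates of each pair."""
    xs, ys = zip(*pairs)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals: one roughly symmetric set, one strongly right-skewed set.
symmetric = [-1.9, -1.1, -0.7, -0.3, -0.1, 0.1, 0.4, 0.6, 1.2, 1.8]
skewed = [-0.5, -0.5, -0.4, -0.4, -0.3, -0.3, -0.2, 0.1, 0.9, 4.0]

corr_symmetric = correlation(qq_points(symmetric))
corr_skewed = correlation(qq_points(skewed))
print(f"symmetric: {corr_symmetric:.3f}")
print(f"skewed:    {corr_skewed:.3f}")
```

The symmetric set gives a correlation very close to 1, while the skewed set gives a clearly lower value. On an actual Q–Q plot the skewed set would bend away from the reference line at the upper end.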

Real-World Example

Suppose we model sales as a function of advertising spend.

If the residuals spread out more as spending increases, the size of the model's error grows with the scale of the data. This is heteroscedasticity.

In such cases, a transformation or different model may be needed.
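
As an illustration of what a transformation can do, the sketch below uses made-up spend and sales figures in which the error is multiplicative (sales are 25% above or 20% below a baseline). It compares the residual spread before and after taking logarithms of both variables. The log-log transformation is only one possible choice, assumed here for the example, and the helper names are hypothetical.

```python
import math


def ols_residuals(x, y):
    """Residuals (observed - predicted) from a least-squares straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def spread_ratio(x, residuals):
    """Mean squared residual in the upper half of x divided by the lower half."""
    paired = sorted(zip(x, residuals))
    half = len(paired) // 2
    lower = [e ** 2 for _, e in paired[:half]]
    upper = [e ** 2 for _, e in paired[half:]]
    return (sum(upper) / len(upper)) / (sum(lower) / len(lower))


# Made-up data: sales are proportional to spend, with a multiplicative error.
spend = [float(i) for i in range(1, 13)]
sales = [3.0 * s * (1.25 if i % 2 == 0 else 0.8) for i, s in enumerate(spend)]

# Original scale: the residual spread grows with spend.
raw_ratio = spread_ratio(spend, ols_residuals(spend, sales))

# Log scale: a multiplicative error becomes an additive one of constant size.
log_spend = [math.log(s) for s in spend]
log_sales = [math.log(v) for v in sales]
log_ratio = spread_ratio(log_spend, ols_residuals(log_spend, log_sales))

print(f"spread ratio, original scale: {raw_ratio:.1f}")
print(f"spread ratio, log scale:      {log_ratio:.1f}")
```

On the original scale the residuals in the upper half of the spend range are several times larger than those in the lower half. After the log transformation the two halves have a similar spread.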


Consequences of Violating Assumptions

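When the assumptions do not hold, the usual regression output can no longer be trusted. Typical consequences include:

  • Biased standard errors
  • Incorrect p-values
  • Unreliable confidence intervals
  • Misleading predictions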

Common Mistakes

  • Ignoring residual plots
  • Assuming regression always works
  • Focusing only on R²
  • Overlooking outliers

Quick Check

What does a random residual pattern indicate?


Practice Quiz

Question 1:
What does heteroscedasticity mean?


Question 2:
Do residuals need to be perfectly normal?


Question 3:
Which assumption is most affected by time-ordered data?


Mini Practice

A regression model shows a clear curve in the residual plot.

  • Which assumption is violated?
  • What might you consider doing?

What’s Next

In the next lesson, we will extend regression to Multiple Linear Regression, where more than one predictor is used.