Statistics Lesson 35 – Multiple Regression | Dataplexa

Multiple Linear Regression Overview

In the previous lessons, we worked with simple linear regression, which uses only one predictor.

In real-world problems, outcomes are rarely influenced by just a single factor.

Multiple linear regression extends this approach by using two or more independent variables to explain and predict an outcome.


What Is Multiple Linear Regression?

Multiple linear regression models the relationship between:

  • One dependent variable (Y)
  • Multiple independent variables (X₁, X₂, X₃, …)

It helps us understand how each predictor contributes to Y while holding other variables constant.


The General Regression Equation

The multiple regression model is written as:

Y = a + b₁X₁ + b₂X₂ + b₃X₃ + …

  • a = intercept
  • b₁, b₂, b₃ = coefficients for each predictor
  • X₁, X₂, X₃ = independent variables
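The equation above can be fitted with ordinary least squares. Here is a minimal sketch using NumPy on made-up, noise-free data generated from Y = 2 + 3X₁ + 5X₂, so the fit should recover those coefficients exactly (the data and numbers are invented for illustration):

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 2 + 3*X1 + 5*X2
X1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
Y = 2 + 3 * X1 + 5 * X2

# Design matrix: a leading column of ones gives the intercept a
X = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares solves for (a, b1, b2) at once
coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coeffs
print(round(a, 4), round(b1, 4), round(b2, 4))  # → 2.0 3.0 5.0
```

With real, noisy data the recovered coefficients would only approximate the true values, but the mechanics are the same.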

Why Multiple Regression Is Powerful

  • Accounts for multiple influencing factors
  • Reduces misleading conclusions
  • Improves prediction accuracy
  • Allows control of confounding variables

Real-World Example

Suppose we want to predict house prices.

Possible predictors include:

  • Size of the house
  • Number of bedrooms
  • Location score
  • Age of the property

A multiple regression model can estimate the effect of each factor while accounting for the others.


Interpreting Coefficients

Each coefficient represents the expected change in Y for a one-unit increase in that variable, keeping all other variables constant.

This “holding others constant” idea is critical and often misunderstood.


Interpretation Example

Consider the model:

Price = 50,000 + 200(Size) + 15,000(Bedrooms)

  • Increasing size by one unit increases price by 200
  • Adding one bedroom increases price by 15,000

Each interpretation assumes the other variable stays unchanged.
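We can verify this "one variable at a time" reading directly with the lesson's hypothetical price model (the input sizes below are made up for illustration):

```python
# The lesson's hypothetical model: Price = 50,000 + 200*Size + 15,000*Bedrooms
def price(size, bedrooms):
    return 50_000 + 200 * size + 15_000 * bedrooms

base = price(1500, 3)
bigger = price(1501, 3)      # size up by one unit, bedrooms unchanged
extra_room = price(1500, 4)  # one more bedroom, size unchanged

print(bigger - base)       # → 200
print(extra_room - base)   # → 15000
```

Changing one predictor while freezing the other reproduces exactly the coefficient on that predictor, which is what "holding others constant" means.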


Assumptions of Multiple Linear Regression

The same core assumptions apply as in simple regression:

  • Linearity
  • Independence
  • Homoscedasticity
  • Normality of residuals

Additionally, we must consider:

  • Multicollinearity between predictors

Multicollinearity (Introduction)

Multicollinearity occurs when independent variables are highly correlated with each other.

This can:

  • Distort coefficient estimates
  • Increase standard errors
  • Make interpretation difficult
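One common way to quantify this problem is the variance inflation factor (VIF): regress one predictor on the others and compute 1/(1 − R²). A sketch with two deliberately near-collinear, randomly generated predictors (all numbers here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = np.linspace(0, 10, 50)
X2 = 2 * X1 + rng.normal(scale=0.1, size=50)  # almost an exact copy of X1

# For two predictors, R^2 of X2 regressed on X1 is just their squared correlation
r = np.corrcoef(X1, X2)[0, 1]
vif = 1 / (1 - r**2)
print(f"correlation = {r:.4f}, VIF = {vif:.1f}")
```

A VIF near 1 indicates no multicollinearity; values above roughly 5–10 are conventionally treated as a warning sign, and this pair lands far above that.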

We will study this in more detail in advanced lessons.


Multiple vs Simple Regression

Aspect                  Simple Regression    Multiple Regression
Number of predictors    One                  Two or more
Real-world realism      Limited              High
Interpretation          Direct               Conditional

Common Mistakes

  • Interpreting coefficients without “holding others constant”
  • Ignoring multicollinearity
  • Adding too many predictors blindly
  • Assuming higher R² always means a better model
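The last mistake is easy to demonstrate: R² can never decrease when a predictor is added, even one that is pure noise. A small simulation (with synthetic data) makes the point:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X1 = rng.normal(size=n)
Y = 1 + 2 * X1 + rng.normal(size=n)

def r_squared(X, Y):
    # OLS fit with an intercept column, then R^2 from the residuals
    X = np.column_stack([np.ones(len(Y)), X])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return 1 - (resid @ resid) / ((Y - Y.mean()) @ (Y - Y.mean()))

noise = rng.normal(size=n)  # a predictor with no real relationship to Y
r2_one = r_squared(X1, Y)
r2_two = r_squared(np.column_stack([X1, noise]), Y)
print(r2_two >= r2_one)  # → True: R^2 does not drop when junk is added
```

This is why adjusted R², which penalizes extra predictors, is usually preferred when comparing models of different sizes.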

Quick Check

What does a coefficient represent in multiple regression?


Practice Quiz

Question 1:
Why is multiple regression more realistic than simple regression?


Question 2:
What problem occurs when predictors are highly correlated?


Question 3:
Does adding more predictors always improve a model?


Mini Practice

A company models employee performance using:

  • Experience (years)
  • Training hours
  • Education level

Why might multiple regression be preferred here over simple regression?


What’s Next

In the next lesson, we will move into Chi-Square Tests, which analyze relationships between categorical variables.