Multiple Linear Regression Overview
In the previous lessons, we worked with simple linear regression, which uses only one predictor.
In real-world problems, outcomes are rarely influenced by just a single factor.
Multiple linear regression extends regression by using two or more independent variables to explain and predict an outcome.
What Is Multiple Linear Regression?
Multiple linear regression models the relationship between:
- One dependent variable (Y)
- Multiple independent variables (X₁, X₂, X₃, …)
It helps us understand how each predictor contributes to Y while holding other variables constant.
The General Regression Equation
The multiple regression model is written as:
Y = a + b₁X₁ + b₂X₂ + b₃X₃ + …
- a = intercept
- b₁, b₂, b₃ = coefficients for each predictor
- X₁, X₂, X₃ = independent variables
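As a sketch of how the intercept and coefficients can be estimated, here is a minimal example using NumPy's least-squares solver. All the numbers below are hypothetical, chosen so the true relationship is roughly Y = 10 + 3X₁ + 2X₂:

```python
import numpy as np

# Hypothetical data: two predictors and a noisy outcome.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = 10 + 3 * X1 + 2 * X2 + np.array([0.1, -0.2, 0.05, 0.1, -0.05])

# Design matrix: a column of ones for the intercept a, then X1 and X2.
A = np.column_stack([np.ones_like(X1), X1, X2])

# Least squares returns [a, b1, b2] in one call.
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coeffs
print(a, b1, b2)  # roughly 10, 3, 2
```

The column of ones is what lets the solver estimate the intercept a alongside the slopes.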
Why Multiple Regression Is Powerful
- Accounts for multiple influencing factors
- Reduces misleading conclusions
- Improves prediction accuracy
- Allows control of confounding variables
Real-World Example
Suppose we want to predict house prices.
Possible predictors include:
- Size of the house
- Number of bedrooms
- Location score
- Age of the property
A multiple regression model can estimate the effect of each factor while accounting for the others.
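As an illustration, a model like this could be fit with scikit-learn's `LinearRegression`. The house data below is entirely made up, and the fitted coefficients are for demonstration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical houses: size (sq ft), bedrooms, location score, age (years).
X = np.array([
    [1500, 3, 7, 10],
    [2000, 4, 8, 5],
    [1200, 2, 6, 30],
    [1800, 3, 9, 15],
    [2500, 4, 9, 2],
    [1000, 2, 5, 40],
])
prices = np.array([300_000, 420_000, 220_000, 360_000, 550_000, 180_000])

model = LinearRegression().fit(X, prices)

# One coefficient per predictor, each estimated while accounting for the others.
for name, coef in zip(["size", "bedrooms", "location", "age"], model.coef_):
    print(f"{name}: {coef:.1f}")
```

With only six rows and four predictors this fit is far too small to trust; in practice you would want many more observations than predictors.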
Interpreting Coefficients
Each coefficient represents the expected change in Y for a one-unit increase in that variable, keeping all other variables constant.
This “holding others constant” idea is critical and often misunderstood.
Interpretation Example
Consider the model:
Price = 50,000 + 200(Size) + 15,000(Bedrooms)
- Increasing size by one unit raises the predicted price by 200, with bedrooms held fixed
- Adding one bedroom raises the predicted price by 15,000, with size held fixed
Each interpretation assumes the other variable stays unchanged.
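The "one-unit change, others fixed" arithmetic can be checked directly. This uses the hypothetical model from the example above:

```python
def predicted_price(size, bedrooms):
    """Hypothetical fitted model: Price = 50,000 + 200*Size + 15,000*Bedrooms."""
    return 50_000 + 200 * size + 15_000 * bedrooms

base = predicted_price(1500, 3)           # 50,000 + 300,000 + 45,000 = 395,000
one_more_unit = predicted_price(1501, 3)  # size up by one, bedrooms unchanged
one_more_bed = predicted_price(1500, 4)   # bedrooms up by one, size unchanged

print(one_more_unit - base)  # 200, the size coefficient
print(one_more_bed - base)   # 15000, the bedrooms coefficient
```

Each difference recovers exactly one coefficient because only one variable changed between the two predictions.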
Assumptions of Multiple Linear Regression
The same core assumptions apply as in simple regression:
- Linearity
- Independence
- Homoscedasticity
- Normality of residuals
Additionally, we must consider:
- Multicollinearity between predictors
Multicollinearity (Introduction)
Multicollinearity occurs when independent variables are highly correlated with each other.
This can:
- Make coefficient estimates unstable (small changes in the data can swing them widely)
- Increase standard errors
- Make interpretation difficult
We will study this in more detail in advanced lessons.
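As a preview, one common diagnostic is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 − R²). This is an illustrative NumPy sketch on synthetic data, not a prescribed implementation:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j of X.
    Values near 1 suggest little collinearity; values above ~5-10
    are a common warning sign."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # intercept + other predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # nearly a copy of x1
x3 = rng.normal(size=100)                  # unrelated predictor
X = np.column_stack([x1, x2, x3])

print(vif(X, 0))  # large: x1 and x2 are nearly collinear
print(vif(X, 2))  # close to 1: x3 is independent of the others
```

The near-duplicate predictor produces a large VIF, which is exactly the symptom described above: the model cannot cleanly separate the two correlated variables.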
Multiple vs Simple Regression
| Aspect | Simple Regression | Multiple Regression |
|---|---|---|
| Number of predictors | One | Two or more |
| Realism | Limited | Higher |
| Interpretation | Direct | Conditional |
Common Mistakes
- Interpreting coefficients without “holding others constant”
- Ignoring multicollinearity
- Adding too many predictors blindly
- Assuming higher R² always means a better model
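The last mistake can be demonstrated: plain R² never decreases when a predictor is added, even a pure-noise one, which is why adjusted R² penalizes extra parameters. This is a sketch on synthetic data:

```python
import numpy as np

def r_squared(X, y):
    """R^2 for an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid.var() / y.var()

def adjusted_r_squared(X, y):
    """Adjusted R^2: penalizes the number of predictors p."""
    n, p = X.shape
    r2 = r_squared(X, y)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
x1 = rng.normal(size=30)
y = 2 * x1 + rng.normal(size=30)
noise = rng.normal(size=30)  # predictor with no relation to y

X_small = x1.reshape(-1, 1)
X_big = np.column_stack([x1, noise])

# R^2 can only go up (or stay equal) when a predictor is added.
print(r_squared(X_small, y) <= r_squared(X_big, y))  # True
print(adjusted_r_squared(X_big, y) < r_squared(X_big, y))  # True: the penalty bites
```

This is why comparing models by raw R² alone rewards blindly adding predictors.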
Quick Check
What does a coefficient represent in multiple regression?
The expected change in Y for a one-unit increase in that predictor, holding all other variables constant.
Practice Quiz
Question 1:
Why is multiple regression more realistic than simple regression?
Because real-world outcomes are influenced by multiple factors.
Question 2:
What problem occurs when predictors are highly correlated?
Multicollinearity.
Question 3:
Does adding more predictors always improve a model?
No. It can cause overfitting and interpretation issues.
Mini Practice
A company models employee performance using:
- Experience (years)
- Training hours
- Education level
Why might multiple regression be preferred here over simple regression?
Because performance depends on multiple factors that must be considered together.
What’s Next
In the next lesson, we will move into Chi-Square Tests, which analyze relationships between categorical variables.