Linear Regression
In earlier lessons, you learned how to compare groups and test whether differences exist.
In many practical situations, however, the goal is not just comparison, but prediction and explanation.
Linear regression models the relationship between a dependent variable and one or more independent variables.
What Is Linear Regression?
Linear regression examines how a numerical outcome changes as another variable changes.
It answers questions such as:
- How do sales change with advertising spend?
- How does salary change with years of experience?
- How does performance change with training hours?
In simple linear regression, one predictor variable is used.
The Regression Equation
The relationship is expressed as:
Y = a + bX
Where:
- Y → Dependent variable (outcome)
- X → Independent variable (predictor)
- a → Intercept
- b → Regression coefficient (slope)
The coefficient b indicates how much the predicted value of Y changes for a one-unit increase in X. The intercept a is the predicted value of Y when X is 0.
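SPSS estimates a and b by ordinary least squares, which picks the line that minimizes the sum of squared differences between observed and predicted Y. As a sketch of the standard formulas (the symbols n for the number of observations and X̄, Ȳ for the sample means are notation introduced here, not part of the equation above):

```latex
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```

The slope is the co-variation of X and Y divided by the variation in X, and the intercept is then chosen so that the line passes through the point of means.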
Example Dataset
Consider the relationship between study hours and exam score:
| Student_ID | Study_Hours | Score |
|---|---|---|
| 1801 | 2 | 55 |
| 1802 | 4 | 65 |
| 1803 | 6 | 78 |
| 1804 | 8 | 88 |
The objective is to predict exam score based on study hours.
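To make the objective concrete, the four rows above can be fitted by hand. The following is a minimal Python sketch, assuming an ordinary least-squares fit; the function name `fit_simple_regression` and the variable names are illustrative and not part of SPSS.

```python
# Study hours (X) and exam scores (Y) copied from the table above.
study_hours = [2, 4, 6, 8]
scores = [55, 65, 78, 88]


def fit_simple_regression(x, y):
    """Return (intercept a, slope b) of the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_simple_regression(study_hours, scores)
print(f"Intercept a = {a:.2f}")  # 43.50
print(f"Slope b = {b:.2f}")      # 5.60
print(f"Predicted score for 5 study hours = {a + b * 5:.2f}")  # 71.50
```

For this small dataset the fitted line is roughly Score = 43.5 + 5.6 × Study_Hours, so each additional study hour corresponds to a predicted gain of about 5.6 points.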
Key Assumptions
Linear regression relies on several assumptions:
- Linear relationship between X and Y
- Normal distribution of residuals
- Constant variance (homoscedasticity)
- Independence of observations
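Violations can bias the coefficient estimates or make the standard errors and p-values unreliable, which affects both interpretation and prediction accuracy.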
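These assumptions are usually checked by looking at the residuals, the differences between observed and predicted values. The sketch below computes them in Python for the study-hours data, assuming a least-squares fit; the variable names are illustrative. With only four observations it shows the mechanics rather than a meaningful diagnostic.

```python
# Study hours (X) and exam scores (Y) from the example dataset.
study_hours = [2, 4, 6, 8]
scores = [55, 65, 78, 88]

n = len(study_hours)
mean_x = sum(study_hours) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(study_hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in study_hours)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual = observed score minus predicted score.
predicted = [intercept + slope * x for x in study_hours]
residuals = [y - p for y, p in zip(scores, predicted)]

for x, p, r in zip(study_hours, predicted, residuals):
    print(f"hours={x}  predicted={p:.1f}  residual={r:+.1f}")
```

Residuals that scatter around zero with no visible pattern or trend in spread are consistent with the linearity and constant-variance assumptions; a curve or a funnel shape suggests a violation.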
Running Linear Regression (Menu)
To perform regression using SPSS menus:
- Go to Analyze → Regression → Linear
- Move the dependent variable to Dependent
- Move the predictor to Independent(s)
- Click OK
SPSS produces the Model Summary, ANOVA, and Coefficients tables in the output viewer.
Using SPSS Syntax
REGRESSION
/DEPENDENT Score
/METHOD=ENTER Study_Hours.
This syntax regresses Score (the dependent variable) on Study_Hours, entering the predictor in a single step.
Interpreting the Output
Focus on these key values:
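- R Square – proportion of variance in the dependent variable explained by the model
- Unstandardized coefficient (B) – estimated change in the outcome for a one-unit increase in the predictor
- Sig. (p-value) – result of testing whether the coefficient differs from zero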
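Example interpretation (illustrative values, not the results for the study-hours data above):
- R² = 0.85 → 85% of the variation in score is explained by the model
- B = 4.2 → Each extra study hour is associated with a predicted increase of about 4.2 points
- p < 0.05 → The predictor is statistically significant at the 5% level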
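To show where R Square and B come from, the sketch below computes both in Python for the four-row study-hours dataset, assuming a least-squares fit. The variable names are illustrative, and because the numbers come from that dataset they need not match the example values in the list above.

```python
# Study hours (X) and exam scores (Y) from the example dataset.
study_hours = [2, 4, 6, 8]
scores = [55, 65, 78, 88]

n = len(study_hours)
mean_x = sum(study_hours) / n
mean_y = sum(scores) / n

# Least-squares slope (B) and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(study_hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in study_hours)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R Square = 1 - (unexplained variation / total variation).
ss_residual = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(study_hours, scores)
)
ss_total = sum((y - mean_y) ** 2 for y in scores)
r_squared = 1 - ss_residual / ss_total

print(f"B = {slope:.2f}")                # 5.60
print(f"R Square = {r_squared:.3f}")     # 0.997
```

An R Square this close to 1 reflects how nearly the four points fall on a straight line; real datasets rarely fit this well.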
Common Mistakes
Frequent errors include:
- Assuming causation from regression
- Ignoring assumption diagnostics
- Overinterpreting R²
Regression describes relationships between variables; it does not establish causality.
Quiz 1
What is the purpose of linear regression?
To predict or explain a dependent variable.
Quiz 2
What does the regression coefficient represent?
Change in Y for a one-unit change in X.
Quiz 3
What does R² indicate?
Proportion of variance explained by the model.
Quiz 4
Which SPSS menu is used for regression?
Analyze → Regression → Linear.
Quiz 5
Does regression prove causation?
No.
Mini Practice
Collect data on:
- Advertising spend
- Monthly sales
Run a linear regression to predict sales from advertising spend using Analyze → Regression → Linear, then interpret R² and the coefficients.
What’s Next
In the next lesson, you will learn about Multiple Linear Regression, which uses more than one predictor variable.