Logistic Regression
So far, you have learned regression techniques used to predict numerical outcomes.
However, many real-world problems involve categorical outcomes, such as Yes/No, Pass/Fail, or Buy/Not Buy.
Logistic regression is used when the dependent variable is binary, that is, when it has exactly two categories.
Why Linear Regression Is Not Suitable
Linear regression assumes the outcome variable can take any numerical value.
For binary outcomes:
- Predictions must stay between 0 and 1, but a straight line can produce values outside this range
- The relationship between predictors and the probability of the outcome is S-shaped, not linear
Logistic regression solves this problem by modeling probabilities instead of raw values.
Understanding Probability and Odds
Logistic regression works with probabilities and odds.
Probability ranges from 0 to 1, while odds compare the chance of success to the chance of failure.
Example:
- Probability = 0.75
- Odds = 0.75 / 0.25 = 3
This means the event is three times as likely to occur as not to occur (odds of 3 to 1).
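The probability-to-odds conversion above can be sketched in a couple of lines of Python (SPSS is not needed for this arithmetic):

```python
def odds_from_probability(p: float) -> float:
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1 - p)

def probability_from_odds(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1 + odds)

print(odds_from_probability(0.75))  # 3.0: the event is 3x as likely to occur as not
print(probability_from_odds(3.0))   # 0.75
```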
The Logistic Regression Model
Instead of a straight line, logistic regression uses an S-shaped curve.
It models the logit:
log(p / (1 − p)) = a + bX
Where:
- p → probability of success
- a → intercept
- b → coefficient
- X → predictor
Coefficients are interpreted using odds ratios.
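As a sketch of how the logit is inverted into an S-shaped probability curve, the following Python snippet uses made-up values a = -3.0 and b = 0.8 (illustrative only, not estimates from any real data):

```python
import math

def predicted_probability(x: float, a: float = -3.0, b: float = 0.8) -> float:
    """Invert the logit log(p/(1-p)) = a + bX to get p = 1 / (1 + exp(-(a + bX)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Probabilities stay strictly between 0 and 1, tracing an S-shaped curve.
for x in [0, 2, 4, 6, 8, 10]:
    print(f"X = {x}: p = {predicted_probability(x):.3f}")
```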
Example Dataset
Consider customer purchase behavior:
| Customer_ID | Income | Purchased |
|---|---|---|
| 1901 | 30000 | 0 |
| 1902 | 45000 | 1 |
| 1903 | 60000 | 1 |
| 1904 | 25000 | 0 |
Here:
- Purchased = 1 → customer bought the product
- Purchased = 0 → customer did not buy
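For illustration only, here is a hand-rolled Python fit of the logistic model to the four customers above, using simple gradient ascent on the log-likelihood. SPSS estimates the coefficients by maximum likelihood internally; this sketch just shows the mechanics, and the learning rate and income rescaling are arbitrary choices:

```python
import math

incomes = [30000, 45000, 60000, 25000]
purchased = [0, 1, 1, 0]

# Rescale income so the gradient steps are well behaved.
x = [v / 10000 for v in incomes]

a, b = 0.0, 0.0   # intercept and coefficient, as in log(p/(1-p)) = a + bX
rate = 0.1        # learning rate (arbitrary choice)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(5000):
    for xi, yi in zip(x, purchased):
        p = sigmoid(a + b * xi)
        a += rate * (yi - p)        # gradient of the log-likelihood w.r.t. a
        b += rate * (yi - p) * xi   # gradient of the log-likelihood w.r.t. b

for inc in incomes:
    p = sigmoid(a + b * inc / 10000)
    print(f"Income {inc}: predicted purchase probability {p:.2f}")
```

Higher incomes receive higher predicted purchase probabilities, matching the pattern in the table.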
When to Use Logistic Regression
Logistic regression is appropriate when:
- Dependent variable is binary
- Independent variables are numeric or categorical
- You want probability-based predictions
Common use cases include:
- Customer churn prediction
- Credit approval decisions
- Employee attrition analysis
Running Logistic Regression (Menu)
To run logistic regression in SPSS:
- Go to Analyze → Regression → Binary Logistic
- Move the binary variable to Dependent
- Move predictors to Covariates
- Click OK
SPSS produces classification tables and coefficient estimates.
Using SPSS Syntax
LOGISTIC REGRESSION VARIABLES=Purchased
/METHOD=ENTER Income.
This model predicts purchase probability based on income.
Interpreting the Output
Key elements to interpret:
- B – logistic coefficient
- Exp(B) – odds ratio
- Sig. – significance
Example interpretation:
- Exp(B) = 1.5 → the odds of the outcome increase by 50% for each one-unit increase in the predictor
- p < 0.05 → predictor is statistically significant
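The difference between an odds ratio and a probability change can be checked numerically. In this sketch, B = 0.405 is a made-up coefficient chosen so that Exp(B) is roughly 1.5, matching the example above:

```python
import math

B = 0.405
odds_ratio = math.exp(B)
print(f"Exp(B) = {odds_ratio:.2f}")  # roughly 1.50: odds multiply by 1.5 per unit increase

# Note: an odds ratio of 1.5 is NOT a probability increase of 50%.
p_before = 0.40
odds_before = p_before / (1 - p_before)
odds_after = odds_before * odds_ratio
p_after = odds_after / (1 + odds_after)
print(f"Probability moves from {p_before:.2f} to about {p_after:.2f}")
```

This is why the common mistake of reading Exp(B) as a probability change leads to wrong conclusions: the probability here moves from 0.40 to about 0.50, not to 0.60.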
Common Mistakes
Typical errors include:
- Using linear regression for binary outcomes
- Misinterpreting odds ratios as probabilities
- Ignoring model fit statistics
Logistic regression requires careful interpretation.
Quiz 1
When is logistic regression used?
When the dependent variable is binary.
Quiz 2
What does Exp(B) represent?
Odds ratio.
Quiz 3
Why is linear regression unsuitable here?
Predictions can fall outside 0–1 range.
Quiz 4
Which SPSS menu runs logistic regression?
Analyze → Regression → Binary Logistic.
Quiz 5
Does logistic regression predict probabilities?
Yes.
Mini Practice
Create a dataset with:
- Employee_Age
- Years_Experience
- Promotion (Yes/No)
Use logistic regression to predict promotion likelihood.
Use Binary Logistic Regression and interpret odds ratios.
What’s Next
In the next lesson, you will learn about Data Transformations, used to improve model assumptions and interpretability.