Model Diagnostics
Building a statistical model does not end with estimating coefficients.
Model diagnostics are used to check whether a model is reliable, valid, and appropriate for interpretation.
A model that violates assumptions can lead to incorrect conclusions, even if results appear significant.
Why Model Diagnostics Are Important
Statistical models are based on assumptions.
Diagnostics help answer questions such as:
- Are the residuals normally distributed?
- Is variance constant?
- Are observations independent?
- Are there influential outliers?
Ignoring diagnostics is one of the most common mistakes in data analysis.
What Are Residuals?
A residual is the difference between an observed value and the value predicted by the model.
Residual = Observed − Predicted
Residuals capture what the model fails to explain.
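As a minimal sketch of the formula above, the following Python snippet fits a straight line by ordinary least squares and computes one residual per case. The data values and the names advertising, sales, and fit_line are hypothetical and chosen only for illustration; SPSS performs the equivalent calculation internally.

```python
# Hypothetical data: advertising spend (x) and sales (y).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(advertising, sales)
predicted = [intercept + slope * xi for xi in advertising]

# Residual = Observed - Predicted
residuals = [obs - pred for obs, pred in zip(sales, predicted)]

for obs, pred, res in zip(sales, predicted, residuals):
    print(f"observed={obs:.2f}  predicted={pred:.2f}  residual={res:+.2f}")
```

When the model includes an intercept, least-squares residuals sum to zero (up to rounding), which is why acceptable residual plots are centered around zero.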
Key Diagnostic Checks
Most regression diagnostics focus on:
- Normality of residuals
- Homoscedasticity (constant variance)
- Independence of errors
- Outliers and influential points
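For independence of errors, one widely used summary when cases are ordered in time is the Durbin-Watson statistic. The sketch below computes it in Python from two hypothetical residual series; the function name durbin_watson and the data are assumptions made for illustration. Values near 2 are consistent with no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, listed in time order.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every case
drifting = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]     # neighbours resemble each other

dw_alternating = durbin_watson(alternating)
dw_drifting = durbin_watson(drifting)

print(f"alternating series: {dw_alternating:.2f}")
print(f"drifting series:    {dw_drifting:.2f}")
```

The statistic is only meaningful when the row order reflects a real sequence, such as time.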
Checking Normality of Residuals
Approximately normal residuals support the validity of significance tests and confidence intervals for the coefficients, especially in small samples.
In SPSS, normality can be checked using:
- Histogram of residuals
- Normal Q–Q plot
A roughly bell-shaped histogram, or Q–Q plot points that fall close to the diagonal line, suggests normality.
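To show what a normal Q–Q plot compares, the sketch below pairs sorted, standardized residuals with the standard-normal quantiles expected at the same ranks, then summarizes how closely the pairs follow a straight line. The residual values, the plotting positions (i - 0.5) / n, and the helper name pearson are assumptions for illustration and are not necessarily the choices SPSS makes.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical residuals from a fitted model.
residuals = [-1.2, -0.7, -0.4, -0.1, 0.0, 0.2, 0.3, 0.6, 0.9, 1.4]

n = len(residuals)
center, spread = mean(residuals), stdev(residuals)

# Sample quantiles: standardized residuals in ascending order.
sample_q = sorted((r - center) / spread for r in residuals)

# Theoretical standard-normal quantiles at plotting positions (i - 0.5) / n.
theory_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Points near a straight line give a correlation close to 1.
qq_corr = pearson(theory_q, sample_q)
print(f"Q-Q correlation: {qq_corr:.3f}")
```

A correlation close to 1 matches the visual impression of points hugging the diagonal; it is a rough summary, not a replacement for looking at the plot.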
Checking Homoscedasticity
Homoscedasticity means residual variance is constant across predicted values.
Violating this assumption (heteroscedasticity) makes standard errors unreliable, so p-values and confidence intervals can be misleading.
In SPSS:
- Plot residuals vs predicted values
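A random scatter with roughly even vertical spread suggests homoscedasticity; a funnel or fan shape suggests that the variance changes with the predicted value.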
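As an informal numeric companion to the plot, the sketch below orders cases by predicted value, splits them into a low half and a high half, and compares the average squared residual in each. The data are hypothetical and the split-half comparison is an assumption made for illustration; it is not a formal test and not an SPSS procedure.

```python
# Hypothetical predicted values and residuals; spread grows with the prediction.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.2, 0.2, -0.1, 1.0, -1.5, 1.8, -1.3]

# Order cases by predicted value, then split into a low half and a high half.
pairs = sorted(zip(predicted, residuals))
half = len(pairs) // 2
low = [r for _, r in pairs[:half]]
high = [r for _, r in pairs[half:]]

# Mean squared residual in each half (residuals are assumed centered on zero).
ms_low = sum(r ** 2 for r in low) / len(low)
ms_high = sum(r ** 2 for r in high) / len(high)

spread_ratio = ms_high / ms_low
print(f"low half: {ms_low:.3f}  high half: {ms_high:.3f}  ratio: {spread_ratio:.1f}")
```

A ratio near 1 is consistent with constant variance, while a ratio far from 1, as in this made-up example, corresponds to the funnel shape seen in a residual plot.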
Detecting Outliers and Influential Points
Outliers can distort model estimates.
Common diagnostic measures include:
- Standardized residuals
- Cook’s distance
- Leverage values
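Large values indicate potentially influential cases. Common rules of thumb flag standardized residuals beyond about ±3 and Cook’s distance above 1, although cutoffs vary by source.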
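The three measures can be sketched for a one-predictor model as follows. The data are hypothetical, and the formulas are the standard textbook ones for simple linear regression (uncentered leverage, residual divided by the residual standard error, and Cook’s distance with p = 2 coefficients). SPSS saves a centered version of leverage, so its values may differ from this sketch by 1/n.

```python
# Hypothetical data; the last case lies far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 15.0]

n = len(x)
p = 2  # estimated coefficients: intercept and slope
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - p)

# Leverage (hat value) in simple regression: 1/n + (x_i - mean_x)^2 / Sxx
leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Standardized residual: residual divided by the residual standard error
std_resid = [e / mse ** 0.5 for e in residuals]

# Cook's distance: e_i^2 / (p * MSE) * h_i / (1 - h_i)^2
cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(residuals, leverage)]

for i in range(n):
    print(f"case {i + 1}: z={std_resid[i]:+.2f}  "
          f"leverage={leverage[i]:.2f}  cook={cooks[i]:.2f}")
```

In this made-up example the last case has only a modest standardized residual but by far the largest Cook’s distance, because its high leverage pulls the fitted line toward it. This is why residual size alone is not enough to identify influential cases.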
Running Diagnostics in SPSS (Menu)
To generate diagnostic plots:
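- Go to Analyze → Regression → Linear
- Move the outcome variable into Dependent and the predictors into Independent(s)
- Click Plots
- Move *ZRESID to the Y box and *ZPRED to the X box
- Tick Histogram and Normal probability plot, then click Continue
- Click OK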
SPSS displays diagnostic plots in the output viewer.
SPSS Syntax Example
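* Regress Sales on Advertising and request residual diagnostics.
REGRESSION
  /DEPENDENT Sales
  /METHOD=ENTER Advertising
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /SAVE ZRESID COOK LEVER.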
Interpreting Diagnostic Results
When diagnostics are acceptable:
- Residuals are centered around zero
- No clear pattern in residual plots
- No extreme influential cases
If diagnostics fail:
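- Consider transforming variables (for example, a log transformation of a skewed outcome)
- Investigate outliers, and remove them only with a documented justification
- Use an alternative model better suited to the data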
Common Mistakes
Frequent errors include:
- Ignoring diagnostic plots
- Trusting p-values blindly
- Removing outliers without justification
Diagnostics should guide decisions, not be ignored.
Quiz 1
What is a residual?
Observed value minus predicted value.
Quiz 2
Why check residual normality?
To support valid significance tests and confidence intervals for the coefficients.
Quiz 3
What does homoscedasticity mean?
Constant variance of residuals.
Quiz 4
Which plot checks homoscedasticity?
Residuals vs predicted values plot.
Quiz 5
Should diagnostics be skipped if results look good?
No. Results can appear significant even when the model violates its assumptions.
Mini Practice
Run a linear regression model using any dataset.
Generate residual plots and evaluate whether model assumptions are satisfied.
Plot the residuals, check that the scatter looks random, and look for outliers.
What’s Next
In the next lesson, you will learn about Advanced Charts, used to communicate results more effectively.