Discriminant Analysis
In the previous lesson, you learned how cluster analysis groups observations without prior labels.
Discriminant Analysis is different. It is a supervised classification technique used when group membership is already known.
Its main goal is to classify observations into predefined groups based on predictor variables.
When Discriminant Analysis Is Used
Discriminant analysis is used when:
- The dependent variable is categorical (groups)
- Independent variables are numeric
- Groups are known in advance
Typical questions include:
- Can we classify customers as high or low risk?
- Can employees be classified as high or low performers?
- Can applicants be assigned to acceptance categories?
Cluster Analysis vs Discriminant Analysis
| Aspect | Cluster Analysis | Discriminant Analysis |
|---|---|---|
| Group labels | Unknown | Known |
| Type | Unsupervised | Supervised |
| Main goal | Discover groups | Predict group membership |
Basic Idea Behind Discriminant Analysis
Discriminant analysis creates discriminant functions, which are linear combinations of predictor variables.
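Each function is chosen to maximize the ratio of between-group to within-group variation, so that the group means lie as far apart as possible relative to the spread inside each group. With g groups and p predictors, at most min(g - 1, p) functions are extracted.
Each observation is then assigned to the group whose centroid it is closest to in discriminant space. More generally, SPSS assigns the group with the highest posterior probability, which also accounts for the prior probabilities; with equal priors the two rules agree.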
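The nearest-centroid rule can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation: it assumes two predictors (hypothetical Experience in years and Training_Hours), equal prior probabilities, and a pooled covariance matrix (the equal-covariance assumption). Under those assumptions, choosing the centroid with the smallest Mahalanobis distance gives the same classification as linear discriminant analysis. All data values and function names are invented for illustration.

```python
# Minimal sketch: linear discriminant classification with two predictors.
# Assumptions: equal priors, pooled within-group covariance, made-up data.

def mean(vals):
    return sum(vals) / len(vals)

def pooled_covariance(groups):
    """Pooled within-group 2x2 covariance matrix (divisor N - g)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    n_total = 0
    for rows in groups.values():
        mx = mean([r[0] for r in rows])
        my = mean([r[1] for r in rows])
        for x, y in rows:
            dx, dy = x - mx, y - my
            s[0][0] += dx * dx
            s[0][1] += dx * dy
            s[1][1] += dy * dy
        n_total += len(rows)
    s[1][0] = s[0][1]
    df = n_total - len(groups)
    return [[v / df for v in row] for row in s]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mahalanobis_sq(x, centre, inv):
    """Squared Mahalanobis distance from point x to a group centroid."""
    d0, d1 = x[0] - centre[0], x[1] - centre[1]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))

def fit(groups):
    centroids = {g: (mean([r[0] for r in rows]), mean([r[1] for r in rows]))
                 for g, rows in groups.items()}
    return centroids, inverse_2x2(pooled_covariance(groups))

def classify(x, centroids, inv):
    # Nearest centroid in Mahalanobis distance (equal priors assumed).
    return min(centroids, key=lambda g: mahalanobis_sq(x, centroids[g], inv))

# Hypothetical training data: (Experience, Training_Hours) per group.
groups = {
    "High":    [(8, 40), (9, 45), (10, 42), (7, 38)],
    "Average": [(5, 25), (6, 28), (4, 22), (5, 30)],
    "Low":     [(2, 10), (1, 12), (3, 8), (2, 15)],
}
centroids, inv = fit(groups)
print(classify((6, 27), centroids, inv))  # → Average
```

Using the pooled covariance rather than plain Euclidean distance means predictors with large variances (here, training hours) do not dominate the distance.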
Example Scenario
A company classifies employees into:
- High Performers
- Average Performers
- Low Performers
Predictor variables include:
- Experience
- Training hours
- Performance score
Discriminant analysis builds a model to classify new employees into these performance groups.
Assumptions
Discriminant analysis assumes:
- Multivariate normality
- Equal covariance matrices across groups
- Independent observations
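Violations can reduce classification accuracy and make the significance tests unreliable. Box's M test checks the equal-covariance assumption, although it is sensitive to non-normality and to large samples.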
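To see what the equal-covariance assumption compares, here is a minimal sketch of the raw Box's M statistic, M = (N - g) ln|S_pooled| - Σ (n_i - 1) ln|S_i|. It assumes two predictors (so each covariance matrix is 2x2) and computes only M itself; SPSS additionally converts M to an approximate F test with a p-value, which this sketch does not do. The function names and data are hypothetical.

```python
import math

def cov_2x2(rows):
    """Sample covariance matrix (divisor n - 1) for rows of (x, y)."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def box_m(groups):
    """Raw Box's M for a list of groups; 0 when all group covariances are equal."""
    covs = [cov_2x2(rows) for rows in groups]
    dfs = [len(rows) - 1 for rows in groups]
    df_total = sum(dfs)  # N - g
    pooled = [[sum(d * c[i][j] for d, c in zip(dfs, covs)) / df_total
               for j in range(2)] for i in range(2)]
    return (df_total * math.log(det_2x2(pooled))
            - sum(d * math.log(det_2x2(c)) for d, c in zip(dfs, covs)))

base = [(1, 2), (2, 1), (3, 5), (4, 3)]
shifted = [(x + 10, y + 10) for x, y in base]  # same spread, different means
scaled = [(3 * x, 3 * y) for x, y in base]     # same shape, larger spread
print(round(box_m([base, shifted]), 6))  # equal covariances
print(round(box_m([base, scaled]), 3))   # unequal covariances, M > 0
```

Shifting a group changes its centroid but not its covariance, so M stays at zero; stretching a group inflates its covariance and M grows.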
Running Discriminant Analysis (Menu)
To perform discriminant analysis in SPSS:
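- Go to Analyze → Classify → Discriminant
- Move the group variable into Grouping Variable
- Click Define Range and enter the lowest and highest group codes (for example, 1 and 3)
- Move the predictor variables into Independents
- Under Statistics, select Box's M; under Classify, select Summary table and Leave-one-out classification
- Click OK
SPSS then reports the discriminant functions and, when requested, the classification table.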
SPSS Syntax Example
DISCRIMINANT
  /GROUPS=Performance(1 3)
  /VARIABLES=Experience Training_Hours Score
  /ANALYSIS ALL
  /PRIORS EQUAL
  /STATISTICS=MEAN STDDEV UNIVF BOXM TABLE CROSSVALID
  /CLASSIFY=NONMISSING POOLED.
Interpreting the Output
Key outputs include:
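- Eigenvalues – ratio of between-group to within-group variation for each function (larger means more discriminatory power)
- Canonical correlation – strength of association between each function and group membership
- Group centroids – mean discriminant score of each group
- Classification accuracy – percentage of cases assigned to their actual group
Higher classification accuracy indicates a better model, provided it clearly exceeds the accuracy expected by chance and holds up under cross-validation.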
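Eigenvalues and canonical correlations are two views of the same quantity: for a function with eigenvalue λ, the canonical correlation is sqrt(λ / (1 + λ)), and its share of the discriminating power is λ divided by the sum of all eigenvalues (the "% of Variance" column). The sketch below applies these formulas to two made-up eigenvalues, as you would see for three groups (at most two functions); the helper name is hypothetical.

```python
import math

def summarize_functions(eigenvalues):
    """Per-function % of variance and canonical correlation from eigenvalues."""
    total = sum(eigenvalues)
    rows = []
    for i, lam in enumerate(eigenvalues, start=1):
        pct = 100 * lam / total
        r_c = math.sqrt(lam / (1 + lam))
        rows.append((i, lam, pct, r_c))
    return rows

rows = summarize_functions([3.0, 0.25])  # hypothetical eigenvalues
for i, lam, pct, r_c in rows:
    print(f"Function {i}: eigenvalue={lam:.3f}, "
          f"% of variance={pct:.1f}, canonical r={r_c:.3f}")
```

Here the first function carries most of the separation (about 92% of the variance, canonical r ≈ 0.866), so interpretation would focus on it.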
Classification Results
SPSS provides a classification table showing:
- Correctly classified cases
- Misclassified cases
- Overall accuracy percentage
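The classification table is a cross-tabulation of actual against predicted groups: the diagonal holds correctly classified cases, the off-diagonal cells the misclassifications. A minimal sketch with invented labels (the function name is hypothetical):

```python
from collections import Counter

def classification_table(actual, predicted):
    """Counts of (actual, predicted) pairs plus overall accuracy."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    table = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    correct = sum(counts[(g, g)] for g in labels)
    return table, correct / len(actual)

actual    = ["High", "High", "Average", "Average", "Low", "Low"]
predicted = ["High", "Average", "Average", "Average", "Low", "High"]
table, accuracy = classification_table(actual, predicted)
for group, row in table.items():
    print(group, row)
print(f"Overall accuracy: {accuracy:.1%}")  # → Overall accuracy: 66.7%
```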
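Cross-validated results are especially important for judging model performance, because accuracy computed on the same cases used to build the model is optimistic. In SPSS, cross-validation is leave-one-out: each case is classified by functions derived from all other cases.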
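The gap between ordinary and leave-one-out accuracy can be sketched with a single hypothetical predictor (training hours). With one predictor and equal priors, linear discriminant classification reduces to assigning each case to the nearest group mean, which keeps the example short; this is a simplification, not SPSS's code, and the data are invented.

```python
def nearest_mean(value, groups):
    """Assign a value to the group with the closest mean (1 predictor, equal priors)."""
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    return min(means, key=lambda g: abs(value - means[g]))

def resubstitution_accuracy(groups):
    # Each case is classified by a model that was built using that same case.
    cases = [(g, v) for g, vals in groups.items() for v in vals]
    hits = sum(nearest_mean(v, groups) == g for g, v in cases)
    return hits / len(cases)

def leave_one_out_accuracy(groups):
    # Each case is held out, the group means are recomputed without it,
    # and the held-out case is then classified.
    hits, total = 0, 0
    for g, vals in groups.items():
        for i, v in enumerate(vals):
            reduced = dict(groups)
            reduced[g] = vals[:i] + vals[i + 1:]
            hits += nearest_mean(v, reduced) == g
            total += 1
    return hits / total

groups = {
    "High":    [40, 45, 30, 42],
    "Average": [25, 28, 35, 22],
    "Low":     [10, 12, 18, 20],
}
print(f"Original:        {resubstitution_accuracy(groups):.1%}")  # → 83.3%
print(f"Leave-one-out:   {leave_one_out_accuracy(groups):.1%}")   # → 75.0%
```

The leave-one-out figure is lower because a held-out case can no longer pull its own group mean toward itself.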
Common Mistakes
Typical errors include:
- Using discriminant analysis with categorical predictors
- Ignoring assumption violations
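- Over-trusting classification accuracy computed on the same cases used to build the model
Always check the cross-validated accuracy and compare it with the accuracy expected by chance.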
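Two chance benchmarks commonly used in the discriminant analysis literature are the maximum chance criterion (the share of the largest group, achieved by always predicting that group) and the proportional chance criterion (the sum of squared group proportions). A minimal sketch with hypothetical group sizes:

```python
def chance_criteria(group_sizes):
    """Return (maximum chance, proportional chance) accuracy benchmarks."""
    n = sum(group_sizes)
    props = [size / n for size in group_sizes]
    return max(props), sum(p * p for p in props)

max_c, prop_c = chance_criteria([50, 30, 20])  # hypothetical group sizes
print(f"Maximum chance: {max_c:.0%}, proportional chance: {prop_c:.0%}")
# → Maximum chance: 50%, proportional chance: 38%
```

A cross-validated accuracy of, say, 55% would beat both benchmarks only narrowly, which is weak evidence that the predictors discriminate well.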
Quiz 1
Is discriminant analysis supervised?
Yes.
Quiz 2
Are group labels known in advance?
Yes.
Quiz 3
What do discriminant functions do?
Maximize separation between groups.
Quiz 4
Which SPSS menu runs discriminant analysis?
Analyze → Classify → Discriminant.
Quiz 5
Is cross-validation important?
Yes. It estimates how well the model classifies cases that were not used to build it.
Mini Practice
Create a dataset with a known group variable and at least three numeric predictors.
Run discriminant analysis and evaluate classification accuracy.
Focus on the classification table and the cross-validated accuracy.
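If you need practice data, the sketch below generates a CSV file that can be imported into SPSS. It assumes three groups coded 1 to 3 (matching /GROUPS=Performance(1 3)) and three predictors named as in the syntax example; the group centres, spreads, file name, and function name are all invented for practice purposes.

```python
import csv
import random

def make_dataset(path, n_per_group=30, seed=42):
    """Write a synthetic dataset with a known group variable and 3 predictors."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    # Hypothetical group centres for (Experience, Training_Hours, Score).
    centres = {1: (10, 40, 85), 2: (6, 25, 70), 3: (2, 10, 55)}
    spreads = (2, 6, 8)  # hypothetical standard deviations per predictor
    rows = []
    for group, centre in centres.items():
        for _ in range(n_per_group):
            # Draw normal values and clip at 0 so no predictor is negative.
            values = [max(0.0, round(rng.gauss(m, s), 1))
                      for m, s in zip(centre, spreads)]
            rows.append([group] + values)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Performance", "Experience", "Training_Hours", "Score"])
        writer.writerows(rows)
    return rows

rows = make_dataset("discriminant_practice.csv")
print(len(rows), "cases written")  # → 90 cases written
```

Drawing each group from a normal distribution with the same spreads roughly satisfies the normality and equal-covariance assumptions, so results should be clean; widen one group's spreads to see how Box's M and accuracy react.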
What’s Next
In the next lesson, you will learn about Time Series Analysis, used for analyzing data collected over time.