SPSS Lesson 35 – Discriminant Analysis | Dataplexa

Discriminant Analysis

In the previous lesson, you learned how cluster analysis groups observations without prior labels.

Discriminant Analysis is different. It is a supervised classification technique used when group membership is already known.

Its main goal is to classify observations into predefined groups based on predictor variables.


When Discriminant Analysis Is Used

Discriminant analysis is used when:

  • The dependent variable is categorical (groups)
  • Independent variables are numeric
  • Groups are known in advance

Typical questions include:

  • Can we classify customers as high or low risk?
  • Can employees be classified as high or low performers?
  • Can applicants be assigned to acceptance categories?

Cluster Analysis vs Discriminant Analysis

Aspect       | Cluster Analysis | Discriminant Analysis
Group labels | Unknown          | Known
Type         | Unsupervised     | Supervised
Main goal    | Discover groups  | Predict group membership

Basic Idea Behind Discriminant Analysis

Discriminant analysis creates discriminant functions, which are linear combinations of predictor variables.

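The coefficients of these functions are chosen to maximize the variation between groups relative to the variation within groups.

With equal prior probabilities, each observation is assigned to the group whose centroid is closest in the discriminant space. More generally, SPSS assigns each case to the group with the highest posterior probability.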
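As a sketch of what "linear combination" means here, a discriminant function for p predictors can be written in the general form below. The symbols D, b_0, b_j, and X_j are notation introduced for illustration and are not SPSS output labels.

```latex
D = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p
```

Each case gets a score D on every function. The number of functions extracted is the smaller of (number of groups − 1) and the number of predictors, so three groups yield at most two functions.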


Example Scenario

A company classifies employees into:

  • High Performers
  • Average Performers
  • Low Performers

Predictor variables include:

  • Experience
  • Training hours
  • Performance score

Discriminant analysis builds a model to classify new employees into these performance groups.
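To make the scenario concrete, the following sketch enters a tiny dataset directly in syntax. Every value is invented for illustration, and the coding 1 = High, 2 = Average, 3 = Low is an assumption; the variable names match those used in the syntax example later in this lesson. Fifteen cases are far too few for a real analysis.

```spss
* Hypothetical example data: every value below is made up for illustration.
* Assumed group codes are 1 = High, 2 = Average, 3 = Low.
DATA LIST FREE / Performance Experience Training_Hours Score.
BEGIN DATA
1 9 40 88
1 11 35 91
1 8 42 85
1 12 38 93
1 10 30 87
2 6 25 74
2 7 28 70
2 5 22 76
2 8 20 72
2 6 30 69
3 2 10 55
3 3 15 60
3 1 12 52
3 4 8 58
3 2 18 61
END DATA.

VALUE LABELS Performance
  1 'High Performers' 2 'Average Performers' 3 'Low Performers'.
VARIABLE LEVEL Performance (NOMINAL).
```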


Assumptions

Discriminant analysis assumes:

  • Multivariate normality
  • Equal covariance matrices across groups
  • Independent observations

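Violations can reduce classification accuracy and distort the significance tests, especially in small samples or when group sizes are very unequal.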
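One possible way to screen the first two assumptions is sketched below, using the variable names from this lesson's example. EXAMINE only checks each predictor separately within each group, which is a rough proxy for multivariate normality rather than a direct test of it.

```spss
* Normality plots and tests for each predictor within each group.
EXAMINE VARIABLES=Experience Training_Hours Score BY Performance
  /PLOT=NPPLOT
  /STATISTICS=DESCRIPTIVES
  /NOTOTAL.

* Box M test of equal covariance matrices across groups.
DISCRIMINANT
  /GROUPS=Performance(1 3)
  /VARIABLES=Experience Training_Hours Score
  /ANALYSIS ALL
  /STATISTICS=BOXM.
```

Box's M is sensitive to departures from normality and to large samples, so a significant result should be read with caution rather than treated as an automatic stop.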


Running Discriminant Analysis (Menu)

To perform discriminant analysis in SPSS:

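  • Go to Analyze → Classify → Discriminant
  • Move the group variable into Grouping Variable
  • Click Define Range and enter the minimum and maximum group codes (for example, 1 and 3)
  • Move the predictor variables into Independents
  • Click Classify and select Summary table and Leave-one-out classification
  • Click Continue, then OK

SPSS reports the discriminant functions by default. The classification table appears only when Summary table is selected.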


SPSS Syntax Example


* Classify cases into Performance groups coded 1 to 3 using three predictors.
DISCRIMINANT
  /GROUPS=Performance(1 3)
  /VARIABLES=Experience Training_Hours Score
  /ANALYSIS ALL.

Interpreting the Output

Key outputs include:

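  • Eigenvalues – ratio of between-group to within-group variation for each function; larger values mean stronger discrimination
  • Canonical correlation – strength of the relationship between the discriminant scores and group membership (0 to 1)
  • Group centroids – the mean discriminant score of each group on each function
  • Classification accuracy – the percentage of cases assigned to their actual group

Higher classification accuracy indicates a better model, but only when it clearly exceeds what chance assignment alone would achieve.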
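The eigenvalue and the canonical correlation of a function carry the same information in different forms. As a worked sketch, with λ denoting the eigenvalue and R_c the canonical correlation (symbols introduced here for illustration, with a made-up eigenvalue of 1.5):

```latex
R_c = \sqrt{\frac{\lambda}{1 + \lambda}}
\qquad\text{e.g.}\qquad
\lambda = 1.5 \;\Rightarrow\; R_c = \sqrt{\frac{1.5}{2.5}} = \sqrt{0.6} \approx 0.775
```

Squaring gives R_c² = 0.60, meaning about 60% of the variation in that function's scores lies between groups rather than within them.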


Classification Results

SPSS provides a classification table showing:

  • Correctly classified cases
  • Misclassified cases
  • Overall accuracy percentage

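Cross-validation results are especially important for judging model performance. In leave-one-out classification, each case is classified by functions derived from all the other cases, so the resulting accuracy is less optimistic than the figure in the original table.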
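In syntax, the classification table and its cross-validated counterpart have to be requested explicitly. A minimal sketch, reusing the variable names from this lesson's example:

```spss
* Request the classification table plus leave-one-out cross-validation.
DISCRIMINANT
  /GROUPS=Performance(1 3)
  /VARIABLES=Experience Training_Hours Score
  /ANALYSIS ALL
  /PRIORS EQUAL
  /STATISTICS=TABLE CROSSVALID
  /CLASSIFY=NONMISSING POOLED.
```

PRIORS EQUAL treats every group as equally likely before the predictors are considered. Replacing it with PRIORS SIZE uses the observed group proportions instead, which may suit data where groups differ greatly in size.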


Common Mistakes

Typical errors include:

  • Using discriminant analysis with categorical predictors
  • Ignoring assumption violations
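  • Over-trusting classification accuracy computed on the same cases used to build the model

Validate results with cross-validation or a holdout sample before relying on the model.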
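One way to validate beyond leave-one-out is a holdout split. The sketch below assumes a hypothetical flag variable named Train and an arbitrary 70 percent split; neither comes from the lesson. The seed value is also arbitrary, and whether it makes the split reproducible depends on the active random number generator setting.

```spss
* Hypothetical holdout split: roughly 70 percent of cases flagged for model building.
SET SEED=20240101.
COMPUTE Train = (RV.UNIFORM(0,1) < 0.7).
EXECUTE.

* Derive the functions from cases with Train = 1 only.
DISCRIMINANT
  /GROUPS=Performance(1 3)
  /VARIABLES=Experience Training_Hours Score
  /SELECT=Train(1)
  /ANALYSIS ALL
  /STATISTICS=TABLE.
```

The classification results should then report selected and unselected cases separately. The accuracy for the unselected cases is the more honest estimate, because those cases played no part in deriving the functions.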


Quiz 1

Is discriminant analysis supervised?

Yes.


Quiz 2

Are group labels known in advance?

Yes.


Quiz 3

What do discriminant functions do?

Maximize separation between groups.


Quiz 4

Which SPSS menu runs discriminant analysis?

Analyze → Classify → Discriminant.


Quiz 5

Is cross-validation important?

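Yes. It gives a more realistic estimate of accuracy than reclassifying the cases used to build the model.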


Mini Practice

Create a dataset with a known group variable and at least three numeric predictors.

Run discriminant analysis and evaluate classification accuracy.

Focus on the classification table and the cross-validated accuracy.
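To inspect individual predictions as part of this exercise, the predicted group can be saved to the dataset and tabulated against the actual group. This is a sketch using the lesson's example variable names. Dis_1 is the name SPSS typically assigns to the first saved predicted-group variable, which is an assumption to verify in your own Data Editor.

```spss
* Save the predicted group and posterior probabilities as new variables.
DISCRIMINANT
  /GROUPS=Performance(1 3)
  /VARIABLES=Experience Training_Hours Score
  /ANALYSIS ALL
  /SAVE=CLASS PROBS.

* Dis_1 is assumed to be the saved predicted-group variable.
* Check the Data Editor for the actual name before running this step.
CROSSTABS
  /TABLES=Performance BY Dis_1
  /CELLS=COUNT ROW.
```

The row percentages on the diagonal show, for each actual group, the share of cases that were classified correctly.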


What’s Next

In the next lesson, you will learn about Time Series Analysis, used for analyzing data collected over time.