SPSS Lesson 34 – Cluster Analysis | Dataplexa

Cluster Analysis

In many analytical problems, the goal is not prediction or hypothesis testing, but grouping similar observations.

Cluster Analysis is an unsupervised learning technique used to group cases so that observations within a cluster are similar to each other and different from those in other clusters.


Why Cluster Analysis Is Used

Cluster analysis helps answer questions such as:

  • Which customers have similar buying behavior?
  • Can employees be grouped by performance patterns?
  • Are there natural segments in the data?

Unlike regression or classification, cluster analysis has no target variable.


Key Idea Behind Clustering

Clustering is based on distance or similarity.

Observations that are close to each other (in terms of variable values) are placed in the same cluster.

The goal is to:

  • Maximize similarity within clusters
  • Maximize difference between clusters
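One common way to make "close" concrete is the straight-line (Euclidean) distance between two cases. The formula below is a sketch in notation introduced here for illustration: x and y are two cases, each measured on p variables. SPSS offers other distance measures as well.

```latex
d(x, y) = \sqrt{\sum_{j=1}^{p} (x_j - y_j)^2}
```

For example, two cases with values (1, 2) and (4, 6) on two variables are a distance of \sqrt{3^2 + 4^2} = 5 apart.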

Types of Cluster Analysis in SPSS

SPSS mainly supports:

  • Hierarchical Clustering
  • K-Means Clustering

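Each method serves a different purpose: hierarchical clustering suits exploring small datasets when the number of clusters is unknown, while K-Means suits larger datasets once a number of clusters has been chosen.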


Hierarchical Clustering

Hierarchical clustering:

  • Does not require specifying the number of clusters in advance
  • Builds clusters step by step, merging the two closest clusters at each step
  • Produces a dendrogram

It is useful for exploratory analysis and small datasets.
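A hierarchical run can be sketched in syntax roughly as follows. This assumes standardized copies of the example variables already exist under the names ZIncome and ZSpending_Score (the names DESCRIPTIVES with /SAVE would give to Income and Spending_Score). Ward's method and squared Euclidean distance are illustrative choices here, not the only reasonable ones.

```spss
* Assumes ZIncome and ZSpending_Score already exist (standardized copies).
* Ward linkage and squared Euclidean distance are illustrative choices.
CLUSTER ZIncome ZSpending_Score
  /METHOD WARD
  /MEASURE=SEUCLID
  /PRINT SCHEDULE
  /PLOT DENDROGRAM.
```

The agglomeration schedule and the dendrogram in the output show the order in which cases and clusters were merged, which can help in judging how many clusters are worth keeping.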


K-Means Clustering

K-Means clustering:

  • Requires specifying the number of clusters (k) in advance
  • Works well with large datasets
  • Minimizes within-cluster variance

It is commonly used in business applications such as customer segmentation.
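The quantity K-Means tries to make small can be written as a within-cluster sum of squares. The notation is introduced here for illustration: K is the number of clusters, C_k is the set of cases assigned to cluster k, x_i is the vector of values for case i, and \mu_k is the center (mean) of cluster k.

```latex
W = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2
```

A smaller W means that cases sit closer to their own cluster center. Because W always shrinks as K grows, it cannot choose the number of clusters by itself.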


Example Scenario

A retail company collects data on customers:

  • Annual income
  • Spending score

Cluster analysis can segment customers into groups such as:

  • High income – high spenders
  • High income – low spenders
  • Low income – low spenders

Preparing Data for Clustering

Before clustering:

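  • Standardize variables (important)
  • Check for extreme outliers and remove or correct them
  • Use numeric variables only

Because clustering is based on distance, a variable measured in large units (such as annual income) would otherwise dominate one measured in small units (such as a spending score). Standardization ensures all variables contribute equally.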
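One way to standardize in syntax is to have DESCRIPTIVES save z-scores. The sketch below assumes the two variables are named Income and Spending_Score, as in this lesson's example. The minimum and maximum are requested only as a quick check for extreme values.

```spss
* Save z-score copies of each clustering variable.
* The new variables are named with a Z prefix: ZIncome and ZSpending_Score.
DESCRIPTIVES VARIABLES=Income Spending_Score
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Each saved z-score has a mean of 0 and a standard deviation of 1, so neither variable dominates the distance calculation because of its units.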


Running K-Means Clustering (Menu)

To run K-Means in SPSS:

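  • Go to Analyze → Classify → K-Means Cluster
  • Move the clustering variables into the Variables box
  • Specify the number of clusters
  • Click Save and select Cluster membership
  • Click OK

SPSS assigns a cluster number to each observation and, with Cluster membership selected, stores it as a new variable in the dataset.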


SPSS Syntax Example


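* K-means with three clusters; saves membership as a new variable, typically QCL_1.
QUICK CLUSTER Income Spending_Score
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(3) MXITER(10) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /SAVE CLUSTER.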

Interpreting Cluster Output

When interpreting clusters:

  • Examine cluster centers (means)
  • Understand characteristics of each cluster
  • Assign meaningful labels

Clusters must be interpreted in business or research context.
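To describe each cluster in the original units, the saved membership variable can be used as a grouping variable. This sketch assumes membership was saved by QUICK CLUSTER with /SAVE CLUSTER and that SPSS named it QCL_1; check the Data Editor for the actual name in your session.

```spss
* Assumes cluster membership was saved as QCL_1.
* Mean, count and standard deviation of each variable per cluster.
MEANS TABLES=Income Spending_Score BY QCL_1
  /CELLS=MEAN COUNT STDDEV.

* Scatterplot of the two variables, with points marked by cluster.
GRAPH
  /SCATTERPLOT(BIVAR)=Income WITH Spending_Score BY QCL_1.
```

A cluster with a high mean on both variables, for instance, could be labeled "High income – high spenders".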


Common Mistakes

Typical errors include:

  • Not standardizing variables
  • Choosing the number of clusters without comparing alternatives
  • Over-interpreting clusters (K-Means returns k clusters even when the data have no natural groups)

Clustering is exploratory, not definitive.


Quiz 1

Is cluster analysis supervised or unsupervised?

Unsupervised.


Quiz 2

Does clustering require a dependent variable?

No.


Quiz 3

Which method produces a dendrogram?

Hierarchical clustering.


Quiz 4

Why should variables be standardized?

To ensure equal contribution of variables.


Quiz 5

Is clustering mainly exploratory?

Yes.


Mini Practice

Create a dataset with customer income and spending data.

Apply K-Means clustering with three clusters and describe each cluster.

Standardize variables first, then interpret cluster centers.
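If you do not have data at hand, a small practice file can be typed in directly with syntax. The values below are made up purely for illustration: Income is assumed to be in thousands and Spending_Score on a 1-100 scale.

```spss
* Hypothetical practice data (made up for illustration).
* Income is in thousands; Spending_Score is on a 1-100 scale.
DATA LIST FREE / Income Spending_Score.
BEGIN DATA
85 90
92 84
88 95
90 15
86 22
95 10
22 18
18 25
25 12
END DATA.
LIST.
```

LIST prints the nine cases so you can confirm the data were read as intended before standardizing and clustering.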


What’s Next

In the next lesson, you will learn about Discriminant Analysis, which is used to classify observations into predefined groups.