ML Lesson 26 – Hierarchical Clustering | Dataplexa

Hierarchical Clustering

In the previous lesson, we studied K-Means clustering and saw how the algorithm partitions data points into a fixed number of clusters. While K-Means is simple and efficient, it has one major limitation.

The number of clusters must be decided in advance. In many real-world problems, we do not know how many natural groups exist in the data.

This is where Hierarchical Clustering becomes useful. Instead of fixing the number of clusters at the beginning, hierarchical clustering builds clusters step by step.


The Core Idea Behind Hierarchical Clustering

Hierarchical clustering creates a hierarchy of clusters. You can think of it as a family tree for data points.

At the lowest level, every data point is its own cluster. As we move upward, similar clusters are merged together. This bottom-up strategy is called agglomerative clustering, and it is the variant we use in this lesson.

This process continues until all data points belong to a single cluster.

The result is a structure called a dendrogram, which visually represents how clusters are formed.
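To make the merge history concrete, here is a small sketch using a handful of hypothetical 2-D points (not the lesson's dataset). SciPy's linkage function records exactly this hierarchy, one merge per row of its output:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five hypothetical 2-D points: two tight pairs plus one distant point.
points = np.array([
    [0.0, 0.0], [0.1, 0.1],   # pair A
    [5.0, 5.0], [5.1, 5.0],   # pair B
    [10.0, 0.0],              # lone point
])

# Each row of the linkage matrix describes one merge:
# [cluster_i, cluster_j, merge_distance, size_of_new_cluster]
merges = linkage(points, method="ward")
print(merges)
```

With n points there are always n - 1 merges, and the merge distances grow as we move up the hierarchy; these are the heights drawn in a dendrogram.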


Why Hierarchical Clustering Is Different

Unlike K-Means, hierarchical clustering does not require the number of clusters to be specified beforehand.

Instead, we analyze the dendrogram and decide where to cut it based on how much separation we want.

This gives us more flexibility and better interpretability, especially in exploratory data analysis.


Using Our Dataset

We continue using the same dataset to maintain continuity throughout the machine learning module.

Dataplexa ML Housing & Customer Dataset

For this lesson, we ignore the loan approval label and focus on discovering natural customer segments.


Preparing the Data

Hierarchical clustering is distance-based, so feature scaling is again essential.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

# Drop the label so clustering sees only the features
X = df.drop("loan_approved", axis=1)

# Standardize the features so no single column dominates the distances
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Building the Hierarchical Model

Hierarchical clustering works by computing distances between all pairs of data points.

The linkage method controls how distances between clusters are calculated.

linked = linkage(X_scaled, method="ward")

The Ward method merges, at each step, the pair of clusters whose union causes the smallest increase in total within-cluster variance, which works well for numerical data.
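A tiny sketch of this behavior on four hypothetical 1-D points (not the lesson's data): the cheap pairs merge first at low heights, and the expensive final merge, which inflates within-cluster variance the most, happens last at a much greater height:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four hypothetical 1-D points forming two tight pairs far apart.
pts = np.array([[0.0], [1.0], [10.0], [11.0]])

linked = linkage(pts, method="ward")

# The first two merges join the tight pairs at height 1.0 each;
# the final merge joins the two pairs and is far more expensive,
# because it adds the most within-cluster variance.
print(linked[:, 2])
```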


Visualizing the Dendrogram

The dendrogram helps us decide how many clusters make sense.

plt.figure(figsize=(10, 6))
dendrogram(linked)
plt.title("Dendrogram for Hierarchical Clustering")
plt.xlabel("Data Points")
plt.ylabel("Distance")
plt.show()

By observing large vertical gaps in the dendrogram, we can decide where to cut the tree.
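One way to act on such a cut is SciPy's fcluster function, which flattens the hierarchy at a chosen height. A minimal sketch on synthetic blobs standing in for the scaled customer features:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# Three well-separated synthetic blobs of 20 points each.
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
                   for c in (0.0, 4.0, 8.0)])

linked = linkage(blobs, method="ward")

# Cut the tree at height t: every merge above t is undone,
# and the surviving sub-trees become the final clusters.
labels = fcluster(linked, t=5.0, criterion="distance")
print(np.unique(labels))
```

Choosing t inside the largest vertical gap of the dendrogram, above the cheap within-blob merges but below the expensive between-blob merges, recovers the three blobs.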


Forming Clusters

Once we choose the cut level, we can assign each data point to a cluster.

from sklearn.cluster import AgglomerativeClustering

# linkage="ward" is the default, matching the dendrogram above
model = AgglomerativeClustering(n_clusters=3)
clusters = model.fit_predict(X_scaled)

df["cluster"] = clusters
df.head()
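Once every row has a cluster label, a common next step is to profile the clusters. Here is a sketch on synthetic data; the column names income and age are hypothetical, not taken from the lesson's CSV:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Two synthetic customer segments with contrasting feature means.
toy = pd.DataFrame({
    "income": np.concatenate([rng.normal(-1, 0.2, 30), rng.normal(1, 0.2, 30)]),
    "age":    np.concatenate([rng.normal(1, 0.2, 30), rng.normal(-1, 0.2, 30)]),
})

model = AgglomerativeClustering(n_clusters=2)
toy["cluster"] = model.fit_predict(toy[["income", "age"]])

# Per-cluster feature means show what distinguishes each segment.
profile = toy.groupby("cluster").mean()
print(profile)
```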

Real-World Interpretation

In banking, hierarchical clustering helps analysts understand customer behavior without assuming the number of segments in advance.

It allows decision-makers to see small groups within large populations, which is useful for risk analysis and personalization.


Mini Practice

Try changing the linkage method from "ward" to "complete" and observe how the dendrogram changes.

This exercise shows how distance definitions affect clustering results.
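A sketch of this exercise on synthetic data (since the lesson's CSV may not be at hand): Ward heights reflect variance growth, while complete linkage measures the farthest pair of points between clusters, so the same data produces different merge heights:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
# Two synthetic groups of 15 points each.
pts = np.vstack([rng.normal(0.0, 0.5, size=(15, 2)),
                 rng.normal(4.0, 0.5, size=(15, 2))])

# Compare the height of the final (top-level) merge under each method.
for method in ("ward", "complete"):
    heights = linkage(pts, method=method)[:, 2]
    print(method, "top merge height:", round(heights[-1], 2))
```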


Exercises

Exercise 1:
Why does hierarchical clustering not require K beforehand?

Because it builds a hierarchy of clusters and lets us decide the cut level later.

Exercise 2:
What is the purpose of a dendrogram?

It visually represents how clusters merge at different distances.

Quick Quiz

Q1. Is hierarchical clustering scalable to very large datasets?

No. Standard agglomerative implementations compute pairwise distances between all points, which requires O(n²) memory and roughly O(n²) to O(n³) time, so they become impractical for very large datasets.

In the next lesson, we will move into Dimensionality Reduction and understand why reducing features can improve model performance.