Hierarchical Clustering
In the previous lesson, we studied K-Means clustering and saw how the algorithm groups data points into a fixed number of clusters. While K-Means is simple and efficient, it has one major limitation.
The number of clusters, K, must be chosen in advance. In many real-world problems, we do not know how many natural groups exist in the data.
This is where Hierarchical Clustering becomes useful. Instead of fixing the number of clusters at the beginning, hierarchical clustering builds clusters step by step.
The Core Idea Behind Hierarchical Clustering
Hierarchical clustering creates a hierarchy of clusters. You can think of it as a family tree for data points.
At the lowest level, every data point is its own cluster. As we move upward, similar clusters are merged together.
This process continues until all data points belong to a single cluster.
The result is a structure called a dendrogram, which visually represents how clusters are formed.
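The bottom-up merging process can be sketched on a tiny synthetic example (the four points below are illustrative, not from our dataset). SciPy's `linkage` returns one row per merge, which is exactly the information a dendrogram draws:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: four 2-D points forming two obvious pairs
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# Each row of the linkage matrix records one merge:
# [cluster_a, cluster_b, merge_distance, new_cluster_size]
merges = linkage(points, method="ward")
print(merges)
```

With n points there are always n - 1 merges, and the last row joins everything into a single cluster of size n.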
Why Hierarchical Clustering Is Different
Unlike K-Means, hierarchical clustering does not require the number of clusters to be specified beforehand.
Instead, we analyze the dendrogram and decide where to cut it based on how much separation we want.
This gives us more flexibility and better interpretability, especially in exploratory data analysis.
Using Our Dataset
We continue using the same dataset to maintain continuity throughout the machine learning module.
Dataplexa ML Housing & Customer Dataset
For this lesson, we ignore the loan approval label and focus on discovering natural customer segments.
Preparing the Data
Hierarchical clustering is distance-based, so feature scaling is again essential.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Load the dataset and drop the label: clustering is unsupervised
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")
X = df.drop("loan_approved", axis=1)

# Standardize features so no single feature dominates the distances
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Building the Hierarchical Model
Hierarchical clustering works by computing distances between all pairs of data points.
The linkage method controls how distances between clusters are calculated.
linked = linkage(X_scaled, method="ward")
The Ward method merges, at each step, the pair of clusters that produces the smallest increase in total within-cluster variance, which works well for continuous numerical features.
Visualizing the Dendrogram
The dendrogram helps us decide how many clusters make sense.
plt.figure(figsize=(10, 6))
dendrogram(linked)
plt.title("Dendrogram for Hierarchical Clustering")
plt.xlabel("Data Points")
plt.ylabel("Distance")
plt.show()
By observing large vertical gaps in the dendrogram, we can decide where to cut the tree.
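Cutting at a chosen height can be done programmatically with SciPy's `fcluster`: merges above the threshold are undone, and each remaining subtree becomes a cluster. The toy points and the threshold of 3.0 below are illustrative assumptions, not values from our dataset:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated pairs of points
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
linked = linkage(points, method="ward")

# Cut the tree at distance 3.0: any merge higher than this is undone,
# so the two well-separated pairs stay apart
labels = fcluster(linked, t=3.0, criterion="distance")
print(labels)
```

Because the only merge above height 3.0 is the final one, the cut yields exactly two clusters.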
Forming Clusters
Once we choose the cut level, we can assign each data point to a cluster.
from sklearn.cluster import AgglomerativeClustering

# n_clusters=3 assumes the dendrogram showed three well-separated branches;
# linkage="ward" matches the linkage used to build the dendrogram
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
clusters = model.fit_predict(X_scaled)
df["cluster"] = clusters
df.head()
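Once labels are attached, a quick way to interpret the clusters is to compare per-cluster feature means. The snippet below uses a small synthetic stand-in for the dataset (the income and age columns are hypothetical) so it runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

# Hypothetical stand-in for the customer data: two distinct segments
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "income": np.concatenate([rng.normal(30, 2, 20), rng.normal(80, 2, 20)]),
    "age": np.concatenate([rng.normal(25, 1, 20), rng.normal(55, 1, 20)]),
})

X_scaled = StandardScaler().fit_transform(df)
df["cluster"] = AgglomerativeClustering(n_clusters=2).fit_predict(X_scaled)

# Per-cluster means summarize what distinguishes each segment
profile = df.groupby("cluster").mean()
print(profile)
```

The profile table makes the segments readable at a glance: one cluster of younger, lower-income customers and one of older, higher-income customers.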
Real-World Interpretation
In banking, hierarchical clustering helps analysts discover customer segments without committing to a fixed number of groups in advance.
It allows decision-makers to see small groups within large populations, which is useful for risk analysis and personalization.
Mini Practice
Try changing the linkage method from "ward" to "complete" and observe how the dendrogram changes.
This exercise shows how distance definitions affect clustering results.
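As a minimal sketch of why the distance definition matters, the final merge height differs between Ward and complete linkage even on four toy points (illustrative data, not the course dataset):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Three colinear points plus one outlier
points = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [4.0, 0.0]])

# Complete linkage merges by the farthest pair between clusters;
# Ward merges by the increase in within-cluster variance.
# The last column-2 entry is the height of the final merge.
h_ward = linkage(points, method="ward")[-1, 2]
h_complete = linkage(points, method="complete")[-1, 2]
print(h_ward, h_complete)
```

Because the two methods measure cluster-to-cluster distance differently, the dendrogram heights (and hence where the "large gaps" appear) shift as well.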
Exercises
Exercise 1:
Why does hierarchical clustering not require K beforehand?
Exercise 2:
What is the purpose of a dendrogram?
Quick Quiz
Q1. Is hierarchical clustering scalable to very large datasets?
In the next lesson, we will move into Dimensionality Reduction and understand why reducing features can improve model performance.