AI Lesson 53 – Hierarchical Clustering | Dataplexa

Hierarchical Clustering

Hierarchical Clustering is an unsupervised learning technique that groups data by building a hierarchy of clusters. Instead of deciding the number of clusters in advance, this method creates clusters step by step and shows their relationships in a tree-like structure called a dendrogram.

This approach is useful when you want to understand how data naturally forms groups at different levels, rather than forcing a fixed number of clusters.

Real-World Connection

Think about organizing files on your computer. You may first group them into broad folders like “Work” and “Personal”, then further divide them into subfolders. Hierarchical clustering works in a similar way, grouping data from broad to specific levels.

Types of Hierarchical Clustering

  • Agglomerative: Bottom-up approach (most commonly used)
  • Divisive: Top-down approach

In practice, agglomerative clustering is used more often because it is simpler and more efficient.

How Agglomerative Clustering Works

  • Each data point starts as its own cluster
  • The two closest clusters are merged
  • This process repeats until a single cluster remains (or until a chosen number of clusters is reached)
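The merge sequence above can be traced with SciPy's `linkage` function, which records every merge it performs. This is a minimal sketch on a small made-up dataset (the points and method choice here are illustrative, not from the lesson's example):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Small hypothetical dataset: two obvious groups of two points each
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# Each row of Z records one merge: [cluster_i, cluster_j, distance, new_size]
Z = linkage(X, method="single")

for step, (i, j, dist, size) in enumerate(Z):
    print(f"step {step}: merge {int(i)} and {int(j)} "
          f"at distance {dist:.2f} -> cluster of {int(size)} points")
```

With four points there are exactly three merges: the two tight pairs are joined first (at distance 1.0), and the final merge combines everything into one cluster, exactly as the steps above describe.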

Distance Measurement Methods

Hierarchical clustering relies on distance calculations. Common methods include:

  • Euclidean distance
  • Manhattan distance
  • Cosine distance (1 − cosine similarity)
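The three measures above can be computed directly with NumPy. A short sketch, using two arbitrary example vectors:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance: straight-line distance between the points
euclidean = np.sqrt(np.sum((a - b) ** 2))   # 5.0 for these vectors

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.sum(np.abs(a - b))           # 7.0 for these vectors

# Cosine distance: 1 - cosine similarity; compares direction, not magnitude
cosine = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, manhattan, cosine)
```

Note how the cosine distance is tiny here: the two vectors point in nearly the same direction even though they are far apart in Euclidean terms.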

Linkage Methods

Linkage defines how the distance between clusters is calculated:

  • Single linkage: Distance between the closest pair of points in the two clusters
  • Complete linkage: Distance between the farthest pair of points in the two clusters
  • Average linkage: Average pairwise distance between points in the two clusters
  • Ward’s method: Merges the pair of clusters that gives the smallest increase in within-cluster variance
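In scikit-learn, the linkage method is selected with the `linkage` parameter of `AgglomerativeClustering` (Ward's method is the default). A quick sketch comparing the four methods on the same data used later in this lesson:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Fit the same data with each linkage method and compare the labels
for method in ["ward", "complete", "average", "single"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=method).fit_predict(X)
    print(method, labels)
```

On well-separated data like this, all four methods find the same two groups; the methods typically diverge on elongated, overlapping, or noisy clusters.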

Hierarchical Clustering Example (Python)


from sklearn.cluster import AgglomerativeClustering
import numpy as np

X = np.array([
    [1, 2],
    [1, 4],
    [1, 0],
    [10, 2],
    [10, 4],
    [10, 0]
])

# Request two clusters; the default linkage is Ward's method
model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(X)

print(labels)

Output:

[1 1 1 0 0 0]

Understanding the Output

Each number represents the cluster assignment for a data point. Points with similar values are grouped together based on distance and linkage method.

Dendrogram Visualization

A dendrogram visually shows how clusters are merged at each step. The height of the merge indicates the distance between clusters.
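A dendrogram can be built with SciPy's `dendrogram` function. In this sketch, `no_plot=True` extracts only the tree structure so the example runs without a plotting backend; with matplotlib installed, `dendrogram(Z)` would draw the diagram, with merge height on the vertical axis:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Build the merge hierarchy with Ward's method
Z = linkage(X, method="ward")

# Extract the tree structure without plotting
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # the leaf order of the original points along the x-axis
```

Reading the plotted version bottom-up reproduces the merge order: low horizontal bars join nearby points, and the tall final bar joins the two well-separated groups.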

When to Use Hierarchical Clustering

  • When the number of clusters is unknown
  • When the nested relationships between groups matter
  • When interpretability is important

Limitations

  • Computationally expensive for large datasets (the standard algorithm needs all pairwise distances, so cost grows at least quadratically with the number of points)
  • Greedy: once two clusters are merged, the decision cannot be undone
  • Sensitive to noise and outliers

Practice Questions

Practice 1: Which hierarchical approach is most commonly used?



Practice 2: What diagram represents hierarchical clusters?



Practice 3: What defines how clusters are merged?



Quick Quiz

Quiz 1: Hierarchical clustering creates which type of structure?





Quiz 2: Which linkage method minimizes variance?





Quiz 3: Hierarchical clustering belongs to which category?





Coming up next: Dimensionality Reduction — reducing features without losing meaning.