K-Means Clustering
K-Means is an unsupervised machine learning algorithm used to group data into clusters. Unlike supervised learning, K-Means does not use labeled data. Instead, it finds patterns and similarities within the data and groups similar points together.
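The goal of K-Means is simple: data points within the same cluster should be as similar as possible, while data points in different clusters should be as different as possible. In practice, K-Means pursues this by minimizing the total squared distance between each point and the center (centroid) of its cluster.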
Real-World Connection
Imagine an online shopping website grouping customers based on their buying behavior. Customers who buy similar products are grouped together. These clusters help businesses design personalized offers and recommendations. K-Means performs this grouping automatically using data.
Where K-Means Is Used
- Customer segmentation
- Market research
- Image segmentation
- Document clustering
- Recommendation systems
How K-Means Works
The algorithm follows an iterative process:
- Select the number of clusters (K)
- Randomly initialize K centroids
- Assign each data point to the nearest centroid
- Recalculate each centroid as the mean of the points assigned to it
- Repeat the assignment and recalculation steps until the centroids stop changing
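The loop above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn implements K-Means. The function name kmeans_sketch, the max_iters and seed parameters, and the choice to leave an empty cluster's centroid where it was are all assumptions made for this example.

```python
import numpy as np


def kmeans_sketch(X, k, max_iters=100, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not a library API)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(max_iters):
        # Assignment: distance from every point to every centroid, shape (n, k),
        # then the index of the nearest centroid for each point.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Update: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid (a simplification).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return labels, centroids


X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
labels, centroids = kmeans_sketch(X, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are chosen at random, a different seed can lead this sketch to a different final grouping. Library implementations usually reduce that risk by trying several starting points and keeping the best result.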
K-Means Example Using Python
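import numpy as np
from sklearn.cluster import KMeans

# Six 2-D points: three near x = 1 and three near x = 10
X = np.array([
    [1, 2],
    [1, 4],
    [1, 0],
    [10, 2],
    [10, 4],
    [10, 0],
])

# n_init=10 runs ten initializations and keeps the best result;
# random_state fixes the seed so the run is reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

print(kmeans.labels_)           # cluster index (0 or 1) for each row of X
print(kmeans.cluster_centers_)  # one row of coordinates per cluster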
Understanding the Output
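The labels show which cluster each data point belongs to, with one integer per row of X. The label numbers themselves are arbitrary; what matters is which points share a label. For this data, the three points near x = 1 should share one label and the three points near x = 10 the other. The cluster centers are the mean position of the points in each cluster, here [1, 2] and [10, 2].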
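One way to check this reading of the output is to recompute the mean of each cluster by hand and compare it with the stored center. The sketch below assumes the same six points as the example and sets n_init=10 explicitly. The two points in new_points are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Each center should equal the mean of the points carrying that label.
for cluster_id in range(2):
    members = X[kmeans.labels_ == cluster_id]
    print(cluster_id, members.mean(axis=0), kmeans.cluster_centers_[cluster_id])

# predict() assigns unseen points to the nearest learned center.
new_points = np.array([[0, 0], [12, 3]])  # hypothetical new observations
print(kmeans.predict(new_points))
```

The first new point should receive the same label as the points near x = 1, and the second the same label as the points near x = 10.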
Choosing the Value of K
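Selecting a suitable number of clusters is important. A common technique is the Elbow Method: run K-Means for a range of K values, plot the error (the sum of squared distances from each point to its cluster center) against K, and look for the point where adding more clusters stops reducing the error by much.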
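A rough sketch of the Elbow Method, assuming scikit-learn's inertia_ attribute as the error measure and reusing the six example points. It prints the values rather than plotting them; the same numbers could be passed to any plotting library.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# inertia_ is the sum of squared distances from each point to its center.
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"K={k}: inertia={value:.2f}")
```

On this toy data the inertia falls from 137.5 at K=1 to 16 at K=2, and further clusters can only remove what is left of that 16, so the elbow sits at K=2. Real data rarely shows such a sharp bend, so the choice often involves judgment.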
Limitations of K-Means
- You must choose K in advance
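- Works best when clusters are roughly spherical and similar in size
- Sensitive to outliers, because each centroid is a mean and a single extreme point can pull it away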
- Results may change based on initialization
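The effect of initialization can be seen by supplying the starting centroids by hand. In this sketch the arrays poor_start and good_start are hand-picked for illustration; passing an array to init with n_init=1 asks scikit-learn for a single run from exactly those positions.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Hand-picked starting centroids (illustrative assumption, not a default).
poor_start = np.array([[1.0, 2.0], [1.0, 4.0]])   # both inside the left group
good_start = np.array([[1.0, 2.0], [10.0, 2.0]])  # one in each group

# n_init=1 means a single run from exactly the centroids we supply.
poor = KMeans(n_clusters=2, init=poor_start, n_init=1).fit(X)
good = KMeans(n_clusters=2, init=good_start, n_init=1).fit(X)

print("poor start:", poor.labels_, "inertia:", poor.inertia_)
print("good start:", good.labels_, "inertia:", good.inertia_)
```

The run from the poor start should settle on a grouping with a noticeably higher inertia than the run from the good start. This is why running several initializations and keeping the lowest-inertia result is common practice.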
Practice Questions
Practice 1: K-Means belongs to which type of learning?
Practice 2: What represents the center of a cluster?
Practice 3: What does K represent in K-Means?
Quick Quiz
Quiz 1: What is the main goal of K-Means?
Quiz 2: Which method helps choose the value of K?
Quiz 3: K-Means improves clusters using which process?
Coming up next: Hierarchical Clustering — building clusters step by step.