Clustering | Dataplexa

Clustering in R

Clustering is an unsupervised machine learning technique used to group similar data points together.

Unlike classification, clustering does not use labeled data. Instead, it discovers patterns automatically.

What Is Clustering?

Clustering organizes data into groups called clusters.

Items within the same cluster are more similar to each other than to items in other clusters.

Where Is Clustering Used?

Customer segmentation
Grouping similar documents
Image and pattern recognition
Exploratory data analysis

Common Clustering Methods

R supports several clustering techniques. This lesson covers two that ship with base R in the stats package:

K-Means Clustering
Hierarchical Clustering

K-Means Clustering

K-means is one of the most popular clustering algorithms.

It divides data into a fixed number of clusters (k) by minimizing the within-cluster sum of squares, that is, the total squared distance between each point and the center of its cluster.
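To make "minimizing distance within each cluster" concrete, the sketch below recomputes by hand the quantity that kmeans() reports as tot.withinss. It assumes the four numeric columns of the built-in iris dataset as example data, and it uses scale() and kmeans(), which are covered later in this lesson. The names x, fit, and manual_wss are illustrative only.

```r
# Assumption: the numeric columns of the built-in iris dataset are the example data.
x <- scale(iris[, 1:4])

set.seed(123)
fit <- kmeans(x, centers = 3, nstart = 25)

# For every row, look up the center of the cluster it was assigned to.
assigned_centers <- fit$centers[fit$cluster, ]

# Sum of squared Euclidean distances from each point to its own center.
manual_wss <- sum((x - assigned_centers)^2)

manual_wss        # computed by hand
fit$tot.withinss  # reported by kmeans()
```

The two printed values should agree up to floating-point rounding, which is one way to check what the algorithm is trying to make small.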

Preparing Data for Clustering

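The methods in this lesson rely on distances between points, so the input should contain only numeric columns.

Variables measured on larger scales dominate distance calculations. By default, scale() centers each column to mean 0 and divides it by its standard deviation, so that all variables contribute equally. Here, data is a numeric data frame or matrix.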

data_scaled <- scale(data)
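As a small check of what scaling does, the sketch below compares column spreads before and after scale(). It assumes three columns of the built-in mtcars dataset stand in for data; any numeric data frame would work the same way.

```r
# Assumption: three numeric columns of the built-in mtcars dataset stand in for `data`.
data <- mtcars[, c("mpg", "hp", "wt")]

# Before scaling, the standard deviations differ by orders of magnitude.
sapply(data, sd)

data_scaled <- scale(data)

# After scaling, every column has mean 0 and standard deviation 1.
round(colMeans(data_scaled), 10)
apply(data_scaled, 2, sd)
```

Without this step, hp would outweigh wt in any distance calculation simply because its values are numerically larger.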

Applying K-Means in R

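You must specify the number of clusters (k) through the centers argument.

kmeans() starts from randomly chosen centers, so results can differ between runs. Calling set.seed() first makes the result reproducible.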

set.seed(123)                                      # fix the random starting centers
kmeans_result <- kmeans(data_scaled, centers = 3)  # k = 3 clusters
kmeans_result                                      # print sizes, centers, and assignments
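Since k has to be chosen up front, one common heuristic is the "elbow" plot: run k-means for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply. The sketch below is one way to do that, assuming the numeric columns of the built-in iris dataset as example data. The range 1 to 6 and nstart = 25 are arbitrary choices for illustration.

```r
# Assumption: the numeric columns of the built-in iris dataset are the example data.
x <- scale(iris[, 1:4])

set.seed(123)
ks <- 1:6

# Total within-cluster sum of squares for each candidate k.
# nstart = 25 tries 25 random starts per k and keeps the best one.
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow Plot")
```

The elbow is a judgment call rather than a rule, so treat the chosen k as a starting point and check that the resulting clusters make sense for your data.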

Cluster Assignments

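Each data point is assigned to exactly one cluster.

The cluster component is an integer vector with one entry per row, giving that row's cluster number from 1 to k. You can use it to compare groups and summarize their behavior.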

kmeans_result$cluster
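The assignments become more useful once they are summarized. The sketch below counts the points in each cluster and computes per-cluster averages on the original, unscaled values. It assumes the numeric columns of the built-in iris dataset stand in for data; cluster_sizes and cluster_means are illustrative names.

```r
# Assumption: the numeric columns of the built-in iris dataset stand in for `data`.
data <- iris[, 1:4]
data_scaled <- scale(data)

set.seed(123)
kmeans_result <- kmeans(data_scaled, centers = 3, nstart = 25)

# How many points fell into each cluster?
cluster_sizes <- table(kmeans_result$cluster)
cluster_sizes

# Average of each original variable within each cluster.
cluster_means <- aggregate(data,
                           by = list(cluster = kmeans_result$cluster),
                           FUN = mean)
cluster_means
```

Summarizing on the unscaled data keeps the averages in their original units, which is usually easier to interpret than scaled values.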

Visualizing Clusters

Visualization makes clusters easier to understand.

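Scatter plots are commonly used for this purpose. When given a matrix, plot() draws only the first two columns, with each point colored by its cluster number.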

plot(data_scaled,
     col = kmeans_result$cluster,
     main = "K-Means Clustering")
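A slightly fuller version of the plot above also marks the cluster centers. The sketch below assumes two columns of the built-in iris dataset, chosen so that the whole clustering fits in a single two-dimensional plot. The variable names and plotting symbols are illustrative choices.

```r
# Assumption: two numeric columns of the built-in iris dataset are the example data.
x <- scale(iris[, c("Petal.Length", "Petal.Width")])

set.seed(123)
fit <- kmeans(x, centers = 3, nstart = 25)

# Points colored by cluster assignment.
plot(x,
     col = fit$cluster,
     pch = 19,
     xlab = "Petal length (scaled)",
     ylab = "Petal width (scaled)",
     main = "K-Means Clustering")

# Overlay the three cluster centers as large stars.
points(fit$centers, pch = 8, cex = 2, lwd = 2)
```

With more than two variables, a plot like this shows only one two-dimensional view of the clusters, so groups that overlap here may still be well separated in other variables.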

Hierarchical Clustering

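Hierarchical clustering builds clusters step by step. The version implemented by hclust() starts with every point as its own cluster and repeatedly merges the two closest clusters until only one remains.

It does not require specifying the number of clusters in advance. The result is a tree, called a dendrogram, that you can inspect before deciding how many clusters to keep.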

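dist_matrix <- dist(data_scaled)  # pairwise Euclidean distances (the default)
hc <- hclust(dist_matrix)         # complete linkage (the default)
plot(hc)                          # draw the dendrogram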
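Once the dendrogram is built, cutree() turns it into concrete cluster assignments for any number of groups. The sketch below assumes the built-in USArrests dataset as example data, and k = 4 is an arbitrary choice for illustration.

```r
# Assumption: the built-in USArrests dataset is the example data.
x <- scale(USArrests)

dist_matrix <- dist(x)     # Euclidean distance by default
hc <- hclust(dist_matrix)  # complete linkage by default

# Cut the tree into 4 groups; the result is a named integer vector.
groups <- cutree(hc, k = 4)
table(groups)

# Draw the dendrogram and outline the 4 groups on it.
plot(hc, cex = 0.6)
rect.hclust(hc, k = 4, border = 2:5)
```

Because the tree is computed once, you can call cutree() again with a different k without rerunning the clustering.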

Why Clustering Matters

Reveals hidden patterns
Supports data-driven decisions
Used in many analytics tasks
Foundation for segmentation models

📝 Practice Exercises

Exercise 1

Explain clustering in your own words.

Exercise 2

Scale a numeric dataset.

Exercise 3

Apply K-means clustering with 3 clusters.

Exercise 4

Plot clustered data points.

✅ Practice Answers

Answer 1

Clustering groups similar data points together without using predefined labels.

Answer 2

data_scaled <- scale(data)

Answer 3

set.seed(123)
kmeans_result <- kmeans(data_scaled, centers = 3)

Answer 4

plot(data_scaled,
     col = kmeans_result$cluster)

What’s Next?

In the next lesson, you will explore Dimensionality Reduction, which helps simplify complex datasets.
