Clustering in R

Clustering is an unsupervised machine learning technique used to group similar data points together.

Unlike classification, clustering does not use labeled data. Instead, it finds groups based only on how similar the observations are to each other.


What Is Clustering?

Clustering organizes data into groups called clusters.

Items within the same cluster are more similar to each other than to items in other clusters.


Where Is Clustering Used?

  • Customer segmentation
  • Grouping similar documents
  • Image and pattern recognition
  • Exploratory data analysis

Common Clustering Methods

R supports several clustering techniques. This lesson covers two that are available in base R:

  • K-Means Clustering
  • Hierarchical Clustering

K-Means Clustering

K-means is one of the most popular clustering algorithms.

It divides data into a fixed number of clusters, k, so that the total squared distance between each point and the center of its cluster is as small as possible.
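To make the idea of "distance within each cluster" concrete, the sketch below recomputes that quantity by hand and compares it with the value reported by kmeans(). It assumes the numeric columns of R's built-in iris dataset as a stand-in for your own data; the names x, fit, and manual_wss are chosen only for this illustration.

```r
# Assumption: the four numeric iris columns stand in for real data.
x <- scale(iris[, 1:4])

set.seed(123)
fit <- kmeans(x, centers = 3)

# For every point, look up the center of the cluster it was assigned to.
assigned_centers <- fit$centers[fit$cluster, , drop = FALSE]

# Sum of squared Euclidean distances from each point to its own center.
manual_wss <- sum((x - assigned_centers)^2)

# The hand-computed value should match the one stored in the result
# (up to floating-point error).
all.equal(manual_wss, fit$tot.withinss)
```

A smaller value means the points sit closer to their cluster centers. The value always shrinks as k grows, so it is mainly useful for comparing runs with the same k.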


Preparing Data for Clustering

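Clustering works best with numeric data, because methods such as K-means rely on distances between points.

Scaling puts every variable on a comparable range, so that variables measured in large units do not dominate the distance calculation.

# Standardize each column to mean 0 and standard deviation 1
data_scaled <- scale(data)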
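As a quick check of what scale() does, the following sketch keeps only the numeric columns of a data frame and confirms that each scaled column has mean 0 and standard deviation 1. The built-in iris dataset is an assumption used here in place of your own data.

```r
# Assumption: iris stands in for the dataset called `data` in this lesson.
# Keep only numeric columns (this drops the Species factor).
data <- iris[sapply(iris, is.numeric)]

# scale() returns a numeric matrix with centered and scaled columns.
data_scaled <- scale(data)

round(colMeans(data_scaled), 10)   # column means are 0 (up to rounding)
apply(data_scaled, 2, sd)          # column standard deviations are 1
```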

Applying K-Means in R

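You must specify the number of clusters (k) through the centers argument.

The algorithm then assigns every data point to one of those k clusters. Because K-means starts from randomly chosen centers, calling set.seed() first makes the result reproducible.

set.seed(123)  # fix the random starting centers
kmeans_result <- kmeans(data_scaled, centers = 3)
kmeans_result  # prints cluster sizes, centers, and assignments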
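The printed result bundles several pieces that can also be read individually. The sketch below shows a few of them, again assuming the numeric iris columns as example data. It also passes nstart = 25, an optional argument of kmeans() that tries several random starts and keeps the best one; the value 25 is an arbitrary choice for illustration.

```r
# Assumption: the four numeric iris columns stand in for real data.
data_scaled <- scale(iris[, 1:4])

set.seed(123)
# nstart = 25 runs the algorithm from 25 random starts and keeps the
# solution with the lowest total within-cluster sum of squares.
kmeans_result <- kmeans(data_scaled, centers = 3, nstart = 25)

kmeans_result$centers       # one row per cluster, in scaled units
kmeans_result$size          # number of points in each cluster
kmeans_result$tot.withinss  # total within-cluster sum of squares
```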

Cluster Assignments

Each data point is assigned to a cluster.

This helps analyze group behavior and patterns.

kmeans_result$cluster
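One way to study group behavior is to count the points in each cluster and then average the original variables per cluster. This is a minimal sketch using the built-in iris data as an assumed example; cluster_means is a name introduced only here.

```r
# Assumption: the four numeric iris columns stand in for real data.
data <- iris[, 1:4]
data_scaled <- scale(data)

set.seed(123)
kmeans_result <- kmeans(data_scaled, centers = 3)

# How many points fell into each cluster?
table(kmeans_result$cluster)

# Mean of each original (unscaled) variable within each cluster.
cluster_means <- aggregate(data,
                           by = list(cluster = kmeans_result$cluster),
                           FUN = mean)
cluster_means
```

Summarizing the unscaled values keeps the averages in their original units, which usually makes the clusters easier to describe.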

Visualizing Clusters

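Visualization makes clusters easier to understand.

A scatter plot colored by cluster is commonly used for this purpose. When plot() receives a matrix with more than two columns, it draws only the first two.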

plot(data_scaled,
     col = kmeans_result$cluster,
     main = "K-Means Clustering")
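When the data has more than two variables, it can help to choose the two columns to plot explicitly and to mark the cluster centers. The sketch below assumes the iris measurements as example data; the column names in vars belong to that dataset and would differ for your own.

```r
# Assumption: the four numeric iris columns stand in for real data.
data_scaled <- scale(iris[, 1:4])

set.seed(123)
kmeans_result <- kmeans(data_scaled, centers = 3)

# Pick the two variables to show on the axes.
vars <- c("Petal.Length", "Petal.Width")

# Points are colored by their cluster number.
plot(data_scaled[, vars],
     col = kmeans_result$cluster,
     pch = 19,
     main = "K-Means Clustering")

# Row i of the centers matrix is cluster i, so colors 1:3 line up
# with the point colors above.
points(kmeans_result$centers[, vars], col = 1:3, pch = 8, cex = 2)
```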

Hierarchical Clustering

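Hierarchical clustering builds clusters step by step. The hclust() function starts with every point in its own cluster and repeatedly merges the two closest clusters until only one remains.

It does not require specifying the number of clusters in advance. You choose the number afterwards by deciding where to cut the resulting tree.

dist_matrix <- dist(data_scaled)  # pairwise Euclidean distances
hc <- hclust(dist_matrix)         # complete linkage by default
plot(hc)                          # draws the dendrogram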
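To turn the tree into actual cluster assignments, it can be cut at a chosen number of groups. The sketch below uses cutree() and rect.hclust() from base R, with the iris measurements as assumed example data and k = 3 as an arbitrary choice.

```r
# Assumption: the four numeric iris columns stand in for real data.
data_scaled <- scale(iris[, 1:4])

dist_matrix <- dist(data_scaled)
hc <- hclust(dist_matrix)

# Cut the tree into 3 groups; returns one cluster number per row.
hc_clusters <- cutree(hc, k = 3)
table(hc_clusters)

# Draw the dendrogram without row labels and outline the 3 groups.
plot(hc, labels = FALSE)
rect.hclust(hc, k = 3, border = "red")
```

Changing k in cutree() gives a different grouping from the same tree, without rerunning hclust().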

Why Clustering Matters

  • Reveals hidden patterns
  • Supports data-driven decisions
  • Used in many analytics tasks
  • Foundation for segmentation models

📝 Practice Exercises


Exercise 1

Explain clustering in your own words.

Exercise 2

Scale a numeric dataset.

Exercise 3

Apply K-means clustering with 3 clusters.

Exercise 4

Plot clustered data points.


✅ Practice Answers


Answer 1

Clustering groups similar data points together without using predefined labels.

Answer 2

data_scaled <- scale(data)

Answer 3

set.seed(123)
kmeans_result <- kmeans(data_scaled, centers = 3)

Answer 4

plot(data_scaled,
     col = kmeans_result$cluster)

What’s Next?

In the next lesson, you will explore Dimensionality Reduction, which helps simplify complex datasets.