
Classification in R

Classification is a machine learning technique that assigns observations to predefined categories, or classes.

It answers questions such as whether an observation belongs to one group or another, based on patterns in the data.


What Is Classification?

Classification works by learning from labeled data, where the correct category is already known.

Once trained, the model can classify new, unseen data into one of the existing classes.


Real-World Examples of Classification

  • Email spam detection
  • Pass or fail prediction
  • Customer type identification
  • Risk assessment

Binary vs Multi-Class Classification

There are two main types of classification problems.

  • Binary Classification – Two possible outcomes (yes / no)
  • Multi-Class Classification – More than two categories
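A quick way to tell the two cases apart in R is to count the levels of the target factor. The sketch below uses a made-up `outcome` vector and the built-in `iris` data set purely for illustration; neither comes from this lesson.

```r
# Hypothetical binary target: two possible outcomes
outcome <- factor(c("yes", "no", "no", "yes"))
nlevels(outcome)          # 2 levels: a binary problem

# Built-in iris data: Species has three categories
nlevels(iris$Species)     # 3 levels: a multi-class problem
levels(iris$Species)
```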

Preparing Data for Classification

For classification, the target variable should usually be stored as a factor. (glm() with family = binomial also accepts a numeric 0/1 response.)

A factor tells R that the variable represents categories, not numbers.

data$label <- as.factor(data$label)
str(data)
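As a minimal sketch of this step, the example below builds a small, made-up data frame (the values and the "No"/"Yes" labels are assumptions for illustration) and sets the level order explicitly, since the order of the levels decides which class R treats as the reference.

```r
# Hypothetical toy data: the label arrives as plain text
data <- data.frame(
  feature1 = c(2.1, 3.4, 1.8, 4.0),
  label    = c("No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Set the level order explicitly: the first level is the reference class
data$label <- factor(data$label, levels = c("No", "Yes"))

levels(data$label)   # "No" "Yes"
table(data$label)    # how many rows fall in each class
```

Using factor() with an explicit levels argument, rather than as.factor(), avoids relying on alphabetical ordering.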

Simple Classification Example

One common approach to classification in R uses logistic regression.

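It predicts the probability that a data point belongs to a class. When the response is a factor, glm() treats the first level as the reference and models the probability of the other level.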

model <- glm(label ~ feature1 + feature2,
             data = data,
             family = binomial)
summary(model)
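To try the model call end to end, the sketch below fits it on simulated data. The data-generating step (the sample size, the coefficients, and the "No"/"Yes" labels) is an assumption made only so that the example runs on its own.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated data (illustrative assumption): the chance of "Yes" rises with
# feature1 and falls with feature2
n <- 200
feature1 <- rnorm(n)
feature2 <- rnorm(n)
p_yes    <- plogis(0.5 + 1.5 * feature1 - 1.0 * feature2)
data <- data.frame(
  feature1 = feature1,
  feature2 = feature2,
  label    = factor(ifelse(runif(n) < p_yes, "Yes", "No"), levels = c("No", "Yes"))
)

# Fit the logistic regression; glm() models the probability of the second level, "Yes"
model <- glm(label ~ feature1 + feature2, data = data, family = binomial)

coef(model)   # estimated intercept and slopes on the log-odds scale
```

A positive coefficient means that larger values of that feature raise the estimated probability of "Yes".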

Making Classification Predictions

With type = "response", the model outputs probabilities between 0 and 1.

These probabilities can be converted into class labels by comparing them with a cutoff, commonly 0.5.

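# Predicted probability of the second factor level (assumed here to be "Yes")
predicted_prob <- predict(model, newdata = data, type = "response")
# Apply a 0.5 cutoff; this assumes levels(data$label) is c("No", "Yes")
predicted_class <- ifelse(predicted_prob > 0.5, "Yes", "No")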
predicted_class
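The same steps apply to observations the model has never seen. The following is a sketch under stated assumptions: the training data is simulated, and `train`, `new_data`, `new_prob`, and `new_class` are hypothetical names introduced for this example.

```r
set.seed(1)

# Simulated training data (illustrative assumption, not from the lesson)
train <- data.frame(feature1 = rnorm(100), feature2 = rnorm(100))
train$label <- factor(
  ifelse(runif(100) < plogis(2 * train$feature1 - train$feature2), "Yes", "No"),
  levels = c("No", "Yes")
)
model <- glm(label ~ feature1 + feature2, data = train, family = binomial)

# Two new, unlabeled observations with the same feature columns
new_data <- data.frame(feature1 = c(-2, 2), feature2 = c(1, -1))

# Probability of the second level ("Yes"), then a 0.5 cutoff mapped back to the levels
new_prob  <- predict(model, newdata = new_data, type = "response")
new_class <- factor(ifelse(new_prob > 0.5, "Yes", "No"), levels = levels(train$label))

data.frame(new_data, prob_yes = round(new_prob, 3), predicted = new_class)
```

Storing the predictions as a factor with the same levels as the training label keeps later comparisons with actual labels consistent.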

Evaluating a Classification Model

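Model evaluation shows how well the classifier performs, ideally on data it did not see during training.

Common evaluation metrics include accuracy (the share of all predictions that are correct), precision (the share of predicted positives that are truly positive), and recall (the share of actual positives that the model finds).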
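These three metrics can be computed directly from the counts of correct and incorrect predictions. The counts below are made up for illustration, with "Yes" assumed to be the positive class.

```r
# Hypothetical counts from a binary classifier, with "Yes" as the positive class
tp <- 40  # actual Yes, predicted Yes
fp <- 10  # actual No,  predicted Yes
fn <- 5   # actual Yes, predicted No
tn <- 45  # actual No,  predicted No

accuracy  <- (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision <- tp / (tp + fp)                   # share of predicted Yes that are truly Yes
recall    <- tp / (tp + fn)                   # share of actual Yes that were found

round(c(accuracy = accuracy, precision = precision, recall = recall), 3)
```

With these counts, accuracy is 0.85, precision is 0.8, and recall is about 0.889.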


Confusion Matrix

A confusion matrix cross-tabulates predicted classes against actual classes.

Counts where the predicted and actual class agree are correct predictions; all other cells are errors.

table(Actual = data$label,
      Predicted = predicted_class)
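For a small worked case, the sketch below builds a confusion matrix from eight made-up observations. The vectors `actual` and `predicted` are hypothetical; giving both the same factor levels is one way to keep the table square.

```r
# Hypothetical actual and predicted labels for eight observations
actual    <- factor(c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"),
                    levels = c("No", "Yes"))
predicted <- factor(c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No"),
                    levels = c("No", "Yes"))

# Shared levels guarantee a 2 x 2 table even if one class is never predicted
cm <- table(Actual = actual, Predicted = predicted)
cm

# With matching level order, correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
accuracy   # 6 correct out of 8 = 0.75
```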

Why Classification Matters

  • Helps automate decision-making
  • Used in many real-world applications
  • Core concept in machine learning
  • Foundation for advanced models

📝 Practice Exercises


Exercise 1

Explain classification in simple words.

Exercise 2

Convert a target variable into a factor.

Exercise 3

Create a logistic regression classification model.

Exercise 4

Generate predicted class labels.


✅ Practice Answers


Answer 1

Classification assigns data to categories based on patterns learned from labeled examples.

Answer 2

data$target <- as.factor(data$target)

Answer 3

model <- glm(target ~ x1 + x2,
             data = data,
             family = binomial)

Answer 4

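pred <- predict(model, data, type = "response")
# pred is the probability of the second factor level, so map labels by level order
ifelse(pred > 0.5, levels(data$target)[2], levels(data$target)[1])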

What’s Next?

In the next lesson, you will learn about Clustering in R, which focuses on grouping data without predefined labels.