Classification | Dataplexa

Classification in R

Classification is a machine learning technique that assigns data to predefined categories, or classes.

It answers questions such as whether an email is spam or not spam, based on patterns learned from the data.

What Is Classification?

Classification works by learning from labeled data, where the correct category is already known.

Once trained, the model can classify new, unseen data into one of the existing classes.
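The train-then-classify idea can be sketched with R's built-in `mtcars` data, where `am` (0 = automatic, 1 = manual) plays the role of the known label. This is a minimal sketch under stated assumptions. The 22/10 row split, the seed, the 0.5 threshold and the names `cars`, `fit` and `train_rows` are illustrative choices, not part of the lesson. The model is fitted only on the training rows and then asked to classify the held-out rows it has never seen.

```r
# Illustrative sketch: learn from labeled rows, then classify unseen rows.
# Assumption: mtcars$am (0 = automatic, 1 = manual) is the known label.
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

set.seed(42)                                  # reproducible random split
train_rows <- sample(nrow(cars), size = 22)   # about 70% of the 32 rows
train <- cars[train_rows, ]
test  <- cars[-train_rows, ]                  # held-out "unseen" rows

# Fit on the labeled training rows only
fit <- glm(am ~ wt, data = train, family = binomial)

# Probability of the second level ("manual") for each unseen row
prob_test <- predict(fit, newdata = test, type = "response")
class_test <- factor(ifelse(prob_test > 0.5, "manual", "automatic"),
                     levels = levels(cars$am))
table(Actual = test$am, Predicted = class_test)
```

Keeping some rows out of training gives a fairer picture of how the model handles new data than scoring it on the rows it learned from.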

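There are two main types of classification problems.

Binary classification: the target has exactly two classes, for example Yes/No or spam/not spam.

Multiclass classification: the target has more than two classes, for example three species of flower.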
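The type of problem depends on how many classes the target has. Two classes make it binary, and more than two make it multiclass. One quick check is to count the factor levels. The sketch below uses two built-in data sets for illustration. `problem_type()` is a hypothetical helper, not a base R function. Note that `glm()` with `family = binomial`, used later in this lesson, handles the binary case.

```r
# Count the distinct classes in a target to see which type of problem it is
n_binary <- nlevels(factor(mtcars$am))   # 2 classes: automatic (0) vs manual (1)
n_multi  <- nlevels(iris$Species)        # 3 classes: setosa, versicolor, virginica

# Hypothetical helper: label a problem by its number of classes
problem_type <- function(n_classes) {
  if (n_classes < 2) stop("a classification target needs at least two classes")
  if (n_classes == 2) "binary" else "multiclass"
}
problem_type(n_binary)   # "binary"
problem_type(n_multi)    # "multiclass"
```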

For classification in R, the target variable is usually stored as a factor.

A factor tells R that the variable holds categories rather than numeric values.
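The short sketch below shows what `as.factor()` changes, using a hypothetical `label` vector. By default, levels are sorted alphabetically. The order matters because `glm()` with `family = binomial` treats the first level as the "failure" class and models the probability of the second level. Base R's `relevel()` changes which level comes first.

```r
# Hypothetical label vector stored as plain text
label <- c("Yes", "No", "No", "Yes", "Yes")
is.factor(label)            # FALSE: still character data

label <- as.factor(label)
levels(label)               # "No" "Yes": sorted alphabetically by default

# The first level is the reference; relevel() moves a chosen level to the front
label_yes_first <- relevel(label, ref = "Yes")
levels(label_yes_first)     # "Yes" "No"
```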

# 'data' is your data frame and 'label' is its target column
data$label <- as.factor(data$label)
str(data)   # label should now show as a Factor

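One common approach to binary classification in R is logistic regression, fitted with glm() and family = binomial.

It models the probability that a data point belongs to a class. With a factor target, this is the probability of the second factor level.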
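For a two-predictor model, logistic regression can be written as follows. Here $x_1$ and $x_2$ stand for `feature1` and `feature2`, and $y = 1$ denotes the second level of the target factor. The coefficients $\beta_0, \beta_1, \beta_2$ are what `glm()` estimates. The first line is the probability form. The second line is the equivalent log-odds form, which is the scale on which `summary()` reports coefficients.

```latex
P(y = 1 \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
\qquad
\log \frac{P(y = 1 \mid x_1, x_2)}{1 - P(y = 1 \mid x_1, x_2)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2
```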

# label is a two-level factor; feature1 and feature2 are predictor columns
model <- glm(label ~ feature1 + feature2,
             data = data,
             family = binomial)
summary(model)
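Because `data`, `feature1` and `feature2` are placeholders, here is a runnable version using the built-in `mtcars` data. In this sketch, `am` (transmission) is the binary target and `wt` and `hp` are the predictors. The variable choice is purely illustrative. It also shows how the two `predict()` scales relate: by default, `predict()` on a `glm` returns log-odds (`type = "link"`), and `type = "response"` applies the logistic function `plogis()` to them.

```r
# Binary target: transmission type as a factor with readable labels
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Logistic regression with two numeric predictors: weight and horsepower
model <- glm(am ~ wt + hp, data = cars, family = binomial)
summary(model)
coef(model)   # log-odds scale, for the second level ("manual")

# Default predictions are log-odds; type = "response" turns them into probabilities
log_odds <- predict(model, type = "link")
prob     <- predict(model, type = "response")
all.equal(prob, plogis(log_odds))   # TRUE
```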

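When called with type = "response", predict() returns probabilities between 0 and 1. The default, type = "link", returns log-odds instead.

These probabilities can be converted into class labels by applying a threshold, commonly 0.5.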

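# Probabilities refer to the second level of data$label
predicted_prob <- predict(model, newdata = data, type = "response")
# Above 0.5 -> second factor level, otherwise the first
lev <- levels(data$label)
predicted_class <- ifelse(predicted_prob > 0.5, lev[2], lev[1])
predicted_class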
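The thresholding step can be checked on a few hypothetical probabilities, assuming the target's levels are `"No"` and `"Yes"` in that order. Wrapping the result in `factor(..., levels = lev)` keeps both classes as levels even if one is never predicted. This keeps a later confusion matrix at 2 x 2. The 0.5 threshold is a common default, not a rule, and can be raised or lowered to trade one kind of error for the other.

```r
# Hypothetical predicted probabilities for the second level ("Yes")
predicted_prob <- c(0.20, 0.70, 0.50, 0.91)
lev <- c("No", "Yes")   # assumed factor levels of the target, in order

threshold <- 0.5
predicted_class <- factor(ifelse(predicted_prob > threshold, lev[2], lev[1]),
                          levels = lev)
predicted_class   # No Yes No Yes: 0.50 is not > 0.5, so it becomes "No"
```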

Model evaluation helps us understand how well the classifier performs.

Common evaluation metrics include accuracy, precision, and recall.
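The three metrics can be computed directly from a confusion matrix. The sketch below uses made-up actual and predicted labels and treats `"Yes"` as the positive class. Accuracy is the share of all predictions that are correct. Precision is the share of predicted positives that are truly positive. Recall is the share of actual positives the model found.

```r
# Hypothetical labels; "Yes" is treated as the positive class
lev <- c("No", "Yes")
actual    <- factor(c("Yes","No","Yes","Yes","No","No","Yes","No"), levels = lev)
predicted <- factor(c("Yes","No","No","Yes","Yes","Yes","Yes","No"), levels = lev)

cm <- table(Actual = actual, Predicted = predicted)
TP <- cm["Yes", "Yes"]; FN <- cm["Yes", "No"]   # actual positives
FP <- cm["No", "Yes"];  TN <- cm["No", "No"]    # actual negatives

accuracy  <- (TP + TN) / sum(cm)   # correct predictions / all predictions
precision <- TP / (TP + FP)        # true "Yes" among predicted "Yes"
recall    <- TP / (TP + FN)        # predicted "Yes" among actual "Yes"
c(accuracy = accuracy, precision = precision, recall = recall)
# accuracy 0.625, precision 0.600, recall 0.750
```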

A confusion matrix cross-tabulates actual classes against predicted classes.

Correct predictions fall on its diagonal, and errors fall off the diagonal.

table(Actual = data$label,
      Predicted = predicted_class)
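As a runnable end-to-end sketch, the same confusion-matrix step can be applied to a model on the built-in `mtcars` data. The choice of `am ~ wt + hp` is illustrative. Accuracy is then the diagonal total divided by the grand total. Because this scores the model on the same rows it was trained on, the figure is likely optimistic compared with held-out data.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))
model <- glm(am ~ wt + hp, data = cars, family = binomial)

# Threshold the probabilities, keeping the target's levels
prob <- predict(model, type = "response")
pred <- factor(ifelse(prob > 0.5, "manual", "automatic"), levels = levels(cars$am))

cm <- table(Actual = cars$am, Predicted = pred)
cm
# Diagonal cells are correct predictions; off-diagonal cells are errors
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```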

Practice Questions

Explain classification in simple words.

Convert a target variable into a factor.

Create a logistic regression classification model.

Generate predicted class labels.

Answers

Classification assigns data to categories based on patterns learned from labeled examples.

data$target <- as.factor(data$target)

model <- glm(target ~ x1 + x2,
             data = data,
             family = binomial)

pred <- predict(model, data, type = "response")
# pred is the probability of the SECOND level of data$target
lev <- levels(data$target)
ifelse(pred > 0.5, lev[2], lev[1])

In the next lesson, you will learn about Clustering in R, which focuses on grouping data without predefined labels.