R Lesson 9 – Factors in R | Dataplexa

Factors in R

In this lesson, you will learn about factors in R. Factors are used to store and work with categorical data.

Categorical data takes its values from a fixed set of groups, such as a ticket status (Open, Closed, Pending) or a size (Small, Medium, Large).


What Is a Factor?

A factor is a data structure in R used to represent categorical values.

Instead of treating data as plain text, a factor stores the distinct categories once, as its levels, and records each value as an integer code that points to one of those levels. This makes factors well suited to analysis and statistical modeling.
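A minimal sketch of how levels and values relate. The colors vector is a made-up example, not data from this lesson.

```r
# Hypothetical example vector (not part of the lesson's data)
colors <- factor(c("red", "blue", "red", "green"))

# The distinct categories, sorted alphabetically by default
levels(colors)      # "blue" "green" "red"

# Each element is stored as an integer position in the levels
as.integer(colors)  # 3 1 3 2
```

The first element is "red", which is the third level, so its stored code is 3.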


Why Factors Are Important

Factors help R understand that certain values are categories, not regular text.

They are widely used in data analysis, statistics, plotting, and machine learning models.
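One way to see the difference is to call summary() on the same values stored as text and as a factor. The variable names below are chosen for illustration only.

```r
# The same values as plain text and as a factor (illustrative names)
status_text <- c("Open", "Closed", "Open", "Pending")
status_factor <- factor(status_text)

# For text, summary() only reports the length and type
summary(status_text)

# For a factor, summary() counts the values in each category
summary(status_factor)  # Closed 1, Open 2, Pending 1
```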


Creating a Factor

Factors are created using the factor() function.

When you create a factor, R automatically identifies the unique categories and stores them as levels, sorted alphabetically by default.

status <- factor(c("Open", "Closed", "Open", "Pending"))
status

Here, R converts the text values into a factor with three levels: Closed, Open, and Pending.


Viewing Factor Levels

You can view all the categories of a factor using the levels() function.

This helps you understand how R internally stores categorical data.

levels(status)

Levels represent the possible categories in the factor.
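As a small sketch, the number of levels can differ from the number of elements, because repeated values share one level. The status factor is recreated here so the snippet runs on its own.

```r
# Recreate the factor from this lesson so the snippet is self-contained
status <- factor(c("Open", "Closed", "Open", "Pending"))

length(status)   # 4 elements
nlevels(status)  # 3 distinct categories
levels(status)   # "Closed" "Open" "Pending"
```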


Checking Factor Structure

The str() function shows how a factor is stored internally.

It displays the number of levels, the level labels, and the integer codes that stand for each element.

str(status)

This view is useful when debugging or exploring datasets.


Creating Factors with Custom Levels

Sometimes you want categories to follow a specific order.

You can manually define the levels while creating a factor.

priority <- factor(
  c("Low", "High", "Medium"),
  levels = c("Low", "Medium", "High")
)

priority

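This stores the levels in the order Low, Medium, High instead of the default alphabetical order (High, Low, Medium). Tables and plots follow this order, but the factor is still unordered, so comparisons with operators such as < are not meaningful.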
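A short sketch of two effects of custom levels: summaries follow the level order, and values missing from the level list become NA. The extra "High" entry and the value "Urgent" are made up for illustration.

```r
# Same levels as above, with one extra "High" added for the counts
priority <- factor(
  c("Low", "High", "Medium", "High"),
  levels = c("Low", "Medium", "High")
)

# Counts are reported in level order: Low, Medium, High
table(priority)

# "Urgent" is not one of the declared levels, so it becomes NA
with_unknown <- factor(
  c("Low", "Urgent"),
  levels = c("Low", "Medium", "High")
)
is.na(with_unknown)  # FALSE TRUE
```

Declaring the levels up front is a simple way to catch typos or unexpected categories in the data.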


Ordered Factors

An ordered factor is used when the order of categories matters.

This is common in rankings, ratings, and performance levels.

rating <- factor(
  c("Good", "Excellent", "Average"),
  levels = c("Average", "Good", "Excellent"),
  ordered = TRUE
)

rating

Ordered factors allow comparisons between categories with operators such as < and >. Here, the order is Average < Good < Excellent.
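The comparisons can be sketched as follows. The rating factor is recreated so the snippet runs on its own.

```r
# Recreate the ordered factor from this lesson
rating <- factor(
  c("Good", "Excellent", "Average"),
  levels = c("Average", "Good", "Excellent"),
  ordered = TRUE
)

# Compare two elements: Good ranks below Excellent
rating[1] < rating[2]  # TRUE

# Compare every element against a level name
rating >= "Good"       # TRUE TRUE FALSE

# The highest category present in the data
max(rating)            # Excellent
```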


Converting Factors to Characters

Sometimes you may need to convert a factor back into text.

This is done using the as.character() function.

as.character(status)

This removes the categorical structure and returns plain text values.
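A quick check of the result, as a sketch. The name status_text is chosen for illustration only.

```r
# Recreate the factor from this lesson
status <- factor(c("Open", "Closed", "Open", "Pending"))

# as.character() returns a new vector; the original factor is unchanged
status_text <- as.character(status)

class(status)       # "factor"
class(status_text)  # "character"
```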


Converting Factors to Numeric

Calling as.numeric() directly on a factor returns the internal level codes (such as 1, 2, 3), not the numbers written in the labels.

Always convert factors to characters first, then to numeric values.

numbers <- factor(c("10", "20", "30"))
as.numeric(as.character(numbers))

This returns 10, 20, 30, the numbers stored in the labels.
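To make the difference concrete, here is a sketch comparing the direct conversion with the two-step conversion on the same factor.

```r
numbers <- factor(c("10", "20", "30"))

# Direct conversion returns the level codes, not the data
as.numeric(numbers)                # 1 2 3

# Converting to character first recovers the real values
as.numeric(as.character(numbers))  # 10 20 30
```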


Common Use Cases for Factors

Factors are commonly used to represent categories such as:

  • Status values
  • Groups or classifications
  • Ratings and rankings
  • Labels in charts and graphs

They make analysis more meaningful and structured.
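As one possible illustration of the "groups" use case, a factor can split numeric values into groups for a per-group summary. The scores and group labels below are made up.

```r
# Made-up scores and group labels for illustration
scores <- c(80, 90, 70, 100)
group <- factor(c("A", "B", "A", "B"))

# Average score within each group
tapply(scores, group, mean)  # A 75, B 95
```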


📝 Practice Exercises


Exercise 1

Create a factor that stores three categories: "Small", "Medium", and "Large".

Exercise 2

Display the levels of the factor you created in Exercise 1.

Exercise 3

Create an ordered factor representing "Beginner", "Intermediate", and "Advanced".

Exercise 4

Convert a factor into numeric values safely.


✅ Practice Answers


Answer 1

size <- factor(c("Small", "Medium", "Large"))
size

Answer 2

levels(size)

Answer 3

level <- factor(
  c("Beginner", "Advanced", "Intermediate"),
  levels = c("Beginner", "Intermediate", "Advanced"),
  ordered = TRUE
)

level

Answer 4

values <- factor(c("5", "10", "15"))
as.numeric(as.character(values))

What’s Next?

Now that you understand factors, the next lesson will focus on data frames.

Data frames are the most commonly used structure for real-world data analysis in R.