Factors in R
In this lesson, you will learn about factors in R. Factors are used to store and work with categorical data.
Categorical data represents values that belong to fixed groups or categories, such as levels, types, or classes.
What Is a Factor?
A factor is a special data structure in R used to represent categorical values.
Instead of treating data as plain text, a factor stores each value as an integer code that refers to one of a fixed set of categories called levels. This makes factors well suited to analysis and statistical modeling.
Why Factors Are Important
Factors help R understand that certain values are categories, not regular text.
They are widely used in data analysis, statistics, plotting, and machine learning models.
Creating a Factor
Factors are created using the factor() function.
When you create a factor, R automatically identifies the unique values, sorts them (alphabetically for text), and stores them as levels.
status <- factor(c("Open", "Closed", "Open", "Pending"))
status
Here, R converts the text values into a factor with defined levels.
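As a small sketch of what happens here, the snippet below rebuilds the same status factor and checks its levels. Note that the levels come out sorted, not in the order the values first appear.

```r
# Same values as the example above, redefined so this snippet runs on its own.
status <- factor(c("Open", "Closed", "Open", "Pending"))

# Printing shows the values followed by a "Levels:" line.
print(status)

# The levels are the sorted unique values, not the order of first appearance.
levels(status)   # "Closed" "Open" "Pending"

# is.factor() confirms that the object is a factor rather than plain text.
is.factor(status)   # TRUE
```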
Viewing Factor Levels
You can view all the categories of a factor using the levels() function.
It returns a character vector of the level labels, in the order R stores them.
levels(status)
Levels represent the possible categories in the factor.
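Two related base R functions are often useful alongside levels(): nlevels() counts the levels, and table() counts how many values fall into each one. A minimal sketch using the same status values:

```r
# Same values as the example above, redefined so this snippet runs on its own.
status <- factor(c("Open", "Closed", "Open", "Pending"))

# Number of distinct categories.
nlevels(status)   # 3

# Count of values per level, reported in level order.
counts <- table(status)
counts

# Individual counts can be looked up by level name.
counts[["Open"]]   # 2
```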
Checking Factor Structure
The str() function gives a compact view of how a factor is stored internally.
It displays the number of levels, the level labels (possibly truncated), and the integer codes that map each element to a level.
str(status)
This view is useful when debugging or exploring datasets.
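To make the internal storage more concrete, the sketch below pulls the integer codes out of the status factor and uses them to rebuild the original labels. The variable names codes and rebuilt are only illustrative.

```r
# Same values as the example above, redefined so this snippet runs on its own.
status <- factor(c("Open", "Closed", "Open", "Pending"))

# A factor is an integer vector with a "levels" attribute and class "factor".
typeof(status)   # "integer"
class(status)    # "factor"

# The integer codes are positions in the levels vector.
codes <- as.integer(status)
codes            # 2 1 2 3

# Indexing the levels by the codes recovers the original labels.
rebuilt <- levels(status)[codes]
rebuilt          # "Open" "Closed" "Open" "Pending"
```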
Creating Factors with Custom Levels
Sometimes you want categories to follow a specific order.
You can manually define the levels while creating a factor.
priority <- factor(
  c("Low", "High", "Medium"),
  levels = c("Low", "Medium", "High")
)
priority
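This sets the order in which the levels are stored and displayed, for example in tables and plots. It does not by itself make the categories comparable with < or >; for that, use an ordered factor.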
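The sketch below shows two consequences of setting levels by hand. Summaries follow the level order you chose, and any value that is not in the levels list becomes NA. The misspelled value "Hgih" and the name with_typo are made up for illustration.

```r
# Same factor as the example above, redefined so this snippet runs on its own.
priority <- factor(
  c("Low", "High", "Medium"),
  levels = c("Low", "Medium", "High")
)

# table() reports counts in the custom level order: Low, Medium, High.
table(priority)

# Custom levels alone do not make the factor ordered.
is.ordered(priority)   # FALSE

# A value missing from the levels list is silently turned into NA.
with_typo <- factor(
  c("Low", "Hgih"),
  levels = c("Low", "Medium", "High")
)
is.na(with_typo)       # FALSE TRUE
```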
Ordered Factors
An ordered factor is used when the order of categories matters.
This is common in rankings, ratings, and performance levels.
rating <- factor(
  c("Good", "Excellent", "Average"),
  levels = c("Average", "Good", "Excellent"),
  ordered = TRUE
)
rating
Ordered factors allow comparison between categories using operators such as < and >, based on the order of the levels.
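A minimal sketch of such comparisons, using the same rating values. The comparisons follow the level order Average < Good < Excellent, not alphabetical order.

```r
# Same factor as the example above, redefined so this snippet runs on its own.
rating <- factor(
  c("Good", "Excellent", "Average"),
  levels = c("Average", "Good", "Excellent"),
  ordered = TRUE
)

# Compare every element against a level name.
rating > "Average"      # TRUE TRUE FALSE

# Compare two elements with each other ("Good" vs "Excellent").
rating[1] < rating[2]   # TRUE

# max() and sort() also respect the level order.
as.character(max(rating))    # "Excellent"
as.character(sort(rating))   # "Average" "Good" "Excellent"
```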
Converting Factors to Characters
Sometimes you may need to convert a factor back into text.
This is done using the as.character() function.
as.character(status)
This removes the categorical structure and returns plain text values.
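As a quick check of what the conversion returns, the sketch below stores the result in a variable (status_text, a name chosen here for illustration) and confirms that it is an ordinary character vector.

```r
# Same values as the earlier example, redefined so this snippet runs on its own.
status <- factor(c("Open", "Closed", "Open", "Pending"))

# Convert the factor to plain text.
status_text <- as.character(status)

# The result is a character vector, no longer a factor.
class(status_text)       # "character"
is.factor(status_text)   # FALSE

# String functions such as nchar() now work on the values directly.
nchar(status_text)       # 4 6 4 7
```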
Converting Factors to Numeric
Calling as.numeric() directly on a factor returns the internal integer codes, not the numbers shown in the labels.
To get the actual values, convert the factor to characters first, then to numeric.
numbers <- factor(c("10", "20", "30"))
as.numeric(as.character(numbers))
This ensures accurate numeric conversion.
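The difference is easiest to see side by side. The sketch below uses the same numbers factor and also shows an alternative idiom, as.numeric(levels(f))[f], which the R help page for factor mentions as slightly more efficient.

```r
# Same factor as the example above, redefined so this snippet runs on its own.
numbers <- factor(c("10", "20", "30"))

# Direct conversion returns the level codes, not the labels.
as.numeric(numbers)                  # 1 2 3

# Going through character recovers the values written in the labels.
as.numeric(as.character(numbers))    # 10 20 30

# Alternative: convert the levels once, then index them by the factor.
as.numeric(levels(numbers))[numbers] # 10 20 30
```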
Common Use Cases for Factors
Factors are commonly used to represent categories such as:
- Status values
- Groups or classifications
- Ratings and rankings
- Labels in charts and graphs
They make analysis more meaningful and structured.
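As one illustration of the grouping use case, the sketch below computes a mean per group with tapply(). The group and score vectors are made-up data, not part of the lesson's examples.

```r
# Hypothetical data: four scores, each belonging to group "A" or "B".
group <- factor(c("A", "B", "A", "B"))
score <- c(10, 20, 30, 40)

# tapply() splits score by the levels of group and applies mean() to each part.
means <- tapply(score, group, mean)
means

# Results can be looked up by level name.
means[["A"]]   # 20
means[["B"]]   # 30
```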
📝 Practice Exercises
Exercise 1
Create a factor that stores three categories: "Small", "Medium", and "Large".
Exercise 2
Display the levels of a factor.
Exercise 3
Create an ordered factor representing "Beginner", "Intermediate", and "Advanced".
Exercise 4
Convert a factor into numeric values safely.
✅ Practice Answers
Answer 1
size <- factor(c("Small", "Medium", "Large"))
size
Answer 2
levels(size)
Answer 3
level <- factor(
  c("Beginner", "Advanced", "Intermediate"),
  levels = c("Beginner", "Intermediate", "Advanced"),
  ordered = TRUE
)
level
Answer 4
values <- factor(c("5", "10", "15"))
as.numeric(as.character(values))
What’s Next?
Now that you understand factors, the next lesson will focus on data frames.
Data frames are the most commonly used structure for real-world data analysis in R.