forcats: Working with Factors in R

In this lesson, you will learn how to work with categorical data in R using the forcats package.

Categorical data is common in real datasets: gender, region, status, category, or type. forcats makes factor handling simpler, more readable, and less error-prone.


What Are Factors?

A factor is a special data type in R used to store categorical values.

Factors differ from character strings because they have levels: a fixed set of allowed categories, kept in a defined order, that controls sorting and grouping.

status <- factor(c("active", "inactive", "active"))
status
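
To see what separates a factor from a character vector, it can help to look at its two parts. The sketch below uses only base R; the names status_levels and status_codes are illustrative and not part of the lesson's own code.

```r
# A factor stores integer codes plus a character vector of levels.
status <- factor(c("active", "inactive", "active"))

status_levels <- levels(status)      # the distinct categories, sorted by default
status_codes  <- as.integer(status)  # position of each value within the levels

print(status_levels)
print(status_codes)
```

Each code is an index into the levels, which is why changing the level order changes how the factor sorts and plots without changing the underlying values.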

Why Factors Matter

Factors are essential when:

  • Creating plots
  • Running statistical models
  • Grouping and summarizing data

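Level order matters in each case: it sets the order of bars and legend entries in a plot, the reference category in a regression model (the first level, under R's default contrasts), and the order of groups in a summary table.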


What Is forcats?

forcats is a package in the tidyverse designed to simplify working with factors.

Most forcats functions start with fct_, making them easy to identify. A few helpers use other names, such as as_factor() and the lvls_ functions.


Installing and Loading forcats

Install the package once, then load it in every R session that uses it.

install.packages("forcats")
library(forcats)

Creating Factors with factor()

You can convert character vectors into factors.

This is often the first step when cleaning categorical data.

gender <- factor(c("male", "female", "female", "male"))
gender
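
factor() sorts the levels, which is not always the order you want. As a small sketch, forcats also provides as_factor(), which for character input keeps levels in order of first appearance. The variable names below are illustrative.

```r
library(forcats)

gender_chr <- c("male", "female", "female", "male")

# Base R sorts the levels: "female" comes before "male"
gender_sorted <- factor(gender_chr)

# as_factor() keeps levels in order of first appearance: "male", then "female"
gender_appearance <- as_factor(gender_chr)

print(levels(gender_sorted))
print(levels(gender_appearance))
```

Both results hold the same values; only the level order differs.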

Reordering Factor Levels with fct_reorder()

fct_reorder() reorders factor levels by the values of another variable. By default it uses the median of that variable within each level, in ascending order.

This is especially useful for charts and summaries.

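scores <- c(88, 92, 79, 95)
students <- factor(c("Alex", "Emma", "John", "Sophia"))  # avoids masking base R's names()

# Levels are reordered by ascending score: John, Alex, Emma, Sophia
fct_reorder(students, scores)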
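
Reordering becomes more interesting when each level has several observations. The sketch below uses made-up scores (two per student) to show the .fun and .desc arguments of fct_reorder(); the data and variable names are hypothetical.

```r
library(forcats)

# Hypothetical data: two scores per student
student <- factor(c("Alex", "Emma", "John", "Alex", "Emma", "John"))
score   <- c(88, 92, 79, 70, 96, 85)

# Default: levels ordered by each student's median score, ascending
by_median <- fct_reorder(student, score)

# .fun picks the summary function; .desc = TRUE puts the highest first
by_mean_desc <- fct_reorder(student, score, .fun = mean, .desc = TRUE)

print(levels(by_median))
print(levels(by_mean_desc))
```

Only the level order changes; the values in student stay in their original positions.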

Renaming Levels with fct_recode()

The fct_recode() function allows you to rename factor levels.

This is useful when cleaning inconsistent labels.

status <- factor(c("A", "I", "A", "I"))

fct_recode(status,
  active = "A",
  inactive = "I"
)
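
When the same category was entered under several spellings, fct_recode() can map more than one old level to the same new name. A minimal sketch, assuming some hypothetical messy labels:

```r
library(forcats)

# Hypothetical messy input: two spellings for each status
status_raw <- factor(c("A", "act", "I", "A", "inact"))

# Repeating a new name merges the old levels into one
status_clean <- fct_recode(status_raw,
  active   = "A",
  active   = "act",
  inactive = "I",
  inactive = "inact"
)

print(levels(status_clean))
print(table(status_clean))
```

Levels that are not mentioned in the call are left unchanged.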

Combining Levels with fct_collapse()

Sometimes many categories can be grouped into fewer levels.

This helps simplify analysis and visualization.

role <- factor(c("admin", "manager", "staff", "staff"))

fct_collapse(role,
  management = c("admin", "manager"),
  employee = "staff"
)
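
You do not have to assign every level to a group. In the sketch below, which adds a hypothetical "intern" role, only the management levels are collapsed and the rest pass through untouched.

```r
library(forcats)

# Hypothetical roles, including one ("intern") not in the example above
role <- factor(c("admin", "manager", "staff", "staff", "intern"))

# Only the listed levels are combined; "staff" and "intern" are kept as they are
role_grouped <- fct_collapse(role,
  management = c("admin", "manager")
)

print(levels(role_grouped))
print(table(role_grouped))
```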

Handling Rare Levels with fct_lump()

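Rare categories can clutter plots and produce unreliable summaries from very small groups.

The fct_lump() function groups infrequent levels into "Other". With n = 2, the two most frequent levels are kept and the rest are lumped together. Since forcats 0.5.0, the documentation recommends calling a more specific variant, such as fct_lump_n(), in new code.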

cities <- factor(c("NY", "LA", "TX", "NY", "LA", "FL"))

fct_lump(cities, n = 2)
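
The more specific lumping functions make the rule explicit. The sketch below assumes forcats 0.5.0 or later, where fct_lump_n() and fct_lump_min() are available; the "Rare" label is an arbitrary choice for illustration.

```r
library(forcats)

cities <- factor(c("NY", "LA", "TX", "NY", "LA", "FL"))

# Keep the 2 most frequent levels; TX and FL become "Other"
top_two <- fct_lump_n(cities, n = 2)

# Keep levels that appear at least twice; other_level sets the label
at_least_two <- fct_lump_min(cities, min = 2, other_level = "Rare")

print(levels(top_two))
print(levels(at_least_two))
```

For this data both calls lump the same two cities; they differ when you care about a count threshold rather than a fixed number of levels.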

Changing Factor Order Manually

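You can manually define the order of factor levels with the levels argument. Without it, factor() sorts the levels alphabetically, which here would give high, low, medium.

This is useful when categories have a natural sequence.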

priority <- factor(
  c("low", "medium", "high"),
  levels = c("low", "medium", "high")
)

priority
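
If the factor already exists, forcats can change its level order without rebuilding it. A minimal sketch using fct_relevel() and fct_rev(); the variable names are illustrative.

```r
library(forcats)

# Created without a levels argument, so the levels are sorted alphabetically
priority_default <- factor(c("low", "medium", "high", "low"))
print(levels(priority_default))

# fct_relevel() moves the named levels to the front, in the order given
priority_ordered <- fct_relevel(priority_default, "low", "medium", "high")

# fct_rev() reverses the current level order
priority_reversed <- fct_rev(priority_ordered)

print(levels(priority_ordered))
print(levels(priority_reversed))
```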

Why forcats Is Important

forcats helps you avoid common mistakes with categorical data, such as unintended level order or inconsistent labels.

It improves clarity, consistency, and accuracy in data analysis workflows.


📝 Practice Exercises


Exercise 1

Create a factor from a vector of department names.

Exercise 2

Rename factor levels using fct_recode().

Exercise 3

Group multiple factor levels into one category.

Exercise 4

Move rare categories into an "Other" group.


✅ Practice Answers


Answer 1

dept <- factor(c("HR", "IT", "Finance", "IT"))
dept

Answer 2

fct_recode(dept,
  Human_Resources = "HR",
  Information_Tech = "IT"
)

Answer 3

fct_collapse(dept,
  Admin = c("HR", "Finance"),
  Tech = "IT"
)

Answer 4

# n = 2 would keep every level (HR and Finance tie for second), so use n = 1 to lump them
fct_lump(dept, n = 1)

What’s Next?

In the next lesson, you will move into advanced loops in R.

This will help you write more flexible and powerful iterative logic.