tidyr: Data Tidying in R

In this lesson, you will learn how to organize and clean messy data using the tidyr package.

tidyr helps you reshape datasets into a clean and consistent structure, making them easier to analyze, visualize, and model.


What Is Data Tidying?

Data tidying means arranging data so that:

  • Each variable has its own column
  • Each observation has its own row
  • Each value has its own cell

This structure is often called tidy data. Most tidyverse tools, including dplyr and ggplot2, expect data in this shape.


What Is tidyr?

tidyr is an R package designed to help you transform messy data into tidy data.

It works closely with dplyr and is commonly used in data analysis pipelines.
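As a rough sketch of what such a pipeline can look like, the example below reshapes a small table with tidyr and then summarises it with dplyr. It assumes both packages are installed; the `scores` data frame and the `subject_summary` name are made up for illustration, and pivot_longer() is explained later in this lesson.

```r
library(tidyr)
library(dplyr)

# Illustrative wide data (hypothetical values)
scores <- data.frame(
  name = c("Alex", "Emma"),
  math = c(85, 90),
  science = c(88, 92)
)

# Reshape with tidyr, then group and summarise with dplyr
subject_summary <- scores %>%
  pivot_longer(cols = c(math, science), names_to = "subject", values_to = "score") %>%
  group_by(subject) %>%
  summarise(mean_score = mean(score))

subject_summary
```

The reshaping step comes first because group_by() needs the subject to be a column of its own.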


Installing and Loading tidyr

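Before using tidyr, install the package once and then load it in every new R session.

install.packages("tidyr")  # needed only once per R installation
library(tidyr)             # needed in every new session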

Using a Sample Dataset

Let’s start with a dataset that is not tidy.

data <- data.frame(
  name = c("Alex", "Emma"),
  math = c(85, 90),
  science = c(88, 92)
)

data

Here the subject names (math, science) are stored as column headers, so the variable "subject" has no column of its own.


Converting Wide Data to Long Format with pivot_longer()

The pivot_longer() function converts wide data into long format.

This is useful when column names are really values of a variable, such as the subject names in the sample dataset.

long_data <- pivot_longer(
  data,
  cols = c(math, science),
  names_to = "subject",
  values_to = "score"
)

long_data

Now each row holds one person's score in one subject.
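When a table has many value columns, listing each one in cols becomes tedious. A possible alternative, sketched here with the same sample data, is to select every column except the identifier with a negative selection. The dim() check at the end is only there to confirm the shape.

```r
library(tidyr)

# Same sample data as in the lesson
data <- data.frame(
  name = c("Alex", "Emma"),
  math = c(85, 90),
  science = c(88, 92)
)

# Pivot every column except `name`
long_data <- pivot_longer(
  data,
  cols = -name,
  names_to = "subject",
  values_to = "score"
)

# Two students times two subjects gives four rows and three columns
dim(long_data)
```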


Converting Long Data to Wide Format with pivot_wider()

The pivot_wider() function does the opposite.

It takes the new column names from one column (names_from) and fills those columns with values from another (values_from).

wide_data <- pivot_wider(
  long_data,
  names_from = subject,
  values_from = score
)

wide_data
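One way to convince yourself that the two functions are inverses is a round trip: pivot the sample data to long format, pivot it back, and compare with the original. The sketch below does this; the name `round_trip_ok` is an illustrative choice. Because pivot_wider() returns a tibble, the result is converted to a plain data frame before comparing.

```r
library(tidyr)

data <- data.frame(
  name = c("Alex", "Emma"),
  math = c(85, 90),
  science = c(88, 92)
)

long_data <- pivot_longer(data, cols = c(math, science),
                          names_to = "subject", values_to = "score")
wide_data <- pivot_wider(long_data, names_from = subject, values_from = score)

# Convert the tibble back to a data frame before comparing with the original
round_trip_ok <- isTRUE(all.equal(as.data.frame(wide_data), data))
round_trip_ok
```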

Separating Columns with separate()

Sometimes multiple values are stored in a single column.

The separate() function splits one column into multiple columns.

people <- data.frame(
  full_name = c("Alex_Smith", "Emma_Clark")
)

separate(people, full_name, into = c("first_name", "last_name"), sep = "_")
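In tidyr 1.3.0 and later, separate() is marked as superseded in favour of a family of more specific functions. As a sketch, assuming a tidyr version of at least 1.3.0 is installed, the same split can be written with separate_wider_delim(). The name `split_people` is an illustrative choice.

```r
library(tidyr)

people <- data.frame(
  full_name = c("Alex_Smith", "Emma_Clark")
)

# Requires tidyr >= 1.3.0 (an assumption about the installed version)
split_people <- separate_wider_delim(
  people,
  cols = full_name,
  delim = "_",
  names = c("first_name", "last_name")
)

split_people
```

separate() still works, so either form is fine for this lesson.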

Combining Columns with unite()

The unite() function combines multiple columns into one.

This is useful when you want a single identifier column.

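split_people <- separate(people, full_name, into = c("first_name", "last_name"), sep = "_")  # save the split columns first
unite(split_people, full_name, first_name, last_name, sep = " ")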
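By default unite() drops the columns it combines. If you want to keep them next to the new column, the remove argument can be set to FALSE. The sketch below uses a hypothetical `name_parts` data frame whose names are already split.

```r
library(tidyr)

# Hypothetical input with the name already split into two columns
name_parts <- data.frame(
  first_name = c("Alex", "Emma"),
  last_name  = c("Smith", "Clark")
)

# remove = FALSE keeps the source columns alongside the new one
with_full_name <- unite(name_parts, full_name, first_name, last_name,
                        sep = " ", remove = FALSE)

with_full_name
```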

Handling Missing Values with drop_na()

Missing values can cause issues during analysis.

The drop_na() function removes every row that contains NA in any column, or only in the columns you name.

data_with_na <- data.frame(
  name = c("Alex", "Emma", "John"),
  score = c(90, NA, 85)
)

drop_na(data_with_na)
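Dropping rows is not always what you want. The sketch below shows two variations on the same sample data: restricting the check to one column, and keeping every row while substituting a value with replace_na(). The replacement value 0 and the names `complete_scores` and `filled_scores` are illustrative choices, not recommendations.

```r
library(tidyr)

data_with_na <- data.frame(
  name = c("Alex", "Emma", "John"),
  score = c(90, NA, 85)
)

# Only look at the `score` column when deciding which rows to drop
complete_scores <- drop_na(data_with_na, score)

# Alternative: keep every row and substitute a placeholder value
filled_scores <- replace_na(data_with_na, list(score = 0))

nrow(complete_scores)  # 2
filled_scores$score    # 90 0 85
```

Whether to drop or fill depends on the analysis; a placeholder such as 0 changes averages, so it should be chosen deliberately.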

Why tidyr Is Important

tidyr helps prepare data before analysis and visualization.

Clean data improves accuracy, reduces errors, and makes code easier to understand.


📝 Practice Exercises


Exercise 1

Convert a wide dataset into long format using pivot_longer().

Exercise 2

Convert long-format data back into wide format.

Exercise 3

Split a column containing "city-country" into two columns.

Exercise 4

Remove rows with missing values from a dataset.


✅ Practice Answers


Answer 1

pivot_longer(data, cols = -name, names_to = "key", values_to = "value")  # everything() would fail here: name is character, the scores are numeric

Answer 2

pivot_wider(long_data, names_from = subject, values_from = score)

Answer 3

separate(data, location, into = c("city", "country"), sep = "-")  # assumes data has a "location" column

Answer 4

drop_na(data_with_na)

What’s Next?

In the next lesson, you will learn how to work with text data using the stringr package.

This will help you clean and manipulate textual information efficiently.