
Data Aggregation in R

In this lesson, you will learn how to summarize and aggregate data in R.

Data aggregation means combining multiple data values into a single meaningful result. This is one of the most important steps in data analysis because raw data is often too detailed to interpret directly.

What Is Data Aggregation?

Data aggregation is the process of grouping data and applying summary functions such as:

Sum
Mean (average)
Count
Minimum and maximum

Aggregation helps answer questions like:

What is the average sales amount per category?
How many records exist per group?
What is the total value per region?

Sample Dataset

We will use a simple data frame to demonstrate aggregation concepts.

data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

data

Using `aggregate()` Function

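The aggregate() function is a base R function for grouped aggregation.

It splits the data into groups and applies a function to each group. In the formula salary ~ department, the left side is the value to summarize and the right side is the grouping variable.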

aggregate(salary ~ department, data = data, FUN = mean)

This calculates the average salary for each department.
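As a point of comparison, aggregate() also accepts its inputs as lists instead of a formula, which lets you name the output column directly. The sketch below assumes the same sample data; the column name avg_salary is an illustrative choice, not part of the lesson.

```r
# Same sample data as in the lesson
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# List interface: `x` holds the values to summarize, `by` holds the groups.
# The list names become the column names of the result.
avg_by_dept <- aggregate(
  x = list(avg_salary = data$salary),
  by = list(department = data$department),
  FUN = mean
)

avg_by_dept
```

The result has one row per department, with groups sorted alphabetically (Finance, HR, IT).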

Counting Records Per Group

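To count how many rows exist per group, use the length function. With the formula interface, rows where salary is NA are dropped before counting, so the result is the number of non-missing values per group.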

aggregate(salary ~ department, data = data, FUN = length)
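One detail worth checking when counting this way is how missing values are handled. The sketch below uses a small hypothetical data frame, data_na, with one missing salary, and compares the formula-based count with table(), which counts every row regardless of missing salaries.

```r
# Hypothetical data with one missing salary in IT
data_na <- data.frame(
  department = c("HR", "IT", "IT", "HR"),
  salary = c(40000, NA, 65000, 42000)
)

# Formula interface: the NA row is omitted before length() is applied
counts_non_missing <- aggregate(salary ~ department, data = data_na, FUN = length)

# table() counts rows per department and ignores the salary column entirely
counts_all_rows <- table(data_na$department)

counts_non_missing
counts_all_rows
```

Here the formula-based count for IT is 1, while table() reports 2 rows for IT.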

Calculating Total Values

You can also calculate totals using the sum function.

This is useful for financial or performance reports.

aggregate(salary ~ department, data = data, FUN = sum)
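For a report, it often helps to store the totals and sort them. This is a minimal sketch using the sample data; the variable names totals and totals_sorted are illustrative.

```r
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# Total salary per department
totals <- aggregate(salary ~ department, data = data, FUN = sum)

# Sort from the largest total to the smallest
totals_sorted <- totals[order(-totals$salary), ]
totals_sorted

# Sanity check: group totals should add up to the overall total
sum(totals$salary) == sum(data$salary)
```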

Multiple Aggregations

Sometimes, you need more than one summary metric.

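You can use a custom function that returns a named vector of values. Note that aggregate() then stores the results in a single matrix column named salary, not in separate data frame columns.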

aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(
    mean = mean(x),
    max = max(x),
    min = min(x)
  )
)
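When the summary function returns several values, the result of aggregate() has only two columns: department and a matrix-valued salary column. One common way to turn that into ordinary columns is do.call(data.frame, ...). This is a sketch; the names stats and stats_flat are illustrative.

```r
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

stats <- aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(mean = mean(x), max = max(x), min = min(x))
)

ncol(stats)  # 2: `department` plus one matrix column `salary`

# Flatten the matrix column into separate columns
stats_flat <- do.call(data.frame, stats)
names(stats_flat)
```

After flattening, the columns are named salary.mean, salary.max, and salary.min, so they can be used like any other data frame column.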

Aggregation Using `tapply()`

The tapply() function applies a function over subsets of a vector.

It returns a named array rather than a data frame, which makes it convenient for quick summaries.

tapply(data$salary, data$department, mean)
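Because the result of tapply() is named by group, individual values can be looked up by department name. A short sketch with the sample data (the variable name dept_means is illustrative):

```r
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

dept_means <- tapply(data$salary, data$department, mean)

names(dept_means)    # group labels, in alphabetical order
dept_means[["HR"]]   # average salary for HR: (40000 + 42000) / 2
```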

Aggregation Using `by()`

The by() function splits data into groups and applies a function.

It is often used for exploratory analysis.

by(data$salary, data$department, summary)
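by() can also take the whole data frame, in which case the function receives each group as a smaller data frame. The sketch below assumes the sample data and uses an illustrative anonymous function that returns the salary range per department.

```r
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# `d` is the subset of rows for one department
ranges <- by(data, data$department, function(d) range(d$salary))

ranges            # prints one block per department
ranges[["IT"]]    # lowest and highest IT salary
```

This form is handy when the summary needs more than one column of the group, which tapply() on a single vector cannot provide.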

Why Data Aggregation Matters

Reduces large datasets into understandable summaries
Helps identify patterns and trends
Forms the basis for reporting and visualization
Essential for business and statistical analysis

📝 Practice Exercises

Exercise 1

Calculate the average salary for each department.

Exercise 2

Find the total salary paid by each department.

Exercise 3

Count how many employees are in each department.

Exercise 4

Find the minimum and maximum salary per department.

✅ Practice Answers

Answer 1

aggregate(salary ~ department, data = data, FUN = mean)

Answer 2

aggregate(salary ~ department, data = data, FUN = sum)

Answer 3

aggregate(salary ~ department, data = data, FUN = length)

Answer 4

aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(min = min(x), max = max(x))
)

What’s Next?

In the next lesson, you will learn about Data Merging in R.

This will help you combine multiple datasets into a single dataset for analysis.
