Data Aggregation in R

In this lesson, you will learn how to summarize and aggregate data in R.

Data aggregation means combining multiple data values into a single meaningful result. This is one of the most important steps in data analysis because raw data is often too detailed to interpret directly.


What Is Data Aggregation?

Data aggregation is the process of grouping data and applying summary functions such as:

  • Sum
  • Mean (average)
  • Count
  • Minimum and maximum

Aggregation helps answer questions like:

  • What are the average sales per category?
  • How many records exist per group?
  • What is the total value per region?

Sample Dataset

We will use a simple data frame to demonstrate aggregation concepts.

data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

data

Using aggregate() Function

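The aggregate() function is base R's general-purpose tool for grouped summaries.

It splits the data into groups and applies a function to each group. In the formula salary ~ department, the left side names the column to summarize and the right side names the grouping column.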

aggregate(salary ~ department, data = data, FUN = mean)

This calculates the average salary for each department.
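
As a small sketch of the same calculation, aggregate() can also be called without a formula by passing the values and a list of grouping vectors. The variable names avg_formula and avg_by are only illustrative. Note that the non-formula call names the summarized column x by default.

```r
# Same sample data frame as in this lesson
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# Formula interface: summarize salary, grouped by department
avg_formula <- aggregate(salary ~ department, data = data, FUN = mean)

# Non-formula interface: the named list sets the grouping column's name;
# the summarized column is called "x" by default
avg_by <- aggregate(data$salary, by = list(department = data$department), FUN = mean)

print(avg_formula)
print(avg_by)
```

Both calls return one row per department. The formula version is usually easier to read, while the list version can be handy when the grouping vector does not live in the same data frame.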


Counting Records Per Group

To count how many rows exist per group, pass length as the summary function. It counts the salary values in each department.

aggregate(salary ~ department, data = data, FUN = length)
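
For a quick count, base R's table() function is a common alternative. The sketch below assumes the same sample data; the names counts and counts_df are illustrative.

```r
# Same sample data frame as in this lesson
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# Count how many rows fall into each department
counts <- table(data$department)
print(counts)

# Convert the table to a data frame (default columns: Var1 and Freq)
counts_df <- as.data.frame(counts)
print(counts_df)
```

Unlike aggregate(), table() needs only the grouping column, so it works even when there is no value column to summarize.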

Calculating Total Values

You can also calculate totals using the sum function.

This is useful for financial or performance reports.

aggregate(salary ~ department, data = data, FUN = sum)

Multiple Aggregations

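Sometimes you need more than one summary metric.

You can pass a custom function that returns a named vector. Note that aggregate() stores these results in a single matrix column (here, salary) rather than in separate columns.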

aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(
    mean = mean(x),
    max = max(x),
    min = min(x)
  )
)
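
When a custom function returns several values, the summaries end up inside one matrix column. One common idiom for spreading them into ordinary columns is do.call(data.frame, ...). This is a sketch under that assumption; stats and flat are illustrative names.

```r
# Same sample data frame as in this lesson
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# Several summaries at once; the result has only two columns,
# and the second one (salary) is a matrix
stats <- aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(mean = mean(x), max = max(x), min = min(x))
)
ncol(stats)

# Expand the matrix column into separate columns
flat <- do.call(data.frame, stats)
names(flat)
print(flat)
```

After flattening, each metric can be accessed as a regular column, for example flat$salary.max.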

Aggregation Using tapply()

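The tapply() function applies a function to subsets of a vector, where the subsets are defined by a grouping vector.

It returns a named vector (a one-dimensional array) instead of a data frame, which makes it convenient for quick summaries.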

tapply(data$salary, data$department, mean)
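
Because the result is named by group, individual values can be looked up directly. The sketch below also shows one possible way to turn the result into a data frame; avg, avg_df, and mean_salary are illustrative names.

```r
# Same sample data frame as in this lesson
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# Average salary per department, named by department
avg <- tapply(data$salary, data$department, mean)
print(avg)

# Look up a single group by name
avg[["HR"]]

# Convert to a data frame when a tabular result is needed
avg_df <- data.frame(department = names(avg), mean_salary = as.vector(avg))
print(avg_df)
```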

Aggregation Using by()

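The by() function splits data into groups and applies a function to each group.

With summary as the function, it reports the minimum, quartiles, median, mean, and maximum for each department, which is useful in exploratory analysis.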

by(data$salary, data$department, summary)
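
by() also accepts a whole data frame, in which case the function receives each group's rows as a smaller data frame. As a sketch, the example below computes the gap between the highest and lowest salary in each department; spread is an illustrative name.

```r
# Same sample data frame as in this lesson
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# d is the subset of rows belonging to one department
spread <- by(data, data$department, function(d) max(d$salary) - min(d$salary))
print(spread)

# Look up a single department by name
spread[["IT"]]
```

Passing the full data frame is useful when the summary depends on more than one column.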

Why Data Aggregation Matters

  • Reduces large datasets into understandable summaries
  • Helps identify patterns and trends
  • Forms the basis for reporting and visualization
  • Essential for business and statistical analysis

📝 Practice Exercises


Exercise 1

Calculate the average salary for each department.

Exercise 2

Find the total salary paid by each department.

Exercise 3

Count how many employees are in each department.

Exercise 4

Find the minimum and maximum salary per department.


✅ Practice Answers


Answer 1

aggregate(salary ~ department, data = data, FUN = mean)

Answer 2

aggregate(salary ~ department, data = data, FUN = sum)

Answer 3

aggregate(salary ~ department, data = data, FUN = length)

Answer 4

aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(min = min(x), max = max(x))
)

What’s Next?

In the next lesson, you will learn about Data Merging in R.

This will help you combine multiple datasets into a single dataset for analysis.