Data Aggregation in R
In this lesson, you will learn how to summarize and aggregate data in R.
Data aggregation means combining multiple data values into a single meaningful result. This is one of the most important steps in data analysis because raw data is often too detailed to interpret directly.
What Is Data Aggregation?
Data aggregation is the process of grouping data and applying summary functions such as:
- Sum
- Mean (average)
- Count
- Minimum and maximum
Aggregation helps answer questions like:
- What are the average sales per category?
- How many records exist per group?
- What is the total value per region?
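The group-then-summarize idea can be sketched in two explicit steps with base R's split() and sapply(). This is only an illustration: the value and region vectors below are hypothetical and are not part of the lesson's dataset.

```r
# Hypothetical values and group labels, made up for illustration
value  <- c(10, 20, 30, 40)
region <- c("North", "South", "North", "South")

# Step 1: split the values into one vector per group
groups <- split(value, region)

# Step 2: apply a summary function to each group and combine the results
totals <- sapply(groups, sum)
totals
```

The functions covered in the rest of this lesson perform both steps in a single call.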
Sample Dataset
We will use a simple data frame to demonstrate aggregation concepts.
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)
data
Using aggregate() Function
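The aggregate() function is base R's general-purpose tool for grouped summaries.
Given a formula such as salary ~ department, it splits the variable on the left by the grouping variable on the right and applies FUN to each group.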
aggregate(salary ~ department, data = data, FUN = mean)
This calculates the average salary for each department and returns a data frame with one row per department.
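aggregate() can also be called without a formula by passing the columns to summarize and a list of grouping columns. The sketch below repeats the lesson's sample data so it runs on its own; the name res is just an illustrative choice.

```r
# Sample data from the lesson, repeated so the snippet is self-contained
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# data["salary"] and data["department"] are one-column data frames,
# so the result keeps both column names
res <- aggregate(data["salary"], by = data["department"], FUN = mean)
res
```

This form can be convenient when the column names are stored in variables, since no formula has to be built.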
Counting Records Per Group
To count how many rows exist per group, pass length as FUN. It returns the number of salary values in each department.
aggregate(salary ~ department, data = data, FUN = length)
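For plain counts, base R's table() is a common alternative. The following is a minimal sketch using the lesson's sample data; the name counts is an illustrative choice.

```r
# Sample data from the lesson, repeated so the snippet is self-contained
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# Count how often each department appears
counts <- table(data$department)
counts

# Convert the counts to a data frame if a tabular result is needed
as.data.frame(counts)
```

One difference to keep in mind: the formula interface of aggregate() drops rows with missing values by default, so its counts cover only rows where salary is present, while table() counts every row that has a department.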
Calculating Total Values
You can also calculate totals using the sum function.
This is useful for financial or performance reports.
aggregate(salary ~ department, data = data, FUN = sum)
Multiple Aggregations
Sometimes you need more than one summary metric per group.
In that case, pass a custom function that returns a named vector with one element per metric.
aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(
    mean = mean(x),
    max = max(x),
    min = min(x)
  )
)
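One detail worth knowing about the call above: the metrics are returned as a single matrix stored in the salary column, not as three separate columns. The sketch below shows one common way to flatten it; the names res and flat are illustrative.

```r
# Sample data from the lesson, repeated so the snippet is self-contained
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

res <- aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(mean = mean(x), max = max(x), min = min(x))
)

# The result has only two columns; salary holds a matrix of metrics
ncol(res)
colnames(res$salary)

# Flatten the matrix column into ordinary columns
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, each metric can be accessed like any other column, for example flat$salary.mean.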
Aggregation Using tapply()
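The tapply() function applies a function to subsets of a vector, where the subsets are defined by a grouping vector.
It has a simpler interface than aggregate() and returns a named result instead of a data frame, which makes it handy for quick summaries.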
tapply(data$salary, data$department, mean)
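Because tapply() returns a result named by group, individual values can be looked up by name. A short sketch with the lesson's sample data follows; avg and the column name mean_salary are illustrative choices.

```r
# Sample data from the lesson, repeated so the snippet is self-contained
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

avg <- tapply(data$salary, data$department, mean)

# Look up a single group by its name
avg["HR"]

# Rebuild a data frame from the named result if needed
data.frame(department = names(avg), mean_salary = as.vector(avg))
```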
Aggregation Using by()
The by() function splits a vector or data frame into groups and applies a function to each group.
Its printed output shows one block per group, which makes it convenient for exploratory analysis.
by(data$salary, data$department, summary)
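by() can also receive the whole data frame, in which case the function gets one sub-data-frame per group. The following is a rough sketch; the helper function and the names res and combined are illustrative, not part of the lesson.

```r
# Sample data from the lesson, repeated so the snippet is self-contained
data <- data.frame(
  department = c("HR", "IT", "IT", "HR", "Finance", "IT"),
  salary = c(40000, 60000, 65000, 42000, 70000, 62000)
)

# d is the subset of rows belonging to one department
res <- by(data, data$department, function(d) {
  c(n = nrow(d), mean_salary = mean(d$salary))
})
res

# Stack the per-group vectors into a single matrix
combined <- do.call(rbind, res)
combined
```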
Why Data Aggregation Matters
- Reduces large datasets into understandable summaries
- Helps identify patterns and trends
- Forms the basis for reporting and visualization
- Essential for business and statistical analysis
📝 Practice Exercises
Exercise 1
Calculate the average salary for each department.
Exercise 2
Find the total salary paid by each department.
Exercise 3
Count how many employees are in each department.
Exercise 4
Find the minimum and maximum salary per department.
✅ Practice Answers
Answer 1
aggregate(salary ~ department, data = data, FUN = mean)
Answer 2
aggregate(salary ~ department, data = data, FUN = sum)
Answer 3
aggregate(salary ~ department, data = data, FUN = length)
Answer 4
aggregate(
  salary ~ department,
  data = data,
  FUN = function(x) c(min = min(x), max = max(x))
)
What’s Next?
In the next lesson, you will learn about Data Merging in R.
This will help you combine multiple datasets into a single dataset for analysis.