Advanced Aggregations in Pandas
In earlier lessons, you learned basic aggregations such as
sum(), mean(), and count().
These work well for simple analysis.
In real-world data analysis, however, you often need multiple calculations at once, custom logic, and aggregations across different columns.
This lesson focuses on advanced aggregation techniques in Pandas.
Why Advanced Aggregations Are Important
Businesses rarely ask just one question like:
- What is the total sales?
Instead, they ask:
- Total sales by region
- Average order value per product
- Maximum and minimum sales per month
Advanced aggregations help answer all of these efficiently.
Multiple Aggregations Using agg()
The agg() method allows you to apply
multiple aggregation functions at once.
Example: Calculate total, average, and maximum sales.
import pandas as pd
sales = pd.read_csv("dataplexa_pandas_sales.csv")
sales.agg({
"sales_amount": ["sum", "mean", "max"]
})
This produces a compact summary without writing multiple lines of code.
Aggregations with groupby() and agg()
Most advanced analysis involves grouping data before aggregation.
Example: Sales statistics by region.
sales.groupby("region").agg({
"sales_amount": ["sum", "mean", "count"]
})
Each region now has its own summary statistics.
Renaming Aggregated Columns
By default, aggregated column names can look complex. You can rename them for clarity.
sales.groupby("region").agg(
total_sales=("sales_amount", "sum"),
average_sales=("sales_amount", "mean"),
orders=("sales_amount", "count")
)
This makes reports easier to read and present.
Aggregating Multiple Columns
You can apply different functions to different columns.
Example: Aggregate sales amount and quantity together.
sales.groupby("region").agg(
total_sales=("sales_amount", "sum"),
total_quantity=("quantity", "sum")
)
Using Custom Aggregation Functions
Sometimes built-in functions are not enough. You can define your own.
Example: Calculate sales range (max - min).
def sales_range(x):
return x.max() - x.min()
sales.groupby("region").agg(
sales_range=("sales_amount", sales_range)
)
Custom logic allows you to solve complex business problems.
Combining Aggregations with Sorting
After aggregating, you often want to rank or sort results.
Example: Regions sorted by total sales.
summary = sales.groupby("region").agg(
total_sales=("sales_amount", "sum")
)
summary.sort_values("total_sales", ascending=False)
When to Use Advanced Aggregations
- Building summary dashboards
- Preparing reports for stakeholders
- Feature engineering for machine learning
- Business and financial analysis
Practice Exercise
Using the dataset:
- Calculate total and average sales per product
- Find the region with highest total sales
- Create a custom aggregation of your choice
What’s Next?
In the next lesson, you will explore time series analysis and learn how Pandas handles date-based data.