Pandas Lesson 24 – Aggregations | Dataplexa

Advanced Aggregations in Pandas

In earlier lessons, you learned basic aggregations such as sum(), mean(), and count(). These work well for simple analysis.

In real-world data analysis, however, you often need multiple calculations at once, custom logic, and aggregations across different columns.

This lesson focuses on advanced aggregation techniques in Pandas.


Why Advanced Aggregations Are Important

Businesses rarely ask just one question like:

  • What is the total sales?

Instead, they ask:

  • Total sales by region
  • Average order value per product
  • Maximum and minimum sales per month

Advanced aggregations help answer all of these efficiently.


Multiple Aggregations Using agg()

The agg() method allows you to apply multiple aggregation functions at once.

Example: Calculate total, average, and maximum sales.

import pandas as pd

sales = pd.read_csv("dataplexa_pandas_sales.csv")

sales.agg({
    "sales_amount": ["sum", "mean", "max"]
})

This produces a compact summary without writing multiple lines of code.


Aggregations with groupby() and agg()

Most advanced analysis involves grouping data before aggregation.

Example: Sales statistics by region.

sales.groupby("region").agg({
    "sales_amount": ["sum", "mean", "count"]
})

Each region now has its own summary statistics.


Renaming Aggregated Columns

By default, aggregated column names can look complex. You can rename them for clarity.

sales.groupby("region").agg(
    total_sales=("sales_amount", "sum"),
    average_sales=("sales_amount", "mean"),
    orders=("sales_amount", "count")
)

This makes reports easier to read and present.


Aggregating Multiple Columns

You can apply different functions to different columns.

Example: Aggregate sales amount and quantity together.

sales.groupby("region").agg(
    total_sales=("sales_amount", "sum"),
    total_quantity=("quantity", "sum")
)

Using Custom Aggregation Functions

Sometimes built-in functions are not enough. You can define your own.

Example: Calculate sales range (max - min).

def sales_range(x):
    return x.max() - x.min()

sales.groupby("region").agg(
    sales_range=("sales_amount", sales_range)
)

Custom logic allows you to solve complex business problems.


Combining Aggregations with Sorting

After aggregating, you often want to rank or sort results.

Example: Regions sorted by total sales.

summary = sales.groupby("region").agg(
    total_sales=("sales_amount", "sum")
)

summary.sort_values("total_sales", ascending=False)

When to Use Advanced Aggregations

  • Building summary dashboards
  • Preparing reports for stakeholders
  • Feature engineering for machine learning
  • Business and financial analysis

Practice Exercise

Using the dataset:

  • Calculate total and average sales per product
  • Find the region with highest total sales
  • Create a custom aggregation of your choice

What’s Next?

In the next lesson, you will explore time series analysis and learn how Pandas handles date-based data.