Pandas Lesson 17 – | Dataplexa

GroupBy & Aggregations in Pandas

One of the most powerful features of Pandas is the ability to group data and perform calculations on those groups.

GroupBy allows you to summarize large datasets and answer real business questions such as:

  • Total sales per product
  • Average revenue per region
  • Number of orders per customer

Loading the Dataset

We continue using the same dataset used throughout this Pandas course.

import pandas as pd

df = pd.read_csv("dataplexa_pandas_sales.csv")
df["order_date"] = pd.to_datetime(df["order_date"])

What Is GroupBy?

groupby() splits the data into groups based on one or more columns, applies a function to each group, and then combines the results.

Think of it as:

  • Split
  • Apply
  • Combine

Grouping by a Single Column

Let’s calculate the total sales amount for each product.

df.groupby("product")["sales_amount"].sum()

This groups all rows by product name and sums the sales.


Resetting the Index

GroupBy results return the grouping column as an index. To convert it back into a regular column, use reset_index().

df.groupby("product")["sales_amount"].sum().reset_index()

Calculating Average Values

You can calculate averages using mean().

Example: average sales amount per region.

df.groupby("region")["sales_amount"].mean()

Counting Records in Each Group

To count how many records exist in each group, use count().

Example: number of orders per customer.

df.groupby("customer_id")["order_id"].count()

Grouping by Multiple Columns

You can group data using more than one column.

Example: total sales by region and product.

df.groupby(["region", "product"])["sales_amount"].sum()

Multiple Aggregations at Once

You can apply multiple aggregation functions using agg().

Example: total sales, average sales, and order count per product.

df.groupby("product").agg(
    total_sales=("sales_amount", "sum"),
    avg_sales=("sales_amount", "mean"),
    order_count=("order_id", "count")
)

Grouping by Date

GroupBy is very powerful when combined with date columns.

Example: total sales per month.

df.groupby(df["order_date"].dt.month)["sales_amount"].sum()

Sorting Grouped Results

You can sort aggregated results to find top or lowest values.

df.groupby("product")["sales_amount"]
  .sum()
  .sort_values(ascending=False)

Why GroupBy Is Important

GroupBy is used in almost every real-world data analysis task.

It allows you to:

  • Summarize large datasets
  • Identify trends
  • Prepare data for visualization
  • Generate business reports

Practice Exercise

Using the dataset:

  • Calculate total sales per region
  • Find average sales per product
  • Count orders per customer
  • Group sales by month

What’s Next?

In the next lesson, you will learn how to create Pivot Tables in Pandas, which provide an Excel-style way to summarize grouped data.