Pivot Tables in Pandas
Pivot tables provide a powerful and easy way to summarize large datasets. They are similar to Excel pivot tables and are commonly used for reporting and analysis.
In this lesson, you will learn how to create pivot tables using Pandas to analyze sales data efficiently.
Loading the Dataset
We continue using the same dataset used throughout the Pandas course.
import pandas as pd
df = pd.read_csv("dataplexa_pandas_sales.csv")
df["order_date"] = pd.to_datetime(df["order_date"])
What Is a Pivot Table?
A pivot table allows you to:
- Group data by rows and columns
- Apply aggregation functions
- Summarize data in a matrix format
Pivot tables are ideal when you want a structured summary rather than a long list of grouped values.
Basic Pivot Table
Let’s calculate total sales per region using a pivot table.
pd.pivot_table(
df,
values="sales_amount",
index="region",
aggfunc="sum"
)
This creates a summary showing total sales for each region.
Pivot Table with Rows and Columns
Now let’s analyze total sales by region and product.
pd.pivot_table(
df,
values="sales_amount",
index="region",
columns="product",
aggfunc="sum"
)
Each row represents a region, each column represents a product, and each cell contains total sales.
Changing the Aggregation Function
You can change the aggregation function to calculate averages, counts, minimums, or maximums.
Example: average sales per product in each region.
pd.pivot_table(
df,
values="sales_amount",
index="region",
columns="product",
aggfunc="mean"
)
Multiple Aggregations in Pivot Tables
You can apply multiple aggregation functions at once.
pd.pivot_table(
df,
values="sales_amount",
index="product",
aggfunc=["sum", "mean", "count"]
)
This shows total sales, average sales, and order count per product.
Handling Missing Values
Some combinations of rows and columns may not have data.
You can replace missing values using fill_value.
pd.pivot_table(
df,
values="sales_amount",
index="region",
columns="product",
aggfunc="sum",
fill_value=0
)
Pivot Tables with Dates
Pivot tables work well with date-based analysis.
Example: monthly sales summary.
pd.pivot_table(
df,
values="sales_amount",
index=df["order_date"].dt.month,
aggfunc="sum"
)
Sorting Pivot Table Results
You can sort pivot table results just like a DataFrame.
pivot = pd.pivot_table(
df,
values="sales_amount",
index="product",
aggfunc="sum"
)
pivot.sort_values(by="sales_amount", ascending=False)
Pivot Tables vs GroupBy
Both pivot tables and GroupBy summarize data, but:
- GroupBy is more flexible and code-oriented
- Pivot tables are easier for structured summaries
In practice, both are used depending on the task.
Practice Exercise
Using the dataset:
- Create a pivot table showing total sales by region
- Create a pivot table with region as rows and product as columns
- Calculate average sales per product
- Fill missing values with zero
What’s Next?
In the next lesson, you will learn how to merge DataFrames, which allows you to combine multiple datasets into one.