Window Functions in Pandas
So far, you have learned how to group, filter, and aggregate data. However, many real-world problems require calculations that look across rows without collapsing them.
This is where window functions become extremely useful.
What Are Window Functions?
A window function performs a calculation across a set of rows related to the current row, while keeping all rows intact.
Unlike groupby(), window functions:
- Do not reduce the number of rows
- Allow comparisons between neighboring rows
- Enable running totals and rankings
Why Window Functions Matter
Common business use cases include:
- Running total of sales
- Comparing current row with previous row
- Ranking products within a category
All of these can be solved using window functions.
Using rolling() for Moving Calculations
A rolling window applies calculations over a fixed number of rows.
Example: Calculate a 3-row moving average of sales amount.
import pandas as pd
sales = pd.read_csv("dataplexa_pandas_sales.csv")
sales["moving_avg"] = sales["sales_amount"].rolling(window=3).mean()
This calculates the average of the current row and the previous two rows.
Understanding Rolling Window Output
The first few rows may contain NaN values because there
are not enough previous rows to complete the window.
This behavior is expected and normal.
Using expanding() for Cumulative Calculations
An expanding window starts from the first row and grows with each row.
Example: Running total of sales.
sales["running_total"] = sales["sales_amount"].expanding().sum()
Each row includes all previous rows in the calculation.
Using shift() to Compare Rows
The shift() function moves data up or down.
This allows row-to-row comparisons.
Example: Compare current sales with previous day sales.
sales["previous_sales"] = sales["sales_amount"].shift(1)
sales["sales_change"] = sales["sales_amount"] - sales["previous_sales"]
Ranking Data with rank()
Window functions also help rank values without grouping.
Example: Rank sales amount across all records.
sales["sales_rank"] = sales["sales_amount"].rank(ascending=False)
Higher sales receive better ranks.
Ranking Within Groups
You can combine window functions with grouping logic.
Example: Rank products within each region.
sales["region_rank"] = (
sales.groupby("region")["sales_amount"]
.rank(ascending=False)
)
Each region now has its own ranking system.
Common Window Function Patterns
- Rolling averages for trends
- Running totals for growth analysis
- Lag comparisons using
shift() - Ranking without losing row data
Practice Exercise
Try the following:
- Create a running total of sales
- Calculate a 5-row moving average
- Compare current sales with previous sales
What’s Next?
In the next lesson, you will learn about advanced aggregations for deeper analytical insights.