Feature Engineering Lesson 33 – Rolling Window Features | Dataplexa
Advanced Level · Lesson 33

Rolling Window Features

A single data point tells you where something is. A rolling window tells you where it's been — and whether it's accelerating, slowing down, or spiking. That trajectory is often more predictive than the raw value itself.

A rolling window feature computes a statistic — mean, sum, standard deviation, min, max — over a fixed-size sliding window of recent observations. Instead of asking "what is today's value?", you ask "what has the average been over the last 7 days?" That context transforms flat snapshots into features that carry momentum and trend.
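As a minimal sketch of the mechanic (toy values, window of 3):

```python
import pandas as pd

# Each output value averages the current row and up to two preceding rows
s = pd.Series([10, 20, 30, 40, 50])
print(s.rolling(window=3, min_periods=1).mean().tolist())
# → [10.0, 15.0, 20.0, 30.0, 40.0]
```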

Snapshots Are Blind to Momentum

Imagine two customers. Both have a transaction amount of $200 today. But Customer A has averaged $50 over the past month — today is a sudden spike. Customer B has averaged $190 — today is completely normal. To a model that only sees today's value, these two are identical. To a model that sees the rolling average, they're completely different signals.

This is why rolling features matter. They give the model memory. In time-series problems, fraud detection, churn prediction, and demand forecasting, a model without rolling features is almost always leaving significant performance on the table.

Without Rolling Features

The model sees today's value in isolation. A spike looks the same whether it's part of a trend or a one-off anomaly. No sense of direction, no sense of volatility, no recent history.

With Rolling Features

The model sees the recent trajectory. A spike after a flat baseline is flagged differently from a spike that continues an upward trend. Volatility, momentum, and recency all become learnable signals.

Five Rolling Statistics Worth Computing

Not every rolling stat is useful for every problem. Here are the five that show up most in production pipelines:

1. Rolling Mean

The smoothed average over the last N rows. Removes noise and shows the underlying trend. Use window sizes that match the natural cycle of your data — 7-day for weekly patterns, 30-day for monthly.

2. Rolling Standard Deviation

Measures recent volatility. A suddenly high rolling std means the signal is becoming unpredictable — often a precursor to anomalous events. Essential in fraud detection and equipment monitoring.

3. Rolling Sum

Cumulative activity over the window. Total transactions in the last 30 days, total errors in the last 24 hours, total clicks in the last 7 days. Often better than the mean when volume is the signal.

4. Rolling Min / Max

The extreme values seen in the recent window. Rolling max tells you the recent peak; rolling min tells you the floor. The gap between them (range) is a compact volatility measure on its own.

5. Rolling Z-Score (Deviation from Recent Mean)

The current value minus the rolling mean, divided by the rolling std. Tells you how unusual today's value is relative to recent history. A rolling z-score above 2.5 in a fraud dataset is a very loud alarm.
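All five statistics are one-liners in pandas. A compact sketch on a hypothetical five-day spend series (the data and column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'spend': [100, 110, 105, 300, 310]})
w = df['spend'].rolling(window=3, min_periods=1)  # one window object, reused below

df['mean_3']  = w.mean()                  # 1. smoothed trend
df['std_3']   = w.std()                   # 2. recent volatility (NaN on the first row)
df['sum_3']   = w.sum()                   # 3. cumulative activity over the window
df['range_3'] = w.max() - w.min()         # 4. min/max gap — a compact volatility measure
df['z_3'] = ((df['spend'] - df['mean_3']) /
             (df['std_3'] + 1e-9)).fillna(0)  # 5. deviation from the recent mean
print(df.round(2))
```

A single Rolling object can be reused for several aggregations, which keeps the window definition in one place.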

Computing Rolling Mean and Standard Deviation

The scenario:

You're a data scientist at an e-commerce company building a customer churn model. The dataset contains daily transaction amounts per customer over several weeks. The product team wants to know if spending momentum — not just today's spend — is predictive of churn. Your job is to compute a 3-day rolling mean and rolling standard deviation so the model can learn from recent trajectories, not just snapshots.

# Import pandas and numpy
import pandas as pd
import numpy as np

# Create a daily transaction DataFrame for one customer — 10 days of data
churn_df = pd.DataFrame({
    'date':   pd.date_range(start='2024-01-01', periods=10, freq='D'),  # 10 consecutive daily dates
    'spend':  [120, 130, 115, 400, 410, 390, 125, 118, 122, 119]         # daily spend in dollars
})

# Sort by date — rolling windows require chronological order
churn_df = churn_df.sort_values('date').reset_index(drop=True)

# Compute 3-day rolling mean: average spend over the last 3 rows (including current)
# min_periods=1 means we still get a value even when fewer than 3 rows exist at the start
churn_df['roll_mean_3d'] = churn_df['spend'].rolling(window=3, min_periods=1).mean()

# Compute 3-day rolling standard deviation: volatility over the last 3 rows
# ddof=0 uses population std; ddof=1 (default) uses sample std — we use default here
churn_df['roll_std_3d'] = churn_df['spend'].rolling(window=3, min_periods=1).std()

# Compute rolling z-score: how unusual is today vs the recent 3-day window?
# The epsilon guards against division by zero when the rolling std is exactly 0;
# fillna(0) handles the first row, where std is NaN (one value has no variance)
churn_df['roll_zscore_3d'] = (
    (churn_df['spend'] - churn_df['roll_mean_3d']) /
    (churn_df['roll_std_3d'] + 1e-9)
).fillna(0)

# Round all columns to 2 decimal places for clean output
churn_df = churn_df.round(2)

# Display results
print(churn_df.to_string(index=False))
       date  spend  roll_mean_3d  roll_std_3d  roll_zscore_3d
 2024-01-01    120        120.00          NaN            0.00
 2024-01-02    130        125.00         7.07            0.71
 2024-01-03    115        121.67         7.64           -0.87
 2024-01-04    400        215.00       160.39            1.15
 2024-01-05    410        308.33       167.51            0.61
 2024-01-06    390        400.00        10.00           -1.00
 2024-01-07    125        308.33       159.09           -1.15
 2024-01-08    118        211.00       155.06           -0.60
 2024-01-09    122        121.67         3.51            0.09
 2024-01-10    119        119.67         2.08           -0.32

What just happened?

The window spans three rows and slides forward one row at a time. On Jan 4, spend jumped to $400 — but the roll_mean_3d is only $215 because the window still contains Jan 2 and Jan 3. By Jan 6, the window is fully in the high-spend period and the mean climbs to $400 with a tiny std of 10 — smooth, consistent high spending. On Jan 7, the customer drops back to $125, and the z-score of −1.15 signals a sudden departure from recent behaviour. That z-score is exactly the kind of feature a churn model needs.

Multiple Window Sizes Capture Multiple Timescales

Using a single window size gives you one view of the past. Using multiple windows — short, medium, and long — gives the model a multi-resolution picture of the signal. A short window catches sudden spikes; a long window tracks sustained trends. Together they're far more powerful than either alone.
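One way to keep multi-window feature code uniform is a small loop over window sizes — a sketch with placeholder data and column names:

```python
import pandas as pd

df = pd.DataFrame({'revenue': [500, 520, 480, 510, 490, 800, 820]})

# One rolling-mean column per timescale: short, medium, long
for w in [3, 7, 14]:
    df[f'roll_mean_{w}d'] = df['revenue'].rolling(window=w, min_periods=1).mean()
```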

The scenario:

You're building a sales forecasting model for a subscription business. Daily revenue comes in, and the team wants short-term (3-day), medium-term (7-day), and long-term (14-day) rolling means so the model can differentiate weekly cycles from monthly trends. You also want the ratio of short-term to long-term mean as a momentum indicator — when the 3-day average rises well above the 14-day average, demand is accelerating.

# Import pandas and numpy
import pandas as pd
import numpy as np

# Create a daily revenue DataFrame — 14 rows to fully fill the 14-day window
sales_df = pd.DataFrame({
    'date':    pd.date_range(start='2024-03-01', periods=14, freq='D'),   # 14 consecutive days
    'revenue': [500, 520, 480, 510, 490, 800, 820, 810, 790, 815, 505, 495, 510, 520]  # daily revenue
})

# Sort chronologically — always required before rolling operations
sales_df = sales_df.sort_values('date').reset_index(drop=True)

# 3-day rolling mean: short-term trend
sales_df['roll_mean_3d']  = sales_df['revenue'].rolling(window=3,  min_periods=1).mean()

# 7-day rolling mean: medium-term trend — captures weekly cycles
sales_df['roll_mean_7d']  = sales_df['revenue'].rolling(window=7,  min_periods=1).mean()

# 14-day rolling mean: long-term baseline — all available rows here
sales_df['roll_mean_14d'] = sales_df['revenue'].rolling(window=14, min_periods=1).mean()

# Momentum ratio: short-term mean divided by long-term mean
# A ratio > 1.0 means recent revenue is above the long-run average (accelerating)
# A ratio < 1.0 means recent revenue is below the long-run average (decelerating)
sales_df['momentum_ratio'] = sales_df['roll_mean_3d'] / (sales_df['roll_mean_14d'] + 1e-9)

# Round for readability
sales_df = sales_df.round(2)

# Print selected columns
print(sales_df[['date','revenue','roll_mean_3d','roll_mean_7d','roll_mean_14d','momentum_ratio']].to_string(index=False))
       date  revenue  roll_mean_3d  roll_mean_7d  roll_mean_14d  momentum_ratio
 2024-03-01      500        500.00        500.00         500.00            1.00
 2024-03-02      520        510.00        510.00         510.00            1.00
 2024-03-03      480        500.00        500.00         500.00            1.00
 2024-03-04      510        503.33        502.50         502.50            1.00
 2024-03-05      490        493.33        500.00         500.00            0.99
 2024-03-06      800        600.00        550.00         550.00            1.09
 2024-03-07      820        703.33        588.57         588.57            1.19
 2024-03-08      810        810.00        632.86         616.25            1.31
 2024-03-09      790        806.67        671.43         635.56            1.27
 2024-03-10      815        805.00        719.29         653.50            1.23
 2024-03-11      505        703.33        718.57         640.00            1.10
 2024-03-12      495        605.00        719.29         627.92            0.96
 2024-03-13      510        503.33        677.86         618.85            0.81
 2024-03-14      520        508.33        635.00         611.79            0.83

What just happened?

Days 6–10 are a high-revenue burst. The 3-day mean reacts immediately, peaking at $810. The 7-day mean lags — it's still being averaged with pre-burst days. The 14-day mean lags furthest. The momentum_ratio peaks at 1.31 on March 8 — the 3-day mean is 31% above the 14-day baseline — a strong acceleration signal. By March 13 the ratio has dropped to 0.81, correctly detecting that the burst has ended and revenue has returned to normal. A model with only raw revenue would have to learn this pattern from scratch on every sequence; with these features, the pattern is explicit.

Rolling Features Per Group with groupby

In real datasets you almost never have one entity. You have many — many customers, many stores, many products — all with their own time series in the same DataFrame. Rolling features must be computed per entity, otherwise you'd be blending one customer's history into another's window. The fix is to combine groupby() with rolling().

The scenario:

You're a machine learning engineer at a fintech company. The fraud detection dataset contains daily transaction totals for multiple customers in one flat table. You need to compute a 3-day rolling mean per customer — not across all customers. If you forget the groupby, Customer B's transactions bleed into Customer A's window and your rolling features become meaningless noise.

# Import pandas and numpy
import pandas as pd
import numpy as np

# Create a multi-customer transaction DataFrame — 2 customers, 5 days each
fraud_df = pd.DataFrame({
    'customer_id': ['C1','C1','C1','C1','C1', 'C2','C2','C2','C2','C2'],   # two customers interleaved
    'date':        pd.to_datetime(['2024-01-01','2024-01-02','2024-01-03','2024-01-04','2024-01-05',
                                   '2024-01-01','2024-01-02','2024-01-03','2024-01-04','2024-01-05']),  # same dates
    'amount':      [100, 110, 105, 800, 790,   50, 55, 52, 60, 58]         # C1 spikes on days 4–5; C2 is stable
})

# Sort by customer then date — critical for correct windowing within each group
fraud_df = fraud_df.sort_values(['customer_id','date']).reset_index(drop=True)

# Compute 3-day rolling mean PER CUSTOMER using groupby + rolling
# groupby ensures the window never crosses customer boundaries
fraud_df['roll_mean_3d'] = (
    fraud_df.groupby('customer_id')['amount']
    .transform(lambda x: x.rolling(window=3, min_periods=1).mean())  # lambda applies rolling inside each group
)

# Compute 3-day rolling std per customer — volatility within each customer's own history
fraud_df['roll_std_3d'] = (
    fraud_df.groupby('customer_id')['amount']
    .transform(lambda x: x.rolling(window=3, min_periods=1).std())
)

# Compute rolling z-score per customer
# fillna(0) handles each customer's first row, where the rolling std is NaN
fraud_df['roll_zscore'] = (
    (fraud_df['amount'] - fraud_df['roll_mean_3d']) /
    (fraud_df['roll_std_3d'] + 1e-9)
).fillna(0)

# Round for clean display
fraud_df = fraud_df.round(2)

# Print results
print(fraud_df.to_string(index=False))
 customer_id       date  amount  roll_mean_3d  roll_std_3d  roll_zscore
          C1 2024-01-01     100        100.00          NaN         0.00
          C1 2024-01-02     110        105.00         7.07         0.71
          C1 2024-01-03     105        105.00         5.00         0.00
          C1 2024-01-04     800        338.33       399.82         1.15
          C1 2024-01-05     790        565.00       398.40         0.56
          C2 2024-01-01      50         50.00          NaN         0.00
          C2 2024-01-02      55         52.50         3.54         0.71
          C2 2024-01-03      52         52.33         2.52        -0.13
          C2 2024-01-04      60         55.67         4.04         1.07
          C2 2024-01-05      58         56.67         4.16         0.32

What just happened?

The groupby + transform(lambda) pattern kept C1 and C2's windows completely separate. C1's day-4 spike to $800 produces a rolling mean of $338 (not yet fully absorbed) and a z-score of 1.15 — elevated but not extreme, because the window includes the spike itself. C2's rolling std stays tiny (2–4) throughout, reflecting genuinely stable behaviour. A fraud model would correctly flag C1's trajectory while leaving C2 alone.
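To see what forgetting the groupby actually does, here is a sketch on a small two-customer table: the naive version lets C2's first window absorb C1's last rows.

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': ['C1', 'C1', 'C1', 'C2', 'C2'],
    'amount':      [100, 110, 105, 50, 55]
})

# WRONG: the window crosses the C1/C2 boundary
naive = df['amount'].rolling(window=3, min_periods=1).mean()
# C2's first value becomes (110 + 105 + 50) / 3 ≈ 88.33 — contaminated by C1

# RIGHT: groupby restarts the window at each customer boundary
grouped = df.groupby('customer_id')['amount'].transform(
    lambda x: x.rolling(window=3, min_periods=1).mean()
)
# C2's first value is 50.0 — built from its own history only
```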

A Visual of the Sliding Window Mechanism

Understanding exactly which rows contribute to each window value is critical for debugging. This table shows a window=3 rolling mean step by step:

Row Date Spend Rows in Window Rolling Mean
0 Jan 1 120 [120] 120.0
1 Jan 2 130 [120, 130] 125.0
2 Jan 3 115 [120, 130, 115] 121.7
3 Jan 4 400 [130, 115, 400] 215.0
4 Jan 5 410 [115, 400, 410] 308.3

Rows 0 and 1 use fewer than 3 rows because min_periods=1. From Row 2 onward, the window is always exactly 3 rows. As the window slides forward, the oldest row drops out and the newest enters.

Teacher's Note

Rolling features introduce NaN values at the start of each series — the first N−1 rows don't have a full window yet. min_periods=1 fills these with partial-window stats rather than NaN, which is usually preferable for model training. But be aware: pandas' default is min_periods equal to the window size, which enforces a full window before any value is returned — that gives you leading NaNs that must be handled, either dropped or forward-filled. The right choice depends on whether early rows with partial histories are meaningful for your problem.
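A quick sketch of the two choices on a toy series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Pandas default: min_periods equals the window size → leading NaNs
full = s.rolling(window=3).mean()
print(full.tolist())     # → [nan, nan, 20.0, 30.0]

# min_periods=1: partial-window stats replace the leading NaNs
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())  # → [10.0, 15.0, 20.0, 30.0]
```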

Practice Questions

1. Which rolling() parameter controls how many rows must be present in the window before a value is computed instead of returning NaN?



2. When computing rolling features for multiple entities (e.g. multiple customers) in one DataFrame, you must use ________ before calling rolling() to prevent one entity's data bleeding into another's window.



3. Dividing a short-term rolling mean by a long-term rolling mean produces a ________ ________ that indicates whether recent activity is above or below the long-run baseline.



Quiz

1. What does the min_periods parameter in pandas rolling() control?


2. You want to detect whether a sensor reading is becoming more erratic in recent hours. The best rolling feature to compute is:


3. A DataFrame contains daily sales for 50 stores in a single flat table. To correctly compute a 7-day rolling mean per store, you should:


Up Next · Lesson 34

Lag Features

Give your model explicit access to the past — learn how to shift time series data to create lag features that capture what happened N steps ago.