Rolling Statistics in Time Series
In the previous lesson, we learned how resampling helps us look at data at different time resolutions.
But sometimes, resampling is too aggressive. We don’t want to collapse data — we want to smooth it gradually.
This is where rolling statistics come in.
Real-World Problem First
Imagine you are monitoring:
- Daily website traffic
- Daily stock prices
- Daily electricity usage
Every day has ups and downs.
If your manager asks:
“Is traffic increasing overall or just fluctuating randomly?”
Looking at raw daily data makes this very hard to answer.
We need a way to see the local trend — not too noisy, not too smooth.
What Are Rolling Statistics?
Rolling statistics compute values over a moving window that slides across time.
At each point, we calculate statistics using only recent data.
Common rolling statistics:
- Rolling mean (moving average)
- Rolling standard deviation
- Rolling min / max
They answer the question:
“What has been happening recently?”
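To make the sliding window concrete, here is a minimal sketch on a five-value toy series. The values and the window size of 3 are chosen only so the arithmetic is easy to check by hand; they are not part of the lesson's dataset.

```python
import pandas as pd

# Tiny illustrative series (hypothetical values, easy to verify by hand).
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each result uses the current value and the two values before it.
window = s.rolling(window=3)

summary = pd.DataFrame({
    "value": s,
    "mean": window.mean(),  # NaN, NaN, 2.0, 3.0, 4.0
    "std": window.std(),
    "min": window.min(),
    "max": window.max(),
})
print(summary)
```

The first two rows are NaN because a full window of 3 values does not exist yet.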
Creating a Realistic Daily Time Series
We will reuse a synthetic daily sales dataset. It mimics real business data by combining an upward trend, a weekly cycle, and random noise.
Python: Generate Daily Sales
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(10)  # fixed seed so the series is reproducible

# One year of daily timestamps
dates = pd.date_range("2023-01-01", periods=365, freq="D")

# Components: upward trend, 7-day cycle, random noise
trend = np.linspace(80, 140, 365)
weekly = 12 * np.sin(2 * np.pi * np.arange(365) / 7)
noise = np.random.normal(0, 6, 365)

sales = trend + weekly + noise
df = pd.DataFrame({"sales": sales}, index=dates)
Here is the raw daily data:
Observation:
- Lots of short-term noise
- Trend exists but is hard to see
- Decision-making is difficult
Rolling Mean (Moving Average)
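A rolling mean calculates the average of the last N values: the current value and the N - 1 values before it. Until N values are available, pandas returns NaN by default.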
For example:
- 7-day rolling mean → last week
- 30-day rolling mean → last month
This smooths the data without destroying time structure.
Python: 7-Day Rolling Mean
rolling_7 = df["sales"].rolling(window=7).mean()
Here is how the 7-day rolling mean looks:
What changed?
- Noise reduced
- Weekly cycle averaged out, because the window spans exactly one 7-day period
- Trend easier to see
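Two details of the 7-day result are worth checking in code. The sketch below rebuilds the same synthetic series, counts the missing values at the start, and smooths the weekly component on its own to see how much of it is left. The helper names `n_missing`, `weekly_smoothed`, and `max_weekly_left` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same synthetic series as in the lesson.
np.random.seed(10)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
trend = np.linspace(80, 140, 365)
weekly = 12 * np.sin(2 * np.pi * np.arange(365) / 7)
noise = np.random.normal(0, 6, 365)
df = pd.DataFrame({"sales": trend + weekly + noise}, index=dates)

rolling_7 = df["sales"].rolling(window=7).mean()

# The first 6 results are NaN: a full 7-day window is not available yet.
print(rolling_7.head(8))
n_missing = int(rolling_7.isna().sum())

# Smooth the weekly component alone. A 7-day window covers one full cycle,
# so the average is zero up to floating-point error.
weekly_smoothed = pd.Series(weekly, index=dates).rolling(window=7).mean()
max_weekly_left = float(weekly_smoothed.abs().max())

print("missing values:", n_missing)
print("largest weekly remainder:", max_weekly_left)
```

This is why a window equal to the seasonal period is a common choice when the goal is to see the trend underneath a repeating cycle.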
Longer Window: 30-Day Rolling Mean
Now let’s smooth the data even more using a 30-day window.
This focuses on medium-term behavior.
Python: 30-Day Rolling Mean
rolling_30 = df["sales"].rolling(window=30).mean()
Here is the 30-day rolling mean:
Notice:
- Very smooth curve
- Short-term fluctuations removed
- Excellent for trend analysis
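To compare the raw series with both rolling means on one chart, something like the following should work. It is a sketch, not the lesson's original plotting code: the colors, figure size, and labels are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same synthetic series as in the lesson.
np.random.seed(10)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
trend = np.linspace(80, 140, 365)
weekly = 12 * np.sin(2 * np.pi * np.arange(365) / 7)
noise = np.random.normal(0, 6, 365)
df = pd.DataFrame({"sales": trend + weekly + noise}, index=dates)

rolling_7 = df["sales"].rolling(window=7).mean()
rolling_30 = df["sales"].rolling(window=30).mean()

# Raw data in the background, smoothed curves on top.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["sales"], color="lightgray", label="Raw daily sales")
ax.plot(rolling_7.index, rolling_7, label="7-day rolling mean")
ax.plot(rolling_30.index, rolling_30, label="30-day rolling mean")
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
ax.set_title("Raw data vs. rolling means")
ax.legend()
fig.tight_layout()
# In a script, call plt.show() here to display the figure.
```

The 30-day curve starts later than the 7-day curve because its first 29 values are NaN.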
Rolling Standard Deviation
Rolling mean shows direction.
Rolling standard deviation shows volatility.
It answers:
“How stable is the data recently?”
Python: Rolling Volatility
rolling_std = df["sales"].rolling(window=30).std()
Here is the rolling volatility:
Interpretation:
- Higher values → unstable period
- Lower values → consistent behavior
- Very useful for risk analysis
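The interpretation above is easier to see on data whose volatility actually changes. The sketch below uses a separate, hypothetical series (not the sales data): 100 calm days followed by 100 volatile days around the same level. The noise levels 2 and 10 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=200, freq="D")

# Hypothetical series: same average level, but noise jumps from std 2 to std 10.
values = np.concatenate([
    rng.normal(100, 2, 100),   # calm period
    rng.normal(100, 10, 100),  # volatile period
])
s = pd.Series(values, index=dates)

rolling_std = s.rolling(window=30).std()

# Average only over windows that lie fully inside each period.
calm = float(rolling_std.iloc[29:100].mean())
volatile = float(rolling_std.iloc[129:].mean())

print("calm period:", round(calm, 2))
print("volatile period:", round(volatile, 2))
```

One caveat: on a series with trend or seasonality, such as the sales data, the rolling standard deviation also picks up the movement of those components inside each window, not only the random noise.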
Choosing the Right Window Size
| Window | Best For |
|---|---|
| 7 days | Short-term patterns |
| 30 days | Monthly trends |
| 90+ days | Long-term stability |
There is no “perfect” window. It depends on the business question.
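The trade-off can be put into rough numbers. The sketch below compares the three window sizes from the table on the lesson's synthetic series. The "roughness" measure (standard deviation of day-to-day changes in the smoothed curve) is an ad hoc choice made here for illustration, not a standard metric.

```python
import numpy as np
import pandas as pd

# Same synthetic series as in the lesson.
np.random.seed(10)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
trend = np.linspace(80, 140, 365)
weekly = 12 * np.sin(2 * np.pi * np.arange(365) / 7)
noise = np.random.normal(0, 6, 365)
df = pd.DataFrame({"sales": trend + weekly + noise}, index=dates)

roughness = {}  # how jagged the smoothed curve still is
usable = {}     # how many non-missing values remain

for window in (7, 30, 90):
    smoothed = df["sales"].rolling(window=window).mean()
    roughness[window] = float(smoothed.diff().std())
    usable[window] = int(smoothed.notna().sum())

for window in (7, 30, 90):
    print(window, round(roughness[window], 3), usable[window])
```

Larger windows give a smoother curve but leave fewer usable points and react more slowly to real changes.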
Common Mistakes
- Using very large windows and losing patterns
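- Comparing raw data directly with smoothed data without accounting for the lag a trailing window introduces
- Ignoring the missing values (NaN) at window edges, where the window is not yet full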
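One reading of the last two points is that a trailing window lags behind the raw series and leaves missing values at the start. A straight ramp makes both effects easy to measure. The data below is hypothetical, and `center=True` and `min_periods=1` are shown as options to be aware of, not as recommendations.

```python
import numpy as np
import pandas as pd

# A straight ramp (hypothetical data) makes the lag easy to read off.
dates = pd.date_range("2023-01-01", periods=60, freq="D")
ramp = pd.Series(np.arange(60, dtype=float), index=dates)

trailing = ramp.rolling(window=7).mean()                 # current day and 6 before
centered = ramp.rolling(window=7, center=True).mean()    # 3 days on each side
partial = ramp.rolling(window=7, min_periods=1).mean()   # accepts shorter windows

# On a ramp, the trailing mean sits (7 - 1) / 2 = 3 days behind the raw values.
trailing_gap = (ramp - trailing).dropna()
centered_gap = (ramp - centered).dropna()
print("trailing gap:", float(trailing_gap.mean()))
print("centered gap:", float(centered_gap.abs().max()))

# Missing values: at the start (trailing), at both ends (centered), none (partial).
print(int(trailing.isna().sum()), int(centered.isna().sum()), int(partial.isna().sum()))
```

A centered window removes the lag but uses future values, so it suits retrospective analysis rather than live monitoring. With `min_periods=1` the early values are averages over fewer than 7 days and are noisier than the rest.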
Key Takeaways
- Rolling statistics smooth data gradually
- Rolling mean reveals local trends
- Rolling std shows volatility
- Window size controls smoothness
Next Lesson
In the next lesson, we will dive into Autocorrelation (ACF) and understand how strongly a series is related to its own past values.