Resampling Time Series Data
In the real world, data does not always come in the format we want. Sometimes it is too detailed. Sometimes it is too noisy.
This is where resampling becomes extremely important.
Real-World Problem First
Imagine you work for a ride-sharing company. You are given minute-level ride demand data.
Your manager asks:
“Can you give me a clear monthly trend of demand?”
If you show minute-level data, it looks chaotic and unreadable.
What you really need is:
- Daily averages
- Weekly totals
- Monthly trends
This transformation is called resampling.
What Is Resampling?
Resampling means changing the time frequency of a time series.
Examples:
- Hourly → Daily
- Daily → Weekly
- Daily → Monthly
Moving to a lower frequency, as in all of these examples, involves two steps:
- Grouping data by time
- Applying an aggregation (mean, sum, max, etc.)
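The two steps can be written out by hand to see what `resample` does. The sketch below uses hypothetical toy data (48 hourly ride counts, not part of the lesson's dataset) and compares an explicit group-then-aggregate with a single `resample` call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 48 hourly ride counts covering two full days.
hours = pd.date_range(start="2023-01-01", periods=48, freq=pd.Timedelta(hours=1))
rides = pd.Series(np.arange(48, dtype=float), index=hours, name="rides")

# Step 1 (grouping): map every timestamp to the midnight of its day.
# Step 2 (aggregation): average the values inside each group.
manual_daily = rides.groupby(rides.index.normalize()).mean()

# resample() performs both steps in one call.
resampled_daily = rides.resample("D").mean()

print(manual_daily.tolist())     # [11.5, 35.5]
print(resampled_daily.tolist())  # [11.5, 35.5]
```

Both approaches give the same daily averages; `resample` is simply the more convenient spelling for time-indexed data.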
Let’s Create High-Frequency Data
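Minute-level data would be unwieldy for a demonstration, so we will simulate daily sales data for one full year and treat it as our high-frequency series.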
This data contains:
- Growth over time
- Weekly seasonality
- Random fluctuations
Python: Creating Daily Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
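np.random.seed(42)  # fix the seed so the numbers are reproducible
dates = pd.date_range(start="2023-01-01", periods=365, freq="D")  # every day of 2023
trend = np.linspace(50, 120, 365)  # steady growth from 50 to 120
weekly_seasonality = 10 * np.sin(2 * np.pi * np.arange(365) / 7)  # 7-day cycle, amplitude 10
noise = np.random.normal(0, 5, 365)  # random fluctuations, standard deviation 5
sales = trend + weekly_seasonality + noise
df = pd.DataFrame({"sales": sales}, index=dates)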
Now let’s visualize the raw daily data.
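One way to draw such a plot is sketched below. It rebuilds the simulated series from the listing above so that it runs on its own; the figure size, title, and axis labels are illustrative choices, not taken from the original figure.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the simulated daily series from the listing above.
np.random.seed(42)
dates = pd.date_range(start="2023-01-01", periods=365, freq="D")
sales = (np.linspace(50, 120, 365)
         + 10 * np.sin(2 * np.pi * np.arange(365) / 7)
         + np.random.normal(0, 5, 365))
df = pd.DataFrame({"sales": sales}, index=dates)

# Plot every daily value as a single line (styling is an assumption).
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["sales"], linewidth=1)
ax.set_title("Daily sales (simulated)")
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
fig.tight_layout()
plt.show()
```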
What you should notice:
- Very noisy signal
- Hard to see long-term trend
- Too detailed for decision making
Why Daily Data Is Hard to Read
At daily level:
- Random events dominate
- Business patterns are hidden
- Executives get confused
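A rough way to put numbers on this, using the same simulated data, is to compare how much the underlying trend rises per day with how much the observed series actually moves from one day to the next. This is only a sketch of one possible check, not a formal noise measure.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
dates = pd.date_range(start="2023-01-01", periods=365, freq="D")
trend = np.linspace(50, 120, 365)
weekly_seasonality = 10 * np.sin(2 * np.pi * np.arange(365) / 7)
noise = np.random.normal(0, 5, 365)
df = pd.DataFrame({"sales": trend + weekly_seasonality + noise}, index=dates)

# How much the underlying trend rises from one day to the next.
trend_step = trend[1] - trend[0]

# Typical size of the actual day-to-day change in the observed series.
daily_change_std = df["sales"].diff().std()

print(f"trend step per day:       {trend_step:.2f}")
print(f"std of day-to-day change: {daily_change_std:.2f}")
```

The trend contributes only about 0.19 units per day (70 units spread over 364 steps), while the day-to-day swings are far larger, which is why the growth is hard to see at daily resolution.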
So we resample.
Resampling to Weekly Data
First, we aggregate daily sales into weekly averages.
This averages out day-to-day noise, and the within-week cycle along with it, while keeping the longer-term trend.
Python: Weekly Resampling
weekly_sales = df.resample("W").mean()  # "W" bins are weeks ending on Sunday
Here is how the weekly data looks:
What changed?
- Noise reduced
- Trend becomes clearer
- Still responsive to changes that last longer than a week
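Two details of weekly resampling are worth checking on this dataset. The sketch below leaves the random noise out on purpose (an assumption made so the arithmetic is exact) and looks at how many days land in each weekly bin and what happens to the 7-day cycle.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2023-01-01", periods=365, freq="D")
trend = np.linspace(50, 120, 365)
weekly_seasonality = 10 * np.sin(2 * np.pi * np.arange(365) / 7)

# Noise is left out on purpose so the effect of averaging is exact.
clean = pd.DataFrame(
    {"trend": trend, "with_seasonality": trend + weekly_seasonality},
    index=dates,
)

weekly = clean.resample("W").mean()
days_per_bin = clean["trend"].resample("W").count()

# 2023-01-01 is a Sunday, so the first "W" bin holds a single day.
print(days_per_bin.head(3))

# Over every full 7-day bin the sine wave averages to zero,
# so the two weekly series coincide up to floating-point error.
full_weeks = days_per_bin == 7
gap = (weekly["with_seasonality"] - weekly["trend"])[full_weeks].abs().max()
print(f"largest gap over full weeks: {gap:.2e}")
```

The first weekly value is therefore based on one day only, and the weekly average no longer contains the weekly seasonality at all. That is useful for reading the trend, but it means the weekly series cannot be used to study day-of-week effects.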
Weekly resampling is common in operations and logistics.
Resampling to Monthly Data
Now let’s go one level higher: monthly averages.
This is what executives usually want.
Python: Monthly Resampling
monthly_sales = df.resample("M").mean()  # on pandas 2.2+, "M" is deprecated; use "ME" (month end)
Here is the monthly view:
What you should clearly see now:
- Very smooth curve
- Clear upward growth
- No short-term distractions
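The monthly alias depends on the pandas version, so the sketch below uses `"MS"` (month start), which is assumed here as a version-independent alternative. It labels each month by its first day instead of its last; the averaged values are the same.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
dates = pd.date_range(start="2023-01-01", periods=365, freq="D")
sales = (np.linspace(50, 120, 365)
         + 10 * np.sin(2 * np.pi * np.arange(365) / 7)
         + np.random.normal(0, 5, 365))
df = pd.DataFrame({"sales": sales}, index=dates)

# "MS" labels each month by its first day; only the index labels differ
# from a month-end resample, not the averaged values.
monthly_sales = df.resample("MS").mean()

print(len(df), "daily points ->", len(monthly_sales), "monthly points")
print(monthly_sales["sales"].round(1))
```

Twelve points replace 365, which is what makes the growth easy to read and also why any pattern shorter than a month disappears.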
This is perfect for:
- Business reports
- Strategy planning
- Long-term forecasting
Choosing the Right Aggregation
Resampling is not just about frequency — aggregation matters.
| Goal | Aggregation |
|---|---|
| Total revenue | sum |
| Average demand | mean |
| Peak usage | max |
| Worst performance | min |
Always match aggregation with business meaning.
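To see how much the choice matters, the same data can be summarized with all four aggregations at once. The sketch below uses hypothetical toy values (four days straddling a month boundary) so the results are easy to verify by hand.

```python
import pandas as pd

# Hypothetical toy data: four daily values straddling a month boundary.
toy = pd.Series(
    [10.0, 20.0, 30.0, 50.0],
    index=pd.to_datetime(["2023-01-30", "2023-01-31", "2023-02-01", "2023-02-02"]),
    name="sales",
)

# One row per month, one column per aggregation.
summary = toy.resample("MS").agg(["sum", "mean", "max", "min"])
print(summary)
# January:  sum 30, mean 15, max 20, min 10
# February: sum 80, mean 40, max 50, min 30
```

The four columns answer four different questions about the same month, so the number reported should follow from the question being asked.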
Common Mistakes to Avoid
- Resampling before cleaning missing values
- Using sum when mean is required
- Losing seasonality by over-aggregating
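Aggregation is one-way: daily values cannot be recovered from a monthly average, so keep the raw data and resample from it whenever a different resolution is needed.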
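The first mistake is easy to reproduce. The sketch below uses hypothetical toy data (two weeks of constant sales with deliberate gaps) to show how missing values behave under `sum` and `mean`. The `min_count` argument shown is one possible safeguard, not the only way to handle gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two Monday-to-Sunday weeks of constant sales.
dates = pd.date_range(start="2023-01-02", periods=14, freq="D")
values = pd.Series(10.0, index=dates, name="sales")
values.iloc[0] = np.nan   # one missing day in the first week
values.iloc[7:] = np.nan  # the whole second week is missing

weekly_sum = values.resample("W").sum()                     # NaN is skipped
weekly_sum_strict = values.resample("W").sum(min_count=1)   # all-NaN bin stays NaN
weekly_mean = values.resample("W").mean()

print(weekly_sum.tolist())         # [60.0, 0.0]
print(weekly_sum_strict.tolist())  # [60.0, nan]
print(weekly_mean.tolist())        # [10.0, nan]
```

With the default settings, a week with no data at all is reported as a total of 0, and the first week's total is understated by the missing day. The mean is unaffected in the first week but still hides the fact that a day was missing.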
Key Takeaways
- Resampling changes time frequency
- It simplifies noisy data
- Different stakeholders need different resolutions
- Always choose aggregation carefully
Next Lesson
In the next lesson, we’ll study rolling statistics — how moving averages and rolling windows reveal hidden patterns.