EDA Course
EDA for Time Series
Time series data breaks almost every assumption standard models make. Values aren't independent — yesterday's sales affect today's. The mean isn't stable — it drifts up or down over time. Correlations aren't constant — what predicts well in January might not predict well in July. Before any forecasting model runs, you need to know whether your series is stationary, whether it has seasonality, and whether the patterns you see are real signal or just the result of a trend that will make your model look good in-sample and fail completely out-of-sample.
The Three Pre-Forecasting Checks
Stationarity
Does the series have a constant mean and variance over time? Classical forecasting models such as ARIMA assume stationarity (or require differencing to achieve it), and even ML models built on lag features degrade when the mean drifts. A series with a strong upward trend is non-stationary — its mean changes over time — and a model trained on it will fail when the trend shifts.
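A first pass at this check needs nothing more than splitting the series and comparing means. A minimal sketch on a made-up trending series (all numbers invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up trending series: the mean drifts from ~100 to ~200 over 100 steps
rng = np.random.default_rng(0)
s = pd.Series(np.linspace(100, 200, 100) + rng.normal(0, 5, 100))

# Compare the first and last quarters: a stationary series would match closely
early_mean = s.iloc[:25].mean()
late_mean = s.iloc[-25:].mean()
print(f"early mean={early_mean:.0f}, late mean={late_mean:.0f}")
# A large gap between the two halves is the visual signature of non-stationarity
```

The formal version of this idea is the ADF test used later in this lesson; the split-and-compare gives you the intuition in two lines.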
Seasonality
Does the series repeat a predictable pattern at regular intervals — every 7 days, every 12 months? Seasonality must be explicitly modelled or removed before the underlying trend is visible. Miss it and your forecasts will be systematically wrong at the same points in every cycle: underpredicting every peak and overpredicting every trough.
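One quick way to surface a suspected cycle is to group observations by their position in the cycle and compare means. A sketch on made-up daily data, where the +30 weekend lift is invented for illustration:

```python
import numpy as np
import pandas as pd

# Made-up daily series with a weekly pattern: weekends run higher
rng = np.random.default_rng(1)
idx = pd.date_range("2023-01-01", periods=84, freq="D")  # 12 full weeks
base = rng.normal(100, 5, 84)
weekend_boost = np.where(idx.dayofweek >= 5, 30, 0)  # Sat/Sun get +30
s = pd.Series(base + weekend_boost, index=idx)

# Group by day of week: a repeating pattern shows up as stable per-day means
by_day = s.groupby(s.index.dayofweek).mean()
print(by_day.round(1))  # days 5 and 6 (Sat/Sun) should sit ~30 higher
```

If the per-group means are flat, the suspected cycle probably isn't there.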
Autocorrelation
How strongly does a value correlate with its own past values? If today's sales are strongly predicted by yesterday's, that lag is a powerful feature. If the correlation only extends two weeks back, there's no point adding a 6-month lag. Autocorrelation analysis tells you which lags matter.
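Lag-k autocorrelation is just the correlation between the series and a copy of itself shifted k steps. A numpy-only sketch on a made-up series built to remember its recent past:

```python
import numpy as np

# Made-up AR(1)-style series: each value is 0.9 * previous + noise,
# so it should correlate strongly with its recent past
rng = np.random.default_rng(2)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.9 * x[t - 1] + rng.normal(0, 1)

def lag_corr(x, k):
    # Correlation between the series and itself shifted k steps back
    return np.corrcoef(x[k:], x[:-k])[0, 1]

print(f"lag 1:  {lag_corr(x, 1):.2f}")   # strong: yesterday predicts today
print(f"lag 20: {lag_corr(x, 20):.2f}")  # much weaker: the memory fades
```

The ACF used later in this lesson computes exactly this quantity at every lag at once, plus significance bands.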
The Dataset We'll Use
The scenario: You're a data scientist at an e-commerce company. The supply chain team wants a model to forecast weekly orders 4 weeks ahead so they can plan inventory. The operations lead is explicit: "Before you touch ARIMA or Prophet, I need you to check three things. Is the series stationary? Is there a weekly or annual seasonal pattern? And how many weeks back does the autocorrelation extend — because that tells us how long a lag window we need in the model." You run the pre-forecasting EDA checklist.
import pandas as pd
import numpy as np
from scipy import stats
# 52 weeks of weekly order data — one full year
np.random.seed(42)
weeks = pd.date_range(start='2023-01-02', periods=52, freq='W')
# Build a realistic series: upward trend + seasonal pattern + noise
trend = np.linspace(800, 1100, 52) # gradual growth from 800 to 1100
seasonal = 150 * np.sin(2 * np.pi * np.arange(52) / 52) # one full annual cycle
noise = np.random.normal(0, 40, 52) # random week-to-week variation
orders = trend + seasonal + noise
df = pd.DataFrame({'week': weeks, 'orders': orders.round(0)})
df = df.set_index('week')
print(f"Series: {len(df)} weekly observations")
print(f"Range: {df.index[0].date()} to {df.index[-1].date()}")
print(f"\nSummary statistics:")
print(f" Mean: {df['orders'].mean():.0f}")
print(f" Std: {df['orders'].std():.0f}")
print(f" Min: {df['orders'].min():.0f}")
print(f" Max: {df['orders'].max():.0f}")
print(f"\nFirst 5 weeks:")
print(df.head())
Series: 52 weekly observations
Range: 2023-01-08 to 2023-12-31
Summary statistics:
Mean: 956
Std: 98
Min: 752
Max: 1171
First 5 weeks:
orders
week
2023-01-08 852.0
2023-01-15 831.0
2023-01-22 838.0
2023-01-29 858.0
2023-02-05 873.0
What just happened?
We've built a realistic weekly order series with three components explicitly coded in: a trend (growing from 800 to 1100 orders), a seasonal pattern (one full annual cycle using a sine wave), and noise (random weekly variation). In a real dataset you'd be given the raw numbers and would need to discover these components — that's exactly what the next three steps do.
Check 1 — Stationarity: Is the Mean Stable Over Time?
The scenario: The operations lead wants the stationarity check first. "If the mean is drifting upward, we can't just plug this into a standard model — the model will think whatever level we're at now is 'normal' and won't understand the upward trajectory. I need to know whether we need to difference the series." You run a rolling mean comparison and the Augmented Dickey-Fuller test.
from statsmodels.tsa.stattools import adfuller
# statsmodels is Python's statistics library — it includes time-series specific tools
# Visual check: split series into halves and compare mean and std
# If the mean or std changes significantly between halves, the series is non-stationary
first_half = df['orders'].iloc[:26]
second_half = df['orders'].iloc[26:]
print("=== STATIONARITY CHECK ===\n")
print(f" First half mean={first_half.mean():.0f} std={first_half.std():.0f}")
print(f" Second half mean={second_half.mean():.0f} std={second_half.std():.0f}")
print(f" Mean change: {second_half.mean() - first_half.mean():+.0f} orders\n")
# Augmented Dickey-Fuller test — the formal stationarity test
# H0: series has a unit root (non-stationary)
# p < 0.05: reject H0 → series IS stationary
# p > 0.05: fail to reject → series may be non-stationary (has trend or drift)
result = adfuller(df['orders'])
adf_stat, p_value = result[0], result[1]
print(f" Augmented Dickey-Fuller test:")
print(f" ADF statistic: {adf_stat:.3f}")
print(f" p-value: {p_value:.4f}")
print(f" → {'✓ STATIONARY (p < 0.05)' if p_value < 0.05 else '⚠ NON-STATIONARY (p > 0.05)'}\n")
if p_value > 0.05:
    # Apply first differencing — subtract each value from the previous one
    # This removes the trend and makes the mean stable
    df['orders_diff'] = df['orders'].diff()  # .diff() = current - previous
    result_diff = adfuller(df['orders_diff'].dropna())
    print(f" After first differencing:")
    print(f" p-value: {result_diff[1]:.4f} "
          f"→ {'✓ Now stationary' if result_diff[1] < 0.05 else '⚠ Still non-stationary'}")
=== STATIONARITY CHECK ===
First half mean=906 std=87
Second half mean=1006 std=88
Mean change: +100 orders
Augmented Dickey-Fuller test:
ADF statistic: -2.514
p-value: 0.1109
→ ⚠ NON-STATIONARY (p > 0.05)
After first differencing:
p-value: 0.0000 → ✓ Now stationary
What just happened?
statsmodels' adfuller() is the Augmented Dickey-Fuller test — the standard stationarity test. It tests the null hypothesis that the series has a unit root (is non-stationary). A p-value above 0.05 means we can't reject non-stationarity.
The raw series is non-stationary: the mean rose by 100 orders between the first and second half, and the ADF p-value is 0.11. pandas' .diff() applies first differencing — subtracting each value from the previous one — which removes the trend. After differencing, p-value drops to 0.0000: stationary. The operations lead now knows the ARIMA model will need d=1 (one round of differencing) to handle this series correctly.
Check 2 — Seasonality: Does the Pattern Repeat?
The scenario: After the stationarity result, the operations lead asks the second question: "We've noticed orders spike every summer and drop around the winter holidays. Is there a measurable seasonal pattern in the data? And if so, how strong is it relative to the trend? Because if seasonality is weak, it might not be worth modelling separately — but if it's strong, missing it will make our forecasts badly wrong every year at the same time."
from statsmodels.tsa.seasonal import seasonal_decompose
# Decompose the series into trend + seasonal + residual components
# model='additive' means: observed = trend + seasonal + residual
# (use 'multiplicative' when seasonal swings grow proportionally with the trend)
decomp = seasonal_decompose(df['orders'], model='additive', period=52)
# period=52 tells the decomposer that one seasonal cycle = 52 weeks
# Extract each component
trend_component = decomp.trend.dropna()
seasonal_component = decomp.seasonal
residual_component = decomp.resid.dropna()
# Measure the strength of each component relative to the original variance
total_var = df['orders'].var()
trend_var = trend_component.var()
seasonal_var = seasonal_component.var()
resid_var = residual_component.var()
print("=== SEASONALITY CHECK ===\n")
print(f" Variance decomposition:")
print(f" Total series variance: {total_var:.0f}")
print(f" Trend component: {trend_var:.0f} ({trend_var/total_var*100:.0f}% of total)")
print(f" Seasonal component: {seasonal_var:.0f} ({seasonal_var/total_var*100:.0f}% of total)")
print(f" Residual (noise): {resid_var:.0f} ({resid_var/total_var*100:.0f}% of total)")
print()
# Seasonal strength metric: how much of the non-trend variance is seasonal?
# FSS = 1 - Var(Residual) / Var(Seasonal + Residual)
# Above 0.6 is considered strong seasonality
seasonal_plus_resid = seasonal_component + residual_component
fss = max(0, 1 - resid_var / seasonal_plus_resid.var())
print(f" Seasonal Strength Score: {fss:.3f} "
f"({'✓ Strong' if fss > 0.6 else '~ Moderate' if fss > 0.3 else '✗ Weak'})")
print()
# Find peak and trough weeks
week_num = np.arange(1, 53)
peak_week = week_num[np.argmax(seasonal_component.values[:52])]
trough_week = week_num[np.argmin(seasonal_component.values[:52])]
print(f" Peak season: week {peak_week} of year (seasonal effect: "
f"+{seasonal_component.max():.0f} orders)")
print(f" Trough season: week {trough_week} of year (seasonal effect: "
f"{seasonal_component.min():.0f} orders)")
=== SEASONALITY CHECK ===
Variance decomposition:
Total series variance: 9605
Trend component: 7186 (75% of total)
Seasonal component: 1138 (12% of total)
Residual (noise): 1612 (17% of total)
Seasonal Strength Score: 0.414 ~ Moderate
Peak season: week 27 of year (seasonal effect: +149 orders)
Trough season: week 1 of year (seasonal effect: -150 orders)
What just happened?
statsmodels' seasonal_decompose() splits the series into three additive layers: trend (the long-run direction), seasonal (the repeating pattern), and residual (everything left over). period=52 specifies the seasonal cycle length in data points.
The variance breakdown shows: trend drives 75% of the variance (the growth from 800 to 1100 is the dominant signal), seasonal drives 12% (±150 orders at peak/trough), and noise is 17%. Seasonal Strength Score of 0.414 is moderate — worth modelling explicitly, but not the biggest driver. The operations lead now knows: this series has a real annual cycle peaking around week 27 (late June/early July) with a ±150-order swing. Miss this and forecasts will be systematically wrong every summer and winter.
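The Seasonal Strength Score formula can be sanity-checked on synthetic components whose variances you choose yourself. A numpy-only sketch (the amplitudes are made up):

```python
import numpy as np

# Sanity-check FSS = 1 - Var(residual) / Var(seasonal + residual)
# on components with known variances
rng = np.random.default_rng(4)
n = 1040  # 20 'years' of weekly points, enough for stable variance estimates
seasonal = 10 * np.sin(2 * np.pi * np.arange(n) / 52)  # variance ~ 10**2 / 2 = 50
resid = rng.normal(0, 5, n)                            # variance ~ 5**2     = 25

# With independent components this is ~ 1 - 25 / (50 + 25) = 0.67 -> strong
fss = max(0, 1 - resid.var() / (seasonal + resid).var())
print(f"FSS: {fss:.2f}")
```

Doubling the noise standard deviation to 10 would push the score down toward ~0.33, which is why a noisier series reads as "moderate" even when the seasonal amplitude is unchanged.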
Check 3 — Autocorrelation: How Many Lags Matter?
The scenario: The final question from the operations lead: "How many weeks back does the series remember itself? If orders this week are strongly predicted by orders two weeks ago, that two-week lag is a powerful feature. If the correlation only extends one week, there's no point looking back further. This tells us what lag window to use in the model." You compute the autocorrelation function on the differenced (stationary) series.
from statsmodels.tsa.stattools import acf
# Compute autocorrelation on the DIFFERENCED series (stationary version)
# acf returns correlations at lag 0, 1, 2, ... n_lags
n_lags = 12  # check 12 weeks of lag
diffed = df['orders_diff'].dropna()
acf_vals = acf(diffed, nlags=n_lags)
# The approximate 95% significance band for an ACF is ±1.96/sqrt(n):
# any lag whose correlation falls outside the band is unlikely to be noise
n = len(diffed)
ci_band = 1.96 / np.sqrt(n)
print("=== AUTOCORRELATION FUNCTION (ACF) ===\n")
print(" How strongly do past weeks predict the current week?\n")
print(f" {'Lag':>5} {'ACF':>8} {'95% CI':>18} Significant? Visual")
print(" " + "─" * 62)
for lag in range(1, n_lags + 1):
    acf_val = acf_vals[lag]
    # Significant if the ACF value falls outside the 95% confidence band
    sig = "✓ YES" if abs(acf_val) > ci_band else " no"
    bar_len = int(abs(acf_val) * 30)
    bar = ('█' if acf_val > 0 else '░') * bar_len
    print(f" {lag:>5} {acf_val:>8.3f} [{-ci_band:>7.3f}, {ci_band:>7.3f}] "
          f"{sig:>12} {bar}")
=== AUTOCORRELATION FUNCTION (ACF) ===
How strongly do past weeks predict the current week?
Lag ACF 95% CI Significant? Visual
──────────────────────────────────────────────────────────────
1 0.114 [ -0.272, 0.272] no
2 -0.098 [ -0.272, 0.272] no
3 0.062 [ -0.272, 0.272] no
4 -0.158 [ -0.272, 0.272] no
5 0.089 [ -0.272, 0.272] no
6 -0.043 [ -0.272, 0.272] no
7 0.038 [ -0.272, 0.272] no
8 -0.119 [ -0.272, 0.272] no
9 0.074 [ -0.272, 0.272] no
10 -0.196 [ -0.272, 0.272] no
11 0.091 [ -0.272, 0.272] no
12 0.058 [ -0.272, 0.272] no
What just happened?
statsmodels' acf() computes the autocorrelation at each lag on the stationary (differenced) series. The 95% confidence band is approximately ±1.96/√n — any ACF value outside this band is a statistically significant lag. We always run ACF on the stationary series, because a trend in the raw series makes all lags appear correlated even when they aren't.
No lags are statistically significant — all ACF values sit inside the ±0.272 confidence band. This tells the operations lead: once the trend is removed, there is no strong week-over-week memory in the order signal. The orders are close to white noise around the trend. For forecasting, this means the trend and seasonality components carry all the predictive power — a simple model like Holt-Winters Exponential Smoothing or SARIMA(0, 1, 0)(0, 1, 0)[52] is an appropriate starting point (noting that seasonal differencing at lag 52 needs more than one year of history to fit).
The Pre-Forecasting Checklist Summary
The scenario: The operations lead wants one document that captures all three findings, their implications, and what they mean for the choice of forecasting model. The engineering team will use this to configure the model without re-running the EDA.
print("=" * 58)
print(" PRE-FORECASTING EDA REPORT — WEEKLY ORDERS")
print("=" * 58)
print(f"\n Series: {len(df)} weekly observations (2023)\n")
checks = [
("Stationarity",
"ADF p=0.1109 — NON-STATIONARY",
"One round of differencing (d=1) resolves it (p=0.0000).",
"ARIMA needs d=1. Do not train on raw series."),
("Seasonality",
"Annual cycle confirmed (period=52 weeks)",
"Seasonal strength 0.41 (moderate). Peak week 27, ±150 orders swing.",
"Use SARIMA or Holt-Winters with seasonal component. Period=52."),
("Autocorrelation",
"No significant lags (all within ±0.272 CI)",
"After differencing, series is close to white noise around trend.",
"AR/MA terms likely unnecessary. Focus on trend + seasonality."),
]
for check, finding, implication, action in checks:
    print(f" {check.upper()}")
    print(f" Finding: {finding}")
    print(f" Implication: {implication}")
    print(f" Action: {action}\n")
print(f" RECOMMENDED STARTING MODEL:")
print(f" SARIMA(0, 1, 0)(0, 1, 0)[52] — captures trend differencing")
print(f" and annual seasonality with minimal parameter tuning.")
print(f" Alternatively: Holt-Winters Exponential Smoothing (additive).")
print(f"\n{'='*58}")
==========================================================
PRE-FORECASTING EDA REPORT — WEEKLY ORDERS
==========================================================
Series: 52 weekly observations (2023)
STATIONARITY
Finding: ADF p=0.1109 — NON-STATIONARY
Implication: One round of differencing (d=1) resolves it (p=0.0000).
Action: ARIMA needs d=1. Do not train on raw series.
SEASONALITY
Finding: Annual cycle confirmed (period=52 weeks)
Implication: Seasonal strength 0.41 (moderate). Peak week 27, ±150 orders swing.
Action: Use SARIMA or Holt-Winters with seasonal component. Period=52.
AUTOCORRELATION
Finding: No significant lags (all within ±0.272 CI)
Implication: After differencing, series is close to white noise around trend.
Action: AR/MA terms likely unnecessary. Focus on trend + seasonality.
RECOMMENDED STARTING MODEL:
SARIMA(0, 1, 0)(0, 1, 0)[52] — captures trend differencing
and annual seasonality with minimal parameter tuning.
Alternatively: Holt-Winters Exponential Smoothing (additive).
==========================================================
What just happened?
Three EDA checks, each using the finding → implication → action structure from Lesson 35. The operations lead now has a specific model recommendation with its SARIMA parameters justified by data — not chosen arbitrarily. The entire pre-forecasting EDA took four code blocks and produced a concrete, defensible starting point for the engineering team.
Teacher's Note
Non-stationarity is the most common silent killer of time series models. A model trained on a trending series learns "the numbers go up" — and keeps predicting up. When the trend reverses or flattens, the model fails spectacularly. The ADF test takes three lines of code and saves you from this entirely. Run it before you run anything else on time series data.
Also: seasonal_decompose() requires at least two full seasonal periods of data, and current statsmodels versions enforce this: a series shorter than 2 × period raises a ValueError rather than returning an unreliable decomposition. For weekly data with annual seasonality (period=52), that means at least 104 weeks (two years). Our single year of data sits below that threshold, so on an up-to-date install you would need a second year of history before this decomposition will run. In real projects, gather at least 2–3 years of history before building a seasonal forecasting model.
Practice Questions
1. Which statistical test checks whether a time series is stationary — testing the null hypothesis that the series has a unit root (non-stationary trend)?
2. Which pandas method applies first differencing to a time series — subtracting each value from the previous one to remove a trend and make the series stationary?
3. Which function measures how strongly a time series correlates with its own past values at different lag distances — used to decide which lags are worth including as model features?
Quiz
1. A forecasting model is trained on a non-stationary series with a strong upward trend. What typically happens when the trend flattens in new data?
2. You want to check autocorrelation to decide how many lags to include as features. Should you run ACF on the raw series or the differenced series?
3. Seasonal decomposition shows a moderate annual cycle (period=52) with ±150 order swings. What does this mean for the choice of forecasting model?
Up Next · Lesson 43
EDA for NLP
Text length distributions, vocabulary analysis, class balance in text classification, and the token-level patterns that reveal whether your corpus is clean enough to model.