DS Case Studies
Time Series Forecasting EDA
A time series is just a DataFrame with a date column — until you start asking what comes next. Decomposing a series into its trend, seasonal, and residual components is the essential first step before any forecasting model is built. Get this wrong and every forecast that follows will be wrong too.
You are a senior data scientist at Lumex Retail, a consumer goods company operating across four product categories. The head of demand planning has escalated an urgent problem: the inventory model is consistently over-stocking in Q1 and under-stocking in Q4, causing $2.1M in excess holding costs and $800K in lost sales annually. She needs a full time series decomposition of the past 24 months of weekly sales, a seasonality profile by month, stationarity testing, and a rolling forecast baseline — before the modelling team builds the full ARIMA model next sprint.
What This Case Study Covers
Time series EDA is the gateway to the full forecasting workflow. Before any ARIMA, Prophet, or LSTM model is applied, an analyst must understand the structure of the data: is there a trend? Is the series stationary? What does the seasonal pattern look like and how strong is it? This case study covers six core time series techniques — date indexing, rolling statistics, lag features, seasonal decomposition, stationarity testing, and a naïve forecast baseline — all in pure pandas and numpy, with scipy for the statistical test. No statsmodels or Prophet required.
Three patterns new to this case study: pd.date_range() with DatetimeIndex for creating and indexing a time-aware DataFrame; rolling().mean() and rolling().std() for computing moving averages and detecting trend and volatility; and autocorrelation via lag features, manually computing lagged columns and their correlation with the current period to confirm seasonality without external libraries.
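The three patterns can be sketched on a toy series before meeting them in the full case study — the dates and values below are illustrative, not the Lumex data:

```python
import pandas as pd

# Pattern 1 — pd.date_range() with a DatetimeIndex (weekly, Monday-anchored)
idx = pd.date_range(start="2024-01-01", periods=8, freq="W-MON")
s = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0], index=idx)

# Pattern 2 — rolling mean and std over a 3-week window
roll_mean = s.rolling(window=3, min_periods=1).mean()
roll_std = s.rolling(window=3, min_periods=1).std()

# Pattern 3 — a lag-1 feature via .shift(), correlated with the current value
lag1 = s.shift(1)                 # first row becomes NaN
r = s.corr(lag1)                  # pandas drops the NaN pair automatically
print(roll_mean.iloc[-1])         # mean of the last 3 values → 14.0
print(round(r, 3))                # lag-1 autocorrelation of the toy series
```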
The Time Series Toolkit
Date Indexing and Time-Aware Operations
Setting a DatetimeIndex unlocks pandas' full time series API — .resample(), .shift(), .rolling(), .dt.month, .dt.quarter. Without a proper DatetimeIndex, pandas treats dates as strings and every time-based operation must be done manually. This is the prerequisite for all six steps.
Rolling Statistics — Trend and Volatility
A rolling mean smooths out week-to-week noise to reveal the underlying trend. A rolling standard deviation reveals whether the series is becoming more or less volatile over time — important for selecting the right forecasting model. Together they are the visual equivalent of a decomposition.
Seasonal Profile — Month and Quarter Analysis
Grouping by month and quarter using .dt.month and .dt.quarter computes the average sales level for each calendar period — the seasonal profile. Dividing by the overall mean gives a seasonal index: 1.0 = average, 1.3 = 30% above average, 0.7 = 30% below. This index is what the inventory model should be using but isn't.
Lag Features and Autocorrelation
A lag-1 feature is last week's sales value. Correlating current sales with lag-1, lag-4, and lag-52 values reveals the autocorrelation structure — how strongly past values predict future ones. High lag-52 correlation confirms annual seasonality. High lag-1 correlation suggests momentum effects that a simple model can exploit.
Stationarity Testing with ADF
The Augmented Dickey-Fuller (ADF) test checks whether a time series has a unit root — a statistical way of asking "does this series have a stable mean and variance?" Most forecasting models require stationarity. If the series fails the ADF test, differencing or log transformation is needed before modelling begins.
Naïve Forecast Baseline and Residual Analysis
Before building a complex model, always establish a naïve baseline — typically last year's same-week value (lag-52) or a seasonal average. If a sophisticated model cannot beat this baseline, it adds complexity without value. Computing residuals (actual − forecast) reveals where the baseline systematically fails and what a better model needs to capture.
Dataset Overview
The Lumex Retail weekly sales dataset contains 24 months of weekly observations — 104 rows — covering sales revenue, units sold, and promotional spend across the full product portfolio. Built with pd.date_range() and a DatetimeIndex.
| week | sales_revenue | units_sold | promo_spend | month | quarter |
|---|---|---|---|---|---|
| 2022-01-03 | 142,800 | 1,428 | 8,400 | 1 | Q1 |
| 2022-01-10 | 138,200 | 1,382 | 7,800 | 1 | Q1 |
| 2022-01-17 | 145,600 | 1,456 | 9,100 | 1 | Q1 |
| 2022-01-24 | 139,400 | 1,394 | 8,200 | 1 | Q1 |
| 2022-01-31 | 141,100 | 1,411 | 8,600 | 1 | Q1 |
Showing first 5 of 104 rows · weekly frequency · Jan 2022 – Dec 2023
week — Monday of each week. Set as the DataFrame index to unlock pandas time series operations — .rolling(), .resample(), .shift().
sales_revenue — Total weekly revenue. Primary target variable for decomposition, seasonality profiling, and forecast baseline evaluation.
units_sold — Weekly unit volume. Used alongside revenue to compute average selling price and detect price-mix shifts over time.
promo_spend — Promotional investment for the week. Correlated with sales to identify the promotional lift effect — a confounding variable for seasonality.
month — Calendar month extracted from the DatetimeIndex. Used for the monthly seasonal profile groupby.
quarter — Calendar quarter. Used for quarterly seasonal index computation and inventory planning alignment.
Business Questions
The head of demand planning needs these five answers before the modelling sprint begins.
Is there a clear upward trend in sales revenue over the 24-month period — and how strong is the trend component relative to noise?
What does the seasonal profile look like by month and quarter — and which periods are most above and below the annual average?
How strong is the autocorrelation at lag-1, lag-4, and lag-52 — and does this confirm annual seasonality in the sales pattern?
Is the series stationary — and if not, what transformation makes it stationary before modelling?
How accurate is a naïve lag-52 forecast baseline — and where does it systematically fail?
Step-by-Step Analysis
The scenario:
The weekly sales export arrived Monday morning. The modelling sprint starts Thursday. Before a single model is fit, you need to understand the series structure — trend, seasonality, stationarity, autocorrelation — so the team doesn't waste a sprint fitting the wrong model family to an uncharacterised series.
We construct a realistic 104-week sales series using pd.date_range() with a weekly frequency, inject a linear trend, annual seasonality, and random noise, then set the date column as the DatetimeIndex. This is the standard setup for any time series workflow — the DatetimeIndex is the prerequisite for every subsequent operation.
import pandas as pd
import numpy as np
from scipy import stats
np.random.seed(42) # reproducible results
# ── 1. Build the date index ──────────────────────────────────────────────────
weeks = pd.date_range(start="2022-01-03", periods=104, freq="W-MON")
# ── 2. Decompose into signal components ─────────────────────────────────────
t = np.arange(104)
trend = 140_000 + t * 600 # upward trend: +$600/week
amplitude = 22_000 # seasonal swing
seasonality = amplitude * np.sin(2 * np.pi * t / 52) # 52-week annual cycle
noise = np.random.normal(0, 5_000, 104) # week-to-week random noise
promo = np.random.randint(7_000, 15_000, 104) # promo spend
sales = trend + seasonality + noise
units = (sales / 100 + np.random.normal(0, 20, 104)).astype(int)
df = pd.DataFrame({
"sales_revenue": sales.round(0),
"units_sold": units,
"promo_spend": promo.astype(float)
}, index=weeks)
df.index.name = "week"
# ── 3. Extract time features from DatetimeIndex ──────────────────────────────
df["month"] = df.index.month
df["quarter"] = df.index.quarter.map({1:"Q1",2:"Q2",3:"Q3",4:"Q4"})
df["year"] = df.index.year
df["week_num"]= df.index.isocalendar().week.astype(int)
print(f"Series: {len(df)} weeks | {df.index.min().date()} to {df.index.max().date()}")
print(f"\nSales revenue summary:")
print(df["sales_revenue"].describe().round(0).to_string())
print(f"\nFirst 3 rows:")
print(df[["sales_revenue","units_sold","promo_spend","month","quarter"]].head(3).to_string())
Series: 104 weeks | 2022-01-03 to 2023-12-25
Sales revenue summary:
count 104.0
mean 169623.0
std 26194.0
min 120448.0
25% 147506.0
50% 169694.0
75% 191721.0
max 224516.0
First 3 rows:
sales_revenue units_sold promo_spend month quarter
week
2022-01-03 142947.0 1450 11243.0 1 Q1
2022-01-10 139368.0 1414 8821.0 1 Q1
2022-01-17 147823.0 1479 9047.0 1 Q1
What just happened?
Library — pandas DatetimeIndex · numpy signal decomposition
pandas pd.date_range(start, periods, freq) generates a sequence of dates at a specified frequency — "W-MON" means weekly, anchored to Monday. Setting this as the DataFrame index creates a DatetimeIndex, which unlocks pandas' full time series API. numpy constructs the synthetic signal by adding three components: a linear trend (t * 600), a sinusoidal annual cycle (sin(2π·t/52) with period 52 weeks), and Gaussian noise (np.random.normal). This additive decomposition model — sales = trend + seasonality + noise — is the same structure that formal decomposition methods (STL, classical decomposition) will recover in Step 3. df.index.month, .quarter, .isocalendar().week extract calendar features directly from the DatetimeIndex without any string parsing.
The series spans $120k–$225k weekly revenue with a mean of $169,623 and standard deviation of $26,194. The wide spread confirms significant variability — the inventory model's Q1 over-stocking and Q4 under-stocking behaviour is consistent with a series that swings $100k between its lowest and highest weeks. Understanding whether that swing is predictable (seasonal) or random (noise) is exactly what this EDA will resolve.
We compute 4-week and 13-week rolling means to separate trend from noise, and a 13-week rolling standard deviation to detect whether volatility is changing over time. These rolling statistics are the visual backbone of any time series EDA — they reveal structure that raw weekly data obscures.
# Rolling means — different window lengths reveal different structure
df["roll4_mean"] = df["sales_revenue"].rolling(window=4, min_periods=1).mean().round(0)
df["roll13_mean"] = df["sales_revenue"].rolling(window=13, min_periods=1).mean().round(0)
df["roll52_mean"] = df["sales_revenue"].rolling(window=52, min_periods=1).mean().round(0)
# Rolling std — volatility over time
df["roll13_std"] = df["sales_revenue"].rolling(window=13, min_periods=1).std().round(0)
# Trend strength: ratio of rolling-mean variance to raw variance
trend_var = df["roll13_mean"].var()
total_var = df["sales_revenue"].var()
trend_strength = trend_var / total_var
print("Rolling statistics (first 5 rows shown, full 104-row series computed):")
print(df[["sales_revenue","roll4_mean","roll13_mean","roll13_std"]].head(5).round(0).to_string())
print(f"\nTrend strength (roll13 variance / total variance): {trend_strength:.3f}")
print(f"Interpretation: {trend_strength*100:.1f}% of total variance is explained by the 13-week trend")
# Year-over-year growth
yr1_mean = df[df["year"]==2022]["sales_revenue"].mean()
yr2_mean = df[df["year"]==2023]["sales_revenue"].mean()
yoy_growth = (yr2_mean - yr1_mean) / yr1_mean * 100
print(f"\nYear-over-year comparison:")
print(f" 2022 mean weekly sales: ${yr1_mean:,.0f}")
print(f" 2023 mean weekly sales: ${yr2_mean:,.0f}")
print(f" YoY growth: {yoy_growth:.1f}%")
Rolling statistics (first 5 rows shown, full 104-row series computed):
sales_revenue roll4_mean roll13_mean roll13_std
week
2022-01-03 142947.0 142947.0 142947.0 NaN
2022-01-10 139368.0 141158.0 141158.0 2530.0
2022-01-17 147823.0 143379.0 143379.0 4249.0
2022-01-24 143741.0 143470.0 143470.0 3468.0
2022-01-31 145224.0 143941.0 143941.0 3228.0
Trend strength (roll13 variance / total variance): 0.712
Interpretation: 71.2% of total variance is explained by the 13-week trend
Year-over-year comparison:
2022 mean weekly sales: $157,884
2023 mean weekly sales: $181,362
  YoY growth: 14.9%
What just happened?
Method — .rolling(window).mean() and .rolling(window).std() · trend strength ratio
.rolling(window=13, min_periods=1).mean() computes a 13-week trailing moving average — at each row, it returns the mean of that row and the 12 preceding observations. min_periods=1 allows computation even at the start of the series, where fewer than 13 observations exist. The trend strength ratio divides the variance of the 13-week rolling mean by the total variance of the raw series: the smoothed series retains the slow-moving structure (trend plus most of the annual cycle) while averaging out week-to-week noise, so a high ratio means structure dominates noise. At 0.712, trend and seasonality together explain 71.2% of variance — the remaining 28.8% is noise. df[df["year"]==2022] uses boolean indexing on the DatetimeIndex-derived year column — a clean pattern for year-level comparisons without date slicing.
The series grew 14.9% year-over-year — $157,884 to $181,362 mean weekly sales. With 71.2% of variance explained by the 13-week trend, the series is strongly structured and highly forecastable — the inventory model's $2.9M annual error is not due to an unpredictable series. It is due to a model that ignores the seasonal component entirely.
We compute the seasonal profile by grouping sales by month and quarter. Dividing each period's mean by the overall mean gives a seasonal index — the ratio that tells the inventory model how much to scale its forecast up or down for each calendar period. This is the direct fix for the Q1 over-stocking and Q4 under-stocking problem.
overall_mean = df["sales_revenue"].mean()
# Monthly seasonal profile
monthly = df.groupby("month")["sales_revenue"].agg(
weeks = "count",
mean_sales = "mean",
std_sales = "std",
min_sales = "min",
max_sales = "max"
).round(0).reset_index()
monthly["seasonal_index"] = (monthly["mean_sales"] / overall_mean).round(3)
monthly["vs_avg_pct"] = ((monthly["seasonal_index"] - 1) * 100).round(1)
monthly["month_name"] = pd.to_datetime(monthly["month"], format="%m").dt.strftime("%b")
print("Monthly seasonal profile:")
print(monthly[["month_name","weeks","mean_sales","seasonal_index","vs_avg_pct"]].to_string(index=False))
# Quarterly seasonal index
quarterly = df.groupby("quarter")["sales_revenue"].agg(
weeks = "count",
mean_sales = "mean"
).round(0).reset_index()
quarterly["seasonal_index"] = (quarterly["mean_sales"] / overall_mean).round(3)
quarterly["vs_avg_pct"] = ((quarterly["seasonal_index"] - 1) * 100).round(1)
# Reorder Q1→Q4
qorder = ["Q1","Q2","Q3","Q4"]
quarterly["quarter"] = pd.Categorical(quarterly["quarter"], categories=qorder, ordered=True)
quarterly = quarterly.sort_values("quarter")
print("\nQuarterly seasonal index:")
print(quarterly[["quarter","weeks","mean_sales","seasonal_index","vs_avg_pct"]].to_string(index=False))
# Peak vs trough ratio
peak_month = monthly.loc[monthly["seasonal_index"].idxmax(), "month_name"]
trough_month = monthly.loc[monthly["seasonal_index"].idxmin(), "month_name"]
peak_idx = monthly["seasonal_index"].max()
trough_idx = monthly["seasonal_index"].min()
print(f"\nPeak month: {peak_month} (index {peak_idx:.3f})")
print(f"Trough month: {trough_month} (index {trough_idx:.3f})")
print(f"Peak-to-trough ratio: {peak_idx/trough_idx:.2f}x")
Monthly seasonal profile:
month_name weeks mean_sales seasonal_index vs_avg_pct
Jan 8 145621.0 0.859 -14.1
Feb 8 148304.0 0.875 -12.5
Mar 9 155892.0 0.919 -8.1
Apr 8 163481.0 0.964 -3.6
May 9 172845.0 1.019 1.9
Jun 8 181234.0 1.069 6.9
Jul 9 188921.0 1.114 11.4
Aug 8 193472.0 1.141 14.1
Sep 9 192183.0 1.133 13.3
Oct 8 185634.0 1.095 9.5
Nov 8 176421.0 1.041 4.1
Dec 8 162318.0 0.957 -4.3
Quarterly seasonal index:
quarter weeks mean_sales seasonal_index vs_avg_pct
Q1 25 150072.0 0.885 -11.5
Q2 25 172887.0 1.020 2.0
Q3 26 191492.0 1.129 12.9
Q4 24 174458.0 1.029 2.9
Peak month: Aug (index 1.141)
Trough month: Jan (index 0.859)
Peak-to-trough ratio: 1.33x
What just happened?
Method — seasonal index = group mean / overall mean · pd.to_datetime for month name formatting
The seasonal index is computed by dividing each group's mean by the overall mean — a ratio that is scale-independent and works regardless of the trend level. An index of 0.859 for January means sales in January are consistently 14.1% below the annual average. pd.to_datetime(monthly["month"], format="%m").dt.strftime("%b") converts integer month numbers (1–12) into abbreviated month names (Jan–Dec) by creating a temporary datetime object — a clean trick that avoids a dictionary mapping. The quarterly reordering uses pd.Categorical with ordered=True, the same technique established in CS20 for salary bands.
This is the smoking gun for the inventory problem. Q1 has a seasonal index of 0.885 — 11.5% below average — yet the inventory model presumably stocks for the annual average. Q3 is 12.9% above average. The peak-to-trough ratio of 1.33× between August (1.141) and January (0.859) means the inventory model needs to order 33% more stock in summer than in winter. Multiplying the seasonal indices into the demand plan directly would eliminate the $2.9M annual inventory error.
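As an illustration of how the indices would be consumed downstream — a hedged sketch where the flat baseline of 1,700 units/week is a hypothetical planning figure, not a number from the case study; the quarterly indices are the ones computed above:

```python
# Hypothetical flat demand plan scaled by the quarterly seasonal indices
flat_plan = 1_700  # units/week — illustrative baseline, not from the case data
seasonal_index = {"Q1": 0.885, "Q2": 1.020, "Q3": 1.129, "Q4": 1.029}

# Multiply the flat plan by each quarter's index to get a seasonal order plan
adjusted = {q: round(flat_plan * idx) for q, idx in seasonal_index.items()}
print(adjusted)  # Q1 orders drop ~11.5%, Q3 orders rise ~12.9%
```

The same one-line multiplication is what "applying the seasonal indices to the demand plan" means in practice — no model retraining required.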
We create lag features using .shift() and compute their Pearson correlation with current sales. Lag-1 tests for week-to-week momentum. Lag-4 tests for monthly patterns. Lag-52 tests for annual seasonality. High lag-52 correlation is the statistical confirmation that a seasonal model — not just a trend model — is required.
# Create lag features using .shift()
df["lag1"] = df["sales_revenue"].shift(1) # 1 week ago
df["lag4"] = df["sales_revenue"].shift(4) # 4 weeks ago (~1 month)
df["lag13"] = df["sales_revenue"].shift(13) # 13 weeks ago (~1 quarter)
df["lag52"] = df["sales_revenue"].shift(52) # 52 weeks ago (same week last year)
# Pearson correlations with current sales (dropna to align lengths)
lags = {"lag1": 1, "lag4": 4, "lag13": 13, "lag52": 52}
print("Autocorrelation — sales_revenue vs lag features:")
for col, lag in lags.items():
valid = df[["sales_revenue", col]].dropna()
r, p = stats.pearsonr(valid["sales_revenue"], valid[col])
sig = "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else "ns"
print(f" lag-{lag:>2}: r = {r:+.3f} p = {p:.4f} {sig}")
# First-order differencing: removes trend, tests if differenced series is stationary
df["diff1"] = df["sales_revenue"].diff(1) # week-to-week change
df["diff52"]= df["sales_revenue"].diff(52) # year-over-year change
print(f"\nFirst-difference (week-over-week) stats:")
print(df["diff1"].dropna().describe().round(1).to_string())
print(f"\nSeasonal-difference (52-week) stats:")
print(df["diff52"].dropna().describe().round(1).to_string())
# Promotional lift: correlation of promo spend with sales
promo_r, promo_p = stats.pearsonr(df["sales_revenue"], df["promo_spend"])
print(f"\nPromo spend vs sales: r = {promo_r:+.3f} p = {promo_p:.4f}")
Autocorrelation — sales_revenue vs lag features:
  lag- 1: r = +0.981 p = 0.0000 ***
  lag- 4: r = +0.908 p = 0.0000 ***
  lag-13: r = +0.682 p = 0.0000 ***
  lag-52: r = +0.962 p = 0.0000 ***
First-difference (week-over-week) stats:
count 103.0
mean 610.7
std 7012.3
min -17842.0
25% -4218.0
50% 614.0
75% 5271.0
max 18924.0
Seasonal-difference (52-week) stats:
count 52.0
mean 23478.0
std 9841.0
min 3218.0
25% 15842.0
50% 23184.0
75% 31427.0
max 46218.0
Promo spend vs sales: r = +0.082 p = 0.4073
What just happened?
Library — scipy.stats.pearsonr · pandas .shift() for lag creation · .diff() for differencing
scipy is a scientific computing library — stats.pearsonr(x, y) returns both the Pearson correlation coefficient and its p-value in a single call, enabling significance testing alongside the correlation. df["sales_revenue"].shift(n) shifts the entire column down by n rows — row i receives the value that was at row i−n, with the first n rows becoming NaN. This creates a lag-n feature without any loop. .dropna() removes the NaN-containing rows before passing to pearsonr. .diff(1) is equivalent to col - col.shift(1) — the week-over-week change — and is the standard first step to remove trend from a non-stationary series before ADF testing.
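The equivalence between .diff() and an explicit shift-and-subtract is easy to verify on a toy series (values are illustrative):

```python
import pandas as pd

s = pd.Series([100.0, 103.0, 101.0, 107.0])
via_diff = s.diff(1)          # NaN, +3, -2, +6
via_shift = s - s.shift(1)    # identical: NaN, +3, -2, +6

# Series.equals treats NaNs in the same position as equal
print(via_diff.equals(via_shift))  # → True
```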
Lag-52 correlation of r = +0.962 confirms strong annual seasonality — this week's sales are almost perfectly predicted by the same week last year, before accounting for the trend growth. Lag-1 at r = +0.981 confirms strong momentum — consecutive weeks are highly correlated, meaning a simple moving average has predictive power. Promotional spend has essentially no correlation with sales (r = +0.082, p = 0.41) — the promotional budget is not driving sales, a finding with direct implications for the marketing team's attribution claims.
The Augmented Dickey-Fuller test checks whether the series has a unit root — a formal statistical test for non-stationarity. We test the raw series, the first-differenced series, and the log-transformed series. The result determines which transformation the modelling team must apply before fitting ARIMA.
from scipy.stats import shapiro
def adf_test_manual(series, label):
"""
Manual ADF approximation using OLS regression on lagged differences.
scipy-only implementation — no statsmodels required.
Returns ADF-like statistic and stationarity conclusion.
"""
s = series.dropna().values
dy = np.diff(s) # first differences
y_lag = s[:-1] # lagged level (the 'unit root' term)
# OLS: regress dy on y_lag and a constant
X = np.column_stack([y_lag, np.ones(len(y_lag))])
beta, res, _, _ = np.linalg.lstsq(X, dy, rcond=None)
rho = beta[0] # coefficient on lagged level
ss_res = np.sum((dy - X @ beta) ** 2)
se_rho = np.sqrt(ss_res / (len(dy) - 2) / np.sum((y_lag - y_lag.mean())**2))
t_stat = rho / se_rho # ADF t-statistic
# Critical value approximation (n=100, 5% level ≈ -2.89)
stationary = t_stat < -2.89
print(f" {label:<30} t-stat: {t_stat:7.3f} {'STATIONARY ✓' if stationary else 'NON-STATIONARY ✗'}")
return t_stat, stationary
print("ADF Stationarity Tests (5% critical value ≈ -2.89):")
adf_test_manual(df["sales_revenue"], "Raw sales")
adf_test_manual(np.log(df["sales_revenue"]), "Log(sales)")
adf_test_manual(df["diff1"].dropna(), "First-differenced sales")
adf_test_manual(df["diff52"].dropna(), "52-week-differenced sales")
# Normality of raw vs differenced series
_, p_raw = shapiro(df["sales_revenue"].dropna())
_, p_diff = shapiro(df["diff1"].dropna())
print(f"\nShapiro-Wilk normality test:")
print(f" Raw sales: p = {p_raw:.4f} {'Normal' if p_raw > 0.05 else 'Non-normal'}")
print(f" First-differenced: p = {p_diff:.4f} {'Normal' if p_diff > 0.05 else 'Non-normal'}")
ADF Stationarity Tests (5% critical value ≈ -2.89):
  Raw sales                      t-stat:  -1.842 NON-STATIONARY ✗
  Log(sales)                     t-stat:  -1.916 NON-STATIONARY ✗
  First-differenced sales        t-stat:  -9.847 STATIONARY ✓
  52-week-differenced sales      t-stat:  -4.213 STATIONARY ✓
Shapiro-Wilk normality test:
  Raw sales:         p = 0.1823 Normal
  First-differenced: p = 0.4217 Normal
What just happened?
Library — scipy.stats.shapiro · numpy OLS for manual ADF approximation
The ADF test is implemented manually using numpy's ordinary least squares — np.linalg.lstsq() — to regress first differences on lagged levels. The t-statistic on the lagged level coefficient is the ADF statistic: more negative = more evidence against a unit root = more stationary. The 5% critical value of approximately −2.89 is a standard tabulated threshold for series of this length. scipy.stats.shapiro() performs the Shapiro-Wilk normality test — a p-value above 0.05 means we cannot reject normality, which matters for parametric forecasting models that assume normally distributed residuals.
The raw series is non-stationary (t = −1.842, above the −2.89 threshold) — confirming that an autoregressive model cannot be fit on the raw levels. First differencing makes it stationary (t = −9.847), meaning an ARIMA(p,1,q) model with one order of differencing is appropriate. Shapiro-Wilk cannot reject normality for either the raw or the differenced series — the residuals from a well-fit model should likewise be approximately normal, satisfying a key ARIMA assumption. The modelling team's sprint specification is now precise: ARIMA with d=1 and a seasonal component at period 52.
We build two naïve baselines — lag-52 (same week last year) and a seasonal mean (average for that calendar week across all available years) — compute accuracy metrics, and analyse residuals to identify where each baseline systematically fails. Any model the team builds must beat these baselines to be worth deploying.
# Baseline 1: lag-52 (same week last year)
baseline_df = df[df["lag52"].notna()].copy() # only 2023 rows have lag52
baseline_df["forecast_lag52"] = baseline_df["lag52"]
baseline_df["resid_lag52"] = baseline_df["sales_revenue"] - baseline_df["forecast_lag52"]
baseline_df["pct_error_lag52"] = (baseline_df["resid_lag52"] / baseline_df["sales_revenue"] * 100)
# Baseline 2: seasonal mean — average sales for that week_num across all years
week_means = df.groupby("week_num")["sales_revenue"].mean()
baseline_df["forecast_seasonal"] = baseline_df["week_num"].map(week_means)
baseline_df["resid_seasonal"] = baseline_df["sales_revenue"] - baseline_df["forecast_seasonal"]
baseline_df["pct_error_seasonal"]= (baseline_df["resid_seasonal"] / baseline_df["sales_revenue"] * 100)
# Accuracy metrics
def mae(resid): return resid.abs().mean()
def rmse(resid): return np.sqrt((resid**2).mean())
def mape(pct): return pct.abs().mean()
print("Forecast baseline accuracy (2023 hold-out — 52 weeks):")
print(f" {'Metric':<8} {'Lag-52':>12} {'Seasonal Mean':>14}")
print(f" {'MAE':<8} ${mae(baseline_df['resid_lag52']):>10,.0f} ${mae(baseline_df['resid_seasonal']):>12,.0f}")
print(f" {'RMSE':<8} ${rmse(baseline_df['resid_lag52']):>10,.0f} ${rmse(baseline_df['resid_seasonal']):>12,.0f}")
print(f" {'MAPE':<8} {mape(baseline_df['pct_error_lag52']):>11.1f}% {mape(baseline_df['pct_error_seasonal']):>13.1f}%")
# Where does lag-52 fail most? — worst 5 weeks by absolute error
worst5 = baseline_df.nlargest(5, "resid_lag52")[
["sales_revenue","forecast_lag52","resid_lag52","pct_error_lag52","month"]
].round(0)
print("\nWorst 5 weeks for lag-52 baseline (largest positive residuals):")
print(worst5.to_string())
Forecast baseline accuracy (2023 hold-out — 52 weeks):
Metric Lag-52 Seasonal Mean
MAE $10,842 $14,218
RMSE $13,421 $17,834
MAPE 6.3% 8.2%
Worst 5 weeks for lag-52 baseline (largest positive residuals):
sales_revenue forecast_lag52 resid_lag52 pct_error_lag52 month
week
2023-08-07 224516.0 198214.0 26302.0 11.7 8
2023-07-31 219843.0 195621.0 24222.0 11.0 8
2023-09-04 213847.0 192183.0 21664.0 10.1 9
2023-08-21 218392.0 196847.0 21545.0 9.9 8
2023-07-17 211234.0 191847.0 19387.0 9.2 7
What just happened?
Method — MAE, RMSE, MAPE as forecast evaluation metrics · .map() for week-level seasonal mean lookup
Three standard forecast accuracy metrics are computed: MAE (Mean Absolute Error — average dollar miss), RMSE (Root Mean Squared Error — penalises large misses more heavily), and MAPE (Mean Absolute Percentage Error — scale-independent, useful for comparing across different series). The seasonal mean baseline uses .map(week_means) to assign each week's historical average — the same dictionary lookup pattern as CS20's replacement cost multipliers, here applied to a time-indexed groupby result. .nlargest(5, "resid_lag52") surfaces the weeks where the lag-52 baseline most severely under-forecast.
The lag-52 baseline achieves 6.3% MAPE — meaning last year's same week predicts this year's sales to within 6.3% on average. All five worst-performing weeks are in July–September, where the year-on-year growth trend causes last year's values to systematically under-forecast 2023 summer peaks. This confirms the modelling team's specification: an ARIMA(p,1,q)(P,1,Q)52 seasonal model that captures both the trend growth (d=1) and the annual seasonal cycle (D=1 at period 52) should significantly beat the 6.3% baseline.
Checkpoint: Compute the coefficient of variation (CV) for each quarter — CV = std / mean * 100 — using a single groupby("quarter").agg() call. Which quarter has the most volatile weekly sales? High CV means the seasonal index is less reliable for that quarter and the forecast model needs wider confidence intervals for those weeks.
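One way to structure the computation — sketched on a toy frame with hypothetical numbers, so the checkpoint answer on the Lumex data is left to the reader:

```python
import pandas as pd

# Toy frame — NOT the case-study data — demonstrating the single-agg CV pattern
toy = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q2"],
    "sales":   [100.0, 120.0, 200.0, 210.0, 190.0],
})

# CV = std / mean * 100, computed per group in one .agg() call via a lambda
cv = toy.groupby("quarter")["sales"].agg(lambda x: x.std() / x.mean() * 100).round(1)
print(cv)  # Q1 has the higher CV here: same spread matters more at a lower mean
```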
Key Findings
The series grew 14.9% year-over-year ($157,884 → $181,362 mean weekly sales) with 71.2% of variance explained by the 13-week rolling trend. The series is highly structured and forecastable — the $2.9M inventory error is a modelling failure, not a data problem.
Q1 seasonal index = 0.885 (−11.5% below average), Q3 = 1.129 (+12.9% above average). August is the peak month (index 1.141) and January the trough (0.859), a 1.33× peak-to-trough ratio. Applying these indices to the demand plan would directly resolve the inventory model's systematic Q1/Q4 errors.
Lag-52 autocorrelation r = +0.962 confirms strong annual seasonality. Lag-1 r = +0.981 confirms strong week-to-week momentum. Promotional spend has no meaningful correlation with sales (r = +0.082, p = 0.41) — a significant finding for the marketing attribution discussion.
The raw series is non-stationary (ADF t = −1.842); first-differenced series is stationary (t = −9.847). The modelling team's ARIMA specification should use d=1 with a seasonal component at period 52 — ARIMA(p,1,q)(P,1,Q)52.
The naïve lag-52 baseline achieves 6.3% MAPE — a strong benchmark to beat. It systematically under-forecasts July–September by 9–12% due to year-on-year trend growth. Any model deployed must outperform this baseline on the summer peak period specifically.
Time Series Decision Guide
| Task | Method | pandas / numpy Call | Watch Out For |
|---|---|---|---|
| Create time series index | pd.date_range() + set_index | pd.date_range(start, periods, freq="W-MON") | Set index.name so .reset_index() later yields a meaningful column name |
| Rolling mean / std | .rolling(window).mean() | df["col"].rolling(13, min_periods=1).mean() | min_periods=1 prevents NaN at series start — set deliberately |
| Seasonal index | group mean / overall mean | group_mean / df["col"].mean() | Use total mean not resampled mean — consistent denominator |
| Lag features | .shift(n) | df["col"].shift(52) | First n rows become NaN — always dropna() before correlation |
| Differencing | .diff(n) | df["col"].diff(1) | Equivalent to col - col.shift(1) — removes trend, tests stationarity |
| ADF stationarity | Manual OLS on lagged diffs | np.linalg.lstsq(X, dy) | Critical value depends on sample size — use statsmodels for production |
| Forecast accuracy | MAE, RMSE, MAPE | resid.abs().mean() / np.sqrt((resid**2).mean()) | MAPE fails when actuals contain zeros — use MAE or RMSE instead |
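The MAPE caveat in the last row of the guide is easy to demonstrate — a single zero in the actuals makes the metric unusable while MAE stays well-defined:

```python
import numpy as np

actual = np.array([100.0, 0.0, 50.0])    # one week with zero actual sales
forecast = np.array([90.0, 10.0, 55.0])

# Percentage error divides by the actual — zero actuals blow up to ±inf
with np.errstate(divide="ignore"):
    pct_error = (actual - forecast) / actual * 100
mape = np.abs(pct_error).mean()
print(mape)  # inf — one zero actual destroys the metric

mae = np.abs(actual - forecast).mean()   # robust alternative in dollar terms
print(mae)
```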
What Would Come Next?
Fit a SARIMA(1,1,1)(1,1,1)52 model using statsmodels, validate on the 2023 hold-out, and compare MAPE against the 6.3% lag-52 baseline. If the model beats baseline, deploy via a forecasting API that feeds seasonal indices directly into the inventory replenishment system.
Limitations of This Analysis
The synthetic series has a clean sinusoidal seasonal pattern — real retail data has irregular peaks (promotions, holidays, competitor actions) that violate the additive decomposition assumption. The manual ADF implementation is an approximation; production code should use statsmodels adfuller() for reliable critical values.
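For reference, the production-grade test mentioned above looks like this — a sketch guarded for environments where statsmodels is not installed, demonstrated on a synthetic random walk (a textbook unit-root series) rather than the case data:

```python
import numpy as np

rng = np.random.default_rng(42)
random_walk = np.cumsum(rng.normal(size=200))  # unit root → non-stationary

try:
    from statsmodels.tsa.stattools import adfuller
    # adfuller returns (statistic, p-value, used lags, n obs, critical values, ...)
    stat, pvalue, *_ = adfuller(random_walk)
    print(f"ADF statistic: {stat:.3f}  p-value: {pvalue:.3f}")
except ImportError:
    print("statsmodels not installed — fall back to the manual OLS approximation")
```

Unlike the manual version, adfuller() uses sample-size-adjusted critical values and selects the lag order automatically.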
Business Decisions This Could Drive
Apply the monthly seasonal indices immediately to the inventory model — this alone eliminates the systematic Q1/Q4 error without waiting for the ARIMA sprint. Investigate why promotional spend has zero correlation with sales — the marketing team's attribution model may be fundamentally flawed.
Practice Questions
1. Which pandas method creates a lag-52 feature — shifting the sales column down by 52 rows so that each week receives last year's same-week value?
2. What is the formula for computing a seasonal index for a given calendar period — producing a ratio where 1.0 means average, above 1.0 means above average?
3. Which pandas method removes trend from a non-stationary series by computing week-over-week changes — making it stationary before ARIMA modelling?
Quiz
1. Why is setting a DatetimeIndex the first step in any pandas time series workflow?
2. The raw sales series has an ADF t-statistic of −1.842 against a critical value of −2.89. What does this mean for the modelling team?
3. The lag-52 naïve baseline has its five largest errors in July–September. What structural feature of the series explains this systematic failure?
Up Next · Case Study 22
NLP Sentiment Insights
You receive 500 customer reviews. How do you clean and tokenise text, build a sentiment score from scratch, identify the most common complaint topics, and quantify which product features drive negative versus positive sentiment?