Building a Complete Forecasting Pipeline
So far, we have learned individual concepts: trend, seasonality, models, and evaluation.
In real projects, these steps are rarely used in isolation. Instead, they are chained into a forecasting pipeline.
This lesson shows how everything fits together, both logically and visually.
A Real-World Scenario
Imagine you work for a food delivery company.
You are asked to forecast daily order volume so the company can:
- Schedule delivery staff
- Prepare inventory
- Avoid delays during peak days
Let’s build a simple but realistic forecasting pipeline for this situation.
Step 1: Understanding the Raw Data
Every pipeline starts with raw data.
Below is a simulated daily order count with three components:
- Orders slowly increase as the business grows (trend)
- Weekly seasonality (a 7-day cycle, like busier weekends)
- Some random noise
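import numpy as np
import matplotlib.pyplot as plt
np.random.seed(4)  # fixed seed so the example is reproducible
days = np.arange(140)  # 140 days, i.e. 20 weeks
trend = 0.2 * days  # slow linear growth: +0.2 orders per day
seasonal = 10 * np.sin(2 * np.pi * days / 7)  # repeating 7-day cycle, amplitude 10
noise = np.random.normal(0, 4, size=140)  # random day-to-day variation (std 4)
orders = 80 + trend + seasonal + noise  # base level of 80 orders plus all components
plt.figure(figsize=(9,4))
plt.plot(orders)
plt.title("Daily Orders")
plt.xlabel("Day")
plt.ylabel("Orders")
plt.show()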
From this plot, we can already observe:
- A slow upward trend
- Clear weekly repetition
- Some randomness
This visual understanding is critical before modeling.
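The visual impressions above can be cross-checked with a couple of numbers. A minimal sketch, assuming the same simulated series as Step 1: a least-squares straight line (np.polyfit with deg=1) estimates the trend slope, and averaging the detrended values at each position in the 7-day cycle exposes the weekly pattern. Note that `days % 7` is only a position in the cycle; the simulation does not assign real weekdays.

```python
import numpy as np

# Recreate the simulated series from Step 1 (same seed, same draws)
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)

# Trend: slope of a least-squares straight line, in orders per day
slope, intercept = np.polyfit(days, orders, deg=1)

# Seasonality: remove the fitted line, then average what is left at each cycle position
detrended = orders - (slope * days + intercept)
weekly_profile = np.array([detrended[days % 7 == k].mean() for k in range(7)])

print(f"Estimated trend: {slope:.3f} orders/day")
print("Average deviation by position in the 7-day cycle:", np.round(weekly_profile, 1))
```

The estimated slope should land near the 0.2 used in the simulation, and the weekly profile swings roughly between -10 and +10, matching the amplitude of the seasonal term.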
Step 2: Train–Test Split (Time-Aware)
We must never train on data from the future.
So we split the series in time order, not randomly:
- Past → training
- Future → testing
train = orders[:120]  # days 0-119: the "past"
test = orders[120:]   # days 120-139: the 20-day "future"
This mirrors how forecasting works in the real world:
You only know the past when predicting the future.
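A single cut at day 120 gives only one test window. A common extension, sketched here under the assumption that you want several evaluation windows, is an expanding-window (rolling-origin) split: each fold trains on everything up to a cut-off and tests on the days right after it, so test data always lies in the future of its training data. The helper `expanding_window_splits` and its parameters are hypothetical names for this sketch, not a library API.

```python
import numpy as np

def expanding_window_splits(n_obs, initial_train, horizon, step):
    """Yield (train_idx, test_idx) pairs where every test index follows every train index.

    Hypothetical helper: the training window starts with `initial_train` observations
    and grows by `step` per fold; each test window covers the next `horizon` points.
    """
    end = initial_train
    while end + horizon <= n_obs:
        yield np.arange(0, end), np.arange(end, end + horizon)
        end += step

folds = list(expanding_window_splits(n_obs=140, initial_train=100, horizon=20, step=10))
for train_idx, test_idx in folds:
    print(f"train days 0-{train_idx[-1]}, test days {test_idx[0]}-{test_idx[-1]}")
```

With these settings there are three folds, and the last one (train on days 0-119, test on 120-139) is exactly the single split used in this lesson.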
Step 3: Choose a Baseline Model
Before using complex models, we start simple.
A common baseline is the naive forecast:
every future value equals the last observed value (tomorrow’s value ≈ today’s value).
forecast = np.repeat(train[-1], len(test))  # last training value, repeated for all 20 test days
This forecast:
- Ignores trend
- Ignores seasonality
- Sets a performance baseline
Any serious model must beat this.
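The naive forecast is not the only simple baseline. Two other standard ones, each addressing one of the gaps listed above, are the seasonal naive forecast (repeat the last observed week) and the drift forecast (extend the average change per day seen in the training data). A minimal NumPy sketch, assuming the Step 1 data and the Step 2 split; the function names are illustrative, not from a library.

```python
import numpy as np

# Recreate the Step 1 data and the Step 2 split
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]
h = len(test)  # forecast horizon: 20 days

def naive_forecast(y, h):
    # Every future value equals the last observation
    return np.repeat(y[-1], h)

def seasonal_naive_forecast(y, h, season=7):
    # Each future day copies the value from the same position in the last observed week
    last_season = y[-season:]
    return np.array([last_season[i % season] for i in range(h)])

def drift_forecast(y, h):
    # Continue the average change per step between the first and last training values
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)

forecasts = {
    "naive": naive_forecast(train, h),
    "seasonal naive": seasonal_naive_forecast(train, h),
    "drift": drift_forecast(train, h),
}
```

Seasonal naive handles the weekly cycle but not the trend; drift handles the trend but not the cycle. Scoring all three against the same test window shows which missing component matters more.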
Step 4: Evaluate the Forecast
We compare predictions against actual future values.
Visual comparison comes first.
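plt.figure(figsize=(9,4))
plt.plot(test, label="Actual")  # true orders for the 20 test days
plt.plot(forecast, label="Forecast")  # flat naive forecast
plt.title("Naive Forecast vs Actual (test period)")
plt.xlabel("Days into the test period")
plt.legend()
plt.show()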
What we see:
- Actual orders keep rising and swing up and down on a weekly cycle
- Forecast stays flat
This tells us the model is too simple.
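A visual check is best backed by an error metric. A sketch computing two common ones, mean absolute error (MAE) and root mean squared error (RMSE), for the naive forecast; the data is recreated so the block runs on its own, and `mae`/`rmse` are small helpers written for this example.

```python
import numpy as np

# Recreate the data, the split, and the naive forecast from Steps 1-3
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]
forecast = np.repeat(train[-1], len(test))

def mae(actual, predicted):
    # Average size of the error, in orders per day
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    # Squaring before averaging penalizes large misses more heavily than MAE does
    return np.sqrt(np.mean((actual - predicted) ** 2))

print(f"Naive MAE:  {mae(test, forecast):.2f} orders/day")
print(f"Naive RMSE: {rmse(test, forecast):.2f} orders/day")
```

RMSE is always at least as large as MAE; a wide gap between them signals a few large misses. These numbers are the bar that any improved model has to beat on the same test window.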
Step 5: Residual Analysis
Residuals show what the model failed to capture.
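residuals = test - forecast  # actual minus predicted; positive means we under-forecast
plt.figure(figsize=(9,4))
plt.plot(residuals)
plt.axhline(0, color="gray", linestyle="--")  # a good model's residuals scatter around this line
plt.title("Residuals")
plt.show()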
Residuals are not random.
They still contain:
- Trend
- Seasonality
That means the model is underfitting: it leaves predictable structure in its errors.
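"Not random" can also be checked numerically. A minimal sketch, assuming NumPy only: the mean residual reveals systematic bias (for example, a trend the forecast misses), and the autocorrelation at lag 7 measures how much each residual resembles the one a week earlier (leftover weekly seasonality). Values near zero for both would suggest the structure was captured. `autocorr` is a hypothetical helper for this sketch; with only 20 test residuals, the lag-7 estimate is rough.

```python
import numpy as np

def autocorr(x, lag):
    # Sample autocorrelation at one lag: covariance of x[t] and x[t+lag], divided by the variance of x
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[:-lag] * x[lag:]) / np.sum(x * x)

# Recreate the naive-forecast residuals from Step 5
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]
residuals = test - np.repeat(train[-1], len(test))

print(f"Mean residual (bias): {residuals.mean():.2f}")
print(f"Lag-7 autocorrelation: {autocorr(residuals, 7):.2f}")
```

For longer residual series, plotting the autocorrelation across many lags (a correlogram) gives a fuller picture than a single lag.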
Step 6: Improve the Pipeline
Now we know what to fix.
Next steps usually include:
- Removing trend (differencing)
- Modeling seasonality (SARIMA, features)
- Using ML or deep learning models
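Differencing is the simplest of these fixes to try. A sketch, assuming the Step 1 data: a first difference (y[t] - y[t-1]) turns the linear trend into a constant, and a seasonal difference at lag 7 (y[t] - y[t-7]) cancels the weekly cycle as well, leaving noise around a constant. Anything forecast on a differenced series has to be added back (integrated) to return to order counts, which the loop below demonstrates.

```python
import numpy as np

# Recreate the Step 1 data
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)

first_diff = np.diff(orders)               # y[t] - y[t-1]: linear trend becomes a constant
seasonal_diff = orders[7:] - orders[:-7]   # y[t] - y[t-7]: weekly cycle cancels, trend becomes a constant

# Undo the seasonal difference: rebuild each value from the value one week earlier
rebuilt = np.empty_like(orders)
rebuilt[:7] = orders[:7]
for t in range(7, len(orders)):
    rebuilt[t] = rebuilt[t - 7] + seasonal_diff[t - 7]

print(f"Std of raw orders:        {orders.std():.1f}")
print(f"Std after seasonal diff:  {seasonal_diff.std():.1f}")
print(f"Mean after seasonal diff: {seasonal_diff.mean():.2f}  (trend contributes 7 x 0.2 = 1.4)")
```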
The pipeline repeats:
Build → Visualize → Evaluate → Improve
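One pass through this loop can be sketched as: build several candidates, evaluate each on the same held-out window, and keep the best. The candidates here are illustrative assumptions: the naive and seasonal naive baselines, plus an ordinary least-squares regression on a linear time index and 7-day-cycle indicator features (a simple stand-in for the "features" approach above, not SARIMA).

```python
import numpy as np

# Recreate the data and the time-aware split
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]
train_days, test_days = days[:120], days[120:]

def design_matrix(t):
    # Columns: intercept, linear time index, indicators for cycle positions 1-6 (position 0 is the reference)
    dummies = np.stack([(t % 7 == k).astype(float) for k in range(1, 7)], axis=1)
    return np.column_stack([np.ones(len(t)), t, dummies])

def regression_forecast(train_t, train_y, future_t):
    # Ordinary least squares, fitted on the training window only
    coef, *_ = np.linalg.lstsq(design_matrix(train_t), train_y, rcond=None)
    return design_matrix(future_t) @ coef

# Build: several candidate forecasts for the same 20 test days
candidates = {
    "naive": np.repeat(train[-1], len(test)),
    "seasonal naive": np.array([train[-7:][i % 7] for i in range(len(test))]),
    "trend + weekly features": regression_forecast(train_days, train, test_days),
}

# Evaluate: mean absolute error of each candidate
scores = {name: np.mean(np.abs(test - pred)) for name, pred in candidates.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:>24}: MAE {score:.2f}")

# Improve: the winner becomes the new baseline for the next round
best = min(scores, key=scores.get)
print("Keep:", best)
```

In a real project, the next round would plot the winner's residuals again and repeat the same checks.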
Why Pipelines Matter
Without a pipeline:
- Models are built ad hoc, by trial and error
- Evaluation is unreliable
- Production failures are common
With a pipeline:
- Decisions are systematic
- Errors are visible early
- Models improve logically
Practice Questions
Q1. Why do we always start with a simple baseline?
Q2. What do structured residuals indicate?
Key Takeaways
- Forecasting is a process, not a single model
- Visualization guides every decision
- Evaluation tells you what to improve next
- Strong pipelines lead to reliable forecasts
From the next lesson onward, we move into stronger classical forecasting models.