Time Series Lesson 24 – Pipeline | Dataplexa

Building a Complete Forecasting Pipeline

Until now, we have studied individual concepts: trend, seasonality, models, and evaluation.

In real projects, these steps are rarely used in isolation. They are connected into a forecasting pipeline.

This lesson shows how everything fits together, both logically and visually.

A Real-World Scenario

Imagine you work for a food delivery company.

You are asked to forecast daily order volume so the company can:

Schedule delivery staff
Prepare inventory
Avoid delays during peak days

Let’s build a simple but realistic forecasting pipeline for this situation.

Step 1: Understanding the Raw Data

Every pipeline starts with raw data.

Below is a simulated series of daily order counts, built from three components:

A slow upward trend as the business grows
Weekly seasonality (a 7-day cycle, standing in for busier weekends)
Some random noise

Python: Raw Time Series

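import numpy as np
import matplotlib.pyplot as plt

np.random.seed(4)  # fixed seed so the simulation is reproducible
days = np.arange(140)  # 140 days = 20 full weeks
trend = 0.2 * days  # growth of 0.2 orders per day
seasonal = 10 * np.sin(2 * np.pi * days / 7)  # weekly cycle, amplitude 10
noise = np.random.normal(0, 4, size=140)  # Gaussian noise, standard deviation 4

orders = 80 + trend + seasonal + noise  # base level of 80 orders per day

plt.figure(figsize=(9,4))
plt.plot(orders)
plt.title("Daily Orders")
plt.xlabel("Day")
plt.ylabel("Orders")
plt.show()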

From this plot, we can already observe:

A slow upward trend
Clear weekly repetition
Some randomness

This visual understanding is critical before modeling.

Step 2: Train–Test Split (Time-Aware)

We never train on future data.

So we split the series:

Past → training
Future → testing

Python: Time-Aware Split

train = orders[:120]  # first 120 days: the known past
test = orders[120:]   # last 20 days: held out as the "future"

This mirrors how forecasting works in the real world:

You only know the past when predicting the future.

Step 3: Choose a Baseline Model

Before using complex models, we start simple.

A common baseline is the naive forecast:

Tomorrow’s value ≈ today’s value. Over a multi-day horizon, every future day is forecast as the last observed training value.

Python: Naive Forecast

forecast = np.repeat(train[-1], len(test))

This forecast:

Ignores trend
Ignores seasonality
Sets a performance baseline

Any serious model should beat this on the same test period.

Step 4: Evaluate the Forecast

We compare predictions against actual future values.

Visual comparison comes first.

Python: Forecast vs Actual

plt.figure(figsize=(9,4))
plt.plot(test, label="Actual")
plt.plot(forecast, label="Forecast")
plt.legend()
plt.show()

What we see:

Actual orders keep changing
Forecast stays flat

This tells us the model is too simple.
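
The visual comparison can be backed by numbers. The sketch below is not part of the original lesson code: it rebuilds the same simulated series and naive forecast, then computes two common error metrics, mean absolute error (MAE) and root mean squared error (RMSE). The names errors, mae, and rmse are illustrative choices.

```python
import numpy as np

# Rebuild the simulated series from Step 1 (same seed, same components).
np.random.seed(4)
days = np.arange(140)
noise = np.random.normal(0, 4, size=140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + noise

# Time-aware split and naive forecast, as in Steps 2 and 3.
train, test = orders[:120], orders[120:]
forecast = np.repeat(train[-1], len(test))

# Forecast errors on the held-out period.
errors = test - forecast
mae = np.mean(np.abs(errors))         # average size of a miss, in orders
rmse = np.sqrt(np.mean(errors ** 2))  # penalizes large misses more heavily

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Both metrics are in units of orders per day, which gives later models a concrete number to beat. RMSE is never smaller than MAE, and a large gap between them suggests a few days with large misses.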

Step 5: Residual Analysis

Residuals show what the model failed to capture.

Python: Residuals

residuals = test - forecast

plt.figure(figsize=(9,4))
plt.plot(residuals)
plt.title("Residuals")
plt.show()

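The residuals are not random. A model that captured all the structure would leave residuals that scatter around zero with no visible pattern.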

They still contain:

Trend
Seasonality

That means the model is underfitting.
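
One way to check for leftover weekly structure is the autocorrelation of the residuals at lag 7. The sketch below is an illustrative addition with a hypothetical helper named autocorr, and it assumes the same simulated data as the earlier steps. With only 20 test points the estimate is rough, so treat it as a hint and not as a formal test.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # center the series before correlating
    return np.sum(x[lag:] * x[:-lag]) / np.sum(x * x)

# Rebuild the simulated series, split, and naive-forecast residuals.
np.random.seed(4)
days = np.arange(140)
noise = np.random.normal(0, 4, size=140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + noise
train, test = orders[:120], orders[120:]
residuals = test - np.repeat(train[-1], len(test))

r1 = autocorr(residuals, 1)  # day-to-day dependence
r7 = autocorr(residuals, 7)  # week-to-week dependence

print(f"lag-1 autocorrelation: {r1:.2f}")
print(f"lag-7 autocorrelation: {r7:.2f}")
```

A lag-7 value near zero would suggest that no weekly pattern remains. A clearly positive value points to seasonality the model has not captured.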

Step 6: Improve the Pipeline

Now we know what to fix.

Next steps usually include:

Removing trend (differencing)
Modeling seasonality (SARIMA, features)
Using ML or deep learning models
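
As a minimal sketch of the first two ideas, the example below applies differencing to the simulated series. It is an illustration under the same data assumptions as Step 1, not a full model. The names signal, diff1, diff7, and signal_diff7 are introduced here for the example.

```python
import numpy as np

# Rebuild the simulated series, keeping the noise-free part separately.
np.random.seed(4)
days = np.arange(140)
signal = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7)  # trend + weekly cycle
orders = signal + np.random.normal(0, 4, size=140)

# First difference: y[t] - y[t-1].
# The linear trend becomes a constant offset; the weekly cycle is still present.
diff1 = np.diff(orders)

# Seasonal difference at lag 7: y[t] - y[t-7].
# The weekly cycle cancels because it repeats every 7 days.
diff7 = orders[7:] - orders[:-7]

# On the noise-free signal, only the 7-day trend step (7 * 0.2 = 1.4) is left.
signal_diff7 = signal[7:] - signal[:-7]

print("mean of first difference:", round(diff1.mean(), 3))
print("noise-free lag-7 difference is constant 1.4:", np.allclose(signal_diff7, 1.4))
```

Differencing shortens the series (by 1 and by 7 points here), and forecasts made on a differenced series have to be converted back to the original scale before they are evaluated.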

The pipeline repeats:

Build → Visualize → Evaluate → Improve
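
One turn of this loop can be sketched as follows. The example compares the naive baseline with a seasonal naive forecast, which repeats the last observed week. The seasonal naive method is not part of the original lesson; it is swapped in here as the simplest model that accounts for weekly seasonality. The helper names mae, naive, seasonal_naive, candidates, and scores are hypothetical.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error between two equal-length arrays."""
    return np.mean(np.abs(actual - predicted))

def naive(train, horizon):
    """Repeat the last observed value over the whole horizon."""
    return np.repeat(train[-1], horizon)

def seasonal_naive(train, horizon, season=7):
    """Repeat the last full season (here, the last 7 days) over the horizon."""
    reps = int(np.ceil(horizon / season))
    return np.tile(train[-season:], reps)[:horizon]

# Build: rebuild the simulated series and the time-aware split.
np.random.seed(4)
days = np.arange(140)
noise = np.random.normal(0, 4, size=140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + noise
train, test = orders[:120], orders[120:]

# Evaluate: score every candidate on the same held-out period.
candidates = {"naive": naive, "seasonal naive": seasonal_naive}
scores = {name: mae(test, model(train, len(test)))
          for name, model in candidates.items()}

# Improve: the ranking shows which idea to keep for the next iteration.
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {score:.2f}")
```

Keeping every candidate behind the same function signature means a new model can be added to the dictionary and scored on exactly the same test period as the baselines.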

Why Pipelines Matter

Without a pipeline:

Models are built ad hoc
Evaluation is unreliable
Production failures are common

With a pipeline:

Decisions are systematic
Errors are visible early
Models improve logically

Practice Questions

Q1. Why do we always start with a simple baseline?

To understand whether more complex models truly add value.

Q2. What do structured residuals indicate?

The model failed to capture important patterns like trend or seasonality.

Key Takeaways

Forecasting is a process, not a single model
Visualization guides every decision
Evaluation tells what to improve next
Strong pipelines lead to reliable forecasts

From the next lesson onward, we move into stronger classical forecasting models.
