Time Series Lesson 24 – Pipeline | Dataplexa

Building a Complete Forecasting Pipeline

So far, we have studied individual concepts: trend, seasonality, models, and evaluation.

In real projects, these steps are never used in isolation. They are connected into a forecasting pipeline.

This lesson shows how everything fits together, both logically and visually.


A Real-World Scenario

Imagine you work for a food delivery company.

You are asked to forecast daily order volume so the company can:

  • Schedule delivery staff
  • Prepare inventory
  • Avoid delays during peak days

Let’s build a simple but realistic forecasting pipeline for this situation.


Step 1: Understanding the Raw Data

Every pipeline starts with raw data.

Below is a simulated series of daily order counts, built from three components:

  • A slow upward trend as the business grows
  • Weekly seasonality (a 7-day cycle, since weekends are busier)
  • Some random noise

Python: Raw Time Series
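import numpy as np
import matplotlib.pyplot as plt

np.random.seed(4)                              # fixed seed so the series is reproducible
days = np.arange(140)                          # 140 days = 20 weeks
trend = 0.2 * days                             # slow linear growth
seasonal = 10 * np.sin(2 * np.pi * days / 7)   # weekly cycle (period of 7 days)
noise = np.random.normal(0, 4, size=140)       # random day-to-day variation

orders = 80 + trend + seasonal + noise         # base level of 80 orders per day

plt.figure(figsize=(9,4))
plt.plot(orders)
plt.title("Daily Orders")
plt.xlabel("Day")
plt.ylabel("Orders")
plt.show()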

From this plot, we can already observe:

  • A slow upward trend
  • Clear weekly repetition
  • Some randomness

This visual understanding is critical before modeling.
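
The same observations can be checked with a quick calculation. The sketch below is an illustrative addition, not a required pipeline step: it rebuilds the simulated series and smooths it with a 7-day moving average. The window length and the names `window` and `smoothed` are assumptions chosen to match the weekly cycle. Averaging over a full week cancels the weekly pattern, so what remains is mostly the trend.

```python
import numpy as np

# Rebuild the simulated series from Step 1
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)

# 7-day moving average: each output value is the mean of one full week
window = 7
smoothed = np.convolve(orders, np.ones(window) / window, mode="valid")

# If there is an upward trend, the smoothed series should end higher than it starts
print("First smoothed value:", round(smoothed[0], 1))
print("Last smoothed value: ", round(smoothed[-1], 1))
```

Plotting `smoothed` next to `orders` makes the trend easier to see, because the weekly swings are averaged out.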


Step 2: Train–Test Split (Time-Aware)

We never train on future data.

So we split the series:

  • Past → training
  • Future → testing

Python: Time-Aware Split
train = orders[:120]   # first 120 days: the known past
test = orders[120:]    # last 20 days: held out as the "future"

This mirrors how forecasting works in the real world:

You only know the past when predicting the future.
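
The split can also be wrapped in a small helper so the test size is explicit. This is a minimal sketch under stated assumptions: `time_split` and its `test_size` parameter are hypothetical names, and a plain range of day numbers stands in for the order counts so the ordering is easy to verify.

```python
import numpy as np

def time_split(series, test_size):
    """Split a series so that the last `test_size` points form the test set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]

series = np.arange(140)          # stand-in for 140 days of orders
train, test = time_split(series, test_size=20)

print(len(train), len(test))     # 120 20
print(train[-1] < test[0])       # True: every training day comes before every test day
```

A random split would mix future days into the training set, which is exactly the leak a time-aware split avoids.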


Step 3: Choose a Baseline Model

Before using complex models, we start simple.

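A common baseline is the naive forecast:

Tomorrow’s value ≈ today’s value.

When forecasting several days ahead, this means repeating the last observed value for every day in the test period.

Python: Naive Forecast
# Repeat the last training value once for each test day
forecast = np.repeat(train[-1], len(test))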

This forecast:

  • Ignores trend
  • Ignores seasonality
  • Sets a performance baseline

Any serious model must beat this.
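
The naive rule can also be read as a one-step-ahead forecast, where each day is predicted from the previous day's actual value. The sketch below shows both readings side by side for comparison. The one-step version is an assumption added for illustration, not the baseline used in this lesson, and the names `flat_forecast` and `one_step_forecast` are hypothetical.

```python
import numpy as np

# Rebuild the simulated series and the split from Steps 1 and 2
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]

# Multi-step naive forecast (used in this lesson): one flat value for all test days
flat_forecast = np.repeat(train[-1], len(test))

# One-step-ahead naive forecast: each day is predicted by the previous day's actual value
one_step_forecast = np.concatenate(([train[-1]], test[:-1]))

print(flat_forecast[:3])
print(one_step_forecast[:3])
```

The one-step version needs yesterday's actual value, so it only applies when forecasting one day at a time. For planning staff 20 days ahead, the flat forecast is the relevant baseline.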


Step 4: Evaluate the Forecast

We compare predictions against actual future values.

Visual comparison comes first.

Python: Forecast vs Actual
plt.figure(figsize=(9,4))
plt.plot(test, label="Actual")
plt.plot(forecast, label="Forecast")
plt.legend()
plt.show()

What we see:

  • Actual orders keep changing
  • Forecast stays flat

This tells us the model is too simple.
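
A plot shows the shape of the error, and a number makes it comparable across models. The sketch below computes two common error metrics for the naive forecast. Choosing MAE and RMSE here is an assumption; any metric covered in the evaluation lessons could be substituted.

```python
import numpy as np

# Rebuild the simulated series, the split, and the naive forecast
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]
forecast = np.repeat(train[-1], len(test))

errors = test - forecast
mae = np.mean(np.abs(errors))          # mean absolute error, in orders per day
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error, penalizes large misses more

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

These two numbers are the baseline that any later model should improve on.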


Step 5: Residual Analysis

Residuals show what the model failed to capture.

Python: Residuals
residuals = test - forecast

plt.figure(figsize=(9,4))
plt.plot(residuals)
plt.title("Residuals")
plt.show()

The residuals are not random.

They still contain the patterns the flat forecast ignored:

  • Trend
  • Seasonality

Structure left in the residuals means the model is underfitting.
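
One way to check for leftover weekly structure without relying on the eye is the autocorrelation of the residuals at lag 7. The sketch below is an illustrative addition: the helper `autocorr` is a hypothetical name, and it uses the simple biased sample estimator.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1, biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.sum(x[:-lag] * x[lag:]) / np.sum(x ** 2)

# Rebuild the simulated series, the split, and the naive forecast
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]
forecast = np.repeat(train[-1], len(test))
residuals = test - forecast

for lag in (1, 7):
    print(f"lag {lag}: {autocorr(residuals, lag):.2f}")
```

A value close to zero at lag 7 would be consistent with no weekly pattern, while a clearly positive value suggests the weekly cycle is still in the residuals. With only 20 test points the estimate is noisy, so treat it as a hint rather than a formal test.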


Step 6: Improve the Pipeline

Now we know what to fix.

Next steps usually include:

  • Removing trend (differencing)
  • Modeling seasonality (SARIMA, or seasonal features such as day of week)
  • Using ML or deep learning models
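
As a small illustration of the first item, the sketch below applies differencing to the simulated series. It is a minimal example under stated assumptions: the names `first_diff` and `seasonal_diff` are hypothetical, and the lag of 7 is chosen because the simulated cycle is weekly.

```python
import numpy as np

# Rebuild the simulated series from Step 1
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)

# First difference: orders[t] - orders[t-1], which turns a linear trend into a constant
first_diff = np.diff(orders)

# Seasonal difference: orders[t] - orders[t-7], which cancels a repeating weekly pattern
seasonal_diff = orders[7:] - orders[:-7]

print(len(first_diff), len(seasonal_diff))   # 139 133
print("Mean of seasonal difference:", round(seasonal_diff.mean(), 2))
```

The first difference still contains the weekly cycle, and the seasonal difference still contains a constant offset from the trend (about one week of growth), which is why the two are often combined.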

The pipeline repeats:

Build → Visualize → Evaluate → Improve
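
One turn of this loop can be sketched with a slightly stronger candidate. The example below compares the naive baseline with a seasonal naive forecast, where each test day copies the value from the same position in the last training week. The seasonal naive model is an assumption chosen for illustration, and `mae`, `naive`, and `seasonal_naive` are hypothetical names.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error between two equal-length arrays."""
    return np.mean(np.abs(actual - predicted))

# Rebuild the simulated series and the split
np.random.seed(4)
days = np.arange(140)
orders = 80 + 0.2 * days + 10 * np.sin(2 * np.pi * days / 7) + np.random.normal(0, 4, size=140)
train, test = orders[:120], orders[120:]

# Baseline: repeat the last training value
naive = np.repeat(train[-1], len(test))

# Candidate: repeat the last full training week until the test period is covered
last_week = train[-7:]
seasonal_naive = np.tile(last_week, len(test) // 7 + 1)[:len(test)]

print(f"Naive MAE:          {mae(test, naive):.2f}")
print(f"Seasonal naive MAE: {mae(test, seasonal_naive):.2f}")
```

If the candidate's error is lower, it becomes the new baseline and the loop starts again. If not, the residuals point to what to try next.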


Why Pipelines Matter

Without a pipeline:

  • Models are built ad hoc, without a baseline to compare against
  • Evaluation is unreliable
  • Production failures are common

With a pipeline:

  • Decisions are systematic
  • Errors are visible early
  • Models improve logically

Practice Questions

Q1. Why do we always start with a simple baseline?

To understand whether more complex models truly add value.

Q2. What do structured residuals indicate?

The model failed to capture important patterns like trend or seasonality.

Key Takeaways

  • Forecasting is a process, not a single model
  • Visualization guides every decision
  • Evaluation tells us what to improve next
  • Strong pipelines lead to reliable forecasts

From the next lesson onward, we move into stronger classical forecasting models.