Time Series Lesson 30 – Random Forest | Dataplexa

Random Forest for Time Series Forecasting

Linear regression gave us a clean baseline.

But real-world time series rarely move in straight lines.

Electricity demand reacts differently on weekends, holidays, and extreme weather. These relationships are not linear.

This is where Random Forest becomes useful.


The Core Idea Behind Random Forest

Instead of learning one global equation, Random Forest learns many small decision rules.

Each tree focuses on a different view of the data. The final prediction is an average of all trees.

This allows the model to:

  • Capture non-linear behavior
  • Handle sudden changes better
  • Adapt to complex patterns
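The averaging can be verified directly in scikit-learn: a fitted forest's prediction is the mean of its individual trees' predictions. A minimal sketch on toy non-linear data (the data and hyperparameters here are illustrative, not from this lesson):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=100)  # non-linear target

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# Predict with each tree separately, then average across trees.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
assert np.allclose(per_tree.mean(axis=0), forest.predict(X))
print("ensemble = mean of", len(forest.estimators_), "trees")
```

Each tree sees a different bootstrap sample of the data, which is why their individual predictions differ while the average stays stable.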

Our Real-World Example

We continue forecasting daily electricity usage.

This time, instead of using only yesterday’s value, we use multiple past days.

This helps the model understand short-term memory.


Preparing Lag Features

We create three lag features:

  • Usage at t-1
  • Usage at t-2
  • Usage at t-3
Python: Lag Feature Creation
import numpy as np

np.random.seed(5)
days = np.arange(200)

# Trend + weekly seasonality + noise
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)

# Columns are ordered oldest to newest: [t-3, t-2, t-1]
X = np.column_stack([
    usage[:-3],   # t-3
    usage[1:-2],  # t-2
    usage[2:-1]   # t-1
])

y = usage[3:]

Each row now represents:

“Given the last 3 days, predict the next day.”
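A quick sanity check (a sketch that repeats the setup above) confirms the alignment — row 0 should pair the first three days with day 3 as the target:

```python
import numpy as np

# Rebuild the series and lag matrix from the code above.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]

# Row 0 holds usage for days 0, 1, 2; its target is day 3.
assert np.allclose(X[0], usage[:3])
assert y[0] == usage[3]
print(X.shape, y.shape)  # (197, 3) (197,)
```

Three rows are lost at the start of the series, since the first three days have no complete lag window.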


Time-Aware Train Test Split

Python: Split
split = int(len(X) * 0.8)

X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
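Unlike a shuffled split, every training sample here precedes every test sample in time, so no future information leaks into training. A sketch using the 197 rows produced by the lag construction above:

```python
import numpy as np

# 197 time-ordered rows, as produced by the lag construction.
idx = np.arange(197)
split = int(len(idx) * 0.8)
train_idx, test_idx = idx[:split], idx[split:]

# Every training index comes strictly before every test index.
assert train_idx.max() < test_idx.min()
print(len(train_idx), len(test_idx))  # 157 40
```

This is why the lesson slices the arrays directly instead of using a shuffling helper such as `train_test_split` with its default settings.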

Training the Random Forest Model

Each tree learns different decision boundaries.

Together, they form a strong predictor.

Python: Random Forest Model
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=200,
    max_depth=6,
    random_state=42
)

model.fit(X_train, y_train)
predictions = model.predict(X_test)
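Before looking at the plot, the fit can be scored numerically. A sketch rebuilding the full pipeline and reporting MAE and RMSE (the metric choice here is ours, not prescribed by the lesson):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Rebuild the data, split, and model from the sections above.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

RMSE is always at least as large as MAE; a big gap between the two suggests a few large misses rather than uniformly spread error.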

Actual vs Forecasted Values

This plot compares true electricity demand with Random Forest predictions.

What stands out:

  • Predictions adapt faster to changes
  • Peaks and drops are handled better
  • Forecasts are less smooth than linear regression's, which is more realistic

Why Random Forest Improves Forecasts

Random Forest does not assume linearity.

It learns rules like:

  • If last 3 days were high → expect high tomorrow
  • If sudden drop happened → reduce forecast
  • If pattern repeats → follow it

This makes it powerful for short-term forecasting.
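One way to see which rules dominate is the forest's feature importances. A sketch rebuilding the model from the sections above (the lag names in the printout are ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Rebuild the series, lag matrix, and model from earlier sections.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X[:split], y[:split])

# Importances sum to 1; larger values mean the lag drove more splits.
importances = model.feature_importances_
for name, imp in zip(["t-3", "t-2", "t-1"], importances):
    print(f"{name}: {imp:.3f}")
```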


Understanding Prediction Errors Visually

The forecast errors (actual minus predicted) show how stable and reliable the model is.

Observations:

  • Most errors are small
  • Large errors appear near sudden changes
  • No systematic bias
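These observations can be checked numerically: the sign and size of the mean residual, relative to its spread, reveal any systematic bias. A sketch continuing the same pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Rebuild the forecast pipeline from earlier sections.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X_train, y_train)
errors = y_test - model.predict(X_test)

# A mean error far from zero (relative to its spread) signals bias.
print(f"mean error:  {errors.mean():+.2f}")
print(f"error std:   {errors.std():.2f}")
print(f"max |error|: {np.abs(errors).max():.2f}")
```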

When to Use Random Forest

Random Forest works best when:

  • Patterns are non-linear
  • Short-term memory matters
  • Some interpretability (e.g. feature importances) is still needed

It struggles with:

  • Very long forecasting horizons
  • Strict seasonal extrapolation
  • Trends that push values outside the training range
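The extrapolation limit is easy to demonstrate: tree leaves can only return averages of training targets, so a forest trained on a rising trend flattens out beyond the data it has seen. A minimal sketch on toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# A pure upward trend: y is exactly double x.
x = np.arange(100).reshape(-1, 1).astype(float)
y = 2.0 * x.ravel()

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(x, y)

# Ask for predictions far beyond the training range.
future = np.array([[150.0], [200.0]])
preds = forest.predict(future)

# Leaves can only average training targets, so predictions are
# capped near the training maximum instead of following the trend.
print(preds)
assert preds.max() <= y.max()
```

A linear model would happily follow the trend here; the forest cannot, which is why trend removal or differencing is often applied before tree-based forecasting.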

Practice Questions

Q1. Why do multiple lag features help Random Forest?

They allow trees to learn short-term dependencies and interactions.

Q2. Why is Random Forest less smooth than linear regression?

Because it makes rule-based decisions instead of fitting a global line.

Key Takeaways

  • Random Forest captures non-linear behavior
  • Lag features give the model memory
  • Forecasts are more responsive to changes

Next lesson: we’ll push performance further using Gradient Boosting.
