Time Series Lesson 30 – Random Forest | Dataplexa

Random Forest for Time Series Forecasting

Linear regression gave us a clean baseline.

But real-world time series rarely move in straight lines.

Electricity demand reacts differently on weekends, holidays, and extreme weather. These relationships are not linear.

This is where Random Forest becomes useful.


The Core Idea Behind Random Forest

Instead of learning one global equation, Random Forest learns many small decision rules.

Each tree focuses on a different view of the data. The final prediction is an average of all trees.

This allows the model to:

  • Capture non-linear behavior
  • Handle sudden changes better
  • Adapt to complex patterns
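The averaging can be verified directly in scikit-learn: a fitted forest's prediction is the mean of its individual trees' predictions. A minimal sketch on toy non-linear data (the data and hyperparameters here are illustrative, not from this lesson):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=100)  # non-linear target

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# Predict with each tree separately, then average across trees.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
assert np.allclose(per_tree.mean(axis=0), forest.predict(X))
print("ensemble = mean of", len(forest.estimators_), "trees")
```

Each tree sees a different bootstrap sample of the data, which is why their individual predictions differ while the average stays stable.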

Our Real-World Example

We continue forecasting daily electricity usage.

This time, instead of using only yesterday’s value, we use multiple past days.

This helps the model understand short-term memory.


Preparing Lag Features

We create three lag features:

  • Usage at t-1
  • Usage at t-2
  • Usage at t-3
Python: Lag Feature Creation
import numpy as np

np.random.seed(5)
days = np.arange(200)

# Trend + weekly seasonality + noise
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)

# Columns are ordered oldest to newest: [t-3, t-2, t-1]
X = np.column_stack([
    usage[:-3],   # t-3
    usage[1:-2],  # t-2
    usage[2:-1]   # t-1
])

y = usage[3:]

Each row now represents:

“Given the last 3 days, predict the next day.”
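A quick sanity check (a sketch that repeats the setup above) confirms the alignment — row 0 should pair the first three days with day 3 as the target:

```python
import numpy as np

# Rebuild the series and lag matrix from the code above.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]

# Row 0 holds usage for days 0, 1, 2; its target is day 3.
assert np.allclose(X[0], usage[:3])
assert y[0] == usage[3]
print(X.shape, y.shape)  # (197, 3) (197,)
```

Three rows are lost at the start of the series, since the first three days have no complete lag window.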


Time-Aware Train Test Split

Python: Split
split = int(len(X) * 0.8)

X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
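Unlike a shuffled split, every training sample here precedes every test sample in time, so no future information leaks into training. A sketch using the 197 rows produced by the lag construction above:

```python
import numpy as np

# 197 time-ordered rows, as produced by the lag construction.
idx = np.arange(197)
split = int(len(idx) * 0.8)
train_idx, test_idx = idx[:split], idx[split:]

# Every training index comes strictly before every test index.
assert train_idx.max() < test_idx.min()
print(len(train_idx), len(test_idx))  # 157 40
```

This is why the lesson slices the arrays directly instead of using a shuffling helper such as `train_test_split` with its default settings.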

Training the Random Forest Model

Each tree learns different decision boundaries.

Together, they form a strong predictor.

Python: Random Forest Model
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=200,
    max_depth=6,
    random_state=42
)

model.fit(X_train, y_train)
predictions = model.predict(X_test)
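Before looking at the plot, the fit can be scored numerically. A sketch rebuilding the full pipeline and reporting MAE and RMSE (the metric choice here is ours, not prescribed by the lesson):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Rebuild the data, split, and model from the sections above.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

RMSE is always at least as large as MAE; a big gap between the two suggests a few large misses rather than uniformly spread error.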

Actual vs Forecasted Values

This plot compares true electricity demand with Random Forest predictions.

What stands out:

  • Predictions adapt faster to changes
  • Peaks and drops are handled better
  • Forecasts are less smooth than linear regression's, which is more realistic

Why Random Forest Improves Forecasts

Random Forest does not assume linearity.

It learns rules like:

  • If last 3 days were high → expect high tomorrow
  • If sudden drop happened → reduce forecast
  • If pattern repeats → follow it

This makes it powerful for short-term forecasting.
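One way to see which rules dominate is the forest's feature importances. A sketch rebuilding the model from the sections above (the lag names in the printout are ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Rebuild the series, lag matrix, and model from earlier sections.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X[:split], y[:split])

# Importances sum to 1; larger values mean the lag drove more splits.
importances = model.feature_importances_
for name, imp in zip(["t-3", "t-2", "t-1"], importances):
    print(f"{name}: {imp:.3f}")
```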


Understanding Prediction Errors Visually

The forecast errors (actual minus predicted) show how stable and reliable the model is.

Observations:

  • Most errors are small
  • Large errors appear near sudden changes
  • No systematic bias
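These observations can be checked numerically: the sign and size of the mean residual, relative to its spread, reveal any systematic bias. A sketch continuing the same pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Rebuild the forecast pipeline from earlier sections.
np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)
X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X_train, y_train)
errors = y_test - model.predict(X_test)

# A mean error far from zero (relative to its spread) signals bias.
print(f"mean error:  {errors.mean():+.2f}")
print(f"error std:   {errors.std():.2f}")
print(f"max |error|: {np.abs(errors).max():.2f}")
```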

When to Use Random Forest

Random Forest works best when:

  • Patterns are non-linear
  • Short-term memory matters
  • Some interpretability (e.g. feature importances) is still needed

It struggles with:

  • Very long forecasting horizons
  • Strict seasonal extrapolation
  • Trends that push values outside the training range
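The extrapolation limit is easy to demonstrate: tree leaves can only return averages of training targets, so a forest trained on a rising trend flattens out beyond the data it has seen. A minimal sketch on toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# A pure upward trend: y is exactly double x.
x = np.arange(100).reshape(-1, 1).astype(float)
y = 2.0 * x.ravel()

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(x, y)

# Ask for predictions far beyond the training range.
future = np.array([[150.0], [200.0]])
preds = forest.predict(future)

# Leaves can only average training targets, so predictions are
# capped near the training maximum instead of following the trend.
print(preds)
assert preds.max() <= y.max()
```

A linear model would happily follow the trend here; the forest cannot, which is why trend removal or differencing is often applied before tree-based forecasting.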

Practice Questions

Q1. Why do multiple lag features help Random Forest?

They allow trees to learn short-term dependencies and interactions.

Q2. Why is Random Forest less smooth than linear regression?

Because it makes rule-based decisions instead of fitting a global line.

Key Takeaways

  • Random Forest captures non-linear behavior
  • Lag features give the model memory
  • Forecasts are more responsive to changes

Next lesson: we’ll push performance further using Gradient Boosting.
