Random Forest for Time Series Forecasting
Linear regression gave us a clean baseline.
But real-world time series rarely move in straight lines.
Electricity demand reacts differently on weekends, holidays, and extreme weather. These relationships are not linear.
This is where Random Forest becomes useful.
The Core Idea Behind Random Forest
Instead of learning one global equation, Random Forest learns many small decision rules.
Each tree focuses on a different view of the data. The final prediction is an average of all trees.
This allows the model to:
- Capture non-linear behavior
- Handle sudden changes better
- Adapt to complex patterns
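The averaging idea can be sketched in a few lines: train several shallow trees on bootstrap resamples and average their outputs. This is a minimal illustration on a synthetic sine curve (not the electricity data), using scikit-learn's DecisionTreeRegressor directly:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)

# Train several trees, each on a bootstrap resample of the data
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))  # sample rows with replacement
    trees.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

# The ensemble prediction is the average over all trees
X_new = np.array([[2.5]])
ensemble_pred = np.mean([t.predict(X_new)[0] for t in trees])
print(ensemble_pred)  # should land near sin(2.5)
```

No single shallow tree fits the curve well, but the average smooths out each tree's individual errors. This is exactly what RandomForestRegressor automates (adding random feature selection on top).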
Our Real-World Example
We continue forecasting daily electricity usage.
This time, instead of using only yesterday’s value, we use multiple past days.
This gives the model a short-term memory of recent demand.
Preparing Lag Features
We create three lag features:
- Usage at t-1
- Usage at t-2
- Usage at t-3
import numpy as np

np.random.seed(5)
days = np.arange(200)
# Upward trend + weekly seasonality + noise
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)

# Columns hold usage at t-3, t-2, t-1; the target is usage at t
X = np.column_stack([
    usage[:-3],   # t-3
    usage[1:-2],  # t-2
    usage[2:-1]   # t-1
])
y = usage[3:]
Each row now represents:
“Given the last 3 days, predict the next day.”
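A quick sanity check confirms the window construction: three rows are lost to build the lags, and row 0 pairs days 0-2 with day 3. The arrays are rebuilt here so the snippet runs on its own:

```python
import numpy as np

np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)

X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]

print(X.shape, y.shape)  # (197, 3) (197,)

# Row 0 holds days 0, 1, 2 and predicts day 3
assert np.allclose(X[0], usage[0:3])
assert y[0] == usage[3]
```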
Time-Aware Train/Test Split
# Keep chronological order: train on the earliest 80%, test on the rest
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
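A random shuffle would leak future values into training. If you later want cross-validation, scikit-learn's TimeSeriesSplit enforces the same ordering constraint; a small sketch on an index array:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

data = np.arange(20)
tscv = TimeSeriesSplit(n_splits=3)

for train_idx, test_idx in tscv.split(data):
    # In every fold, all training indices precede all test indices
    assert train_idx.max() < test_idx.min()
    print(len(train_idx), len(test_idx))
```

Each fold grows the training window forward in time while the test window always lies in the future, mirroring how the model would be used in production.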
Training the Random Forest Model
Each tree learns different decision boundaries.
Together, they form a strong predictor.
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=200,  # number of trees to average
    max_depth=6,       # limit tree depth to reduce overfitting
    random_state=42
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
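Before plotting, it helps to put a number on the fit. Mean absolute error is easy to read in the units of the data (kWh here). This rebuilds the full pipeline so the snippet is self-contained:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)

X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mae = mean_absolute_error(y_test, predictions)
print(f"MAE: {mae:.2f}")
```

With noise of standard deviation 4 in the simulated series, an MAE in the low single digits is about the best any model can do; anything far above that signals a modeling problem.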
Actual vs Forecasted Values
This plot compares true electricity demand with Random Forest predictions.
What stands out:
- Predictions adapt faster to changes
- Peaks and drops are handled better
- Less smooth than linear regression — more realistic
Why Random Forest Improves Forecasts
Random Forest does not assume linearity.
It learns rules like:
- If last 3 days were high → expect high tomorrow
- If sudden drop happened → reduce forecast
- If pattern repeats → follow it
This makes it powerful for short-term forecasting.
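We can get a rough sense of which lags drive these rules through the model's feature_importances_ attribute (the pipeline is rebuilt so this runs standalone; the lag labels are ours, not part of the model):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

np.random.seed(5)
days = np.arange(200)
usage = 120 + 0.25*days + 12*np.sin(2*np.pi*days/7) + np.random.normal(0, 4, 200)

X = np.column_stack([usage[:-3], usage[1:-2], usage[2:-1]])
y = usage[3:]
split = int(len(X) * 0.8)

model = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=42)
model.fit(X[:split], y[:split])

# Importances sum to 1; higher means the lag was used more in splits
for name, imp in zip(["t-3", "t-2", "t-1"], model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Importance scores are a coarse interpretability tool: they say which inputs the trees split on most, not the direction of the effect.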
Understanding Prediction Errors Visually
The distribution of errors shows how stable the model is and where it struggles.
Observations:
- Most errors are small
- Large errors appear near sudden changes
- No systematic bias
When to Use Random Forest
Random Forest works best when:
- Patterns are non-linear
- Short-term memory matters
- Interpretability is still important
It struggles with:
- Very long forecasting horizons
- Strict seasonal extrapolation
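The extrapolation weakness is easy to demonstrate on a toy example: a forest trained on a pure upward trend predicts a flat line for any input beyond its training range, because trees can only output values seen in training leaves:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Pure upward trend: y = 2x for x in [0, 99]
X_train = np.arange(100, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train[:, 0]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Inputs far beyond anything seen in training
X_future = np.array([[150.0], [200.0], [500.0]])
preds = model.predict(X_future)
print(preds)  # all stuck near the training maximum, not 300/400/1000
```

This is why Random Forest forecasts of strongly trending series need detrending or differencing first, while linear models extrapolate trends natively.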
Practice Questions
Q1. Why do multiple lag features help Random Forest?
Q2. Why is Random Forest less smooth than linear regression?
Key Takeaways
- Random Forest captures non-linear behavior
- Lag features give the model memory
- Forecasts are more responsive to changes
Next lesson: we’ll push performance further using Gradient Boosting.