LightGBM for Time Series Forecasting
LightGBM is another powerful gradient-boosting algorithm, designed to be faster and more memory-efficient than XGBoost. In production systems, LightGBM is often preferred when datasets are large and forecasts must be generated quickly.
You will see LightGBM heavily used in:
- Retail demand forecasting
- Energy consumption prediction
- Ride-sharing demand estimation
- Financial risk modeling
Real-World Scenario
Suppose you manage inventory for a grocery delivery company. You must forecast daily item demand to avoid stockouts and food waste.
Demand depends on:
- Recent sales history
- Weekly shopping behavior
- Short-term demand spikes
LightGBM is well-suited for this type of fast, pattern-driven forecasting.
Step 1: Create a Time Series Dataset
We simulate daily demand with trend, weekly seasonality, and noise.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(7)
days = 320
time = np.arange(days)
trend = time * 0.25
weekly = 18 * np.sin(2 * np.pi * time / 7)
noise = np.random.normal(0, 6, days)
demand = 60 + trend + weekly + noise
df = pd.DataFrame({"demand": demand})
df.head()
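The matplotlib import above is not yet used; a quick line plot makes the structure visible. This is a minimal sketch, and it repeats the simulation lines so it runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

# Recreate the simulated series (identical to the code above)
# so this snippet is self-contained
np.random.seed(7)
time = np.arange(320)
demand = (60 + 0.25 * time + 18 * np.sin(2 * np.pi * time / 7)
          + np.random.normal(0, 6, 320))

plt.figure(figsize=(10, 4))
plt.plot(time, demand, linewidth=1)
plt.xlabel("Day")
plt.ylabel("Demand")
plt.title("Simulated daily demand")
plt.show()
```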
Plotting the series reveals:
- Gradual growth in demand over time
- Clear weekly shopping cycles
- Random day-to-day fluctuations
Step 2: Feature Engineering
Like XGBoost, LightGBM has no built-in notion of time, so we engineer features that encode it.
We create:
- Short-term lags
- Weekly lag
- Rolling averages
df["lag_1"] = df["demand"].shift(1)
df["lag_2"] = df["demand"].shift(2)
df["lag_7"] = df["demand"].shift(7)
df["rolling_7"] = df["demand"].rolling(7).mean()
df["rolling_14"] = df["demand"].rolling(14).mean()
df = df.dropna()
df.head()
Why this works:
- Lag features give memory
- Weekly lag captures recurring behavior
- Rolling means smooth random noise
Step 3: Time-Aware Train-Test Split
We respect the time order to avoid data leakage.
X = df.drop("demand", axis=1)
y = df["demand"]
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
Step 4: Train LightGBM Model
LightGBM grows trees leaf-wise, which allows it to capture complex patterns quickly.
from lightgbm import LGBMRegressor
model = LGBMRegressor(
    n_estimators=400,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=7
)
model.fit(X_train, y_train)
preds = model.predict(X_test)
Step 5: Forecast Visualization
Plotting predictions against the actual test values tells us whether the model truly learned the demand behavior.
Key observations:
- Predictions closely follow real demand
- Weekly oscillations are preserved
- Short-term noise is partially smoothed
Prediction Error Behavior
Error plots reveal model stability and bias.
A healthy residual plot shows:
- No long-term drift in errors
- Random spread around zero
- Model reacts well to demand changes
XGBoost vs LightGBM (Intuition)
- XGBoost: level-wise tree growth; generally more conservative and robust
- LightGBM: leaf-wise tree growth; faster training and sharper fits
- LightGBM's histogram-based splitting scales better to very large data
In practice, teams often try both and compare results.
Practice Questions
Q1. Why does LightGBM train faster than XGBoost?
Q2. Does LightGBM understand time automatically?
Next lesson: Time-series cross-validation and why normal CV fails.