Time Series Lesson 33 – LightGBM | Dataplexa

LightGBM for Time Series Forecasting

LightGBM is another powerful gradient-boosting algorithm, designed to be faster and more memory-efficient than XGBoost. In real production systems, LightGBM is often preferred when datasets are large and forecasts must be generated quickly.

You will see LightGBM heavily used in:

  • Retail demand forecasting
  • Energy consumption prediction
  • Ride-sharing demand estimation
  • Financial risk modeling

Real-World Scenario

Suppose you manage inventory for a grocery delivery company. You must forecast daily item demand to avoid stockouts and food waste.

Demand depends on:

  • Recent sales history
  • Weekly shopping behavior
  • Short-term demand spikes

LightGBM is well-suited for this type of fast, pattern-driven forecasting.


Step 1: Create a Time Series Dataset

We simulate daily demand with trend, weekly seasonality, and noise.

Python: Generate Demand Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(7)

days = 320
time = np.arange(days)

trend = time * 0.25
weekly = 18 * np.sin(2 * np.pi * time / 7)
noise = np.random.normal(0, 6, days)

demand = 60 + trend + weekly + noise

df = pd.DataFrame({"demand": demand})
df.head()

# Visualize the simulated demand series
plt.figure(figsize=(10, 4))
plt.plot(df["demand"])
plt.title("Simulated Daily Demand")
plt.xlabel("Day")
plt.ylabel("Demand")
plt.show()

What this plot represents:

  • Gradual growth in demand over time
  • Clear weekly shopping cycles
  • Random day-to-day fluctuations

Step 2: Feature Engineering

Just like XGBoost, LightGBM needs engineered features to understand time.

We create:

  • Short-term lags
  • Weekly lag
  • Rolling averages

Python: Feature Creation
df["lag_1"] = df["demand"].shift(1)
df["lag_2"] = df["demand"].shift(2)
df["lag_7"] = df["demand"].shift(7)

df["rolling_7"] = df["demand"].rolling(7).mean()
df["rolling_14"] = df["demand"].rolling(14).mean()

df = df.dropna()
df.head()

Why this works:

  • Lag features give memory
  • Weekly lag captures recurring behavior
  • Rolling means smooth random noise

Step 3: Time-Aware Train-Test Split

We respect the time order to avoid data leakage.

Python: Train-Test Split
X = df.drop("demand", axis=1)
y = df["demand"]

split = int(len(df) * 0.8)

X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

Step 4: Train LightGBM Model

LightGBM grows trees leaf-wise (best-first) rather than level-wise, which lets it capture complex patterns with fewer splits and less compute.

Python: Train LightGBM
from lightgbm import LGBMRegressor

model = LGBMRegressor(
    n_estimators=400,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,
    subsample_freq=1,      # note: subsample is ignored unless subsample_freq >= 1
    colsample_bytree=0.8,
    random_state=7
)

model.fit(X_train, y_train)

preds = model.predict(X_test)

Step 5: Forecast Visualization

This visualization tells us whether the model truly learned the demand behavior.

Key observations:

  • Predictions closely follow real demand
  • Weekly oscillations are preserved
  • Short-term noise is partially smoothed

Prediction Error Behavior

Error plots reveal model stability and bias.

What this shows:

  • No long-term drift in errors
  • Random spread around zero
  • Model reacts well to demand changes

XGBoost vs LightGBM (Intuition)

  • XGBoost: level-wise growth; more conservative, often robust out of the box
  • LightGBM: leaf-wise growth; faster, sharper fits (can overfit small datasets)
  • LightGBM: histogram-based training scales better to very large data

In practice, teams often try both and compare results.


Practice Questions

Q1. Why does LightGBM train faster than XGBoost?

Because LightGBM grows trees leaf-wise and uses histogram-based splitting.
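A toy numeric illustration of the histogram idea (a conceptual sketch, not LightGBM's internal code; 255 is LightGBM's documented max_bin default):

Python: Histogram Binning Intuition
```python
import numpy as np

rng = np.random.default_rng(7)
values = rng.normal(size=10_000)  # one continuous feature column

# Exact greedy split finding scores every distinct feature value
exact_candidates = np.unique(values).size

# Histogram-based split finding buckets values into a fixed number of bins
# (max_bin defaults to 255 in LightGBM) and only scores bin boundaries
max_bin = 255
counts, edges = np.histogram(values, bins=max_bin)

print(exact_candidates, "vs", max_bin, "candidate split points")
```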

Q2. Does LightGBM understand time automatically?

No. Time information must be provided through engineered features.

Next lesson: Time-series cross-validation and why normal CV fails.