Time Series Lesson 45 – Attention | Dataplexa

Attention Models for Time Series

Traditional sequence models, such as RNNs and LSTMs, compress the entire history into a single hidden state, treating all past data almost equally. But in real life, not every past value matters equally.

Attention models solve this by learning where to focus.


Why Attention Is Needed

Think about forecasting daily sales.

  • Yesterday’s sales matter a lot
  • Last week’s same day matters
  • Sales from 6 months ago may not matter

Attention models automatically learn which past points are important.


Real-World Example: Online Sales Forecasting

Consider an e-commerce store:

  • Recent promotions affect demand
  • Weekend patterns repeat
  • Old data slowly loses relevance

Attention allows the model to assign higher weight to the most useful historical moments.


Sales Time Series

This chart shows simulated daily sales with:

  • Trend
  • Weekly seasonality
  • Occasional spikes
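A series like this can be simulated in a few lines of NumPy. The exact parameters below (365 days, spike sizes, noise level) are illustrative choices, not values from the chart:

```python
import numpy as np

rng = np.random.default_rng(42)
days = np.arange(365)

trend = 100 + 0.1 * days                       # slow upward trend
weekly = 15 * np.sin(2 * np.pi * days / 7)     # weekly seasonality
spikes = np.zeros(365)
spike_days = rng.choice(365, size=8, replace=False)
spikes[spike_days] = rng.uniform(40, 80, size=8)  # occasional promo spikes
noise = rng.normal(0, 5, size=365)

sales = trend + weekly + spikes + noise
```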

How Attention Works (Conceptually)

  1. Each past timestep produces a hidden state
  2. The model scores how relevant each state is
  3. Important states get higher weights
  4. Weighted sum is used for prediction
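The four steps above can be sketched in plain NumPy. Here the hidden states and the scoring vector are random stand-ins for what the network would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 10, 4                        # 10 timesteps, hidden size 4
h = rng.normal(size=(T, d))         # step 1: one hidden state per timestep

w = rng.normal(size=(d,))           # scoring vector (learned in practice)
scores = h @ w                      # step 2: relevance score per state

exp = np.exp(scores - scores.max())
weights = exp / exp.sum()           # step 3: softmax -> weights sum to 1

context = weights @ h               # step 4: weighted sum used for prediction
```

The softmax guarantees the weights are positive and sum to 1, so `context` is a convex combination of the hidden states.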

Attention Weights Visualization

Below you see how attention assigns importance to different timesteps.

Notice:

  • Recent days get higher weights
  • Some weekly points stand out
  • Older values fade away

Attention Model Structure

Python: Attention Layer Concept
# h = LSTM hidden states, shape (batch, timesteps, units)
from tensorflow.keras.layers import Dense, Softmax
import tensorflow as tf

scores = Dense(1)(h)               # relevance score per timestep
weights = Softmax(axis=1)(scores)  # weights sum to 1 over time
context = tf.reduce_sum(weights * h, axis=1)  # weighted sum of states

Key idea:

  • The model decides what matters
  • No manual feature engineering
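Putting the pieces together, a minimal forecasting model can be sketched with the Keras functional API. The window size and layer widths here are illustrative assumptions, not values from this lesson:

```python
import numpy as np
from tensorflow.keras import layers, Model

WINDOW, FEATURES = 30, 1   # assumed: 30-day input window, one feature

inputs = layers.Input(shape=(WINDOW, FEATURES))
h = layers.LSTM(32, return_sequences=True)(inputs)   # a hidden state per timestep
scores = layers.Dense(1)(h)                          # relevance score per state
weights = layers.Softmax(axis=1)(scores)             # normalize over time
# Dot over the time axis computes the weighted sum of hidden states
context = layers.Flatten()(layers.Dot(axes=1)([weights, h]))
outputs = layers.Dense(1)(context)                   # one-step-ahead forecast

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```

Because `weights` is an ordinary tensor in the graph, it can be read out after training to inspect which timesteps the model attended to.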

Why Attention Improves Forecasting

  • Handles long sequences better
  • Reduces information overload
  • Improves interpretability

You can visually inspect which timesteps influenced predictions.


Common Use Cases

  • Financial forecasting
  • Demand prediction
  • Energy load forecasting
  • Traffic and mobility data

Limitations

  • More parameters
  • Needs more data
  • Slower training

Attention is powerful but should be used wisely.


Practice Questions

Q1. Why does attention outperform plain LSTM for long sequences?

Because it selectively focuses on relevant past information instead of compressing everything into one state.

Q2. Can attention explain model decisions?

Yes. Attention weights show which timesteps influenced predictions.

Next lesson: Transformer models for time series forecasting.