Attention Models for Time Series
Traditional sequence models treat all past observations almost equally, but in practice not every past value matters the same.
Attention models address this by learning where to focus.
Why Attention Is Needed
Think about forecasting daily sales.
- Yesterday’s sales matter a lot
- Last week’s same day matters
- Sales from 6 months ago may not matter
Attention models automatically learn which past points are important.
Real-World Example: Online Sales Forecasting
Consider an e-commerce store:
- Recent promotions affect demand
- Weekend patterns repeat
- Old data slowly loses relevance
Attention allows the model to assign higher weight to the most useful historical moments.
Sales Time Series
This chart shows simulated daily sales with:
- Trend
- Weekly seasonality
- Occasional spikes
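A series like this can be simulated in a few lines. This is a minimal NumPy sketch (the parameter values are illustrative assumptions, not taken from the lesson's chart): a slow trend, a 7-day sinusoid for weekly seasonality, noise, and a handful of promotion-style spikes.

```python
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(365)

trend = 100 + 0.1 * days                        # slow upward trend
weekly = 15 * np.sin(2 * np.pi * days / 7)      # weekly seasonality
noise = rng.normal(0, 5, size=days.size)        # day-to-day noise

sales = trend + weekly + noise
spike_days = rng.choice(days, size=8, replace=False)
sales[spike_days] += rng.uniform(30, 60, size=8)  # occasional promo spikes
```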
How Attention Works (Conceptually)
- Each past timestep produces a hidden state
- The model scores how relevant each state is
- Important states get higher weights
- Weighted sum is used for prediction
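The four steps above can be sketched directly in NumPy. This is a simplified illustration: the hidden states `h` and the scoring vector `w` are random stand-ins for what an LSTM and a trained scoring layer would produce.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
T, d = 30, 8                    # 30 past timesteps, hidden size 8
h = rng.normal(size=(T, d))     # stand-in for LSTM hidden states
w = rng.normal(size=(d,))       # stand-in for a learned scoring vector

scores = h @ w                  # one relevance score per timestep
weights = softmax(scores)       # normalize into attention weights
context = weights @ h           # weighted sum used for prediction
```

The weights form a probability distribution over timesteps, so the context vector is a convex combination of the hidden states.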
Attention Weights Visualization
Below you see how attention assigns importance to different timesteps.
Notice:
- Recent days get higher weights
- Some weekly points stand out
- Older values fade away
Attention Model Structure
from tensorflow.keras.layers import Dense, Softmax
import tensorflow as tf

# h = LSTM hidden states, shape (batch, timesteps, units)
scores = Dense(1)(h)                           # relevance score per timestep
weights = Softmax(axis=1)(scores)              # normalize over the time axis
context = tf.reduce_sum(weights * h, axis=1)   # weighted sum of the states
Key idea:
- The model decides what matters
- No manual feature engineering
Why Attention Improves Forecasting
- Handles long sequences better
- Reduces information overload
- Improves interpretability
You can visually inspect which timesteps influenced predictions.
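Inspection can be as simple as ranking timesteps by their attention weight. The weights below are hypothetical values chosen for illustration (most recent timestep last):

```python
import numpy as np

# hypothetical attention weights over the last 7 timesteps
weights = np.array([0.02, 0.03, 0.05, 0.10, 0.15, 0.25, 0.40])
top = np.argsort(weights)[::-1][:3]   # three most influential timesteps
print("most influential timestep indices:", top)  # → [6 5 4]
```

In this made-up example the most recent timesteps dominate, which matches the pattern described above.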
Common Use Cases
- Financial forecasting
- Demand prediction
- Energy load forecasting
- Traffic and mobility data
Limitations
- More parameters
- Needs more data
- Slower training
Attention is powerful but should be used wisely.
Practice Questions
Q1. Why does attention outperform plain LSTM for long sequences?
Q2. Can attention explain model decisions?
Next lesson: Transformer models for time series forecasting.