Encoder–Decoder Architecture
Encoder–Decoder architecture is the structural backbone behind many powerful time-series forecasting systems.
Instead of predicting values directly from raw history, this architecture separates understanding from generation.
Why Encoder–Decoder Exists
In real-world time series, raw historical data is noisy, long, and complex.
Trying to predict directly from it often leads to:
- Information overload
- Loss of long-term context
- Unstable future predictions
Encoder–Decoder solves this by splitting responsibilities.
Core Idea
- Encoder learns what happened
- Decoder decides what will happen next
The encoder does not forecast. The decoder does not see raw input.
They communicate using a learned internal representation called context.
Real-World Example: Traffic Flow Forecasting
Consider a city traffic system:
- Input: last 24 hours of traffic volume (minute-level)
- Output: next 6 hours of congestion pattern
Traffic behavior depends on:
- Rush hours
- Slowdowns
- Recovery phases
Encoder–Decoder captures these patterns holistically.
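Before any encoder or decoder runs, the raw series has to be sliced into (past, future) training pairs. The sketch below shows one way to do this with NumPy; the function name `make_windows` and the synthetic sine-wave "traffic" series are illustrative assumptions, and hourly windows (24 in, 6 out) are used instead of minute-level data to keep the example small.

```python
import numpy as np

def make_windows(series, input_len, horizon):
    """Slice a 1-D series into (past, future) training pairs."""
    X, Y = [], []
    for start in range(len(series) - input_len - horizon + 1):
        X.append(series[start : start + input_len])
        Y.append(series[start + input_len : start + input_len + horizon])
    return np.array(X), np.array(Y)

# Synthetic stand-in for traffic volume (illustrative, not real data)
traffic = np.sin(np.linspace(0, 12 * np.pi, 200)) + 1.0

# 24 steps of history -> next 6 steps
X, Y = make_windows(traffic, input_len=24, horizon=6)
print(X.shape, Y.shape)  # (171, 24) (171, 6)
```

Each row of `X` is what the encoder will read; the matching row of `Y` is what the decoder is trained to produce.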
Visual Understanding
The visualization below shows:
- Encoder reading past traffic
- Decoder generating future flow
How to Read the Plot
- Dark line → historical traffic flow
- Green line → forecasted future congestion
- The transition point is learned internally
The decoder never directly sees historical points — only encoded understanding.
Encoder Role (Deep Understanding)
The encoder processes the entire input sequence and compresses:
- Trend
- Seasonality
- Spikes and drops
This compression is not data loss — it is abstraction.
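The folding of a whole sequence into one context vector can be sketched with a minimal RNN-style update. This is a toy with random, untrained weights, purely to show the shape of the computation: the state `h` is updated at every timestep, and only the final state survives as the context.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(past_sequence, hidden_size=8):
    """Minimal RNN-style encoder: fold the sequence into one context vector."""
    Wx = rng.normal(scale=0.1, size=(hidden_size,))          # input weights (untrained)
    Wh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
    h = np.zeros(hidden_size)
    for x_t in past_sequence:            # read the whole history, step by step
        h = np.tanh(Wx * x_t + Wh @ h)   # state accumulates what happened so far
    return h                             # final state = context vector

context = encode(np.sin(np.arange(24) * 0.5))
print(context.shape)  # (8,)
```

Note that a 24-step input of any length collapses to the same fixed-size vector; that fixed size is exactly the bottleneck discussed under limitations later.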
Decoder Role (Sequence Generation)
The decoder unfolds the future one step at a time.
Each step depends on:
- Encoded context
- Previously generated outputs
This allows smooth, consistent future sequences.
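The step-by-step unfolding can be sketched the same way. Again this is an untrained toy: at each step the decoder updates its state from the previous output `y` and the current state, then emits the next value. The weight names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def decode(context, horizon, hidden_size=8):
    """Minimal autoregressive decoder: each step sees the state and the last output."""
    Wy = rng.normal(scale=0.1, size=(hidden_size,))              # previous-output weights
    Wh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
    Wo = rng.normal(scale=0.1, size=(hidden_size,))              # readout weights
    h, y = context.copy(), 0.0
    future = []
    for _ in range(horizon):
        h = np.tanh(Wy * y + Wh @ h)  # state update uses the previous output y
        y = Wo @ h                    # emit the next value
        future.append(y)
    return np.array(future)

preds = decode(np.zeros(8), horizon=6)
print(preds.shape)  # (6,)
```

Because each emitted value feeds the next step, consecutive predictions stay consistent with one another rather than being six independent guesses.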
Conceptual Encoder–Decoder Flow
# Encode the full input sequence (runs once)
context_vector = encoder(past_sequence)

# Initialize the decoder with the encoded context
decoder_state = context_vector
prev_output = past_sequence[-1]  # seed with the last observed value

future = []
for t in range(horizon):
    # Each step conditions on the state and the previous output
    prediction, decoder_state = decoder(prev_output, decoder_state)
    future.append(prediction)
    prev_output = prediction
Important:
- The encoder runs once
- The decoder runs sequentially
Why This Architecture Is Powerful
- Separates understanding the past from generating the future
- Handles long sequences efficiently
- Improves multi-step stability
This architecture is the foundation for:
- Seq2Seq models
- Attention mechanisms
- Transformers
Limitations to Be Aware Of
- Single context vector may bottleneck information
- Long sequences can strain memory
These limitations led to the development of attention-based models.
Practice Questions
Q1. Why doesn’t the decoder see raw historical data?
Q2. What happens if the context vector is weak?
Next lesson: One-dimensional CNNs for time series.