Gated Recurrent Units (GRUs)
LSTMs largely solved the long-term memory problem of basic RNNs, but at the cost of extra complexity. GRUs were created as a simpler and faster alternative.
In many real-world forecasting tasks, GRUs perform just as well as LSTMs while being easier to train.
Why GRUs Were Introduced
LSTMs use multiple gates and separate memory cells. While powerful, they can be computationally heavy.
GRUs simplify this by:
- Removing the separate cell state
- Using fewer gates
- Merging memory and hidden state
The goal is the same: remember important information and forget the rest.
GRU Gates (Intuitive View)
GRUs use two gates:
- Update gate: How much of the previous hidden state should be carried forward versus replaced with new information?
- Reset gate: How much of the past state should be ignored when forming the new candidate memory?
These gates dynamically control how memory flows through time.
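The two gates can be sketched as a single GRU step in NumPy. This is a minimal illustration using the standard GRU equations with random, untrained weights; the function name `gru_step` and the parameter layout are illustrative, not a library API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step (illustrative weights, not trained)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)              # update gate: keep vs. replace memory
    r = sigmoid(Wr @ x + Ur @ h_prev + br)              # reset gate: how much past to use
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
    return z * h_prev + (1 - z) * h_tilde               # blend old memory with candidate

rng = np.random.default_rng(0)
d, h = 3, 4  # input and hidden sizes (arbitrary for the demo)
params = [rng.standard_normal((h, d)), rng.standard_normal((h, h)), np.zeros(h),
          rng.standard_normal((h, d)), rng.standard_normal((h, h)), np.zeros(h),
          rng.standard_normal((h, d)), rng.standard_normal((h, h)), np.zeros(h)]

h_state = np.zeros(h)
for _ in range(5):
    h_state = gru_step(rng.standard_normal(d), h_state, params)
```

Note how the reset gate acts only inside the candidate computation, while the update gate does the final blending. Because the new state is a convex combination of the old state and a tanh output, its values stay bounded.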
Real-World Example: Website Traffic Forecasting
Consider daily website traffic.
Traffic depends on:
- Recent days (news, campaigns)
- Weekly patterns
- Longer-term popularity trends
GRUs balance short-term responsiveness and long-term stability.
Visual Comparison: RNN vs LSTM vs GRU
The plot below compares:
- Actual website traffic
- RNN prediction (short memory)
- LSTM prediction (strong long memory)
- GRU prediction (balanced memory)
How to Read This Plot
- The black line is the true traffic pattern
- The purple line drifts due to weak memory (RNN)
- The green line is stable but slower to adapt (LSTM)
- The orange line adapts quickly while staying stable (GRU)
GRUs often hit the sweet spot between speed and accuracy.
Conceptual GRU Logic
memory = 0
predictions = []
for value in series:
    update_gate = 0.85  # fixed here for intuition; a real GRU learns this per step
    memory = update_gate * memory + (1 - update_gate) * value
    predictions.append(memory)
What this represents:
- Memory is updated smoothly
- New information is blended carefully
- Noise influence is reduced
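The noise-reduction claim above can be checked by running the conceptual loop on a synthetic noisy series. The series and the fixed gate value of 0.85 are assumptions for the demo; with the gate fixed, this loop is exactly exponential smoothing.

```python
import random
import statistics

random.seed(42)
# Synthetic noisy traffic series around a level of 100 (illustrative data)
series = [100 + random.gauss(0, 10) for _ in range(200)]

memory = 0.0
predictions = []
for value in series:
    update_gate = 0.85  # fixed gate; a trained GRU would learn this
    memory = update_gate * memory + (1 - update_gate) * value
    predictions.append(memory)

# After warm-up, the smoothed output fluctuates far less than the raw input
print(statistics.stdev(series[50:]), statistics.stdev(predictions[50:]))
```

The first ~50 steps are skipped when comparing spread because the memory starts at 0 and needs time to reach the series level.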
When GRUs Are a Better Choice
- Smaller datasets
- Faster training required
- Limited computational resources
- Near-real-time forecasting
Many production systems prefer GRUs for efficiency.
Key Differences: LSTM vs GRU
| Aspect | LSTM | GRU |
|---|---|---|
| Number of gates | 3 | 2 |
| Separate memory cell | Yes | No |
| Training speed | Slower | Faster |
| Performance | Very strong | Comparable |
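The "faster training" row follows partly from parameter count: each gate or candidate block costs roughly hidden_dim * (input_dim + hidden_dim) + hidden_dim weights, and the LSTM has four such blocks to the GRU's three. A quick sketch, assuming the standard formulation (framework implementations such as Keras's reset_after variant add extra bias terms, so exact counts can differ slightly):

```python
def lstm_params(input_dim, hidden_dim):
    # 4 blocks: input, forget, and output gates + cell candidate
    return 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)

def gru_params(input_dim, hidden_dim):
    # 3 blocks: update gate, reset gate, candidate state
    return 3 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)

print(lstm_params(1, 64), gru_params(1, 64))  # GRU is 25% smaller
```

Fewer weights mean fewer gradients to compute and store per step, which is where the GRU's speed advantage comes from.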
Practice Questions
Q1. Why are GRUs faster to train than LSTMs?
Q2. In what scenario would GRUs be preferred?
Next lesson: Bidirectional models — learning from past and future context.