Long Short-Term Memory Networks (LSTMs)
In the previous lesson, we saw how Recurrent Neural Networks process sequences step by step. But they suffer from a serious limitation.
They forget information that is far back in time.
LSTMs were designed to solve exactly this problem.
The Real Problem with Basic RNNs
Imagine forecasting daily electricity usage.
Consumption today may depend on:
- Yesterday’s weather
- The past few days of temperature
- Seasonal usage patterns from weeks ago
A basic RNN struggles to remember information that far back. This is called the long-term dependency problem.
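The forgetting can be illustrated numerically. In a basic RNN, an input's influence on later steps is repeatedly multiplied by roughly the same recurrent weight, so when that weight is below 1 the signal shrinks exponentially. A minimal sketch (the weight value 0.5 is an arbitrary illustration, not a real trained weight):

```python
# How much of an input's influence survives after n steps
# when it is multiplied by a recurrent weight w < 1 at each step.
w = 0.5  # illustrative recurrent weight
for n in [1, 5, 10, 50]:
    influence = w ** n
    print(f"after {n:2d} steps: {influence:.2e}")
```

After 50 steps the remaining influence is vanishingly small, which is exactly the long-term dependency problem.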
How LSTMs Solve This
LSTMs introduce a smarter memory system.
Instead of one hidden state, they use:
- A cell state (long-term memory)
- Gates that control information flow
These gates decide what to:
- Keep
- Forget
- Update
Understanding LSTM Gates (Intuition Only)
You don’t need equations to understand this. Think of it this way:
- Forget gate: What old information is no longer useful?
- Input gate: What new information should be stored?
- Output gate: What information should influence the prediction?
This gating mechanism is why LSTMs can retain patterns over long durations.
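The three gates can be written out in a few lines. Below is a minimal single-step LSTM cell in NumPy, with tiny illustrative sizes and randomly initialized (untrained) weights, just to show how the gates combine; in practice a library such as Keras or PyTorch handles this for you:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step: x is the input, h the hidden state, c the cell state.
    W maps the concatenated [h, x] to the four gate pre-activations."""
    z = W @ np.concatenate([h, x]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)              # forget gate: what old memory to keep
    i = sigmoid(i)              # input gate: what new info to store
    o = sigmoid(o)              # output gate: what memory to reveal
    g = np.tanh(g)              # candidate values to write into memory
    c_new = f * c + i * g       # update long-term memory (cell state)
    h_new = o * np.tanh(c_new)  # new hidden state used for prediction
    return h_new, c_new

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h = np.zeros(hidden)
c = np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, W, b)
print(h.shape, c.shape)  # each is a vector of length `hidden`
```

Note how the cell state update `f * c + i * g` is additive: old memory is scaled, not overwritten, which is what lets information survive many steps.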
Real-World Example: Weekly Sales Forecasting
Suppose we are forecasting daily sales for a store.
Sales depend on:
- Short-term fluctuations
- Weekly patterns
- Longer trends
An LSTM can retain all of this information simultaneously.
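To experiment with this yourself, you could generate a synthetic sales series that combines the three components above. A hypothetical sketch (all the numbers are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
days = np.arange(180)  # about six months of daily data

trend = 100 + 0.2 * days                     # longer trend
weekly = 15 * np.sin(2 * np.pi * days / 7)   # weekly pattern
noise = rng.normal(scale=5, size=days.size)  # short-term fluctuations

sales = trend + weekly + noise
print(sales[:7].round(1))
```

A model that only remembers a few recent steps can track the noise and part of the weekly cycle, but recovering the trend requires memory that spans weeks.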
Visualizing Long-Term Memory
The plot below compares:
- Actual sales data
- RNN-style short-memory prediction
- LSTM-style long-memory prediction
How to Read This Plot
- The black line is the actual time series
- The purple line shows RNN behavior (short memory)
- The green line shows LSTM behavior (long memory)
Notice how:
- RNN predictions drift away over time
- LSTM predictions stay aligned with the overall pattern
That stability comes from controlled memory.
LSTM Logic (Conceptual Python)
memory = 0.0
predictions = []
for value in series:  # series: the input time series, assumed defined
    # keep 95% of the old memory, blend in 5% of the new value
    memory = 0.95 * memory + 0.05 * value
    predictions.append(memory)
Conceptually:
- Old information decays slowly
- Important patterns persist
- Noise has less influence
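The effect of the decay rate can be checked directly: a short-memory version (fast decay) chases the noise, while a long-memory version like the one above smooths it out. A small experiment, assuming the same recursive update with two different decay rates on a noisy sine wave:

```python
import numpy as np

rng = np.random.default_rng(1)
series = np.sin(np.arange(200) / 10) + rng.normal(scale=0.3, size=200)

def run_memory(series, keep):
    """Recursive memory: keep * old memory + (1 - keep) * new value."""
    memory, out = 0.0, []
    for value in series:
        memory = keep * memory + (1 - keep) * value
        out.append(memory)
    return np.array(out)

short = run_memory(series, keep=0.5)   # forgets quickly, follows noise
long_ = run_memory(series, keep=0.95)  # forgets slowly, smooths noise

# Average step-to-step change: lower means a steadier, less noisy output.
print(np.abs(np.diff(short)).mean(), np.abs(np.diff(long_)).mean())
```

The long-memory output changes far less from step to step, which is the stability seen in the LSTM-style curve of the plot above.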
Why LSTMs Are Widely Used
- Financial forecasting
- Demand prediction
- Energy consumption
- Traffic forecasting
Any problem where long-term context matters.
Key Takeaways
- RNNs remember short-term patterns
- LSTMs remember long-term patterns
- Gates control information flow
- Controlled memory gives more stable forecasts
Practice Questions
Q1. Why do LSTMs perform better than RNNs for long sequences?
Q2. What role does the forget gate play?
Next lesson: GRUs — a simpler alternative to LSTMs.