AI Course
RNN, LSTM, and GRU in NLP
In the previous lesson, we learned why sequence modeling is important in Natural Language Processing. However, simple sequence representations are not enough to learn long-term dependencies in language. This is where Recurrent Neural Networks and their improved versions come into play.
This lesson explains how RNNs work, why they struggle with long sequences, and how LSTM and GRU solve those problems.
Real-World Connection
When a voice assistant understands a long sentence, or when a translation system remembers context from the beginning of a paragraph, it relies on sequence-aware neural networks. These networks maintain memory over time, allowing them to understand relationships across words.
What Is a Recurrent Neural Network (RNN)?
A Recurrent Neural Network is a neural network designed to process sequences. Unlike traditional networks, RNNs reuse the same weights at every time step and pass information forward through a hidden state.
- Processes input one word at a time
- Maintains a hidden memory
- Shares parameters across time steps
How an RNN Works
At each step, the RNN takes the current input and the previous hidden state to produce a new hidden state. This allows information from earlier words to influence later predictions.
Simple RNN Example
import numpy as np

inputs = ["I", "love", "AI"]      # toy three-word sentence
hidden_state = np.zeros(4)        # memory carried across time steps

for word in inputs:
    # Placeholder update: a real RNN combines the word's vector with the
    # previous hidden state using learned weights and an activation.
    hidden_state = hidden_state + 0.1
    print(hidden_state)
Understanding the Example
This simplified example shows how information accumulates over time. In real RNNs, the hidden state is updated using learned weights and activation functions.
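A More Realistic RNN Update
To make that concrete, here is a minimal sketch of one RNN update written in plain NumPy. The weight names W_xh, W_hh, and b_h and the random values are illustrative stand-ins; in a real network these parameters are learned during training.
import numpy as np

rng = np.random.default_rng(0)

# Toy word vectors for a three-word sentence (3-dimensional inputs).
inputs = rng.normal(size=(3, 3))

# Randomly initialized parameters (learned in a real RNN).
W_xh = rng.normal(size=(4, 3))   # input-to-hidden weights
W_hh = rng.normal(size=(4, 4))   # hidden-to-hidden weights
b_h = np.zeros(4)                # hidden bias

hidden_state = np.zeros(4)
for x_t in inputs:
    # Core RNN update: mix the current input with the previous hidden state.
    hidden_state = np.tanh(W_xh @ x_t + W_hh @ hidden_state + b_h)
    print(hidden_state)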
The Problem with Basic RNNs
Standard RNNs struggle with long sequences because of the vanishing and exploding gradient problem: during training, gradients are multiplied step by step as they flow backwards through time, so they either shrink towards zero (vanish) or grow out of control (explode). In practice this means the network gradually loses information from early in the sequence; the short sketch after the list below illustrates the vanishing case.
- Difficulty remembering long-term context
- Unstable training
- Limited practical usage
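Why Gradients Vanish (Sketch)
During backpropagation through time, the gradient signal is multiplied by roughly the same recurrent factor at every step. The factor of 0.5 below is made up purely for illustration; it shows how quickly repeated multiplication drives the signal towards zero (a factor above 1 would make it explode instead).
# Illustration only: repeated multiplication by a factor below 1
# makes a gradient vanish; a factor above 1 makes it explode.
gradient = 1.0
factor = 0.5  # hypothetical per-step recurrent factor
for step in range(1, 21):
    gradient *= factor
    if step % 5 == 0:
        print(f"after {step} steps: {gradient:.7f}")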
What Is LSTM?
Long Short-Term Memory (LSTM) networks were designed to solve these limitations of RNNs. LSTMs introduce a memory cell and three gates that control how information flows through it, as sketched after the list below.
- Forget gate decides what to remove
- Input gate decides what to store
- Output gate controls output
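LSTM Gates in NumPy (Sketch)
The sketch below runs one LSTM step in plain NumPy so the three gates are visible. The randomly initialized weight matrices stand in for learned parameters, and biases are omitted for brevity; it is a simplified illustration, not a production implementation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3

# Randomly initialized weights (learned in a real LSTM); biases omitted.
W_f, W_i, W_o, W_c = (
    rng.normal(size=(hidden_size, input_size + hidden_size)) for _ in range(4)
)

x_t = rng.normal(size=input_size)   # current input vector
h_prev = np.zeros(hidden_size)      # previous hidden state
c_prev = np.zeros(hidden_size)      # previous memory cell

z = np.concatenate([x_t, h_prev])   # shared input to every gate
f_t = sigmoid(W_f @ z)              # forget gate: what to erase from the cell
i_t = sigmoid(W_i @ z)              # input gate: what new information to store
o_t = sigmoid(W_o @ z)              # output gate: what to expose as output
c_tilde = np.tanh(W_c @ z)          # candidate cell contents

c_t = f_t * c_prev + i_t * c_tilde  # update the memory cell
h_t = o_t * np.tanh(c_t)            # new hidden state
print(h_t)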
LSTM Example
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import numpy as np

# Toy data: 10 sequences, each with 5 time steps of 1 feature.
X = np.random.rand(10, 5, 1)
y = np.random.rand(10, 1)

model = Sequential([
    LSTM(32, input_shape=(5, 1)),   # 32 LSTM units reading 5-step sequences
    Dense(1)                        # single output value per sequence
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1)
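Once training finishes, calling model.predict(X) returns one predicted value per input sequence. The numbers are meaningless here because the data is random, but the same workflow applies to real sequence datasets.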
What Is GRU?
Gated Recurrent Unit (GRU) is a simplified version of LSTM. It combines the forget and input gates into a single update gate, which makes it faster and easier to train; a Keras example follows the list below.
- Fewer parameters than LSTM
- Faster training
- Good performance on many tasks
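GRU Example
Because Keras provides GRU as a drop-in recurrent layer, the earlier LSTM example needs only the layer swapped. As before, the data is random and purely illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
import numpy as np

# Toy data: 10 sequences, each with 5 time steps of 1 feature.
X = np.random.rand(10, 5, 1)
y = np.random.rand(10, 1)

model = Sequential([
    GRU(32, input_shape=(5, 1)),   # GRU layer replaces the LSTM layer
    Dense(1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1)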
LSTM vs GRU
- LSTM has separate memory cell and gates
- GRU has a simpler structure
- LSTM is better for very long sequences
- GRU is computationally efficient (see the parameter comparison below)
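Comparing Parameter Counts
One quick way to see the size difference is to build one layer of each type with the same settings and compare parameter counts. This is a sketch: the exact counts printed can vary slightly between Keras versions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU

# Build identically sized LSTM and GRU layers and compare parameter counts.
for layer_cls in (LSTM, GRU):
    model = Sequential([layer_cls(32, input_shape=(5, 1))])
    print(layer_cls.__name__, "parameters:", model.count_params())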
Where RNNs, LSTMs, and GRUs Are Used
- Speech recognition
- Machine translation
- Text generation
- Sentiment analysis
- Time-series forecasting
Practice Questions
Practice 1: What does RNN stand for?
Practice 2: What does LSTM stand for?
Practice 3: What does GRU stand for?
Quick Quiz
Quiz 1: What major problem do basic RNNs face?
Quiz 2: What makes LSTM different from RNN?
Quiz 3: Which model is a simpler alternative to LSTM?
Coming up next: Transformers in NLP — the architecture that replaced RNNs for most modern NLP tasks.