AI Course
RNN, LSTM, and GRU in NLP
In the previous lesson, we learned why sequence modeling is important in Natural Language Processing. However, simple sequence representations are not enough to learn long-term dependencies in language. This is where Recurrent Neural Networks and their improved versions come into play.
This lesson explains how RNNs work, why they struggle with long sequences, and how LSTM and GRU solve those problems.
Real-World Connection
When a voice assistant understands a long sentence, or when a translation system remembers context from the beginning of a paragraph, it relies on sequence-aware neural networks. These networks maintain memory over time, allowing them to understand relationships across words.
What Is a Recurrent Neural Network (RNN)?
A Recurrent Neural Network is a neural network designed to process sequences. Unlike traditional networks, RNNs reuse the same weights at every time step and pass information forward through a hidden state.
- Processes input one word at a time
- Maintains a hidden memory
- Shares parameters across time steps
How an RNN Works
At each step, the RNN takes the current input and the previous hidden state to produce a new hidden state. This allows information from earlier words to influence later predictions.
Simple RNN Example
import numpy as np

inputs = ["I", "love", "AI"]      # toy three-word sentence
hidden_state = np.zeros(4)        # memory carried across time steps

for word in inputs:
    # Placeholder update: a real RNN combines the word's vector with the
    # previous hidden state using learned weights and an activation.
    hidden_state = hidden_state + 0.1
    print(hidden_state)
Understanding the Example
This simplified example shows how information accumulates over time. In real RNNs, the hidden state is updated using learned weights and activation functions.
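A More Realistic RNN Update
To make that concrete, here is a minimal sketch of one RNN update written in plain NumPy. The weight names W_xh, W_hh, and b_h and the random values are illustrative stand-ins; in a real network these parameters are learned during training.
import numpy as np

rng = np.random.default_rng(0)

# Toy word vectors for a three-word sentence (3-dimensional inputs).
inputs = rng.normal(size=(3, 3))

# Randomly initialized parameters (learned in a real RNN).
W_xh = rng.normal(size=(4, 3))   # input-to-hidden weights
W_hh = rng.normal(size=(4, 4))   # hidden-to-hidden weights
b_h = np.zeros(4)                # hidden bias

hidden_state = np.zeros(4)
for x_t in inputs:
    # Core RNN update: mix the current input with the previous hidden state.
    hidden_state = np.tanh(W_xh @ x_t + W_hh @ hidden_state + b_h)
    print(hidden_state)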
The Problem with Basic RNNs
Standard RNNs struggle with long sequences because of the vanishing and exploding gradient problem: during training, gradients are multiplied step by step as they flow backwards through time, so they either shrink towards zero (vanish) or grow out of control (explode). In practice this means the network gradually loses information from early in the sequence; the short sketch after the list below illustrates the vanishing case.
- Difficulty remembering long-term context
- Unstable training
- Limited practical usage
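Why Gradients Vanish (Sketch)
During backpropagation through time, the gradient signal is multiplied by roughly the same recurrent factor at every step. The factor of 0.5 below is made up purely for illustration; it shows how quickly repeated multiplication drives the signal towards zero (a factor above 1 would make it explode instead).
# Illustration only: repeated multiplication by a factor below 1
# makes a gradient vanish; a factor above 1 makes it explode.
gradient = 1.0
factor = 0.5  # hypothetical per-step recurrent factor
for step in range(1, 21):
    gradient *= factor
    if step % 5 == 0:
        print(f"after {step} steps: {gradient:.7f}")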
What Is LSTM?
Long Short-Term Memory (LSTM) networks were designed to solve these limitations of RNNs. LSTMs introduce a memory cell and three gates that control how information flows through it, as sketched after the list below.
- Forget gate decides what to remove
- Input gate decides what to store
- Output gate controls output
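LSTM Gates in NumPy (Sketch)
The sketch below runs one LSTM step in plain NumPy so the three gates are visible. The randomly initialized weight matrices stand in for learned parameters, and biases are omitted for brevity; it is a simplified illustration, not a production implementation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3

# Randomly initialized weights (learned in a real LSTM); biases omitted.
W_f, W_i, W_o, W_c = (
    rng.normal(size=(hidden_size, input_size + hidden_size)) for _ in range(4)
)

x_t = rng.normal(size=input_size)   # current input vector
h_prev = np.zeros(hidden_size)      # previous hidden state
c_prev = np.zeros(hidden_size)      # previous memory cell

z = np.concatenate([x_t, h_prev])   # shared input to every gate
f_t = sigmoid(W_f @ z)              # forget gate: what to erase from the cell
i_t = sigmoid(W_i @ z)              # input gate: what new information to store
o_t = sigmoid(W_o @ z)              # output gate: what to expose as output
c_tilde = np.tanh(W_c @ z)          # candidate cell contents

c_t = f_t * c_prev + i_t * c_tilde  # update the memory cell
h_t = o_t * np.tanh(c_t)            # new hidden state
print(h_t)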
LSTM Example
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import numpy as np

# Toy data: 10 sequences, each with 5 time steps of 1 feature.
X = np.random.rand(10, 5, 1)
y = np.random.rand(10, 1)

model = Sequential([
    LSTM(32, input_shape=(5, 1)),   # 32 LSTM units reading 5-step sequences
    Dense(1)                        # single output value per sequence
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1)
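Once training finishes, calling model.predict(X) returns one predicted value per input sequence. The numbers are meaningless here because the data is random, but the same workflow applies to real sequence datasets.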
What Is GRU?
Gated Recurrent Unit (GRU) is a simplified version of LSTM. It combines the forget and input gates into a single update gate, which makes it faster and easier to train; a Keras example follows the list below.
- Fewer parameters than LSTM
- Faster training
- Good performance on many tasks
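GRU Example
Because Keras provides GRU as a drop-in recurrent layer, the earlier LSTM example needs only the layer swapped. As before, the data is random and purely illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
import numpy as np

# Toy data: 10 sequences, each with 5 time steps of 1 feature.
X = np.random.rand(10, 5, 1)
y = np.random.rand(10, 1)

model = Sequential([
    GRU(32, input_shape=(5, 1)),   # GRU layer replaces the LSTM layer
    Dense(1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1)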
LSTM vs GRU
- LSTM has separate memory cell and gates
- GRU has a simpler structure
- LSTM is better for very long sequences
- GRU is computationally efficient (see the parameter comparison below)
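Comparing Parameter Counts
One quick way to see the size difference is to build one layer of each type with the same settings and compare parameter counts. This is a sketch: the exact counts printed can vary slightly between Keras versions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU

# Build identically sized LSTM and GRU layers and compare parameter counts.
for layer_cls in (LSTM, GRU):
    model = Sequential([layer_cls(32, input_shape=(5, 1))])
    print(layer_cls.__name__, "parameters:", model.count_params())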
Where RNNs, LSTMs, and GRUs Are Used
- Speech recognition
- Machine translation
- Text generation
- Sentiment analysis
- Time-series forecasting
Practice Questions
Practice 1: What does RNN stand for?
Practice 2: What does LSTM stand for?
Practice 3: What does GRU stand for?
Quick Quiz
Quiz 1: What major problem do basic RNNs face?
Quiz 2: What makes LSTM different from RNN?
Quiz 3: Which model is a simpler alternative to LSTM?
Coming up next: Transformers in NLP — the architecture that replaced RNNs for most modern NLP tasks.