AI Lesson 70 – RNN, LSTM & GRU in NLP | Dataplexa

RNN, LSTM, and GRU in NLP

In the previous lesson, we learned why sequence modeling is important in Natural Language Processing. However, simple sequence representations are not enough to learn long-term dependencies in language. This is where Recurrent Neural Networks and their improved versions come into play.

This lesson explains how RNNs work, why they struggle with long sequences, and how LSTM and GRU solve those problems.

Real-World Connection

When a voice assistant understands a long sentence, or when a translation system remembers context from the beginning of a paragraph, it relies on sequence-aware neural networks. These networks maintain memory over time, allowing them to understand relationships across words.

What Is a Recurrent Neural Network (RNN)?

A Recurrent Neural Network is a neural network designed to process sequences. Unlike traditional networks, RNNs reuse the same weights at every time step and pass information forward through a hidden state.

  • Processes input one word at a time
  • Maintains a hidden memory
  • Shares parameters across time steps

How an RNN Works

At each step, the RNN takes the current input and the previous hidden state to produce a new hidden state. This allows information from earlier words to influence later predictions.

Simple RNN Example


import numpy as np

# A deliberately simplified "RNN": the hidden state is just nudged by a constant
# at every word, so the carry-over of information between steps is visible.
inputs = ["I", "love", "AI"]
hidden_state = np.zeros(4)   # a 4-dimensional hidden memory, initially empty

for word in inputs:
    hidden_state = hidden_state + 0.1   # toy update; a real RNN uses learned weights
    print(hidden_state)
  
[0.1 0.1 0.1 0.1]
[0.2 0.2 0.2 0.2]
[0.3 0.3 0.3 0.3]

Understanding the Example

This simplified example shows how information accumulates over time. In real RNNs, the hidden state is updated using learned weights and activation functions.
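
For comparison, here is a minimal sketch of how a real RNN updates its hidden state: new_hidden = tanh(W_x · input + W_h · previous_hidden + b). The vocabulary, embedding size, hidden size, and random weights below are illustrative choices, not a trained model.

import numpy as np

np.random.seed(0)

vocab = {"I": 0, "love": 1, "AI": 2}
embeddings = np.random.randn(3, 4)   # each word becomes a 4-dimensional vector
W_x = np.random.randn(4, 4) * 0.1    # input-to-hidden weights
W_h = np.random.randn(4, 4) * 0.1    # hidden-to-hidden weights, shared across steps
b = np.zeros(4)

hidden_state = np.zeros(4)
for word in ["I", "love", "AI"]:
    x = embeddings[vocab[word]]
    hidden_state = np.tanh(W_x @ x + W_h @ hidden_state + b)
    print(word, hidden_state.round(3))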

The Problem with Basic RNNs

Standard RNNs struggle with long sequences because of the vanishing and exploding gradient problem: as errors are propagated back through many time steps, gradients either shrink toward zero or grow uncontrollably. In practice, the network gradually forgets information from early in the sequence; the numeric sketch after the list below illustrates the vanishing case.

  • Difficulty remembering long-term context
  • Unstable training
  • Limited practical usage
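
The vanishing effect is easy to see with a rough numeric sketch: if the recurrent weight has a magnitude below 1, the gradient reaching the earliest words shrinks exponentially with sequence length. The weight value 0.5 and the 30 time steps below are arbitrary choices for illustration; with a weight above 1, the same loop would explode instead of vanish.

recurrent_weight = 0.5
gradient = 1.0

for step in range(30):
    # each extra time step multiplies the gradient by the recurrent weight again
    gradient *= recurrent_weight

print(gradient)

9.313225746154785e-10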

What Is LSTM?

Long Short-Term Memory (LSTM) networks were designed to overcome these limitations of basic RNNs. LSTMs introduce a memory cell and a set of gates that control how information flows into, stays in, and leaves that cell; a small NumPy sketch of the gates follows the list below.

  • Forget gate decides what to remove from the memory cell
  • Input gate decides what new information to store
  • Output gate controls what is exposed as the hidden state
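
The gate mechanics can be sketched in plain NumPy for a single LSTM step. The 2-dimensional state, random weights, and omitted bias terms below are simplifications for readability; real implementations such as Keras handle all of this internally.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
x_t = np.random.randn(2)      # current input
h_prev = np.zeros(2)          # previous hidden state
c_prev = np.zeros(2)          # previous cell state (the long-term memory)

concat = np.concatenate([h_prev, x_t])
W_f, W_i, W_o, W_c = [np.random.randn(2, 4) * 0.1 for _ in range(4)]

f = sigmoid(W_f @ concat)     # forget gate: how much of the old cell state to keep
i = sigmoid(W_i @ concat)     # input gate: how much new information to store
o = sigmoid(W_o @ concat)     # output gate: how much of the cell to expose
c_candidate = np.tanh(W_c @ concat)

c_t = f * c_prev + i * c_candidate   # updated memory cell
h_t = o * np.tanh(c_t)               # new hidden state
print(h_t)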

LSTM Example


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import numpy as np

X = np.random.rand(10, 5, 1)   # 10 sequences, each 5 time steps long, 1 feature per step
y = np.random.rand(10, 1)      # one target value per sequence

model = Sequential([
    LSTM(32, input_shape=(5, 1)),   # LSTM layer with 32 hidden units
    Dense(1)                        # single output value
])

model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1)
  
Training completed successfully
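
Once fitted, the model is used like any other Keras model. As a quick usage sketch on the same toy data:

prediction = model.predict(X[:1])   # one predicted value for the first sequence
print(prediction.shape)             # (1, 1)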

What Is GRU?

Gated Recurrent Unit (GRU) is a simplified variant of the LSTM. It merges the forget and input gates into a single update gate, adds a reset gate, and keeps no separate memory cell, which makes it faster and easier to train (a Keras sketch follows the list below).

  • Fewer parameters than LSTM
  • Faster training
  • Good performance on many tasks
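
As a sketch, a GRU can usually be dropped in where the LSTM layer was used in the earlier Keras example; the toy random data and layer size below mirror that example.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
import numpy as np

X = np.random.rand(10, 5, 1)   # 10 sequences, 5 time steps, 1 feature
y = np.random.rand(10, 1)

model = Sequential([
    GRU(32, input_shape=(5, 1)),   # same setup as before, with GRU instead of LSTM
    Dense(1)
])

model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1)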

LSTM vs GRU

  • LSTM keeps a separate memory cell controlled by three gates
  • GRU folds the memory into the hidden state and uses only two gates
  • LSTM often handles very long sequences better
  • GRU has fewer parameters and is computationally cheaper

Where RNNs, LSTMs, and GRUs Are Used

  • Speech recognition
  • Machine translation
  • Text generation
  • Sentiment analysis
  • Time-series forecasting

Practice Questions

Practice 1: What does RNN stand for?



Practice 2: What does LSTM stand for?



Practice 3: What does GRU stand for?



Quick Quiz

Quiz 1: What major problem do basic RNNs face?





Quiz 2: What makes LSTM different from RNN?





Quiz 3: Which model is a simpler alternative to LSTM?





Coming up next: Transformers in NLP — the architecture that replaced RNNs for most modern NLP tasks.