Sequence-to-Sequence (Seq2Seq) Models
So far, you have learned how RNNs, LSTMs, GRUs, and Bidirectional models process sequences and understand context.
But many real-world NLP problems require something more powerful: mapping one sequence to another sequence.
In this lesson, you will understand:
- What Seq2Seq models are
- Why simple RNNs are not enough
- How encoder–decoder architecture works
- Where Seq2Seq models are used in NLP
What Is a Sequence-to-Sequence Problem?
A Sequence-to-Sequence (Seq2Seq) problem is one where:
- The input is a sequence
- The output is also a sequence
- The lengths of input and output may be different
Examples:
- Machine translation (English → French)
- Text summarization (long text → short summary)
- Question answering (question → answer sentence)
- Speech-to-text
Why Normal RNNs Are Not Enough
A standard RNN:
- Processes a sequence step by step
- Produces one output per input step, so the output length is tied to the input length
But in translation:
- Input sentence length ≠ output sentence length
- We need to understand the full input before generating any output
This limitation led to the development of Seq2Seq models.
Core Idea of Seq2Seq Models
Seq2Seq models split the task into two main parts:
- Encoder: reads and understands the input sequence
- Decoder: generates the output sequence
The encoder converts the input sequence into a context representation, and the decoder uses that context to produce the output step by step.
Encoder–Decoder Architecture
The architecture looks like this conceptually:
- Encoder RNN reads input word by word
- Final hidden state represents the entire input meaning
- Decoder RNN starts generating output words
The encoder and decoder can be built using:
- LSTM
- GRU
- Bidirectional variants (encoder side)
Simple Example: Machine Translation
Consider this translation:
Input (English): “I love NLP”
Output (French): “J’aime le NLP”
How Seq2Seq handles this:
- Encoder reads “I → love → NLP”
- Encoder creates a context vector
- Decoder generates “J’aime → le → NLP”
The decoder stops when it predicts an end-of-sentence token.
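This step-by-step generation with a stop condition can be sketched as a simple loop. Below is a toy, framework-free sketch: `next_word` is a hypothetical stand-in for one decoder step (it uses a hard-coded lookup table rather than a real model), and the `<sos>`/`<eos>` marker names are illustrative.

```python
def next_word(context, prev):
    # Hypothetical stand-in for one decoder step: maps (context, previous
    # word) to the next word. A real decoder would run an RNN step here.
    table = {"<sos>": "J'aime", "J'aime": "le", "le": "NLP", "NLP": "<eos>"}
    return table[prev]

def greedy_decode(context, max_len=10):
    output, prev = [], "<sos>"      # decoding starts from a start-of-sentence token
    for _ in range(max_len):        # cap the length in case <eos> never appears
        word = next_word(context, prev)
        if word == "<eos>":         # stop when the model predicts end-of-sentence
            break
        output.append(word)
        prev = word                 # feed the prediction back in as the next input
    return output

print(greedy_decode(context=None))  # → ["J'aime", 'le', 'NLP']
```

Note the `max_len` cap: without it, a decoder that never predicts the end token would generate forever.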
Training vs Inference (Very Important)
Seq2Seq models behave differently during:
- Training
- Inference (prediction)
During training:
- The correct previous word is given to the decoder
- This is called Teacher Forcing
During inference:
- The decoder uses its own previous prediction
- Errors can accumulate
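In data terms, teacher forcing comes down to how the decoder's input and target sequences are aligned: the decoder input is the reference sentence shifted right by one position, starting with a start token. A minimal sketch, assuming illustrative `<sos>`/`<eos>` marker tokens:

```python
def make_decoder_pairs(target_sentence):
    """Build (decoder_input, decoder_target) sequences for teacher forcing.
    <sos>/<eos> are illustrative marker-token names."""
    tokens = target_sentence.split()
    decoder_input = ["<sos>"] + tokens    # what the decoder sees at each step
    decoder_target = tokens + ["<eos>"]   # what it must predict at each step
    return decoder_input, decoder_target

inp, tgt = make_decoder_pairs("J'aime le NLP")
print(inp)  # ['<sos>', "J'aime", 'le', 'NLP']
print(tgt)  # ["J'aime", 'le', 'NLP', '<eos>']
```

At step t, the decoder receives the correct word `decoder_input[t]` and is trained to predict `decoder_target[t]`; at inference time, that correct word is replaced by the model's own previous prediction.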
Basic Seq2Seq Model (Conceptual Code)
Below is a simplified conceptual example of a Seq2Seq model.
Where to run this code:
- Google Colab (recommended)
- Jupyter Notebook with TensorFlow
```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Encoder: reads variable-length sequences of 128-dim word vectors
encoder_inputs = Input(shape=(None, 128))
encoder_lstm = LSTM(256, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]  # the "context" handed to the decoder

# Decoder: starts from the encoder states and emits one prediction per step
decoder_inputs = Input(shape=(None, 128))
decoder_lstm = LSTM(256, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)

# Dense layer: scores every word in a 5,000-word output vocabulary
decoder_dense = Dense(5000, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Training model: takes both encoder and decoder inputs (teacher forcing)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
```
Understanding the Code
Let’s break it down step by step:
- Encoder: processes the full input sequence
- Encoder states: capture the sentence meaning
- Decoder: generates output using encoder states
- Dense layer: predicts the next word
This structure forms the foundation of many NLP systems.
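To make the expected input shapes concrete, here is a sketch of dummy data matching the model above (batch size, sequence lengths, and vocabulary size are arbitrary choices; the inputs are assumed to be pre-embedded 128-dim vectors, since the model has no Embedding layer). Only NumPy is needed for this part; the standard Keras training calls are shown as comments.

```python
import numpy as np

batch, t_in, t_out = 32, 7, 5   # batch size and sequence lengths (arbitrary)
embed_dim, vocab = 128, 5000    # must match the model definition above

# Dummy pre-embedded input sequences for the encoder and decoder
encoder_input_data = np.random.rand(batch, t_in, embed_dim)
decoder_input_data = np.random.rand(batch, t_out, embed_dim)

# Targets: one-hot next-word distributions over the 5,000-word vocabulary
decoder_target_data = np.zeros((batch, t_out, vocab))
ids = np.random.randint(0, vocab, size=(batch, t_out))
for b in range(batch):
    for t in range(t_out):
        decoder_target_data[b, t, ids[b, t]] = 1.0

# These arrays would then be fed to the model, e.g.:
# model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data)
print(decoder_target_data.shape)  # (32, 5, 5000)
```

Note that the decoder input and target have the same length: at each step the decoder reads one (teacher-forced) word and predicts the next one.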
Limitations of Basic Seq2Seq Models
Classic Seq2Seq models have important limitations:
- Single context vector can become a bottleneck
- Long sentences are harder to encode
- Performance drops with long dependencies
These limitations led to the development of Attention mechanisms, which you will learn next.
Where Seq2Seq Models Are Used
Seq2Seq models form the backbone of:
- Machine translation systems
- Chatbots
- Text summarization
- Speech recognition
Even modern transformers are conceptually built on the encoder–decoder idea.
Assignment / Homework
Theory:
- Explain why Seq2Seq models are needed
- Describe encoder and decoder roles
Practical:
- Build a simple Seq2Seq model using dummy data
- Experiment with LSTM vs GRU
Practice Environment:
- Google Colab
- Jupyter Notebook
Practice Questions
Q1. What is the main goal of a Seq2Seq model?
Q2. What does the encoder produce?
Quick Quiz
Q1. Which technique feeds the correct word during training?
Q2. Why do Seq2Seq models struggle with long sentences?
Quick Recap
- Seq2Seq models map sequences to sequences
- They use encoder–decoder architecture
- Widely used in translation and summarization
- Basic Seq2Seq models have context bottlenecks
Next lesson: Encoder–Decoder Architecture (Deep Dive)