Sequence-to-Sequence (Seq2Seq) Models
So far, you have learned how RNNs, LSTMs, GRUs, and Bidirectional models process sequences and understand context.
But many real-world NLP problems require something more powerful: mapping one sequence to another sequence.
In this lesson, you will understand:
- What Seq2Seq models are
- Why simple RNNs are not enough
- How encoder–decoder architecture works
- Where Seq2Seq models are used in NLP
What Is a Sequence-to-Sequence Problem?
A Sequence-to-Sequence (Seq2Seq) problem is one where:
- The input is a sequence
- The output is also a sequence
- The lengths of input and output may be different
Examples:
- Machine translation (English → French)
- Text summarization (long text → short summary)
- Question answering (question → answer sentence)
- Speech-to-text
Why Normal RNNs Are Not Enough
A standard RNN:
- Processes a sequence step by step
- Produces one output per input step, so the output length is tied to the input length
But in translation:
- Input sentence length ≠ output sentence length
- We need to understand the full input before generating any output
This limitation led to the development of Seq2Seq models.
Core Idea of Seq2Seq Models
Seq2Seq models split the task into two main parts:
- Encoder: reads and understands the input sequence
- Decoder: generates the output sequence
The encoder converts the input sequence into a context representation, and the decoder uses that context to produce the output step by step.
Encoder–Decoder Architecture
The architecture looks like this conceptually:
- Encoder RNN reads input word by word
- Final hidden state represents the entire input meaning
- Decoder RNN starts generating output words
The encoder and decoder can be built using:
- LSTM
- GRU
- Bidirectional variants (encoder side)
Simple Example: Machine Translation
Consider this translation:
Input (English): “I love NLP”
Output (French): “J’aime le NLP”
How Seq2Seq handles this:
- Encoder reads “I → love → NLP”
- Encoder creates a context vector
- Decoder generates “J’aime → le → NLP”
The decoder stops when it predicts an end-of-sentence token.
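This step-by-step generation with a stop condition can be sketched as a simple loop. Below is a toy, framework-free sketch: `next_word` is a hypothetical stand-in for one decoder step (it uses a hard-coded lookup table rather than a real model), and the `<sos>`/`<eos>` marker names are illustrative.

```python
def next_word(context, prev):
    # Hypothetical stand-in for one decoder step: maps (context, previous
    # word) to the next word. A real decoder would run an RNN step here.
    table = {"<sos>": "J'aime", "J'aime": "le", "le": "NLP", "NLP": "<eos>"}
    return table[prev]

def greedy_decode(context, max_len=10):
    output, prev = [], "<sos>"      # decoding starts from a start-of-sentence token
    for _ in range(max_len):        # cap the length in case <eos> never appears
        word = next_word(context, prev)
        if word == "<eos>":         # stop when the model predicts end-of-sentence
            break
        output.append(word)
        prev = word                 # feed the prediction back in as the next input
    return output

print(greedy_decode(context=None))  # → ["J'aime", 'le', 'NLP']
```

Note the `max_len` cap: without it, a decoder that never predicts the end token would generate forever.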
Training vs Inference (Very Important)
Seq2Seq models behave differently during:
- Training
- Inference (prediction)
During training:
- The correct previous word is given to the decoder
- This is called Teacher Forcing
During inference:
- The decoder uses its own previous prediction
- Errors can accumulate
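In data terms, teacher forcing comes down to how the decoder's input and target sequences are aligned: the decoder input is the reference sentence shifted right by one position, starting with a start token. A minimal sketch, assuming illustrative `<sos>`/`<eos>` marker tokens:

```python
def make_decoder_pairs(target_sentence):
    """Build (decoder_input, decoder_target) sequences for teacher forcing.
    <sos>/<eos> are illustrative marker-token names."""
    tokens = target_sentence.split()
    decoder_input = ["<sos>"] + tokens    # what the decoder sees at each step
    decoder_target = tokens + ["<eos>"]   # what it must predict at each step
    return decoder_input, decoder_target

inp, tgt = make_decoder_pairs("J'aime le NLP")
print(inp)  # ['<sos>', "J'aime", 'le', 'NLP']
print(tgt)  # ["J'aime", 'le', 'NLP', '<eos>']
```

At step t, the decoder receives the correct word `decoder_input[t]` and is trained to predict `decoder_target[t]`; at inference time, that correct word is replaced by the model's own previous prediction.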
Basic Seq2Seq Model (Conceptual Code)
Below is a simplified conceptual example of a Seq2Seq model.
Where to run this code:
- Google Colab (recommended)
- Jupyter Notebook with TensorFlow
```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Encoder: reads variable-length sequences of 128-dim word vectors
encoder_inputs = Input(shape=(None, 128))
encoder_lstm = LSTM(256, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]  # the "context" handed to the decoder

# Decoder: starts from the encoder states and emits one prediction per step
decoder_inputs = Input(shape=(None, 128))
decoder_lstm = LSTM(256, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)

# Dense layer: scores every word in a 5,000-word output vocabulary
decoder_dense = Dense(5000, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Training model: takes both encoder and decoder inputs (teacher forcing)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
```
Understanding the Code
Let’s break it down step by step:
- Encoder: processes the full input sequence
- Encoder states: capture the sentence meaning
- Decoder: generates output using encoder states
- Dense layer: predicts the next word
This structure forms the foundation of many NLP systems.
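To make the expected input shapes concrete, here is a sketch of dummy data matching the model above (batch size, sequence lengths, and vocabulary size are arbitrary choices; the inputs are assumed to be pre-embedded 128-dim vectors, since the model has no Embedding layer). Only NumPy is needed for this part; the standard Keras training calls are shown as comments.

```python
import numpy as np

batch, t_in, t_out = 32, 7, 5   # batch size and sequence lengths (arbitrary)
embed_dim, vocab = 128, 5000    # must match the model definition above

# Dummy pre-embedded input sequences for the encoder and decoder
encoder_input_data = np.random.rand(batch, t_in, embed_dim)
decoder_input_data = np.random.rand(batch, t_out, embed_dim)

# Targets: one-hot next-word distributions over the 5,000-word vocabulary
decoder_target_data = np.zeros((batch, t_out, vocab))
ids = np.random.randint(0, vocab, size=(batch, t_out))
for b in range(batch):
    for t in range(t_out):
        decoder_target_data[b, t, ids[b, t]] = 1.0

# These arrays would then be fed to the model, e.g.:
# model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data)
print(decoder_target_data.shape)  # (32, 5, 5000)
```

Note that the decoder input and target have the same length: at each step the decoder reads one (teacher-forced) word and predicts the next one.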
Limitations of Basic Seq2Seq Models
Classic Seq2Seq models have important limitations:
- Single context vector can become a bottleneck
- Long sentences are harder to encode
- Performance drops with long dependencies
These limitations led to the development of Attention mechanisms, which you will learn next.
Where Seq2Seq Models Are Used
Seq2Seq models form the backbone of:
- Machine translation systems
- Chatbots
- Text summarization
- Speech recognition
Even modern transformers are conceptually built on the encoder–decoder idea.
Assignment / Homework
Theory:
- Explain why Seq2Seq models are needed
- Describe encoder and decoder roles
Practical:
- Build a simple Seq2Seq model using dummy data
- Experiment with LSTM vs GRU
Practice Environment:
- Google Colab
- Jupyter Notebook
Practice Questions
Q1. What is the main goal of a Seq2Seq model?
Q2. What does the encoder produce?
Quick Quiz
Q1. Which technique feeds the correct word during training?
Q2. Why do Seq2Seq models struggle with long sentences?
Quick Recap
- Seq2Seq models map sequences to sequences
- They use encoder–decoder architecture
- Widely used in translation and summarization
- Basic Seq2Seq models have context bottlenecks
Next lesson: Encoder–Decoder Architecture (Deep Dive)