NLP Lesson 36 – Seq2Seq | Dataplexa

Sequence-to-Sequence (Seq2Seq) Models

So far, you have learned how RNNs, LSTMs, GRUs, and Bidirectional models process sequences and understand context.

But many real-world NLP problems require something more powerful: mapping one sequence to another sequence.

In this lesson, you will understand:

  • What Seq2Seq models are
  • Why simple RNNs are not enough
  • How encoder–decoder architecture works
  • Where Seq2Seq models are used in NLP

What Is a Sequence-to-Sequence Problem?

A Sequence-to-Sequence (Seq2Seq) problem is one where:

  • The input is a sequence
  • The output is also a sequence
  • The lengths of input and output may be different

Examples:

  • Machine translation (English → French)
  • Text summarization (long text → short summary)
  • Question answering (question → answer sentence)
  • Speech-to-text
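The length mismatch is easy to see with a toy translation pair (the tokenization below is illustrative, not produced by any real tokenizer):

```python
# Hypothetical tokenized translation pair: the input and output
# sequences of a Seq2Seq problem need not have the same length.
source = ["I", "love", "NLP"]          # 3 tokens (English)
target = ["J'aime", "le", "NLP", "."]  # 4 tokens (French)

print(len(source), len(target))  # 3 4
```

A model that emits exactly one output per input step cannot handle this pair, which is precisely the gap Seq2Seq models fill.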

Why Normal RNNs Are Not Enough

A standard RNN typically:

  • Produces one output for every input step
  • Ties the output length to the input length

But in translation:

  • Input sentence length ≠ output sentence length
  • We need to understand the full input before generating output

This limitation led to the development of Seq2Seq models.


Core Idea of Seq2Seq Models

Seq2Seq models split the task into two main parts:

  • Encoder: reads and understands the input sequence
  • Decoder: generates the output sequence

The encoder converts the input sequence into a context representation, and the decoder uses that context to produce the output step by step.


Encoder–Decoder Architecture

The architecture looks like this conceptually:

  • Encoder RNN reads input word by word
  • Final hidden state represents the entire input meaning
  • Decoder RNN starts generating output words

The encoder and decoder can be built using:

  • LSTM
  • GRU
  • Bidirectional variants (encoder side)

Simple Example: Machine Translation

Consider this translation:

Input (English): “I love NLP”
Output (French): “J’aime le NLP”

How Seq2Seq handles this:

  • Encoder reads “I → love → NLP”
  • Encoder creates a context vector
  • Decoder generates “J’aime → le → NLP”

The decoder stops when it predicts an end-of-sentence token.
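The stopping behavior can be sketched as a simple generation loop. The lookup table below is a hypothetical stand-in for a trained decoder; the special tokens `<sos>` (start of sentence) and `<eos>` (end of sentence) are conventional names:

```python
# Toy sketch of the decoder's generation loop: a lookup table
# stands in for a trained decoder that predicts the next word.
next_token = {
    "<sos>": "J'aime",
    "J'aime": "le",
    "le": "NLP",
    "NLP": "<eos>",
}

def decode(start="<sos>", max_len=10):
    output, token = [], start
    for _ in range(max_len):          # max_len guards against infinite loops
        token = next_token[token]     # decoder predicts the next word
        if token == "<eos>":          # stop at the end-of-sentence token
            break
        output.append(token)
    return output

print(decode())  # ["J'aime", 'le', 'NLP']
```

A real decoder makes each prediction from its hidden state (initialized with the encoder's context) rather than a lookup table, but the loop structure is the same.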


Training vs Inference (Very Important)

Seq2Seq models behave differently during:

  • Training
  • Inference (prediction)

During training:

  • The correct previous word is given to the decoder
  • This is called Teacher Forcing

During inference:

  • The decoder uses its own previous prediction
  • Errors can accumulate
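The difference is easiest to see in what the decoder receives at each step. Under teacher forcing, the decoder input is simply the target sequence shifted right by one position (the token names below are illustrative):

```python
# What the decoder sees at each step during training (teacher forcing):
# the ground-truth previous token, regardless of what the model predicted.
target = ["<sos>", "J'aime", "le", "NLP", "<eos>"]

decoder_inputs = target[:-1]   # shifted-right targets fed to the decoder
decoder_targets = target[1:]   # the word the decoder must predict next

print(decoder_inputs)   # ['<sos>', "J'aime", 'le', 'NLP']
print(decoder_targets)  # ["J'aime", 'le', 'NLP', '<eos>']

# At inference time there is no ground truth: the decoder feeds back its
# OWN previous prediction, so one wrong word can derail what follows.
```

This shifted-input/shifted-target pairing is why training data for Seq2Seq models is usually prepared as (encoder input, decoder input, decoder target) triples.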

Basic Seq2Seq Model (Conceptual Code)

Below is a simplified conceptual example of a Seq2Seq model.

Where to run this code:

  • Google Colab (recommended)
  • Jupyter Notebook with TensorFlow

Python Example: Basic Seq2Seq Architecture

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Encoder: reads the input sequence (128-dimensional vectors per step)
encoder_inputs = Input(shape=(None, 128))
encoder_lstm = LSTM(256, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]  # the "context" handed to the decoder

# Decoder: generates the output sequence, starting from the encoder states
decoder_inputs = Input(shape=(None, 128))
decoder_lstm = LSTM(256, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(5000, activation='softmax')  # distribution over a 5,000-word vocabulary
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

Understanding the Code

Let’s break it down step by step:

  • Encoder: processes the full input sequence
  • Encoder states: capture the sentence meaning
  • Decoder: generates output using encoder states
  • Dense layer: predicts the next word

This structure forms the foundation of many NLP systems.
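To see the model actually train, you can fit it on random dummy data, as the homework below suggests. This is a minimal sketch: the batch size, sequence lengths, and one-epoch fit are arbitrary choices, and real data would use learned embeddings rather than random vectors:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Rebuild the same architecture as above
encoder_inputs = Input(shape=(None, 128))
_, state_h, state_c = LSTM(256, return_state=True)(encoder_inputs)

decoder_inputs = Input(shape=(None, 128))
decoder_outputs, _, _ = LSTM(256, return_sequences=True, return_state=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(5000, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# Integer word-index targets pair with sparse_categorical_crossentropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Dummy batch: 4 sentence pairs, 7 input steps, 5 output steps
enc_in = np.random.rand(4, 7, 128).astype('float32')
dec_in = np.random.rand(4, 5, 128).astype('float32')
dec_target = np.random.randint(0, 5000, size=(4, 5))

model.fit([enc_in, dec_in], dec_target, epochs=1, verbose=0)
preds = model.predict([enc_in, dec_in], verbose=0)
print(preds.shape)  # (4, 5, 5000): one vocabulary distribution per output step
```

Note that the encoder input has 7 steps while the decoder has 5: the two lengths are independent, which is the whole point of the architecture.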


Limitations of Basic Seq2Seq Models

Classic Seq2Seq models have important limitations:

  • Single context vector can become a bottleneck
  • Long sentences are harder to encode
  • Performance drops with long dependencies

These limitations led to the development of Attention mechanisms, which you will learn next.
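The bottleneck can be made concrete with a toy encoder. Here a mean over a random projection stands in for an RNN's final hidden state; the point is only that the context size is fixed regardless of input length:

```python
import numpy as np

# Illustration of the context bottleneck: no matter how long the input
# sequence is, the context handed to the decoder is the same fixed size.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256))  # stand-in for learned encoder weights

def encode(sequence):
    # Collapse all time steps into one 256-dimensional vector
    return (sequence @ W).mean(axis=0)

short_sent = rng.standard_normal((3, 128))   # 3-word sentence
long_sent = rng.standard_normal((50, 128))   # 50-word sentence

print(encode(short_sent).shape, encode(long_sent).shape)  # (256,) (256,)
```

A 50-word sentence must squeeze through the same 256 numbers as a 3-word one, which is why quality degrades on long inputs.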


Where Seq2Seq Models Are Used

Seq2Seq models form the backbone of:

  • Machine translation systems
  • Chatbots
  • Text summarization
  • Speech recognition

Even the Transformer architecture keeps the encoder–decoder idea, replacing recurrence with attention.


Assignment / Homework

Theory:

  • Explain why Seq2Seq models are needed
  • Describe encoder and decoder roles

Practical:

  • Build a simple Seq2Seq model using dummy data
  • Experiment with LSTM vs GRU

Practice Environment:

  • Google Colab
  • Jupyter Notebook

Practice Questions

Q1. What is the main goal of a Seq2Seq model?

To map an input sequence to an output sequence.

Q2. What does the encoder produce?

A context representation of the input sequence.

Quick Quiz

Q1. Which technique feeds the correct word during training?

Teacher Forcing.

Q2. Why do Seq2Seq models struggle with long sentences?

Because all information is compressed into a single context vector.

Quick Recap

  • Seq2Seq models map sequences to sequences
  • They use encoder–decoder architecture
  • Widely used in translation and summarization
  • Basic Seq2Seq models have context bottlenecks

Next lesson: Encoder–Decoder Architecture (Deep Dive)