Encoder–Decoder Models
Encoder–Decoder models are a fundamental deep learning architecture for problems where both the input and the output are sequences, possibly of different lengths.
These models form the backbone of many powerful systems for machine translation, text summarization, speech recognition, and conversational AI.
The Core Problem Encoder–Decoder Solves
Traditional neural networks assume fixed-size inputs and outputs.
However, many real-world problems do not follow this rule.
Examples:
• Translating a sentence from English to French
• Converting speech audio into text
• Generating a summary from a long document
In each case, the input and output sequence lengths differ. The Encoder–Decoder architecture was designed specifically for this scenario.
High-Level Architecture
The architecture is divided into two main components:
• Encoder
• Decoder
The encoder reads the entire input sequence and compresses the information into a fixed-length representation, often called a context vector.
The decoder then uses this representation to generate the output sequence step by step.
Understanding the Encoder
The encoder processes the input sequence one element at a time.
At each step, it updates its hidden state based on the current input and the previous hidden state.
Once the full input sequence is processed, the final hidden state is assumed to capture the meaning of the entire sequence.
In practice, encoders are implemented using:
• RNN
• LSTM
• GRU
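To make the hidden-state update concrete, here is a toy NumPy sketch of a plain RNN encoder. Everything in it is illustrative rather than taken from a library: the function name encode, the random weight matrices W and U, the hidden size of 4, and the feature size of 8 are assumptions made for the example. The key point is that sequences of different lengths are folded, one step at a time, into a context vector of the same fixed size.

import numpy as np

def encode(inputs, hidden_size=4):
    # Toy RNN encoder: folds a variable-length sequence into one fixed-size vector.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(hidden_size, hidden_size)) * 0.1       # recurrent weights (illustrative)
    U = rng.normal(size=(hidden_size, inputs.shape[1])) * 0.1   # input weights (illustrative)
    h = np.zeros(hidden_size)
    for x_t in inputs:                 # one step per input element
        h = np.tanh(W @ h + U @ x_t)   # new state = f(previous state, current input)
    return h                           # the context vector

# Sequences of length 3 and 7 both map to a context vector of the same shape
rng = np.random.default_rng(1)
print(encode(rng.normal(size=(3, 8))).shape)   # (4,)
print(encode(rng.normal(size=(7, 8))).shape)   # (4,)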
Understanding the Decoder
The decoder is another sequence model that generates the output sequence.
It starts with the context vector produced by the encoder and generates one output element at a time.
At each step, the decoder:
• Uses the previous output
• Uses the previous hidden state
• Predicts the next output token
This continues until a special end-of-sequence token is generated.
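The generation loop can also be sketched in toy NumPy form. As before, this is an illustration rather than a library API: the function decode, the random weights, and the choice of reusing the end-of-sequence embedding as a start token are all assumptions. At each step the sketch combines the previous hidden state with the previous output, greedily predicts the most likely next token, and stops when the end-of-sequence token is produced.

import numpy as np

def decode(context, embeddings, W_out, eos_id=0, max_steps=20):
    # Toy greedy decoder: starts from the context vector and emits one token per step.
    rng = np.random.default_rng(0)
    hidden_size = context.shape[0]
    W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1          # recurrent weights (illustrative)
    W_x = rng.normal(size=(hidden_size, embeddings.shape[1])) * 0.1  # weights for the previous output
    h = context.copy()              # hidden state initialized from the encoder's context vector
    prev = embeddings[eos_id]       # start token (the EOS embedding doubles as <start> here)
    outputs = []
    for _ in range(max_steps):
        h = np.tanh(W_h @ h + W_x @ prev)   # combine previous hidden state and previous output
        token = int(np.argmax(W_out @ h))   # predict the most likely next token
        if token == eos_id:                 # stop at the end-of-sequence token
            break
        outputs.append(token)
        prev = embeddings[token]            # feed the prediction back in at the next step
    return outputs

# Illustrative sizes: 4-dim context vector, 10-token vocabulary, 6-dim embeddings
rng = np.random.default_rng(1)
print(decode(rng.normal(size=4), rng.normal(size=(10, 6)), rng.normal(size=(10, 4))))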
Real-World Example: Language Translation
Suppose the input sentence is:
"I am learning deep learning"
The encoder reads the entire sentence and encodes its meaning.
The decoder then produces:
"Je suis en train d'apprendre l'apprentissage profond"
Even though the sentence lengths differ, the semantic meaning is preserved.
Limitations of Basic Encoder–Decoder Models
While powerful, early encoder–decoder models suffered from a major limitation.
They relied on a single fixed-length context vector to represent the entire input.
For long sequences, this becomes a bottleneck. Important information may be lost.
This limitation directly led to the development of Attention Mechanisms, which we will explore in upcoming lessons.
Simple Encoder–Decoder Structure (Conceptual Code)
from tensorflow.keras.layers import Input, LSTM

# Variable-length input and target sequences (the feature size of 128 is illustrative)
input_sequence = Input(shape=(None, 128))
target_sequence = Input(shape=(None, 128))

encoder = LSTM(units=256, return_state=True)
decoder = LSTM(units=256, return_sequences=True)

# Encoder processes the input sequence and returns its final states
encoder_output, state_h, state_c = encoder(input_sequence)

# Decoder uses the encoder's states as its initial state
decoder_output = decoder(target_sequence, initial_state=[state_h, state_c])
This example illustrates how the encoder’s final states initialize the decoder.
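For completeness, one possible way to turn the snippet above into a trainable model is sketched below. It assumes the Keras objects defined above and a hypothetical target vocabulary of 5,000 tokens: the decoder outputs are projected onto the vocabulary with a softmax layer, and the encoder and decoder are wrapped into a single Model.

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

vocab_size = 5000  # hypothetical target vocabulary size

# Project each decoder output onto the target vocabulary
predictions = Dense(vocab_size, activation="softmax")(decoder_output)

# Wrap encoder and decoder into one trainable model
model = Model(inputs=[input_sequence, target_sequence], outputs=predictions)
model.compile(optimizer="adam", loss="categorical_crossentropy")

In a full implementation, the inputs would typically come from embedding layers and training would use teacher forcing; this sketch only shows where the pieces connect.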
Why Encoder–Decoder Models Matter
The Encoder–Decoder architecture marked a turning point in sequence modeling.
It enables machines to:
• Understand entire sequences before responding
• Generate flexible-length outputs
• Model complex temporal relationships
Almost every modern sequence model builds upon this foundation.
Mini Practice
Think about the following:
• Why is a single fixed-size context vector limiting for long sentences?
• What happens when the input sequence becomes very long?
Exercises
Exercise 1:
What is the primary role of the encoder?
Exercise 2:
Why does the decoder need previous outputs?
Quick Check
Q: Can encoder–decoder models handle variable-length input and output?
In the next lesson, we will build upon this architecture and explore Sequence-to-Sequence (Seq2Seq) models with practical insights.