Generative AI Course
Encoder–Decoder Architecture: How Transformers Generate Outputs
Not all generative problems are solved by predicting the next token.
Some tasks require the model to read an entire input, understand it deeply, and then produce a structured output.
This lesson explains how the encoder–decoder architecture enables that behavior, and how engineers decide when to use it.
The Problem This Architecture Solves
Consider tasks like:
- Machine translation
- Summarization
- Question answering
In all these cases:
- Input and output lengths differ
- Output depends on the full input
- Generation must be guided, not free-form
A single-stream, next-token-only model is a poor fit here: these tasks call for reading the whole input before writing any of the output.
Why Encoder and Decoder Are Separated
Engineers separate responsibilities:
- Encoder: Understand the input
- Decoder: Generate the output
This separation makes the system easier to train, debug, and reason about.
What the Encoder Actually Does
The encoder processes the entire input sequence at once.
It produces contextual representations for every input token.
After encoding, the input is no longer raw text: it is a sequence of context-aware vectors, one per input token.
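This behavior can be seen directly in PyTorch. The sizes below are toy values chosen purely for illustration (real models use d_model of 512 or more):

```python
import torch
import torch.nn as nn

# Toy sizes for illustration only.
d_model, nhead, src_len, batch = 16, 4, 10, 2

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(batch, src_len, d_model)  # already-embedded input tokens
memory = encoder(src)

# One contextual vector per input token: the shape is unchanged,
# but each position now mixes information from the whole sequence.
print(memory.shape)  # torch.Size([2, 10, 16])
```

Note that the encoder runs once over the full input; nothing about it is step-by-step.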
What the Decoder Actually Does
The decoder generates output tokens step by step.
At each step, it:
- Looks at previously generated tokens
- Attends to the encoder’s output
- Predicts the next token
This is controlled, conditional generation.
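The three steps above can be sketched as a greedy decoding loop. Everything here (the untrained `decoder`, `embed`, and `to_vocab` modules, and treating token id 0 as a start token) is a hypothetical stand-in, not a real trained model, so the generated ids are meaningless; only the data flow matters:

```python
import torch
import torch.nn as nn

# Hypothetical toy modules -- untrained, for illustrating the loop only.
d_model, vocab = 16, 100
layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=4, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)
embed = nn.Embedding(vocab, d_model)
to_vocab = nn.Linear(d_model, vocab)

memory = torch.randn(1, 10, d_model)  # encoder output, fixed during decoding
tokens = torch.tensor([[0]])          # assume id 0 is a start-of-sequence token

for _ in range(5):
    tgt = embed(tokens)                               # previously generated tokens
    mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
    out = decoder(tgt, memory, tgt_mask=mask)         # attends to past + memory
    next_token = to_vocab(out[:, -1]).argmax(-1)      # predicts the next token
    tokens = torch.cat([tokens, next_token.unsqueeze(1)], dim=1)

print(tokens.shape)  # torch.Size([1, 6]) -- start token plus 5 generated tokens
```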
How Information Flows Between Them
The key connection is cross-attention.
Cross-attention allows the decoder to:
- Focus on relevant parts of the input
- Ignore irrelevant details
- Dynamically adjust during generation
This is why translations stay aligned with input meaning.
High-Level Architecture Flow
An encoder–decoder Transformer follows this flow:
- Input tokens → encoder
- Encoder outputs → memory
- Decoder attends to memory + past outputs
- Next token prediction
Each component has a clear role.
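The four-step flow above maps onto a single forward pass of PyTorch's built-in nn.Transformer. The tensors here are random embeddings at toy sizes, standing in for real embedded tokens:

```python
import torch
import torch.nn as nn

# The whole pipeline in one call (toy sizes, untrained weights).
model = nn.Transformer(d_model=16, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)

src = torch.randn(2, 10, 16)  # input tokens -> encoder
tgt = torch.randn(2, 7, 16)   # past outputs -> decoder
tgt_mask = nn.Transformer.generate_square_subsequent_mask(7)

# Internally: encoder(src) becomes memory; decoder attends to memory + tgt.
out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([2, 7, 16]) -- one prediction slot per target position
```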
Thinking Like an Engineer Before Coding
Before implementation, engineers decide:
- Input representation size
- Number of encoder layers
- Number of decoder layers
- Attention head configuration
These decisions affect accuracy and latency.
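One cheap way to weigh these decisions before training anything is to compare parameter counts across candidate configurations. The helper below is a hypothetical sketch, not a profiling tool:

```python
import torch.nn as nn

def param_count(num_encoder_layers, num_decoder_layers, d_model=512, nhead=8):
    # Build a throwaway model just to count its trainable parameters.
    m = nn.Transformer(d_model=d_model, nhead=nhead,
                       num_encoder_layers=num_encoder_layers,
                       num_decoder_layers=num_decoder_layers)
    return sum(p.numel() for p in m.parameters())

# Deeper stacks cost more parameters, and usually more latency.
print(param_count(6, 6) > param_count(2, 2))  # True
```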
Minimal Encoder–Decoder Skeleton
This example shows structure, not a full production model.
import torch
import torch.nn as nn

# Six identical encoder layers; each token is represented by a 512-dim
# vector, with attention split across 8 heads.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# A matching decoder stack. At run time it also needs the encoder's
# output (the "memory") in every forward call.
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
This code defines two independent stacks.
What matters is how they interact.
How Encoder Output Is Used
The encoder produces a memory tensor.
The decoder queries this memory at every generation step.
This is how the output remains grounded in the input.
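Continuing the skeleton defined above, the asymmetry is visible in the call signatures: the encoder takes only the source, while the decoder takes both its own inputs and the memory. PyTorch's default layout without batch_first is (seq_len, batch, d_model):

```python
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

src = torch.randn(10, 2, 512)  # (seq_len, batch, d_model)
tgt = torch.randn(7, 2, 512)

memory = encoder(src)      # computed once per input
out = decoder(tgt, memory) # memory is queried at every generation step
print(out.shape)  # torch.Size([7, 2, 512])
```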
Cross-Attention Conceptually
In cross-attention:
- Queries come from the decoder
- Keys and values come from the encoder
This allows precise alignment between input and output.
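A minimal sketch of this using nn.MultiheadAttention directly (toy sizes): the decoder states supply the queries, the encoder states supply the keys and values, and the returned attention weights form an alignment over input positions:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)

decoder_states = torch.randn(1, 7, 16)   # queries: 7 output positions
encoder_states = torch.randn(1, 10, 16)  # keys and values: 10 input positions

out, weights = attn(query=decoder_states, key=encoder_states, value=encoder_states)
print(out.shape)      # torch.Size([1, 7, 16]) -- one result per decoder position
print(weights.shape)  # torch.Size([1, 7, 10]) -- alignment over the 10 input positions
```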
Why Decoder Is Autoregressive
The decoder predicts one token at a time.
Masking ensures it cannot see future tokens.
This preserves causal generation.
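The mask itself makes this concrete: -inf entries mark future positions, which become zero attention weight after softmax, so position i can only attend to positions up to i:

```python
import torch
import torch.nn as nn

mask = nn.Transformer.generate_square_subsequent_mask(4)
print(mask)
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])
```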
Where This Architecture Is Used Today
- Translation systems
- Instruction-following models
- Speech-to-text pipelines
Encoder–decoder models are still widely used in production.
Common Learner Mistakes
- Confusing self-attention with cross-attention
- Thinking encoder output is static text
- Assuming the decoder sees the full output at once (it only sees tokens generated so far)
Understanding data flow is critical here.
Practice
Which component processes the full input?
Which component generates output tokens?
What mechanism connects encoder and decoder?
Quick Quiz
Encoder primarily focuses on?
Why is the decoder masked?
Cross-attention enables?
Recap: Encoder–decoder Transformers separate understanding from generation using cross-attention.
Next up: Decoder-Only Models — why GPT-style architectures dominate modern LLMs.