NLP Lesson 41 – Machine Translation | Dataplexa

Machine Translation with Attention

In earlier lessons, you learned how Seq2Seq models work and how Attention mechanisms (Bahdanau and Luong) solve their major limitations.

Now we bring everything together. This lesson explains how Machine Translation systems actually use attention to translate sentences more accurately.

This is a very important lesson for:

  • NLP fundamentals
  • Deep learning interviews
  • Understanding Transformers later

What Is Machine Translation?

Machine Translation (MT) is the task of automatically translating text from one language to another.

Examples:

  • English → French
  • Spanish → English
  • Hindi → Telugu

Modern MT systems are built using neural networks and are called Neural Machine Translation (NMT).


Problems with Early Translation Systems

Before neural networks, translation systems were:

  • Rule-based (grammar rules)
  • Statistical (phrase tables)

These systems:

  • Were hard to scale
  • Failed on complex sentences
  • Did not generalize well

Neural models improved this dramatically.


Seq2Seq Translation (Without Attention)

In a basic Seq2Seq translation model:

  • The encoder reads the source sentence
  • The decoder generates the target sentence
  • A single context vector connects them

Example:

"I love NLP" → "J'aime le NLP"

The major issue:

  • A single fixed-size context vector must summarize the entire source sentence, which breaks down on long inputs
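The bottleneck is easy to see in code. In this minimal sketch (random vectors stand in for real RNN encoder states), the decoder of a plain Seq2Seq model receives only the final encoder state, whose size never grows with the input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": one hidden vector per source token (random stand-ins
# for real RNN hidden states).
source_tokens = ["I", "love", "NLP"]
hidden_size = 4
encoder_states = rng.normal(size=(len(source_tokens), hidden_size))

# Without attention, the decoder sees ONLY the final encoder state:
context = encoder_states[-1]   # one fixed vector for the whole sentence
print(context.shape)           # stays (4,) no matter how long the input is
```

Everything the decoder knows about the source sentence must fit into those few numbers, which is why quality degrades on long sentences.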

Why Attention Is Critical in Translation

In real translation:

  • Different output words depend on different input words
  • Word order may change
  • Long sentences need selective focus

Attention allows the decoder to:

  • Look at relevant source words
  • Ignore irrelevant parts
  • Align words properly

Translation with Attention – High-Level Flow

At each decoding step:

  1. Attention scores are computed between the current decoder state and every encoder state
  2. The scores are normalized with a softmax into attention weights
  3. The encoder states are weighted and summed into a context vector
  4. The decoder combines the context vector with its own state
  5. The next output word is generated

This happens for every word in the translated sentence.


Word Alignment (Key Concept)

Attention implicitly learns word alignment.

Example:

English: "I eat apples"
French: "Je mange des pommes"

Attention learns:

  • "I" ↔ "Je"
  • "eat" ↔ "mange"
  • "apples" ↔ "pommes"

This alignment is learned automatically — no labels needed.
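One way to see this: after training, the attention weights themselves act as a soft alignment table. The weight matrix below is hypothetical (hand-written for illustration, not from a trained model), but reading off the most-attended source word per target word recovers the alignment:

```python
import numpy as np

src = ["I", "eat", "apples"]
tgt = ["Je", "mange", "des", "pommes"]

# Hypothetical attention weights (rows: target steps, cols: source tokens).
# Each row sums to 1, as it would after a softmax.
weights = np.array([
    [0.90, 0.05, 0.05],   # "Je"     -> mostly "I"
    [0.05, 0.85, 0.10],   # "mange"  -> mostly "eat"
    [0.10, 0.30, 0.60],   # "des"    -> spread out (no single English source)
    [0.05, 0.05, 0.90],   # "pommes" -> mostly "apples"
])

# Reading off the alignment: the most-attended source word per target word.
for t, row in zip(tgt, weights):
    print(f"{t} <-> {src[row.argmax()]}")
```

Note the "des" row: some target words (articles, function words) have no single source counterpart, and attention naturally spreads across several inputs there.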


Encoder–Decoder with Attention Architecture

The full architecture consists of:

  • Encoder: processes source sentence
  • Attention: computes relevance
  • Decoder: generates translated sentence

The attention module sits between encoder and decoder, acting as a dynamic bridge.


Conceptual Translation Pseudocode

This pseudocode shows how translation works with attention.

Practice Environment:

  • Google Colab
  • Jupyter Notebook

Machine Translation with Attention – Flow:
encoder_states = encoder(source_sentence)

decoder_state = init_state
for each target_word:
    scores = attention(decoder_state, encoder_states)
    weights = softmax(scores)
    context = sum(weights * encoder_states)

    output_word, decoder_state = decoder(context, decoder_state)
    generate(output_word)
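The pseudocode above can be turned into runnable NumPy. This is a minimal sketch, not a trainable model: the encoder states and decoder transition are random stand-ins, and dot-product attention is assumed for scoring. What it demonstrates is the loop structure — a fresh context vector is computed at every decoding step:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
hidden = 4

# Stand-ins for a real trained model (random for illustration).
encoder_states = rng.normal(size=(3, hidden))    # one state per source token
decoder_state = np.zeros(hidden)                 # init_state
W = rng.normal(size=(2 * hidden, hidden)) * 0.1  # toy decoder transition

for step in range(4):                            # one iteration per target word
    scores = encoder_states @ decoder_state      # dot-product attention scores
    weights = softmax(scores)                    # normalized attention weights
    context = weights @ encoder_states           # weighted sum of encoder states

    # Toy decoder update: mix the context with the previous state.
    decoder_state = np.tanh(np.concatenate([context, decoder_state]) @ W)
    print(step, weights.round(2))                # weights differ at every step
```

In a real system a learned output layer would map each `decoder_state` to a distribution over target-vocabulary words; here the point is that `context` is recomputed per step rather than fixed once.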

Why Translation Quality Improves

Attention improves translation because:

  • Context is dynamic
  • Long sentences are handled better
  • Word order differences are learned
  • Rare words are translated more accurately

This was a major leap in NLP performance.


Bahdanau vs Luong in Translation

Aspect     Bahdanau               Luong
Scoring    Additive (NN-based)    Multiplicative (dot-product)
Speed      Slower                 Faster
Accuracy   Very high              High
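The two scoring functions can be compared side by side. This sketch assumes the standard formulations — Luong's "general" form, score = sᵀW·hᵢ, and Bahdanau's additive form, score = vᵀtanh(W₁s + W₂hᵢ) — with random weights purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
h = 4
s = rng.normal(size=h)          # current decoder state
H = rng.normal(size=(3, h))     # encoder states, one per source token

# Luong (multiplicative, "general" form): a single matrix product.
W = rng.normal(size=(h, h))
luong_scores = H @ (W @ s)

# Bahdanau (additive): a small feed-forward network with a tanh.
W1 = rng.normal(size=(h, h))
W2 = rng.normal(size=(h, h))
v = rng.normal(size=h)
bahdanau_scores = np.tanh(s @ W1.T + H @ W2.T) @ v

print(luong_scores.shape, bahdanau_scores.shape)  # one score per source token
```

Both produce one score per source token; the extra nonlinearity and parameters are why the additive form is slower, while the multiplicative form reduces to fast matrix products.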

Real-World Translation Systems

Attention-based translation is used in:

  • Google Translate (earlier NMT versions)
  • Microsoft Translator
  • Speech-to-text pipelines

Later, Transformers replaced RNN-based models, but attention remains the core idea.


Assignment / Homework

Theory:

  • Explain how attention helps word alignment
  • Explain why a single fixed context vector fails on long sentences

Practical:

  • Implement a simple attention-based translation model
  • Visualize attention weights for a translated sentence

Environment:

  • Google Colab
  • Jupyter Notebook

Practice Questions

Q1. What problem does attention solve in translation?

It removes the fixed context bottleneck and enables dynamic focus.

Q2. What is word alignment?

Mapping between source words and translated target words.

Quick Quiz

Q1. Does attention create one context vector or many?

Many — one per decoding step.

Q2. Which model replaced RNN-based translation later?

Transformers.

Quick Recap

  • Machine Translation converts text between languages
  • Attention enables dynamic focus on source words
  • Word alignment is learned automatically
  • Translation quality improves significantly
  • Attention paved the way for Transformers

Next lesson: Text Summarization