Luong Attention (Multiplicative Attention)
In the previous lesson, you learned Bahdanau Attention, also called Additive Attention. That mechanism introduced the idea of learning alignment with a small feed-forward neural network.
In this lesson, we study another important attention mechanism: Luong Attention, also known as Multiplicative Attention.
Luong Attention is simpler, faster, and widely used in practice, especially when computational efficiency matters.
Why Luong Attention Was Introduced
While Bahdanau Attention works very well, it has one limitation:
- It runs an additional feed-forward neural network for every score, which adds computation at every decoder step
Luong Attention was introduced to:
- Reduce computation
- Simplify attention scoring
- Improve training speed
Instead of using an additive neural network, Luong Attention relies on vector multiplication.
Key Idea Behind Luong Attention
Luong Attention computes relevance by measuring similarity between:
- Decoder hidden state
- Encoder hidden states
The more similar they are, the higher the attention score.
This is similar to measuring how closely two vectors point in the same direction, the same intuition behind cosine similarity.
Main Components Used
Luong Attention uses:
- Encoder hidden states (h₁, h₂, …, hₙ)
- Decoder hidden state (sₜ)
- Optional trainable weight matrix (W)
No extra neural network is required.
Luong Attention Score Functions
Luong proposed three scoring methods. All are based on multiplication.
1. Dot Product Attention
The simplest form of Luong Attention.
score(sₜ, hᵢ) = sₜᵀ · hᵢ
Here:
- No extra parameters
- Fast computation
- Requires the decoder and encoder hidden states to have the same dimension
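As a minimal NumPy sketch of the dot-product score (the vectors and hidden size here are illustrative, not from the lesson):

```python
import numpy as np

# Illustrative decoder state and one encoder state (hidden size 4)
s_t = np.array([1.0, 0.0, 2.0, 1.0])   # decoder hidden state s_t
h_i = np.array([0.5, 1.0, 1.0, 0.0])   # encoder hidden state h_i

# Dot-product score: s_t^T · h_i
score = s_t @ h_i
print(score)  # 2.5
```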
2. General Attention
This introduces a trainable weight matrix W.
score(sₜ, hᵢ) = sₜᵀ · W · hᵢ
This allows the model to learn a better similarity measure.
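A sketch of the general score in NumPy; the dimensions are illustrative, and the random matrix W stands in for a learned parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
d_dec, d_enc = 4, 6                      # decoder / encoder hidden sizes (illustrative)
s_t = rng.standard_normal(d_dec)         # decoder hidden state
h_i = rng.standard_normal(d_enc)         # encoder hidden state
W = rng.standard_normal((d_dec, d_enc))  # trainable in a real model; random here

# General score: s_t^T · W · h_i (a scalar)
score = s_t @ W @ h_i
print(np.ndim(score))  # 0
```

Note that W also lets the decoder and encoder hidden sizes differ, which plain dot-product attention cannot handle.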
3. Concatenation Attention (Luong Variant)
This variant closely resembles Bahdanau's additive scoring: the two states are concatenated and passed through a small network.
score(sₜ, hᵢ) = vᵀ · tanh(W · [sₜ ; hᵢ])
This variant is less common in practice.
From Scores to Attention Weights
Like all attention mechanisms, Luong Attention uses softmax to normalize the scores into weights that sum to 1:
αₜ,ᵢ = softmax(score(sₜ, hᵢ))
where the softmax is computed over all encoder positions i.
These weights represent how much focus each input word receives.
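A small, numerically stable softmax sketch (the example scores are made up):

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 0.5, -1.0])  # one score per encoder position
weights = softmax(scores)
print(round(weights.sum(), 6))  # 1.0
```

Higher scores receive larger weights, so the decoder focuses most on the best-matching encoder states.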
Context Vector Computation
The context vector is calculated as:
cₜ = Σᵢ αₜ,ᵢ · hᵢ
This is identical to Bahdanau Attention. The difference lies only in how the scores are computed.
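The weighted sum can be computed as a single matrix product; this sketch uses made-up encoder states and weights:

```python
import numpy as np

# Three illustrative encoder hidden states (hidden size 2), stacked as rows
H = np.array([[1.0, 0.0],   # h_1
              [0.0, 1.0],   # h_2
              [1.0, 1.0]])  # h_3
alpha = np.array([0.5, 0.3, 0.2])  # attention weights (sum to 1)

# Context vector: c_t = sum_i alpha_i * h_i
c_t = alpha @ H
print(c_t)  # [0.7 0.5]
```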
Why Luong Attention Is Faster
Luong Attention is computationally efficient because:
- No additional neural network
- Matrix multiplication is optimized on GPUs
- Fewer parameters to train
This makes it attractive for large datasets.
Conceptual Pseudocode
This pseudocode shows the logical flow.
```
for each decoder_step t:
    scores = []
    for each encoder_state h_i:
        scores.append(dot(decoder_state, h_i))
    attention_weights = softmax(scores)
    context_vector = sum(attention_weights[i] * encoder_states[i] for each i)
    output = decoder(context_vector, decoder_state)
```
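The pseudocode above can be made concrete in NumPy. This is a sketch of a single decoder step of dot-product attention; the states are random stand-ins, and the decoder output step is omitted:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # stable softmax
    return e / e.sum()

def luong_dot_attention(s_t, H):
    """One decoder step of dot-product (Luong) attention.

    s_t: decoder hidden state, shape (d,)
    H:   encoder hidden states stacked as rows, shape (n, d)
    """
    scores = H @ s_t              # (n,) one score per encoder state
    weights = softmax(scores)     # (n,) attention distribution
    context = weights @ H         # (d,) weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(42)
H = rng.standard_normal((5, 8))   # 5 encoder states, hidden size 8
s_t = rng.standard_normal(8)      # current decoder state
context, weights = luong_dot_attention(s_t, H)
print(context.shape, round(weights.sum(), 6))  # (8,) 1.0
```

Because the inner loop is a single matrix-vector product (`H @ s_t`), the whole step maps directly onto optimized GPU kernels, which is exactly the efficiency argument made above.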
Luong vs Bahdanau Attention
| Aspect | Bahdanau | Luong |
|---|---|---|
| Scoring | Additive (NN-based) | Multiplicative |
| Speed | Slower | Faster |
| Parameters | More | Fewer |
| Best use | Smaller datasets | Large-scale training |
Real-World Usage
Luong Attention is commonly used in:
- Neural Machine Translation
- Speech recognition systems
- Large-scale NLP pipelines
Many early production systems preferred Luong Attention for its speed advantage.
Assignment / Homework
Theory:
- Explain why Luong Attention is faster than Bahdanau
- List the three Luong scoring methods
Practical:
- Implement dot-product attention using NumPy
- Compare outputs with additive attention
Environment:
- Google Colab
- Jupyter Notebook
Practice Questions
Q1. Why is Luong Attention called multiplicative?
Q2. Which Luong variant has no trainable parameters?
Quick Quiz
Q1. Which attention mechanism is faster in practice?
Q2. Does Luong Attention use a separate neural network for scoring?
Quick Recap
- Luong Attention uses vector multiplication
- It is faster and simpler than Bahdanau
- Dot, general, and concat variants exist
- Context vector logic remains the same
- Widely used in production systems
Next lesson: Machine Translation with Attention