GloVe – Global Vectors for Word Representation
In the previous lesson, you learned about Word2Vec, which learns word embeddings from local context windows.
In this lesson, we explore GloVe, a powerful embedding method that combines global corpus statistics with vector-space learning.
By the end of this lesson, you will clearly understand:
- Why GloVe was introduced
- How it differs from Word2Vec
- How GloVe captures global meaning
- When to use GloVe in NLP tasks
Why Do We Need GloVe?
Word2Vec learns word meaning based on local context windows. It looks at nearby words but does not directly use global word statistics.
Example problem:
Word2Vec may not fully capture how frequently two words co-occur across the entire corpus.
GloVe was introduced to solve this limitation by using global co-occurrence information.
What Is GloVe?
GloVe stands for Global Vectors (short for Global Vectors for Word Representation).
It is a word embedding technique that learns vectors by analyzing how often words appear together across the entire dataset.
Key idea:
Word meaning comes from global word co-occurrence statistics.
Core Intuition Behind GloVe
GloVe builds a large matrix called the co-occurrence matrix.
Each cell represents:
- How often word A appears near word B
Example:
- “ice” appears often near “cold”
- “fire” appears often near “hot”
GloVe learns vectors so that ratios of co-occurrence probabilities encode meaning.
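The ratio idea can be made concrete with a tiny sketch. The counts below are made up purely for illustration (they are not taken from a real corpus): a word that is related to "ice" but not "fire" produces a ratio far above 1, a word related to "fire" but not "ice" produces a ratio far below 1, and a neutral word like "the" gives a ratio near 1.

```python
# Illustrative co-occurrence counts (made-up numbers, not real corpus data)
counts = {
    ("ice",  "cold"): 80, ("ice",  "hot"):  5, ("ice",  "the"): 300,
    ("fire", "cold"):  4, ("fire", "hot"): 90, ("fire", "the"): 310,
}
totals = {"ice": 1000, "fire": 1000}  # total co-occurrences per word

def p(word, context):
    """P(context | word): how often `context` appears near `word`."""
    return counts[(word, context)] / totals[word]

for k in ["cold", "hot", "the"]:
    ratio = p("ice", k) / p("fire", k)
    print(f"P({k}|ice) / P({k}|fire) = {ratio:.2f}")
```

Only the discriminative context words push the ratio far from 1, which is exactly the signal GloVe's vectors are trained to encode.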
Simple Example (Conceptual)
Consider the words:
- king
- queen
- man
- woman
GloVe captures global patterns such as:
king − man + woman ≈ queen
This happens because GloVe preserves global semantic relationships.
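As a toy illustration of the analogy, the sketch below uses hand-made 2-dimensional vectors (dimensions loosely meaning "royalty" and "gender") rather than real GloVe embeddings, and checks which word lies closest to king − man + woman under cosine similarity:

```python
import numpy as np

# Hypothetical 2-d vectors for illustration only; real GloVe vectors
# are 50-300 dimensional and learned from corpus statistics.
toy = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = toy["king"] - toy["man"] + toy["woman"]

# Find the closest word, excluding the three query words themselves
candidates = {w: v for w, v in toy.items() if w not in {"king", "man", "woman"}}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(best)  # -> queen
```

With real pretrained vectors the same arithmetic works approximately rather than exactly, but the nearest neighbor of the result is still "queen".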
How GloVe Works (High-Level Steps)
GloVe training follows these steps:
- Scan the entire corpus
- Build a word–word co-occurrence matrix
- Apply a weighted least-squares objective
- Learn word vectors that encode ratios of co-occurrences
Unlike Word2Vec, GloVe is not a purely predictive model. It fits vectors directly to the co-occurrence matrix with a weighted least-squares objective, which makes it closely related to matrix factorization.
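The steps above can be sketched end to end on a toy corpus. This is a minimal, illustrative implementation, not the official one: the corpus and hyperparameters are made up, x_max is set low because the corpus is tiny (the paper uses 100), and plain SGD stands in for the AdaGrad optimizer used in the original paper.

```python
import numpy as np
from collections import defaultdict

# A tiny toy corpus; real GloVe is trained on billions of tokens.
corpus = ["ice is cold", "fire is hot", "ice is solid", "fire is bright"]
window = 2

# Steps 1-2: scan the corpus and build the word-word co-occurrence matrix.
vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}
X = defaultdict(float)
for sent in corpus:
    toks = sent.split()
    for i in range(len(toks)):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if i != j:
                X[(idx[toks[i]], idx[toks[j]])] += 1.0 / abs(i - j)  # closer pairs count more

# Step 3: the weighting function; x_max=2 here because the corpus is tiny.
def f(x, x_max=2.0, alpha=0.75):
    return (x / x_max) ** alpha if x < x_max else 1.0

# Step 4: fit vectors by SGD on the weighted least-squares objective
#   sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
rng = np.random.default_rng(0)
d = 5
W  = rng.normal(scale=0.1, size=(len(vocab), d))   # word vectors
Wc = rng.normal(scale=0.1, size=(len(vocab), d))   # context vectors
b  = np.zeros(len(vocab))
bc = np.zeros(len(vocab))
lr = 0.05

def loss():
    return sum(f(x) * (W[i] @ Wc[j] + b[i] + bc[j] - np.log(x)) ** 2
               for (i, j), x in X.items())

initial = loss()
for epoch in range(200):
    for (i, j), x in X.items():
        diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(x)
        grad = 2 * f(x) * diff
        W[i], Wc[j] = W[i] - lr * grad * Wc[j], Wc[j] - lr * grad * W[i]
        b[i] -= lr * grad
        bc[j] -= lr * grad

print(f"loss: {initial:.4f} -> {loss():.4f}")  # the loss drops as vectors fit the counts
```

After training, each word's final embedding is typically taken as the sum W + Wc of its word and context vectors.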
Word2Vec vs GloVe (Key Difference)
| Aspect | Word2Vec | GloVe |
|---|---|---|
| Learning method | Predictive | Count-based + optimization |
| Context type | Local window | Global corpus |
| Uses co-occurrence matrix | No (implicit) | Yes (explicit) |
| Semantic relationships | Good | Very strong |
| Training speed | Fast | Slower (large matrix) |
Why GloVe Produces Better Semantic Structure
Because GloVe uses global statistics, it captures:
- Word similarity
- Analogies
- Long-range relationships
This makes GloVe especially useful for semantic-heavy NLP tasks.
Using Pretrained GloVe Embeddings
In practice, we usually do NOT train GloVe from scratch.
Instead, we use pretrained embeddings such as:
- GloVe 50d, 100d, 200d, 300d
- Trained on Wikipedia or Common Crawl
These embeddings already contain rich language knowledge.
Simple Code Example (Loading GloVe)
Let us see how to load pretrained GloVe vectors.
Where to run this code:
- Google Colab (recommended)
- Jupyter Notebook
```python
import numpy as np

# Path to the downloaded GloVe file (50-dimensional vectors)
glove_path = "glove.6B.50d.txt"

embeddings = {}
with open(glove_path, "r", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]                                  # first token is the word
        vector = np.asarray(values[1:], dtype="float32")  # the rest is the vector
        embeddings[word] = vector

print(embeddings["king"][:10])
```
Output Explanation:
- Each word maps to a dense numeric vector
- In the 50-dimensional version, 50 numbers encode the word's meaning
- Similar words have similar vectors
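The standard way to check that "similar words have similar vectors" is cosine similarity. The sketch below defines it and runs it on two made-up 4-dimensional stand-in vectors; with real GloVe vectors you would instead pass, for example, embeddings["king"] and embeddings["queen"] from the loading code above:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 = similar direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors (made up for illustration); with real GloVe you would
# call cosine_similarity(embeddings["king"], embeddings["queen"]).
v_related   = np.array([0.8, 0.1, 0.5, 0.2])
v_related2  = np.array([0.7, 0.2, 0.6, 0.1])
v_unrelated = np.array([-0.5, 0.9, -0.3, 0.4])

print(cosine_similarity(v_related, v_related2))   # close to 1
print(cosine_similarity(v_related, v_unrelated))  # much lower
```

This same function is all you need for the similarity-comparison exercise in the assignment below.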
Where GloVe Is Used
- Text classification
- Sentiment analysis
- Named Entity Recognition
- Machine translation
- Semantic search
GloVe embeddings are widely used in both research and industry.
Assignment / Homework
Theory:
- Explain how GloVe differs from Word2Vec
- Explain what a co-occurrence matrix is
Practical:
- Download GloVe embeddings from Stanford NLP
- Load vectors for at least 5 words
- Compare similarity between related words
Practice Questions
Q1. What does GloVe stand for?
Q2. What type of information does GloVe mainly use?
Quick Quiz
Q1. Which model is predictive?
Q2. Which model explicitly uses a co-occurrence matrix?
Quick Recap
- GloVe uses global co-occurrence statistics
- It combines count-based and embedding approaches
- Produces strong semantic word vectors
- Often used via pretrained embeddings
In the next lesson, we will study FastText, which improves embeddings by using subword information.