Word Embeddings – Overview
In the previous lesson, you learned about One-Hot Encoding. While it helps convert words into numbers, it fails to capture meaning and relationships between words.
To solve this problem, NLP researchers introduced a powerful idea called Word Embeddings. This lesson explains what embeddings are, why they are important, and how they changed modern NLP.
Why One-Hot Encoding Is Not Enough
Recall one-hot encoding:
- Each word gets a unique vector
- Vectors are large and sparse
- No semantic relationship is captured
For example:
king and queen are as unrelated as king and banana in one-hot space.
Humans know that king and queen are related. Machines need a way to learn this relationship.
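A tiny sketch makes this concrete. With a three-word vocabulary, every pair of distinct one-hot vectors has a dot product of zero, so king is exactly as "similar" to queen as it is to banana:

```python
# Build one-hot vectors for a tiny 3-word vocabulary.
vocab = ["king", "queen", "banana"]
one_hot = {w: [1 if i == j else 0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

def dot(a, b):
    """Dot product; for one-hot vectors this measures overlap."""
    return sum(x * y for x, y in zip(a, b))

# Every pair of distinct words has zero similarity:
print(dot(one_hot["king"], one_hot["queen"]))   # 0
print(dot(one_hot["king"], one_hot["banana"]))  # 0
```

No matter how large the vocabulary grows, this stays true: one-hot vectors carry identity, but no notion of closeness.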
What Are Word Embeddings?
Word embeddings are dense numerical vectors that represent words based on their meaning and context.
Instead of binary vectors, embeddings use real-valued numbers. Words with similar meanings have vectors that are close to each other.
In simple terms:
- One-hot encoding → identity
- Word embeddings → meaning
Core Idea Behind Word Embeddings
The fundamental idea is:
"Words appearing in similar contexts tend to have similar meanings."
This is called the Distributional Hypothesis.
For example:
- "I drank a cup of tea"
- "I drank a cup of coffee"
The words tea and coffee appear in similar contexts, so their vectors should be close.
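A quick way to see this, using the two sentences above: if we collect each word's context (every other word in its sentence), tea and coffee end up with identical contexts. This is a toy illustration, not how real models tokenize or window text:

```python
sentences = [
    "i drank a cup of tea",
    "i drank a cup of coffee",
]

def context_words(word, sents):
    """Collect every other word that appears in a sentence with `word`."""
    ctx = set()
    for s in sents:
        tokens = s.split()
        if word in tokens:
            ctx.update(t for t in tokens if t != word)
    return ctx

# tea and coffee appear in exactly the same contexts:
print(context_words("tea", sentences) == context_words("coffee", sentences))  # True
```

Embedding models exploit precisely this kind of shared context to pull the vectors for tea and coffee close together.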
How Word Embeddings Represent Words
Each word is represented as a vector like:
[0.25, -0.71, 0.13, 0.89, ...]
Key properties:
- Fixed size (e.g., 50, 100, or 300 dimensions)
- Dense (few zero values)
- Encodes semantic relationships
One-Hot Encoding vs Word Embeddings
| Aspect | One-Hot Encoding | Word Embeddings |
|---|---|---|
| Vector size | Vocabulary size | Fixed (e.g., 100) |
| Sparsity | Very sparse | Dense |
| Semantic meaning | No | Yes |
| Memory usage | High | Efficient |
| Used in modern NLP | Rarely | Extensively |
Geometric Intuition (Very Important)
Word embeddings live in a high-dimensional space.
In this space:
- Similar words → closer vectors
- Unrelated words → far apart
This allows mathematical operations like:
king − man + woman ≈ queen
This is one of the most famous demonstrations of embeddings.
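With made-up 2-D vectors (real embeddings have hundreds of dimensions), the arithmetic can be sketched as follows. The numbers are hand-picked so the analogy works exactly; they do not come from any trained model:

```python
# Hypothetical toy vectors, chosen so that king - man + woman lands on queen.
vecs = {
    "king":  [0.8, 0.2],
    "man":   [0.6, 0.1],
    "woman": [0.3, 0.9],
    "queen": [0.5, 1.0],
}

# Component-wise: king - man + woman
result = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# The word whose vector is nearest to the result:
nearest = min(vecs, key=lambda w: distance(vecs[w], result))
print(nearest)  # queen
```

In practice the result vector rarely equals any word vector exactly, so libraries return the nearest neighbor (usually by cosine similarity, and typically excluding the three input words).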
How Are Word Embeddings Learned?
Word embeddings are learned from large text corpora.
The model learns:
- Which words appear together
- How often they appear
- In what contexts
Based on this, vector values are adjusted during training.
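One common starting point, used by count-based methods such as GloVe, is a co-occurrence table: how often each word appears near each other word within a small window. A minimal sketch, assuming a toy two-sentence corpus and a window of two words on each side:

```python
from collections import Counter

corpus = ["i drank a cup of tea", "i drank a cup of coffee"]
window = 2  # how many neighbours on each side count as "context"

cooc = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                cooc[(word, tokens[j])] += 1

# "cup" and "of" co-occur in both sentences:
print(cooc[("cup", "of")])  # 2
```

Training then adjusts each word's vector so that these co-occurrence patterns are reproduced; words with similar rows in the table end up with similar vectors.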
Popular Word Embedding Techniques
Several methods are used to create embeddings:
- Word2Vec (Skip-gram, CBOW)
- GloVe
- FastText
We will study each of these in upcoming lessons.
Simple Conceptual Example
Assume we have a small embedding space:
| Word | Vector (simplified) |
|---|---|
| king | [0.8, 0.7] |
| queen | [0.78, 0.72] |
| apple | [0.1, -0.5] |
Here, king and queen are close, while apple is far away.
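We can check this with cosine similarity, the standard closeness measure for embeddings, using the simplified vectors from the table:

```python
def cosine(a, b):
    """Cosine similarity: near 1.0 for similar directions, negative for opposing ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

king, queen, apple = [0.8, 0.7], [0.78, 0.72], [0.1, -0.5]

print(cosine(king, queen))  # close to 1.0: very similar
print(cosine(king, apple))  # negative: pointing in different directions
```

Cosine similarity is preferred over raw distance here because it compares the direction of the vectors rather than their length.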
Where Are Word Embeddings Used?
- Search engines
- Chatbots
- Translation systems
- Recommendation systems
- Large Language Models
Almost all modern NLP systems rely on embeddings.
Practice: Where to Try This Yourself
You will start practicing embeddings from the next lesson.
Recommended environments:
- Google Colab (best for beginners)
- Jupyter Notebook
- VS Code with Python
No heavy setup is required at this stage.
Assignment / Homework
Theory Tasks:
- Explain why one-hot encoding fails to capture meaning
- Write your own definition of word embeddings
Thinking Task:
- Why do you think fixed-size vectors are important?
Practice Questions
Q1. What problem do word embeddings solve?
Q2. Are word embeddings sparse or dense?
Quick Quiz
Q1. Which encoding captures semantic similarity?
Q2. Which is memory efficient?
Quick Recap
- One-hot encoding lacks meaning
- Word embeddings represent semantic relationships
- Similar words have similar vectors
- Embeddings are dense and efficient
- Foundation for modern NLP models
In the next lesson, we will go deeper and learn Word2Vec — the first major breakthrough in word embeddings.