Word Embeddings – Overview
In the previous lesson, you learned about One-Hot Encoding. While it helps convert words into numbers, it fails to capture meaning and relationships between words.
To solve this problem, NLP researchers introduced a powerful idea called Word Embeddings. This lesson explains what embeddings are, why they are important, and how they changed modern NLP.
Why One-Hot Encoding Is Not Enough
Recall one-hot encoding:
- Each word gets a unique vector
- Vectors are large and sparse
- No semantic relationship is captured
For example:
king and queen are as unrelated as king and banana in one-hot space.
Humans know that king and queen are related. Machines need a way to learn this relationship.
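A tiny sketch makes this concrete. With a three-word vocabulary, every pair of distinct one-hot vectors has a dot product of zero, so king is exactly as "similar" to queen as it is to banana:

```python
# Build one-hot vectors for a tiny 3-word vocabulary.
vocab = ["king", "queen", "banana"]
one_hot = {w: [1 if i == j else 0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

def dot(a, b):
    """Dot product; for one-hot vectors this measures overlap."""
    return sum(x * y for x, y in zip(a, b))

# Every pair of distinct words has zero similarity:
print(dot(one_hot["king"], one_hot["queen"]))   # 0
print(dot(one_hot["king"], one_hot["banana"]))  # 0
```

No matter how large the vocabulary grows, this stays true: one-hot vectors carry identity, but no notion of closeness.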
What Are Word Embeddings?
Word embeddings are dense numerical vectors that represent words based on their meaning and context.
Instead of binary vectors, embeddings use real-valued numbers. Words with similar meanings have vectors that are close to each other.
In simple terms:
- One-hot encoding → identity
- Word embeddings → meaning
Core Idea Behind Word Embeddings
The fundamental idea is:
"Words appearing in similar contexts tend to have similar meanings."
This is called the Distributional Hypothesis.
For example:
- "I drank a cup of tea"
- "I drank a cup of coffee"
The words tea and coffee appear in similar contexts, so their vectors should be close.
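A quick way to see this, using the two sentences above: if we collect each word's context (every other word in its sentence), tea and coffee end up with identical contexts. This is a toy illustration, not how real models tokenize or window text:

```python
sentences = [
    "i drank a cup of tea",
    "i drank a cup of coffee",
]

def context_words(word, sents):
    """Collect every other word that appears in a sentence with `word`."""
    ctx = set()
    for s in sents:
        tokens = s.split()
        if word in tokens:
            ctx.update(t for t in tokens if t != word)
    return ctx

# tea and coffee appear in exactly the same contexts:
print(context_words("tea", sentences) == context_words("coffee", sentences))  # True
```

Embedding models exploit precisely this kind of shared context to pull the vectors for tea and coffee close together.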
How Word Embeddings Represent Words
Each word is represented as a vector like:
[0.25, -0.71, 0.13, 0.89, ...]
Key properties:
- Fixed size (e.g., 50, 100, or 300 dimensions)
- Dense (few zero values)
- Encodes semantic relationships
One-Hot Encoding vs Word Embeddings
| Aspect | One-Hot Encoding | Word Embeddings |
|---|---|---|
| Vector size | Vocabulary size | Fixed (e.g., 100) |
| Sparsity | Very sparse | Dense |
| Semantic meaning | No | Yes |
| Memory usage | High | Efficient |
| Used in modern NLP | Rarely | Extensively |
Geometric Intuition (Very Important)
Word embeddings live in a high-dimensional space.
In this space:
- Similar words → closer vectors
- Unrelated words → far apart
This allows mathematical operations like:
king − man + woman ≈ queen
This is one of the most famous demonstrations of embeddings.
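With made-up 2-D vectors (real embeddings have hundreds of dimensions), the arithmetic can be sketched as follows. The numbers are hand-picked so the analogy works exactly; they do not come from any trained model:

```python
# Hypothetical toy vectors, chosen so that king - man + woman lands on queen.
vecs = {
    "king":  [0.8, 0.2],
    "man":   [0.6, 0.1],
    "woman": [0.3, 0.9],
    "queen": [0.5, 1.0],
}

# Component-wise: king - man + woman
result = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# The word whose vector is nearest to the result:
nearest = min(vecs, key=lambda w: distance(vecs[w], result))
print(nearest)  # queen
```

In practice the result vector rarely equals any word vector exactly, so libraries return the nearest neighbor (usually by cosine similarity, and typically excluding the three input words).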
How Are Word Embeddings Learned?
Word embeddings are learned from large text corpora.
The model learns:
- Which words appear together
- How often they appear
- In what contexts
Based on this, vector values are adjusted during training.
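One common starting point, used by count-based methods such as GloVe, is a co-occurrence table: how often each word appears near each other word within a small window. A minimal sketch, assuming a toy two-sentence corpus and a window of two words on each side:

```python
from collections import Counter

corpus = ["i drank a cup of tea", "i drank a cup of coffee"]
window = 2  # how many neighbours on each side count as "context"

cooc = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                cooc[(word, tokens[j])] += 1

# "cup" and "of" co-occur in both sentences:
print(cooc[("cup", "of")])  # 2
```

Training then adjusts each word's vector so that these co-occurrence patterns are reproduced; words with similar rows in the table end up with similar vectors.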
Popular Word Embedding Techniques
Several methods are used to create embeddings:
- Word2Vec (Skip-gram, CBOW)
- GloVe
- FastText
We will study each of these in upcoming lessons.
Simple Conceptual Example
Assume we have a small embedding space:
| Word | Vector (simplified) |
|---|---|
| king | [0.8, 0.7] |
| queen | [0.78, 0.72] |
| apple | [0.1, -0.5] |
Here, king and queen are close, while apple is far away.
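We can check this with cosine similarity, the standard closeness measure for embeddings, using the simplified vectors from the table:

```python
def cosine(a, b):
    """Cosine similarity: near 1.0 for similar directions, negative for opposing ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

king, queen, apple = [0.8, 0.7], [0.78, 0.72], [0.1, -0.5]

print(cosine(king, queen))  # close to 1.0: very similar
print(cosine(king, apple))  # negative: pointing in different directions
```

Cosine similarity is preferred over raw distance here because it compares the direction of the vectors rather than their length.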
Where Are Word Embeddings Used?
- Search engines
- Chatbots
- Translation systems
- Recommendation systems
- Large Language Models
Almost all modern NLP systems rely on embeddings.
Practice: Where to Try This Yourself
You will start practicing embeddings from the next lesson.
Recommended environments:
- Google Colab (best for beginners)
- Jupyter Notebook
- VS Code with Python
No heavy setup is required at this stage.
Assignment / Homework
Theory Tasks:
- Explain why one-hot encoding fails to capture meaning
- Write your own definition of word embeddings
Thinking Task:
- Why do you think fixed-size vectors are important?
Practice Questions
Q1. What problem do word embeddings solve?
Q2. Are word embeddings sparse or dense?
Quick Quiz
Q1. Which encoding captures semantic similarity?
Q2. Which is memory efficient?
Quick Recap
- One-hot encoding lacks meaning
- Word embeddings represent semantic relationships
- Similar words have similar vectors
- Embeddings are dense and efficient
- Foundation for modern NLP models
In the next lesson, we will go deeper and learn Word2Vec — the first major breakthrough in word embeddings.