NLP Lesson 17 – Word Embeddings | Dataplexa

Word Embeddings – Overview

In the previous lesson, you learned about One-Hot Encoding. While it helps convert words into numbers, it fails to capture meaning and relationships between words.

To solve this problem, NLP introduced a powerful idea called Word Embeddings. This lesson explains what embeddings are, why they are important, and how they changed modern NLP.


Why One-Hot Encoding Is Not Enough

Recall one-hot encoding:

  • Each word gets a unique vector
  • Vectors are large and sparse
  • No semantic relationship is captured

For example:

In one-hot space, king and queen are exactly as unrelated as king and banana: every pair of distinct one-hot vectors has zero similarity.

Humans know that king and queen are related. Machines need a way to learn this relationship.
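The zero-similarity problem can be seen directly in code. The sketch below uses a tiny three-word vocabulary (chosen purely for illustration) and the dot product as a similarity measure:

```python
# Toy one-hot vectors for a 3-word vocabulary (hypothetical example).
vocab = ["king", "queen", "banana"]

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

king, queen, banana = (one_hot(w) for w in vocab)

# The dot product between any two DISTINCT one-hot vectors is always 0:
print(dot(king, queen))   # 0 -> "king" and "queen" look unrelated
print(dot(king, banana))  # 0 -> exactly as unrelated as "king" and "banana"
```

No matter which two different words we pick, the similarity is zero, so the encoding carries no notion of relatedness.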


What Are Word Embeddings?

Word embeddings are dense numerical vectors that represent words based on their meaning and context.

Instead of binary vectors, embeddings use real-valued numbers. Words with similar meanings have vectors that are close to each other.

In simple terms:

  • One-hot encoding → identity
  • Word embeddings → meaning

Core Idea Behind Word Embeddings

The fundamental idea is:

"Words appearing in similar contexts tend to have similar meanings."

This is called the Distributional Hypothesis.

For example:

  • "I drank a cup of tea"
  • "I drank a cup of coffee"

The words tea and coffee appear in similar contexts, so their vectors should be close.
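We can make the Distributional Hypothesis concrete by counting context words. The sketch below collects the words within a ±2 window of a target word across the two example sentences; the sentences and window size are illustrative choices:

```python
# Count which context words surround each target word in the example
# sentences; "tea" and "coffee" end up with identical context counts.
from collections import Counter

sentences = [
    "i drank a cup of tea".split(),
    "i drank a cup of coffee".split(),
]

def contexts(target, window=2):
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == target:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        counts[sent[j]] += 1
    return counts

print(contexts("tea"))     # Counter({'cup': 1, 'of': 1})
print(contexts("coffee"))  # identical contexts -> similar meaning
```

Because "tea" and "coffee" share exactly the same contexts here, an embedding model trained on this data would push their vectors close together.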


How Word Embeddings Represent Words

Each word is represented as a vector like:

[0.25, -0.71, 0.13, 0.89, ...]

Key properties:

  • Fixed vector size (e.g., 50, 100, 300)
  • Dense (few zeros)
  • Captures semantic relationships
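With dense vectors, similarity becomes measurable. A standard choice is cosine similarity. The embeddings below are hypothetical 4-dimensional values invented for this sketch (real models use 50–300 dimensions):

```python
import math

# Hypothetical 4-dimensional embeddings, hand-picked for illustration.
embeddings = {
    "king":  [0.25, -0.71, 0.13, 0.89],
    "queen": [0.27, -0.68, 0.10, 0.85],
    "apple": [-0.60, 0.40, 0.90, -0.20],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Similar words score near 1; unrelated words score near 0 or below.
print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine(embeddings["king"], embeddings["apple"]))  # far from 1
```

This is exactly what one-hot vectors cannot do: here "king" and "queen" score close to 1, while "king" and "apple" do not.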

One-Hot Encoding vs Word Embeddings

Aspect               One-Hot Encoding   Word Embeddings
Vector size          Vocabulary size    Fixed (e.g., 100)
Sparsity             Very sparse        Dense
Semantic meaning     No                 Yes
Memory usage         High               Efficient
Used in modern NLP   Rarely             Extensively

Geometric Intuition (Very Important)

Word embeddings live in a high-dimensional space.

In this space:

  • Similar words → closer vectors
  • Unrelated words → far apart

This allows mathematical operations like:

king − man + woman ≈ queen

This is one of the most famous demonstrations of embeddings.
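The analogy can be reproduced with plain vector arithmetic. The 2-D vectors below are hypothetical and hand-picked so the arithmetic works out; real embeddings exhibit this behaviour only approximately:

```python
# Toy 2-D vectors (hand-picked for illustration) demonstrating the
# vector-arithmetic trick: king - man + woman lands nearest to queen.
words = {
    "king":  (0.9, 0.8),
    "man":   (0.5, 0.2),
    "woman": (0.45, 0.75),
    "queen": (0.85, 1.35),
    "apple": (-0.3, -0.6),
}

def analogy(a, b, c):
    # Compute a - b + c, then return the closest word (excluding the inputs).
    target = tuple(x - y + z for x, y, z in zip(words[a], words[b], words[c]))
    def dist(w):
        return sum((t - v) ** 2 for t, v in zip(target, words[w]))
    candidates = [w for w in words if w not in (a, b, c)]
    return min(candidates, key=dist)

print(analogy("king", "man", "woman"))  # -> "queen"
```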


How Are Word Embeddings Learned?

Word embeddings are learned from large text corpora.

The model learns:

  • Which words appear together
  • How often they appear
  • In what contexts

Based on this, vector values are adjusted during training.
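One simple way to see how vectors can emerge from co-occurrence statistics is to count which words appear together and then compress the counts with SVD. This is not the algorithm Word2Vec uses (that comes in the next lesson); it is only a sketch of the idea that embeddings are derived from "which words appear together, and how often":

```python
import numpy as np

# Minimal sketch: dense vectors from co-occurrence counts via truncated SVD.
# The tiny corpus below is hypothetical, chosen for illustration.
corpus = [
    "i drank a cup of tea".split(),
    "i drank a cup of coffee".split(),
    "the king met the queen".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a +/-2 word window.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                counts[index[w], index[sent[j]]] += 1

# Keep only the top 2 singular directions: each word becomes a 2-D vector.
u, s, _ = np.linalg.svd(counts)
vectors = u[:, :2] * s[:2]
print(vectors[index["tea"]], vectors[index["coffee"]])  # near-identical rows
```

Because "tea" and "coffee" have identical co-occurrence rows in this corpus, their derived vectors coincide, which is the behaviour embeddings are designed to produce.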


Popular Word Embedding Techniques

Several methods are used to create embeddings:

  • Word2Vec (Skip-gram, CBOW)
  • GloVe
  • FastText

We will study each of these in upcoming lessons.


Simple Conceptual Example

Assume we have a small embedding space:

Word    Vector (simplified)
king    [0.8, 0.7]
queen   [0.78, 0.72]
apple   [0.1, -0.5]

Here, king and queen are close, while apple is far away.
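The closeness claim in the table can be checked with Euclidean distance (cosine similarity would show the same pattern):

```python
import math

# The toy 2-D embeddings from the table above.
emb = {
    "king":  (0.8, 0.7),
    "queen": (0.78, 0.72),
    "apple": (0.1, -0.5),
}

def euclidean(a, b):
    # Straight-line distance between two embedding points.
    return math.dist(emb[a], emb[b])

print(euclidean("king", "queen"))  # small (~0.03): semantically close
print(euclidean("king", "apple"))  # large (~1.39): semantically distant
```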


Where Are Word Embeddings Used?

  • Search engines
  • Chatbots
  • Translation systems
  • Recommendation systems
  • Large Language Models

Almost all modern NLP systems rely on embeddings.


Practice: Where to Try This Yourself

You will start practicing embeddings from the next lesson.

Recommended environments:

  • Google Colab (best for beginners)
  • Jupyter Notebook
  • VS Code with Python

No heavy setup is required at this stage.


Assignment / Homework

Theory Tasks:

  • Explain why one-hot encoding fails to capture meaning
  • Write your own definition of word embeddings

Thinking Task:

  • Why do you think fixed-size vectors are important?

Practice Questions

Q1. What problem do word embeddings solve?

They capture semantic meaning and relationships between words.

Q2. Are word embeddings sparse or dense?

Dense.

Quick Quiz

Q1. Which encoding captures semantic similarity?

Word embeddings.

Q2. Which is memory efficient?

Word embeddings.

Quick Recap

  • One-hot encoding lacks meaning
  • Word embeddings represent semantic relationships
  • Similar words have similar vectors
  • Embeddings are dense and efficient
  • Foundation for modern NLP models

In the next lesson, we will go deeper and learn Word2Vec — the first major breakthrough in word embeddings.