Generative AI Course
Similarity Search
Once text is converted into embeddings, the next challenge is finding meaning efficiently.
Similarity search is the mechanism that allows machines to retrieve relevant information based on semantics, not exact keyword matches.
This concept is central to search engines, recommendation systems, and RAG pipelines.
The Problem Similarity Search Solves
Traditional search relies on exact word matches.
This fails when:
- Users phrase questions differently
- Synonyms are used
- Conceptual meaning matters more than keywords
Similarity search addresses these issues by comparing meaning rather than text.
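To see the keyword failure concretely, here is a tiny sketch. The query and document strings are made up for illustration; the point is that two texts can mean the same thing while sharing no words at all:

```python
# Keyword matching fails on synonyms: same meaning, zero shared words.
query = "how to fix flat tires"
document = "repairing punctured bicycle wheels"

keyword_overlap = set(query.lower().split()) & set(document.lower().split())
print(keyword_overlap)  # empty set: keyword search finds nothing
```

A similarity search over embeddings would still rank this document highly, because the vectors encode meaning rather than surface tokens.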
Thinking Before Coding
Before writing any code, ask:
What exactly are we comparing?
In similarity search, we compare vectors, not strings.
That means every input must be embedded first.
High-Level Similarity Search Flow
A typical workflow looks like this:
- Embed all documents or chunks
- Store embeddings
- Embed the user query
- Compare query embedding to stored vectors
- Return the most similar results
Every production system follows this pattern.
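The five steps above can be sketched end to end. The `embed` function below is a stand-in for a real embedding model (it returns random unit vectors), so only the flow is meaningful here, not the scores:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real embedding model. In practice you would call an
# embedding API here; random unit vectors only illustrate the flow.
def embed(text: str, dim: int = 4) -> np.ndarray:
    vec = rng.random(dim)
    return vec / np.linalg.norm(vec)

# 1. Embed all documents or chunks
docs = ["chunk one", "chunk two", "chunk three"]

# 2. Store embeddings
store = {doc: embed(doc) for doc in docs}

# 3. Embed the user query
query_vec = embed("user question")

# 4. Compare query embedding to stored vectors
#    (dot product of unit vectors = cosine similarity)
scores = {doc: float(vec @ query_vec) for doc, vec in store.items()}

# 5. Return the most similar results
top = sorted(scores, key=scores.get, reverse=True)[:2]
print(top)
```

Production systems swap in a real embedding model and a vector database, but the shape of the pipeline stays exactly this.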
Vector Distance and Similarity
To compare embeddings, we need a numerical similarity measure.
The most common choice is cosine similarity.
Why Cosine Similarity?
Cosine similarity measures the angle between vectors, not their magnitude.
This makes it ideal for comparing semantic meaning.
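A quick check of this property: scaling a vector changes its magnitude but not its direction, so its cosine similarity with the original stays at 1.0 (the vectors below are arbitrary examples):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v = np.array([3.0, 4.0])
w = 10 * v  # same direction, ten times the magnitude

# Magnitude is ignored; only the angle between the vectors matters.
print(cosine_similarity(v, w))  # 1.0
```

This is why a long document and a short query can still score as highly similar: what matters is the direction their embeddings point in, not how "large" they are.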
Simple Similarity Calculation
Let’s start with a small, controlled example to understand how similarity works.
```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

doc_1 = np.array([0.8, 0.2])
doc_2 = np.array([0.75, 0.25])
doc_3 = np.array([0.1, 0.9])
query = np.array([0.78, 0.22])

print(cosine_similarity(query, doc_1))
print(cosine_similarity(query, doc_2))
print(cosine_similarity(query, doc_3))
```
Before running this code, understand the intent:
- `query` represents user intent
- `doc_*` represent stored document chunks
- The highest score indicates closest meaning

The query is semantically closer to doc_1 and doc_2 than to doc_3.
Ranking Results
In real systems, you do not return a single match.
You rank results by similarity and return the top-k items.
Ranking Logic Example
```python
documents = {
    "doc_1": doc_1,
    "doc_2": doc_2,
    "doc_3": doc_3,
}

scores = {
    name: cosine_similarity(query, vec)
    for name, vec in documents.items()
}

ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
print(ranked)
```
This ranking step is critical.
It determines which information the model will see next in a RAG system.
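Building on the ranking above, the top-k cutoff is just a slice of the sorted list (k = 2 here is an arbitrary choice for illustration; the code is self-contained so it repeats the earlier setup):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([0.78, 0.22])
documents = {
    "doc_1": np.array([0.8, 0.2]),
    "doc_2": np.array([0.75, 0.25]),
    "doc_3": np.array([0.1, 0.9]),
}

scores = {name: cosine_similarity(query, vec) for name, vec in documents.items()}
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)

# In RAG, only the top-k chunks are passed to the model as context.
k = 2
top_k = ranked[:k]
print([name for name, _ in top_k])  # ['doc_1', 'doc_2']
```

Choosing k is a real design decision: too small and relevant context is dropped, too large and the model is flooded with marginal matches.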
Similarity Search at Scale
The examples so far use only a few vectors.
In real applications, you may have millions of embeddings.
Brute-force comparison becomes too slow.
Approximate Nearest Neighbor (ANN)
To scale similarity search, systems use approximate methods.
These trade tiny accuracy loss for massive performance gains.
Vector databases implement these techniques internally.
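For intuition about what ANN indexes optimize away, here is the brute-force baseline they approximate: a single matrix-vector product scores every stored vector, which is fine at this size but grows linearly with the collection (random vectors stand in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(42)

# Brute force: one matrix-vector product scores all n stored vectors.
# ANN indexes (used inside vector databases) avoid touching every row.
n, dim = 100_000, 64
embeddings = rng.standard_normal((n, dim))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # unit vectors

query = rng.standard_normal(dim)
query /= np.linalg.norm(query)

scores = embeddings @ query              # cosine similarity for all n vectors
top_k = np.argsort(scores)[::-1][:5]     # indices of the 5 closest vectors
print(top_k, scores[top_k])
```

At millions of vectors and thousands of queries per second, this exhaustive scan becomes the bottleneck, which is exactly the gap ANN structures fill.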
Where Similarity Search Is Used
Similarity search powers:
- Semantic document search
- Recommendation engines
- Duplicate detection
- RAG pipelines
If embeddings are the foundation, similarity search is the engine.
Common Mistakes to Avoid
- Comparing raw text instead of vectors
- Mixing embedding models
- Ignoring normalization
These mistakes lead to incorrect rankings.
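On the normalization point specifically, one common pattern is to normalize every vector once at indexing time; after that, a plain dot product equals cosine similarity, which is both cheaper and harder to get wrong (the vectors below are arbitrary examples):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Without normalization, a raw dot product conflates angle and magnitude.
raw_dot = a @ b

# Normalize once at indexing time; then dot product IS cosine similarity.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(a_unit @ b_unit)  # equals cosine_similarity(a, b)
```

Skipping this step, or normalizing queries but not documents, is a frequent source of subtly wrong rankings.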
Practice
What must text be converted into before similarity search?
Which similarity metric is most commonly used?
What step orders results by relevance?
Quick Quiz
What does similarity search compare?
What determines which result is returned first?
Why are vector databases used?
Recap: Similarity search retrieves relevant information by comparing embeddings, not keywords.
Next up: We introduce vector databases — purpose-built systems for storing and searching embeddings.