Generative AI Course
ChromaDB
In the previous lesson, you learned what vector databases are and why they are essential for modern GenAI systems.
Now we move from theory to a real, developer-friendly tool: ChromaDB.
ChromaDB is an open-source vector database designed to be simple, lightweight, and easy to integrate into GenAI projects.
Why ChromaDB Exists
Many developers do not want to start with complex, cloud-heavy infrastructure.
They want:
- Local development
- Simple APIs
- Fast semantic search
- Easy experimentation with embeddings
ChromaDB was created to fill this gap.
When Should You Use ChromaDB?
Before writing code, ask yourself:
Am I building or testing a GenAI system that needs semantic search?
If the answer is yes, ChromaDB is often a good first choice.
It is commonly used for:
- RAG prototypes
- Local chatbots
- Document Q&A systems
- Learning and experimentation
High-Level Workflow
Most ChromaDB projects follow the same logical steps:
- Create a collection
- Add documents with embeddings
- Store metadata
- Query the collection
- Retrieve relevant results
Understanding this flow is more important than memorizing the API.
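To make the flow concrete before touching the real library, here is a minimal sketch of the five steps in plain Python. It is not how ChromaDB is implemented: the ToyCollection class, the embed function, and the word-count "embedding" are all illustrative assumptions, standing in for a neural embedding model and a real index.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (real systems use a neural model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyCollection:
    """Hypothetical stand-in for a vector collection; for illustration only."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def add(self, ids, documents, metadatas):
        # Steps 2 and 3: store each document with its embedding and metadata.
        for doc_id, doc, meta in zip(ids, documents, metadatas):
            self.records.append(
                {"id": doc_id, "document": doc, "metadata": meta, "embedding": embed(doc)}
            )

    def query(self, query_text, n_results=2):
        # Steps 4 and 5: embed the query, rank records by similarity, return the top ones.
        query_vec = embed(query_text)
        ranked = sorted(
            self.records,
            key=lambda record: cosine(query_vec, record["embedding"]),
            reverse=True,
        )
        return ranked[:n_results]


# Step 1: create a collection.
toy = ToyCollection("documents")
toy.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "Generative AI creates new content",
        "Vector databases store embeddings",
        "ChromaDB enables semantic search",
    ],
    metadatas=[{"topic": "genai"}, {"topic": "databases"}, {"topic": "chromadb"}],
)

top = toy.query("semantic search with ChromaDB", n_results=1)
print(top[0]["id"], top[0]["metadata"])
```

Word counts only match exact words, so this toy cannot find documents by meaning. That gap is what real embeddings close.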
Installing ChromaDB
Before coding, ensure ChromaDB is installed.
pip install chromadb
This installs the core ChromaDB library for local development.
Creating a ChromaDB Client
The first step in any ChromaDB project is creating a client.
Think of the client as the entry point to your vector database.
import chromadb
client = chromadb.Client()
At this stage:
- No data is stored yet
- No embeddings exist
- You are just initializing the system
- The client keeps everything in memory, so data is lost when the program exits
Creating a Collection
A collection is where vectors are stored.
You can think of it like a table, but optimized for embeddings.
collection = client.create_collection(
    name="documents"
)
This collection will later store:
- Text chunks
- Embeddings
- Metadata
Adding Documents
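Before adding data, understand the goal:
We want ChromaDB to retrieve documents by meaning.
To make that possible, each document is stored with its text and a unique ID. When you do not supply embeddings yourself, ChromaDB computes them from the text using its default embedding function.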
collection.add(
    documents=[
        "Generative AI creates new content",
        "Vector databases store embeddings",
        "ChromaDB enables semantic search"
    ],
    ids=["doc1", "doc2", "doc3"]
)
What happens internally:
- Text is embedded
- Vectors are indexed
- Documents become searchable
Querying the Collection
Now comes the most important part: retrieving relevant information.
Before writing the query, think:
What is the user trying to find?
results = collection.query(
    query_texts=["How does semantic search work?"],
    n_results=2
)
print(results)
ChromaDB performs:
- Query embedding
- Similarity search
- Result ranking
The results are the n_results documents (here, two) whose embeddings are closest to the query embedding, returned together with their IDs and distances.
Why This Matters for Jobs
In real GenAI roles, you will:
- Store company knowledge
- Retrieve relevant context
- Feed results into LLMs
ChromaDB often serves as the retrieval layer in these systems.
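Feeding results into an LLM usually means placing the retrieved text into the prompt. Here is a minimal sketch of that step in plain Python. The build_prompt helper and the prompt wording are illustrative assumptions, and the retrieved list stands in for the documents a query would return.

```python
def build_prompt(question, retrieved_docs):
    """Combine retrieved text chunks and the user's question into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Stand-in for the documents returned by a collection query.
retrieved = [
    "ChromaDB enables semantic search",
    "Vector databases store embeddings",
]

prompt = build_prompt("How does semantic search work?", retrieved)
print(prompt)
```

The resulting string is what you would send to the LLM of your choice. This retrieve-then-prompt pattern is the core of a RAG prototype.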
Common Beginner Mistakes
- Not chunking documents properly
- Storing too much text in one document
- Ignoring metadata
These mistakes reduce retrieval quality.
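One simple way to avoid the first two mistakes is to split long text into overlapping chunks before adding it. The sketch below uses a word-based splitter. The chunk_text helper, the chunk size, and the overlap are illustrative assumptions, and good values depend on your documents and embedding model.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to chunk_size words; neighbours share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# A 120-word sample document made of placeholder words.
sample = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(sample, chunk_size=50, overlap=10)

# Give each chunk its own ID and metadata so results can be traced to their source.
ids = [f"sample-chunk-{i}" for i in range(len(chunks))]
metadatas = [{"source": "sample", "chunk_index": i} for i in range(len(chunks))]

print(len(chunks), ids)
```

The chunks, ids, and metadatas lists line up one-to-one, so they can be passed to collection.add as documents, ids, and metadatas.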
Practice
What is the main storage unit in ChromaDB?
What does ChromaDB store internally?
Which operation retrieves similar documents?
Recap: ChromaDB is a practical, developer-friendly vector database used to build semantic search and RAG systems.
Next up: Pinecone, a managed, production-grade vector database used at scale.