Generative AI Course
Chunking Strategies: How to Split Documents for Effective RAG
In a RAG system, documents are not stored or retrieved as full files.
They are broken into smaller pieces called chunks.
How you create these chunks directly determines whether your system retrieves the right information or fails silently.
Why Chunking Is Required
Large documents cannot be embedded or retrieved effectively as a single unit.
If a document contains many topics, only a small part may be relevant to a user query.
Chunking solves this by isolating meaningful sections.
Think Before You Chunk
Before writing any code, ask:
- What questions will users ask?
- How dense is the information?
- Do ideas span multiple paragraphs?
Chunking is a design decision, not a mechanical step.
Basic Fixed-Size Chunking
The simplest approach is splitting text by a fixed number of characters or tokens.
```python
def fixed_chunk(text, size=500):
    # Split text into windows of `size` characters.
    chunks = []
    for i in range(0, len(text), size):
        chunks.append(text[i:i + size])
    return chunks
```
This method is easy to implement but often breaks semantic meaning.
What Goes Wrong with Fixed Chunking
Problems include:
- Sentences cut in half
- Concepts split across chunks
- Loss of contextual continuity
Retrieval accuracy suffers even when the embedding model itself is good.
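A quick sketch of this failure mode, applying the same character-based splitter to a short example (the text and the small chunk size are illustrative, chosen to make the break visible):

```python
def fixed_chunk(text, size=40):
    # Split text into fixed-size character windows.
    return [text[i:i + size] for i in range(0, len(text), size)]

text = "Embeddings map text to vectors. Retrieval compares those vectors to the query."
chunks = fixed_chunk(text, size=40)
for c in chunks:
    print(repr(c))  # the word "Retrieval" is cut mid-word at the first boundary
```

The first chunk ends partway through a word, so neither chunk contains the complete second sentence.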
Sentence-Based Chunking
A better approach is splitting by sentences or paragraphs.
This preserves semantic boundaries.
```python
import nltk

nltk.download("punkt", quiet=True)  # sentence tokenizer models, needed once

def sentence_chunk(text, max_sentences=5):
    # Group complete sentences into chunks of up to max_sentences each.
    sentences = nltk.sent_tokenize(text)
    chunks, current = [], []
    for sentence in sentences:
        current.append(sentence)
        if len(current) >= max_sentences:
            chunks.append(" ".join(current))
            current = []
    if current:  # flush any remaining sentences
        chunks.append(" ".join(current))
    return chunks
```
Each chunk now represents a complete thought.
Overlapping Chunks
Some ideas span chunk boundaries.
Overlapping ensures no critical context is lost.
```python
def overlapping_chunks(chunks, overlap=1):
    # Prepend up to `overlap` preceding chunks to each chunk.
    final_chunks = []
    for i in range(len(chunks)):
        start = max(0, i - overlap)
        combined = " ".join(chunks[start:i + 1])
        final_chunks.append(combined)
    return final_chunks
```
This improves recall at the cost of storage and compute.
Chunk Size Trade-offs
There is no perfect chunk size.
- Small chunks → precise retrieval, less context
- Large chunks → more context, higher noise
Most production systems tune chunk size experimentally.
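One way to build intuition is to sweep the size parameter on the same text and watch the trade-off between chunk count and chunk length. A minimal sketch, reusing the character-based splitter from above on placeholder text:

```python
def fixed_chunk(text, size):
    # Split text into fixed-size character windows.
    return [text[i:i + size] for i in range(0, len(text), size)]

text = "word " * 400  # 2000 characters of placeholder text

for size in (100, 250, 500, 1000):
    chunks = fixed_chunk(text, size)
    # Smaller sizes yield many narrow chunks; larger sizes yield few broad ones.
    print(f"size={size:4d} -> {len(chunks):3d} chunks")
```

In a real system the same sweep would be paired with retrieval metrics on a test query set, not just chunk counts.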
Chunking for Different Data Types
Chunking strategy depends on data:
- Technical docs → section-based
- FAQs → question-answer pairs
- Logs → time-based windows
One strategy does not fit all.
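As an illustration of the first case, here is a minimal section-based splitter for Markdown-style technical docs. It is a sketch: it assumes `#`-prefixed headings and keeps each heading together with its body; real documents usually need more robust parsing.

```python
import re

def section_chunk(markdown_text):
    # Split at Markdown headings, keeping each heading with its body text.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    return [p.strip() for p in parts if p.strip()]

doc = """# Installation
Run pip install mylib.

## Configuration
Set the API key in config.yaml.

# Usage
Call mylib.run() to start."""

for chunk in section_chunk(doc):
    print(chunk.splitlines()[0])  # one chunk per section heading
```

The lookahead in the pattern splits *before* each heading line, so headings stay attached to the text they introduce.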
How Chunking Affects the Entire RAG Pipeline
Chunking influences:
- Embedding quality
- Retrieval relevance
- Prompt size
- Latency and cost
Bad chunking cannot be fixed by later stages of the pipeline.
How Learners Should Practice Chunking
To truly understand chunking:
- Visualize chunks before embedding
- Test the same query on different strategies
- Inspect retrieved chunks manually
Chunking is learned by iteration, not memorization.
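For the first point, a simple helper can make chunk boundaries easy to eyeball before anything is embedded. A sketch, assuming chunks are plain strings:

```python
def show_chunks(chunks, preview=60):
    # Print index, length, and a short preview of every chunk.
    lines = []
    for i, chunk in enumerate(chunks):
        head = chunk[:preview].replace("\n", " ")
        lines.append(f"[{i:03d}] len={len(chunk)} | {head}")
    print("\n".join(lines))
    return lines

chunks = ["Chunking splits documents into pieces.",
          "Overlap preserves context across boundaries."]
show_chunks(chunks)
```

Scanning this output for half-sentences or mid-word cuts is often the fastest way to spot a bad splitting strategy.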
Practice
What process splits documents into smaller units?
What technique preserves cross-boundary context?
Chunking quality most affects which RAG stage?
Quick Quiz
What is the biggest risk of fixed-size chunking?
Why are overlapping chunks used?
Chunking should be treated as a:
Recap: Chunking determines how knowledge is stored, retrieved, and understood in RAG systems.
Next up: Indexing strategies — how chunks are stored for fast and accurate retrieval.