Chunking Strategies for RAG Explained: How to Split Documents for Optimal Retrieval

Learn RAG chunking strategies — fixed-size, semantic, recursive, and parent-document chunking with practical guidelines for chunk size and overlap.

Tags: chunking, rag, text-splitting, retrieval, document-processing

Chunking Strategies for RAG

Chunking is the process of splitting documents into smaller segments for embedding and retrieval in RAG systems, directly impacting retrieval quality, relevance, and generation accuracy.

What It Really Means

A RAG pipeline retrieves relevant chunks of text and feeds them to an LLM. The quality of the final answer depends heavily on whether the retrieved chunks contain the right information at the right granularity.

Too large: A 2,000-token chunk might contain the answer, but it is buried in irrelevant context. The embedding model averages the semantics of the entire chunk, diluting the signal: the chunk matches broadly but lacks precision.

Too small: A 50-token chunk might contain a key fact but lack the context needed to interpret it. "The retention rate improved by 40%" means nothing without knowing what product, time period, or comparison baseline.

Chunking strategy is the art of finding the right granularity — chunks that are self-contained enough to be meaningful but focused enough to be retrievable. This is one of the most impactful and underappreciated decisions in RAG system design.

How It Works in Practice

Strategy 1: Fixed-Size Chunking

Split text into chunks of N tokens with M tokens of overlap.

  • Chunk size: 256-1024 tokens (512 is a common default)
  • Overlap: 10-20% of chunk size (64-128 tokens)
  • Pros: Simple, predictable, works everywhere
  • Cons: Splits mid-sentence, ignores document structure
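A minimal sketch of the sliding-window approach above. For brevity, whitespace-separated words stand in for tokens; a production version would count tokens with the embedding model's tokenizer (e.g., tiktoken). The function name is illustrative.

```python
def fixed_size_chunks(text, chunk_size=512, overlap=64):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap."""
    tokens = text.split()  # stand-in for a real tokenizer
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

Note how each chunk repeats the last `overlap` tokens of its predecessor, so a sentence split at a boundary still appears whole in at least one chunk.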

Strategy 2: Recursive Character Splitting

Split by a hierarchy of separators: paragraphs → lines → sentences → words.

  • First try splitting on "\n\n" (paragraphs)
  • If chunks are too large, split on "\n" (lines)
  • If still too large, split on ". " (sentences)
  • If still too large, split on " " (words)
  • Pros: Respects natural text boundaries
  • Cons: Uneven chunk sizes
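The cascade above can be sketched as follows (hedged: sizes are measured in characters here; swap in a token counter for production, and note that a single word longer than `max_size` is returned as-is once the separators run out):

```python
SEPARATORS = ["\n\n", "\n", ". ", " "]  # paragraphs -> lines -> sentences -> words

def recursive_split(text, max_size=500, separators=SEPARATORS):
    if len(text) <= max_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_size:
            current = candidate  # still fits: keep accumulating
        else:
            if current:
                chunks.append(current)
            if len(piece) > max_size:
                # piece alone is too big: recurse with finer separators
                chunks.extend(recursive_split(piece, max_size, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```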

Strategy 3: Semantic Chunking

Split where the topic changes. Embed consecutive sentences and split where cosine similarity drops.

  • Embed each sentence
  • Compare consecutive sentence embeddings
  • Split where similarity drops below a threshold
  • Pros: Chunks are topically coherent
  • Cons: Expensive (requires embedding every sentence), unpredictable chunk sizes
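A runnable sketch of the idea, with one loud caveat: `embed` here is a crude bag-of-words stand-in so the example is self-contained. In practice you would substitute a real sentence-embedding model (e.g., from sentence-transformers), and tune `threshold` on your data.

```python
import math
from collections import Counter

def embed(sentence):
    # Toy embedding: word-count vector. Replace with a real model.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences, threshold=0.2):
    chunks, current = [], [sentences[0]]
    prev = embed(sentences[0])
    for sent in sentences[1:]:
        vec = embed(sent)
        if cosine(prev, vec) < threshold:  # similarity drop = topic shift
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
        prev = vec
    chunks.append(" ".join(current))
    return chunks
```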

Strategy 4: Document-Structured Chunking

Use document structure (headings, sections, code blocks) as natural boundaries.

  • Markdown: split on headings (##, ###)
  • HTML: split on semantic tags (section, article, h2)
  • Code: split on functions, classes, or modules
  • Pros: Preserves author's intended structure
  • Cons: Requires format-specific parsers
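For the Markdown case, a minimal splitter might start a new chunk at every `##` or `###` heading (the heading levels and regex are illustrative choices; real documents may need deeper levels or code-fence awareness):

```python
import re

def split_markdown_sections(md):
    """Start a new chunk at each ## or ### heading."""
    chunks, current = [], []
    for line in md.splitlines():
        if re.match(r"^#{2,3} ", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```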

Strategy 5: Parent-Document Retrieval

Index small chunks for retrieval but return larger parent documents.

  • Create small chunks (256 tokens) for precise matching
  • Each small chunk references its parent chunk (1024 tokens)
  • Retrieve using small chunks, return parent chunks to the LLM
  • Pros: Precise retrieval + sufficient context
  • Cons: More complex indexing, higher storage
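The bookkeeping can be sketched like this (hedged: sizes are counted in words for brevity, `score_fn` is a placeholder for vector similarity, and a real system would use a tokenizer plus a vector index):

```python
def build_parent_index(text, parent_size=200, child_size=50):
    """Small child chunks carry their parent's id for later lookup."""
    words = text.split()
    parents, children = {}, []
    for p_id, start in enumerate(range(0, len(words), parent_size)):
        parent_words = words[start:start + parent_size]
        parents[p_id] = " ".join(parent_words)
        for c_start in range(0, len(parent_words), child_size):
            children.append({
                "text": " ".join(parent_words[c_start:c_start + child_size]),
                "parent_id": p_id,
            })
    return parents, children

def retrieve(query, parents, children, score_fn):
    # Match against the small, precise children; return the big parent.
    best = max(children, key=lambda c: score_fn(query, c["text"]))
    return parents[best["parent_id"]]
```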

Implementation

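A minimal end-to-end sketch tying the pieces together: fixed-size chunking feeding a toy retriever. Jaccard token overlap stands in for embedding similarity so the example stays self-contained; in production you would use an embedding model and a vector store, and all names here are illustrative.

```python
def chunk(text, size=100, overlap=20):
    """Fixed-size chunks (in words) with overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def jaccard(a, b):
    """Token-set overlap: a crude stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    return sorted(chunks, key=lambda c: jaccard(query, c), reverse=True)[:k]
```

Usage: `retrieve("interest rates", chunk(doc))` returns the chunks whose vocabulary best matches the query, which is exactly the step a vector database performs with embeddings.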

Trade-offs

Chunk Size Guidelines

| Use Case | Recommended Size | Rationale |
| --- | --- | --- |
| Q&A / FAQ | 256-512 tokens | Short, focused answers |
| Technical docs | 512-1024 tokens | Need enough context for procedures |
| Legal documents | 1024-2048 tokens | Clauses need surrounding context |
| Code | Function/class level | Natural semantic boundaries |

When to Use Each Strategy

  • Fixed-size: Default starting point, works for most cases
  • Recursive: When document has natural paragraph/section structure
  • Semantic: When topic coherence matters more than fixed boundaries
  • Document-structured: Markdown, HTML, or code with clear structure
  • Parent-document: When you need both precise retrieval and rich context

Advantages of Good Chunking

  • Directly improves retrieval precision and recall
  • Reduces noise in LLM context, improving generation quality
  • Enables efficient token budgeting

Disadvantages of Over-Engineering

  • Semantic chunking is expensive at scale (embedding every sentence)
  • Complex chunking strategies are harder to debug
  • Domain-specific chunking requires custom parsers for each format

Common Misconceptions

  • "There is one optimal chunk size" — The optimal size depends on document type, query patterns, embedding model, and use case. Always test multiple sizes on your specific data.

  • "Overlap is always necessary" — Overlap helps when chunks split mid-topic, but adds redundancy and cost. With semantic or structure-based chunking, overlap is often unnecessary.

  • "Smaller chunks are always more precise" — Tiny chunks lose context. The sentence "It increased by 40%" is useless without knowing what "it" refers to. Context is essential for both embedding quality and LLM comprehension.

  • "Chunking is a one-time setup" — As your documents, queries, and models evolve, your chunking strategy should be re-evaluated. What works for v1 may not work for v2.

How This Appears in Interviews

Chunking strategy questions test practical RAG engineering knowledge:

  • "How would you chunk a 500-page technical manual for a RAG system?" — discuss structure-based chunking on headings, appropriate chunk sizes for technical content, and parent-document retrieval. See our interview questions on RAG systems.
  • "Your RAG system returns correct documents but the LLM gives wrong answers. What could be wrong?" — chunks may be splitting relevant context, chunks may be too large (diluted embeddings), or overlap may be insufficient.
  • "How do you handle tables and images in chunking?" — discuss multimodal embeddings, table serialization, and image captioning.
