Chunking Strategies for RAG Explained: How to Split Documents for Optimal Retrieval

Learn RAG chunking strategies — fixed-size, semantic, recursive, and parent-document chunking with practical guidelines for chunk size and overlap.

Tags: chunking, rag, text-splitting, retrieval, document-processing

Chunking Strategies for RAG

Chunking is the process of splitting documents into smaller segments for embedding and retrieval in RAG systems, directly impacting retrieval quality, relevance, and generation accuracy.

What It Really Means

A RAG pipeline retrieves relevant chunks of text and feeds them to an LLM. The quality of the final answer depends heavily on whether the retrieved chunks contain the right information at the right granularity.

Too large: A 2,000-token chunk might contain the answer, but it is buried in irrelevant context. The embedding model averages the semantics of the entire chunk, diluting the signal: the chunk matches broadly but lacks precision.

Too small: A 50-token chunk might contain a key fact but lack the context needed to interpret it. "The retention rate improved by 40%" means nothing without knowing what product, time period, or comparison baseline.

Chunking strategy is the art of finding the right granularity — chunks that are self-contained enough to be meaningful but focused enough to be retrievable. This is one of the most impactful and underappreciated decisions in RAG system design.

How It Works in Practice

Strategy 1: Fixed-Size Chunking

Split text into chunks of N tokens with M tokens of overlap.

  • Chunk size: 256-1024 tokens (512 is a common default)
  • Overlap: 10-20% of chunk size (64-128 tokens)
  • Pros: Simple, predictable, works everywhere
  • Cons: Splits mid-sentence, ignores document structure
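A minimal sketch of the sliding-window approach above. For brevity, whitespace-separated words stand in for tokens; a production version would count tokens with the embedding model's tokenizer (e.g., tiktoken). The function name is illustrative.

```python
def fixed_size_chunks(text, chunk_size=512, overlap=64):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap."""
    tokens = text.split()  # stand-in for a real tokenizer
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

Note how each chunk repeats the last `overlap` tokens of its predecessor, so a sentence split at a boundary still appears whole in at least one chunk.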

Strategy 2: Recursive Character Splitting

Split by a hierarchy of separators: paragraphs → lines → sentences → words.

  • First try splitting on "\n\n" (paragraphs)
  • If chunks are too large, split on "\n" (lines)
  • If still too large, split on ". " (sentences)
  • If still too large, split on " " (words)
  • Pros: Respects natural text boundaries
  • Cons: Uneven chunk sizes
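The cascade above can be sketched as follows (hedged: sizes are measured in characters here; swap in a token counter for production, and note that a single word longer than `max_size` is returned as-is once the separators run out):

```python
SEPARATORS = ["\n\n", "\n", ". ", " "]  # paragraphs -> lines -> sentences -> words

def recursive_split(text, max_size=500, separators=SEPARATORS):
    if len(text) <= max_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_size:
            current = candidate  # still fits: keep accumulating
        else:
            if current:
                chunks.append(current)
            if len(piece) > max_size:
                # piece alone is too big: recurse with finer separators
                chunks.extend(recursive_split(piece, max_size, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```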

Strategy 3: Semantic Chunking

Split where the topic changes. Embed consecutive sentences and split where cosine similarity drops.

  • Embed each sentence
  • Compare consecutive sentence embeddings
  • Split where similarity drops below a threshold
  • Pros: Chunks are topically coherent
  • Cons: Expensive (requires embedding every sentence), unpredictable chunk sizes
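A runnable sketch of the idea, with one loud caveat: `embed` here is a crude bag-of-words stand-in so the example is self-contained. In practice you would substitute a real sentence-embedding model (e.g., from sentence-transformers), and tune `threshold` on your data.

```python
import math
from collections import Counter

def embed(sentence):
    # Toy embedding: word-count vector. Replace with a real model.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences, threshold=0.2):
    chunks, current = [], [sentences[0]]
    prev = embed(sentences[0])
    for sent in sentences[1:]:
        vec = embed(sent)
        if cosine(prev, vec) < threshold:  # similarity drop = topic shift
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
        prev = vec
    chunks.append(" ".join(current))
    return chunks
```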

Strategy 4: Document-Structured Chunking

Use document structure (headings, sections, code blocks) as natural boundaries.

  • Markdown: split on headings (##, ###)
  • HTML: split on semantic tags (section, article, h2)
  • Code: split on functions, classes, or modules
  • Pros: Preserves author's intended structure
  • Cons: Requires format-specific parsers
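For the Markdown case, a minimal splitter might start a new chunk at every `##` or `###` heading (the heading levels and regex are illustrative choices; real documents may need deeper levels or code-fence awareness):

```python
import re

def split_markdown_sections(md):
    """Start a new chunk at each ## or ### heading."""
    chunks, current = [], []
    for line in md.splitlines():
        if re.match(r"^#{2,3} ", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```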

Strategy 5: Parent-Document Retrieval

Index small chunks for retrieval but return larger parent documents.

  • Create small chunks (256 tokens) for precise matching
  • Each small chunk references its parent chunk (1024 tokens)
  • Retrieve using small chunks, return parent chunks to the LLM
  • Pros: Precise retrieval + sufficient context
  • Cons: More complex indexing, higher storage
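The bookkeeping can be sketched like this (hedged: sizes are counted in words for brevity, `score_fn` is a placeholder for vector similarity, and a real system would use a tokenizer plus a vector index):

```python
def build_parent_index(text, parent_size=200, child_size=50):
    """Small child chunks carry their parent's id for later lookup."""
    words = text.split()
    parents, children = {}, []
    for p_id, start in enumerate(range(0, len(words), parent_size)):
        parent_words = words[start:start + parent_size]
        parents[p_id] = " ".join(parent_words)
        for c_start in range(0, len(parent_words), child_size):
            children.append({
                "text": " ".join(parent_words[c_start:c_start + child_size]),
                "parent_id": p_id,
            })
    return parents, children

def retrieve(query, parents, children, score_fn):
    # Match against the small, precise children; return the big parent.
    best = max(children, key=lambda c: score_fn(query, c["text"]))
    return parents[best["parent_id"]]
```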

Implementation

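A minimal end-to-end sketch tying the pieces together: fixed-size chunking feeding a toy retriever. Jaccard token overlap stands in for embedding similarity so the example stays self-contained; in production you would use an embedding model and a vector store, and all names here are illustrative.

```python
def chunk(text, size=100, overlap=20):
    """Fixed-size chunks (in words) with overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def jaccard(a, b):
    """Token-set overlap: a crude stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    return sorted(chunks, key=lambda c: jaccard(query, c), reverse=True)[:k]
```

Usage: `retrieve("interest rates", chunk(doc))` returns the chunks whose vocabulary best matches the query, which is exactly the step a vector database performs with embeddings.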

Trade-offs

Chunk Size Guidelines

| Use Case | Recommended Size | Rationale |
| --- | --- | --- |
| Q&A / FAQ | 256-512 tokens | Short, focused answers |
| Technical docs | 512-1024 tokens | Need enough context for procedures |
| Legal documents | 1024-2048 tokens | Clauses need surrounding context |
| Code | Function/class level | Natural semantic boundaries |

When to Use Each Strategy

  • Fixed-size: Default starting point, works for most cases
  • Recursive: When document has natural paragraph/section structure
  • Semantic: When topic coherence matters more than fixed boundaries
  • Document-structured: Markdown, HTML, or code with clear structure
  • Parent-document: When you need both precise retrieval and rich context

Advantages of Good Chunking

  • Directly improves retrieval precision and recall
  • Reduces noise in LLM context, improving generation quality
  • Enables efficient token budgeting

Disadvantages of Over-Engineering

  • Semantic chunking is expensive at scale (embedding every sentence)
  • Complex chunking strategies are harder to debug
  • Domain-specific chunking requires custom parsers for each format

Common Misconceptions

  • "There is one optimal chunk size" — The optimal size depends on document type, query patterns, embedding model, and use case. Always test multiple sizes on your specific data.

  • "Overlap is always necessary" — Overlap helps when chunks split mid-topic, but adds redundancy and cost. With semantic or structure-based chunking, overlap is often unnecessary.

  • "Smaller chunks are always more precise" — Tiny chunks lose context. The sentence "It increased by 40%" is useless without knowing what "it" refers to. Context is essential for both embedding quality and LLM comprehension.

  • "Chunking is a one-time setup" — As your documents, queries, and models evolve, your chunking strategy should be re-evaluated. What works for v1 may not work for v2.

How This Appears in Interviews

Chunking strategy questions test practical RAG engineering knowledge:

  • "How would you chunk a 500-page technical manual for a RAG system?" — discuss structure-based chunking on headings, appropriate chunk sizes for technical content, and parent-document retrieval. See our interview questions on RAG systems.
  • "Your RAG system returns correct documents but the LLM gives wrong answers. What could be wrong?" — chunks may be splitting relevant context, chunks may be too large (diluted embeddings), or overlap may be insufficient.
  • "How do you handle tables and images in chunking?" — discuss multimodal embeddings, table serialization, and image captioning.
