Designing Complex AI Workflows with LangGraph
How to use LangGraph for stateful AI workflows with conditional routing, checkpointing, and human-in-the-loop patterns — and when it's overkill.
Akhil Sharma
February 22, 2026
LangGraph models AI workflows as directed graphs with state. Nodes are functions, edges are transitions, and state persists across the entire execution. This is a fundamentally different model from chain-based frameworks — and it matters when your workflow needs loops, conditional branching, or human approval steps.
When LangGraph Makes Sense
LangGraph adds complexity. Before adopting it, make sure you need it:
- Linear chains (A → B → C): Use LangChain's LCEL or plain function composition. A graph is unnecessary overhead.
- Single agent with tools: Use create_react_agent from LangGraph's prebuilt module, or any tool-calling framework. You don't need a custom graph.
- Conditional branching (if X then A else B): This is where LangGraph starts earning its complexity.
- Loops (retry, iterative refinement, human-in-the-loop): This is LangGraph's sweet spot.
State Design: The Foundation
Every LangGraph workflow starts with a state class. This is the single source of truth that flows through the entire graph. Get the state wrong, and you'll be fighting the framework.
Key design principle: state fields should represent data, not control flow. Don't add a current_step field — that's what the graph edges handle. Don't add a should_retry boolean — use a conditional edge that checks the quality score.
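Wrapping a field's type in Annotated with a reducer function controls how state updates merge: when a node returns a value for that field, the reducer combines the old and new values. add_messages appends new messages to the message list (replacing any existing message with the same ID). For other fields, the default behavior is to overwrite. Custom reducers let you implement accumulation, deduplication, or other merge strategies.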
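To make those merge rules concrete, here is a plain-Python sketch that mimics the documented behavior rather than reproducing LangGraph's internals: a field whose type is Annotated with a reducer is merged with reducer(old, new), and any other field is overwritten. `ReviewState`, the `dedupe_append` reducer, and the `apply_update` helper are hypothetical; in a real graph you would pass the TypedDict to StateGraph and, for chat history, annotate the field with add_messages from langgraph.graph.message.

```python
from typing import Annotated, Any, TypedDict, get_type_hints


def dedupe_append(existing: list[str], new: list[str]) -> list[str]:
    """Hypothetical custom reducer: append new items, skipping duplicates."""
    return existing + [item for item in new if item not in existing]


class ReviewState(TypedDict):
    # Data, not control flow: no current_step or should_retry fields.
    issues: Annotated[list[str], dedupe_append]  # merged with the reducer
    quality_score: float                         # no reducer: overwritten


def apply_update(schema: type, state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Merge a node's partial update into state, mimicking reducer semantics."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata and callable(metadata[0]) else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # combine old and new
        else:
            merged[key] = value  # default: last write wins
    return merged


state = {"issues": ["missing date"], "quality_score": 0.5}
state = apply_update(ReviewState, state, {"issues": ["missing date", "bad summary"], "quality_score": 0.7})
print(state)  # {'issues': ['missing date', 'bad summary'], 'quality_score': 0.7}
```

The state holds only data. Whether to retry or escalate is decided by a routing function that reads quality_score, which is the pattern the design principle above recommends.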
Building a Real Workflow: Document Processing Pipeline
Let's build a document processing pipeline that extracts structured data, reviews quality, and routes to human approval when confidence is low.
```python
import json

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph

# Assumes `llm` (a LangChain chat model), a DocumentProcessingState schema,
# and the parse_document / extract_sections nodes are defined elsewhere.


def review_quality(state: DocumentProcessingState) -> dict:
    sections = state["sections"]
    response = llm.invoke([
        {"role": "system", "content": (
            "Rate the quality of this extraction on a scale of 0-1. "
            "Check for: completeness, accuracy of entity extraction, summary coherence. "
            'Return JSON: {"score": float, "issues": [str]}'  # single quotes so the JSON quotes survive
        )},
        {"role": "user", "content": json.dumps(sections)},
    ])
    result = json.loads(response.content)
    return {
        "quality_score": result["score"],
        "needs_approval": result["score"] < 0.8,
    }


def format_output(state: DocumentProcessingState) -> dict:
    return {"output": json.dumps(state["sections"], indent=2)}


def human_review(state: DocumentProcessingState) -> dict:
    # Execution pauses *before* this node because of interrupt_before below.
    # The checkpointer persists the state, so a reviewer can inspect or edit it
    # (e.g. with app.update_state) before the thread is resumed.
    return {"needs_approval": True}


def route_after_review(state: DocumentProcessingState) -> str:
    # The returned string is the name of the next node to run.
    if state["quality_score"] >= 0.8:
        return "format_output"
    return "human_review"


# Build the graph
graph = StateGraph(DocumentProcessingState)

graph.add_node("parse", parse_document)
graph.add_node("extract", extract_sections)
graph.add_node("review", review_quality)
graph.add_node("format_output", format_output)
graph.add_node("human_review", human_review)

graph.add_edge("parse", "extract")
graph.add_edge("extract", "review")
graph.add_conditional_edges("review", route_after_review)
graph.add_edge("format_output", END)
graph.add_edge("human_review", "format_output")

graph.set_entry_point("parse")

# Compile with checkpointing for persistence
memory = MemorySaver()
app = graph.compile(checkpointer=memory, interrupt_before=["human_review"])
```
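Because routing functions are ordinary Python, the pipeline's control flow can be exercised without an LLM or LangGraph at all. The sketch below is a hypothetical dry run: `DocumentProcessingState` lists the fields the nodes above read and write (the `raw_text` input field is an assumption), `stub_parse`, `stub_extract`, and `stub_review` are deterministic stand-ins for the LLM-backed nodes, and `route_after_review` uses the same 0.8 threshold.

```python
import json
from typing import TypedDict


class DocumentProcessingState(TypedDict, total=False):
    raw_text: str          # hypothetical input field
    sections: dict         # structured data pulled out of the document
    quality_score: float   # 0-1, written by the review step
    needs_approval: bool   # True when a human should look at the result
    output: str            # final JSON string


# Hypothetical stubs standing in for the LLM-backed nodes.
def stub_parse(state: DocumentProcessingState) -> dict:
    paragraphs = [p.strip() for p in state["raw_text"].split("\n\n") if p.strip()]
    return {"sections": {"paragraphs": paragraphs}}


def stub_extract(state: DocumentProcessingState) -> dict:
    paragraphs = state["sections"]["paragraphs"]
    summary = paragraphs[0] if paragraphs else ""
    return {"sections": {**state["sections"], "summary": summary}}


def stub_review(state: DocumentProcessingState) -> dict:
    # Pretend confidence depends on how much text was extracted.
    score = 0.9 if len(state["sections"]["paragraphs"]) >= 2 else 0.4
    return {"quality_score": score, "needs_approval": score < 0.8}


def format_output(state: DocumentProcessingState) -> dict:
    return {"output": json.dumps(state["sections"], indent=2)}


def route_after_review(state: DocumentProcessingState) -> str:
    return "format_output" if state["quality_score"] >= 0.8 else "human_review"


def dry_run(raw_text: str) -> tuple[dict, list[str]]:
    """Follow the pipeline's edges with stubs, stopping where a human would step in."""
    state: dict = {"raw_text": raw_text}
    path: list[str] = []
    for name, node in (("parse", stub_parse), ("extract", stub_extract), ("review", stub_review)):
        state.update(node(state))
        path.append(name)
    branch = route_after_review(state)
    path.append(branch)
    if branch == "human_review":
        return state, path  # the real graph pauses here (interrupt_before)
    state.update(format_output(state))
    return state, path


state, path = dry_run("Intro paragraph.\n\nDetails paragraph.")
print(path)  # ['parse', 'extract', 'review', 'format_output']
```

Tests like this catch routing mistakes (a wrong threshold, a misspelled node name) before any tokens are spent.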
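For production, replace MemorySaver, which keeps checkpoints only in process memory, with a durable backend such as the SQLite or Postgres checkpointers (SqliteSaver, PostgresSaver), so that paused threads survive process restarts.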
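Durability matters because an interrupted thread has to be resumable later. The toy below uses only the standard library's sqlite3 to show the idea: after each step it saves the state and the next node under a thread ID, stops before any node listed in `interrupt_before`, and resumes from the saved checkpoint. It illustrates the concept and is not LangGraph's checkpointer interface; `ToyCheckpointer`, `run`, and the table layout are hypothetical. With LangGraph itself, you pass the thread ID as config={"configurable": {"thread_id": ...}} and resume a paused thread with app.invoke(None, config).

```python
import json
import sqlite3
from typing import Callable, Optional, Union

Node = Callable[[dict], dict]
Edge = Union[str, Callable[[dict], Optional[str]], None]


class ToyCheckpointer:
    """Hypothetical: keeps the latest (state, next node) for each thread in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints "
            "(thread_id TEXT PRIMARY KEY, state TEXT, next_node TEXT)"
        )

    def save(self, thread_id: str, state: dict, next_node: Optional[str]) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, json.dumps(state), next_node),
        )
        self.conn.commit()

    def load(self, thread_id: str) -> tuple[dict, Optional[str]]:
        row = self.conn.execute(
            "SELECT state, next_node FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no checkpoint for thread {thread_id!r}")
        return json.loads(row[0]), row[1]


def run(nodes: dict[str, Node], edges: dict[str, Edge], entry: str, cp: ToyCheckpointer,
        thread_id: str, inputs: Optional[dict] = None, interrupt_before: frozenset = frozenset()):
    """Start a thread with `inputs`, or resume a paused one with inputs=None."""
    if inputs is None:
        state, current = cp.load(thread_id)
        resuming = True
    else:
        state, current, resuming = dict(inputs), entry, False
    while current is not None:
        if current in interrupt_before and not resuming:
            cp.save(thread_id, state, current)  # pause *before* this node
            return state, current
        resuming = False
        state.update(nodes[current](state))
        edge = edges.get(current)  # missing edge means "end"
        current = edge(state) if callable(edge) else edge
        cp.save(thread_id, state, current)  # checkpoint after every step
    return state, None


nodes = {
    "review": lambda s: {"quality_score": 0.5},
    "human_review": lambda s: {"approved": True},  # stands in for the reviewer's decision
    "format_output": lambda s: {"output": "done"},
}
edges = {
    "review": lambda s: "format_output" if s["quality_score"] >= 0.8 else "human_review",
    "human_review": "format_output",
}
cp = ToyCheckpointer()  # pass a file path instead of ":memory:" to survive restarts
_, paused_at = run(nodes, edges, "review", cp, "doc-1", {"doc": "x"}, frozenset({"human_review"}))
print(paused_at)  # human_review
final, paused_at = run(nodes, edges, "review", cp, "doc-1", None, frozenset({"human_review"}))
print(final)  # {'doc': 'x', 'quality_score': 0.5, 'approved': True, 'output': 'done'}
```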
Looping Patterns
LangGraph's ability to express loops is its strongest differentiator. A common pattern: iterative refinement where an agent generates output, a critic evaluates it, and the loop continues until quality thresholds are met.
Always add a maximum iteration count. Without it, a loop that never meets the quality threshold will run (and spend tokens) until you kill it.
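A minimal sketch of that loop in plain Python, assuming a generator step and a critic step; both are deterministic stubs, and `MAX_ITERATIONS`, `QUALITY_THRESHOLD`, `generate`, `critique`, and `should_continue` are hypothetical names. The part that carries over to LangGraph is the routing function, which checks both the quality threshold and the iteration cap.

```python
from typing import TypedDict

MAX_ITERATIONS = 3       # hard cap on generate/critique rounds
QUALITY_THRESHOLD = 0.8  # critic score that counts as "good enough"


class RefineState(TypedDict):
    draft: str
    score: float
    iterations: int


def generate(state: RefineState) -> dict:
    # Stub generator: each pass revises the draft and bumps the counter.
    n = state["iterations"] + 1
    return {"draft": f"{state['draft']} (rev {n})", "iterations": n}


def critique(state: RefineState, gain: float) -> dict:
    # Stub critic: pretend each revision raises quality by `gain`.
    return {"score": min(1.0, gain * state["iterations"])}


def should_continue(state: RefineState) -> str:
    """Routing function: loop back to the generator or finish."""
    if state["score"] >= QUALITY_THRESHOLD:
        return "end"  # quality met
    if state["iterations"] >= MAX_ITERATIONS:
        return "end"  # budget exhausted; stop instead of looping forever
    return "generate"


def refine(gain: float) -> RefineState:
    state: RefineState = {"draft": "draft", "score": 0.0, "iterations": 0}
    next_node = "generate"
    while next_node == "generate":
        state.update(generate(state))
        state.update(critique(state, gain))
        next_node = should_continue(state)
    return state


print(refine(0.5)["iterations"])  # 2: threshold reached
print(refine(0.1)["iterations"])  # 3: stopped by MAX_ITERATIONS
```

In a graph, should_continue would be attached to the critic node with add_conditional_edges, with "end" mapped to END. LangGraph also enforces a recursion_limit per run and raises GraphRecursionError when it is exceeded; an explicit counter in state lets the loop finish gracefully with its best draft instead.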
Subgraphs for Complexity Management
When your graph exceeds 10 nodes, split it into subgraphs. Each subgraph is a compiled graph that acts as a single node in the parent graph.
This keeps each graph testable in isolation and prevents the "spaghetti graph" problem where conditional edges cross the entire workflow.
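LangGraph supports this directly: a compiled graph can be passed to add_node in a parent graph. The plain-Python sketch below shows only the composition idea, using a hypothetical `compile_graph` helper that turns a node table and an edge table into one callable with a node's signature, so the result can be registered in a parent graph and also tested on its own. The `extraction` and `pipeline` graphs and their stub nodes are illustrative.

```python
from typing import Callable, Optional, Union

Node = Callable[[dict], dict]
Edge = Union[str, Callable[[dict], Optional[str]], None]


def compile_graph(nodes: dict[str, Node], edges: dict[str, Edge], entry: str) -> Node:
    """Hypothetical helper: wrap a node/edge table in one callable with a node's signature."""

    def run(state: dict) -> dict:
        state = dict(state)  # work on a copy; the caller merges the returned update
        changes: dict = {}
        current: Optional[str] = entry
        while current is not None:
            update = nodes[current](state)
            state.update(update)
            changes.update(update)
            edge = edges.get(current)  # missing edge means "end"
            current = edge(state) if callable(edge) else edge
        return changes

    return run


# Subgraph: parsing and extraction, testable on its own.
extraction = compile_graph(
    nodes={
        "parse": lambda s: {"paragraphs": s["raw_text"].split("\n\n")},
        "extract": lambda s: {"summary": s["paragraphs"][0]},
    },
    edges={"parse": "extract"},
    entry="parse",
)

# Parent graph: the compiled subgraph is registered like any other node.
pipeline = compile_graph(
    nodes={
        "load": lambda s: {"raw_text": s["raw_text"].strip()},
        "extraction": extraction,
        "review": lambda s: {"quality_score": 0.9 if s["summary"] else 0.0},
    },
    edges={"load": "extraction", "extraction": "review"},
    entry="load",
)

print(pipeline({"raw_text": "  Title\n\nBody  "}))
```

In LangGraph, adding the compiled subgraph directly works when the parent and subgraph share state keys; when the schemas differ, the usual approach is to invoke the subgraph from inside a parent node function and map the fields in and out.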
Streaming
LangGraph supports streaming at multiple levels: stream events from individual nodes, stream tokens from LLM calls within nodes, or stream state updates as the graph progresses.
This is critical for user-facing applications. Without streaming, the user stares at a spinner for 30+ seconds while multiple nodes execute.
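In LangGraph this surfaces as the stream() method on a compiled graph, whose stream_mode argument selects what you receive, for example "updates" for each node's state update or "messages" for LLM tokens. The plain-Python generator below is not LangGraph code; it only mimics the "updates" shape, yielding a {node_name: update} chunk as soon as each stubbed node finishes. `stream_updates` and `slow_node` are hypothetical names.

```python
import time
from typing import Callable, Iterator

Node = Callable[[dict], dict]


def stream_updates(steps: list[tuple[str, Node]], state: dict) -> Iterator[dict[str, dict]]:
    """Run nodes in order and yield {node_name: update} as soon as each one finishes."""
    state = dict(state)
    for name, node in steps:
        update = node(state)
        state.update(update)
        yield {name: update}


def slow_node(update: dict, seconds: float = 0.01) -> Node:
    """Hypothetical stub that simulates a slow LLM-backed node."""
    def node(state: dict) -> dict:
        time.sleep(seconds)
        return update
    return node


steps = [
    ("parse", slow_node({"parsed": True})),
    ("extract", slow_node({"sections": 3})),
    ("review", slow_node({"quality_score": 0.9})),
]
chunks = []
for chunk in stream_updates(steps, {}):
    chunks.append(chunk)
    print(chunk)  # a UI would render each chunk here instead of waiting for the end
```

Because the generator is lazy, the consumer sees the first chunk after the first node, not after the whole run.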
When LangGraph Is Overkill
Don't use LangGraph for:
- Simple sequential tasks. A direct call like output = step3(step2(step1(input))) is clearer than a three-node graph.
- Stateless transformations. If no node needs the output of a non-adjacent node, you don't need shared state.
- Single-shot tool use. A ReAct agent with tools doesn't need a custom graph unless you want custom control flow.
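For the sequential case, plain composition is enough. A small sketch with hypothetical step functions: `pipeline` chains any number of single-argument functions left to right using functools.reduce, with no graph, state schema, or checkpointer involved.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right: pipeline(f, g, h)(x) == h(g(f(x)))."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical steps for a simple text cleanup task.
def strip_whitespace(text: str) -> str:
    return text.strip()


def normalize_spaces(text: str) -> str:
    return " ".join(text.split())


def to_title(text: str) -> str:
    return text.title()


clean = pipeline(strip_whitespace, normalize_spaces, to_title)
print(clean("  hello   langgraph  world "))  # Hello Langgraph World
```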
LangGraph is a state machine framework that happens to be useful for AI workflows. Use it when you need the state machine — conditional routing, loops, persistence, and human-in-the-loop. Skip it when you don't.