Designing Complex AI Workflows with LangGraph

LangGraph models AI workflows as directed graphs with state. Nodes are functions, edges are transitions, and state persists across the entire execution. This is a fundamentally different model from chain-based frameworks — and it matters when your workflow needs loops, conditional branching, or human approval steps.

When LangGraph Makes Sense

LangGraph adds complexity. Before adopting it, make sure you need it:

Linear chains (A → B → C): Use LangChain's LCEL or plain function composition. A graph is unnecessary overhead.
Single agent with tools: Use create_react_agent from LangGraph's prebuilt module, or any tool-calling framework. You don't need a custom graph.
Conditional branching (if X then A else B): This is where LangGraph starts earning its complexity.
Loops (retry, iterative refinement, human-in-the-loop): This is LangGraph's sweet spot.

State Design: The Foundation

Every LangGraph workflow starts with a state class. This is the single source of truth that flows through the entire graph. Get the state wrong, and you'll be fighting the framework.

Key design principle: state fields should represent data, not control flow. Don't add a current_step field — that's what the graph edges handle. Don't add a should_retry boolean — use a conditional edge that checks the quality score.

The Annotated type with a reducer function controls how state updates merge. add_messages appends new messages to the list and replaces an existing message when the IDs match. For fields without a reducer, the default behavior is to overwrite. Custom reducers let you implement accumulation, deduplication, or other merge strategies.
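To make the merge rules concrete, here is a minimal sketch that uses only the standard library. The fields raw_text, issues, and audit_log and the dedupe_extend reducer are illustrative assumptions, not fields taken from the pipeline below. apply_update is a hypothetical helper that imitates how reducers are applied to an update; it is not LangGraph's internal implementation.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


def dedupe_extend(existing: list[str], new: list[str]) -> list[str]:
    """Custom reducer (assumption): append new items, skipping duplicates."""
    merged = list(existing)
    for item in new:
        if item not in merged:
            merged.append(item)
    return merged


class DocumentProcessingState(TypedDict):
    # Plain fields: a node's update overwrites the previous value.
    raw_text: str
    sections: dict
    quality_score: float
    needs_approval: bool
    output: str
    # Annotated fields: the second argument is the reducer used to merge updates.
    issues: Annotated[list[str], dedupe_extend]
    audit_log: Annotated[list[str], operator.add]


def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical helper that mimics reducer-based merging for illustration."""
    hints = get_type_hints(DocumentProcessingState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)  # merge via reducer
        else:
            merged[key] = value  # default: overwrite
    return merged


before = {"quality_score": 0.5, "issues": ["missing date"], "audit_log": ["parsed"]}
after = apply_update(
    before,
    {"quality_score": 0.9, "issues": ["missing date", "short summary"], "audit_log": ["reviewed"]},
)
```

Note that the state carries data only (scores, issues, a log). Nothing in it tells the graph where to go next, which matches the design principle above.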

Building a Real Workflow: Document Processing Pipeline

Let's build a document processing pipeline that extracts structured data, reviews quality, and routes to human approval when confidence is low.

python

import json

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph

# Assumed to be defined earlier (not shown here): DocumentProcessingState,
# the parse_document and extract_sections nodes, and llm (a chat model).

def review_quality(state: DocumentProcessingState) -> dict:
    sections = state["sections"]
    response = llm.invoke([
        {"role": "system", "content": (
            "Rate the quality of this extraction on a scale of 0-1. "
            "Check for: completeness, accuracy of entity extraction, summary coherence. "
            'Return JSON: {"score": float, "issues": [str]}'
        )},
        {"role": "user", "content": json.dumps(sections)},
    ])
    result = json.loads(response.content)
    return {
        "quality_score": result["score"],
        "needs_approval": result["score"] < 0.8,
    }

def format_output(state: DocumentProcessingState) -> dict:
    return {"output": json.dumps(state["sections"], indent=2)}

def human_review(state: DocumentProcessingState) -> dict:
    # Execution pauses before this node (see interrupt_before below) and waits for human input
    # The checkpoint system persists the state while the graph is paused
    return {"needs_approval": True}

def route_after_review(state: DocumentProcessingState) -> str:
    # The returned string is the name of the next node
    if state["quality_score"] >= 0.8:
        return "format_output"
    return "human_review"

# Build the graph
graph = StateGraph(DocumentProcessingState)

graph.add_node("parse", parse_document)
graph.add_node("extract", extract_sections)
graph.add_node("review", review_quality)
graph.add_node("format_output", format_output)
graph.add_node("human_review", human_review)

graph.add_edge("parse", "extract")
graph.add_edge("extract", "review")
graph.add_conditional_edges("review", route_after_review)
graph.add_edge("format_output", END)
graph.add_edge("human_review", "format_output")

graph.set_entry_point("parse")

# Compile with checkpointing for persistence
memory = MemorySaver()
app = graph.compile(checkpointer=memory, interrupt_before=["human_review"])
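One fragile spot in the pipeline above is the reviewer's reply: json.loads fails if the model returns anything other than clean JSON. A minimal sketch of a more defensive parser follows. parse_review and APPROVAL_THRESHOLD are hypothetical names, and treating an unparseable reply as a score of 0.0 (so it routes to human review) is an assumed policy, not something the pipeline above prescribes.

```python
import json
import math

APPROVAL_THRESHOLD = 0.8  # same cutoff the routing function uses


def parse_review(content: str) -> dict:
    """Turn the reviewer's raw reply into a state update.

    Anything that cannot be parsed is scored 0.0, which sends the
    document to human review instead of crashing the graph.
    """
    try:
        score = float(json.loads(content)["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        score = 0.0
    if not math.isfinite(score):
        score = 0.0
    score = min(max(score, 0.0), 1.0)  # clamp to the 0-1 scale
    return {"quality_score": score, "needs_approval": score < APPROVAL_THRESHOLD}


good = parse_review('{"score": 0.93, "issues": []}')
garbled = parse_review("Sure! Here is my rating: 0.9")
```

Inside review_quality, this would stand in for the json.loads call and the hand-built return dictionary.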

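For production, replace MemorySaver with a durable backend, such as the SQLite or Postgres checkpointers that LangGraph publishes as separate packages. MemorySaver keeps checkpoints in process memory, so a workflow paused for human review is lost if the process restarts.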
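To show why durability matters for a paused workflow, here is a toy store built on the standard library's sqlite3 module. SqliteStateStore, its table layout, and the thread ID "doc-42" are all hypothetical; this is not one of LangGraph's checkpointers and does not implement their interface. It only demonstrates the property you want: state keyed by a thread ID that survives the object (or process) that wrote it.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional


class SqliteStateStore:
    """Toy durable store (illustration only): one JSON snapshot per thread ID."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, thread_id: str, state: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints (thread_id, state) VALUES (?, ?)",
            (thread_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, thread_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")

# First "process": the workflow pauses for human review and saves its state.
writer = SqliteStateStore(db_path)
writer.save("doc-42", {"quality_score": 0.6, "needs_approval": True})
writer.close()

# Later "process": a fresh connection still finds the paused state.
reader = SqliteStateStore(db_path)
restored = reader.load("doc-42")
missing = reader.load("doc-unknown")
reader.close()
```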

Looping Patterns

LangGraph's ability to express loops is its strongest differentiator. A common pattern: iterative refinement where an agent generates output, a critic evaluates it, and the loop continues until quality thresholds are met.

Always add a maximum iteration count. Without it, a loop that never meets the quality threshold will run (and spend tokens) until you kill it.
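The refinement loop and its iteration cap can be sketched in plain Python. Everything here is a stand-in: generate and the two critics replace real LLM calls, and MAX_ITERATIONS, QUALITY_THRESHOLD, and the "generate"/"done" labels are assumed names. In a LangGraph graph, should_continue would be the routing function on a conditional edge, with "done" mapped to the end of the graph; run_loop only simulates that wiring.

```python
from typing import Callable

MAX_ITERATIONS = 3
QUALITY_THRESHOLD = 0.8


def generate(state: dict) -> dict:
    """Stand-in for the generator node: write a draft and count the attempt."""
    attempt = state["iterations"] + 1
    return {"draft": f"draft v{attempt}", "iterations": attempt}


def improving_critic(state: dict) -> dict:
    """Stand-in critic whose score rises with each attempt."""
    return {"score": min(1.0, round(0.45 * state["iterations"], 2))}


def stubborn_critic(state: dict) -> dict:
    """Stand-in critic that is never satisfied."""
    return {"score": 0.1}


def should_continue(state: dict) -> str:
    """Routing function: stop on good quality OR when the cap is reached."""
    if state["score"] >= QUALITY_THRESHOLD:
        return "done"
    if state["iterations"] >= MAX_ITERATIONS:
        return "done"
    return "generate"


def run_loop(critic: Callable[[dict], dict]) -> dict:
    """Simulate the generate -> critique -> route cycle."""
    state = {"draft": "", "score": 0.0, "iterations": 0}
    while True:
        state.update(generate(state))
        state.update(critic(state))
        if should_continue(state) == "done":
            return state


converged = run_loop(improving_critic)  # stops because the score passes the threshold
capped = run_loop(stubborn_critic)      # stops only because of MAX_ITERATIONS
```

The iteration counter lives in state as data about the run; the decision to stop still belongs to the routing function.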

Subgraphs for Complexity Management

When your graph grows past roughly 10 nodes, split it into subgraphs. Each subgraph is a compiled graph that acts as a single node in the parent graph.

This keeps each graph testable in isolation and prevents the "spaghetti graph" problem where conditional edges cross the entire workflow.
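The composition idea can be shown with a plain-Python stand-in. make_subgraph and the node functions below are hypothetical and do not use LangGraph's API; in LangGraph itself you would compile the inner graph and pass the compiled object to add_node on the parent. The point of the sketch is the contract: a subgraph takes state and returns an update, exactly like a single node, so it can be tested on its own.

```python
from typing import Callable

Node = Callable[[dict], dict]  # takes state, returns a partial update


def make_subgraph(*nodes: Node) -> Node:
    """Bundle several nodes into one callable with the same node signature."""

    def run(state: dict) -> dict:
        local = dict(state)   # private working copy for the inner nodes
        update: dict = {}     # everything the subgraph changed
        for node in nodes:
            step = node(local)
            local.update(step)
            update.update(step)
        return update

    return run


def split_sections(state: dict) -> dict:
    return {"sections": state["raw_text"].split("\n\n")}


def count_sections(state: dict) -> dict:
    return {"section_count": len(state["sections"])}


def summarize(state: dict) -> dict:
    return {"summary": f"{state['section_count']} sections found"}


# The extraction subgraph is testable in isolation...
extraction = make_subgraph(split_sections, count_sections)
extraction_only = extraction({"raw_text": "Intro\n\nMethods"})

# ...and then plugs into the parent as if it were one node.
parent = make_subgraph(extraction, summarize)
result = parent({"raw_text": "Intro\n\nMethods\n\nResults"})
```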

Streaming

LangGraph supports streaming at multiple levels: events from individual nodes, tokens from LLM calls within nodes, or state updates as the graph progresses.

This is critical for user-facing applications. Without streaming, the user stares at a spinner for 30+ seconds while multiple nodes execute.
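The state-update level of streaming can be imitated with a generator. stream_updates is a hypothetical helper, not LangGraph's stream method; it yields one dictionary per node, keyed by node name, which is an assumption about the general shape a per-node update stream takes. The consumer can render progress after every node instead of waiting for the final result.

```python
from typing import Callable, Iterator

Node = Callable[[dict], dict]


def stream_updates(nodes: list[tuple[str, Node]], state: dict) -> Iterator[dict]:
    """Run nodes in order, yielding {node_name: update} as each one finishes."""
    state = dict(state)
    for name, node in nodes:
        update = node(state)
        state.update(update)
        yield {name: update}


def parse(state: dict) -> dict:
    return {"text": state["raw"].strip()}


def count_words(state: dict) -> dict:
    return {"words": len(state["text"].split())}


events = []
for chunk in stream_updates([("parse", parse), ("count", count_words)], {"raw": "  hello graph world  "}):
    events.append(chunk)  # a UI would update a progress indicator here
```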

When LangGraph Is Overkill

Don't use LangGraph for:

Simple sequential tasks. output = step3(step2(step1(input))) is clearer than a three-node graph.
Stateless transformations. If no node needs the output of a non-adjacent node, you don't need shared state.
Single-shot tool use. A ReAct agent with tools doesn't need a custom graph unless you want custom control flow.
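For the sequential case, plain function composition is all you need. A minimal sketch, where pipeline and the three step functions are illustrative names:

```python
from functools import reduce
from typing import Callable


def pipeline(*steps: Callable) -> Callable:
    """Compose steps left to right: pipeline(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def uppercase(text: str) -> str:
    return text.upper()


def add_period(text: str) -> str:
    return text if text.endswith(".") else text + "."


process = pipeline(collapse_whitespace, uppercase, add_period)
result = process("  hello   graph-free   world  ")
```

Each step only needs the output of the step before it, so there is no shared state to manage and nothing for a graph to route.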

LangGraph is a state machine framework that happens to be useful for AI workflows. Use it when you need the state machine — conditional routing, loops, persistence, and human-in-the-loop. Skip it when you don't.
