Agent Orchestration Patterns: Decision Graphs & State Machines
Our agent needs to handle complex logic: 'Check if the document is valid. If yes, route to legal. If no, retry extraction. If extraction fails 3 times, escalate to a human.' How do we even code that without a mess of if-else spaghetti?
Welcome to the world of decision graphs and state machines! Linear scripts (do A, then B, then C) work for simple agents. But real business logic has branches, loops, retries, and conditional paths. That's when you need to model your agent's workflow as a graph—nodes are tasks, edges are transitions. Think of it like a flowchart, but executable. LangGraph excels at this. Pydantic AI handles it with functional composition. The key is moving from 'scripts' to 'state machines.'
State machine? That sounds like computer science theory, not practical AI.
State machines are EVERYWHERE in production systems! Traffic lights: Red → Green → Yellow → Red. That's a state machine. E-commerce orders: Pending → Processing → Shipped → Delivered. State machines formalize 'what can happen next' based on current state and events. For agents:
States: 'Extracting Data,' 'Validating Document,' 'Awaiting Human Review.'
Transitions: 'If extraction succeeds → Validate. If fails → Retry (max 3). If 3 failures → Escalate.'
Guards: Conditions on transitions ('only escalate if confidence < 0.8').
This eliminates ambiguity. Your agent can't randomly jump from 'Extracting' to 'Shipped.' The graph defines the ONLY legal paths. That's what production-grade agents need.
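Here's a minimal sketch of that idea in plain Python (the TRANSITIONS table, the can_transition helper, and the state names are illustrative, not from any library): the table defines the only legal moves, and the guard blocks escalation unless confidence is low.

# Hypothetical transition table: current state -> set of legal next states
TRANSITIONS = {
    'extracting': {'validating', 'extracting', 'escalated'},   # retries loop back
    'validating': {'awaiting_review', 'extracting'},
    'awaiting_review': {'done'},
}

def can_transition(current: str, nxt: str, confidence: float = 1.0) -> bool:
    """Return True only if the move is in the table and its guard passes."""
    if nxt not in TRANSITIONS.get(current, set()):
        return False
    if nxt == 'escalated' and confidence >= 0.8:   # guard: only escalate on low confidence
        return False
    return True

assert can_transition('extracting', 'validating')
assert not can_transition('extracting', 'awaiting_review')             # illegal jump
assert not can_transition('extracting', 'escalated', confidence=0.95)  # guard blocks it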
Okay, but how does this look in code? Like, with Pydantic AI?
In Pydantic AI, you model workflows functionally. Each step is a validated function that returns a structured result. You compose them:
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError
from pydantic_ai import Agent

class DocumentState(BaseModel):
    status: Literal['pending', 'valid', 'invalid', 'escalated']
    text: str
    retry_count: int = 0
    error: Optional[str] = None

agent = Agent('openai:gpt-4', result_type=DocumentState)

async def process_document(doc: str, retry_count: int = 0) -> DocumentState:
    # Extract structured data from the raw document text
    result = await agent.run(f"Extract structured data from: {doc}")
    if result.data.status == 'valid':
        return route_to_legal(result.data)        # hand off to the legal workflow
    elif retry_count < 3:
        return await process_document(doc, retry_count + 1)  # retry extraction
    else:
        return escalate_to_human(result.data)     # out of retries: involve a human
Notice the type-safe state? DocumentState is a Pydantic model. At every step, you KNOW the shape of your data. No 'undefined is not a function' nonsense. The recursion handles retries cleanly.
That's readable, but what about LangGraph? Everyone keeps mentioning it.
LangGraph is graph-native. You explicitly define nodes and edges:
from typing import TypedDict

from langgraph.graph import StateGraph, END

class State(TypedDict):
    document: str
    status: str
    retry_count: int

def extract_data(state: State) -> State:
    # Call the LLM to extract structured data (omitted here)
    return {**state, 'status': 'extracted'}

def validate(state: State) -> State:
    # Placeholder validation logic -- replace with real checks
    is_valid = bool(state['document'].strip())
    if is_valid:
        return {**state, 'status': 'valid'}
    return {**state, 'status': 'invalid', 'retry_count': state['retry_count'] + 1}

def escalate_to_human(state: State) -> State:
    # Flag the document for manual review
    return {**state, 'status': 'escalated'}

def should_retry(state: State) -> str:
    # Route based on validation outcome and retry budget
    if state['status'] == 'valid':
        return 'done'
    return 'extract' if state['retry_count'] < 3 else 'escalate'

workflow = StateGraph(State)
workflow.add_node('extract', extract_data)
workflow.add_node('validate', validate)
workflow.add_node('escalate', escalate_to_human)
workflow.set_entry_point('extract')
workflow.add_edge('extract', 'validate')
workflow.add_conditional_edges(
    'validate', should_retry,
    {'extract': 'extract', 'escalate': 'escalate', 'done': END},
)
workflow.add_edge('escalate', END)
app = workflow.compile()
This is declarative. You SEE the workflow as a graph. LangGraph's Studio can visualize it. You can checkpoint (pause at any node), replay past runs, and debug time-travel style ('what if we had taken the other edge?'). That's invaluable for complex, multi-agent systems.
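A minimal sketch of that checkpoint-and-replay workflow, assuming the compiled graph from above and an in-memory checkpointer (in production you'd use a durable backend); the thread_id value is an arbitrary example:

from langgraph.checkpoint.memory import MemorySaver

# Compile with a checkpointer so state is persisted after every node
app = workflow.compile(checkpointer=MemorySaver())
config = {'configurable': {'thread_id': 'doc-42'}}   # one thread per document run

final_state = app.invoke({'document': '...', 'status': 'pending', 'retry_count': 0}, config)

# Inspect the latest snapshot, or walk back through checkpoints to replay a run
print(app.get_state(config).values)
for snapshot in app.get_state_history(config):
    print(snapshot.next, snapshot.values.get('status'))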
So LangGraph is better for complex workflows?
For graph-heavy, visual, stateful workflows, yes. LangGraph shines when:
1. You need human-in-the-loop: Pause the workflow at 'Awaiting Approval,' resume when a human clicks 'Accept' (see the sketch after this list).
2. You have parallel branches: Process 10 documents simultaneously, merge results.
3. You need debugging: Studio shows the exact path the agent took, which nodes failed, and why.
4. You're coordinating multiple agents: Legal Agent checks contract → Finance Agent approves budget → Ops Agent deploys. Each is a node.
But LangGraph has overhead—setup is verbose compared to Pydantic AI's functional style. For simple, linear agents with a few branches, Pydantic AI's approach is cleaner. The trade-off is simplicity vs. power.
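For the human-in-the-loop case in particular (item 1 above), one common pattern is to interrupt the graph before a node and resume after the human acts. Here's a sketch against the workflow above, pausing before the 'escalate' node; the thread_id is arbitrary:

from langgraph.checkpoint.memory import MemorySaver

# Pause execution just before 'escalate' and wait for a human decision
app = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=['escalate'],
)

config = {'configurable': {'thread_id': 'contract-7'}}
app.invoke({'document': '', 'status': 'pending', 'retry_count': 0}, config)
# The run stops before 'escalate'. When the human clicks 'Accept' (optionally
# after app.update_state(config, {...}) to record their decision), resume:
app.invoke(None, config)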
What about error handling? If a node fails halfway through the graph?
This is where state machines REALLY help. You design error nodes:
# handle_retry and log_and_alert are your own handler functions
workflow.add_node('retry_logic', handle_retry)
workflow.add_node('log_failure', log_and_alert)
workflow.add_conditional_edges(
    'extract',
    # assumes extract_data sets an 'error' key in state when it fails
    lambda state: ('ok' if not state.get('error')
                   else 'retry' if state['retry_count'] < 3 else 'fail'),
    {'ok': 'validate', 'retry': 'retry_logic', 'fail': 'log_failure'},
)
Now if 'extract' fails, the graph routes to 'retry_logic' (up to 3 times), then 'log_failure' if all retries fail. You've formalized failure paths. No silent crashes. No agents stuck in infinite loops (unless you design bad guards—don't do that).
With Pydantic AI, you'd handle this with try/except and validated outputs:
try:
    result = await agent.run(f"Extract structured data from: {doc}")
except ValidationError as e:
    if retry_count < 3:
        return await process_document(doc, retry_count + 1)   # retry with a bumped count
    else:
        return DocumentState(status='escalated', text=doc, error=str(e))
Both work. LangGraph is explicit (graph shows error paths). Pydantic AI is concise (functional composition). Pick based on your team's style.
The "Million Dollar" Question
"Why can't we just use if-else statements? Why do we need graphs and state machines?"
Technical Reality Check
Why If-Else Fails for Complex Agents
1. Spaghetti code at scale. If-else chains become unmaintainable. 'If A, then B, unless C, but if D and not E, then F...' You lose track. Graphs make logic visible and testable.
2. No replay or debugging. With if-else, you rerun the entire agent to debug. With state machines + checkpointing, you replay from any node. 'Start from the Validate node with this state.' Massive time-saver.
3. No parallel execution. If-else is sequential. Graphs let you run nodes in parallel ('Process documents A, B, C simultaneously, merge results'). Critical for performance.
4. No human-in-the-loop. If-else can't pause mid-execution and wait for external input. State machines can: 'Pause at Approval node, wait for webhook, resume.'
5. No formal reasoning. With state machines, you can prove properties: 'The agent can never reach Shipped without passing Validate.' With if-else? Good luck auditing.
6. Testing is brutal. Testing if-else requires covering every branch. State machines let you test nodes independently and validate transitions separately. Modular testing.
Bottom line: If-else is fine for 3-step agents. For 10+ steps with retries, conditionals, and parallel work? You NEED graphs or you'll drown in complexity.
Alright, give me the production patterns. What do I need to know?
Here's the playbook:
1. Idempotency is non-negotiable. Agents retry. If 'send_email' gets called twice, you don't want two emails. Design tools to be idempotent (check an 'email_sent' flag before sending; see the sketch after this list). Or use message queues (Celery, RabbitMQ) with deduplication.
2. Separate orchestration from execution. Your graph/state machine is control flow. Tool invocation (MCP calls, API requests) is execution. Keep them decoupled. This lets you swap tools (e.g., switch from SendGrid to Mailgun) without rewriting workflows.
3. Checkpointing for long-running workflows. If your agent takes 10 minutes, checkpointing lets you resume after crashes. LangGraph supports this natively. Pydantic AI requires custom logic (save state to DB at each step).
4. Use typed state everywhere. Pydantic AI enforces this. LangGraph uses TypedDict (not as strict, but better than raw dicts). Typed state catches bugs at development time, not runtime.
5. Test guards and transitions independently. Unit test: 'If retry_count == 3, does should_retry return "escalate"?' Don't test the entire graph every time. Table-driven tests work great:
test_cases = [
    ({'status': 'invalid', 'retry_count': 0}, 'extract'),
    ({'status': 'invalid', 'retry_count': 3}, 'escalate'),
]
for state, expected in test_cases:
    assert should_retry(state) == expected
6. Log state transitions.
Every time the agent moves from one node to another, log it: {timestamp, from_node, to_node, state_snapshot}. Critical for debugging production failures (a minimal sketch follows this list).
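To make item 1 concrete, here's a sketch of an idempotent tool (send_email, _deliver_email, and the email_sent flag are hypothetical names, not from any library): a retried call becomes a no-op instead of a duplicate email.

def send_email(state: dict) -> dict:
    # Idempotent tool: safe to call twice, e.g. after a retry or crash recovery
    if state.get('email_sent'):
        return state                         # already sent on a previous attempt
    _deliver_email(state['document'])        # your real SendGrid/Mailgun call goes here
    return {**state, 'email_sent': True}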
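And for item 6, a minimal transition logger, assuming you call it from your node wrappers or orchestration layer (log_transition is an illustrative helper, not a LangGraph or Pydantic AI API):

import json
import time

def log_transition(from_node: str, to_node: str, state: dict) -> None:
    # One structured record per transition; swap print for your logger or DB
    record = {
        'timestamp': time.time(),
        'from_node': from_node,
        'to_node': to_node,
        'state_snapshot': state,
    }
    print(json.dumps(record, default=str))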
Last question: Can we combine Pydantic AI and LangGraph?
Absolutely! And it's actually the best-of-both-worlds pattern:
Use Pydantic AI for individual nodes (type-safe agent logic).
Use LangGraph for orchestration (graph workflow, visual debugging).
Example:
# Pydantic AI agent (type-safe output at the node level)
agent = Agent('openai:gpt-4', result_type=DocumentState)

async def extract_node(state: State) -> State:
    result = await agent.run(f"Extract from {state['document']}")
    # result.data is a validated DocumentState; stash it in the graph state
    # (assumes the State schema also declares a 'data' key)
    return {**state, 'data': result.data.model_dump()}

# LangGraph workflow orchestrates the typed nodes
workflow.add_node('extract', extract_node)    # Pydantic AI inside!
workflow.add_node('validate', validate_node)  # validate_node defined as before
Now you get Pydantic's type safety at each node AND LangGraph's orchestration power. This is how we build production agents for clients with complex, validated workflows. For more on integrating agents with data systems, see our RAG Production Architecture post.
Technical Reality Check
What Decision Graphs Do NOT Solve
1. They don't fix bad business logic. If your workflow is 'Extract → Validate → Ship' but you forgot 'Check Inventory,' the graph won't magically add that node. YOU design the workflow.
2. They don't handle infinite loops. If you create a cycle without a guard ('Retry → Extract → Retry'), your agent will loop forever. Design termination conditions (e.g., stop retrying once retry_count reaches max_retries).
3. They don't guarantee correctness. Just because the graph compiles doesn't mean it does what you want. You need integration tests that run the entire workflow end-to-end.
4. They don't parallelize for free. LangGraph supports parallel nodes, but YOU need to ensure thread-safety, avoid race conditions, and handle merge logic.
5. They don't eliminate LLM errors. If the LLM hallucinates at a node, the graph can retry or escalate, but it can't 'un-hallucinate.' You still need validation (Pydantic!) at each step.
Bottom line: State machines and graphs give you structure and control. They make complex workflows testable, debuggable, and maintainable. But they're not magic. You still need to design good workflows, validate data, and test thoroughly. Combine them with Pydantic AI's type safety and MCP's tool standardization, and you've got a production-grade agentic stack.