Agentic Design Patterns: A System Design Guide for AI Engineers
Learn the core architectural patterns for building AI agents—ReAct, planning, reflection, tool use, and multi-agent systems—explained for engineers who think in system design.

You know how to design distributed systems. You understand microservices, message queues, and load balancers. Now you're building AI agents, and the architecture patterns feel unfamiliar.
Here's the thing: agentic AI has its own design patterns—recurring solutions to common problems, just like the Gang of Four patterns you learned for object-oriented design. Once you recognize them, agent architectures become much clearer.
This guide covers the core patterns you'll encounter when building production AI agents.
Pattern 1: ReAct (Reasoning + Acting)
The most fundamental agentic pattern.
ReAct stands for Reasoning and Acting. The agent alternates between thinking about what to do and actually doing it, learning from each action's results. The pattern builds directly on chain-of-thought reasoning in LLMs: each loop iteration is a short reasoning step grounded in the latest observation.
The Pattern
Loop until done:
1. Thought → "What should I do next and why?"
2. Action → Execute a tool or take an action
3. Observation → See what happened
4. (back to Thought with new information)
Example in Practice
User: "What's the weather in Tokyo and should I bring an umbrella?"
Thought: I need to check the weather in Tokyo. Let me use the weather API.
Action: weather_api(location="Tokyo")
Observation: {"temp": 18, "conditions": "light rain", "precipitation_chance": 80}
Thought: It's 18°C with light rain and 80% chance of precipitation.
The user should definitely bring an umbrella.
Action: respond("It's 18°C in Tokyo with light rain.
Yes, bring an umbrella—there's an 80% chance of precipitation.")When to Use ReAct
- Exploratory tasks where you don't know the steps upfront
- Interactive workflows that need to adapt based on results
- Most general-purpose agents — it's the default choice for a reason
When to Avoid
- Simple tasks that don't need reasoning loops (just call the tool directly)
- When you need predictable, auditable execution paths
- High-throughput scenarios where the reasoning overhead matters
System Design Considerations
┌─────────────────────────────────────────────────────┐
│ ReAct Agent │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────┐ │
│ │ Thought │───►│ Action │───►│ Observation │ │
│ │ (LLM) │◄───│ (Tools) │◄───│ (Tool Results) │ │
│ └─────────┘ └─────────┘ └─────────────────┘ │
│ ▲ │ │
│ └──────────────────────────────┘ │
│ (loop until done) │
└─────────────────────────────────────────────────────┘
Token costs: Each loop iteration consumes tokens. For complex tasks, this can add up quickly.
Latency: Each thought-action-observation cycle requires an LLM call plus tool execution time.
Context window: The full trace (all thoughts, actions, observations) stays in context. Long-running agents can hit context window limits.
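The loop described above is simple to sketch in code. The following is a minimal sketch, not a production implementation: `llm_think`, the `tools` registry, and the trace format are all illustrative stand-ins for a real model call and real tool implementations.

```python
# Minimal ReAct loop sketch. llm_think and the tools dict are stand-ins
# for a real LLM call and real tool implementations.

def run_react(goal, llm_think, tools, max_steps=10):
    trace = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Thought + Action: ask the model what to do given the trace so far.
        # llm_think returns either {"thought", "action", "args"} or {"answer"}.
        step = llm_think("\n".join(trace))
        if "answer" in step:
            return step["answer"]
        trace.append(f"Thought: {step['thought']}")
        # Observation: execute the chosen tool and feed the result back
        observation = tools[step["action"]](**step["args"])
        trace.append(f"Action: {step['action']}({step['args']})")
        trace.append(f"Observation: {observation}")
    raise RuntimeError("max_steps reached without a final answer")
```

The `max_steps` cap is the termination safeguard mentioned above: without it, a confused model can loop indefinitely while burning tokens.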
Pattern 2: Plan-and-Execute
Think first, act later.
Instead of interleaving reasoning and acting, this pattern separates them entirely. First, create a complete plan. Then, execute it step by step.
The Pattern
Phase 1 - Planning:
Input: User goal
Output: Ordered list of steps to achieve the goal
Phase 2 - Execution:
For each step in the plan:
Execute the step
Record the result
(Optionally: replan if something unexpected happens)
Example in Practice
User: "Refactor the authentication module to use JWT instead of sessions"
# Planning Phase
Plan:
1. Read current auth module to understand the implementation
2. Identify all session-related code paths
3. Design JWT token structure and flow
4. Update the login endpoint to issue JWTs
5. Update middleware to validate JWTs instead of sessions
6. Update logout to handle token invalidation
7. Update tests to use JWT-based auth
8. Run test suite to verify changes
# Execution Phase
Executing step 1: Reading auth module...
Result: Found session logic in auth.py, middleware.py, and 3 route files
Executing step 2: Identifying session code paths...
Result: 12 locations use session.get('user_id'), 3 use session.clear()
... (continues through the plan)
When to Use Plan-and-Execute
- Well-defined tasks where the steps are predictable
- Complex multi-step operations that benefit from upfront organization
- When you need auditability — the plan serves as documentation
- Batch operations where you want to review before executing
When to Avoid
- Exploratory tasks where you can't predict the path
- Rapidly changing environments where plans become stale
- Simple tasks that don't need formal planning
The Hybrid: Plan-and-Execute with Replanning
Most production systems use a hybrid approach:
1. Create initial plan
2. Execute step
3. Evaluate: Did it work as expected?
- Yes → Continue to next step
- No → Replan from current state
4. Repeat until goal achieved
This gives you the benefits of structured planning while handling unexpected situations.
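The hybrid loop can be sketched in a few lines of Python. Treat this as a sketch under assumptions: `plan`, `execute_step`, and `step_succeeded` are placeholders for LLM-backed implementations.

```python
# Sketch of the plan-execute-replan loop. plan(), execute_step(), and
# step_succeeded() are placeholders for LLM-backed implementations.

def run_plan_and_execute(goal, plan, execute_step, step_succeeded, max_replans=3):
    steps = plan(goal, completed=[])          # Phase 1: initial plan
    completed = []
    replans = 0
    while steps:
        step = steps.pop(0)
        result = execute_step(step)           # Phase 2: execute next step
        if step_succeeded(step, result):
            completed.append((step, result))  # record and continue
        else:
            replans += 1
            if replans > max_replans:
                raise RuntimeError(f"could not recover from failed step: {step}")
            # Replan from the current state, keeping completed work
            steps = plan(goal, completed=completed)
    return completed
```

Passing the `completed` work back into `plan` is what makes replanning "from current state" rather than from scratch.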
System Design Considerations
┌─────────────────────────────────────────────────────┐
│ Plan-and-Execute Agent │
│ │
│ ┌──────────┐ ┌───────────────────────────┐ │
│ │ Planner │────────►│ Plan │ │
│ │ (LLM) │ │ [Step 1, Step 2, ...] │ │
│ └──────────┘ └───────────┬───────────────┘ │
│ ▲ │ │
│ │ (replan ▼ │
│ │ if needed) ┌───────────────┐ │
│ │ │ Executor │ │
│ │ │ (sequential) │ │
│ │ └───────┬───────┘ │
│ │ │ │
│ └──────────────────────────┘ │
│ (feedback) │
└─────────────────────────────────────────────────────┘
Advantages over ReAct:
- More predictable execution
- Plan can be reviewed before execution
- Easier to estimate time/cost
Disadvantages:
- Less adaptive to unexpected results
- Planning itself can be wrong
- May over-plan for simple tasks
Pattern 3: Tool Use Orchestration
Giving agents hands to interact with the world.
Tool use isn't a single pattern—it's a family of patterns for how agents discover, select, and execute tools.
Pattern 3a: Direct Tool Calling
The simplest approach: the LLM directly outputs structured tool calls.
User: "What's 2847 * 394?"
LLM Output:
tool: calculator
arguments: {"expression": "2847 * 394"}
System: Executes calculator, returns 1,121,718
LLM: "2847 × 394 = 1,121,718"
Modern LLMs (Claude, GPT-4) have native support for this pattern through function calling APIs.
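The shape of a tool definition is broadly similar across providers: a name, a description, and a JSON-schema of parameters. The snippet below is an illustrative shape plus a tiny dispatcher, not any specific vendor's API; check your provider's docs for the exact field names.

```python
# Illustrative tool definition in the JSON-schema style that
# function-calling APIs generally expect. Field names vary by provider.
calculator_tool = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression and return the result.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "e.g. '2847 * 394'"},
        },
        "required": ["expression"],
    },
}

def dispatch(tool_call, handlers):
    """Route a model-emitted tool call to the matching handler."""
    return handlers[tool_call["tool"]](**tool_call["arguments"])

# Restrict eval to bare arithmetic; a real system would use a proper parser.
handlers = {"calculator": lambda expression: eval(expression, {"__builtins__": {}})}
result = dispatch({"tool": "calculator", "arguments": {"expression": "2847 * 394"}}, handlers)
# result == 1121718
```

The description fields matter more than they look: they are the only documentation the model sees when deciding whether and how to call the tool.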
Pattern 3b: Tool Selection with Routing
When you have many tools, add a selection layer:
┌──────────────┐ ┌─────────────────┐ ┌──────────────┐
│ Query │────►│ Tool Router │────►│ Selected │
│ │ │ (classifier) │ │ Tool(s) │
└──────────────┘ └─────────────────┘ └──────────────┘
The router can be:
- LLM-based: Ask the model which tool is relevant
- Embedding-based: Match query to tool descriptions via semantic similarity using embeddings
- Rule-based: Pattern matching for known query types
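An embedding-based router reduces to "pick the tool whose description is closest to the query." A real system would use a sentence-embedding model; in this self-contained sketch a bag-of-words vector stands in for the embedding.

```python
# Embedding-based tool routing sketch. A real system would use a
# sentence-embedding model; a bag-of-words Counter stands in here
# so the example is self-contained.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route(query, tool_descriptions):
    """Pick the tool whose description is most similar to the query."""
    q = embed(query)
    return max(tool_descriptions, key=lambda name: cosine(q, embed(tool_descriptions[name])))

tools = {
    "weather_api": "get current weather forecast temperature rain for a city",
    "calculator": "evaluate arithmetic math expressions numbers",
}
# route("will it rain in Tokyo", tools) -> "weather_api"
```

The same interface works whether the router is rule-based, embedding-based, or an LLM call; only the scoring function changes.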
Pattern 3c: Tool Composition
Agents that chain tools together to accomplish complex tasks:
Goal: "Get me a summary of the top HN stories about AI"
Tool chain:
1. hn_api.get_top_stories() → [story_ids]
2. For each story_id: hn_api.get_story(id) → {title, url}
3. Filter stories with "AI" in title
4. For each AI story: web_scraper.get_content(url) → content
5. llm.summarize(all_contents) → summary
Tool Design Principles
When designing tools for agents (similar principles apply to building RAG pipelines), focus on clarity and composability.
1. Clear, atomic operations
# Good: Single, clear purpose
search_documents(query: str, limit: int) → List[Document]
# Bad: Too many responsibilities
search_and_maybe_summarize_or_email(query: str, summarize: bool, email: str | None)
2. Rich return types
# Good: Structured, informative
{
"results": [...],
"total_count": 147,
"query_time_ms": 23,
"truncated": true
}
# Bad: Ambiguous
"Found 10 results" (string)
3. Graceful error handling
# Good: Actionable error
{
"error": "rate_limited",
"retry_after_seconds": 30,
"message": "API rate limit exceeded. Try again in 30 seconds."
}
# Bad: Generic failure
{"error": "Something went wrong"}
System Design Considerations
Tool explosion problem: As you add tools, the agent's decisions become harder. Consider:
- Grouping tools into categories
- Using a hierarchical selection process
- Providing tool recommendations based on context
Latency: Each tool call adds latency. For real-time applications:
- Parallelize independent tool calls
- Cache frequently-used tool results
- Consider tool call budgets
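For example, two independent tool calls can overlap with a thread pool. The tools below are stand-ins that just sleep to simulate network latency; real API-bound tools benefit far more.

```python
# Running independent tool calls in parallel with a thread pool.
# fetch_weather and fetch_news are stand-in tools that sleep to
# simulate network latency.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_weather(city):
    time.sleep(0.1)  # simulate network latency
    return f"weather({city})"

def fetch_news(topic):
    time.sleep(0.1)
    return f"news({topic})"

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    weather = pool.submit(fetch_weather, "Tokyo")
    news = pool.submit(fetch_news, "AI")
    results = [weather.result(), news.result()]
elapsed = time.perf_counter() - start
# The two calls overlap, so elapsed is roughly 0.1s instead of 0.2s
```

Only parallelize calls that are truly independent; if one tool's input depends on another's output, they must stay sequential.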
Pattern 4: Reflection and Self-Critique
The agent that reviews its own work.
Reflection adds a critique step where the agent evaluates its output and revises if needed.
The Pattern
1. Generate initial output
2. Critique: Evaluate the output against criteria
3. If issues found: Revise and go to step 2
4. If acceptable: Return final output
Example: Code Generation with Reflection
# Initial generation
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)
# Self-critique
Critique: "This implementation is correct but has exponential time complexity.
For large n, it will be extremely slow. Should use memoization
or iterative approach."
# Revision
def fibonacci(n):
if n <= 1:
return n
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
# Second critique
Critique: "Correct, O(n) time complexity, O(1) space. Handles edge cases.
Approved."
Reflection Strategies
Self-reflection: Same LLM evaluates its own output
- Simpler to implement
- Risk of blind spots (same biases)
Cross-reflection: Different prompt or model evaluates
- More likely to catch issues
- Higher cost (2x+ LLM calls)
Checklist-based: Evaluate against specific criteria
Checklist:
- [ ] Code compiles/runs without errors
- [ ] All edge cases handled
- [ ] Follows project style guide
- [ ] No security vulnerabilities
- [ ] Adequate error handling
When to Use Reflection
- High-stakes outputs where quality matters more than speed
- Complex generation tasks (code, long-form content, analysis)
- When you have clear evaluation criteria
System Design Considerations
┌───────────────────────────────────────────────────────┐
│ Reflection Loop │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────────┐ │
│ │ Generator │───►│ Output │───►│ Critic │ │
│ │ (LLM) │ │ │ │ (LLM) │ │
│ └───────────┘ └───────────┘ └───────┬───────┘ │
│ ▲ │ │
│ │ ┌──────────────────────┘ │
│ │ ▼ │
│ │ ┌──────────┐ │
│ │ │ Approved?│ │
│ │ └────┬─────┘ │
│ │ │ │
│ │ No │ Yes │
│ └─────────────┘ └──────► Final Output │
│ (revise with │
│ feedback) │
└───────────────────────────────────────────────────────┘
Cost: Reflection at least doubles your LLM costs. Budget accordingly.
Infinite loops: Set a maximum revision count (typically 2-3 iterations).
Critique quality: The critique is only as good as the criteria you provide.
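Putting the loop and the revision cap together, the pattern can be sketched as below. `generate` and `critique` are placeholders for LLM calls, with `critique` returning `None` when the draft is acceptable.

```python
# Generate-critique-revise loop with a hard cap on iterations.
# generate and critique are placeholders for LLM calls; critique
# returns None when the draft is acceptable, else feedback text.

def reflect(task, generate, critique, max_revisions=3):
    draft = generate(task, feedback=None)
    for _ in range(max_revisions):
        feedback = critique(task, draft)
        if feedback is None:                       # approved
            return draft
        draft = generate(task, feedback=feedback)  # revise with feedback
    return draft  # best effort after hitting the cap
```

Returning the best-effort draft at the cap, rather than raising, is a design choice: for most generation tasks a slightly imperfect output beats no output.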
Pattern 5: Multi-Agent Systems
Divide and conquer with specialized agents.
Instead of one agent doing everything, multiple specialized agents collaborate.
Pattern 5a: Sequential Pipeline
Agents process in order, each handing off to the next:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Research │───►│ Analysis │───►│ Writer │───►│ Editor │
│ Agent │ │ Agent │ │ Agent │ │ Agent │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
Use case: Content generation pipelines, data processing workflows
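A sequential pipeline reduces to function composition: each stage's output is the next stage's input. In this sketch the agents are plain functions standing in for LLM-backed agents.

```python
# Sequential multi-agent pipeline: each agent transforms the work
# product and hands it to the next. The agents here are plain
# functions standing in for LLM-backed agents.

def run_pipeline(task, agents):
    artifact = task
    for agent in agents:
        artifact = agent(artifact)  # each stage's output feeds the next stage
    return artifact

pipeline = [
    lambda t: t + " -> researched",
    lambda t: t + " -> analyzed",
    lambda t: t + " -> drafted",
    lambda t: t + " -> edited",
]
result = run_pipeline("topic", pipeline)
# "topic -> researched -> analyzed -> drafted -> edited"
```

The weakness of a pure pipeline is also visible here: an error in an early stage propagates untouched through every later stage, which is why pipelines often get paired with reflection or human review.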
Pattern 5b: Hierarchical (Manager-Worker)
A manager agent coordinates specialist workers:
┌──────────────┐
│ Manager │
│ Agent │
└──────┬───────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Coder │ │ Reviewer │ │ Tester │
│ Agent │ │ Agent │ │ Agent │
└──────────┘ └──────────┘ └──────────┘
Use case: Complex projects requiring coordination, code development
Pattern 5c: Debate/Adversarial
Agents argue different positions, reaching consensus:
┌──────────┐ ┌──────────┐
│ Agent A │◄───────►│ Agent B │
│ (Pro) │ debate │ (Con) │
└────┬─────┘ └────┬─────┘
│ │
└────────┬───────────┘
▼
┌──────────┐
│ Judge │
│ Agent │
└──────────┘
Use case: Decision-making, exploring trade-offs, red-teaming
Pattern 5d: Collaborative Swarm
Agents work in parallel on subtasks, aggregating results:
┌─────────────┐
│ Task │
│ Decomposer │
└──────┬──────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌─────────────┐
│ Aggregator │
└─────────────┘
Use case: Parallel research, comprehensive analysis, map-reduce style tasks
When to Use Multi-Agent
Good reasons:
- Different subtasks need genuinely different capabilities
- You want separation of concerns (security, auditability)
- Parallel execution provides meaningful speedup
- You need adversarial checking (red team vs. blue team)
Bad reasons:
- "It sounds more advanced"
- A single agent could handle it but multi-agent seems cooler
- You haven't tried simpler approaches first
System Design Considerations
Communication protocol: How do agents share information?
- Direct message passing
- Shared memory/blackboard
- Event-driven pub/sub
Coordination overhead: More agents = more coordination complexity
- Who decides when to start/stop?
- How do you handle conflicts?
- What's the failure mode if one agent fails?
Cost multiplication: N agents can mean N× the LLM calls. Design carefully.
Pattern 6: Human-in-the-Loop
The agent that knows when to ask for help.
Not everything should be autonomous. This pattern builds explicit checkpoints where humans review or approve agent actions.
Checkpoint Strategies
Approval gates: Pause before high-stakes actions
Agent: I'm about to send this email to 10,000 customers:
[email preview]
Approve? [Yes / No / Edit]
Confidence thresholds: Escalate when uncertain
if confidence < 0.7:
return ask_human(question, context)
else:
return proceed_with_action()
Periodic review: Check in at intervals
Every 5 actions:
Show human: "Here's what I've done so far..."
Get feedback: Continue / Adjust / Stop
Designing for Human Interaction
Make the state visible: Humans can't make good decisions without context
# Bad: Opaque request
"Should I proceed? [Yes/No]"
# Good: Transparent request
"I've analyzed 47 support tickets and found 3 patterns:
1. Login failures (23 tickets) - appears to be OAuth issue
2. Slow dashboard (15 tickets) - database query N+1 problem
3. Missing data (9 tickets) - unclear cause, need investigation
I recommend prioritizing #1 (OAuth fix) first.
Should I create a detailed bug report for the OAuth issue? [Yes/No/Investigate #3 first]"
Provide good defaults: Don't make humans think unnecessarily
# Bad: Open-ended
"What should the filename be?"
# Good: Smart default with override
"I'll save this as 'quarterly-report-2026-Q1.pdf' [Enter to confirm / Type new name]"
Batch similar decisions: Don't interrupt for every small choice
# Bad: 50 separate approval requests
"Delete file1.tmp? [Y/N]"
"Delete file2.tmp? [Y/N]"
...
# Good: Batched review
"Found 50 temporary files (total 2.3GB). Delete all? [Y/N/Review list]"
Choosing the Right Pattern
Here's a decision framework:
| Scenario | Recommended Pattern |
|---|---|
| General-purpose assistant | ReAct |
| Well-defined multi-step task | Plan-and-Execute |
| High-quality content generation | Reflection |
| Task requiring diverse expertise | Multi-Agent (Hierarchical) |
| Real-time, simple tool use | Direct Tool Calling |
| High-stakes automation | Human-in-the-Loop + any pattern |
| Exploratory research | ReAct or Multi-Agent (Swarm) |
| Code generation/review | Reflection + Plan-and-Execute |
Pattern Combinations
In practice, you'll combine patterns. Some common combinations:
ReAct + Reflection: Think-act-observe loop with periodic self-critique
Plan-and-Execute + Human-in-the-Loop: Plan approval before execution
Multi-Agent + Reflection: Agents review each other's work
Tool Use + Everything: Nearly all patterns involve tools
Implementation Checklist
When building an agentic system, consider:
Core loop:
- What's the reasoning pattern? (ReAct, Plan-Execute, etc.)
- What's the termination condition?
- What's the maximum iterations/steps?
Tools:
- What tools does the agent need?
- How are tools documented for the agent?
- How do you handle tool failures?
State management:
- How is conversation history maintained?
- What context survives across steps?
- How do you handle context window limits?
Safety:
- What actions require human approval?
- What are the resource limits (time, cost, API calls)?
- How do you handle adversarial inputs?
Observability:
- How do you log agent decisions?
- Can you replay/debug agent traces?
- What metrics matter for this agent?
Key Takeaways
- ReAct is the default pattern—interleave thinking and acting, adapting as you go
- Plan-and-Execute works when you can predict the steps and want structure
- Reflection improves output quality by adding self-critique loops
- Multi-Agent makes sense for genuinely complex tasks requiring specialization
- Human-in-the-Loop isn't a failure—it's a feature for high-stakes decisions
- Combine patterns based on your specific requirements
The patterns aren't mutually exclusive. Most production agents combine several—a planning agent that uses ReAct for execution, with reflection for quality and human approval for risky actions.
These patterns represent the architectural building blocks of AI-native software—systems designed from the ground up around LLM capabilities.
Start simple. Add complexity only when you have evidence you need it.
Building your first agent? Understand what makes AI agentic, learn how RAG provides context, and protect against prompt injection.


