
Agentic Design Patterns: A System Design Guide for AI Engineers

Learn the core architectural patterns for building AI agents—ReAct, planning, reflection, tool use, and multi-agent systems—explained for engineers who think in system design.


You know how to design distributed systems. You understand microservices, message queues, and load balancers. Now you're building AI agents, and the architecture patterns feel unfamiliar.

Here's the thing: agentic AI has its own design patterns—recurring solutions to common problems, just like the Gang of Four patterns you learned for object-oriented design. Once you recognize them, agent architectures become much clearer.

This guide covers the core patterns you'll encounter when building production AI agents.


Pattern 1: ReAct (Reasoning + Acting)

The most fundamental agentic pattern.

ReAct stands for Reasoning and Acting. The agent alternates between thinking about what to do and actually doing it, learning from each action's results. If you want to understand how reasoning actually works in LLMs, that's the foundation this pattern builds on.

The Pattern

```text
Loop until done:
  1. Thought     → "What should I do next and why?"
  2. Action      → Execute a tool or take an action
  3. Observation → See what happened
  4. (back to Thought with new information)
```

Example in Practice

User: "What's the weather in Tokyo and should I bring an umbrella?"

```text
Thought: I need to check the weather in Tokyo. Let me use the weather API.
Action: weather_api(location="Tokyo")
Observation: {"temp": 18, "conditions": "light rain", "precipitation_chance": 80}
Thought: It's 18°C with light rain and an 80% chance of precipitation.
         The user should definitely bring an umbrella.
Action: respond("It's 18°C in Tokyo with light rain. Yes, bring an
         umbrella—there's an 80% chance of precipitation.")
```

When to Use ReAct

  • Exploratory tasks where you don't know the steps upfront
  • Interactive workflows that need to adapt based on results
  • Most general-purpose agents — it's the default choice for a reason

When to Avoid

  • Simple tasks that don't need reasoning loops (just call the tool directly)
  • When you need predictable, auditable execution paths
  • High-throughput scenarios where the reasoning overhead matters

System Design Considerations

```text
┌─────────────────────────────────────────────────────┐
│                    ReAct Agent                      │
│                                                     │
│  ┌─────────┐    ┌─────────┐    ┌─────────────────┐  │
│  │ Thought │───►│ Action  │───►│  Observation    │  │
│  │  (LLM)  │    │ (Tools) │    │ (Tool Results)  │  │
│  └─────────┘    └─────────┘    └────────┬────────┘  │
│       ▲                                 │           │
│       └─────────────────────────────────┘           │
│                 (loop until done)                   │
└─────────────────────────────────────────────────────┘
```

Token costs: Each loop iteration consumes tokens. For complex tasks, this can add up quickly.

Latency: Each thought-action-observation cycle requires an LLM call plus tool execution time.

Context window: The full trace (all thoughts, actions, observations) stays in context. Long-running agents can hit context window limits.
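The loop above can be sketched as a plain Python driver. Everything here is a stand-in: `llm_step` represents your LLM call and `tools` is a dict of callables, so the skeleton runs without any real model or API.

```python
# Minimal ReAct driver: alternate LLM "thoughts" with tool calls until
# the model emits a terminal "respond" action. llm_step is assumed to
# return {"thought": ..., "action": ..., "args": ...}.
def react_loop(llm_step, tools, user_query, max_iters=10):
    trace = [f"User: {user_query}"]      # the full trace stays in context
    for _ in range(max_iters):
        step = llm_step(trace)
        trace.append(f"Thought: {step['thought']}")
        if step["action"] == "respond":  # terminal action: return the answer
            return step["args"], trace
        result = tools[step["action"]](**step["args"])  # execute the tool
        trace.append(f"Observation: {result}")
    raise RuntimeError("ReAct loop hit max_iters without finishing")
```

Note the two system-design points above made concrete: `trace` grows every iteration (context pressure), and `max_iters` is the guard against runaway loops.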


Pattern 2: Plan-and-Execute

Think first, act later.

Instead of interleaving reasoning and acting, this pattern separates them entirely. First, create a complete plan. Then, execute it step by step.

The Pattern

```text
Phase 1 - Planning:
  Input:  User goal
  Output: Ordered list of steps to achieve the goal

Phase 2 - Execution:
  For each step in the plan:
    Execute the step
    Record the result
  (Optionally: replan if something unexpected happens)
```

Example in Practice

User: "Refactor the authentication module to use JWT instead of sessions"

```text
# Planning Phase
Plan:
  1. Read current auth module to understand the implementation
  2. Identify all session-related code paths
  3. Design JWT token structure and flow
  4. Update the login endpoint to issue JWTs
  5. Update middleware to validate JWTs instead of sessions
  6. Update logout to handle token invalidation
  7. Update tests to use JWT-based auth
  8. Run test suite to verify changes

# Execution Phase
Executing step 1: Reading auth module...
  Result: Found session logic in auth.py, middleware.py, and 3 route files
Executing step 2: Identifying session code paths...
  Result: 12 locations use session.get('user_id'), 3 use session.clear()
... (continues through the plan)
```

When to Use Plan-and-Execute

  • Well-defined tasks where the steps are predictable
  • Complex multi-step operations that benefit from upfront organization
  • When you need auditability — the plan serves as documentation
  • Batch operations where you want to review before executing

When to Avoid

  • Exploratory tasks where you can't predict the path
  • Rapidly changing environments where plans become stale
  • Simple tasks that don't need formal planning

The Hybrid: Plan-and-Execute with Replanning

Most production systems use a hybrid approach:

```text
1. Create initial plan
2. Execute step
3. Evaluate: Did it work as expected?
   - Yes → Continue to next step
   - No  → Replan from current state
4. Repeat until goal achieved
```

This gives you the benefits of structured planning while handling unexpected situations.
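The hybrid loop translates directly into code. A minimal sketch, assuming you supply `plan` (returns an ordered list of steps, given the goal and what's done so far) and `execute` (returns a success flag plus a result):

```python
# Plan-and-execute with replanning: run steps in order, and ask the
# planner for a fresh plan from the current state whenever a step fails.
# A bounded replan count keeps a stuck planner from looping forever.
def plan_and_execute(plan, execute, goal, max_replans=3):
    steps = plan(goal, completed=[])
    completed = []
    replans = 0
    while steps:
        step = steps.pop(0)
        ok, result = execute(step)
        if ok:
            completed.append((step, result))
        else:
            replans += 1
            if replans > max_replans:
                raise RuntimeError(f"Gave up after {max_replans} replans")
            steps = plan(goal, completed=completed)  # replan from current state
    return completed
```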

System Design Considerations

```text
┌─────────────────────────────────────────────────────┐
│               Plan-and-Execute Agent                │
│                                                     │
│  ┌──────────┐         ┌───────────────────────────┐ │
│  │ Planner  │────────►│ Plan                      │ │
│  │  (LLM)   │         │ [Step 1, Step 2, ...]     │ │
│  └──────────┘         └───────────┬───────────────┘ │
│       ▲                           │                 │
│       │ (replan                   ▼                 │
│       │  if needed)      ┌───────────────┐          │
│       │                  │   Executor    │          │
│       │                  │ (sequential)  │          │
│       │                  └───────┬───────┘          │
│       │                          │                  │
│       └──────────────────────────┘                  │
│                (feedback)                           │
└─────────────────────────────────────────────────────┘
```

Advantages over ReAct:

  • More predictable execution
  • Plan can be reviewed before execution
  • Easier to estimate time/cost

Disadvantages:

  • Less adaptive to unexpected results
  • Planning itself can be wrong
  • May over-plan for simple tasks

Pattern 3: Tool Use Orchestration

Giving agents hands to interact with the world.

Tool use isn't a single pattern—it's a family of patterns for how agents discover, select, and execute tools.

Pattern 3a: Direct Tool Calling

The simplest approach: the LLM directly outputs structured tool calls.

```text
User: "What's 2847 * 394?"

LLM Output:
  tool: calculator
  arguments: {"expression": "2847 * 394"}

System: Executes calculator, returns 1,121,718

LLM: "2847 × 394 = 1,121,718"
```

Modern LLMs (Claude, GPT-4) have native support for this pattern through function calling APIs.
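Whatever the provider, the structured call your system receives has to be dispatched to real code. A sketch of that dispatch step, using the `{"tool": ..., "arguments": ...}` shape from the example above (real function-calling APIs return a provider-specific equivalent):

```python
# Dispatch a structured tool call to a registered function and return a
# result the model can consume. The eval-based calculator is for
# illustration only; never eval untrusted input in production.
TOOLS = {
    "calculator": lambda expression: eval(expression, {"__builtins__": {}}),
}

def dispatch(tool_call):
    fn = TOOLS.get(tool_call["tool"])
    if fn is None:
        return {"error": f"unknown tool: {tool_call['tool']}"}
    try:
        return {"result": fn(**tool_call["arguments"])}
    except Exception as exc:  # surface failures back to the model, don't crash
        return {"error": str(exc)}
```

Returning errors as data rather than raising lets the model see what went wrong and retry, which matches the error-handling principle later in this section.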

Pattern 3b: Tool Selection with Routing

When you have many tools, add a selection layer:

```text
┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│    Query     │────►│   Tool Router   │────►│   Selected   │
│              │     │  (classifier)   │     │   Tool(s)    │
└──────────────┘     └─────────────────┘     └──────────────┘
```

The router can be:

  • LLM-based: Ask the model which tool is relevant
  • Embedding-based: Match query to tool descriptions via semantic similarity using embeddings
  • Rule-based: Pattern matching for known query types
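The embedding-based option can be sketched end to end. To keep it runnable without a model, this uses a toy bag-of-words "embedding"; in practice you would swap `embed` for a real embedding API and precompute the tool-description vectors.

```python
# Embedding-based tool routing: pick the tool whose description is most
# similar to the query. Toy bag-of-words vectors stand in for real
# embeddings so the sketch runs standalone.
import math

def embed(text):
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(query, tool_descriptions):
    q = embed(query)
    return max(tool_descriptions,
               key=lambda name: cosine(q, embed(tool_descriptions[name])))
```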

Pattern 3c: Tool Composition

Agents that chain tools together to accomplish complex tasks:

```text
Goal: "Get me a summary of the top HN stories about AI"

Tool chain:
  1. hn_api.get_top_stories() → [story_ids]
  2. For each story_id: hn_api.get_story(id) → {title, url}
  3. Filter stories with "AI" in title
  4. For each AI story: web_scraper.get_content(url) → content
  5. llm.summarize(all_contents) → summary
```

Tool Design Principles

When designing tools for agents (similar principles apply to building RAG pipelines), focus on clarity and composability.

1. Clear, atomic operations

```python
# Good: single, clear purpose
search_documents(query: str, limit: int) -> List[Document]

# Bad: too many responsibilities
search_and_maybe_summarize_or_email(query: str, summarize: bool,
                                    email: str | None)
```

2. Rich return types

```text
# Good: structured, informative
{
  "results": [...],
  "total_count": 147,
  "query_time_ms": 23,
  "truncated": true
}

# Bad: ambiguous
"Found 10 results"   (a bare string)
```

3. Graceful error handling

```text
# Good: actionable error
{
  "error": "rate_limited",
  "retry_after_seconds": 30,
  "message": "API rate limit exceeded. Try again in 30 seconds."
}

# Bad: generic failure
{"error": "Something went wrong"}
```

System Design Considerations

Tool explosion problem: As you add tools, selecting the right one becomes harder for the agent. Consider:

  • Grouping tools into categories
  • Using a hierarchical selection process
  • Providing tool recommendations based on context

Latency: Each tool call adds latency. For real-time applications:

  • Parallelize independent tool calls
  • Cache frequently-used tool results
  • Consider tool call budgets
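The parallelization point can be made concrete with a small helper. A sketch using Python's standard thread pool, suitable for I/O-bound tool calls (HTTP APIs, database queries) that don't depend on each other:

```python
# Run independent tool calls concurrently and collect results in the
# same order they were submitted. Only safe when no call depends on
# another call's output.
from concurrent.futures import ThreadPoolExecutor

def run_parallel(calls, max_workers=8):
    """calls: list of (fn, kwargs) pairs; returns results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, **kwargs) for fn, kwargs in calls]
        return [f.result() for f in futures]
```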

Pattern 4: Reflection and Self-Critique

The agent that reviews its own work.

Reflection adds a critique step where the agent evaluates its output and revises if needed.

The Pattern

```text
1. Generate initial output
2. Critique: evaluate the output against criteria
3. If issues found: revise and go to step 2
4. If acceptable: return final output
```

Example: Code Generation with Reflection

```python
# Initial generation
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Self-critique
# "This implementation is correct but has exponential time complexity.
#  For large n, it will be extremely slow. Should use memoization or an
#  iterative approach."

# Revision
def fibonacci(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

# Second critique
# "Correct, O(n) time complexity, O(1) space. Handles edge cases. Approved."
```

Reflection Strategies

Self-reflection: Same LLM evaluates its own output

  • Simpler to implement
  • Risk of blind spots (same biases)

Cross-reflection: Different prompt or model evaluates

  • More likely to catch issues
  • Higher cost (2x+ LLM calls)

Checklist-based: Evaluate against specific criteria

```text
Checklist:
- [ ] Code compiles/runs without errors
- [ ] All edge cases handled
- [ ] Follows project style guide
- [ ] No security vulnerabilities
- [ ] Adequate error handling
```

When to Use Reflection

  • High-stakes outputs where quality matters more than speed
  • Complex generation tasks (code, long-form content, analysis)
  • When you have clear evaluation criteria

System Design Considerations

```text
┌───────────────────────────────────────────────────────┐
│                   Reflection Loop                     │
│                                                       │
│  ┌───────────┐    ┌───────────┐    ┌───────────────┐  │
│  │ Generator │───►│  Output   │───►│    Critic     │  │
│  │   (LLM)   │    │           │    │     (LLM)     │  │
│  └───────────┘    └───────────┘    └───────┬───────┘  │
│        ▲                                   │          │
│        │                             ┌─────▼─────┐    │
│        │  No (revise with            │ Approved? │    │
│        └───── feedback)              └─────┬─────┘    │
│                                            │ Yes      │
│                                            ▼          │
│                                      Final Output     │
└───────────────────────────────────────────────────────┘
```

Cost: Reflection at least doubles your LLM costs. Budget accordingly.

Infinite loops: Set a maximum revision count (typically 2-3 iterations).

Critique quality: The critique is only as good as the criteria you provide.
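The loop-cap and cost considerations fit in a few lines of code. A sketch, assuming you supply `generate`, `critique` (returns an approval flag plus feedback), and `revise`:

```python
# Bounded reflection loop: generate, critique, revise, with a hard cap
# on revisions so a never-satisfied critic can't loop (and bill) forever.
def reflect(generate, critique, revise, task, max_revisions=3):
    output = generate(task)
    for _ in range(max_revisions):
        verdict = critique(output)
        if verdict["approved"]:
            return output
        output = revise(output, verdict["feedback"])
    return output  # best effort after the cap; consider logging this case
```

Each pass through the loop is at least two LLM calls (critique plus revise), which is where the "at least doubles your costs" figure comes from.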


Pattern 5: Multi-Agent Systems

Divide and conquer with specialized agents.

Instead of one agent doing everything, multiple specialized agents collaborate.

Pattern 5a: Sequential Pipeline

Agents process in order, each handing off to the next:

```text
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Research │───►│ Analysis │───►│  Writer  │───►│  Editor  │
│  Agent   │    │  Agent   │    │  Agent   │    │  Agent   │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
```

Use case: Content generation pipelines, data processing workflows

Pattern 5b: Hierarchical (Manager-Worker)

A manager agent coordinates specialist workers:

```text
              ┌──────────────┐
              │   Manager    │
              │    Agent     │
              └──────┬───────┘
      ┌──────────────┼──────────────┐
      ▼              ▼              ▼
┌──────────┐   ┌──────────┐   ┌──────────┐
│  Coder   │   │ Reviewer │   │  Tester  │
│  Agent   │   │  Agent   │   │  Agent   │
└──────────┘   └──────────┘   └──────────┘
```

Use case: Complex projects requiring coordination, code development

Pattern 5c: Debate/Adversarial

Agents argue different positions, reaching consensus:

```text
┌──────────┐          ┌──────────┐
│ Agent A  │◄────────►│ Agent B  │
│  (Pro)   │  debate  │  (Con)   │
└────┬─────┘          └────┬─────┘
     │                     │
     └──────────┬──────────┘
                ▼
          ┌──────────┐
          │  Judge   │
          │  Agent   │
          └──────────┘
```

Use case: Decision-making, exploring trade-offs, red-teaming

Pattern 5d: Collaborative Swarm

Agents work in parallel on subtasks, aggregating results:

```text
              ┌─────────────┐
              │    Task     │
              │ Decomposer  │
              └──────┬──────┘
      ┌──────────────┼──────────────┐
      ▼              ▼              ▼
┌──────────┐   ┌──────────┐   ┌──────────┐
│ Worker 1 │   │ Worker 2 │   │ Worker 3 │
└────┬─────┘   └────┬─────┘   └────┬─────┘
     │              │              │
     └──────────────┼──────────────┘
                    ▼
             ┌─────────────┐
             │ Aggregator  │
             └─────────────┘
```

Use case: Parallel research, comprehensive analysis, map-reduce style tasks

When to Use Multi-Agent

Good reasons:

  • Different subtasks need genuinely different capabilities
  • You want separation of concerns (security, auditability)
  • Parallel execution provides meaningful speedup
  • You need adversarial checking (red team vs. blue team)

Bad reasons:

  • "It sounds more advanced"
  • A single agent could handle it but multi-agent seems cooler
  • You haven't tried simpler approaches first

System Design Considerations

Communication protocol: How do agents share information?

  • Direct message passing
  • Shared memory/blackboard
  • Event-driven pub/sub
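The shared-memory/blackboard option is the simplest to prototype. A minimal sketch with hypothetical names; a production version would add locking, persistence, and schemas for entries:

```python
# A minimal "blackboard" for multi-agent coordination: agents post
# findings under a topic, and any agent can read what others wrote.
from collections import defaultdict

class Blackboard:
    def __init__(self):
        self._entries = defaultdict(list)

    def post(self, topic, agent, content):
        self._entries[topic].append({"agent": agent, "content": content})

    def read(self, topic):
        return list(self._entries[topic])  # copy, so readers can't mutate
```

The appeal of this protocol is decoupling: agents don't need to know who else exists, only which topics to watch, which keeps coordination overhead from growing with every new agent pair.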

Coordination overhead: More agents = more coordination complexity

  • Who decides when to start/stop?
  • How do you handle conflicts?
  • What's the failure mode if one agent fails?

Cost multiplication: N agents can mean N× the LLM calls. Design carefully.


Pattern 6: Human-in-the-Loop

The agent that knows when to ask for help.

Not everything should be autonomous. This pattern builds explicit checkpoints where humans review or approve agent actions.

Checkpoint Strategies

Approval gates: Pause before high-stakes actions

```text
Agent: I'm about to send this email to 10,000 customers:
       [email preview]
       Approve? [Yes / No / Edit]
```

Confidence thresholds: Escalate when uncertain

```python
if confidence < 0.7:
    return ask_human(question, context)
else:
    return proceed_with_action()
```

Periodic review: Check in at intervals

```text
Every 5 actions:
  Show human: "Here's what I've done so far..."
  Get feedback: Continue / Adjust / Stop
```

Designing for Human Interaction

Make the state visible: Humans can't make good decisions without context

```text
# Bad: opaque request
"Should I proceed? [Yes/No]"

# Good: transparent request
"I've analyzed 47 support tickets and found 3 patterns:
 1. Login failures (23 tickets) - appears to be an OAuth issue
 2. Slow dashboard (15 tickets) - database query N+1 problem
 3. Missing data (9 tickets) - unclear cause, needs investigation

 I recommend prioritizing #1 (OAuth fix) first.
 Should I create a detailed bug report for the OAuth issue?
 [Yes / No / Investigate #3 first]"
```

Provide good defaults: Don't make humans think unnecessarily

```text
# Bad: open-ended
"What should the filename be?"

# Good: smart default with override
"I'll save this as 'quarterly-report-2026-Q1.pdf'
 [Enter to confirm / Type new name]"
```

Batch similar decisions: Don't interrupt for every small choice

```text
# Bad: 50 separate approval requests
"Delete file1.tmp? [Y/N]"
"Delete file2.tmp? [Y/N]"
...

# Good: batched review
"Found 50 temporary files (total 2.3GB). Delete all?
 [Y / N / Review list]"
```

Choosing the Right Pattern

Here's a decision framework:

| Scenario | Recommended Pattern |
|---|---|
| General-purpose assistant | ReAct |
| Well-defined multi-step task | Plan-and-Execute |
| High-quality content generation | Reflection |
| Task requiring diverse expertise | Multi-Agent (Hierarchical) |
| Real-time, simple tool use | Direct Tool Calling |
| High-stakes automation | Human-in-the-Loop + any pattern |
| Exploratory research | ReAct or Multi-Agent (Swarm) |
| Code generation/review | Reflection + Plan-and-Execute |

Pattern Combinations

In practice, you'll combine patterns. Some common combinations:

ReAct + Reflection: Think-act-observe loop with periodic self-critique

Plan-and-Execute + Human-in-the-Loop: Plan approval before execution

Multi-Agent + Reflection: Agents review each other's work

Tool Use + Everything: Nearly all patterns involve tools


Implementation Checklist

When building an agentic system, consider:

Core loop:

  • What's the reasoning pattern? (ReAct, Plan-Execute, etc.)
  • What's the termination condition?
  • What's the maximum iterations/steps?

Tools:

  • What tools does the agent need?
  • How are tools documented for the agent?
  • How do you handle tool failures?

State management:

  • How is conversation history maintained?
  • What context survives across steps?
  • How do you handle context window limits?
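One common answer to the context-limit question: keep the system prompt and the most recent turns, dropping the middle. A sketch under that assumption; production systems often summarize the dropped turns instead of discarding them outright:

```python
# Trim a message history to fit a budget while always preserving
# system messages and the most recent conversation turns.
def trim_context(messages, max_messages=20):
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = max_messages - len(system)
    return system + rest[-keep:]
```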

Safety:

  • What actions require human approval?
  • What are the resource limits (time, cost, API calls)?
  • How do you handle adversarial inputs?

Observability:

  • How do you log agent decisions?
  • Can you replay/debug agent traces?
  • What metrics matter for this agent?

Key Takeaways

  • ReAct is the default pattern—interleave thinking and acting, adapting as you go
  • Plan-and-Execute works when you can predict the steps and want structure
  • Reflection improves output quality by adding self-critique loops
  • Multi-Agent makes sense for genuinely complex tasks requiring specialization
  • Human-in-the-Loop isn't a failure—it's a feature for high-stakes decisions
  • Combine patterns based on your specific requirements

The patterns aren't mutually exclusive. Most production agents combine several—a planning agent that uses ReAct for execution, with reflection for quality and human approval for risky actions.

These patterns represent the architectural building blocks of AI-native software—systems designed from the ground up around LLM capabilities.

Start simple. Add complexity only when you have evidence you need it.


Building your first agent? Understand what makes AI agentic, learn how RAG provides context, and protect against prompt injection.
