Agentic Design Patterns: A System Design Guide for AI Engineers
Learn the core architectural patterns for building AI agents—ReAct, planning, reflection, tool use, and multi-agent systems—explained for engineers who think in system design.

You know how to design distributed systems. You understand microservices, message queues, and load balancers. Now you're building AI agents, and the architecture patterns feel unfamiliar.
Here's the thing: agentic AI has its own design patterns—recurring solutions to common problems, just like the Gang of Four patterns you learned for object-oriented design. Once you recognize them, agent architectures become much clearer.
This guide covers the core patterns you'll encounter when building production AI agents.
Pattern 1: ReAct (Reasoning + Acting)
The most fundamental agentic pattern.
ReAct stands for Reasoning and Acting. The agent alternates between thinking about what to do and actually doing it, learning from each action's results. The pattern builds directly on chain-of-thought reasoning in LLMs: each loop iteration is a short reasoning step grounded in the latest observation.
The Pattern
Loop until done:
1. Thought → "What should I do next and why?"
2. Action → Execute a tool or take an action
3. Observation → See what happened
4. (back to Thought with new information)
Example in Practice
User: "What's the weather in Tokyo and should I bring an umbrella?"
Thought: I need to check the weather in Tokyo. Let me use the weather API.
Action: weather_api(location="Tokyo")
Observation: {"temp": 18, "conditions": "light rain", "precipitation_chance": 80}
Thought: It's 18°C with light rain and 80% chance of precipitation.
The user should definitely bring an umbrella.
Action: respond("It's 18°C in Tokyo with light rain.
Yes, bring an umbrella—there's an 80% chance of precipitation.")When to Use ReAct
- Exploratory tasks where you don't know the steps upfront
- Interactive workflows that need to adapt based on results
- Most general-purpose agents — it's the default choice for a reason
When to Avoid
- Simple tasks that don't need reasoning loops (just call the tool directly)
- When you need predictable, auditable execution paths
- High-throughput scenarios where the reasoning overhead matters
System Design Considerations
┌─────────────────────────────────────────────────────┐
│ ReAct Agent │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────┐ │
│ │ Thought │───►│ Action │───►│ Observation │ │
│ │ (LLM) │◄───│ (Tools) │◄───│ (Tool Results) │ │
│ └─────────┘ └─────────┘ └─────────────────┘ │
│ ▲ │ │
│ └──────────────────────────────┘ │
│ (loop until done) │
└─────────────────────────────────────────────────────┘
Token costs: Each loop iteration consumes tokens. For complex tasks, this can add up quickly.
Latency: Each thought-action-observation cycle requires an LLM call plus tool execution time.
Context window: The full trace (all thoughts, actions, observations) stays in context. Long-running agents can hit context window limits.
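The loop described above is simple to sketch in code. The following is a minimal sketch, not a production implementation: `llm_think`, the `tools` registry, and the trace format are all illustrative stand-ins for a real model call and real tool implementations.

```python
# Minimal ReAct loop sketch. llm_think and the tools dict are stand-ins
# for a real LLM call and real tool implementations.

def run_react(goal, llm_think, tools, max_steps=10):
    trace = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Thought + Action: ask the model what to do given the trace so far.
        # llm_think returns either {"thought", "action", "args"} or {"answer"}.
        step = llm_think("\n".join(trace))
        if "answer" in step:
            return step["answer"]
        trace.append(f"Thought: {step['thought']}")
        # Observation: execute the chosen tool and feed the result back
        observation = tools[step["action"]](**step["args"])
        trace.append(f"Action: {step['action']}({step['args']})")
        trace.append(f"Observation: {observation}")
    raise RuntimeError("max_steps reached without a final answer")
```

The `max_steps` cap is the termination safeguard mentioned above: without it, a confused model can loop indefinitely while burning tokens.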
Pattern 2: Plan-and-Execute
Think first, act later.
Instead of interleaving reasoning and acting, this pattern separates them entirely. First, create a complete plan. Then, execute it step by step.
The Pattern
Phase 1 - Planning:
Input: User goal
Output: Ordered list of steps to achieve the goal
Phase 2 - Execution:
For each step in the plan:
Execute the step
Record the result
(Optionally: replan if something unexpected happens)
Example in Practice
User: "Refactor the authentication module to use JWT instead of sessions"
# Planning Phase
Plan:
1. Read current auth module to understand the implementation
2. Identify all session-related code paths
3. Design JWT token structure and flow
4. Update the login endpoint to issue JWTs
5. Update middleware to validate JWTs instead of sessions
6. Update logout to handle token invalidation
7. Update tests to use JWT-based auth
8. Run test suite to verify changes
# Execution Phase
Executing step 1: Reading auth module...
Result: Found session logic in auth.py, middleware.py, and 3 route files
Executing step 2: Identifying session code paths...
Result: 12 locations use session.get('user_id'), 3 use session.clear()
... (continues through the plan)
When to Use Plan-and-Execute
- Well-defined tasks where the steps are predictable
- Complex multi-step operations that benefit from upfront organization
- When you need auditability — the plan serves as documentation
- Batch operations where you want to review before executing
When to Avoid
- Exploratory tasks where you can't predict the path
- Rapidly changing environments where plans become stale
- Simple tasks that don't need formal planning
The Hybrid: Plan-and-Execute with Replanning
Most production systems use a hybrid approach:
1. Create initial plan
2. Execute step
3. Evaluate: Did it work as expected?
- Yes → Continue to next step
- No → Replan from current state
4. Repeat until goal achieved
This gives you the benefits of structured planning while handling unexpected situations.
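The hybrid loop can be sketched in a few lines of Python. Treat this as a sketch under assumptions: `plan`, `execute_step`, and `step_succeeded` are placeholders for LLM-backed implementations.

```python
# Sketch of the plan-execute-replan loop. plan(), execute_step(), and
# step_succeeded() are placeholders for LLM-backed implementations.

def run_plan_and_execute(goal, plan, execute_step, step_succeeded, max_replans=3):
    steps = plan(goal, completed=[])          # Phase 1: initial plan
    completed = []
    replans = 0
    while steps:
        step = steps.pop(0)
        result = execute_step(step)           # Phase 2: execute next step
        if step_succeeded(step, result):
            completed.append((step, result))  # record and continue
        else:
            replans += 1
            if replans > max_replans:
                raise RuntimeError(f"could not recover from failed step: {step}")
            # Replan from the current state, keeping completed work
            steps = plan(goal, completed=completed)
    return completed
```

Passing the `completed` work back into `plan` is what makes replanning "from current state" rather than from scratch.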
System Design Considerations
┌─────────────────────────────────────────────────────┐
│ Plan-and-Execute Agent │
│ │
│ ┌──────────┐ ┌───────────────────────────┐ │
│ │ Planner │────────►│ Plan │ │
│ │ (LLM) │ │ [Step 1, Step 2, ...] │ │
│ └──────────┘ └───────────┬───────────────┘ │
│ ▲ │ │
│ │ (replan ▼ │
│ │ if needed) ┌───────────────┐ │
│ │ │ Executor │ │
│ │ │ (sequential) │ │
│ │ └───────┬───────┘ │
│ │ │ │
│ └──────────────────────────┘ │
│ (feedback) │
└─────────────────────────────────────────────────────┘
Advantages over ReAct:
- More predictable execution
- Plan can be reviewed before execution
- Easier to estimate time/cost
Disadvantages:
- Less adaptive to unexpected results
- Planning itself can be wrong
- May over-plan for simple tasks
Pattern 3: Tool Use Orchestration
Giving agents hands to interact with the world.
Tool use isn't a single pattern—it's a family of patterns for how agents discover, select, and execute tools.
Pattern 3a: Direct Tool Calling
The simplest approach: the LLM directly outputs structured tool calls.
User: "What's 2847 * 394?"
LLM Output:
tool: calculator
arguments: {"expression": "2847 * 394"}
System: Executes calculator, returns 1,121,718
LLM: "2847 × 394 = 1,121,718"
Modern LLMs (Claude, GPT-4) have native support for this pattern through function calling APIs.
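The shape of a tool definition is broadly similar across providers: a name, a description, and a JSON-schema of parameters. The snippet below is an illustrative shape plus a tiny dispatcher, not any specific vendor's API; check your provider's docs for the exact field names.

```python
# Illustrative tool definition in the JSON-schema style that
# function-calling APIs generally expect. Field names vary by provider.
calculator_tool = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression and return the result.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "e.g. '2847 * 394'"},
        },
        "required": ["expression"],
    },
}

def dispatch(tool_call, handlers):
    """Route a model-emitted tool call to the matching handler."""
    return handlers[tool_call["tool"]](**tool_call["arguments"])

# Restrict eval to bare arithmetic; a real system would use a proper parser.
handlers = {"calculator": lambda expression: eval(expression, {"__builtins__": {}})}
result = dispatch({"tool": "calculator", "arguments": {"expression": "2847 * 394"}}, handlers)
# result == 1121718
```

The description fields matter more than they look: they are the only documentation the model sees when deciding whether and how to call the tool.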
Pattern 3b: Tool Selection with Routing
When you have many tools, add a selection layer:
┌──────────────┐ ┌─────────────────┐ ┌──────────────┐
│ Query │────►│ Tool Router │────►│ Selected │
│ │ │ (classifier) │ │ Tool(s) │
└──────────────┘ └─────────────────┘ └──────────────┘
The router can be:
- LLM-based: Ask the model which tool is relevant
- Embedding-based: Match query to tool descriptions via semantic similarity using embeddings
- Rule-based: Pattern matching for known query types
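An embedding-based router reduces to "pick the tool whose description is closest to the query." A real system would use a sentence-embedding model; in this self-contained sketch a bag-of-words vector stands in for the embedding.

```python
# Embedding-based tool routing sketch. A real system would use a
# sentence-embedding model; a bag-of-words Counter stands in here
# so the example is self-contained.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route(query, tool_descriptions):
    """Pick the tool whose description is most similar to the query."""
    q = embed(query)
    return max(tool_descriptions, key=lambda name: cosine(q, embed(tool_descriptions[name])))

tools = {
    "weather_api": "get current weather forecast temperature rain for a city",
    "calculator": "evaluate arithmetic math expressions numbers",
}
# route("will it rain in Tokyo", tools) -> "weather_api"
```

The same interface works whether the router is rule-based, embedding-based, or an LLM call; only the scoring function changes.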
Pattern 3c: Tool Composition
Agents that chain tools together to accomplish complex tasks:
Goal: "Get me a summary of the top HN stories about AI"
Tool chain:
1. hn_api.get_top_stories() → [story_ids]
2. For each story_id: hn_api.get_story(id) → {title, url}
3. Filter stories with "AI" in title
4. For each AI story: web_scraper.get_content(url) → content
5. llm.summarize(all_contents) → summary
Tool Design Principles
When designing tools for agents (similar principles apply to building RAG pipelines), focus on clarity and composability.
1. Clear, atomic operations
# Good: Single, clear purpose
search_documents(query: str, limit: int) → List[Document]
# Bad: Too many responsibilities
search_and_maybe_summarize_or_email(query: str, summarize: bool, email: str | None)
2. Rich return types
# Good: Structured, informative
{
"results": [...],
"total_count": 147,
"query_time_ms": 23,
"truncated": true
}
# Bad: Ambiguous
"Found 10 results" (string)
3. Graceful error handling
# Good: Actionable error
{
"error": "rate_limited",
"retry_after_seconds": 30,
"message": "API rate limit exceeded. Try again in 30 seconds."
}
# Bad: Generic failure
{"error": "Something went wrong"}
System Design Considerations
Tool explosion problem: As you add tools, the agent's decisions become harder. Consider:
- Grouping tools into categories
- Using a hierarchical selection process
- Providing tool recommendations based on context
Latency: Each tool call adds latency. For real-time applications:
- Parallelize independent tool calls
- Cache frequently-used tool results
- Consider tool call budgets
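For example, two independent tool calls can overlap with a thread pool. The tools below are stand-ins that just sleep to simulate network latency; real API-bound tools benefit far more.

```python
# Running independent tool calls in parallel with a thread pool.
# fetch_weather and fetch_news are stand-in tools that sleep to
# simulate network latency.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_weather(city):
    time.sleep(0.1)  # simulate network latency
    return f"weather({city})"

def fetch_news(topic):
    time.sleep(0.1)
    return f"news({topic})"

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    weather = pool.submit(fetch_weather, "Tokyo")
    news = pool.submit(fetch_news, "AI")
    results = [weather.result(), news.result()]
elapsed = time.perf_counter() - start
# The two calls overlap, so elapsed is roughly 0.1s instead of 0.2s
```

Only parallelize calls that are truly independent; if one tool's input depends on another's output, they must stay sequential.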
Pattern 4: Reflection and Self-Critique
The agent that reviews its own work.
Reflection adds a critique step where the agent evaluates its output and revises if needed.
The Pattern
1. Generate initial output
2. Critique: Evaluate the output against criteria
3. If issues found: Revise and go to step 2
4. If acceptable: Return final output
Example: Code Generation with Reflection
# Initial generation
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)
# Self-critique
Critique: "This implementation is correct but has exponential time complexity.
For large n, it will be extremely slow. Should use memoization
or iterative approach."
# Revision
def fibonacci(n):
if n <= 1:
return n
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
# Second critique
Critique: "Correct, O(n) time complexity, O(1) space. Handles edge cases.
Approved."
Reflection Strategies
Self-reflection: Same LLM evaluates its own output
- Simpler to implement
- Risk of blind spots (same biases)
Cross-reflection: Different prompt or model evaluates
- More likely to catch issues
- Higher cost (2x+ LLM calls)
Checklist-based: Evaluate against specific criteria
Checklist:
- [ ] Code compiles/runs without errors
- [ ] All edge cases handled
- [ ] Follows project style guide
- [ ] No security vulnerabilities
- [ ] Adequate error handling
When to Use Reflection
- High-stakes outputs where quality matters more than speed
- Complex generation tasks (code, long-form content, analysis)
- When you have clear evaluation criteria
System Design Considerations
┌───────────────────────────────────────────────────────┐
│ Reflection Loop │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────────┐ │
│ │ Generator │───►│ Output │───►│ Critic │ │
│ │ (LLM) │ │ │ │ (LLM) │ │
│ └───────────┘ └───────────┘ └───────┬───────┘ │
│ ▲ │ │
│ │ ┌──────────────────────┘ │
│ │ ▼ │
│ │ ┌──────────┐ │
│ │ │ Approved?│ │
│ │ └────┬─────┘ │
│ │ │ │
│ │ No │ Yes │
│ └─────────────┘ └──────► Final Output │
│ (revise with │
│ feedback) │
└───────────────────────────────────────────────────────┘
Cost: Reflection at least doubles your LLM costs. Budget accordingly.
Infinite loops: Set a maximum revision count (typically 2-3 iterations).
Critique quality: The critique is only as good as the criteria you provide.
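Putting the loop and the revision cap together, the pattern can be sketched as below. `generate` and `critique` are placeholders for LLM calls, with `critique` returning `None` when the draft is acceptable.

```python
# Generate-critique-revise loop with a hard cap on iterations.
# generate and critique are placeholders for LLM calls; critique
# returns None when the draft is acceptable, else feedback text.

def reflect(task, generate, critique, max_revisions=3):
    draft = generate(task, feedback=None)
    for _ in range(max_revisions):
        feedback = critique(task, draft)
        if feedback is None:                       # approved
            return draft
        draft = generate(task, feedback=feedback)  # revise with feedback
    return draft  # best effort after hitting the cap
```

Returning the best-effort draft at the cap, rather than raising, is a design choice: for most generation tasks a slightly imperfect output beats no output.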
Pattern 5: Multi-Agent Systems
Divide and conquer with specialized agents.
Instead of one agent doing everything, multiple specialized agents collaborate.
Pattern 5a: Sequential Pipeline
Agents process in order, each handing off to the next:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Research │───►│ Analysis │───►│ Writer │───►│ Editor │
│ Agent │ │ Agent │ │ Agent │ │ Agent │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
Use case: Content generation pipelines, data processing workflows
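A sequential pipeline reduces to function composition: each stage's output is the next stage's input. In this sketch the agents are plain functions standing in for LLM-backed agents.

```python
# Sequential multi-agent pipeline: each agent transforms the work
# product and hands it to the next. The agents here are plain
# functions standing in for LLM-backed agents.

def run_pipeline(task, agents):
    artifact = task
    for agent in agents:
        artifact = agent(artifact)  # each stage's output feeds the next stage
    return artifact

pipeline = [
    lambda t: t + " -> researched",
    lambda t: t + " -> analyzed",
    lambda t: t + " -> drafted",
    lambda t: t + " -> edited",
]
result = run_pipeline("topic", pipeline)
# "topic -> researched -> analyzed -> drafted -> edited"
```

The weakness of a pure pipeline is also visible here: an error in an early stage propagates untouched through every later stage, which is why pipelines often get paired with reflection or human review.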
Pattern 5b: Hierarchical (Manager-Worker)
A manager agent coordinates specialist workers:
┌──────────────┐
│ Manager │
│ Agent │
└──────┬───────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Coder │ │ Reviewer │ │ Tester │
│ Agent │ │ Agent │ │ Agent │
└──────────┘ └──────────┘ └──────────┘
Use case: Complex projects requiring coordination, code development
Pattern 5c: Debate/Adversarial
Agents argue different positions, reaching consensus:
┌──────────┐ ┌──────────┐
│ Agent A │◄───────►│ Agent B │
│ (Pro) │ debate │ (Con) │
└────┬─────┘ └────┬─────┘
│ │
└────────┬───────────┘
▼
┌──────────┐
│ Judge │
│ Agent │
└──────────┘
Use case: Decision-making, exploring trade-offs, red-teaming
Pattern 5d: Collaborative Swarm
Agents work in parallel on subtasks, aggregating results:
┌─────────────┐
│ Task │
│ Decomposer │
└──────┬──────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌─────────────┐
│ Aggregator │
└─────────────┘
Use case: Parallel research, comprehensive analysis, map-reduce style tasks
When to Use Multi-Agent
Good reasons:
- Different subtasks need genuinely different capabilities
- You want separation of concerns (security, auditability)
- Parallel execution provides meaningful speedup
- You need adversarial checking (red team vs. blue team)
Bad reasons:
- "It sounds more advanced"
- A single agent could handle it but multi-agent seems cooler
- You haven't tried simpler approaches first
System Design Considerations
Communication protocol: How do agents share information?
- Direct message passing
- Shared memory/blackboard
- Event-driven pub/sub
Coordination overhead: More agents = more coordination complexity
- Who decides when to start/stop?
- How do you handle conflicts?
- What's the failure mode if one agent fails?
Cost multiplication: N agents can mean N× the LLM calls. Design carefully.
Pattern 6: Human-in-the-Loop
The agent that knows when to ask for help.
Not everything should be autonomous. This pattern builds explicit checkpoints where humans review or approve agent actions.
Checkpoint Strategies
Approval gates: Pause before high-stakes actions
Agent: I'm about to send this email to 10,000 customers:
[email preview]
Approve? [Yes / No / Edit]
Confidence thresholds: Escalate when uncertain
if confidence < 0.7:
return ask_human(question, context)
else:
return proceed_with_action()
Periodic review: Check in at intervals
Every 5 actions:
Show human: "Here's what I've done so far..."
Get feedback: Continue / Adjust / Stop
Designing for Human Interaction
Make the state visible: Humans can't make good decisions without context
# Bad: Opaque request
"Should I proceed? [Yes/No]"
# Good: Transparent request
"I've analyzed 47 support tickets and found 3 patterns:
1. Login failures (23 tickets) - appears to be OAuth issue
2. Slow dashboard (15 tickets) - database query N+1 problem
3. Missing data (9 tickets) - unclear cause, need investigation
I recommend prioritizing #1 (OAuth fix) first.
Should I create a detailed bug report for the OAuth issue? [Yes/No/Investigate #3 first]"
Provide good defaults: Don't make humans think unnecessarily
# Bad: Open-ended
"What should the filename be?"
# Good: Smart default with override
"I'll save this as 'quarterly-report-2026-Q1.pdf' [Enter to confirm / Type new name]"
Batch similar decisions: Don't interrupt for every small choice
# Bad: 50 separate approval requests
"Delete file1.tmp? [Y/N]"
"Delete file2.tmp? [Y/N]"
...
# Good: Batched review
"Found 50 temporary files (total 2.3GB). Delete all? [Y/N/Review list]"
Choosing the Right Pattern
Here's a decision framework:
| Scenario | Recommended Pattern |
|---|---|
| General-purpose assistant | ReAct |
| Well-defined multi-step task | Plan-and-Execute |
| High-quality content generation | Reflection |
| Task requiring diverse expertise | Multi-Agent (Hierarchical) |
| Real-time, simple tool use | Direct Tool Calling |
| High-stakes automation | Human-in-the-Loop + any pattern |
| Exploratory research | ReAct or Multi-Agent (Swarm) |
| Code generation/review | Reflection + Plan-and-Execute |
Pattern Combinations
In practice, you'll combine patterns. Some common combinations:
ReAct + Reflection: Think-act-observe loop with periodic self-critique
Plan-and-Execute + Human-in-the-Loop: Plan approval before execution
Multi-Agent + Reflection: Agents review each other's work
Tool Use + Everything: Nearly all patterns involve tools
Implementation Checklist
When building an agentic system, consider:
Core loop:
- What's the reasoning pattern? (ReAct, Plan-Execute, etc.)
- What's the termination condition?
- What's the maximum iterations/steps?
Tools:
- What tools does the agent need?
- How are tools documented for the agent?
- How do you handle tool failures?
State management:
- How is conversation history maintained?
- What context survives across steps?
- How do you handle context window limits?
Safety:
- What actions require human approval?
- What are the resource limits (time, cost, API calls)?
- How do you handle adversarial inputs?
Observability:
- How do you log agent decisions?
- Can you replay/debug agent traces?
- What metrics matter for this agent?
Key Takeaways
- ReAct is the default pattern—interleave thinking and acting, adapting as you go
- Plan-and-Execute works when you can predict the steps and want structure
- Reflection improves output quality by adding self-critique loops
- Multi-Agent makes sense for genuinely complex tasks requiring specialization
- Human-in-the-Loop isn't a failure—it's a feature for high-stakes decisions
- Combine patterns based on your specific requirements
The patterns aren't mutually exclusive. Most production agents combine several—a planning agent that uses ReAct for execution, with reflection for quality and human approval for risky actions.
These patterns represent the architectural building blocks of AI-native software—systems designed from the ground up around LLM capabilities.
Start simple. Add complexity only when you have evidence you need it.
Building your first agent? Understand what makes AI agentic, learn how RAG provides context, and protect against prompt injection.


