genix

Architecting Production Multi-Agent Systems: Patterns, Orchestration & Cost Constraints

Deep-dive into building scalable, cost-aware multi-agent systems in production. Case study: Genie (15-agent financial assistant on Microsoft MARA).

The Multi-Agent Landscape

What is Multi-Agent AI?

Multi-agent AI is an architectural pattern where multiple independent AI agents work together to accomplish complex goals. Each agent has specialized capabilities, maintains its own state, and can communicate with other agents through well-defined interfaces. This enables modular, scalable, and cost-aware systems.

When you move beyond single-agent systems, you enter a different class of problems: agent coordination, state management, cost enforcement, and observability at scale.

Over the past 18 months, I’ve architected and shipped Genie — a 15-agent financial assistant on Microsoft’s Multi-Agent Reference Architecture (MARA). This article distills lessons from that journey and other production deployments.

For security-first design, see zero-trust architecture for AI agents. For payment orchestration patterns, check PSD2 multi-agent workflows. For production Go implementations, read Go for agentic AI.

Core Patterns

1. Hierarchical Coordinator Pattern

Supervisor agent routes to specialists
Aggregates results and enforces constraints
Cost guardian agent manages budget constraints

2. Function Calling for Deterministic Steps

Reserve pure LLM reasoning for analysis
Use function calling for deterministic operations (fetch data, validate constraints, commit transactions)

3. Cost-Aware Decision Trees

Route by value: high-value decisions use expensive models
Budget enforcement at every agent call
Impact: 20-40% cost reduction without sacrificing quality

Orchestration Patterns

Structured message passing (enables validation, auditing, replay)
Timeout & fallback chains (prevent single slow agent from killing whole system)
Per-agent observability (latency, cost, success/failure)

Conclusion

Multi-agent systems are powerful but require discipline:

Clear protocols (structured messages)
Cost awareness (route by value)
Observability (track every call)
Testing (mock expensive calls)

Get these right, and you can scale to complex domains — finance, healthcare, autonomous reasoning.

Advanced Patterns: Hierarchical Supervisor-Specialist Model

The most battle-tested pattern in production systems is the hierarchical coordinator: one supervisor agent routes to specialists, enforces constraints, and aggregates results.

class SupervisorAgent:
    def __init__(self, specialists: list[SpecialistAgent], budget: float):
        self.specialists = specialists
        self.budget = budget
        self.spent = 0.0
    
    async def process_request(self, request: dict) -> dict:
        """Route to best specialist within budget."""
        # Cost-aware routing
        best_specialist = self.select_by_value(request)
        
        if self.spent + best_specialist.cost > self.budget:
            return {"error": "budget exceeded", "spent": self.spent}
        
        result = await best_specialist.handle(request)
        self.spent += best_specialist.cost
        
        return result

Real-World Case Study: Genie (15-Agent Financial Assistant)

Architecture:

1 Supervisor Agent (route requests)
5 Specialist Agents (account analysis, fraud detection, risk assessment, investment recommendation, tax optimization)
2 Tool Agents (database connector, compliance checker)
Cost Guardian (enforce $0.03 per request budget)

Results:

40M monthly requests
P99 latency: 1.2 seconds
Cost per request: $0.018 (40% under budget)
Compliance: 99.97% (one violation in 100K requests)

Key insight: The cost guardian agent saved 3x more than the model selection optimization. Enforce budgets early.

State Management at Scale

Multi-agent systems need shared state (who processed what, what was decided, when).

class AgentSession:
    """Shared state across all agents in one workflow."""
    def __init__(self, session_id: str):
        self.id = session_id
        self.messages = []
        self.decisions = {}
        self.cost_spent = 0.0
        self.created_at = datetime.now()
    
    def log_decision(self, agent_id: str, decision: str, cost: float):
        """Every agent decision is logged for audit."""
        self.decisions[f"{agent_id}_{len(self.decisions)}"] = {
            "agent": agent_id,
            "decision": decision,
            "cost": cost,
            "timestamp": datetime.now().isoformat(),
        }
        self.cost_spent += cost
    
    def get_audit_trail(self) -> list:
        """For compliance: prove what happened when."""
        return sorted(self.decisions.values(), key=lambda x: x["timestamp"])

Observability: Tracing Multi-Agent Flows

You need visibility into: which agent ran, why, how long, at what cost.

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def supervisor_with_tracing(request):
    with tracer.start_as_current_span("supervisor.process") as span:
        span.set_attribute("request.id", request.id)
        
        for specialist in self.specialists:
            with tracer.start_as_current_span(f"specialist.{specialist.id}") as child_span:
                result = await specialist.handle(request)
                child_span.set_attribute("result.quality", result.confidence)
                child_span.set_attribute("cost", specialist.cost)
        
        return aggregate_results()

Common Failure Modes (and How to Prevent Them)

1. Runaway Agent Cost

Problem: One specialist gets stuck in a loop, consuming $5K in 10 minutes. Solution: Per-agent timeout + cost ceiling. Kill the agent if it exceeds budget.

2. State Inconsistency

Problem: Supervisor reads cached state, specialist updates DB, supervisor’s cache is stale. Solution: All writes through single coordinator. No agent writes directly to shared state.

3. No Fallback Chain

Problem: Primary specialist fails, system crashes instead of trying specialist #2. Solution: Mandatory fallback list. Every agent has a backup.

Testing Multi-Agent Systems

Mock the expensive agents (LLM calls) during testing.

class MockSpecialist:
    def __init__(self, deterministic_response: dict):
        self.response = deterministic_response
    
    async def handle(self, request):
        return self.response

# Test supervisor with mocks
supervisor = SupervisorAgent(
    specialists=[
        MockSpecialist({"decision": "approve"}),
        MockSpecialist({"decision": "review"}),
    ],
    budget=1.0
)

result = await supervisor.process_request({"amount": 1000})
assert result["decision"] in ["approve", "review"]

This is why production multi-agent systems are cheaper than you think: most of the cost is LLM calls, and you can test without them.