Multi-Agent Systems in 5 Interfaces: A MARA Implementation Field Report
Microsoft published a Multi-Agent Reference Architecture last year. I spent the last sprint implementing it for medical sequential diagnosis. This is what the architecture buys you, what it doesn't, and the design rules that actually matter.
The architecture in one sentence
A user (or API call) publishes a clinical message → the orchestrator delivers it to the right agent → the agent returns zero or more new messages → the orchestrator publishes those → governance gates each hop → the workflow continues until a terminal agent decides we’re done.
That’s it. Every interesting behaviour in Bodh — the open-source medical multi-agent platform I’ve been writing in Go — is a consequence of that loop.
The Microsoft Multi-Agent Reference Architecture (MARA) doesn’t tell you what your agents should do. It tells you what shape they should have, where governance runs, and how they coordinate. The clinical specificity lives in agents and policies. The platform is opinion-free about clinical reasoning.
If you’ve been building agentic systems with frameworks that bundle prompting + tool selection + state management + observability into one library, MARA will feel structurally different. Here’s what it looks like in code.
Five interfaces hold the whole platform together
Seventeen-plus agents, the bus, the registry, the governance composer, the HITL gate — all compose from these five contracts.
1. Agent
type Agent interface {
	ID() string
	Name() string
	Capabilities() []string
	HandleMessage(ctx context.Context, msg Message, env Environment) ([]Message, error)
}
- ID() is the addressable name on the bus, used as the To field for routing.
- Capabilities() is a list of capability strings the registry can search by ("ask_question", "verify_reasoning", "order_test"). Capability-driven dispatch beats hardcoded peer names every time.
- HandleMessage takes one message and returns zero or more outputs. The orchestrator handles routing those outputs back to the bus.
That’s the agent contract. Four methods. No “tools”, no “memory injection”, no “session state” — those are concerns at other interfaces.
2. Bus
type Bus interface {
	Subscribe(agentID string, h Handler) (unsubscribe func())
	Publish(ctx context.Context, msg protocol.Message)
}
Publish routes by msg.To. The default in-memory bus invokes matching handlers as goroutines. Production swaps in Kafka, NATS JetStream, or RabbitMQ without any agent code changes.
The bus implementation is where you make decisions about ordering, retries, persistence, and back-pressure. None of those decisions leak into agent code.
3. Registry
type Registry interface {
	Register(ctx context.Context, a agent.Agent) error
	Get(ctx context.Context, id string) (agent.Agent, error)
	List(ctx context.Context) []agent.Agent
	FindByCapability(ctx context.Context, capability string) []agent.Agent
}
The discovery layer. FindByCapability("ask_question") returns every agent that publishes that capability. In Bodh, the inbox_triage agent uses this to find a primary-care agent at runtime — capability-driven, not hard-coded.
Production replaces the in-memory registry with service-discovery-backed implementations (Consul, etcd, Kubernetes EndpointSlice) when agents live in separate pods.
4. Policy
type Policy interface {
	Evaluate(ctx context.Context, msg protocol.Message) (PolicyResult, error)
}
Policies run before any agent’s HandleMessage. Returning a Deny short-circuits the message — the agent never sees it.
This is where HIPAA, GDPR, multi-tenancy, scope checks, rate limits, and cost caps live. Adding a new regulatory regime is: add a Policy, add it to the composer. The orchestrator never changes. The agents never change.
Bodh ships:
- RequireCaseIDPolicy: fail closed on missing correlation
- MaxDiagnosticCostPolicy: per-case budget enforcement
- MaxContentLengthPolicy: DoS / prompt-bomb protection
- RequireTenantPolicy: multi-tenant isolation
- RequireScopePolicy: SMART-on-FHIR scope check
- ResearchDisclaimerPolicy: research-only deployment marker
Each one is ~15 lines.
5. Environment
type Environment interface {
	Now() time.Time
	Logf(format string, args ...any)
}
Intentionally minimal. The orchestrator hands the same Environment to every agent’s HandleMessage. Extend by composition — add fields to your concrete type, give it richer methods, type-assert in the agent when you need them. This pattern keeps the interface stable while letting deployments inject memory, LLM clients, FHIR clients, KMS clients, secret stores.
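The extend-by-composition idea can be sketched as follows. CaseMemory, RichEnv, plainEnv and recallNotes are hypothetical names invented for this example; the point is only that an agent type-asserts for an optional capability and degrades gracefully when it is absent.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// Environment mirrors the two-method interface from the article.
type Environment interface {
	Now() time.Time
	Logf(format string, args ...any)
}

// CaseMemory is a hypothetical optional capability a deployment may inject.
type CaseMemory interface {
	Recall(caseID string) []string
	Remember(caseID, note string)
}

// RichEnv satisfies both Environment and CaseMemory.
// It is not safe for concurrent use; a real one would need locking.
type RichEnv struct {
	notes map[string][]string
}

func NewRichEnv() *RichEnv { return &RichEnv{notes: map[string][]string{}} }

func (e *RichEnv) Now() time.Time                  { return time.Now() }
func (e *RichEnv) Logf(format string, args ...any) { log.Printf(format, args...) }
func (e *RichEnv) Recall(caseID string) []string   { return e.notes[caseID] }
func (e *RichEnv) Remember(caseID, note string) {
	e.notes[caseID] = append(e.notes[caseID], note)
}

// plainEnv implements only the base interface.
type plainEnv struct{}

func (plainEnv) Now() time.Time                  { return time.Now() }
func (plainEnv) Logf(format string, args ...any) { log.Printf(format, args...) }

// recallNotes is what an agent might do inside HandleMessage: ask the
// environment for memory, and carry on without it if it is not there.
func recallNotes(env Environment, caseID string) []string {
	mem, ok := env.(CaseMemory)
	if !ok {
		env.Logf("no case memory available; continuing without it")
		return nil
	}
	return mem.Recall(caseID)
}

func main() {
	rich := NewRichEnv()
	rich.Remember("case-1", "penicillin allergy")
	fmt.Println(recallNotes(rich, "case-1"))
	fmt.Println(recallNotes(plainEnv{}, "case-1"))
}
```

The trade-off is that the dependency is checked at run time rather than compile time, so agents should treat a failed assertion as a normal path.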
The 30-line orchestrator closure
Here’s the most important code in the platform. The closure subscribes one agent to the bus and orchestrates governance + audit + handler dispatch + republish:
func (o *Orchestrator) subscribeAgentLocked(a agent.Agent) {
	agentID := a.ID()
	o.bus.Subscribe(agentID, func(c context.Context, msg agent.Message) {
		// 1. Governance runs BEFORE the agent sees the message.
		if o.policy != nil {
			res, err := o.policy.Evaluate(c, msg)
			if err != nil {
				o.env.Logf("policy error for msg %s: %v", msg.ID, err)
				return
			}
			if res.Decision == governance.DecisionDeny {
				o.env.Logf("message %s denied by policy: %s", msg.ID, res.Reason)
				o.recordAudit(audit.FromPolicyDecision(msg, agentID, res))
				return // agent NEVER sees the message
			}
		}
		// 2. Record the agent invocation.
		o.env.Logf("agent %s handling message %s from %s", agentID, msg.ID, msg.From)
		o.recordAudit(audit.FromMessage(msg, agentID))
		// 3. Delegate to the agent implementation.
		out, err := a.HandleMessage(c, msg, o.env)
		if err != nil {
			o.env.Logf("agent %s error: %v", agentID, err)
			o.recordAudit(audit.Event{
				Kind: audit.KindAgentError, AgentID: agentID,
				ErrorText: err.Error(),
			})
			return
		}
		// 4. Republish outputs. One agent's output becomes another's input.
		for _, m := range out {
			o.bus.Publish(c, m)
		}
	})
}
Five properties fall out:
- Fail closed. A denied message has no path to the agent.
- Single audit point. Every governance decision and every agent invocation passes through this closure. Want HIPAA §164.312(b) audit controls? Instrument here once, not in 17 agents.
- Pluggability. Want multi-tenancy, scope checks, rate limits, SUD segmentation? Add a Policy. The orchestrator doesn’t change.
- No clinical logic. The orchestrator never branches on diagnosis, condition, or test results. Pure transport + policy.
- Audit is fail-open. If audit Record() errors, log and continue; never block message processing because audit failed. Audit failures are operational; blocking the clinical flow because the sink is down is the worst outcome.
These 30 lines are what separate “agent framework” from “multi-agent platform.” Everything clinical or domain-specific lives outside of them.
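The fail-closed and fail-open properties can be exercised together in a stripped-down, runnable model of the closure. Everything here is a simplification under stated assumptions: syncBus delivers synchronously for determinism, AgentFunc and PolicyFunc stand in for the interfaces, and AuditSink, downSink, the CaseID field and the run helper are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Message is an assumed envelope; CaseID is a hypothetical field.
type Message struct{ ID, From, To, Type, CaseID string }

type Handler func(ctx context.Context, msg Message)

// syncBus delivers synchronously so the run is deterministic.
type syncBus struct{ handlers map[string]Handler }

func (b *syncBus) Subscribe(agentID string, h Handler) { b.handlers[agentID] = h }

func (b *syncBus) Publish(ctx context.Context, msg Message) {
	if h, ok := b.handlers[msg.To]; ok {
		h(ctx, msg)
	}
}

// AgentFunc and PolicyFunc are function-typed stand-ins for the interfaces.
type AgentFunc func(ctx context.Context, msg Message) ([]Message, error)
type PolicyFunc func(ctx context.Context, msg Message) (allow bool, reason string)

// AuditSink is a hypothetical sink whose Record call may fail.
type AuditSink interface{ Record(event string) error }

type Orchestrator struct {
	bus    *syncBus
	policy PolicyFunc
	sink   AuditSink
	Log    []string // captured log lines, standing in for env.Logf
}

// recordAudit is fail-open: a sink error is logged and processing continues.
func (o *Orchestrator) recordAudit(event string) {
	if err := o.sink.Record(event); err != nil {
		o.Log = append(o.Log, "audit failed: "+err.Error())
	}
}

func (o *Orchestrator) Subscribe(agentID string, handle AgentFunc) {
	o.bus.Subscribe(agentID, func(ctx context.Context, msg Message) {
		if allow, reason := o.policy(ctx, msg); !allow {
			o.recordAudit("deny " + msg.ID + ": " + reason)
			return // fail closed: the agent never runs
		}
		o.recordAudit("invoke " + agentID + " " + msg.ID)
		out, err := handle(ctx, msg)
		if err != nil {
			o.recordAudit("error " + agentID + ": " + err.Error())
			return
		}
		for _, m := range out {
			o.bus.Publish(ctx, m)
		}
	})
}

// downSink always fails, simulating an unavailable audit store.
type downSink struct{}

func (downSink) Record(string) error { return errors.New("sink unavailable") }

// requireCaseID denies any message without a case ID.
func requireCaseID(_ context.Context, msg Message) (bool, string) {
	if msg.CaseID == "" {
		return false, "missing case ID"
	}
	return true, ""
}

// run wires two hypothetical agents and reports which ones were invoked.
func run(msgs ...Message) (invoked []string, o *Orchestrator) {
	o = &Orchestrator{bus: &syncBus{handlers: map[string]Handler{}}, policy: requireCaseID, sink: downSink{}}
	o.Subscribe("intake", func(_ context.Context, m Message) ([]Message, error) {
		invoked = append(invoked, "intake")
		return []Message{{ID: m.ID + ".1", From: "intake", To: "supervisor", Type: "case_state", CaseID: m.CaseID}}, nil
	})
	o.Subscribe("supervisor", func(_ context.Context, m Message) ([]Message, error) {
		invoked = append(invoked, "supervisor")
		return nil, nil
	})
	for _, m := range msgs {
		o.bus.Publish(context.Background(), m)
	}
	return invoked, o
}

func main() {
	invoked, o := run(
		Message{ID: "m1", To: "intake", Type: "presentation", CaseID: "case-1"},
		Message{ID: "m2", To: "intake", Type: "presentation"}, // no case ID: denied
	)
	fmt.Println(invoked, len(o.Log))
}
```

In this run the first message flows through both agents even though every audit write fails, while the second never reaches intake at all.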
Design rule that makes the rest of the system testable
Agents never call each other directly. They emit Messages with a To field. The bus routes. The orchestrator runs governance before any HandleMessage.
This single rule is what makes the system testable, auditable, and safe to refactor.
If diagnostician could call reasoning_verifier.Verify(...) directly:
- The audit log doesn’t see it (no message to record)
- The governance composer doesn’t see it (no policy evaluation)
- The HITL gate can’t intercept it (no envelope wrapping)
- Tests can’t observe it (no message to assert on)
- Bus implementation changes don’t affect it (in-process function call)
By forcing every interaction through the bus, every interaction becomes:
- Observable (one place to instrument)
- Auditable (one schema)
- Governable (one place to enforce policy)
- Testable (assert on emitted messages)
- Distributable (swap in-memory for Kafka)
That’s the dividend.
What sequential diagnosis looks like in this shape
A diagnostic case in Bodh runs 12 messages through 8 agents:
| Step | From | To | Type |
|---|---|---|---|
| 1 | user | intake | presentation |
| 2 | intake | rpm_monitor | case_state |
| 3 | rpm_monitor | diagnostic_supervisor | case_state |
| 4 | supervisor | questioner | case_state |
| 5 | questioner | supervisor | questions_complete |
| 6 | supervisor | test_planner | case_state |
| 7 | test_planner | cost_guardian | case_state |
| 8 | cost_guardian | supervisor | tests_complete |
| 9 | supervisor | diagnostician | case_state |
| 10 | diagnostician | reasoning_verifier | diagnosis_proposal |
| 11 | reasoning_verifier | supervisor | diagnosis |
| 12 | supervisor | (none) | (terminal) |
12 messages = 12 audit events = 12 governance evaluations. No hidden state, no inter-agent function calls, no place where a clinician can’t intervene.
The supervisor is a pure state machine on msg.Type:
func (s *DiagnosticSupervisor) HandleMessage(
	ctx context.Context, msg agent.Message, env agent.Environment,
) ([]agent.Message, error) {
	switch msg.Type {
	case "case_state":
		return s.delegate(msg, "questioner", "case_state")
	case "questions_complete":
		return s.delegate(msg, "test_planner", "case_state")
	case "tests_complete":
		return s.delegate(msg, "diagnostician", "case_state")
	case "tests_revise":
		return s.reviseTests(msg)
	case "diagnosis":
		return s.finalize(msg, env)
	}
	return nil, nil
}
That’s it. The supervisor has zero clinical reasoning. It routes by message type. The clinical reasoning lives in the specialist agents.
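The article does not show delegate, so the following is only a guess at its shape: a stateless helper that re-addresses the case to the next specialist. The CaseID field, the derived message ID, and the error on a missing case ID are all assumptions for this sketch.

```go
package main

import "fmt"

// Message is an assumed envelope; CaseID is a hypothetical field.
type Message struct{ ID, From, To, Type, CaseID string }

// DiagnosticSupervisor holds only its own bus address in this sketch.
type DiagnosticSupervisor struct{ id string }

// delegate builds one outbound message for the next agent. It carries the
// case ID forward and makes no clinical decision of its own.
func (s *DiagnosticSupervisor) delegate(msg Message, to, msgType string) ([]Message, error) {
	if msg.CaseID == "" {
		return nil, fmt.Errorf("cannot delegate message %s: no case ID", msg.ID)
	}
	return []Message{{
		ID:     msg.ID + "/" + to, // hypothetical ID scheme
		From:   s.id,
		To:     to,
		Type:   msgType,
		CaseID: msg.CaseID,
	}}, nil
}

func main() {
	s := &DiagnosticSupervisor{id: "supervisor"}
	out, err := s.delegate(
		Message{ID: "m5", From: "questioner", Type: "questions_complete", CaseID: "case-1"},
		"test_planner", "case_state",
	)
	fmt.Println(out, err)
}
```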
The pattern composes
Adding a new condition (asthma, PR #6) was the canonical worked example of how the architecture composes:
- medical.BuildCarePlan gets a case "asthma": branch (~30 lines)
- medical.DefaultPanelRules gets two new rules (~20 lines each)
- agents/test_planner.go gets an entry in the conditionWorkups map (one line)
- agents/diagnostician.go (in PR #11) gets a specialty inference branch (~10 lines)
- New tests, README update
Zero platform changes. No orchestrator change. No new policy. No new bus topic. No new HTTP endpoint. The asthma support lands as pure pkg/medical + agent extension.
That’s the composition dividend. The platform’s job is to stay out of the way of the clinical work.
Three things this architecture is NOT
Not a tool-using agent framework
There’s nothing in the MARA pattern about “tools” as a first-class concept. Tool use lives inside individual LLM-backed agents (the diagnostician_tools.go variant in Bodh wraps an llm.Client that supports tools). The orchestrator doesn’t care.
If your problem is “let an LLM call a function” — LangChain, LlamaIndex, the OpenAI SDK’s native tool-use, Anthropic’s tool-use, MCP all do this directly. You probably don’t need MARA.
Not a graph/DAG framework
If you’re thinking of LangGraph, Inngest, Temporal, Airflow — those are workflow engines. They model a DAG of steps, possibly with retries and human approvals.
MARA is a coordination pattern. Workflows emerge from the conversation between agents, not from a pre-declared DAG. The supervisor agent is what makes a workflow exist — and that’s a domain-specific agent, not platform code.
Not optimised for low latency
12 messages per case = 12 bus hops, 12 governance evaluations, 12 audit writes. In-memory, that’s ~5-50µs of platform overhead per hop. For clinical work, fine. For sub-millisecond real-time, no.
If your problem is high-frequency trading or real-time game state, look elsewhere. If your problem is clinical reasoning, financial advice, legal document analysis, customer support escalation — the latency budget is generous enough that the MARA discipline pays off in auditability.
When to reach for this pattern
| Use case | Does MARA fit? |
|---|---|
| Clinical decision support | ✅ Built for it |
| Financial advice / fraud review | ✅ Auditability is non-negotiable |
| Legal document analysis | ✅ Multi-step reasoning + audit |
| Customer support escalation | ✅ Multiple specialists + human handoff |
| Robotics control | 🟡 Latency may be too high |
| Real-time game NPCs | ❌ Too much overhead |
| One-shot Q&A | ❌ Overkill — use a single LLM call |
| Tool-using assistant (1-2 tools) | ❌ Single agent + tools is simpler |
The rule of thumb: if your problem has regulatory or audit pressure AND involves multi-step reasoning AND benefits from role-specialised reasoning (specialist 1 does X, specialist 2 does Y), MARA pays off. Otherwise, simpler patterns win.
Try it
git clone https://github.com/PratikDhanave/bodh.git
cd bodh
# Watch a full diagnostic case run through 12 messages
go run ./cmd/demo
# Inspect the 5 interfaces
ls pkg/{agent,comm,registry,governance,orchestration}/
# Read the 30-line closure
grep -A30 "subscribeAgentLocked" pkg/orchestration/orchestrator.go
Deep architecture documentation in docs/architecture.md.
Repo: github.com/PratikDhanave/bodh
If you’re building multi-agent systems and want to compare design decisions — the bus implementation, the registry shape, the policy composition pattern — issues, PRs, and DMs welcome.
Bodh is a research and engineering reference. The MARA implementation described here is the architecture target for clinical AI; the codebase is not approved for clinical use.