Multi-Agent Systems in 5 Interfaces: A MARA Implementation Field Report

Microsoft published a Multi-Agent Reference Architecture last year. I spent the last sprint implementing it for medical sequential diagnosis. This is what the architecture buys you, what it doesn't, and the design rules that actually matter.

The architecture in one sentence

A user (or API call) publishes a clinical message → the orchestrator delivers it to the right agent → the agent returns zero or more new messages → the orchestrator publishes those → governance gates each hop → the workflow continues until a terminal agent decides we’re done.

That’s it. Every interesting behaviour in Bodh — the open-source medical multi-agent platform I’ve been writing in Go — is a consequence of that loop.

The Microsoft Multi-Agent Reference Architecture (MARA) doesn’t tell you what your agents should do. It tells you what shape they should have, where governance runs, and how they coordinate. The clinical specificity lives in agents and policies. The platform is opinion-free about clinical reasoning.

If you’ve been building agentic systems with frameworks that bundle prompting + tool selection + state management + observability into one library, MARA will feel structurally different. Here’s what it looks like in code.


Five interfaces hold the whole platform together

Seventeen-plus agents, the bus, the registry, the governance composer, and the human-in-the-loop (HITL) gate are all composed from these five contracts.

1. Agent

type Agent interface {
    ID() string
    Name() string
    Capabilities() []string
    HandleMessage(ctx context.Context, msg Message, env Environment) ([]Message, error)
}

That’s the agent contract. Four methods. No “tools”, no “memory injection”, no “session state” — those are concerns at other interfaces.
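To make the contract concrete, here is a minimal sketch of an agent that satisfies it. Message, its fields, EchoAgent, and stdEnv are stand-ins invented for this illustration; Bodh's real types may look different.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Message is a simplified stand-in; the field names are assumptions.
type Message struct {
	ID, From, To, Type string
	Payload            string
}

type Environment interface {
	Now() time.Time
	Logf(format string, args ...any)
}

type Agent interface {
	ID() string
	Name() string
	Capabilities() []string
	HandleMessage(ctx context.Context, msg Message, env Environment) ([]Message, error)
}

// EchoAgent is a hypothetical agent: it replies to the sender and holds no state.
type EchoAgent struct{}

func (EchoAgent) ID() string             { return "echo" }
func (EchoAgent) Name() string           { return "Echo Agent" }
func (EchoAgent) Capabilities() []string { return []string{"echo"} }

func (e EchoAgent) HandleMessage(ctx context.Context, msg Message, env Environment) ([]Message, error) {
	env.Logf("echo handling %s", msg.ID)
	reply := Message{
		ID:      msg.ID + "-reply",
		From:    e.ID(),
		To:      msg.From, // address the reply back to the sender
		Type:    "echo_reply",
		Payload: msg.Payload,
	}
	return []Message{reply}, nil
}

// stdEnv is a trivial Environment backed by the standard library.
type stdEnv struct{}

func (stdEnv) Now() time.Time                  { return time.Now() }
func (stdEnv) Logf(format string, args ...any) { fmt.Printf(format+"\n", args...) }

// Compile-time check that EchoAgent implements Agent.
var _ Agent = EchoAgent{}

// echoOnce sends one message through the agent and returns the single reply.
func echoOnce(payload string) Message {
	in := Message{ID: "m1", From: "user", To: "echo", Type: "echo", Payload: payload}
	out, _ := EchoAgent{}.HandleMessage(context.Background(), in, stdEnv{})
	return out[0]
}

func main() {
	fmt.Printf("%+v\n", echoOnce("hello"))
}
```

The agent never touches a bus or another agent. It only returns messages, which is what keeps everything else swappable.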

2. Bus

type Bus interface {
    Subscribe(agentID string, h Handler) (unsubscribe func())
    Publish(ctx context.Context, msg protocol.Message)
}

Publish routes by msg.To. The default in-memory bus invokes each matching handler in its own goroutine. In production you can swap in Kafka, NATS JetStream, or RabbitMQ without changing any agent code.

The bus implementation is where you make decisions about ordering, retries, persistence, and back-pressure. None of those decisions leak into agent code.
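As a rough illustration of how small an in-memory implementation can be, here is a sketch of a bus that routes by To and runs each handler in a goroutine. MemBus, its fields, and the Wait helper are assumptions for this example, not Bodh's actual bus.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Message is a simplified stand-in for the platform's message type.
type Message struct{ ID, From, To, Type string }

type Handler func(ctx context.Context, msg Message)

// MemBus is a hypothetical in-memory bus keyed by recipient agent ID.
type MemBus struct {
	mu       sync.RWMutex
	nextID   int
	handlers map[string]map[int]Handler
	wg       sync.WaitGroup // tracks in-flight handlers so callers can wait
}

func NewMemBus() *MemBus { return &MemBus{handlers: map[string]map[int]Handler{}} }

// Subscribe registers h for agentID and returns a function that removes it.
func (b *MemBus) Subscribe(agentID string, h Handler) (unsubscribe func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.nextID
	b.nextID++
	if b.handlers[agentID] == nil {
		b.handlers[agentID] = map[int]Handler{}
	}
	b.handlers[agentID][id] = h
	return func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		delete(b.handlers[agentID], id)
	}
}

// Publish copies the matching handlers under the lock, then dispatches
// each one in its own goroutine. Messages with no subscriber are dropped.
func (b *MemBus) Publish(ctx context.Context, msg Message) {
	b.mu.RLock()
	hs := make([]Handler, 0, len(b.handlers[msg.To]))
	for _, h := range b.handlers[msg.To] {
		hs = append(hs, h)
	}
	b.mu.RUnlock()
	for _, h := range hs {
		b.wg.Add(1)
		go func(h Handler) {
			defer b.wg.Done()
			h(ctx, msg)
		}(h)
	}
}

// Wait blocks until every dispatched handler has returned.
func (b *MemBus) Wait() { b.wg.Wait() }

// deliveredCount publishes three messages and counts how many reach "intake".
func deliveredCount() int {
	bus := NewMemBus()
	var mu sync.Mutex
	n := 0
	unsub := bus.Subscribe("intake", func(ctx context.Context, m Message) {
		mu.Lock()
		n++
		mu.Unlock()
	})
	ctx := context.Background()
	bus.Publish(ctx, Message{ID: "1", To: "intake"})
	bus.Publish(ctx, Message{ID: "2", To: "nobody"}) // no subscriber
	bus.Wait()
	unsub()
	bus.Publish(ctx, Message{ID: "3", To: "intake"}) // after unsubscribe
	bus.Wait()
	return n
}

func main() {
	fmt.Println("delivered:", deliveredCount()) // prints "delivered: 1"
}
```

This sketch makes no ordering or retry guarantees. Those are exactly the decisions a production bus would make behind the same two methods.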

3. Registry

type Registry interface {
    Register(ctx context.Context, a agent.Agent) error
    Get(ctx context.Context, id string) (agent.Agent, error)
    List(ctx context.Context) []agent.Agent
    FindByCapability(ctx context.Context, capability string) []agent.Agent
}

The discovery layer. FindByCapability(ctx, "ask_question") returns every agent that advertises that capability. In Bodh, the inbox_triage agent uses this to find a primary-care agent at runtime: capability-driven, not hard-coded.

Production replaces the in-memory registry with service-discovery-backed implementations (Consul, etcd, Kubernetes EndpointSlice) when agents live in separate pods.
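For a sense of what capability lookup involves, here is a sketch of an in-memory registry. MemRegistry, stubAgent, and the agent IDs and capability names in the demo are assumptions for illustration, and the Agent interface is trimmed to the two methods the lookup needs.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

// Agent is trimmed to what the registry needs in this sketch.
type Agent interface {
	ID() string
	Capabilities() []string
}

// stubAgent is a hypothetical agent used only to populate the registry.
type stubAgent struct {
	id   string
	caps []string
}

func (s stubAgent) ID() string             { return s.id }
func (s stubAgent) Capabilities() []string { return s.caps }

// MemRegistry is a hypothetical map-backed registry.
type MemRegistry struct {
	mu     sync.RWMutex
	agents map[string]Agent
}

func NewMemRegistry() *MemRegistry { return &MemRegistry{agents: map[string]Agent{}} }

// Register rejects duplicate IDs so a second agent cannot shadow the first.
func (r *MemRegistry) Register(ctx context.Context, a Agent) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.agents[a.ID()]; dup {
		return fmt.Errorf("agent %q already registered", a.ID())
	}
	r.agents[a.ID()] = a
	return nil
}

// FindByCapability returns matching agents sorted by ID for stable output.
func (r *MemRegistry) FindByCapability(ctx context.Context, capability string) []Agent {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var out []Agent
	for _, a := range r.agents {
		for _, c := range a.Capabilities() {
			if c == capability {
				out = append(out, a)
				break
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID() < out[j].ID() })
	return out
}

// findIDs registers three stub agents and returns the IDs that match.
func findIDs(capability string) []string {
	ctx := context.Background()
	r := NewMemRegistry()
	_ = r.Register(ctx, stubAgent{"primary_care", []string{"ask_question", "order_test"}})
	_ = r.Register(ctx, stubAgent{"questioner", []string{"ask_question"}})
	_ = r.Register(ctx, stubAgent{"cost_guardian", []string{"estimate_cost"}})
	var ids []string
	for _, a := range r.FindByCapability(ctx, capability) {
		ids = append(ids, a.ID())
	}
	return ids
}

func main() {
	fmt.Println(findIDs("ask_question")) // prints [primary_care questioner]
}
```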

4. Policy

type Policy interface {
    Evaluate(ctx context.Context, msg protocol.Message) (PolicyResult, error)
}

Policies run before any agent’s HandleMessage. Returning a Deny short-circuits the message — the agent never sees it.

This is where HIPAA, GDPR, multi-tenancy, scope checks, rate limits, and cost caps live. Adding a new regulatory regime is: add a Policy, add it to the composer. The orchestrator never changes. The agents never change.

Bodh ships a set of these policies. Each one is ~15 lines.
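To show roughly what a ~15-line policy and a composer could look like, here is a sketch. TenantPolicy, AllowedTypes, Composer, the TenantID field, and DecisionAllow are all assumptions for this example. The article only confirms the Policy interface, PolicyResult, and a Deny decision, and the first-deny-wins ordering is one possible composition rule.

```go
package main

import (
	"context"
	"fmt"
)

type Message struct {
	ID, From, To, Type string
	TenantID           string // assumed field, used only by this sketch
}

type Decision int

const (
	DecisionAllow Decision = iota // assumed name
	DecisionDeny
)

type PolicyResult struct {
	Decision Decision
	Reason   string
}

type Policy interface {
	Evaluate(ctx context.Context, msg Message) (PolicyResult, error)
}

// TenantPolicy denies any message that does not carry the expected tenant.
type TenantPolicy struct{ Tenant string }

func (p TenantPolicy) Evaluate(ctx context.Context, msg Message) (PolicyResult, error) {
	if msg.TenantID != p.Tenant {
		return PolicyResult{Decision: DecisionDeny, Reason: "tenant mismatch"}, nil
	}
	return PolicyResult{Decision: DecisionAllow}, nil
}

// AllowedTypes denies message types that are not on the allow-list.
type AllowedTypes map[string]bool

func (p AllowedTypes) Evaluate(ctx context.Context, msg Message) (PolicyResult, error) {
	if !p[msg.Type] {
		return PolicyResult{Decision: DecisionDeny, Reason: "type not permitted: " + msg.Type}, nil
	}
	return PolicyResult{Decision: DecisionAllow}, nil
}

// Composer runs policies in order; the first deny or error ends evaluation.
type Composer []Policy

func (c Composer) Evaluate(ctx context.Context, msg Message) (PolicyResult, error) {
	for _, p := range c {
		res, err := p.Evaluate(ctx, msg)
		if err != nil || res.Decision == DecisionDeny {
			return res, err
		}
	}
	return PolicyResult{Decision: DecisionAllow}, nil
}

// check evaluates msg against a two-policy composer and summarizes the outcome.
func check(msg Message) string {
	gate := Composer{TenantPolicy{Tenant: "clinic-a"}, AllowedTypes{"case_state": true}}
	res, err := gate.Evaluate(context.Background(), msg)
	if err != nil {
		return "error: " + err.Error()
	}
	if res.Decision == DecisionDeny {
		return "deny: " + res.Reason
	}
	return "allow"
}

func main() {
	fmt.Println(check(Message{Type: "case_state", TenantID: "clinic-a"})) // prints "allow"
	fmt.Println(check(Message{Type: "case_state", TenantID: "clinic-b"})) // prints "deny: tenant mismatch"
}
```

Because Composer itself satisfies Policy, the orchestrator can hold a single Policy value and stay unaware of how many rules sit behind it.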

5. Environment

type Environment interface {
    Now() time.Time
    Logf(format string, args ...any)
}

Intentionally minimal. The orchestrator hands the same Environment to every agent’s HandleMessage. Extend by composition — add fields to your concrete type, give it richer methods, type-assert in the agent when you need them. This pattern keeps the interface stable while letting deployments inject memory, LLM clients, FHIR clients, KMS clients, secret stores.
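The extend-by-composition idea can be sketched like this. MemoryProvider, baseEnv, richEnv, and lookup are hypothetical names. The point is only the shape: embed the base environment, add a method, and let the agent type-assert for it.

```go
package main

import (
	"fmt"
	"time"
)

type Environment interface {
	Now() time.Time
	Logf(format string, args ...any)
}

// MemoryProvider is a hypothetical optional capability an agent may ask for.
type MemoryProvider interface {
	Recall(key string) (string, bool)
}

// baseEnv implements only the minimal Environment contract.
type baseEnv struct{}

func (baseEnv) Now() time.Time                  { return time.Now() }
func (baseEnv) Logf(format string, args ...any) { fmt.Printf(format+"\n", args...) }

// richEnv embeds baseEnv and adds memory without changing the interface.
type richEnv struct {
	baseEnv
	memory map[string]string
}

func (e richEnv) Recall(key string) (string, bool) {
	v, ok := e.memory[key]
	return v, ok
}

// lookup shows the agent-side pattern: assert for the richer capability
// and degrade gracefully when the deployment did not inject it.
func lookup(env Environment, key string) string {
	mp, ok := env.(MemoryProvider)
	if !ok {
		return "no memory available"
	}
	if v, found := mp.Recall(key); found {
		return v
	}
	return "not found"
}

func main() {
	fmt.Println(lookup(baseEnv{}, "allergy"))
	fmt.Println(lookup(richEnv{memory: map[string]string{"allergy": "penicillin"}}, "allergy"))
}
```

The trade-off is that a missing capability shows up at runtime rather than compile time, so agents should always handle the failed assertion.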


The 30-line orchestrator closure

Here’s the most important code in the platform. The closure subscribes one agent to the bus and orchestrates governance + audit + handler dispatch + republish:

func (o *Orchestrator) subscribeAgentLocked(a agent.Agent) {
    agentID := a.ID()
    o.bus.Subscribe(agentID, func(c context.Context, msg agent.Message) {
        // 1. Governance runs BEFORE the agent sees the message.
        if o.policy != nil {
            res, err := o.policy.Evaluate(c, msg)
            if err != nil {
                o.env.Logf("policy error for msg %s: %v", msg.ID, err)
                return
            }
            if res.Decision == governance.DecisionDeny {
                o.env.Logf("message %s denied by policy: %s", msg.ID, res.Reason)
                o.recordAudit(audit.FromPolicyDecision(msg, agentID, res))
                return  // agent NEVER sees the message
            }
        }

        // 2. Record the agent invocation.
        o.env.Logf("agent %s handling message %s from %s", agentID, msg.ID, msg.From)
        o.recordAudit(audit.FromMessage(msg, agentID))

        // 3. Delegate to the agent implementation.
        out, err := a.HandleMessage(c, msg, o.env)
        if err != nil {
            o.env.Logf("agent %s error: %v", agentID, err)
            o.recordAudit(audit.Event{
                Kind: audit.KindAgentError, AgentID: agentID,
                ErrorText: err.Error(),
            })
            return
        }

        // 4. Republish outputs. One agent's output becomes another's input.
        for _, m := range out {
            o.bus.Publish(c, m)
        }
    })
}

Five properties fall out:

  1. Fail closed. A denied message has no path to the agent.
  2. Single audit point. Every governance decision and every agent invocation passes through this closure. Want HIPAA §164.312(b) audit controls? Instrument here once, not in 17 agents.
  3. Pluggability. Want multi-tenancy, scope checks, rate limits, SUD segmentation? Add a Policy. The orchestrator doesn’t change.
  4. No clinical logic. The orchestrator never branches on diagnosis, condition, or test results. Pure transport + policy.
  5. Audit is fail-open. If audit Record() errors, log and continue — never block message processing because audit failed. Audit failures are operational; blocking the clinical flow because the sink is down is the worst outcome.

These 30 lines are what separate an “agent framework” from a “multi-agent platform.” Everything clinical or domain-specific lives outside them.
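The closure calls recordAudit without checking a return value, which is where the fail-open audit property comes from. Here is one way such a helper might look. Sink, Event's fields, and the logf hook are assumptions for this sketch; the article does not show the real body.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a simplified audit record; the fields are assumptions.
type Event struct{ Kind, AgentID, ErrorText string }

// Sink is a hypothetical audit backend.
type Sink interface {
	Record(e Event) error
}

type Orchestrator struct {
	sink Sink
	logf func(format string, args ...any)
}

// recordAudit is fail-open: a sink failure is logged and swallowed so the
// caller's message processing continues.
func (o *Orchestrator) recordAudit(e Event) {
	if o.sink == nil {
		return
	}
	if err := o.sink.Record(e); err != nil {
		o.logf("audit sink error (continuing): %v", err)
	}
}

// downSink simulates an audit backend that is unavailable.
type downSink struct{}

func (downSink) Record(Event) error { return errors.New("sink unavailable") }

// auditWhileSinkDown records one event against a failing sink and reports
// what was logged and whether control returned to the caller.
func auditWhileSinkDown() (logged []string, continued bool) {
	o := &Orchestrator{sink: downSink{}}
	o.logf = func(format string, args ...any) {
		logged = append(logged, fmt.Sprintf(format, args...))
	}
	o.recordAudit(Event{Kind: "agent_invoked", AgentID: "intake"})
	continued = true // reached because recordAudit neither panicked nor blocked
	return
}

func main() {
	logged, ok := auditWhileSinkDown()
	fmt.Println(logged, ok)
}
```

Note the asymmetry with governance: a policy error stops the message, while an audit error does not.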


Design rule that makes the rest of the system testable

Agents never call each other directly. They emit Messages with a To field. The bus routes. The orchestrator runs governance before any HandleMessage.

This single rule is what makes the system testable, auditable, and safe to refactor.

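If diagnostician could call reasoning_verifier.Verify(...) directly, that call would bypass the policy check, leave no audit event, and couple the two agents at compile time.

By forcing every interaction through the bus, every interaction becomes a message that governance evaluates, the audit log records, and a test can inject or capture.

That’s the dividend.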
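Here is a sketch of what that testability looks like in practice: feed one message into an agent and assert on the messages that come out, with no bus, no verifier, and no mocks. The Diagnostician below is a toy stand-in with a trimmed HandleMessage signature (no Environment); the real agent's logic is not shown in this article.

```go
package main

import (
	"context"
	"fmt"
)

// Message is a simplified stand-in for the platform's message type.
type Message struct{ ID, From, To, Type, Body string }

// Diagnostician is a toy agent: on case_state it emits a proposal
// addressed to the verifier instead of calling the verifier itself.
type Diagnostician struct{}

func (Diagnostician) HandleMessage(ctx context.Context, msg Message) ([]Message, error) {
	if msg.Type != "case_state" {
		return nil, nil
	}
	return []Message{{
		ID:   msg.ID + "-dx",
		From: "diagnostician",
		To:   "reasoning_verifier",
		Type: "diagnosis_proposal",
		Body: msg.Body,
	}}, nil
}

// checkRouting plays the role of a unit test: one message in, assertions
// on the output, nothing else wired up.
func checkRouting() error {
	in := Message{ID: "c1", From: "supervisor", To: "diagnostician", Type: "case_state"}
	out, err := Diagnostician{}.HandleMessage(context.Background(), in)
	if err != nil {
		return err
	}
	if len(out) != 1 {
		return fmt.Errorf("want 1 output message, got %d", len(out))
	}
	if out[0].To != "reasoning_verifier" || out[0].Type != "diagnosis_proposal" {
		return fmt.Errorf("unexpected routing: to=%s type=%s", out[0].To, out[0].Type)
	}
	return nil
}

func main() {
	if err := checkRouting(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```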


What sequential diagnosis looks like in this shape

A diagnostic case in Bodh runs 12 messages through 8 agents:

Step  From                To                     Type
1     user                intake                 presentation
2     intake              rpm_monitor            case_state
3     rpm_monitor         diagnostic_supervisor  case_state
4     supervisor          questioner             case_state
5     questioner          supervisor             questions_complete
6     supervisor          test_planner           case_state
7     test_planner        cost_guardian          case_state
8     cost_guardian       supervisor             tests_complete
9     supervisor          diagnostician          case_state
10    diagnostician       reasoning_verifier     diagnosis_proposal
11    reasoning_verifier  supervisor             diagnosis
12    supervisor          (terminal)

12 messages = 12 audit events = 12 governance evaluations. No hidden state, no inter-agent function calls, no place where a clinician can’t intervene.

The supervisor is a pure state machine on msg.Type:

func (s *DiagnosticSupervisor) HandleMessage(
    ctx context.Context, msg agent.Message, env agent.Environment,
) ([]agent.Message, error) {
    switch msg.Type {
    case "case_state":
        return s.delegate(msg, "questioner", "case_state")
    case "questions_complete":
        return s.delegate(msg, "test_planner", "case_state")
    case "tests_complete":
        return s.delegate(msg, "diagnostician", "case_state")
    case "tests_revise":
        return s.reviseTests(msg)
    case "diagnosis":
        return s.finalize(msg, env)
    }
    return nil, nil
}

That’s it. The supervisor has zero clinical reasoning. It routes by message type. The clinical reasoning lives in the specialist agents.
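The delegate helper is not shown above, so here is a guess at its shape, plus a check of how the switch behaves on an unknown type. The body of delegate, the next method, and the Message fields are assumptions for this sketch. Only the three call sites and the final return nil, nil come from the code above.

```go
package main

import "fmt"

// Message is a simplified stand-in for the platform's message type.
type Message struct{ ID, From, To, Type, Payload string }

type DiagnosticSupervisor struct{ id string }

// delegate is a hypothetical body: forward the payload unchanged to the
// next specialist under a new type.
func (s *DiagnosticSupervisor) delegate(msg Message, to, typ string) ([]Message, error) {
	return []Message{{ID: msg.ID, From: s.id, To: to, Type: typ, Payload: msg.Payload}}, nil
}

// next mirrors the three delegate branches of the supervisor's switch.
func (s *DiagnosticSupervisor) next(msg Message) ([]Message, error) {
	switch msg.Type {
	case "case_state":
		return s.delegate(msg, "questioner", "case_state")
	case "questions_complete":
		return s.delegate(msg, "test_planner", "case_state")
	case "tests_complete":
		return s.delegate(msg, "diagnostician", "case_state")
	}
	return nil, nil // unknown types produce no output
}

// route returns the recipient chosen for an incoming type, or "" if none.
func route(inType string) string {
	s := &DiagnosticSupervisor{id: "diagnostic_supervisor"}
	out, _ := s.next(Message{ID: "c1", Type: inType})
	if len(out) == 0 {
		return ""
	}
	return out[0].To
}

func main() {
	for _, t := range []string{"case_state", "questions_complete", "tests_complete", "unknown"} {
		fmt.Printf("%-20s -> %q\n", t, route(t))
	}
}
```

One design point worth noticing: an unrecognized type is dropped silently. Whether that should instead return an error so the orchestrator audits it is a judgment call for each deployment.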


The pattern composes

Adding a new condition (asthma, PR #6) was the canonical worked example of how the architecture composes:

  1. medical.BuildCarePlan gets a case "asthma": branch (~30 lines)
  2. medical.DefaultPanelRules gets two new rules (~20 lines each)
  3. agents/test_planner.go gets an entry in conditionWorkups map (one line)
  4. agents/diagnostician.go (in PR #11) gets a specialty inference branch (~10 lines)
  5. New tests, README update

Zero platform changes. No orchestrator change. No new policy. No new bus topic. No new HTTP endpoint. The asthma support lands as pure pkg/medical + agent extension.

That’s the composition dividend. The platform’s job is to stay out of the way of the clinical work.


Three things this architecture is NOT

Not a tool-using agent framework

There’s nothing in the MARA pattern about “tools” as a first-class concept. Tool use lives inside individual LLM-backed agents (the diagnostician_tools.go variant in Bodh wraps an llm.Client that supports tools). The orchestrator doesn’t care.

If your problem is “let an LLM call a function” — LangChain, LlamaIndex, the OpenAI SDK’s native tool-use, Anthropic’s tool-use, MCP all do this directly. You probably don’t need MARA.

Not a graph/DAG framework

If you’re thinking of LangGraph, Inngest, Temporal, Airflow — those are workflow engines. They model a DAG of steps, possibly with retries and human approvals.

MARA is a coordination pattern. Workflows emerge from the conversation between agents, not from a pre-declared DAG. The supervisor agent is what makes a workflow exist — and that’s a domain-specific agent, not platform code.

Not optimised for low latency

12 messages per case = 12 bus hops, 12 governance evaluations, 12 audit writes. In-memory, that’s ~5-50µs of platform overhead per hop, or roughly 60-600µs per case. For clinical work, fine. For sub-millisecond real-time, no.

If your problem is high-frequency trading or real-time game state, look elsewhere. If your problem is clinical reasoning, financial advice, legal document analysis, customer support escalation — the latency budget is generous enough that the MARA discipline pays off in auditability.


When to reach for this pattern

Use case                          Does MARA fit?
Clinical decision support         ✅ Built for it
Financial advice / fraud review   ✅ Auditability is non-negotiable
Legal document analysis           ✅ Multi-step reasoning + audit
Customer support escalation       ✅ Multiple specialists + human handoff
Robotics control                  🟡 Latency may be too high
Real-time game NPCs               ❌ Too much overhead
One-shot Q&A                      ❌ Overkill: use a single LLM call
Tool-using assistant (1-2 tools)  ❌ Single agent + tools is simpler

The rule of thumb: if your problem has regulatory or audit pressure AND involves multi-step reasoning AND benefits from role-specialised reasoning (specialist 1 does X, specialist 2 does Y), MARA pays off. Otherwise, simpler patterns win.


Try it

git clone https://github.com/PratikDhanave/bodh.git
cd bodh

# Watch a full diagnostic case run through 12 messages
go run ./cmd/demo

# Inspect the 5 interfaces
ls pkg/{agent,comm,registry,governance,orchestration}/

# Read the 30-line closure
grep -A30 "subscribeAgentLocked" pkg/orchestration/orchestrator.go

Deeper architecture documentation is in docs/architecture.md.

Repo: github.com/PratikDhanave/bodh

If you’re building multi-agent systems and want to compare design decisions — the bus implementation, the registry shape, the policy composition pattern — issues, PRs, and DMs welcome.


Bodh is a research and engineering reference. The MARA implementation described here is the architecture target for clinical AI; the codebase is not approved for clinical use.

#SoftwareArchitecture #MultiAgent #DistributedSystems #Go #Golang #MARA #SystemDesign #OpenSource #HealthTech