The 12-chapter reference architecture, in 102 Python files

Microsoft published a 12-chapter reference architecture for multi-agent systems and a separate framework (MAF) to build them. I implemented one on top of the other and learned what each chapter actually demands in code.

May 29, 2026 · Engineers shipping multi-agent systems to production

Multi-Agent · MAF · Architecture · Python

Microsoft published two things this year that look like they belong together but live in different places.

The first is the Multi-Agent Reference Architecture — a 12-chapter mdBook that says what an enterprise multi-agent system should contain: orchestrator + specialists, an agent registry, short- and long-term memory, request-based and message-driven communication, observability, evaluation, security, governance, context engineering, and a final reference topology.

The second is the Microsoft Agent Framework (MAF) — the Python and .NET SDKs that give you Agent, WorkflowBuilder, SequentialBuilder, ConcurrentBuilder, HandoffBuilder, and a swarm of optional integration packages (agent-framework-a2a, agent-framework-mem0, agent-framework-purview, agent-framework-lab-gaia, …).

The architecture is opinionated and language-agnostic. The framework is opinionated too, and specific to Python and .NET. The natural question is: if you take both seriously, what does the resulting code look like?

I spent a week answering that question. The result is PratikDhanave/microsofagenframework — a runnable Python implementation where every reference-architecture chapter maps to a concrete module, and every module that can be backed by an official MAF package is.

What's in the repo

- 102 files, 6,432 lines (mostly Python, plus Mermaid-heavy Markdown docs and a Grafana dashboard JSON).
- 48 passing tests that exercise everything the framework doesn't auto-instrument.
- One `make all` target that runs pre-flight checks, install, the Docker stack, tests, every workflow, and every eval, then prints the URLs of the observability stack.
- Default LLM: local Ollama (`granite4.1:3b`). No API key is needed to run the whole demo.

Chapter → code map

This is the table that drove the design:

| # | Chapter | Code | MAF package used |
|---|---|---|---|
| 1 | Introduction (Design Principles) | (all modules) | (none) |
| 2 | Building Blocks | `agents/` + `registry/` | `agent_framework.Agent` |
| 3 | Design Options | (project layout) | (none) |
| 4 | Agent Registry | `registry/` | project-local convention |
| 5 | Agents Communication | `communication/` | `agent_framework_a2a` |
| 6 | Memory (STM / LTM) | `memory/` | `AgentSession`, `MemoryContextProvider`, `MemoryFileStore`, `agent_framework_mem0` |
| 7 | Observability | `observability.py` | `agent_framework.observability.configure_otel_providers` |
| 8 | Evaluation | `evals/` | `agent_framework_lab_gaia` |
| 9 | Security | `security/` | `agent_framework_purview` |
| 10 | Governance | `governance/` | project-local + Agent Governance Toolkit |
| 11 | Context Engineering | `context/` | `agent_framework.ContextProvider` |
| 12 | Patterns | `workflows/` | the four MAF builders |

Two rows in that table are worth flagging:

Chapter 4 (Agent Registry) is project-local. MAF does not ship a built-in registry. Azure AI Foundry has a hosted-agent catalog (`AIProjectClient.agents.list()`), but it only sees Foundry-managed agents. If you want a registry that works across providers (Ollama, OpenAI, Foundry), you build one. The repo's `registry/` is four files: a `@register(...)` decorator plus an in-memory store, about a hundred lines in total.
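A minimal sketch of a registry with that shape, assuming each agent is built by a zero-argument factory function. `AgentCard`, `InMemoryRegistry`, `register`, and the capability strings are hypothetical names for illustration, not the repo's actual code and not a MAF API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class AgentCard:
    """Metadata kept per agent (hypothetical fields)."""
    name: str
    description: str
    capabilities: tuple = ()
    factory: Optional[Callable[[], object]] = field(default=None, compare=False)


class InMemoryRegistry:
    """Provider-agnostic store: it only knows names, capabilities, and factories."""

    def __init__(self) -> None:
        self._cards: Dict[str, AgentCard] = {}

    def add(self, card: AgentCard) -> None:
        if card.name in self._cards:
            raise ValueError(f"agent {card.name!r} already registered")
        self._cards[card.name] = card

    def get(self, name: str) -> AgentCard:
        return self._cards[name]

    def find(self, capability: str) -> List[AgentCard]:
        """Discovery by capability, so callers need not hard-code agent names."""
        return [c for c in self._cards.values() if capability in c.capabilities]

    def names(self) -> List[str]:
        return sorted(self._cards)


REGISTRY = InMemoryRegistry()


def register(name: str, description: str, capabilities=()):
    """Decorator that records an agent factory in the module-level registry."""
    def decorator(factory):
        REGISTRY.add(AgentCard(name, description, tuple(capabilities), factory))
        return factory  # the factory is returned unchanged and stays directly callable
    return decorator


@register("researcher", "Finds sources for a question", capabilities=["search"])
def make_researcher():
    return "researcher-agent"  # stand-in for constructing a real agent object


@register("writer", "Drafts the answer", capabilities=["write"])
def make_writer():
    return "writer-agent"


print(REGISTRY.names())                          # ['researcher', 'writer']
print([c.name for c in REGISTRY.find("search")])  # ['researcher']
```

Because the registry stores factories rather than live agents, the same mechanism works whether a factory builds an Ollama-, OpenAI-, or Foundry-backed agent.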

Chapter 10 (Governance) is also project-local — until you wire in the Agent Governance Toolkit. MAF doesn't have a lifecycle state machine or an RAI checklist either. The repo ships both, and then integrates agent-governance-toolkit for real OWASP-Agentic-Top-10-aligned policy enforcement. There's a separate post on that.
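To show what "a lifecycle state machine plus an RAI checklist" means in code, here is a minimal sketch in which promotion to production is blocked until every checklist item is signed off. The stages, allowed transitions, and checklist items are hypothetical placeholders, not the repo's actual policy and not part of MAF or the Agent Governance Toolkit.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    TESTING = "testing"
    STAGING = "staging"
    PRODUCTION = "production"
    RETIRED = "retired"


# Allowed transitions (hypothetical policy). Anything not listed is rejected.
TRANSITIONS = {
    Stage.DRAFT: {Stage.TESTING, Stage.RETIRED},
    Stage.TESTING: {Stage.DRAFT, Stage.STAGING, Stage.RETIRED},
    Stage.STAGING: {Stage.TESTING, Stage.PRODUCTION, Stage.RETIRED},
    Stage.PRODUCTION: {Stage.RETIRED},
    Stage.RETIRED: set(),
}

# Hypothetical RAI checklist: every item needs a sign-off before production.
RAI_CHECKLIST = ("harm_review", "pii_review", "eval_threshold_met", "human_escalation")


class AgentLifecycle:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.DRAFT
        self.signed_off = set()  # checklist items approved so far

    def sign_off(self, item: str) -> None:
        if item not in RAI_CHECKLIST:
            raise KeyError(f"unknown checklist item: {item}")
        self.signed_off.add(item)

    def missing_checks(self) -> list:
        return [item for item in RAI_CHECKLIST if item not in self.signed_off]

    def promote(self, target: Stage) -> None:
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage.value} -> {target.value} is not an allowed transition")
        # The governance gate: production requires a complete RAI checklist.
        if target is Stage.PRODUCTION and self.missing_checks():
            raise PermissionError(f"RAI checklist incomplete: {self.missing_checks()}")
        self.stage = target


writer = AgentLifecycle("writer")
writer.promote(Stage.TESTING)
writer.promote(Stage.STAGING)
try:
    writer.promote(Stage.PRODUCTION)
except PermissionError as err:
    print(err)  # RAI checklist incomplete: ['harm_review', 'pii_review', 'eval_threshold_met', 'human_escalation']
for item in RAI_CHECKLIST:
    writer.sign_off(item)
writer.promote(Stage.PRODUCTION)
print(writer.stage)  # Stage.PRODUCTION
```

Keeping transitions in a plain table makes the policy reviewable on its own; an external policy engine can then sit in front of `promote` without changing the state machine.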

The shape it ends up as

```mermaid
flowchart TB
    Client["👤 Client (CLI / DevUI / API)"]

    subgraph Orchestration["Orchestration · workflows/"]
        Sequential["SequentialBuilder"]
        Concurrent["ConcurrentBuilder"]
        Handoff["HandoffBuilder"]
        Custom["WorkflowBuilder (custom graph)"]
    end

    subgraph Agents["Agents · agents/"]
        Orch["Orchestrator"]
        Planner["Planner"]
        Researcher["Researcher"]
        Writer["Writer"]
        Critic["Critic"]
        Classifier["Classifier"]
    end

    Registry[("Registry · registry/")]
    Memory[("Memory · AgentSession + MemoryContextProvider")]
    Comm["Communication · A2A wrapper"]
    Sec["Security · Purview + RBAC"]
    Gov["Governance · Lifecycle + RAI + AGT"]
    Ctx["Context · Tool selectors"]
    Obs["Observability · OTel"]

    Client -->|prompt| Orchestration
    Orchestration -->|delegate| Agents
    Agents -->|discover| Registry
    Agents -->|state| Memory
    Agents -->|distributed| Comm
    Agents -->|policy| Sec
    Registry -.->|gate promotion| Gov
    Orchestration -->|assemble| Ctx
    Agents -.->|spans + metrics| Obs
```

Six agents. Four orchestration topologies. Seven cross-cutting modules. One CLI.
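The difference between the first two topologies in the diagram is mostly about data flow. The plain-`asyncio` sketch below shows that flow with hypothetical stub agents; it does not use MAF. The real `SequentialBuilder` and `ConcurrentBuilder` operate on chat messages rather than strings and add executors, events, and output aggregation that this omits.

```python
import asyncio


# Hypothetical stand-in agents: async functions from text to text.
async def planner(text: str) -> str:
    return f"plan({text})"


async def researcher(text: str) -> str:
    return f"research({text})"


async def writer(text: str) -> str:
    return f"draft({text})"


async def run_sequential(agents, prompt: str) -> str:
    """Pipeline: each agent receives the previous agent's output."""
    result = prompt
    for agent in agents:
        result = await agent(result)
    return result


async def run_concurrent(agents, prompt: str) -> list:
    """Fan-out/fan-in: every agent gets the same prompt; gather keeps argument order."""
    return list(await asyncio.gather(*(agent(prompt) for agent in agents)))


team = [planner, researcher, writer]
print(asyncio.run(run_sequential(team, "q")))  # draft(research(plan(q)))
print(asyncio.run(run_concurrent(team, "q")))  # ['plan(q)', 'research(q)', 'draft(q)']
```

Sequential suits work where each step refines the last; concurrent suits independent perspectives on the same input that are merged afterwards.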

What you actually run

```shell
git clone https://github.com/PratikDhanave/microsofagenframework
cd microsofagenframework
make all
```

`make all` runs eight steps in sequence:

1. Pre-flight: verify Ollama is up, and pull `granite4.1:3b` if needed.
2. Install: `pip install -e ".[dev]"` plus `agent-governance-toolkit[full]`.
3. Docker stack: boot the OpenTelemetry Collector, Jaeger, Prometheus, and Grafana.
4. Tests: 48 unit tests (about 2 seconds, no LLM needed).
5. Smoke: check that the registry is populated and run `custom_graph` end to end.
6. Workflows: run the sequential, concurrent, and `custom_graph` workflows against Ollama.
7. Evals: the tool-call eval and the multi-turn agent eval.
8. Print the URLs for Grafana, Prometheus, and Jaeger.
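The pre-flight step boils down to asking the local Ollama server which models it already has. A stdlib-only sketch of that check, assuming Ollama's default port 11434 and its `GET /api/tags` endpoint; `model_available` and `preflight` are hypothetical names, and the repo's Makefile may well do this differently (for example by calling the `ollama` CLI).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address
MODEL = "granite4.1:3b"


def model_available(tags_payload: dict, model: str) -> bool:
    """True if `model` is listed in the JSON body returned by GET /api/tags."""
    names = {m.get("name") for m in tags_payload.get("models", [])}
    # An untagged pull such as "llama3" is listed as "llama3:latest".
    return model in names or f"{model}:latest" in names


def preflight(base_url: str = OLLAMA_URL, model: str = MODEL, timeout: float = 3.0) -> str:
    """Return a one-line status string rather than raising."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError as exc:  # URLError, refused connections, and timeouts are OSError subclasses
        return f"ollama not reachable at {base_url}: {exc}"
    except ValueError as exc:  # the body was not valid JSON
        return f"unexpected response from {base_url}: {exc}"
    if model_available(payload, model):
        return f"ok: {model} is available"
    return f"missing: run `ollama pull {model}`"


if __name__ == "__main__":
    print(preflight())
```

Separating the pure `model_available` check from the network call keeps the logic unit-testable without a running Ollama, in the same spirit as the LLM-free unit tests in step 4.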

The whole thing runs in about 90 seconds on my laptop after the first install. The model is small enough to respond in real time; what's being demonstrated is the framework integration, not a benchmark score.

What you don't get

This is a reference implementation, not a product. Specifically:

- The Researcher agent's web-search tool is a stub that returns fabricated results. The article on tool design argues that the boundary between "test the agent" and "test the tools" is a clean place to draw a line, so the tool is mocked, and what the evals score is the agent's decision about whether to use it.
- The Handoff pattern works on OpenAI and Foundry but currently breaks on Ollama because of an upstream bug in `agent-framework-ollama`: it forwards an `allow_multiple_tool_calls` kwarg that the `ollama` package has never supported. Sequential, concurrent, and custom-graph all work on Ollama; the docs mark handoff as not yet working there.
- The Purview security tier is only useful if you have an Azure tenant with a Purview entitlement. A regex-based PII redaction plus audit log ships as the no-cloud fallback, so the project keeps working in dev and CI.
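A no-cloud fallback of that kind can be as small as a few regexes plus an append-only log. A minimal sketch, assuming the audit log is a plain list of dicts; the patterns, labels, and `redact` signature are hypothetical, not the repo's code and not Purview's API.

```python
import re
from datetime import datetime, timezone

# Deliberately simple patterns, applied in this order. Regexes miss a lot of real PII
# (names, addresses), which is why this is only a dev/CI fallback.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # runs before PHONE, which would also match it
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str, audit_log: list, agent: str = "unknown") -> str:
    """Replace PII matches with [LABEL] placeholders and append one audit record per call."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "redactions": counts,  # counts only: the audit log must never store the PII itself
    })
    return text


log = []
print(redact("Mail jane.doe@example.com or call +1 (555) 123-4567, SSN 123-45-6789.", log, agent="writer"))
# Mail [EMAIL] or call [PHONE], SSN [SSN].
print(log[-1]["redactions"])  # {'EMAIL': 1, 'SSN': 1, 'PHONE': 1}
```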

Why bother

Reading 12 chapters of a reference architecture in the abstract is not the same as having a Python module to read. You can see the module, run it, change one line, and watch what breaks. You can compare what the architecture asked for to what MAF gives you and see — concretely — which chapters MAF has thought hardest about (memory, observability, A2A) and which ones it leaves to you (registry, lifecycle, RAI).

The next nine posts in this series go through that map module by module. The first of them covers the four MAF orchestration patterns and when each one is the right shape.

If you want to skip ahead, the repo's `docs/chapters/SUMMARY.md` mirrors the table above, with one Markdown doc per chapter, each with its own Mermaid diagrams.