The 12-chapter reference architecture, in 102 Python files
Microsoft published a 12-chapter reference architecture for multi-agent systems and a separate framework (MAF) to build them. I implemented one on top of the other and learned what each chapter actually demands in code.
Microsoft published two things this year that look like they belong together but live in different places.
The first is the Multi-Agent Reference Architecture — a 12-chapter mdBook that says what an enterprise multi-agent system should contain: orchestrator + specialists, an agent registry, short- and long-term memory, request-based and message-driven communication, observability, evaluation, security, governance, context engineering, and a final reference topology.
The second is the Microsoft Agent Framework (MAF) — the Python and .NET SDKs that give you Agent, WorkflowBuilder, SequentialBuilder, ConcurrentBuilder, HandoffBuilder, and a swarm of optional integration packages (agent-framework-a2a, agent-framework-mem0, agent-framework-purview, agent-framework-lab-gaia, …).
The architecture is opinionated but language-agnostic. The framework is opinionated and specific to .NET and Python. The natural question is: if you take both seriously, what does the resulting code look like?
I spent a week answering that question. The result is PratikDhanave/microsofagenframework — a runnable Python implementation where every reference-architecture chapter maps to a concrete module, and every module that can be backed by an official MAF package is.
What's in the repo
- 102 files, 6,432 lines (mostly Python, with mermaid-heavy markdown docs and a Grafana dashboard JSON).
- 48 passing tests that exercise everything the framework doesn't auto-instrument.
- One make all target that runs pre-flight, install, docker stack, tests, every workflow, and every eval, then prints the URLs of the observability stack.
- Default LLM: local Ollama (granite4.1:3b). No API key needed to run the whole demo.
Chapter → code map
This is the table that drove the design:
| # | Chapter | Code | MAF package used |
|---|---|---|---|
| 1 | Introduction (Design Principles) | (all modules) | — |
| 2 | Building Blocks | agents/ + registry/ | agent_framework.Agent |
| 3 | Design Options | (project layout) | — |
| 4 | Agent Registry | registry/ | project-local convention |
| 5 | Agents Communication | communication/ | agent_framework_a2a |
| 6 | Memory (STM / LTM) | memory/ | AgentSession, MemoryContextProvider, MemoryFileStore, agent_framework_mem0 |
| 7 | Observability | observability.py | agent_framework.observability.configure_otel_providers |
| 8 | Evaluation | evals/ | agent_framework_lab_gaia |
| 9 | Security | security/ | agent_framework_purview |
| 10 | Governance | governance/ | project-local + Agent Governance Toolkit |
| 11 | Context Engineering | context/ | agent_framework.ContextProvider |
| 12 | Patterns | workflows/ | the four MAF builders |
Two rows in that table are worth flagging:
Chapter 4 (Agent Registry) is project-local. MAF does not ship a built-in registry. Azure AI Foundry has a hosted-agent catalog (AIProjectClient.agents.list()) but it only sees Foundry-managed agents. If you want a registry that works across providers (Ollama, OpenAI, Foundry) you build one. The repo's registry/ is a 4-file @register(...) decorator + an in-memory store. It's a hundred lines.
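To make the decorator-plus-store idea concrete, here is a minimal sketch of what such a registry can look like. It is not the repo's code: AgentEntry, find, the field names, and the placeholder factories are all hypothetical, and only the @register(...) decorator and the in-memory store come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AgentEntry:
    """Metadata for one registered agent (hypothetical shape)."""
    name: str
    provider: str                  # e.g. "ollama", "openai", "foundry"
    capabilities: tuple            # free-form tags used for discovery
    factory: Callable[[], object]  # builds the agent on demand


# The in-memory store: agent name -> entry.
_REGISTRY: Dict[str, AgentEntry] = {}


def register(name: str, provider: str, capabilities: tuple = ()):
    """Decorator that records a factory function in the in-memory store."""
    def decorator(factory):
        if name in _REGISTRY:
            raise ValueError(f"agent {name!r} is already registered")
        _REGISTRY[name] = AgentEntry(name, provider, tuple(capabilities), factory)
        return factory
    return decorator


def find(capability: str) -> List[AgentEntry]:
    """Return every registered agent that advertises the capability."""
    return [e for e in _REGISTRY.values() if capability in e.capabilities]


@register("researcher", provider="ollama", capabilities=("search", "summarize"))
def make_researcher():
    return {"role": "researcher"}  # placeholder for a real agent object


@register("writer", provider="openai", capabilities=("summarize",))
def make_writer():
    return {"role": "writer"}  # placeholder for a real agent object
```

Because the store only holds factories and metadata, discovery stays provider-neutral: find("summarize") returns both entries regardless of which backend each one would instantiate.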
Chapter 10 (Governance) is also project-local — until you wire in the Agent Governance Toolkit. MAF doesn't have a lifecycle state machine or an RAI checklist either. The repo ships both, and then integrates agent-governance-toolkit for real OWASP-Agentic-Top-10-aligned policy enforcement. There's a separate post on that.
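A lifecycle state machine gated by an RAI checklist can be quite small. The sketch below is an illustration under assumptions, not the repo's governance/ module: the stage names, the transition table, and the three checklist items are all invented for the example.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PRODUCTION = "production"
    RETIRED = "retired"


# Allowed transitions (hypothetical policy).
ALLOWED = {
    Stage.DRAFT: {Stage.REVIEW, Stage.RETIRED},
    Stage.REVIEW: {Stage.DRAFT, Stage.PRODUCTION, Stage.RETIRED},
    Stage.PRODUCTION: {Stage.RETIRED},
    Stage.RETIRED: set(),
}

# RAI checklist items that must be ticked before production (hypothetical).
RAI_CHECKLIST = ("pii_reviewed", "eval_passed", "owner_assigned")


def promote(current: Stage, target: Stage, checklist: dict) -> Stage:
    """Return the new stage, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target is Stage.PRODUCTION:
        missing = [item for item in RAI_CHECKLIST if not checklist.get(item)]
        if missing:
            raise PermissionError(f"RAI checklist incomplete: {missing}")
    return target
```

Keeping the transition table and the checklist as plain data means a policy change is a one-line diff, which is also the kind of thing a dedicated policy engine can later take over.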
The shape it ends up as
flowchart TB
Client["👤 Client (CLI / DevUI / API)"]
subgraph Orchestration["Orchestration · workflows/"]
Sequential["SequentialBuilder"]
Concurrent["ConcurrentBuilder"]
Handoff["HandoffBuilder"]
Custom["WorkflowBuilder (custom graph)"]
end
subgraph Agents["Agents · agents/"]
Orch["Orchestrator"]
Planner["Planner"]
Researcher["Researcher"]
Writer["Writer"]
Critic["Critic"]
Classifier["Classifier"]
end
Registry[("Registry · registry/")]
Memory[("Memory · AgentSession + MemoryContextProvider")]
Comm["Communication · A2A wrapper"]
Sec["Security · Purview + RBAC"]
Gov["Governance · Lifecycle + RAI + AGT"]
Ctx["Context · Tool selectors"]
Obs["Observability · OTel"]
Client -->|prompt| Orchestration
Orchestration -->|delegate| Agents
Agents -->|discover| Registry
Agents -->|state| Memory
Agents -->|distributed| Comm
Agents -->|policy| Sec
Registry -.->|gate promotion| Gov
Orchestration -->|assemble| Ctx
Agents -.->|spans + metrics| Obs
Six agents. Four orchestration topologies. Seven cross-cutting modules. One CLI.
What you actually run
git clone https://github.com/PratikDhanave/microsofagenframework
cd microsofagenframework
make all
make all runs eight sequenced steps:
- Pre-flight: verify Ollama is up, pull granite4.1:3b if needed.
- Install: pip install -e ".[dev]" + agent-governance-toolkit[full].
- Boot OpenTelemetry collector + Jaeger + Prometheus + Grafana.
- 48 unit tests (~2 seconds, no LLM needed).
- Smoke: registry populated + custom_graph end-to-end.
- Sequential + Concurrent + Custom_graph workflows against Ollama.
- Tool-call eval + multi-turn agent eval.
- URLs for Grafana / Prometheus / Jaeger.
The whole thing runs in about 90 seconds on my laptop after the first install. The model is small enough to respond in real time on local hardware; the point is to demonstrate the framework integration, not to post a benchmark score.
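For a sense of what a tool-call eval measures, here is a rough sketch that scores only the decision to call a tool, never the tool's output. Everything in it is hypothetical: StubSearchTool, EvalCase, score, and the toy fake_agent policy stand in for the repo's real eval harness and an LLM-backed agent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubSearchTool:
    """Stand-in web-search tool: returns canned text and logs each call."""
    calls: List[str] = field(default_factory=list)

    def __call__(self, query: str) -> str:
        self.calls.append(query)
        return f"[stub result for: {query}]"


@dataclass
class EvalCase:
    prompt: str
    should_call_tool: bool


def score(cases, run_agent) -> float:
    """Fraction of cases where the tool-use decision matched the expectation."""
    hits = 0
    for case in cases:
        tool = StubSearchTool()  # fresh call log per case
        run_agent(case.prompt, tool)
        if bool(tool.calls) == case.should_call_tool:
            hits += 1
    return hits / len(cases)


def fake_agent(prompt: str, tool: StubSearchTool) -> str:
    # Toy policy standing in for a real agent: search only when the
    # prompt mentions "latest".
    if "latest" in prompt:
        tool(prompt)
    return "answer"


cases = [
    EvalCase("What is the latest MAF release?", True),
    EvalCase("What is 2 + 2?", False),
    EvalCase("Summarize this paragraph.", False),
    EvalCase("Find recent news on agents.", True),  # fake_agent misses this one
]

print(score(cases, fake_agent))  # 3 of 4 decisions match -> 0.75
```

Since the stub never touches the network, an eval shaped like this stays deterministic on the tool side and isolates the one thing the model controls: whether it reached for the tool.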
What you don't get
This is a reference implementation, not a product. Specifically:
- The Researcher agent's web-search tool is a stub that returns fabricated results. The article on tool design says the boundary between "test the agent" and "test the tools" is a clean place to draw a line, so the tool is mocked and the agent's decision about whether to use it is what the evals score.
- The Handoff pattern works on OpenAI and Foundry but currently breaks on Ollama because of an upstream bug in agent-framework-ollama, which forwards an allow_multiple_tool_calls kwarg that the ollama package never supported. Sequential, concurrent, and custom-graph all work on Ollama; handoff is documented as unsupported there.
- The Purview security tier is only useful if you have an Azure tenant with Purview entitlement. A regex-based PII redaction + audit log is shipped as the no-cloud fallback so the project keeps working in dev/CI.
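The no-cloud fallback in the last bullet is easy to picture. This is a minimal sketch of regex-based redaction with an audit trail, not the repo's security/ code: the three patterns, the redact function, and the audit record shape are assumptions, and patterns this simple will miss plenty of real-world PII.

```python
import re
from datetime import datetime, timezone

# Illustrative patterns only (US-style formats); a real deployment needs more.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str, audit_log: list) -> str:
    """Replace matches with a [LABEL] token and record what was redacted."""
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            # Log the kind and count only, never the raw value.
            audit_log.append({
                "kind": label,
                "count": count,
                "at": datetime.now(timezone.utc).isoformat(),
            })
    return text


log = []
clean = redact("Mail jane.doe@example.com or call 555-123-4567; SSN 123-45-6789.", log)
print(clean)  # Mail [EMAIL] or call [PHONE]; SSN [SSN].
```

Recording counts rather than matched values keeps the audit log itself free of the data it is meant to protect.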
Why bother
Reading 12 chapters of a reference architecture in the abstract is not the same as having a Python module to read. You can see the module, run it, change one line, and watch what breaks. You can compare what the architecture asked for to what MAF gives you and see — concretely — which chapters MAF has thought hardest about (memory, observability, A2A) and which ones it leaves to you (registry, lifecycle, RAI).
The next nine posts in this series go through that map module by module. The next one is about the four MAF orchestration patterns and when each one is the right shape.
If you want to skip ahead, the repo's docs/chapters/SUMMARY.md is the table above with one markdown doc per chapter, each with its own mermaid diagrams.