
Five MAF orchestration shapes — adding Group Chat and Magentic

The first ten posts treated MAF as having four orchestration patterns. The official docs say five. Here are the two I missed, Group Chat and Magentic, with the API surface, when to pick each, and the offline tests that catch misconfiguration at build time.

The earlier posts in this series covered four orchestration patterns: Sequential, Concurrent, Handoff, and Custom Graph. That's the set the project shipped with — and it's what most multi-agent diagrams show.

Then I read the 804-page official Microsoft Agent Framework PDF cover to cover and discovered that the docs describe five orchestrations, not four. The two I'd missed are Group Chat and Magentic. This post adds them, with offline tests, and explains when each one is the right call.

The full set

| Pattern | Topology | When to pick |
| --- | --- | --- |
| Sequential | linear | Refinement pipeline: each step builds on the last |
| Concurrent | fan-out / fan-in | Independent perspectives in parallel |
| Handoff | mesh | Agents transfer control to each other (no central orchestrator) |
| Group Chat | star | Multiple agents take turns; a manager (or selector) picks who speaks |
| Magentic | star + planning | Open-ended task; manager maintains a task ledger and progress ledger |
| Custom graph | arbitrary DAG | When none of the above fit: loops, conditionals, sub-workflows |

The first four are linear, fan-out, mesh, and star variations on the same idea. Magentic is qualitatively different: it adds planning to the orchestrator's job, with a design grounded in the Magentic-One paper and three independent loop caps (rounds, stalls, resets).

Group Chat

agent_framework.orchestrations.GroupChatBuilder takes:

  • participants — the agents in the room
  • exactly one of: orchestrator_agent, orchestrator, or selection_func
  • max_rounds — hard cap on iteration

The orchestrator picks who speaks next each round. For testing and simple cases, a deterministic selection_func is cheaper than spinning up a manager LLM:

def _round_robin_selector(participant_names: list[str]):
    """Build a GroupChatSelectionFunction that alternates speakers."""
    index = {"value": 0}

    def select(state) -> str:
        name = participant_names[index["value"] % len(participant_names)]
        index["value"] += 1
        return name

    return select


def build_group_chat_workflow(client=None, *, max_rounds: int = 4):
    client = client or build_chat_client()
    writer = make_writer(client, require_per_service_call_history_persistence=True)
    critic = make_critic(client, require_per_service_call_history_persistence=True)
    return GroupChatBuilder(
        participants=[writer, critic],
        selection_func=_round_robin_selector(["writer", "critic"]),
        max_rounds=max_rounds,
        output_from="all",
    ).build()

The whole thing — including OTel wrapper and metrics — is in workflows/group_chat.py.
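To see what the selector does without building a workflow, the closure can be exercised on its own. This is a minimal sketch: `round_robin_selector` mirrors `_round_robin_selector` above, and passing `None` as the state is an assumption made for illustration (the selector ignores its state argument, whatever the builder passes in).

```python
from typing import Any, Callable


def round_robin_selector(participant_names: list[str]) -> Callable[[Any], str]:
    """Return a selector that cycles through the names in order."""
    calls = 0

    def select(state: Any) -> str:
        nonlocal calls
        name = participant_names[calls % len(participant_names)]
        calls += 1
        return name

    return select


select = round_robin_selector(["writer", "critic"])
# The state argument is ignored, so None stands in for it here.
speakers = [select(None) for _ in range(4)]
print(speakers)
```

With `max_rounds=4`, that means two writer turns and two critic turns, in strict alternation.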

When to pick Group Chat over Handoff

The line is subtle. From the docs:

| Group Chat | Handoff |
| --- | --- |
| Star topology with a manager picking speakers | Mesh topology with agents transferring control to each other |
| Iterative refinement (writer ↔ reviewer rounds) | Routing the conversation to the right specialist |
| Manager owns the orchestration | Each agent owns its handoff decision |
| Shared context: all agents see history | Full context handed off; receiver owns the task |

A writer-critic loop is Group Chat. A customer-support triage that routes to refund/order/return agents is Handoff.
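The difference in who owns the next-speaker decision can be shown with a toy simulation. This is a sketch only: `run_star`, `run_mesh`, and the plain-function "agents" are hypothetical stand-ins, not MAF types, and no LLM is involved.

```python
from typing import Callable, Optional


def run_star(manager_pick: Callable[[list[str]], Optional[str]], max_rounds: int) -> list[str]:
    """Group Chat shape: a central manager chooses every speaker."""
    history: list[str] = []
    for _ in range(max_rounds):
        speaker = manager_pick(history)
        if speaker is None:  # manager decides the conversation is done
            break
        history.append(speaker)
    return history


def run_mesh(agents: dict[str, Callable[[], Optional[str]]], start: str, max_hops: int) -> list[str]:
    """Handoff shape: the current agent names its own successor."""
    history: list[str] = []
    current: Optional[str] = start
    for _ in range(max_hops):
        if current is None:  # the last agent kept the task
            break
        history.append(current)
        current = agents[current]()  # agent returns the next owner, or None
    return history


def alternate(history: list[str]) -> Optional[str]:
    """Manager policy: writer on even turns, critic on odd turns."""
    return "writer" if len(history) % 2 == 0 else "critic"


star_trace = run_star(alternate, max_rounds=4)

# Hypothetical support desk: triage hands off to refund, which finishes the job.
support = {"triage": lambda: "refund", "refund": lambda: None}
mesh_trace = run_mesh(support, start="triage", max_hops=5)
print(star_trace, mesh_trace)
```

In the star loop the routing logic lives in one place (`alternate`); in the mesh loop it is spread across the agents themselves.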

Magentic

MagenticBuilder is the heaviest pattern. The manager maintains two ledgers and adapts in real time:

  • Task ledger — facts, plan, educated guesses (updated on initial planning and on replans)
  • Progress ledger — is the task complete? are we looping? next speaker? (updated every round)
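As a mental model, the two ledgers can be pictured as two small records with different update cadences. The dataclasses below are an illustrative sketch, not the framework's own types: the class names are hypothetical, the task-ledger fields follow the bullet above, and the progress-ledger field names are borrowed from the attributes read in the event-handling snippet later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class TaskLedger:
    """Rewritten on initial planning and again on each replan."""
    facts: list[str] = field(default_factory=list)
    educated_guesses: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)


@dataclass
class ProgressLedger:
    """Refreshed every round."""
    is_request_satisfied: bool = False  # is the task complete?
    is_in_loop: bool = False            # are we looping?
    next_speaker: str = ""              # who goes next?


task = TaskLedger(
    facts=["Audience is beginners"],
    plan=["researcher gathers topics", "writer drafts", "critic reviews"],
)
progress = ProgressLedger(next_speaker="researcher")
print(task, progress)
```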

The Python signature with all the knobs:

def build_magentic_workflow(
    client=None,
    *,
    max_round_count: int = 6,        # total rounds before forced termination
    max_stall_count: int = 3,        # consecutive stalls before manager replans
    max_reset_count: int = 2,        # full plan resets before final termination
    enable_plan_review: bool = False, # HITL signoff on initial plan + replans
) -> Workflow:
    client = client or build_chat_client()  # same default as the Group Chat builder
    manager = _make_manager(client)
    researcher = make_researcher(client, require_per_service_call_history_persistence=True)
    writer = make_writer(client, require_per_service_call_history_persistence=True)
    critic = make_critic(client, require_per_service_call_history_persistence=True)
    return MagenticBuilder(
        participants=[researcher, writer, critic],
        manager_agent=manager,
        max_round_count=max_round_count,
        max_stall_count=max_stall_count,
        max_reset_count=max_reset_count,
        enable_plan_review=enable_plan_review,
        output_from="all",
    ).build()

Why three independent loop caps? Each protects against a different failure mode:

| Cap | What it prevents |
| --- | --- |
| max_round_count | Runaway iteration when no agent signals completion |
| max_stall_count | The manager wastes rounds asking the same question; hitting the cap triggers a replan |
| max_reset_count | The plan itself is unsolvable; give up after N full plan rewrites |

Without all three, Magentic can loop indefinitely on tasks it shouldn't try in the first place.
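One way to picture how the three caps interact is a small counter loop. This is a sketch under stated assumptions, not the MAF implementation: `run_with_caps` is a hypothetical helper, each round is reduced to a single "made progress" flag, and the exact comparisons (`>=` versus `>`) and whether a reset clears the round counter are guesses.

```python
from typing import Iterable


def run_with_caps(
    made_progress: Iterable[bool],
    *,
    max_round_count: int = 6,
    max_stall_count: int = 3,
    max_reset_count: int = 2,
) -> str:
    """Return why the loop stopped, given one progress flag per round."""
    rounds = stalls = resets = 0
    for progressed in made_progress:
        if rounds >= max_round_count:
            return "round cap"      # total rounds exhausted
        rounds += 1
        if progressed:
            stalls = 0              # any progress clears the stall streak
            continue
        stalls += 1
        if stalls >= max_stall_count:
            if resets >= max_reset_count:
                return "reset cap"  # replanning has not helped; give up
            resets += 1             # replan and start a fresh stall streak
            stalls = 0
    return "input exhausted"


always_stalled = [False] * 50
print(run_with_caps(always_stalled))
print(run_with_caps(always_stalled, max_round_count=100))
```

With the defaults, a task that never progresses replans twice and then runs into the round cap; raise the round cap and the reset cap becomes the one that ends the run.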

When to pick Magentic

The official docs put it bluntly:

"Magentic orchestration has the same architecture as the Group Chat orchestration pattern, with a very powerful manager that uses planning to coordinate agent collaboration. If your scenario requires simpler coordination without complex planning, consider using the Group Chat pattern instead."

In other words: try Group Chat first. Reach for Magentic when the task has open structure (no fixed pipeline), requires research + computation across multiple specialists, and the manager genuinely needs to plan.

An example task, condensed from the sample used in the official Magentic documentation: "Prepare a report comparing energy efficiency of ResNet-50, BERT-base, and GPT-2 on Azure Standard_NC6s_v3 VMs, including CO₂ estimates for 24-hour training, with tables and a final recommendation per task type." That's the kind of task where a manager picking the next speaker every round buys you something a fixed pipeline can't.

The three orchestrator events Magentic emits

Worth knowing because they're observable from your event-handling code:

async for ev in workflow_run.watch_stream_async():
    if isinstance(ev, MagenticPlanCreatedEvent):
        print(f"[plan] {ev.full_task_ledger.text}")
    elif isinstance(ev, MagenticReplannedEvent):
        print(f"[replan] {ev.full_task_ledger.text}")
    elif isinstance(ev, MagenticProgressLedgerUpdatedEvent):
        ledger = ev.progress_ledger
        print(f"[progress] complete={ledger.is_request_satisfied}, "
              f"loop={ledger.is_in_loop}, next={ledger.next_speaker}")

These give you a live view into the manager's state machine. Wire them into your observability dashboard alongside the standard workflow.* spans.
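For a dashboard, a handful of counters is often more useful than raw prints. The sketch below tallies events into metrics. The three dataclasses are stand-ins invented for illustration so the example runs without MAF installed; in real code the `isinstance` checks would target the framework's event classes shown above, and `summarize` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable


# Stand-in event types for illustration only; the real classes come from MAF.
@dataclass
class PlanCreated:
    text: str


@dataclass
class Replanned:
    text: str


@dataclass
class ProgressUpdated:
    is_request_satisfied: bool
    is_in_loop: bool
    next_speaker: str


def summarize(events: Iterable[object]) -> dict[str, int]:
    """Tally plans, replans, rounds, and loop warnings from an event stream."""
    counts = {"plans": 0, "replans": 0, "rounds": 0, "loop_flags": 0}
    for ev in events:
        if isinstance(ev, PlanCreated):
            counts["plans"] += 1
        elif isinstance(ev, Replanned):
            counts["replans"] += 1
        elif isinstance(ev, ProgressUpdated):
            counts["rounds"] += 1
            if ev.is_in_loop:
                counts["loop_flags"] += 1
    return counts


sample = [
    PlanCreated("initial plan"),
    ProgressUpdated(False, False, "researcher"),
    ProgressUpdated(False, True, "writer"),
    Replanned("revised plan"),
    ProgressUpdated(True, False, "critic"),
]
metrics = summarize(sample)
print(metrics)
```

A rising `replans` or `loop_flags` count is an early sign that the task is a poor fit for the model or the participant set.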

Testing without burning tokens

Both workflows are built offline without contacting an LLM. The MAF builders just wire participants and config — the LLM is only called when you await workflow.run(prompt). That means we can verify the build path with unit tests:

# tests/test_group_chat_workflow.py
import pytest

def test_group_chat_workflow_builds_offline() -> None:
    workflow = build_group_chat_workflow(client=_DummyClient(), max_rounds=2)
    assert workflow is not None


@pytest.mark.parametrize("max_rounds", [1, 2, 4, 8])
def test_group_chat_workflow_respects_max_rounds(max_rounds: int) -> None:
    workflow = build_group_chat_workflow(client=_DummyClient(), max_rounds=max_rounds)
    assert workflow is not None

# tests/test_magentic_workflow.py
import pytest


@pytest.mark.parametrize("rounds,stalls,resets", [(2, 1, 1), (4, 2, 1), (6, 3, 2), (10, 5, 3)])
def test_magentic_workflow_loop_caps(rounds, stalls, resets) -> None:
    workflow = build_magentic_workflow(
        client=_DummyClient(),
        max_round_count=rounds,
        max_stall_count=stalls,
        max_reset_count=resets,
    )
    assert workflow is not None

These tests run in milliseconds in CI. They don't catch runtime bugs in the orchestrations, but they catch the most common kind of failure: I've passed the wrong kwarg, or the participants list is empty, or the build is missing an orchestrator. (Group Chat requires one of orchestrator_agent / orchestrator / selection_func — forget to pass one and the builder raises ValueError. The test catches it.)
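The "exactly one of" rule is easy to reason about in isolation. The helper below is a stand-in written for this post, not the builder's actual validation code: `require_exactly_one` is a hypothetical name, and the error message format is made up. It only illustrates the kind of check a build-time test exercises.

```python
def require_exactly_one(**options: object) -> str:
    """Return the name of the single option that was supplied, else raise."""
    supplied = [name for name, value in options.items() if value is not None]
    if len(supplied) != 1:
        raise ValueError(
            f"expected exactly one of {sorted(options)}, got {sorted(supplied) or 'none'}"
        )
    return supplied[0]


# One orchestrator option supplied: accepted.
chosen = require_exactly_one(
    orchestrator_agent=None,
    orchestrator=None,
    selection_func=lambda state: "writer",
)

# None supplied: rejected before any model call could happen.
try:
    require_exactly_one(orchestrator_agent=None, orchestrator=None, selection_func=None)
    error = ""
except ValueError as exc:
    error = str(exc)

print(chosen, "|", error)
```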

The full suite is now 83 tests, all offline, and runs in about 2 seconds. It covers the four original patterns, the two new ones, and the content / structured / approval helpers.

Running them

# Group Chat — writer ↔ critic round-robin, max 4 rounds
make group_chat PROMPT="Draft a one-paragraph product launch announcement."

# Magentic — planning manager + researcher + writer + critic
make magentic PROMPT="Plan a one-week curriculum for a beginner Python class."

Both honor the same MODEL= override the other workflows do. The Ollama default (granite4.1:3b) handles Group Chat fine. Magentic genuinely needs a stronger model — try qwen3.5:latest if you have it pulled, or point OLLAMA_MODEL= at a 7B+ model.

What this changes about the project

The orchestration patterns table in README.md now lists six rows (five MAF patterns + custom graph) instead of four. Two new modules — workflows/group_chat.py and workflows/magentic.py — each under 150 lines. Twelve new tests. Two new Makefile targets.

One piece of advice that reading the PDF surfaced, and that writing this post hammered home: try Group Chat before Magentic. Most of the time the simpler manager is enough, and the three loop caps in Magentic exist because the manager can get stuck. Save Magentic for the open-ended tasks that actually justify a planning loop.