Multi-turn evals from first principles
Single-turn evals check one decision. Multi-turn evals check the whole trajectory. Here
Posts about python. ← All posts
Single-turn evals check one decision. Multi-turn evals check the whole trajectory. Here
AgentSession is short-term memory. MemoryContextProvider + MemoryFileStore is long-term memory. Mem0 is long-term memory when you want it hosted. Here
The Microsoft Agent Framework deliberately does not ship an agent registry. Here
Sequential, Concurrent, Handoff, and Custom WorkflowBuilder. Four shapes the Microsoft Agent Framework ships out of the box. Each one is the right answer to a different question.
Microsoft published a 12-chapter reference architecture for multi-agent systems and a separate framework (MAF) to build them. I implemented one on top of the other and learned what each chapter actually demands in code.
We built a small Go + Python service that parses a project's INFORMATION_SCHEMA, asks Gemini to classify each top-spending query against a catalog of anti-patterns, and recommends a rewrite. It is not a magic box; it is a pipeline that cuts the human review time per query from 20 minutes to 90 seconds.