Memory done right with MAF
AgentSession is short-term memory. MemoryContextProvider + MemoryFileStore is long-term memory. Mem0 is long-term memory when you want it hosted. Here's the shape — and the one boundary the framework gets right that your custom memory layer probably doesn't.
I built memory for this project twice. The first time I wrote a Memory ABC, an InMemorySTM class with a dict-per-namespace, and a FileLTM class that wrote newline-delimited JSON. Two hundred lines of code, two interfaces, full test coverage.
Then I read the Microsoft Agent Framework (MAF) source and deleted all of it.
MAF ships exactly the primitives the reference architecture's Chapter 6 asks for — and the boundary it draws is the right one. This post is about what those primitives are, the shape the project ends up as, and why my first attempt was a regression.
The boundary that matters
The architecture distinguishes Short-Term Memory (STM) and Long-Term Memory (LTM):
- STM holds the active conversation. It's per-session, scoped, evictable.
- LTM holds durable facts, preferences, decisions. It's per-user (or per-tenant), persistent, queried into the prompt.
The mistake my first attempt made was treating these as the same interface — both Memory ABCs with read/write/list/delete. They're not the same. STM is state the framework manages on every turn. LTM is context the framework injects on every prompt. The verbs are different.
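To make "the verbs are different" concrete, here is a minimal sketch of the two shapes as Python protocols. The class and method names (`SessionState`, `PromptContext`, `append_turn`, `recent_turns`) are my own illustration, not MAF's API; only `before_run` and `after_run` echo hook names used later in this post.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SessionState(Protocol):
    """STM verbs (hypothetical names): append a turn, replay recent turns."""

    def append_turn(self, role: str, text: str) -> None: ...

    def recent_turns(self, limit: int) -> list[tuple[str, str]]: ...


@runtime_checkable
class PromptContext(Protocol):
    """LTM verbs (hypothetical signatures): inject before the call, extract after."""

    def before_run(self, prompt: str) -> list[str]: ...

    def after_run(self, transcript: list[tuple[str, str]]) -> None: ...
```

Nothing in one protocol is useful to a caller of the other, which is the sign that a shared read/write/list/delete interface was the wrong abstraction.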
MAF reflects this:
from agent_framework import (
    AgentSession,           # STM
    MemoryContextProvider,  # LTM
    MemoryFileStore,        # LTM backing store
)
from agent_framework_mem0 import Mem0ContextProvider  # LTM, hosted
AgentSession is a thin handle the agent reads through. MemoryContextProvider is a middleware-style hook the agent calls before and after every run. Two interfaces, two concerns.
STM: AgentSession
The project's memory/factory.py wraps AgentSession in a new_session() helper. That's all it is:
from agent_framework import AgentSession


def new_session(session_id: str | None = None) -> AgentSession:
    return AgentSession(session_id=session_id)
Then you pass it to agent.run():
session = new_session("chat-2026-05-29")
response = await agent.run("What did we decide last time?", session=session)
The agent's configured HistoryProvider (MAF supplies an InMemoryHistoryProvider by default) writes the turn into the session and reads recent turns back on the next call. You never touch the storage layer directly. You hand the framework a session id; the framework does the rest.
The thing I had to un-learn: I wanted to manage messages myself. MAF won't let you, and it shouldn't. The conversation contract — "what messages did this agent see?" — is something the framework owns end-to-end so it can also own tracing, token counting, and compaction. Take the abstraction.
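If it helps to picture what "the framework does the rest" means, here is a toy stand-in for a session-keyed history. It is not MAF's `InMemoryHistoryProvider`; `ToyHistory`, `max_turns`, and the `model` callable are hypothetical names I made up to show the contract: the caller supplies only a session id and a prompt, and the history is read, extended, and trimmed for them.

```python
from collections import defaultdict
from typing import Callable

Message = tuple[str, str]  # (role, text)


class ToyHistory:
    """Toy per-session history. A stand-in for illustration, not MAF code."""

    def __init__(self, max_turns: int = 20) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self._turns: dict[str, list[Message]] = defaultdict(list)
        self._max_turns = max_turns

    def run(self, session_id: str, prompt: str,
            model: Callable[[list[Message]], str]) -> str:
        # Read prior turns for this session and add the new user message.
        messages = self._turns[session_id] + [("user", prompt)]
        reply = model(messages)
        # Persist both sides of the turn, then evict the oldest entries.
        updated = messages + [("assistant", reply)]
        self._turns[session_id] = updated[-self._max_turns:]
        return reply
```

Two calls with the same session id see each other's turns; a different session id starts from an empty history.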
LTM: MemoryContextProvider over a store
LTM is the part the agent queries into the prompt. You wire it via context_providers=[...]:
from agent_framework import Agent

from multi_agent.memory import make_memory_context_provider, new_session
from multi_agent.providers import build_chat_client

memory = make_memory_context_provider(owner_state_key="user-42")

agent = Agent(
    client=build_chat_client(),
    name="researcher",
    instructions="...",
    context_providers=[memory],
)

session = new_session("chat-2026-05-29")
response = await agent.run("What did we decide last time?", session=session)
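make_memory_context_provider is a short factory:
from pathlib import Path
from typing import Any

from agent_framework import MemoryContextProvider, MemoryFileStore

# DEFAULT_MEMORY_DIR is a Path constant defined elsewhere in the project.


def make_memory_context_provider(
    *,
    owner_state_key: str,
    base_path: Path = DEFAULT_MEMORY_DIR,
    recent_turns: int = 0,
    consolidation_client: Any | None = None,
) -> MemoryContextProvider:
    # The store is scoped to one owner; the provider wraps it for the agent.
    store = MemoryFileStore(base_path=base_path, owner_state_key=owner_state_key)
    return MemoryContextProvider(
        store=store,
        recent_turns=recent_turns,
        consolidation_client=consolidation_client,
    )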
What MemoryContextProvider does on every call is:
sequenceDiagram
    autonumber
    User->>Agent: agent.run(prompt, session=S)
    Agent->>Session: read recent turns (STM)
    Session-->>Agent: STM messages
    Agent->>Ctx: before_run(prompt)
    Ctx->>Store: list_topics(owner)
    Store-->>Ctx: topic index + summaries
    Ctx-->>Agent: durable memory chunks
    Agent->>LLM: complete(STM + LTM + prompt)
    LLM-->>Agent: response
    Agent->>Ctx: after_run(transcript)
    Ctx->>Store: extract + consolidate
    Agent-->>User: response
Two parts of that are worth zooming in on.
before_run is the injection point. The provider reads from the store, picks the topic memories most relevant to the prompt, and adds them to the messages going to the model. This is what makes LTM different from "load everything into the system prompt" — only the topics that look relevant for this prompt get pulled in.
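As a rough picture of "only the relevant topics get pulled in", here is a toy selector that scores each topic by word overlap with the prompt. This is a stand-in I wrote for illustration; I am not claiming it is how MAF ranks topics, and `select_topics` and its `limit` parameter are hypothetical.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_topics(prompt: str, topics: dict[str, str], limit: int = 2) -> list[str]:
    """Return up to `limit` topic names that share words with the prompt.

    `topics` maps a topic name to its summary. Scoring is plain word overlap,
    a deliberately naive substitute for whatever the real provider does.
    """
    prompt_words = _words(prompt)
    scored = []
    for name, summary in topics.items():
        overlap = len(prompt_words & _words(f"{name} {summary}"))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[0 + 1]))
    return [name for _, name in scored[:limit]]
```

A prompt that shares no words with any topic gets an empty list back, so nothing is injected for it.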
after_run is the extraction point. The provider takes the full transcript and asks the model (the same one, or a separate consolidation_client) to extract memory candidates as JSON of {topic, memory} items. Those land in the store. The default extraction prompt is in MAF's source and is worth reading; it explicitly says "include only durable facts, preferences, decisions, or patterns worth remembering later" and "do not include transient tasks, temporary reminders, one-off outputs, or tool chatter." That's the prompt that decides what your LTM looks like.
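The model's output is untrusted text, so the extraction side needs a defensive parse. Below is a minimal sketch, assuming the candidates arrive as a JSON array of objects with `topic` and `memory` string fields. The array wrapper and the `parse_memory_candidates` helper are my assumptions; the post's source only confirms the {topic, memory} item shape.

```python
import json


def parse_memory_candidates(raw: str) -> list[dict[str, str]]:
    """Parse model output into clean {topic, memory} items, dropping bad ones."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Not JSON at all: store nothing rather than guess.
    if not isinstance(data, list):
        return []
    candidates: list[dict[str, str]] = []
    for item in data:
        if not isinstance(item, dict):
            continue
        topic = item.get("topic")
        memory = item.get("memory")
        # Keep only items where both fields are non-empty strings.
        if isinstance(topic, str) and isinstance(memory, str):
            topic, memory = topic.strip(), memory.strip()
            if topic and memory:
                candidates.append({"topic": topic, "memory": memory})
    return candidates
```

Returning an empty list on malformed output means a bad extraction turn costs you one missed memory instead of a corrupted store.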
When to swap MemoryFileStore for Mem0
MemoryFileStore is fine for development, single-machine, single-user. The moment you want:
- Semantic recall (vector similarity over memories rather than topic-index lookup)
- Cross-user analytics
- A hosted backend you don't have to operate
…you swap to Mem0ContextProvider from agent-framework-mem0. Same ContextProvider interface, same Agent(context_providers=[...]) wiring:
def make_mem0_context_provider(*, user_id, api_key=None, agent_id=None, application_id=None):
    from agent_framework_mem0 import Mem0ContextProvider

    return Mem0ContextProvider(
        user_id=user_id, api_key=api_key,
        agent_id=agent_id, application_id=application_id,
    )
The agent code does not change. That's the value of the framework drawing the boundary correctly.
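One way to keep the swap out of agent code entirely is to choose the backend from configuration. This is a sketch under assumptions: the `MEMORY_BACKEND` and `MEM0_API_KEY` variable names and the `choose_memory_provider` helper are hypothetical, and the two factories are passed in as callables so the example runs without either backend installed.

```python
from typing import Any, Callable, Mapping


def choose_memory_provider(
    env: Mapping[str, str],
    *,
    user_id: str,
    make_file_provider: Callable[..., Any],
    make_mem0_provider: Callable[..., Any],
) -> Any:
    """Pick an LTM provider from config; the caller sees one provider object."""
    backend = env.get("MEMORY_BACKEND", "file").lower()  # hypothetical variable
    if backend == "mem0":
        return make_mem0_provider(user_id=user_id, api_key=env.get("MEM0_API_KEY"))
    if backend == "file":
        # The same user identity becomes the file store's owner scope.
        return make_file_provider(owner_state_key=user_id)
    raise ValueError(f"unknown MEMORY_BACKEND: {backend!r}")
```

In the project you would pass `make_memory_context_provider` and `make_mem0_context_provider` as the two factories and `os.environ` as `env`, then hand the result to `context_providers=[...]` as before.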
Design approaches the chapter calls out
The reference architecture sketches three shapes for STM:
flowchart LR
    subgraph Shared["Shared Memory (this project's default)"]
        AS1[Agent A] --> CS[(Single store)]
        AS2[Agent B] --> CS
        AS3[Agent C] --> CS
    end
    subgraph Distributed["Distributed Memory"]
        AD1[Agent A] --> CD1[(Store A)]
        AD2[Agent B] --> CD2[(Store B)]
    end
    subgraph Hybrid["Hybrid Memory"]
        AH1[Agent A] --> CH1[(Local cache)]
        AH2[Agent B] --> CH2[(Local cache)]
        CH1 --> CSh[(Shared store)]
        CH2 --> CSh
    end
The repo's default is shared memory — all agents read/write the same MemoryFileStore because they're all in the same process. To distribute, give each agent service its own namespaced MemoryFileStore (or a per-agent Cosmos DB collection). The hybrid is what you'd build with Redis as a local cache backing onto Cosmos.
The shape you choose is independent of the API. That's why drawing the boundary at the ContextProvider interface matters more than the backing store you pick.
What I'd warn against
- Don't write a custom Memory ABC. I did, and it didn't have anywhere to go. ContextProvider is the interface, and you should plug into it.
- Don't dump message history into LTM. The framework's extraction step exists to keep transient chatter out. Trust it. (If you don't trust the extraction prompt, replace the consolidation_client; don't bypass the extraction step.)
- Don't share an LTM namespace across users. owner_state_key is the per-user scope. Use it.
The whole memory/ module is now under 100 lines. That's the right size — the framework does the work; the project's job is to wire it.