The economics
Production agents tend to be expensive per query:
- They use the strongest LLM (cost: $$).
- They use the most tools (latency: seconds).
- They emit full audit trails (compute + storage).
For a question like “what’s my balance?”, none of that is needed. A single Postgres query plus a formatted string would do.
A cost-aware dispatcher routes between the cheap path (direct query) and the expensive path (full agent) based on the query.
The classifier
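import "context"

// Class is the routing decision produced by the classifier.
type Class int

const (
	ClassSimpleLookup   Class = iota // → cheap path
	ClassConversational              // → small LLM, no tools
	ClassDomainSpecific              // → production agent
	ClassAdversarial                 // → safety review + production agent
)

// classify delegates to a small, fast classifier model.
func classify(ctx context.Context, query string) Class {
	return smallClassifier.Predict(ctx, query)
}

// dispatch sends the query down the cheapest path that can answer it.
// Latency and cost figures are rough, per query.
func dispatch(ctx context.Context, query string) Response {
	switch classify(ctx, query) {
	case ClassSimpleLookup:
		return directQuery(ctx, query) // ~10ms, ~$0.0001
	case ClassConversational:
		return smallLLM(ctx, query) // ~200ms, ~$0.001
	case ClassDomainSpecific:
		return productionAgent.Run(ctx, query) // ~3s, ~$0.05
	case ClassAdversarial:
		return safetyReview(ctx, query) // ~5s, ~$0.10
	default:
		// Unknown class: fall back to the thorough path rather than guess.
		return productionAgent.Run(ctx, query)
	}
}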
The classifier is small and fast. The cost per classification is dwarfed by the cost saved.
Numbers from one workload
For a customer-support copilot serving ~50K queries/day:
| Path | Volume | Cost/query | Daily cost |
|---|---|---|---|
| Simple lookup | 35% (17,500) | $0.0001 | $1.75 |
| Conversational | 25% (12,500) | $0.001 | $12.50 |
| Domain agent | 35% (17,500) | $0.05 | $875 |
| Safety review | 5% (2,500) | $0.10 | $250 |
| Total | 50,000 | avg $0.023 | $1,139 |
Without the dispatcher (all queries → production agent): 50,000 × $0.05 = $2,500/day. The dispatcher cut cost by about 54% ($2,500 → $1,139 per day) without affecting user experience.
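The arithmetic behind the table is easy to reproduce, which helps when you want to plug in your own traffic mix. The sketch below uses the shares and per-query costs from the table; the `path` type and `dailyCost` function are illustrative names, not part of any real codebase.

```go
package main

import "fmt"

// path describes one dispatcher route. The type and field names are
// illustrative; the figures used in main come from the table above.
type path struct {
	name         string
	share        float64 // fraction of daily queries
	costPerQuery float64 // USD
}

// dailyCost returns the total daily spend for a given traffic mix.
func dailyCost(paths []path, queriesPerDay float64) float64 {
	total := 0.0
	for _, p := range paths {
		total += p.share * queriesPerDay * p.costPerQuery
	}
	return total
}

func main() {
	const queriesPerDay = 50000
	paths := []path{
		{"simple lookup", 0.35, 0.0001},
		{"conversational", 0.25, 0.001},
		{"domain agent", 0.35, 0.05},
		{"safety review", 0.05, 0.10},
	}
	routed := dailyCost(paths, queriesPerDay)
	baseline := queriesPerDay * 0.05 // every query sent to the production agent

	fmt.Printf("routed:   $%.2f/day (avg $%.4f/query)\n", routed, routed/queriesPerDay)
	fmt.Printf("baseline: $%.2f/day\n", baseline)
	fmt.Printf("saving:   %.1f%%\n", 100*(1-routed/baseline))
}
```

Note that this leaves out the classifier's own per-query cost, as the table does; add it as a flat term if yours is not negligible.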
What to watch for
Misclassification cost asymmetry. Routing a domain-specific query to the simple lookup is bad (wrong answer). Routing a simple lookup to the production agent is just expensive. Bias the classifier toward the more thorough path on uncertain cases.
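One way to encode that bias is a per-class confidence threshold: the cheaper the path, the more confident the classifier must be before the dispatcher takes it. The sketch below assumes the classifier can return a confidence score alongside the class, which the `Predict` call shown earlier does not; the `Prediction` type, `minConfidence` map, and threshold values are all hypothetical.

```go
package main

import "fmt"

type Class int

const (
	ClassSimpleLookup Class = iota
	ClassConversational
	ClassDomainSpecific
	ClassAdversarial
)

// Prediction is a hypothetical classifier output: the predicted class plus a
// confidence score in [0, 1].
type Prediction struct {
	Class      Class
	Confidence float64
}

// minConfidence holds illustrative thresholds for the cheap paths only.
// Cheaper paths demand more confidence, because a wrong cheap answer costs
// more than a wasted agent call. The values are placeholders, not tuned.
var minConfidence = map[Class]float64{
	ClassSimpleLookup:   0.95,
	ClassConversational: 0.85,
}

// biasTowardThorough escalates to the production agent when the classifier is
// not confident enough to justify a cheap path. Domain-specific and
// adversarial predictions pass through unchanged.
func biasTowardThorough(p Prediction) Class {
	threshold, isCheapPath := minConfidence[p.Class]
	if isCheapPath && p.Confidence < threshold {
		return ClassDomainSpecific
	}
	return p.Class
}

func main() {
	confident := Prediction{Class: ClassSimpleLookup, Confidence: 0.99}
	unsure := Prediction{Class: ClassSimpleLookup, Confidence: 0.70}

	fmt.Println(biasTowardThorough(confident) == ClassSimpleLookup)
	fmt.Println(biasTowardThorough(unsure) == ClassDomainSpecific)
}
```

Raising a threshold trades savings for safety, so it is worth tuning each one against the measured cost of a wrong answer on that path.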
Classifier drift. As your query distribution changes (new feature launches, seasonal patterns), the classifier’s accuracy drifts. Retrain quarterly, and monitor per-class quality between retrains so you catch a degrading class before the next scheduled run.
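Per-class monitoring can be as simple as auditing a sample of routed queries against a reference label and tracking precision per predicted class. The following is a minimal sketch under that assumption; the `Monitor` type, its methods, and the thresholds are hypothetical, and where the reference labels come from (human review, a stronger model) is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// classStats counts, for one predicted class, how many sampled predictions
// were audited and how many matched the reference label.
type classStats struct {
	audited int
	correct int
}

// Monitor accumulates audit results per predicted class.
type Monitor struct {
	stats map[string]*classStats
}

func NewMonitor() *Monitor {
	return &Monitor{stats: make(map[string]*classStats)}
}

// Record stores one audited prediction and its reference label.
func (m *Monitor) Record(predicted, actual string) {
	s, ok := m.stats[predicted]
	if !ok {
		s = &classStats{}
		m.stats[predicted] = s
	}
	s.audited++
	if predicted == actual {
		s.correct++
	}
}

// Drifted returns, in sorted order, the classes whose precision has fallen
// below minPrecision. Classes with fewer than minSamples audits are skipped
// because their precision estimate is too noisy.
func (m *Monitor) Drifted(minPrecision float64, minSamples int) []string {
	var out []string
	for class, s := range m.stats {
		if s.audited < minSamples {
			continue
		}
		if float64(s.correct)/float64(s.audited) < minPrecision {
			out = append(out, class)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	m := NewMonitor()
	for i := 0; i < 9; i++ {
		m.Record("simple_lookup", "simple_lookup")
	}
	m.Record("simple_lookup", "domain_specific")
	for i := 0; i < 6; i++ {
		m.Record("conversational", "conversational")
	}
	for i := 0; i < 4; i++ {
		m.Record("conversational", "domain_specific")
	}
	fmt.Println(m.Drifted(0.8, 10))
}
```

Tracking precision per predicted class, rather than one overall accuracy figure, matters because the cheap classes are the ones where a miss produces a wrong answer.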
The “simple lookup that becomes complex” case. A user asks “what’s my balance?” and then follows up with “and why did it drop yesterday?” The follow-up needs the agent, and it only makes sense in light of the first question. The dispatcher has to be conversation-aware, not just query-aware.
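A rough sketch of what conversation-aware routing could look like: classify each new query together with the last few turns, and never route below the most thorough path the conversation has already used. The `Session` type, the three-turn window, and the sticky floor are assumptions made for illustration; the sketch also relies on the `Class` constants being ordered from cheapest to most thorough.

```go
package main

import (
	"fmt"
	"strings"
)

type Class int

const (
	ClassSimpleLookup Class = iota
	ClassConversational
	ClassDomainSpecific
	ClassAdversarial
)

// Session is hypothetical per-conversation state kept by the dispatcher.
type Session struct {
	turns []string
	floor Class // most thorough class this conversation has used so far
}

// Route classifies the new query in the context of recent turns and applies
// a sticky floor, so a conversation that needed the agent stays on it.
// The classify callback stands in for the small classifier model.
func (s *Session) Route(query string, classify func(contextualQuery string) Class) Class {
	s.turns = append(s.turns, query)

	// Give the classifier a short window of history, not just the last query.
	recent := s.turns
	if len(recent) > 3 {
		recent = recent[len(recent)-3:]
	}
	c := classify(strings.Join(recent, "\n"))

	if c < s.floor {
		c = s.floor
	}
	s.floor = c
	return c
}

func main() {
	// Toy stand-in classifier: a "why" question needs the agent.
	toy := func(text string) Class {
		if strings.Contains(text, "why") {
			return ClassDomainSpecific
		}
		return ClassSimpleLookup
	}

	s := &Session{}
	fmt.Println(s.Route("what's my balance?", toy) == ClassSimpleLookup)
	fmt.Println(s.Route("and why did it drop yesterday?", toy) == ClassDomainSpecific)
	fmt.Println(s.Route("and the day before?", toy) == ClassDomainSpecific)
}
```

The sticky floor gives up some savings on long conversations; resetting it after a period of inactivity or an explicit topic change is one possible refinement.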
Genie’s wiring
pkg/llm/router.go does this at the LLM-selection level (small model vs large model). The agent-level dispatcher is per-deployment: for Genie’s hosted shape it’s a separate handler that classifies and routes. The pattern is identical at both levels.
For any production AI workload above a few thousand queries per day, this is the highest-leverage cost optimisation that doesn’t compromise quality. The classifier is small; the savings compound.