Cost-aware agent dispatch — when the cheap agent is enough

The economics

Production agents tend to be expensive per query:

They use the strongest LLM (cost: $$).
They use the most tools (latency: seconds).
They emit full audit trails (compute + storage).

For a question like “what’s my balance?”, none of that is needed. A simple Postgres query plus string formatting would do.

A cost-aware dispatcher routes between the cheap path (direct query) and the expensive path (full agent) based on the query.

The classifier

import "context"

// Class is the routing decision produced by the classifier.
type Class int
const (
    ClassSimpleLookup Class = iota   // → cheap path
    ClassConversational              // → small LLM, no tools
    ClassDomainSpecific              // → production agent
    ClassAdversarial                 // → safety review + production agent
)

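// classify delegates to a small, fast model. It runs on every query, so it must stay cheap.
func classify(ctx context.Context, query string) Class {
    return smallClassifier.Predict(ctx, query)
}

// dispatch routes the query to the cheapest path that can answer it.
func dispatch(ctx context.Context, query string) Response {
    switch classify(ctx, query) {
    case ClassSimpleLookup:
        return directQuery(ctx, query)         // ~10ms, ~$0.0001
    case ClassConversational:
        return smallLLM(ctx, query)            // ~200ms, ~$0.001
    case ClassDomainSpecific:
        return productionAgent.Run(ctx, query) // ~3s, $0.05
    case ClassAdversarial:
        return safetyReview(ctx, query)        // ~5s, $0.10
    default:
        // Unknown class: fall back to the thorough path rather than risk a wrong cheap answer.
        return productionAgent.Run(ctx, query)
    }
}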

The classifier is small and fast. The cost per classification is dwarfed by the cost saved.

Numbers from one workload

For a customer-support copilot serving ~50K queries/day:

Path	Volume	Cost/query	Daily cost
Simple lookup	35% (17,500)	$0.0001	$1.75
Conversational	25% (12,500)	$0.001	$12.50
Domain agent	35% (17,500)	$0.05	$875
Safety review	5% (2,500)	$0.10	$250
Total	50,000	avg $0.023	$1,139

Without the dispatcher (all queries → production agent): 50,000 × $0.05 = $2,500/day. The dispatcher cut cost by about 54% (from $2,500 to $1,139 per day) without affecting user experience.
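The arithmetic behind the table can be checked with a few lines of Go. This is a minimal sketch that hard-codes the volumes and per-query costs from the table; the type and function names (path, workload, dailyCost) are illustrative, not part of any real codebase. It also derives a break-even figure the table does not show: the highest per-query classifier cost at which routing still beats sending everything to the production agent.

```go
package main

import "fmt"

// path is one row of the routing table (hypothetical type for this sketch).
type path struct {
	name         string
	queries      int     // queries per day routed to this path
	costPerQuery float64 // USD
}

// workload returns the rows of the table above.
func workload() []path {
	return []path{
		{"simple lookup", 17500, 0.0001},
		{"conversational", 12500, 0.001},
		{"domain agent", 17500, 0.05},
		{"safety review", 2500, 0.10},
	}
}

// dailyCost sums queries × cost-per-query over all paths.
func dailyCost(paths []path) float64 {
	total := 0.0
	for _, p := range paths {
		total += float64(p.queries) * p.costPerQuery
	}
	return total
}

func main() {
	const totalQueries = 50000
	const agentCost = 0.05 // USD per query on the production agent

	routed := dailyCost(workload())
	baseline := totalQueries * agentCost
	saving := 1 - routed/baseline
	// Highest per-query classifier cost that still leaves routing cheaper
	// than the all-agent baseline.
	breakEven := (baseline - routed) / totalQueries

	fmt.Printf("routed:     $%.2f/day\n", routed)
	fmt.Printf("baseline:   $%.2f/day\n", baseline)
	fmt.Printf("saving:     %.1f%%\n", saving*100)
	fmt.Printf("break-even: $%.4f per classification\n", breakEven)
}
```

With these inputs the routed cost comes to $1,139.25/day against a $2,500 baseline, a saving of 54.4%, and the classifier could cost up to roughly $0.027 per query before the dispatcher stops paying for itself.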

What to watch for

Misclassification cost asymmetry. Routing a domain-specific query to the simple lookup is bad (wrong answer). Routing a simple lookup to the production agent is just expensive. Bias the classifier toward the more thorough path on uncertain cases.
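One way to encode that bias is a confidence threshold on the cheap classes. The sketch below assumes the classifier can return a confidence score alongside the class, which the classify function shown earlier does not do; the Prediction type, the route function, and the threshold values are all hypothetical and would need tuning against real traffic.

```go
package main

import "fmt"

type Class int

const (
	ClassSimpleLookup Class = iota
	ClassConversational
	ClassDomainSpecific
	ClassAdversarial
)

// Prediction is an assumed classifier output: a class plus a confidence in [0, 1].
type Prediction struct {
	Class      Class
	Confidence float64
}

// minConfidence holds, per cheap class, the confidence required to take the
// cheap route. The values are placeholders, not measured numbers.
var minConfidence = map[Class]float64{
	ClassSimpleLookup:   0.95,
	ClassConversational: 0.85,
}

// route returns the class to dispatch on. A cheap class predicted below its
// threshold is escalated to ClassDomainSpecific; expensive classes are never
// downgraded, whatever the confidence.
func route(p Prediction) Class {
	threshold, cheap := minConfidence[p.Class]
	if cheap && p.Confidence < threshold {
		return ClassDomainSpecific
	}
	return p.Class
}

func main() {
	fmt.Println(route(Prediction{ClassSimpleLookup, 0.99}) == ClassSimpleLookup)   // confident: stays cheap
	fmt.Println(route(Prediction{ClassSimpleLookup, 0.80}) == ClassDomainSpecific) // uncertain: escalated
	fmt.Println(route(Prediction{ClassAdversarial, 0.30}) == ClassAdversarial)     // never downgraded
}
```

The asymmetry shows up in the thresholds: the cheaper the path, the more confidence it demands, because a wrong cheap answer costs more than an unnecessary agent run.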

Classifier drift. As your query distribution changes (new feature launches, seasonal patterns), the classifier’s accuracy drifts. Retrain quarterly; monitor the per-class quality.
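Per-class monitoring can be as simple as computing recall over a periodically audited sample. The sketch below assumes such a sample exists, with each query carrying both the classifier's prediction and a reviewer's label; the Sample type, the function names, and the 0.9 floor are illustrative assumptions.

```go
package main

import "fmt"

type Class int

const (
	ClassSimpleLookup Class = iota
	ClassConversational
	ClassDomainSpecific
	ClassAdversarial
)

var classNames = [...]string{"simple-lookup", "conversational", "domain-specific", "adversarial"}

// Sample is one audited query: what the classifier predicted and what a
// reviewer says the class should have been.
type Sample struct {
	Predicted, Actual Class
}

// perClassRecall returns, for each actual class present in the sample, the
// fraction of its queries that the classifier labelled correctly.
func perClassRecall(samples []Sample) map[Class]float64 {
	correct := map[Class]int{}
	total := map[Class]int{}
	for _, s := range samples {
		total[s.Actual]++
		if s.Predicted == s.Actual {
			correct[s.Actual]++
		}
	}
	recall := make(map[Class]float64, len(total))
	for c, n := range total {
		recall[c] = float64(correct[c]) / float64(n)
	}
	return recall
}

// drifted lists, in class order, the classes whose recall is below floor.
func drifted(recall map[Class]float64, floor float64) []Class {
	var out []Class
	for c := ClassSimpleLookup; c <= ClassAdversarial; c++ {
		if r, ok := recall[c]; ok && r < floor {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	samples := []Sample{
		{ClassSimpleLookup, ClassSimpleLookup},
		{ClassSimpleLookup, ClassDomainSpecific}, // domain query sent to the cheap path
		{ClassDomainSpecific, ClassDomainSpecific},
	}
	recall := perClassRecall(samples)
	for _, c := range drifted(recall, 0.9) {
		fmt.Printf("retrain candidate: %s (recall %.2f)\n", classNames[c], recall[c])
	}
}
```

Recall on the domain-specific and adversarial classes is the number to watch most closely, since a miss there means a query was sent down a path that cannot answer it properly.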

The “simple lookup that becomes complex” case. User asks “what’s my balance” but then follows up with “and why did it drop yesterday?” The follow-up needs the agent, yet read on its own it does not even say what “it” refers to. The dispatcher has to be conversation-aware, not just query-aware.
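A minimal way to make the dispatcher conversation-aware is to classify a window of recent turns instead of the latest query alone. In the sketch below, the Conversation type and the Classifier interface are hypothetical, the context parameter is dropped for brevity, and keywordClassifier is a toy stand-in for a real model, there only to make the example runnable.

```go
package main

import (
	"fmt"
	"strings"
)

type Class int

const (
	ClassSimpleLookup Class = iota
	ClassConversational
	ClassDomainSpecific
	ClassAdversarial
)

// Classifier is an assumed interface standing in for the small classifier.
type Classifier interface {
	Predict(text string) Class
}

// Conversation keeps the most recent user turns so that follow-ups are
// classified in context.
type Conversation struct {
	turns    []string
	maxTurns int
}

// Classify records the query, trims the history to maxTurns, and classifies
// the joined recent turns rather than the query in isolation.
func (c *Conversation) Classify(clf Classifier, query string) Class {
	c.turns = append(c.turns, query)
	if len(c.turns) > c.maxTurns {
		c.turns = c.turns[len(c.turns)-c.maxTurns:]
	}
	return clf.Predict(strings.Join(c.turns, "\n"))
}

// keywordClassifier is a toy model for the demo: a "why" question about a
// balance is domain-specific, a bare balance question is a simple lookup,
// and anything else is conversational.
type keywordClassifier struct{}

func (keywordClassifier) Predict(text string) Class {
	t := strings.ToLower(text)
	hasBalance := strings.Contains(t, "balance")
	switch {
	case hasBalance && strings.Contains(t, "why"):
		return ClassDomainSpecific
	case hasBalance:
		return ClassSimpleLookup
	default:
		return ClassConversational
	}
}

func main() {
	clf := keywordClassifier{}
	conv := &Conversation{maxTurns: 3}
	followUp := "and why did it drop yesterday?"

	fmt.Println(conv.Classify(clf, "what's my balance?") == ClassSimpleLookup)
	fmt.Println(conv.Classify(clf, followUp) == ClassDomainSpecific) // with context
	fmt.Println(clf.Predict(followUp) == ClassConversational)        // without context: misrouted
}
```

The window size trades classifier input length against how far back a follow-up can refer; three turns is an arbitrary choice here.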

Genie’s wiring

pkg/llm/router.go applies the same idea at the LLM-selection level (small model vs. large model). The agent-level dispatcher is per-deployment: for Genie’s hosted shape it is a separate handler that classifies and routes. The pattern is identical at both levels.

For any production AI workload above a few thousand queries per day, this is the highest-leverage cost optimisation that doesn’t compromise quality. The classifier is small; the savings compound.