Agents SLO Latency

The pattern

For some classes of query, multiple agents could answer. They differ on:

Latency-aware dispatch picks the agent based on the user’s latency SLO, not the agent’s “quality.”

The dispatcher

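import (
    "context"
    "time"
)

// SLOBudget is the user-facing contract for one request.
type SLOBudget struct {
    MaxLatency    time.Duration
    MinConfidence float64
}

// dispatchBySLO tries agents fastest-first (agentsByLatency is sorted by
// P99Latency) and returns the first answer that clears the confidence floor.
func dispatchBySLO(ctx context.Context, query Query, slo SLOBudget) Response {
    // Cap the whole dispatch at the SLO, not each individual agent call.
    ctx, cancel := context.WithTimeout(ctx, slo.MaxLatency)
    defer cancel()
    deadline, _ := ctx.Deadline()

    for _, agent := range agentsByLatency {
        // Skip agents whose p99 no longer fits in the remaining budget.
        if agent.P99Latency > time.Until(deadline) {
            continue
        }
        resp := agent.Run(ctx, query)
        if resp.Confidence >= slo.MinConfidence {
            return resp
        }
    }
    return Response{Error: ErrNoAgentMeetsSLO}
}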

Iterate fastest-first. Return the first response that meets the confidence floor. Cap the total time at the SLO.

Why this matters more for agentic systems than for microservices

In microservices, latency-aware routing is about replicas and locality. Same code; pick the closest instance.

In agents, the differences are conceptual: a small model can answer some queries, while others need a large model. The dispatcher picks the right model, not the right replica.

For Genie’s agents/financial_supervisor, the dispatch decision factors in:

The SLO as a contract

The SLO is the user-facing promise. The dispatcher’s job is to honour it.

What happens when no agent meets the SLO:

For Genie, option A is the default. The disclaimer (FREE-AI Rec 25) carries the confidence level so the user can decide whether to trust it.
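The options themselves are not listed here, so the sketch below rests on an assumption: that option A means returning the best available answer, even if it falls below the confidence floor, together with a disclaimer that states its confidence, rather than returning an error. `bestEffort` and the disclaimer wording are hypothetical and are neither Genie's implementation nor the FREE-AI text.

```go
package main

import "fmt"

type Response struct {
	Agent      string
	Answer     string
	Confidence float64
}

// bestEffort is a hypothetical helper (assumes option A = answer + disclaimer).
// It returns the highest-confidence candidate and, if that candidate is still
// below the floor, a disclaimer carrying its confidence. ok is false when no
// agent produced anything at all.
func bestEffort(candidates []Response, minConfidence float64) (resp Response, disclaimer string, ok bool) {
	if len(candidates) == 0 {
		return Response{}, "", false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.Confidence > best.Confidence {
			best = c
		}
	}
	if best.Confidence < minConfidence {
		disclaimer = fmt.Sprintf("Confidence %.0f%% is below the %.0f%% target; please verify before acting.",
			best.Confidence*100, minConfidence*100)
	}
	return best, disclaimer, true
}

func main() {
	cands := []Response{
		{Agent: "small", Answer: "answer A", Confidence: 0.60},
		{Agent: "medium", Answer: "answer B", Confidence: 0.75},
	}
	resp, note, _ := bestEffort(cands, 0.8)
	fmt.Println(resp.Agent, note)
}
```

The user still gets an answer inside the latency budget; the disclaimer is what lets them weigh it.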

Where this shows up in practice

For the conversational call-centre agent (a Pratik-Kinetic-style voice deployment):

The user perceives a consistently responsive system. The team doesn’t pay full-production-agent cost on every query. Win/win.

What to monitor

Per-class latency histograms. Per-class confidence distributions. SLO-met rate (queries served within the budget).

When the SLO-met rate drops, you have two levers: add capacity to the slow path, or move more queries to the fast path (re-tune the classifier). Watch the rate weekly; adjust quarterly.
