The flag
In Genie:
func TierOf(a Agent) Tier {
if t, ok := a.(TierAware); ok {
return t.Tier()
}
return TierPrototype // ← the safe default
}
An agent that doesn’t implement TierAware is treated as Prototype, not Production. The dispatch gate refuses Prototype for customer traffic.
This is the flag. It catches the case where an engineer ships an agent and forgets to mark its tier.
The culture
The flag is necessary; it’s not sufficient. The flag catches forgetting; it doesn’t catch sloppy declarations. An engineer can write return TierProduction without earning it.
The culture is the set of things that make sloppy production declarations unlikely:
Promotion is a documented decision. Moving from Beta to Production is a PR that includes the test results, the adversarial-corpus pass, the fallback wiring, and the risk team’s sign-off. The PR template asks for each. The reviewer’s job is to verify each.
Promotion has reviewers. Three sets of eyes: the agent’s author, an engineering peer, a risk team representative. Three independent “yes” votes before the tier changes.
Demotion is a normal action. A production agent that started misbehaving gets demoted to Beta until fixed. Demoting is a one-line code change plus a PR; not a paperwork-heavy exercise.
Tier appears in dashboards. The on-call dashboard shows tier per agent. If a Prototype agent is taking customer traffic (something the dispatch gate should have prevented but didn’t, or a sandbox override is in effect), the dashboard reveals it.
The default for new agents is Sketch. A make scaffold command that creates a new agent template assigns TierSketch — even more conservative than Prototype. Engineers explicitly bump it as they earn it.
What the absence of culture looks like
I’ve seen teams ship a tier model that died within a quarter because:
- Engineers self-promoted to Production casually.
- Reviewers stamped Production tier without checking the promotion criteria.
- The risk team wasn’t involved in promotion decisions.
- Production-tier agents that started failing weren’t demoted because “we’ll fix it soon.”
The flag persisted in the codebase; the model died in practice.
The recurring discipline
For Genie’s promotion checklist:
- [ ] Tier() returns the new tier in source.
- [ ] RiskLevel() declared.
- [ ] Unit tests cover each branch of HandleMessage.
- [ ] At least one integration test in tests/ exercises it through the bus.
- [ ] Adversarial corpus passes (prompt injection + jailbreak suites).
- [ ] Fallback wired via orchestrator.SetFallback().
- [ ] BCP drill confirms fallback fires.
- [ ] Audit hooks emit on every output.
- [ ] Appears in /v1/ai-inventory with the new tier.
- [ ] Risk team signed off in the deployment record.
Ten checkboxes. Three reviewers. One PR per promotion. That’s the cultural shape.
What I’d carry forward
Two patterns from regulated finance that transfer:
-
Promotion is the moment of review. Not deployment, not first-customer-touch. Promotion. The review happens at the moment the agent’s status changes, while everyone’s attention is on it.
-
The flag is the floor. The culture is the ceiling. A team that operates at the flag’s level alone gets the cheap benefits. A team that operates at the cultural level gets the durable safety.
The flag is the cheap implementation. The culture is the work.