February 17, 2026 · 2 min read

Cost-aware agent dispatch — when the cheap agent is enough

Not every query needs the production agent. A cost-aware dispatcher decides whether to route to the cheap-and-fast agent or the expensive-and-thorough one. Same UX, dramatically lower bill.

Agents · Cost Optimisation · LLM Ops
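
To make the dispatch idea concrete, here is a minimal sketch in Go. The routing heuristic (word count plus a list of escalation keywords), the `Dispatch` function, and the agent names are all assumptions for illustration; the post itself may use a different signal, such as a classifier or a confidence score.

```go
package main

import (
	"fmt"
	"strings"
)

// Agent is a hypothetical label for the two agents in the summary above.
type Agent string

const (
	CheapAgent    Agent = "cheap"
	ThoroughAgent Agent = "thorough"
)

// Dispatch picks an agent using a deliberately crude heuristic (assumed, not
// taken from the post): queries longer than maxCheapWords, or containing any
// escalation keyword, go to the thorough agent. Keywords must be lowercase.
func Dispatch(query string, maxCheapWords int, escalate []string) Agent {
	if len(strings.Fields(query)) > maxCheapWords {
		return ThoroughAgent
	}
	lower := strings.ToLower(query)
	for _, kw := range escalate {
		if strings.Contains(lower, kw) {
			return ThoroughAgent
		}
	}
	return CheapAgent
}

func main() {
	escalate := []string{"audit", "contract"}
	fmt.Println(Dispatch("What are your opening hours?", 20, escalate))
	fmt.Println(Dispatch("Summarise the audit findings for Q3", 20, escalate))
}
```

The caller sees one entry point either way, which is what keeps the UX identical while the cheaper agent absorbs the routine queries.
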
February 15, 2026 · 2 min read

The case for boring stack choices in regulated AI

Postgres over the latest vector DB. Go stdlib over the framework du jour. Single binary over Kubernetes operator. The choices that bore reviewers and delight on-call engineers.

Architecture · Opinion · Go
February 14, 2026 · 2 min read

Default-to-Prototype as a culture, not just a flag

An agent that doesn't declare a tier defaults to Prototype, not Production. The flag is the code; the culture is what enforces "new code is not production until someone says so."

Culture · Engineering · Tier Promotion
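
One way the "flag" half could look in Go is to make Prototype the zero value of the tier type, so that an agent which declares nothing is Prototype by construction. This is a sketch under that assumption; the `Tier` and `AgentConfig` names are hypothetical and not taken from the post.

```go
package main

import "fmt"

// Tier is a hypothetical tier type. Prototype is deliberately the zero
// value, so an AgentConfig that never sets Tier is a Prototype.
type Tier int

const (
	Prototype Tier = iota
	Production
)

// String makes tiers readable in logs and output.
func (t Tier) String() string {
	if t == Production {
		return "Production"
	}
	return "Prototype"
}

// AgentConfig is an illustrative config struct, not a real API.
type AgentConfig struct {
	Name string
	Tier Tier
}

func main() {
	undeclared := AgentConfig{Name: "new-agent"} // no tier declared
	promoted := AgentConfig{Name: "billing-agent", Tier: Production}

	fmt.Println(undeclared.Name, undeclared.Tier)
	fmt.Println(promoted.Name, promoted.Tier)
}
```

Promotion then requires an explicit, reviewable line of code, which is the part the culture has to enforce.
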
February 12, 2026 · 2 min read

GOMEMLIMIT and the soft GC pacing change every Go service should set

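GOMEMLIMIT tells the Go runtime to keep its total memory below a soft cap by running the GC harder as usage approaches it. In containers with hard memory limits, setting it somewhat below the container limit makes OOM kills much less likely, though the cap is soft and not a guarantee. It is the setting every Go service in K8s should have.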

Go · GOMEMLIMIT · Memory · Kubernetes
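
The limit can be set with the `GOMEMLIMIT` environment variable (for example `GOMEMLIMIT=460MiB`) or from code via `runtime/debug.SetMemoryLimit`, available since Go 1.19. The sketch below takes the programmatic route. The 512 MiB container limit and the 90% headroom figure are assumptions chosen for illustration, and `softLimit` and `applySoftLimit` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// softLimit returns percent% of the container's hard memory limit, leaving
// headroom for memory the Go runtime does not account for.
func softLimit(containerBytes int64, percent int64) int64 {
	return containerBytes * percent / 100
}

// applySoftLimit sets the runtime's soft memory limit and returns the value
// now in effect. SetMemoryLimit with a negative input only reads the limit.
func applySoftLimit(containerBytes int64, percent int64) int64 {
	debug.SetMemoryLimit(softLimit(containerBytes, percent))
	return debug.SetMemoryLimit(-1)
}

func main() {
	// Assumed container limit; in a real deployment this would come from
	// the pod spec or the cgroup, not a constant.
	const containerLimit = 512 << 20 // 512 MiB

	limit := applySoftLimit(containerLimit, 90)
	fmt.Printf("soft memory limit set to %d bytes\n", limit)
}
```

Because the limit is soft, a service whose live heap genuinely exceeds it will still grow past it, so the headroom is a tuning choice and not a safety guarantee.
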
February 10, 2026 · 2 min read

Running AWS Bedrock and Vertex AI in the same agent stack

An enterprise customer wants you on AWS; the next one wants you on GCP. The provider router pattern that keeps the agent code identical and swaps only the LLM endpoint.

AWS · Bedrock · Vertex AI · Multi-Cloud · Go