Notes from integrating OpenTelemetry into airshipit, an open-source bare-metal Kubernetes lifecycle project with contributions from Ericsson, AT&T, Microsoft, and others. The hard part wasn't OTel; it was making distributed traces useful across foreign code.
The azure-service-operator project lets you declare Azure resources as Kubernetes objects. Notes from the multi-vendor collaboration shape: how decisions got made, what slowed us down, what shipped despite it.
The transaction engine had to absorb 30K+ TPS across partner integrations, never lose a transaction, and survive partial failures. The architecture: Go, Kafka, Pub/Sub, Redis, K8s, with idempotency at every layer.
Field notes from running multi-agent AI on K8s. The patterns the book recommends, the ones that survived contact with production, and the ones that broke in interesting ways.
GOMEMLIMIT tells the Go runtime to keep memory below a soft cap by running GC harder when it's close. For containers with hard memory limits, this prevents OOM kills. The setting every Go service in K8s should have.
Multi-agent stacks have state: vector indexes, chat histories, agent memory. GKE for AI workloads needs StatefulSets, PVCs, gateway controllers, and the patterns that work in 2026.