2026

Blog posts from 2026.

· Engineering

The board policy is not a slide — it's a YAML file

The bank's board approves an AI policy. The policy exists as a slide deck nobody reads. The risk team's actual operational policy is what's in the code. Closing that gap is the FREE-AI Rec 14 win.

· Engineering

Audit logs are the API of record

The audit log isn't a side effect of the system. It's the contract you owe to regulators, customers, and your future self. Treat it as a first-class API — schema, versioning, and SLOs included.

· Engineering

Twelve Go idioms I changed my mind about

Patterns I confidently recommended five years ago that I'd argue against today. The list of "things you used to do in Go that don't pay back anymore."

· Engineering

errgroup patterns for parallel agent dispatch

Fan out to N agents; first error cancels the rest; collect successful results. errgroup is the right tool for this; the patterns are concise but worth getting exactly right.

· Engineering

SPIFFE/SPIRE basics — workload identity at deploy time

Services need identity too, not just users. SPIFFE issues SVIDs (verifiable identity documents) to workloads; SPIRE is the reference issuer. The shape and the first deploy.

· Engineering

mTLS at the proxy — Envoy + SPIRE-issued SVIDs

Pushing mTLS into a service mesh removes it from every individual service. Envoy + SPIRE is the canonical pattern; the implementation has fewer moving parts than the architecture diagrams suggest.

· Engineering

GraphRAG — when a knowledge graph beats vector search

Vector search treats every chunk as independent. GraphRAG models the relationships between entities, communities, and concepts. For corpus-spanning questions ("what's the relationship between X and Y"), graph wins.

· Engineering

BigQuery Knowledge Graph for entity resolution at scale

BigQuery has had a built-in knowledge graph since 2024. For entity resolution across millions of rows — the "is this John Smith the same as that John Smith" problem — it's the cheapest tool I've found.

· Engineering

HyDE — generate a hypothetical answer to improve retrieval

Embedding a question and embedding an answer often produce different vectors. HyDE generates a hypothetical answer to the question, embeds *that*, and retrieves on it. Retrieval quality goes up disproportionately.

· Engineering

Cost-aware agent dispatch — when the cheap agent is enough

Not every query needs the production agent. A cost-aware dispatcher decides whether to route to the cheap-and-fast agent or the expensive-and-thorough one. Same UX, dramatically lower bill.

· Engineering

The case for boring stack choices in regulated AI

Postgres over the latest vector DB. Go stdlib over the framework du jour. Single binary over Kubernetes operator. The choices that bore reviewers and delight on-call engineers.

· Engineering

Default-to-Prototype as a culture, not just a flag

An agent that doesn't declare a tier defaults to Prototype, not Production. The flag is the code; the culture is what enforces "new code is not production until someone says so."
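
A minimal sketch of the default using Go's zero value, assuming tiers are an integer enum: because `Prototype` is the zero value, any config that omits the tier is Prototype without anyone writing that down. `Tier`, `AgentConfig`, and `mayServeCustomers` are hypothetical names.

```go
package main

import "fmt"

// Tier is a hypothetical deployment tier.
type Tier int

const (
	Prototype  Tier = iota // zero value: what you get by not declaring a tier
	Production             // must be set explicitly
)

func (t Tier) String() string {
	switch t {
	case Prototype:
		return "prototype"
	case Production:
		return "production"
	default:
		return fmt.Sprintf("Tier(%d)", int(t))
	}
}

// AgentConfig is a hypothetical agent declaration; Tier is optional.
type AgentConfig struct {
	Name string
	Tier Tier
}

// mayServeCustomers fails closed: only an explicit Production tier passes.
func mayServeCustomers(c AgentConfig) bool {
	return c.Tier == Production
}

func main() {
	undeclared := AgentConfig{Name: "kyc-helper"} // no tier set
	fmt.Println(undeclared.Tier, mayServeCustomers(undeclared))
}
```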

· Engineering

GOMEMLIMIT and the soft GC pacing change every Go service should set

GOMEMLIMIT tells the Go runtime to keep total memory under a soft cap by running GC more often as usage approaches it. For containers with hard memory limits, this makes OOM kills much less likely, though it cannot save a service whose live heap genuinely exceeds the cap. The setting every Go service in K8s should have.
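
A sketch of deriving the soft limit at startup, assuming cgroup v2 (where the container limit lives in `/sys/fs/cgroup/memory.max`) and an illustrative 90% headroom ratio. `debug.SetMemoryLimit` is the runtime API behind `GOMEMLIMIT`; setting the env var directly in the pod spec achieves the same thing.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// parseCgroupMax parses a cgroup v2 memory.max file, which holds either a
// byte count or the literal "max" (no limit).
func parseCgroupMax(contents string) (int64, error) {
	s := strings.TrimSpace(contents)
	if s == "max" {
		return 0, errors.New("no cgroup memory limit")
	}
	return strconv.ParseInt(s, 10, 64)
}

// softLimit leaves headroom below the hard limit for non-heap memory
// (goroutine stacks, cgo, runtime overhead). The 90% ratio is an assumption.
func softLimit(hard int64) int64 {
	return hard / 10 * 9
}

func main() {
	// An explicit GOMEMLIMIT env var wins; the runtime has already applied it.
	if os.Getenv("GOMEMLIMIT") != "" {
		fmt.Println("GOMEMLIMIT set explicitly; leaving it alone")
		return
	}
	raw, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		fmt.Println("no cgroup v2 limit found; leaving the memory limit unset")
		return
	}
	hard, err := parseCgroupMax(string(raw))
	if err != nil {
		fmt.Println(err)
		return
	}
	prev := debug.SetMemoryLimit(softLimit(hard))
	fmt.Printf("soft memory limit %d bytes (was %d)\n", softLimit(hard), prev)
}
```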

· Engineering

Running AWS Bedrock and Vertex AI in the same agent stack

An enterprise customer wants you on AWS; the next one wants you on GCP. The provider router pattern that keeps the agent code identical and swaps only the LLM endpoint.

· Engineering

Egress costs — the gotcha that kills cloud-arbitrage plans

Cross-cloud data movement is billed by the GB. The bill is invisible until it isn't. A multi-region or multi-cloud architecture that doesn't model egress costs in design will discover them in production.

· Engineering

Data residency in the Gulf — UAE ADGM/DIFC + Saudi SAMA at Bancnet

An open-banking platform serving UAE and Saudi customers had to honour three overlapping regulators: ADGM (Abu Dhabi), DIFC (Dubai), and SAMA (Saudi central bank). Notes on the architecture that satisfied all three.

· Engineering

Workload Identity Federation Azure → GCP for a real migration

Moving a workload from Azure to GCP while it continues to authenticate against on-prem Azure AD (Entra ID). Federation lets the GCP workload assume a GCP service account based on its Azure identity.

· Engineering

UPI integration — the spec quirks no one mentions

UPI is the most popular payment rail in India. The spec is precise. The implementation guides are not. Notes on the integration details that ate weeks the first time.

· Engineering

Brownlow — Cloud KMS + Security Command Center for vote integrity

Vote integrity needed two things the platform team couldn't fake even by accident: signing keys we couldn't access, and continuous security monitoring we couldn't silence. KMS + SCC delivered both.

· Engineering

AIGP body of knowledge — a Go engineer's reading map

IAPP's AI Governance Professional certification covers a body of knowledge worth knowing whether you certify or not. The mapping from BOK to working Go code for the engineer who wants to understand AI governance practically.

· Engineering

Ardan Ultimate AI #23 — Direct and indirect prompt injection, plus defenses

The single biggest LLM security risk. The example walks through both forms (direct from user input, indirect via retrieved content) and the layered defenses: system prompt isolation, content classification, output validation, structured tool schemas.

· Engineering

Ardan Ultimate AI #20 — Embedding-based semantic cache

Exact-match caching misses paraphrases. "What is the refund policy?" and "How do refunds work?" should both hit the same cached answer. Semantic cache embeds queries and matches by similarity.
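
A minimal sketch of the lookup, assuming an injected embedding function and a cosine-similarity threshold; the `Embedder` type, the 0.95 threshold, and the toy vectors are illustrative, not taken from the course. A production cache would use a vector index instead of a linear scan.

```go
package main

import (
	"fmt"
	"math"
)

// Embedder is a hypothetical stand-in for an embedding-model call.
type Embedder func(text string) []float64

type entry struct {
	vec    []float64
	answer string
}

// SemanticCache returns a cached answer when a new query's embedding is
// close enough (cosine similarity >= threshold) to an answered one.
type SemanticCache struct {
	embed     Embedder
	threshold float64
	entries   []entry
}

// cosine assumes both vectors have the same length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Get finds the most similar cached query by linear scan.
func (c *SemanticCache) Get(query string) (string, bool) {
	q := c.embed(query)
	best, bestSim := "", -1.0
	for _, e := range c.entries {
		if s := cosine(q, e.vec); s > bestSim {
			best, bestSim = e.answer, s
		}
	}
	if bestSim >= c.threshold {
		return best, true
	}
	return "", false
}

// Put stores the answer under the query's embedding.
func (c *SemanticCache) Put(query, answer string) {
	c.entries = append(c.entries, entry{vec: c.embed(query), answer: answer})
}

func main() {
	// Toy embeddings standing in for a real model: paraphrases point the same way.
	fake := map[string][]float64{
		"What is the refund policy?":   {0.9, 0.1, 0.0},
		"How do refunds work?":         {0.85, 0.15, 0.05},
		"What are your opening hours?": {0.0, 0.2, 0.95},
	}
	c := &SemanticCache{embed: func(s string) []float64 { return fake[s] }, threshold: 0.95}
	c.Put("What is the refund policy?", "cached refund answer")
	fmt.Println(c.Get("How do refunds work?"))
	fmt.Println(c.Get("What are your opening hours?"))
}
```

The threshold is the tuning knob: too low and different questions share an answer, too high and paraphrases miss.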

· Engineering

Ardan Ultimate AI #19 — Speculative decoding with a draft model

Run a small draft model to propose several tokens ahead; the large model verifies them all in a single forward pass and keeps the longest prefix it agrees with. Latency drops without quality dropping. A technique production LLM serving stacks use that most application engineers never see.

· Engineering

Ardan Ultimate AI #18 — Incremental message caching (IMC) for chat

A long chat reprocesses the entire history on every turn. Prefix caching lets the serving stack reuse the KV cache already computed for the previous turn's prefix and compute only the new suffix. Massive latency win on long conversations.

· Engineering

Ardan Ultimate AI #17 — Building an agent over an MCP server

Model Context Protocol standardises how tools are exposed to LLM applications. The example builds both sides: an MCP server exposing tools, and an agent that calls them. The server works unchanged whichever tool-calling LLM drives the agent.

· Engineering

Ardan Ultimate AI #15 — A read-only NL→SQL tool

Give an LLM a SQL tool and watch it write DELETE statements. The read-only version: parse the generated SQL, refuse anything that isn't a SELECT, validate against an allow-listed schema, and run with a strict timeout.

· Engineering

Ardan Ultimate AI #14 — A streaming agent with a reasoning panel

Stream the agent's reasoning and tool calls to the UI as they happen. The user sees "thinking about X, calling tool Y, got result Z, now answering..." — dramatically better UX than waiting for the final answer.

· Engineering

Ardan Ultimate AI #13 — A minimal multi-tool agent loop

The smallest possible multi-tool agent. The loop is 30 lines of Go and shows exactly what an "agent" is — there's no magic, just a structured back-and-forth between the LLM and a set of tools until the model says stop.
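
A minimal sketch of the loop, assuming a provider-neutral `Model` function that returns either a final answer or a single tool call; `Message`, `Reply`, `Model`, and `Tool` are hypothetical types, not any SDK's API. It also makes the two phases visible: the model decides, then the application runs the tool and appends the result.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is one turn in the conversation.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

// Reply is either a final answer (ToolName empty) or a tool call.
type Reply struct {
	ToolName string
	ToolArg  string
	Text     string
}

// Model stands in for an LLM call.
type Model func(history []Message) Reply

// Tool is a hypothetical tool implementation.
type Tool func(arg string) (string, error)

// runAgent loops until the model stops asking for tools or maxSteps is hit.
func runAgent(model Model, tools map[string]Tool, question string, maxSteps int) (string, error) {
	history := []Message{{Role: "user", Content: question}}
	for step := 0; step < maxSteps; step++ {
		r := model(history) // phase 1: the model decides
		if r.ToolName == "" {
			return r.Text, nil // the model says stop
		}
		out := "unknown tool " + r.ToolName
		if tool, ok := tools[r.ToolName]; ok {
			res, err := tool(r.ToolArg) // phase 2: the application runs the tool
			if err != nil {
				res = "error: " + err.Error()
			}
			out = res
		}
		history = append(history,
			Message{Role: "assistant", Content: "call " + r.ToolName + "(" + r.ToolArg + ")"},
			Message{Role: "tool", Content: out}) // result feeds the next turn
	}
	return "", errors.New("agent did not finish within maxSteps")
}

func main() {
	model := func(h []Message) Reply {
		if last := h[len(h)-1]; last.Role == "tool" {
			return Reply{Text: "It is " + last.Content + " in Pune."}
		}
		return Reply{ToolName: "weather", ToolArg: "Pune"}
	}
	tools := map[string]Tool{
		"weather": func(city string) (string, error) { return "31°C", nil },
	}
	fmt.Println(runAgent(model, tools, "Weather in Pune?", 5))
}
```

The `maxSteps` cap is a design choice worth keeping: without it, a model that never stops calling tools loops forever.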

· Engineering

Ardan Ultimate AI #12 — Two-phase tool calling explained

The tool-calling dance: the LLM emits a structured tool call → application runs the tool → application appends the result → the LLM uses it in the next turn. Two phases. Everything else is detail.

· Engineering

Ardan Ultimate AI #09 — Debugging retrieval in isolation (K and threshold)

When RAG gives wrong answers, the problem is usually retrieval, not the LLM. The example isolates the retrieval step so you can see exactly what chunks come back for a given query, with what scores, and tune K and the similarity threshold accordingly.
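
A small sketch of the K-and-threshold cut applied after retrieval, assuming similarity scores where higher means closer; the chunk IDs and scores are made up. Printing the candidate list before and after the cut is the isolation step.

```go
package main

import (
	"fmt"
	"sort"
)

// Scored is a retrieved chunk with its similarity score (higher is closer).
type Scored struct {
	ID    string
	Score float64
}

// topK keeps chunks scoring at least minScore, best first, at most k of them.
func topK(results []Scored, k int, minScore float64) []Scored {
	kept := make([]Scored, 0, len(results))
	for _, r := range results {
		if r.Score >= minScore {
			kept = append(kept, r)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if len(kept) > k {
		kept = kept[:k]
	}
	return kept
}

func main() {
	candidates := []Scored{{"refunds.md#2", 0.82}, {"faq.md#7", 0.41}, {"refunds.md#1", 0.77}, {"terms.md#9", 0.58}}
	for _, r := range topK(candidates, 3, 0.6) {
		fmt.Printf("%-14s %.2f\n", r.ID, r.Score)
	}
}
```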

· Engineering

Ardan Ultimate AI #05 — The same question with and without RAG

Side-by-side comparison: ask the LLM a domain question with no context, then ask with retrieved context. The without-RAG answer is plausible nonsense. The with-RAG answer is correct. The example that motivates everything else in the course.

· Engineering

Ardan Ultimate AI #03 — Context injection into a prompt

Before RAG and tools, the original way to give an LLM extra information was to inject it into the prompt. The example shows the right way to format injected context and what the LLM does (and doesn't) pay attention to.

· Engineering

Ardan Ultimate AI #02 — LLM-generated embeddings

Hand-crafting vectors stops scaling at about 10 dimensions. LLM-generated embeddings give you a 1024-dim vector that captures semantic meaning. The example shows how to generate them and what they're good for.

· Engineering

OAuth 2.1 + PKCE for a single-page app

PKCE is the load-bearing mitigation against authorization-code interception. The Go implementation is short; the parts every SPA gets wrong are documented here.
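
A sketch of the PKCE core from RFC 7636 using only the standard library: a 32-byte random `code_verifier` and the S256 `code_challenge` derived from it. In an SPA the client half runs in the browser; `verifyPKCE` is the check the authorization server performs at the token endpoint. Function names are illustrative.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newVerifier returns a code_verifier: 32 random bytes as unpadded base64url,
// which is 43 characters (RFC 7636 allows 43 to 128).
func newVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// challengeS256 computes code_challenge = BASE64URL(SHA256(code_verifier)).
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// verifyPKCE recomputes the challenge from the verifier sent with the token
// request and compares it to the challenge stored with the authorization code.
func verifyPKCE(verifier, storedChallenge string) bool {
	return challengeS256(verifier) == storedChallenge
}

func main() {
	v, err := newVerifier()
	if err != nil {
		panic(err)
	}
	fmt.Println("code_verifier:", v)
	fmt.Println("code_challenge:", challengeS256(v), "code_challenge_method: S256")
}
```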

· Engineering

WebAuthn passkeys in Go with crypto/ed25519

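Passkeys are FIDO2 credentials; WebAuthn is the browser-facing half of the FIDO2 spec; Ed25519 is one of the signature algorithms it allows (ES256 is the most widely supported). The full registration + assertion flow in 200 lines of stdlib Go.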

· Engineering

RFC 8693 token exchange — the nurse Alice scenario

Dual-identity tokens for the agent → MCP server → upstream API chain. Subject stays the user; Actor identifies the agent acting on the user's behalf. Walked through with a worked clinical example.

· Engineering

Brownlow — zero-trust voting on Cloud Run during live AFL broadcasts

100K+ votes, 10K+ concurrent users during a live AFL Brownlow Medal broadcast. The architecture: Go on Cloud Run, GraphQL + gRPC behind a CDN, vote integrity through Cloud KMS + Security Command Center. Notes on what makes a live-broadcast load shape unusual.

· Engineering

Mapping a multi-agent platform to the GCP PCSE blueprint

Every Professional Cloud Security Engineer exam bullet, mapped to a file path in an RBI FREE-AI aligned Go platform. Where the implementation matches, where the analog substitutes, and where the honest gaps are.

· Engineering

Defence in depth for agentic AI — the eleven-layer envelope

The mental model that says no two adjacent layers share a single point of failure for the same class of attack. From TLS to OTel, the eleven layers a customer request crosses before an answer comes back.

· Engineering

AI governance — from credential to codebase

Board policy as a YAML file the risk team owns. Annexure VI as a database query. Every governance recommendation rendered as a file path in a Go repository.

· Engineering

Agentic security in production — the operations playbook

Twelve months of running multi-agent AI in a regulated context. SLIs that matter, the incident runbook, drift detection, continuous adversarial testing, secret rotation, compliance posture as code.

· Engineering

Annexure VI as a query

The RBI FREE-AI incident reporting form, expressed as a Go struct and a Postgres table. Every entry is an auto-generated artefact from the runtime — not a form an operator fills in retrospectively.

· Engineering

Why Go for production agentic AI

Stdlib over libraries, single binary over framework, fail-closed defaults over forgiveness. The boring-on-purpose case for choosing Go to ship a multi-agent system into a regulated environment.

· Engineering

BCP for AI — forced-failure drills

Fallback agents plus a CI step that replaces the primary agent with one that always errors. If the fallback doesn't produce a usable answer, the PR can't merge.

· Engineering

Sovereign AI is a policy, not a slide

Classification → provider allowlist. A pii-classified message can only reach a provider whose region is in the allowlist for pii. Sovereignty as a runtime gate, not a checkbox.
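
A minimal sketch of the gate, failing closed on unknown classifications. The classifications, region names, and `Provider` type are illustrative assumptions, not the post's actual configuration.

```go
package main

import "fmt"

// Provider is a hypothetical LLM endpoint and the region it runs in.
type Provider struct {
	Name   string
	Region string
}

// allowedRegions maps a data classification to the regions it may reach.
var allowedRegions = map[string]map[string]bool{
	"public": {"asia-south1": true, "us-central1": true},
	"pii":    {"asia-south1": true},
}

// mayRoute refuses an unknown classification, or a provider region not on
// that classification's allowlist.
func mayRoute(classification string, p Provider) bool {
	regions, ok := allowedRegions[classification]
	if !ok {
		return false // fail closed
	}
	return regions[p.Region]
}

func main() {
	inCountry := Provider{Name: "in-country", Region: "asia-south1"}
	offshore := Provider{Name: "offshore", Region: "us-central1"}
	fmt.Println(mayRoute("pii", inCountry), mayRoute("pii", offshore))
}
```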

· Engineering

NPCI rail routing with human-in-the-loop

UPI, IMPS, NEFT, RTGS — which rail to use depends on amount, urgency, window, success-rate history. A deterministic chooser with a HITL gate above ₹2 lakh.

· Engineering

Policy as code, without the risk team having to ship code

A tiny CEL-style DSL plus a board-approved YAML file. The risk team adds a governance rule by editing a config file; engineering ships the rule by restarting the service.

· Engineering

Deterministic KYC, the LLM just talks

PAN check-digit validation, Aadhaar offline KYC, DigiLocker, PEP/sanctions — all in Go code, not in a prompt. The LLM's job is to translate the verdict into something a human can read.

· Engineering

Ardan Ultimate AI #30 — PDF extraction with Docling + LLM

PDFs are the format that breaks every RAG pipeline. Docling is IBM Research's open-source extractor that handles layout, tables, and figures. The example wires Docling + LLM to make PDFs usable.

· Engineering

Ardan Ultimate AI #25 — Poisoned-document attacks on RAG and defenses

A RAG pipeline that ingests user-supplied documents is a prompt-injection vector. An attacker uploads a document with hidden instructions; the LLM retrieves it and follows them. Defense: input filtering, content classification, output verification.

· Engineering

Four orchestration patterns in MAF — and when to pick each

Sequential, Concurrent, Handoff, and Custom WorkflowBuilder. Four shapes the Microsoft Agent Framework ships out of the box. Each one is the right answer to a different question.

· Engineering

The 12-chapter reference architecture, in 102 Python files

Microsoft published a 12-chapter reference architecture for multi-agent systems and a separate framework (MAF) to build them. I implemented one on top of the other and learned what each chapter actually demands in code.

· Cloud architects + medical AI engineers

Running a HIPAA-Aware Multi-Agent Medical AI on GKE: A Field Map

Google's GKE AI infrastructure docs list ~40 integrations. Here's a field map of which ones actually matter when the workload is a HIPAA-aware multi-agent medical AI, and where the gaps sit.

· AIGP candidates + AI governance practitioners

Studying for the AIGP? Here's a Reference Implementation in Go

Studying for the IAPP AI Governance Professional credential? Here's an open-source Go codebase that demonstrates ~70% of the body of knowledge in working code.

· Backend engineering + security

PostgreSQL Row-Level Security Is HIPAA Defense in Depth

PostgreSQL row-level security as HIPAA defence in depth. Why fail-open application filtering isn't enough, and how 'append-only at DB GRANTs' carries more of the §164.312(b) burden than people realise.

· Policy + engineering

The 21st Century Cures Act, Expressed in Go

The 21st Century Cures Act §3060 CDS carve-out criterion 4 expressed as a code-level queue, lossless on reject, with audit-recorded reviewer rationale. Build it once, satisfy GDPR Article 22 for free.

· ML engineers, AI medicine

Moving Diagnostic Accuracy 42.9% → 85.7% by Changing Two Files

How a single sprint of specialty-rule work — guided by a benchmark that wasn't afraid to print embarrassing numbers — turned a 'demo respiratory differential' into a five-condition rule-based diagnostic engine.

· Engineering

The 57% number — how we cut the Tata Group BigQuery bill in half

₹100 Cr / ~$12M in proven savings across a year-plus engagement. The four levers that did the heavy lifting, the lever I expected to win that didn't, and the post-engagement playbook that became a Searce managed service.

· Engineering

Optimus — a Gemini-powered BigQuery anti-pattern detector that paid for itself in a week

We built a small Go + Python service that parses a project's INFORMATION_SCHEMA, asks Gemini to classify each top-spending query against a catalog of anti-patterns, and recommends a rewrite. It is not a magic box; it is a pipeline that cuts the human review time per query from 20 minutes to 90 seconds.

· Engineering

The Spanner Migration Tool — a contributor's reading map

Notes from contributing to Google's open-source Spanner Migration Tool (HarbourBridge). Where to start reading the codebase, where the load-bearing logic lives, and the parts that look simple but aren't.

· Engineering

Spanner interleaved tables — when and when not

Interleaving a child table into its parent co-locates the rows for fast joins. It also tightens coupling in ways that bite you on the next schema migration. A practitioner's decision matrix.

· Engineering

CDC for minimal-downtime Spanner migration — Datastream + Pub/Sub + Dataflow

A bulk migration takes hours; the application can't be offline that long. CDC keeps the source and destination in sync while the bulk runs, and a quick cutover swaps traffic. The handoff between bulk and CDC is where most migrations go wrong.

· Engineering

Globe — running a 30K+ TPS transaction platform on Kubernetes

The transaction engine had to absorb 30K+ TPS across partner integrations, never lose a transaction, and survive partial failures. The architecture: Go, Kafka, Pub/Sub, Redis, K8s, with idempotency at every layer.

· Engineering

Five MAF orchestration shapes — adding Group Chat and Magentic

The first ten posts treated MAF as having four orchestration patterns. The official docs say five. Here are the two I missed — Group Chat and Magentic — with the API surface, when to pick each, and the test path that catches them at build time.

· All engineers

Lessons from Converting 18 Agents in 90 Days

The patterns that worked, the traps we fell into, and what we'd do differently. Real numbers: 18 agents, 90 days, 5 governance policies, 4 provider swaps.

· Engineering

Refactor: from rolled-our-own to MAF-native

I built memory, communication, security, governance, and evals from scratch first. Then I deleted most of it and used the official MAF packages.

· Engineering

Ollama as the default LLM for enterprise-shaped systems

PROVIDER=ollama, granite4.1:3b, zero API keys, no Azure account. How to make a multi-agent project that demonstrates enterprise patterns run end-to-end on a laptop in 90 seconds.

· DevOps + platform engineers

Provider Abstraction: From Gemini-Only to Swappable LLMs

How to port ADK's hard-coded model choices to MAF's provider factory pattern.

Zero-code provider swaps: Ollama (dev), OpenAI (staging), Azure Foundry (prod). Same agents, different models.

· Engineering

Governance with the Agent Governance Toolkit

OWASP Agentic Top 10 coverage with YAML policy files, two API surfaces (one-line wrapper and programmatic evaluator), and a metric bridge that shows policy denials in Grafana.

· Governance + backend engineers

Tool Wrapping: From ADK Functions to MAF Governed Tools

From ADK functions to MAF governed tools: how to port tools and add policy enforcement, DLP, approval gates, and OPA integration.

· Backend + ML engineers

Token Exchange Patterns: Porting Multi-Turn State from ADK to MAF

Sessions to threads: how MAF conversation threads replace ADK session state, plus token budgeting across agent chains, long-term memory, and conversation audit trails.

· Engineering

A2A: when the workflow IS the broker

The reference architecture distinguishes request-based and message-driven agent communication. For in-process orchestration, MAF…

· Platform architects

The Executor Pattern: ADK→MAF Conversion for Agentic Control Flow

How to port ADK's orchestration callbacks to MAF builders without losing control. The executor pattern: you own the loop.

· Engineering

Memory done right with MAF

AgentSession is short-term memory. MemoryContextProvider + MemoryFileStore is long-term memory. Mem0 is long-term memory when you want it hosted.

· Software architects + platform engineers

Why We Migrated from Google ADK to Microsoft MARA

The philosophy, trade-offs, and what we learned converting 18+ agents in 3 months. Provider abstraction as the foundation for portable agents.