The Right-to-Explanation Handler: GDPR Article 22 as a Go HTTP Endpoint
How a 200-line Go handler turns an audit log and an eval store into a regulator-friendly answer to
Posts about go. ← All posts
How a 200-line Go handler turns an audit log and an eval store into a regulator-friendly answer to
Request → N-eyes approve → window-of-time → automatic expiry, with every transition written to a hash-chained audit log. The package that closes Gap #1 from the PCSE map.
Five interfaces hold the whole platform together. The 30-line orchestrator closure that makes the rest of the architecture testable, auditable, and safe to evolve.
PostgreSQL row-level security as HIPAA defence in depth. Why fail-open application filtering isn't enough, and how 'append-only at DB GRANTs' carries more of the §164.312(b) burden than people realise.
The 21st Century Cures Act §3060 CDS carve-out criterion 4 expressed as a code-level queue, lossless on reject, with audit-recorded reviewer rationale. Build it once, satisfy GDPR Article 22 for free.
How a single sprint of specialty-rule work — guided by a benchmark that wasn't afraid to print embarrassing numbers — turned a 'demo respiratory differential' into a five-condition rule-based diagnostic engine.
What HIPAA looks like when you express it as Go interfaces — governance policies, append-only audit at DB GRANTs, PHI redaction at the logger seam, and HITL as the §3060 CDS carve-out criterion 4.
We built a small Go + Python service that parses a project's INFORMATION_SCHEMA, asks Gemini to classify each top-spending query against a catalog of anti-patterns, and recommends a rewrite. It is not a magic box; it is a pipeline that cuts the human review time per query from 20 minutes to 90 seconds.
Notes from contributing to Google's open-source Spanner Migration Tool (HarbourBridge). Where to start reading the codebase, where the load-bearing logic lives, and the parts that look simple but aren't.
Spanner partitions by primary-key range. A monotonically-increasing PK like a timestamp or UUID-v1 funnels all writes to one server. The fix changes everything from your sequence strategy to your tenant model.
The Picnic social platform served 1M+ users across a graph of Go microservices behind a GraphQL gateway. The latency win came from a counter-intuitive move: fewer services, tighter contracts.
Test coverage and observability are the boring infrastructure that makes the interesting changes safe. Notes on how the Picnic team built both, and the on-call experience they enabled.
The transaction engine had to absorb 30K+ TPS across partner integrations, never lose a transaction, and survive partial failures. The architecture: Go, Kafka, Pub/Sub, Redis, K8s, with idempotency at every layer.
A single layer of idempotency will eventually fail. Three independent layers gives you a margin. Here is the pattern that worked across ingest, worker, and emit boundaries.
Status-code-based dispatch made every worker grow a longer and longer switch. Normalising every partner-specific error into an enumerated set let the orchestration logic stop changing as new partners landed.
5K+ loans per month. Three credit bureaus. Multiple payment gateways. The thing that has to be right is the ledger. Notes on what invariants the database enforces vs what the application enforces.
100K+ votes, 10K+ concurrent users during a live AFL Brownlow Medal broadcast. The architecture: Go on Cloud Run, GraphQL + gRPC behind a CDN, vote integrity through Cloud KMS + Security Command Center. Notes on what makes a live-broadcast load shape unusual.
What it actually takes to build a unified cloud API library — and why "write once, run anywhere" still doesn't quite work, even for the patterns where it almost does.
Every Professional Cloud Security Engineer exam bullet, mapped to a file path in an RBI FREE-AI aligned Go platform. Where the implementation matches, where the analog substitutes, and where the honest gaps are.
Stdlib over libraries, single binary over framework, fail-closed defaults over forgiveness. The boring-on-purpose case for choosing Go to ship a multi-agent system into a regulated environment.
Microsoft's Multi-Agent Reference Architecture in Go. Protocol, registry, bus, governance, orchestration, observability, evaluation — and how the seven hold each other up.
The course wrap-up: a Jupyter notebook driven by Go, using GoMLX for tensor ops and GoNB as the kernel. Showed me how to do exploratory Go AI work in the same shape data scientists already use.
A complete chat application: Go backend with RAG, React frontend, single binary. Showed me how to ship a full-stack AI demo without a separate frontend deployment.
Cursor / Claude Code in 600 lines of Go. The agent has read/write/search tools over a project directory and a loop that lets it iterate on its own work.
PDFs are the format that breaks every RAG pipeline. Docling is the IBM-research extractor that handles layout, tables, and figures. The example wires Docling + LLM to make PDFs usable.
Transcribe a video, chunk by timestamp, embed each chunk, RAG-style chat over the result. The shape that powers "ask questions about this meeting recording."
Generate a text description of an image with a vision LLM, embed the description, store in pgvector. Search becomes "find images that match this query" — works surprisingly well.
An agent that can call tools to call tools can drift indefinitely. The escalation budget caps depth and cost; the audit trail records every step so you can replay what the agent did.
An LLM that controls the output can embed malicious HTML, exfiltrate data via crafted links, or inject prompt-stealing payloads. Sanitisation is the defense; the example shows what to allow and what to strip.
A RAG pipeline that ingests user-supplied documents is a prompt-injection vector. An attacker uploads a document with hidden instructions; the LLM retrieves it and follows them. Defense: input filtering, content classification, output verification.
Giving an LLM a `run_command` tool is convenient and terrifying. The hardened version: allow-listed binaries, argument scrubbing, RBAC per user, audit per invocation.
The single biggest LLM security risk. The example walks through both forms (direct from user input, indirect via retrieved content) and the layered defenses: system prompt isolation, content classification, output validation, structured tool schemas.
Most queries are simple. A cascading router tries a small/fast/cheap model first; if confidence is low or the task is hard, it escalates to a larger one. Costs collapse without hurting quality.
Not every question needs retrieval. A classifier gates RAG: chat or general knowledge questions skip it; factual or document-grounded questions trigger it. Saves latency and tokens on the simple half of queries.
Exact-match caching misses paraphrases. "What is the refund policy?" and "How do refunds work?" should both hit the same cached answer. Semantic cache embeds queries and matches by similarity.
Run a small draft model to predict several tokens at once; verify them in a single pass with the large model. Latency drops without quality dropping. The technique production LLM serving uses but most application engineers don't see.
A long chat reprocesses the entire history on every turn. Prefix caching lets the LLM serve the cached KV-cache prefix from the previous turn and only compute the new suffix. Massive latency win on long conversations.
Model Context Protocol standardises tool calling across LLMs. The example builds both sides: an MCP server exposing tools, and an agent that calls them. Works the same against any MCP-compatible LLM.
A panicking tool kills the agent loop. A slow tool blocks the loop forever. The example shows the boring-but-essential wrappers: recover, deadlines, structured errors.
Give an LLM a SQL tool, watch it write delete statements. The read-only version: parse the generated SQL, refuse anything that isn't SELECT, validate against an allow-listed schema, run with a strict timeout.
Stream the agent's reasoning and tool calls to the UI as they happen. The user sees "thinking about X, calling tool Y, got result Z, now answering..." — dramatically better UX than waiting for the final answer.
The smallest possible multi-tool agent. The loop is 30 lines of Go and shows exactly what an "agent" is — there's no magic, just a structured back-and-forth between the LLM and a set of tools until the model says stop.
The tool-calling dance: the LLM emits a structured tool call → application runs the tool → application appends the result → the LLM uses it in the next turn. Two phases. Everything else is detail.
A simple RAG pipeline embeds documents one at a time. The performant version batches the embeddings, parallelises the chunks, and caches the responses. Throughput goes up 5-10×.
Tie all the RAG pieces together into one interactive REPL. Type a question, see the retrieval, see the answer, ask follow-ups. The shape of every "chat with your docs" demo.
When RAG gives wrong answers, the problem is usually retrieval, not the LLM. The example isolates the retrieval step so you can see exactly what chunks come back for a given query, with what scores, and tune K and the similarity threshold accordingly.
Ingest → embed → store → retrieve → answer. The full pipeline applied to Bill Kennedy's Go notebook. The result: a system that answers "how do channels work?" with quotes from the source material.
The ingestion step that turns a corpus into a vector database. Chunk the source, embed each chunk, store with metadata. The pre-work without which RAG is impossible.
pgvector adds vector similarity to Postgres. The example shows the schema, the indexes, the query, and what an ANN index buys you over a brute-force scan.
Side-by-side comparison: ask the LLM a domain question with no context, then ask with retrieved context. The without-RAG answer is plausible nonsense. The with-RAG answer is correct. The example that motivates everything else in the course.
Token-by-token streaming over Server-Sent Events. The Go HTTP handler is short; the UX win is huge. The pattern every chat app needs.
Before RAG and tools, the original way to give an LLM extra information was to inject it into the prompt. The example shows the right way to format injected context and what the LLM does (and doesn't) pay attention to.
Hand-crafting vectors stops scaling at about 10 dimensions. LLM-generated embeddings give you a 1024-dim vector that captures semantic meaning. The example shows how to generate them and what they're good for.
The foundation. Build vectors by hand for a few words, compute cosine similarity, see why "cat" and "dog" come out closer than "cat" and "car." Demystifies everything that comes after.
HS256 JWT issue + verify + audience check + expiry in pure stdlib. Why pulling a third-party JWT library is the wrong call for security-critical code.
Symmetric vs asymmetric JWT signing. The choice changes what fails when a key leaks, who can verify, and how rotation works.
PKCE is the load-bearing mitigation against authorization-code interception. The Go implementation is short; the parts every SPA gets wrong are documented here.
The flow where the device has no browser. User authenticates on their phone; the device polls until they're done. Implementation patterns in Go from the Genie reference.
Passkeys are FIDO2; FIDO2 is the spec; Ed25519 is the signature algorithm. The full registration + assertion flow in 200 lines of stdlib Go.
Dual-identity tokens for the agent → MCP server → upstream API chain. Subject stays the user; Actor identifies the agent acting on the user's behalf. Walked through with a worked clinical example.
Many banks have a SAML IdP they want you to federate against. The verify path is the boring-but-load-bearing piece. Notes on the stdlib-mostly Go implementation.
Two signals do most of the work for detecting compromised sessions: impossible travel between consecutive logins, and credential-stuffing density across an IP range. The Go implementation.
Anthropic's A2A spec standardises how agents talk to other agents (not just tools). The Go client is small; the conceptual shift is what matters.
A saga is fine when every step succeeds. The interesting code is what runs when step 3 of 5 fails and you have to undo 1 and 2 in the right order. The patterns I use.
Postgres over the latest vector DB. Go stdlib over the framework du jour. Single binary over Kubernetes operator. The choices that bore reviewers and delight on-call engineers.
Go's embed.FS bundles files into the binary at compile time. The pattern collapses what would be a multi-artefact deploy into one binary. Three places it pays back daily.
GOMEMLIMIT tells the Go runtime to keep memory below a soft cap by running GC harder when it's close. For containers with hard memory limits, this prevents OOM kills. The setting every Go service in K8s should have.
Go 1.21 added structured logging to the stdlib (slog). For a codebase with three or four logging-library generations layered on top of each other, the migration is a productive afternoon.
An enterprise customer wants you on AWS; the next one wants you on GCP. The provider router pattern that keeps the agent code identical and swaps only the LLM endpoint.
Patterns I confidently recommended five years ago that I'd argue against today. The list of "things you used to do in Go that don't pay back anymore."
Range-over-function landed in Go 1.23. `iter.Seq` lets you write iterators that compose. The patterns that pay back; the ones that don't.
Fan out to N agents; first error cancels the rest; collect successful results. errgroup is the right tool for this; the patterns are concise but worth getting exactly right.
An honest retrospective on the open-source Genie project after a year. The patterns that held up; the ones we rebuilt; the code we deleted because it solved problems we didn't actually have.