Running a HIPAA-Aware Multi-Agent Medical AI on GKE: A Field Map

Google's "AI/ML orchestration on GKE" documentation lists ~40 integrations. Here's which ones actually matter when the workload is a multi-agent clinical AI platform — and where the gaps sit.

May 26, 2026 · Cloud architects + medical AI engineers

GKE · GCP · Multi-Agent · Cloud Architecture

Why I went looking

I’ve been building Bodh, an open-source medical multi-agent AI platform in Go that implements Microsoft’s Multi-Agent Reference Architecture with a HIPAA-aware governance composer, append-only audit log, HITL gate aligned to the 21st Century Cures Act §3060 carve-out, PostgreSQL with tenant row-level security, and an MCP client.

The “deploy this on a real cloud” question keeps coming up — both for my own portfolio and for the kind of conversations recruiters keep starting with me. Google Cloud’s GKE AI infrastructure integrations page lists about 40 distinct integrations across training, inference, storage, observability, and agentic patterns. I mapped every one of them against Bodh’s existing surface to figure out which ones genuinely help, which ones are noise for this workload, and where Bodh would need to grow.

Below is what fell out.

What GKE genuinely solves for a multi-agent medical AI

1. Pod-isolated agent execution — Agent Sandbox

Bodh runs untrusted-ish content all the time: HL7 v2 messages from EHR vendors, FHIR resources from partner hospitals, RAG retrievals from a knowledge corpus, and (most worryingly) tool calls from LLM-backed diagnostic agents that may execute shell-like behaviour in MCP tools.

GKE’s Agent Sandbox runs containerised agent code in Linux User Namespaces with a sidecar controller. This is the right shape for two specific Bodh use cases:

MCP tool execution: When a vendor MCP server exposes shell-adjacent tools (lab feeds, imaging AI, RxNorm lookup), running them in a User-Namespaced sidecar gives you a contained blast radius if the tool is compromised.
LLM-backed diagnosticians that use tool calling: LLMDiagnosticianTools in Bodh today calls internal Go tools. After the MCP client merged, those calls go out over the network — but if you ever want an in-cluster MCP server, the sandbox is what lets you run it without giving it broad node access.

The trade-off: Agent Sandbox needs Autopilot allowlist exemptions because User Namespaces are elevated. Worth it for the AI workload class; might not be worth it for everything else in your cluster.

2. Job queuing across tenants — Kueue

This is the closest off-the-shelf analogue to Bodh’s per-tenant work queues. Kueue gives you:

ClusterQueues that enforce aggregate compute limits across the whole cluster
LocalQueues per tenant namespace
Cohort sharing — Team A can borrow Team B’s idle GPU quota with explicit policy

For a Bodh deployment serving multiple hospitals, the cohort pattern maps directly onto a scenario like this: Hospital A is quiet at 2 AM local time while Hospital B is overflowing its on-call quota, so A’s slack flows to B with audit-logged borrow tickets. Bodh doesn’t have this today; its tasks.Queue is in-process per binary. Kueue moves that queueing to the cluster level, across tenant namespaces.
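To make the borrowing idea concrete, here is a minimal Go sketch of the admission arithmetic behind a cohort. It is a toy model, not Kueue's API: the `Queue` type and `Admit` function are hypothetical names, and it ignores the per-queue borrowing and lending limits that Kueue lets you configure.

```go
package main

import "fmt"

// Queue is a hypothetical stand-in for one tenant's quota inside a cohort.
// Kueue itself expresses this with ClusterQueue resources, not Go structs.
type Queue struct {
	Name    string
	Nominal int // GPUs guaranteed to this tenant
	Used    int // GPUs currently in use
}

// Admit reports whether `want` more GPUs fit for queue `name`, and how many
// GPUs that queue would then hold beyond its own nominal quota (borrowed).
func Admit(cohort []Queue, name string, want int) (ok bool, borrowed int) {
	var self *Queue
	totalNominal, totalUsed := 0, 0
	for i := range cohort {
		totalNominal += cohort[i].Nominal
		totalUsed += cohort[i].Used
		if cohort[i].Name == name {
			self = &cohort[i]
		}
	}
	// Reject unknown queues and requests larger than the cohort's free quota.
	if self == nil || want > totalNominal-totalUsed {
		return false, 0
	}
	over := self.Used + want - self.Nominal
	if over < 0 {
		over = 0
	}
	return true, over
}

func main() {
	cohort := []Queue{
		{Name: "hospital-a", Nominal: 4, Used: 0}, // quiet at 2 AM
		{Name: "hospital-b", Nominal: 4, Used: 4}, // at its own limit
	}
	ok, borrowed := Admit(cohort, "hospital-b", 2)
	fmt.Printf("admit=%v borrowed=%d\n", ok, borrowed) // admit=true borrowed=2
}
```

In a real deployment the `borrowed` figure is the number you would want in the audit trail, since it records that one tenant ran on another tenant's slack.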

3. Multi-pod job coordination — JobSet

Bodh’s diagnostic flow is sequential: intake → questioner → test_planner → cost_guardian → diagnostician → reasoning_verifier. Today this flows through an in-memory bus inside one Go binary. If you split agents into separate pods (which you eventually do, for scaling and blast-radius reasons), you need a coordinator that handles pod ordering, retry on failure, and clean teardown.

JobSet is that coordinator. Each diagnostic case becomes a JobSet with one job per agent, ordered by dependsOn. The MARA orchestrator stays in one pod (it’s the controller); the worker agents fan out. The audit log captures every JobSet step.
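As a rough illustration of the ordering, the Go sketch below turns the agent pipeline into a chain where each step waits for the previous one. The `Step` type and `Chain` function are hypothetical and only model the dependency shape; they are not the JobSet API types.

```go
package main

import "fmt"

// Step is a hypothetical, simplified view of one job in a JobSet:
// its name plus the step it must wait for ("" for the first step).
type Step struct {
	Name      string
	DependsOn string
}

// Chain turns an ordered list of agent names into steps where each
// step depends on its immediate predecessor.
func Chain(agents []string) []Step {
	steps := make([]Step, 0, len(agents))
	for i, a := range agents {
		s := Step{Name: a}
		if i > 0 {
			s.DependsOn = agents[i-1]
		}
		steps = append(steps, s)
	}
	return steps
}

func main() {
	pipeline := []string{
		"intake", "questioner", "test_planner",
		"cost_guardian", "diagnostician", "reasoning_verifier",
	}
	for _, s := range Chain(pipeline) {
		if s.DependsOn == "" {
			fmt.Printf("%s (no dependency)\n", s.Name)
			continue
		}
		fmt.Printf("%s waits for %s\n", s.Name, s.DependsOn)
	}
}
```

One practical wrinkle when translating this to manifests: Kubernetes object names cannot contain underscores, so agent names like test_planner would need a hyphenated form.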

4. Hyperdisk ML + Cloud Storage FUSE for model weights

Bodh today uses vendor LLMs (Anthropic, OpenAI, Ollama) via API. Zero local model weights. But the moment you switch to self-hosted Gemma, Llama, or a fine-tuned clinical model, model-loading latency becomes the dominant cost on cold start.

Hyperdisk ML mounted as a block volume, pre-populated via Volume Populator from GCS, gives you a ~4× model-loading speedup vs Cloud Storage FUSE
Cloud Storage FUSE is fine for the RAG corpus (15-doc shipped in Bodh today; production would be larger) — read-heavy, latency-tolerant, no need for block-level performance
Local SSD for diagnostic case checkpoints during long-running multi-round Reflexion or Debate inferences

These three together give Bodh a real story for self-hosted models without writing custom storage glue.

5. Cloud Audit Logs + Managed Prometheus + OpenTelemetry — the audit substrate

Bodh’s audit pipeline (PR #12 wired §164.312(b) end-to-end) emits audit.Event records. Today they land in a Postgres audit_events table whose DB grants allow only SELECT and INSERT (no UPDATE or DELETE). On GKE, that pipe extends naturally:

Cloud Audit Logs captures every Kubernetes API call (pod creation, GPU allocation, secret access) — the infrastructure-layer audit
Application logs (Bodh’s audit.Event records) flow to Cloud Logging via stdout
Managed OpenTelemetry captures the LLMTrace records Bodh already emits (PR #7 added the OTLP exporter behind the otel build tag)
DCGM captures GPU metrics if/when self-hosted models run

The combined picture: every kubectl call, every pod schedule, every LLM call, every governance decision, and every HITL decision is queryable in one place. That’s the HIPAA §164.312(b) story end-to-end.
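The stdout leg of that pipe can be sketched in a few lines of Go. The `Event` struct here is a hypothetical four-field stand-in (Bodh's real audit.Event has 17 fields that are not listed in this post), and `Emit` and `Fields` are illustrative names. The sketch assumes the usual GKE behaviour of ingesting one JSON object per stdout line as a structured log entry.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"os"
)

// Event is a hypothetical, trimmed-down audit record. Note what is absent:
// no message content, narrative, or diagnosis text.
type Event struct {
	TenantID string
	Actor    string
	Action   string
	Outcome  string
}

// Emit writes one event as a single JSON line to w.
func Emit(w io.Writer, e Event) {
	logger := slog.New(slog.NewJSONHandler(w, nil))
	logger.Info("audit",
		"tenant_id", e.TenantID,
		"actor", e.Actor,
		"action", e.Action,
		"outcome", e.Outcome,
	)
}

// Fields emits into a buffer and decodes the line back into a map,
// which is handy for checking exactly which keys leave the process.
func Fields(e Event) (map[string]any, error) {
	var buf bytes.Buffer
	Emit(&buf, e)
	fields := map[string]any{}
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	e := Event{TenantID: "hospital-a", Actor: "human_review", Action: "hitl.approve", Outcome: "ok"}
	Emit(os.Stdout, e) // one JSON object per line on stdout
	fields, err := Fields(e)
	fmt.Println(len(fields), err)
}
```

Decoding the emitted line back into a map is a cheap way to assert in tests that no unexpected key ever reaches the log stream.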

6. Workload Identity + Cloud SQL IAM for the Postgres tenant story

Bodh’s PostgreSQL backend (PR #10) enforces tenant isolation through row-level security with a bodh_app DB role that has NOBYPASSRLS. The app.tenant_id setting is plumbed in via set_config per transaction.

On GKE, this composes cleanly:

Each tenant namespace has its own Kubernetes ServiceAccount
Workload Identity binds that K8s SA to a Google Service Account
The Google SA has an IAM binding to Cloud SQL with roles/cloudsql.client (plus roles/cloudsql.instanceUser when IAM database authentication is used) and maps to a tenant-specific DB role
The Cloud SQL Auth Proxy sidecar handles connection authentication; the application connects through localhost

Net effect: a compromised application credential can only authenticate as the workload’s Service Account, which only has Cloud SQL access for the specific tenant, which only has DB access to its own RLS-bounded rows. Three layers of defence stacked.
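The innermost layer, the per-transaction tenant setting, looks roughly like this in Go. This is a sketch, not Bodh's adapter code: `execer`, `SetTenant`, `recorder`, and `run` are hypothetical names, and a fake stands in for the database so the example runs without Postgres. In real code the `execer` would be a `*sql.Tx`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.Tx this sketch needs, so it can run
// against a fake instead of a live database.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// SetTenant scopes the current transaction to one tenant. The third argument
// to set_config (true) makes the setting local to the transaction, so it
// cannot leak to the next user of a pooled connection.
func SetTenant(ctx context.Context, tx execer, tenantID string) error {
	if tenantID == "" {
		return fmt.Errorf("empty tenant id")
	}
	_, err := tx.ExecContext(ctx, "SELECT set_config('app.tenant_id', $1, true)", tenantID)
	return err
}

// recorder is a fake execer that records statements instead of running them.
type recorder struct{ calls []string }

func (r *recorder) ExecContext(_ context.Context, q string, args ...any) (sql.Result, error) {
	r.calls = append(r.calls, fmt.Sprintf("%s %v", q, args))
	return nil, nil
}

// run applies SetTenant against the fake and returns what it recorded.
func run(tenantID string) ([]string, error) {
	r := &recorder{}
	err := SetTenant(context.Background(), r, tenantID)
	return r.calls, err
}

func main() {
	calls, err := run("hospital-a")
	fmt.Println(calls, err)
}
```

Passing the tenant as a bind parameter rather than interpolating it into the SQL string keeps the setting itself from becoming an injection path.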

7. Pod Disruption Budgets for HITL cases that stretch days

Bodh’s HITL gate stores pending reviews in the queue (or Postgres in production). Some reviews — high-priority discharge handoffs, cascading care-plan approvals — may sit in the queue for hours or days waiting for a clinician.

PDBs let you say: “at most 1 pod of this agent group can be voluntarily disrupted at a time.” That’s the difference between Kubernetes opportunistically draining nodes mid-review (dropping the pending review state if it’s only in-memory) and respecting the long tail of clinician availability.

For Bodh, PDBs matter most for human_review agent pods and cmd/care HTTP server pods. The bus and stateless agents can tolerate restarts.

What GKE doesn’t solve for Bodh

The GKE documentation is comprehensive but Vertex-and-self-hosted-model-shaped. Six things it doesn’t address that Bodh either has built-in or has to handle itself:

PHI redaction at the logger seam. Bodh’s pkg/phi.LoggerWrapper redacts Safe Harbor identifiers (SSN, MRN, phone, email, ISO/US date) before they ever reach stdout. Google Cloud has Cloud DLP (now part of Sensitive Data Protection), which is a different shape: it inspects data you send it or point it at, typically data at rest, and does not sit in the pod’s logging path. The first-line defence at the logger seam is Bodh’s responsibility.
Audit event field selection. GKE gives you Cloud Audit Logs (Kubernetes API actions) and Cloud Logging (application logs). What it doesn’t give you is the schema discipline — “enough to reconstruct, never enough to leak.” Bodh’s audit.Event has 17 fields, and the absence of msg.Content / narrative / diagnosis-text / medication-name is the whole point. That discipline is application-layer.
HITL gate as a §3060 CDS carve-out enforcer. Lossless-on-reject, cascading approvals, SLA-aware queueing. GKE’s Kueue gives you compute scheduling; it doesn’t give you the semantic HITL pattern (review-required → reviewer-decides → pre-approved-payload-republishes-or-drops). That’s agents/human_review.go in Bodh.
Specialty diagnostic rules. GKE serves the runtime for models that make diagnostic recommendations. The rules themselves — diagnostician.inferDiagnosis switching on ACS / sepsis / DKA / CHF / CAP with threshold-based logic — live in Bodh’s pkg/medical and agents. GKE doesn’t have an opinion about this.
MCP wire protocol. Google’s Agent Development Kit (ADK), as referenced in the GKE docs, does tool calling that is implicitly MCP-shaped, but the GKE docs don’t expose MCP as an explicit primitive. Bodh’s pkg/mcp (merged PR #15) is wire-compatible with the MCP standard regardless of deployment target.
Right-to-explanation endpoint. GET /explain/{case_id} (Bodh’s PR #14) returns a structured PHI-free explanation per GDPR Article 22(3) + Article 15. GKE doesn’t have an opinion about this either; it’s an application-layer concern that Bodh treats as load-bearing.
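The first item in that list, redaction at the logger seam, is the easiest to show in code. The Go sketch below wraps an io.Writer so every log line is scrubbed before it is written. It is an illustration of the pattern only, not Bodh's pkg/phi implementation: `Redact` and `redactingWriter` are hypothetical names, and the three regular expressions cover far fewer identifier formats than a real Safe Harbor redactor would need.

```go
package main

import (
	"io"
	"log"
	"os"
	"regexp"
)

// patterns are illustrative only; real coverage needs many more formats.
var patterns = []*regexp.Regexp{
	regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),                              // US SSN, dashed form
	regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`), // email address
	regexp.MustCompile(`\b\d{4}-\d{2}-\d{2}\b`),                              // ISO date
}

// Redact replaces every match of every pattern with a fixed placeholder.
func Redact(s string) string {
	for _, p := range patterns {
		s = p.ReplaceAllString(s, "[REDACTED]")
	}
	return s
}

// redactingWriter applies Redact to each Write before passing it on.
type redactingWriter struct{ next io.Writer }

func (w redactingWriter) Write(p []byte) (int, error) {
	if _, err := w.next.Write([]byte(Redact(string(p)))); err != nil {
		return 0, err
	}
	// Report the caller's byte count, as the io.Writer contract expects.
	return len(p), nil
}

func main() {
	logger := log.New(redactingWriter{next: os.Stdout}, "", 0)
	logger.Printf("patient ssn=123-45-6789 contact=jane@example.org dob=1980-02-03")
	// prints: patient ssn=[REDACTED] contact=[REDACTED] dob=[REDACTED]
}
```

This works per Write call, which is fine for log.Logger because it writes one complete line at a time; a writer fed arbitrary chunks could split an identifier across two calls and miss it.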

Reference topology — Bodh on GKE Autopilot

For a single-tenant pilot of Bodh deployed on GKE Autopilot in production-aspiring shape:

GKE Autopilot Cluster (regional, private cluster)
├── Namespace: bodh-prod
│   ├── Deployment: bodh-care (3 replicas)
│   │   ├── HTTP server (cmd/care, port 8088)
│   │   ├── In-process MARA bus (orchestrator + 17+ agents)
│   │   └── Sidecar: Cloud SQL Auth Proxy
│   ├── CronJob: bodh-bench (daily)
│   ├── Service: bodh-care (Internal Load Balancer)
│   └── Ingress: GKE Gateway with Identity-Aware Proxy
│
├── Cloud SQL for PostgreSQL (regional HA)
│   ├── interaction_records / patient_records / audit_events
│   ├── bodh_app role (NOBYPASSRLS), CMEK-encrypted
│   └── Automated backups + WAL archive to GCS Object Lock
│
├── Cloud KMS
│   ├── CMEK for Cloud SQL
│   ├── Envelope keys for payload_ct BYTEA columns (pending PR)
│   └── Service Account: per-tenant audit + retention key
│
├── Secret Manager
│   ├── ANTHROPIC_API_KEY (provider rotation)
│   ├── DB connection string
│   └── HITL reviewer service tokens
│
├── Cloud Logging + Audit Logs
│   ├── Application logs (Bodh audit.Event records, PHI-redacted)
│   ├── Cloud Audit Logs (K8s API actions)
│   ├── Log sink → BigQuery for compliance queries
│   └── Log sink → Cloud Storage with Object Lock for §164.312(b) WORM
│
├── Managed Service for Prometheus + Managed OpenTelemetry
│   ├── LLM call traces (hashed case_id per Bodh PR #7)
│   ├── Per-tenant SLI dashboards
│   └── Alerts: fallback rate, audit write rate, HITL queue depth
│
└── Cloud Storage
    ├── RAG corpus (mounted via GCS Fuse)
    ├── Audit cold storage (Object Lock, 6-year retention)
    └── Bench fixtures (versioned)

For multi-tenant, add:

Kueue with ClusterQueues + LocalQueues per tenant cohort
Namespace-per-tenant with NetworkPolicy default-deny
Workload Identity binding K8s SA → Google SA → Cloud SQL IAM role per tenant
VPC Service Controls perimeter around the data services

Cost: at Autopilot’s billing model, a single-tenant Bodh deployment running 3 replicas of cmd/care with a 2 vCPU / 4 GiB request and a small Cloud SQL backend lands around $300–600/month before LLM-provider API costs. Multi-tenant deployments amortise the control plane fee ($0.10 per cluster-hour, about $73/month) across tenants.

What I’d change in Bodh after reading the GKE docs

Three concrete TODOs surfaced by the mapping:

Add a docs/gke-deployment-recipe.md capturing the topology above, with kubectl-ready manifests. This is the deployable shape of all the work in docs/deployment.md, but GCP-specific.
Wire Cloud SQL IAM into the Postgres adapter. Today pkg/persistence/postgres uses a password DSN. For GKE deployments, swap that for the IAM database authentication path — short-lived tokens, no static credentials. This is a ~1-day PR.
Ship column-level encryption for payload_ct BYTEA columns. The schema (PR #10) is ready; the encryption layer is the deferred next PR. Cloud KMS envelope encryption is the right shape: pgcrypto extension + KMS-wrapped DEKs per row. ~3-5 days. Lifts Bodh from “encryption-ready” to “encryption-enforced,” which is what the breach-notification exclusion for encrypted PHI (§164.404 applies only to “unsecured” PHI as defined in §164.402) actually requires.
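For the third TODO, the envelope pattern itself is small enough to sketch. The Go code below generates a fresh data encryption key (DEK) per row, encrypts the payload with it, and wraps the DEK under a key encryption key (KEK). Two substitutions are assumptions made for the sake of a runnable example: the encryption happens in the application with Go's standard crypto packages rather than pgcrypto, and a local random key stands in for the Cloud KMS call that would wrap and unwrap the DEK. `Row`, `EncryptRow`, and `DecryptRow` are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts with AES-GCM and prepends the random nonce to the output.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal; it fails if the key is wrong or the data was altered.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, fmt.Errorf("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// Row is a hypothetical shape for what would be stored per record.
type Row struct {
	WrappedDEK []byte // DEK encrypted under the KEK
	PayloadCT  []byte // payload encrypted under the DEK
}

// EncryptRow creates a fresh 256-bit DEK, encrypts the payload with it,
// then wraps the DEK with the KEK.
func EncryptRow(kek, payload []byte) (Row, error) {
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		return Row{}, err
	}
	ct, err := seal(dek, payload)
	if err != nil {
		return Row{}, err
	}
	wrapped, err := seal(kek, dek)
	if err != nil {
		return Row{}, err
	}
	return Row{WrappedDEK: wrapped, PayloadCT: ct}, nil
}

// DecryptRow unwraps the DEK with the KEK, then decrypts the payload.
func DecryptRow(kek []byte, r Row) ([]byte, error) {
	dek, err := open(kek, r.WrappedDEK)
	if err != nil {
		return nil, err
	}
	return open(dek, r.PayloadCT)
}

func main() {
	kek := make([]byte, 32) // local stand-in for a key held in Cloud KMS
	if _, err := rand.Read(kek); err != nil {
		panic(err)
	}
	row, err := EncryptRow(kek, []byte("example payload"))
	if err != nil {
		panic(err)
	}
	pt, err := DecryptRow(kek, row)
	fmt.Println(string(pt), err)
}
```

The per-row DEK is what makes key rotation tractable: rotating the KEK means re-wrapping small DEKs, not re-encrypting every payload.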

These aren’t speculative — they’re the next three PRs queued behind the ones already in flight.

Closing

GKE’s AI infrastructure surface is broad. For a multi-agent medical AI like Bodh, only about a third of it is directly applicable today (Agent Sandbox, Kueue, JobSet, the storage tier, Workload Identity, Cloud Audit Logs, OTel). Another third becomes applicable when self-hosted models enter the picture (Hyperdisk ML, NVIDIA NIM, vLLM, GPU operators). The remaining third is for workloads Bodh doesn’t have today (training, batch inference at scale, multi-tenant TPU orchestration).

The bigger lesson: GKE solves the runtime; the application has to solve the discipline. PHI redaction, audit-event field selection, HITL gate semantics, specialty diagnostic rules, MCP wire compatibility, right-to-explanation — these all live in Bodh, and Kubernetes neither helps nor hinders. Where the platform shines is when the discipline is already there and you need a place to run it that survives load, drains gracefully under disruption, and produces a clean forensic trail.

If you’re building clinical AI on GCP, the GKE AI integrations page is genuinely worth reading end-to-end. If you’re building it on AWS or Azure, the same patterns translate — Bedrock / Fargate / RDS / KMS on AWS; Container Apps / Postgres Flexible / Azure OpenAI / Key Vault on Azure. The platform is interchangeable; the application discipline isn’t.

Bodh is a research and engineering reference. Not a medical device. Not approved for clinical use. The deployment topology above is the architectural target for a HIPAA-aligned production deployment; the codebase is not under a Business Associate Agreement and has not been audited.