Building a HIPAA-Aware Medical AI Platform in Go: An Architecture & Compliance Deep Dive
How Bodh implements Microsoft's Multi-Agent Reference Architecture for sequential diagnosis, and what HIPAA looks like when it's expressed as Go interfaces instead of Confluence pages.
Why this article exists
Two patterns have been on my mind since Microsoft published The Path to Medical Superintelligence in 2025:
- MAI-DxO + the Sequential Diagnosis Benchmark (SD Bench) — a virtual physician panel that asks targeted questions, orders cost-aware tests, and self-verifies before committing to a diagnosis. Reportedly ~85.5% accuracy on 304 NEJM cases versus ~20% for experienced physicians in their study setup.
- The Multi-Agent Reference Architecture (MARA) — Microsoft’s vendor-neutral blueprint for orchestrators, agents, registries, governance, memory, and observability.
The clinical results are the headline. The architecture is the part you can actually build on. So I built Bodh (Sanskrit: insight) — an open-source Go reference implementation of MARA, tuned for medicine, with HIPAA-aware governance baked in.
This article is a knowledge dump in two halves:
- Part 1 — Architecture. The patterns that make a multi-agent clinical system auditable, swappable, and safe to evolve.
- Part 2 — Compliance. How those same patterns map to HIPAA, the 21st Century Cures Act, and adjacent regimes — with code, not bullet points.
Disclaimer up front: Bodh is a research and engineering reference. Not approved for clinical use. Demo agents use simulated answers and rule-based inference. Thresholds are illustrative, not clinical guidelines.
Part 1 — Architecture
The core insight: a monolithic LLM call is not the right shape for clinical work
A single big prompt cannot be:
- Audited — there’s no per-step record of who decided what
- Cost-capped — the cost is whatever the model decides to spend
- Regression-tested — when a guideline changes, you can’t isolate the affected step
- Stopped mid-flow — there’s no place for a clinician to intervene
- Swapped per step — you can’t use a cheap model for triage and a strong model for diagnosis
A multi-agent system, with the right boundary discipline, can do all five.
The MARA pattern, in one sentence
A user (or API call) publishes a clinical message → the orchestrator delivers it to the right agent → the agent returns zero or more new messages → the orchestrator publishes those → governance gates each hop → the workflow continues until a terminal agent decides we’re done.
That’s it. Every clinically interesting behaviour in Bodh is a consequence of that one loop.
Five interfaces hold the whole platform together
Everything in Bodh — 17 agents, the bus, the registry, the governance composer, the HITL gate — composes from these five contracts:
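```go
// Agent is a unit of clinical logic. It returns messages for the
// bus to route and never calls another agent directly.
type Agent interface {
	ID() string
	Name() string
	Capabilities() []string
	HandleMessage(ctx context.Context, msg Message, env Environment) ([]Message, error)
}

// Bus routes published messages to the handler subscribed for an agent ID.
type Bus interface {
	Subscribe(agentID string, h Handler) (unsubscribe func())
	Publish(ctx context.Context, msg protocol.Message)
}

// Registry handles agent registration and capability-based discovery.
type Registry interface {
	Register(ctx context.Context, a Agent) error
	Get(ctx context.Context, id string) (Agent, error)
	List(ctx context.Context) []Agent
	FindByCapability(ctx context.Context, capability string) []Agent
}

// Policy gates a message before any agent handles it.
type Policy interface {
	Evaluate(ctx context.Context, msg protocol.Message) (PolicyResult, error)
}

// Environment supplies the clock and the logger to agents.
type Environment interface {
	Now() time.Time
	Logf(format string, args ...any)
}
```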
Five interfaces. Each one has an in-memory implementation today and a clear production swap path (Kafka or NATS for Bus, PostgreSQL for the persistence layer behind Environment, an OIDC-aware policy composer for Policy).
Design rule #1: agents never call each other directly. They emit Messages with a To field. The bus routes. The orchestrator runs governance before any HandleMessage. This single rule is what makes the system testable, auditable, and safe to refactor.
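To make the rule concrete, here is a minimal single-file sketch of the loop: a synchronous in-memory router that runs a policy before every delivery and republishes whatever the agent returns. Everything in it (`Orchestrator`, `forwarder`, `requireCaseID`, `runDemo`) is a simplified stand-in written for this article, not Bodh's actual packages.

```go
package main

import (
	"context"
	"fmt"
)

// Message is a simplified stand-in for Bodh's protocol message.
type Message struct {
	ID, From, To, Type string
	Metadata           map[string]any
}

// Agent mirrors the HandleMessage contract, minus Environment.
type Agent interface {
	ID() string
	HandleMessage(ctx context.Context, msg Message) ([]Message, error)
}

// Policy decides whether a message may reach its recipient.
type Policy func(msg Message) (allowed bool, reason string)

// Orchestrator looks up the recipient by msg.To, runs the policy,
// and republishes agent output. Trace keeps one line per hop.
type Orchestrator struct {
	agents map[string]Agent
	policy Policy
	Trace  []string
}

func NewOrchestrator(p Policy, agents ...Agent) *Orchestrator {
	o := &Orchestrator{agents: map[string]Agent{}, policy: p}
	for _, a := range agents {
		o.agents[a.ID()] = a
	}
	return o
}

// Publish delivers msg and, recursively, everything it produces.
func (o *Orchestrator) Publish(ctx context.Context, msg Message) {
	if ok, reason := o.policy(msg); !ok {
		o.Trace = append(o.Trace, fmt.Sprintf("deny %s->%s: %s", msg.From, msg.To, reason))
		return // fail closed: the agent never sees the message
	}
	a, found := o.agents[msg.To]
	if !found {
		o.Trace = append(o.Trace, "drop: unknown recipient "+msg.To)
		return
	}
	o.Trace = append(o.Trace, fmt.Sprintf("deliver %s->%s (%s)", msg.From, msg.To, msg.Type))
	out, err := a.HandleMessage(ctx, msg)
	if err != nil {
		o.Trace = append(o.Trace, "error in "+a.ID()+": "+err.Error())
		return
	}
	for _, m := range out {
		o.Publish(ctx, m)
	}
}

// forwarder is a toy agent that relabels the message and passes it on.
type forwarder struct{ id, next, outType string }

func (f forwarder) ID() string { return f.id }

func (f forwarder) HandleMessage(_ context.Context, msg Message) ([]Message, error) {
	if f.next == "" {
		return nil, nil // terminal agent: emit nothing
	}
	return []Message{{ID: msg.ID + "'", From: f.id, To: f.next, Type: f.outType, Metadata: msg.Metadata}}, nil
}

// requireCaseID denies any message lacking a non-empty case_id.
func requireCaseID(msg Message) (bool, string) {
	if id, _ := msg.Metadata["case_id"].(string); id == "" {
		return false, "case_id required"
	}
	return true, "ok"
}

// runDemo routes one message with a case_id and one without.
func runDemo() []string {
	o := NewOrchestrator(requireCaseID,
		forwarder{id: "intake", next: "supervisor", outType: "case_state"},
		forwarder{id: "supervisor"},
	)
	ctx := context.Background()
	o.Publish(ctx, Message{ID: "m1", From: "user", To: "intake", Type: "presentation",
		Metadata: map[string]any{"case_id": "demo-case-001"}})
	o.Publish(ctx, Message{ID: "m2", From: "user", To: "intake", Type: "presentation"})
	return o.Trace
}

func main() {
	for _, line := range runDemo() {
		fmt.Println(line)
	}
}
```

The second message is denied before `intake` runs, which is the whole point: the agents contain no governance code and cannot be reached around it.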
The orchestrator is the only place where governance runs
Here’s the exact subscription closure that wires each agent to the bus. It’s ~30 lines and it’s the most important code in the platform:
```go
func (o *Orchestrator) subscribeAgentLocked(a agent.Agent) {
	agentID := a.ID()
	o.bus.Subscribe(agentID, func(c context.Context, msg agent.Message) {
		// Apply governance *before* the agent sees the message.
		if o.policy != nil {
			res, err := o.policy.Evaluate(c, msg)
			if err != nil {
				o.env.Logf("policy error for msg %s: %v", msg.ID, err)
				return
			}
			if res.Decision == governance.DecisionDeny {
				o.env.Logf("message %s denied by policy: %s", msg.ID, res.Reason)
				return // ← agent never sees the message
			}
		}
		o.env.Logf("agent %s handling message %s from %s", agentID, msg.ID, msg.From)
		out, err := a.HandleMessage(c, msg, o.env)
		if err != nil {
			o.env.Logf("agent %s error: %v", agentID, err)
			return
		}
		for _, m := range out {
			o.bus.Publish(c, m) // ← one agent's output is another agent's input
		}
	})
}
```
Three properties fall out of this:
- Fail closed. A denied message never reaches an agent. There is no fallback path that bypasses policy.
- Single audit point. Every governance decision and every agent invocation passes through this closure. If you want a HIPAA audit log, you instrument here once, not in 17 agents.
- Pluggability. Want to add multi-tenancy, rate limiting, scope checks, or SUD record segmentation? Add a policy. The orchestrator doesn’t change.
Sequential diagnosis as a state machine, not a prompt
The diagnostic panel is eight agents:
| Agent | Action | Output |
|---|---|---|
| `intake` | Parse presentation | `case_state` |
| `rpm_monitor` | Evaluate remote monitoring alerts | `case_state` |
| `diagnostic_supervisor` | Route the case | (delegates) |
| `questioner` | History + symptoms | `questions_complete` |
| `test_planner` | Order tests from a virtual catalog | `case_state` |
| `cost_guardian` | Check `spent + test_cost <= max_cost` | `tests_complete` or `rework` |
| `diagnostician` | Emit `DiagnosisProposal` with confidence | `diagnosis_proposal` |
| `reasoning_verifier` | Validate rationale; optionally route to HITL | `diagnosis` (final) |
A full case for demo-case-001 (58-year-old male, cough and fever, gold diagnosis = community-acquired pneumonia, CAP) produces this trace:
| Step | From | To | Type | Notes |
|---|---|---|---|---|
| 1 | `user` | `intake` | `presentation` | chief complaint + HPI |
| 2 | `intake` | `rpm_monitor` | `case_state` | no RPM data, passthrough |
| 3 | `rpm_monitor` | `diagnostic_supervisor` | `case_state` | |
| 4 | `supervisor` | `questioner` | `case_state` | start history |
| 5 | `questioner` | `supervisor` | `questions_complete` | 4 Q&A, 3 hypotheses |
| 6 | `supervisor` | `test_planner` | `case_state` | order labs/imaging |
| 7 | `test_planner` | `cost_guardian` | `case_state` | total $220 |
| 8 | `cost_guardian` | `supervisor` | `tests_complete` | budget OK ($220 / $500) |
| 9 | `supervisor` | `diagnostician` | `case_state` | infer |
| 10 | `diagnostician` | `reasoning_verifier` | `diagnosis_proposal` | CAP @ 87% |
| 11 | `reasoning_verifier` | `supervisor` | `diagnosis` | verified |
| 12 | `supervisor` | (none) | (terminal) | logs final + gold match |
Twelve messages. Twelve audit events. Twelve governance evaluations. No hidden state, no inter-agent function calls, no place where a clinician can’t intervene.
Every test has a virtual USD cost ($35 to $850). The Pareto frontier of accuracy vs. spend is the right way to evaluate sequential-diagnosis systems — Microsoft’s blog cites up to 25% of US health spending as low-value or wasted, so the cost dimension isn’t hypothetical.
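The `cost_guardian` check is simple enough to sketch in full. The version below applies `spent + test_cost <= max_cost` order by order and reports either `tests_complete` or `rework`. The test names and prices are made up for illustration (chosen to sum to the $220 in the trace above), and `CheckBudget` is a hypothetical helper, not the agent's real code.

```go
package main

import "fmt"

// TestOrder is a hypothetical catalog entry; names and prices are illustrative.
type TestOrder struct {
	Name    string
	CostUSD float64
}

// CheckBudget applies spent + test_cost <= max_cost to each order in turn.
// It returns the orders that fit, the running total, and "tests_complete"
// when every order fits or "rework" as soon as one would overshoot.
func CheckBudget(spent, maxCost float64, orders []TestOrder) (approved []TestOrder, total float64, status string) {
	total = spent
	for _, o := range orders {
		if total+o.CostUSD > maxCost {
			return approved, total, "rework"
		}
		total += o.CostUSD
		approved = append(approved, o)
	}
	return approved, total, "tests_complete"
}

func main() {
	orders := []TestOrder{{"cbc", 35}, {"chest_xray", 120}, {"procalcitonin", 65}}
	_, total, status := CheckBudget(0, 500, orders)
	fmt.Printf("%s: $%.0f / $500\n", status, total)

	// An $850 test on top of $220 already spent does not fit a $500 budget.
	_, total, status = CheckBudget(total, 500, []TestOrder{{"chest_ct", 850}})
	fmt.Printf("%s: $%.0f / $500\n", status, total)
}
```

Returning `rework` instead of an error keeps the decision inside the message flow: the planner gets another turn to pick a cheaper test, and the attempt still shows up in the audit trail.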
Care services share the same bus
Nine more agents add value-based care workflows on the same platform:
- CDM (Chronic Disease Management) — care plan templates for HTN, DM, CHF, CKD, COPD
- TCM (Transitional Care Management) — discharge → nurse task queue + medication reconciliation
- Panel Management — gap-in-care detection across the patient roster (A1c overdue, BP uncontrolled, annual visit due, CHF weight alert, post-discharge follow-up open)
- Virtual Office — pre-visit prep, inbox triage, refill management, prior auth
- Patient engagement — SMS / email / IVR notifier with adherence tracking
- Readmission tracking — ADT feed + transparent LACE-inspired risk score
These aren’t bolted on. They subscribe to the same bus, share the same PatientRegistry and tasks.Queue, pass through the same governance and audit. Adding a new clinical workflow is “implement Agent, register it, optionally compose a new policy.”
Human-in-the-loop: the 21st Century Cures Act §3060, as code
Under Section 3060 of the Cures Act (21 USC §360j(o)), Clinical Decision Support software can avoid Software-as-a-Medical-Device classification if it meets four criteria. The hardest one — and the one most consequential for AI — is criterion four: the software must enable the clinician to independently review the basis for the recommendations.
Bodh’s human_review agent is that criterion expressed in code.
When HITL=true on an upstream agent, that agent doesn’t address its output directly to the next stage. It wraps it in a review envelope and addresses it to human_review:
| Envelope key | Purpose |
|---|---|
| `hitl_kind` | Review category (`diagnosis_review`, `care_plan_review`, etc.) |
| `hitl_next_to` | Agent to forward to on approval |
| `hitl_next_type` | Message type used on resume |
| `hitl_priority` | low / normal / high / urgent |
| `hitl_subject` | Short label for the queue UI |
| `hitl_sla_minutes` | Deadline; expired reviews never resume |
The gate records a ReviewRequest, emits nothing, and waits. A clinician decides via POST /reviews/{id}/decide with approve | approve_with_modifications | reject plus an optional rationale. On approve, the original payload is republished to its intended recipient. On reject, the decision is logged and the payload is dropped: the audit record survives, but nothing is forwarded. No half-completed clinical action can leak past the gate.
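A rough sketch of that gate, assuming a simple in-memory queue. The type and function names (`Gate`, `Envelope`, `Outbound`, `runGateDemo`) are hypothetical, and the handling of `approve_with_modifications` (forwarding the reviewer's modified payload) is my assumption about the intended behaviour rather than something the description above pins down.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type Decision string

const (
	Approve                  Decision = "approve"
	ApproveWithModifications Decision = "approve_with_modifications"
	Reject                   Decision = "reject"
)

// Envelope carries the original payload plus the resume address.
type Envelope struct {
	Kind, NextTo, NextType, Payload string
	SLA                             time.Duration // hitl_sla_minutes as a duration
}

type ReviewRequest struct {
	Envelope Envelope
	Deadline time.Time
	Decided  bool
}

// Outbound is what the gate republishes after an approval.
type Outbound struct{ To, Type, Payload string }

var (
	ErrUnknown = errors.New("unknown review")
	ErrDecided = errors.New("review already decided")
	ErrExpired = errors.New("review SLA expired")
)

type Gate struct {
	pending map[string]*ReviewRequest
	seq     int
}

func NewGate() *Gate { return &Gate{pending: map[string]*ReviewRequest{}} }

// Submit records the request and emits nothing.
func (g *Gate) Submit(env Envelope, now time.Time) string {
	g.seq++
	id := fmt.Sprintf("rev-%d", g.seq)
	g.pending[id] = &ReviewRequest{Envelope: env, Deadline: now.Add(env.SLA)}
	return id
}

// Decide resolves a review. Only an approval inside the SLA yields an Outbound.
func (g *Gate) Decide(id string, d Decision, modified string, now time.Time) (*Outbound, error) {
	r, ok := g.pending[id]
	switch {
	case !ok:
		return nil, ErrUnknown
	case r.Decided:
		return nil, ErrDecided
	case now.After(r.Deadline):
		return nil, ErrExpired // expired reviews never resume
	}
	r.Decided = true
	switch d {
	case Approve:
		return &Outbound{To: r.Envelope.NextTo, Type: r.Envelope.NextType, Payload: r.Envelope.Payload}, nil
	case ApproveWithModifications:
		return &Outbound{To: r.Envelope.NextTo, Type: r.Envelope.NextType, Payload: modified}, nil
	default:
		return nil, nil // reject (or anything unrecognised): nothing is forwarded
	}
}

// runGateDemo exercises approve, reject, and a late approval.
func runGateDemo() []string {
	g := NewGate()
	t0 := time.Date(2026, 1, 1, 9, 0, 0, 0, time.UTC)
	env := Envelope{Kind: "diagnosis_review", NextTo: "diagnostic_supervisor",
		NextType: "diagnosis", Payload: "CAP @ 87%", SLA: 30 * time.Minute}
	cases := []struct {
		d     Decision
		delay time.Duration
	}{{Approve, 10 * time.Minute}, {Reject, 10 * time.Minute}, {Approve, 45 * time.Minute}}
	var results []string
	for _, c := range cases {
		out, err := g.Decide(g.Submit(env, t0), c.d, "", t0.Add(c.delay))
		switch {
		case err != nil:
			results = append(results, "error: "+err.Error())
		case out == nil:
			results = append(results, "dropped")
		default:
			results = append(results, "forward "+out.To+"/"+out.Type)
		}
	}
	return results
}

func main() {
	for _, r := range runGateDemo() {
		fmt.Println(r)
	}
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the gate, mirrors the `Environment.Now()` seam and makes SLA expiry testable without sleeping.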
Seven agents are gated when HITL is on:
| Agent | Gated output | Review kind |
|---|---|---|
| `reasoning_verifier` | Verified diagnosis | `diagnosis_review` |
| `cdm_planner` | Care plan | `care_plan_review` |
| `tcm_coordinator` | TCM welcome message | `tcm_welcome_review` |
| `panel_manager` | Per-gap outreach | `outreach_review` |
| `refill_manager` | Refill response | `refill_review` |
| `prior_auth` | PA status | `prior_auth_review` |
| `readmission_tracker` | Discharge handoff (priority scales with risk band) | `discharge_review` |
Approvals cascade. A discharge_review approval triggers TCM, which itself emits a tcm_welcome_review. The cmd/caredemo auto-approver drains the queue across rounds to demonstrate this; in production a clinician’s UI shows cascaded reviews as they appear.
The fallback contract for LLM agents
Every LLM-backed agent in Bodh has a deterministic rule-based fallback. If the model times out, returns malformed JSON, exceeds latency budget, or trips a safety check, the case still finalizes.
```go
// LLMDiagnostician wraps an llm.Client and a rule-based fallback.
// On any per-call failure, the case still finalises via the fallback.
type LLMDiagnostician struct {
	Client        llm.Client
	Fallback      *DiagnosticianAgent
	MinConfidence float64 // below this, fall back
}
```
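One way the wrapper's decision logic could look, as a sketch: call the model under a timeout, parse its JSON, and fall back on an error, a malformed reply, or low confidence. The one-method `Client` interface, the `Proposal` shape, and the stubs are assumptions made for this example; Bodh's real `llm.Client` is not shown in this article and will differ.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

type Proposal struct {
	Diagnosis  string  `json:"diagnosis"`
	Confidence float64 `json:"confidence"`
	Source     string  `json:"-"` // "llm" or "fallback"
}

// Client is a hypothetical one-method stand-in for llm.Client.
type Client interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

type Diagnostician struct {
	Client        Client
	Fallback      func(presentation string) Proposal
	MinConfidence float64
	Timeout       time.Duration
}

// Diagnose always returns a proposal: from the model when the reply is
// timely, well-formed, and confident enough, otherwise from the fallback.
func (d Diagnostician) Diagnose(ctx context.Context, presentation string) Proposal {
	fallback := func() Proposal {
		p := d.Fallback(presentation)
		p.Source = "fallback"
		return p
	}
	ctx, cancel := context.WithTimeout(ctx, d.Timeout)
	defer cancel()

	raw, err := d.Client.Complete(ctx, presentation)
	if err != nil {
		return fallback() // timeout, transport error, safety trip
	}
	var p Proposal
	if err := json.Unmarshal([]byte(raw), &p); err != nil || p.Diagnosis == "" {
		return fallback() // malformed or empty JSON
	}
	if p.Confidence < d.MinConfidence {
		return fallback()
	}
	p.Source = "llm"
	return p
}

// stubClient returns a canned reply or error; a real client must honour ctx.
type stubClient struct {
	reply string
	err   error
}

func (s stubClient) Complete(context.Context, string) (string, error) { return s.reply, s.err }

var errTimeout = errors.New("model timed out")

// ruleBased is a placeholder for the deterministic diagnostician.
func ruleBased(string) Proposal {
	return Proposal{Diagnosis: "community-acquired pneumonia", Confidence: 0.6}
}

// sourceFor reports which path produced the proposal for a given stub reply.
func sourceFor(reply string, err error) string {
	d := Diagnostician{Client: stubClient{reply, err}, Fallback: ruleBased,
		MinConfidence: 0.7, Timeout: time.Second}
	return d.Diagnose(context.Background(), "58M, cough, fever").Source
}

func main() {
	fmt.Println(sourceFor(`{"diagnosis":"CAP","confidence":0.87}`, nil))
	fmt.Println(sourceFor(`{"diagnosis":"CAP","confidence":0.40}`, nil))
	fmt.Println(sourceFor("{not json", nil))
	fmt.Println(sourceFor("", errTimeout))
}
```

Note that `Diagnose` has no error return. That is deliberate in this sketch: the caller cannot forget to handle a failure mode, because every failure mode already resolves to a proposal tagged with its source.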
This isn’t a workaround. It’s the contract. Failure modes degrade — they don’t break. Provider matrix:
| Provider | Chat | Embeddings | Vision | Key env var |
|---|---|---|---|---|
| Anthropic | yes | (via Voyage) | yes (Sonnet, Opus) | `ANTHROPIC_API_KEY` |
| OpenAI | yes | yes | yes (gpt-4o) | `OPENAI_API_KEY` |
| Ollama (local) | yes | yes | yes (llama3.2-vision, bakllava) | (none) |
| Voyage | no | yes | no | `VOYAGE_API_KEY` |
Prompt caching is on by default for Anthropic and OpenAI — the diagnostic system prompt is marked cacheable so repeated calls within the TTL pay only for incremental tokens. The [llm-trace] log line reports cached_tokens per call so you can see cache hit rates in production.
RAG done with care: vector + BM25
A pure vector retriever loses on rare clinical terms — drug names, ICD codes, acronyms like BNP or LACE. So Bodh’s HybridStore blends vector cosine with Okapi BM25 lexical scoring using reciprocal-rank fusion (k=60). Useful when “BNP > 400” needs to win on the lexical match even if the vector similarity is lukewarm.
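Reciprocal-rank fusion itself is only a few lines. Each document scores the sum of 1/(k + rank) over every ranked list it appears in, with ranks starting at 1. The sketch below is a generic implementation with made-up document IDs, not HybridStore's code.

```go
package main

import (
	"fmt"
	"sort"
)

// FuseRRF merges ranked lists of document IDs with reciprocal-rank fusion:
// score(d) = sum over lists of 1 / (k + rank of d in that list), rank from 1.
// Documents missing from a list simply contribute nothing for that list.
func FuseRRF(k float64, rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range rankings {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	// Illustrative IDs: the vector ranker puts the BNP document second,
	// while the lexical ranker puts it first.
	vector := []string{"chf-overview", "bnp-thresholds", "dyspnea-ddx"}
	bm25 := []string{"bnp-thresholds", "lace-score"}
	fmt.Println(FuseRRF(60, vector, bm25))
}
```

With k = 60, `bnp-thresholds` scores 1/62 + 1/61 and beats `chf-overview` at 1/61 alone, so the lexical hit lifts it to the top. Because RRF uses ranks rather than raw scores, cosine similarities and BM25 scores never need to be normalised onto a common scale.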
Citations are first-class — every RAG-backed DiagnosisProposal carries a Citations[] array with source, title, snippet, and score. The hallucination check in pkg/safety then asserts that at least half of the significant terms in the diagnosis rationale appear in those citations’ snippets — if grounding fails, the agent falls back to deterministic rules.
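A grounding check of that kind might look like the following. What counts as a "significant term" here (lowercased tokens of three or more characters that are not in a tiny stopword list) is my assumption for the sketch; the actual rule in pkg/safety may tokenise differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopwords is a deliberately tiny illustrative list.
var stopwords = map[string]bool{
	"the": true, "and": true, "with": true, "for": true,
	"are": true, "was": true, "this": true, "that": true,
}

// significantTerms lowercases text, splits on non-alphanumerics, and keeps
// each distinct token of length >= 3 that is not a stopword.
func significantTerms(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := map[string]bool{}
	var out []string
	for _, f := range fields {
		if len(f) < 3 || stopwords[f] || seen[f] {
			continue
		}
		seen[f] = true
		out = append(out, f)
	}
	return out
}

// GroundingRatio is the fraction of significant rationale terms that also
// occur in at least one citation snippet.
func GroundingRatio(rationale string, snippets []string) float64 {
	terms := significantTerms(rationale)
	if len(terms) == 0 {
		return 0 // nothing to ground: treat as ungrounded
	}
	corpus := map[string]bool{}
	for _, s := range snippets {
		for _, t := range significantTerms(s) {
			corpus[t] = true
		}
	}
	hits := 0
	for _, t := range terms {
		if corpus[t] {
			hits++
		}
	}
	return float64(hits) / float64(len(terms))
}

// IsGrounded applies the "at least half" threshold.
func IsGrounded(rationale string, snippets []string) bool {
	return GroundingRatio(rationale, snippets) >= 0.5
}

func main() {
	snippets := []string{
		"Community-acquired pneumonia typically presents with fever and cough; chest imaging often shows lobar consolidation.",
	}
	fmt.Println(IsGrounded("Fever, productive cough and lobar consolidation suggest pneumonia.", snippets))
	fmt.Println(IsGrounded("Elevated troponin indicates myocardial infarction.", snippets))
}
```

Term overlap is a blunt instrument: it catches a rationale that has drifted away from its sources, but not one that reuses the right words to say something the sources do not. It is a tripwire for the fallback, not a proof of correctness.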
Part 2 — Compliance, expressed as code
Most HIPAA work I’ve seen is documentation. Policies in Confluence. Risk assessments in spreadsheets. SOC 2 evidence in a folder. The actual code is often surprisingly silent on PHI.
The interesting question is: what does HIPAA look like when you express it as Go interfaces?
The eight PHI handling principles, baked into the architecture
1. Minimum necessary: agents see only the fields they need for their task.
2. Opaque identifiers: `case_id` and `patient_id` are routing tokens, never MRNs or SSNs.
3. Fail closed: governance denies messages missing correlation IDs; a denied message never reaches an agent.
4. Auditable everything: every interaction is a `Message` with a UUID; every governance decision is logged with a reason.
5. No PHI in third-party telemetry: redact `Content` and `Metadata` before any OTel or log export.
6. De-identify for evaluation: CI uses synthetic data only; `gold_diagnosis` is never production PHI.
7. Encrypted at rest and in transit: TLS 1.3 + KMS-backed AES-256-GCM in production.
8. Right to delete: the persistence layer must support point-deletion by `patient_id` (GDPR Article 17, several US state laws).
Three of those (#3, #4, #6) are enforceable at the platform layer. The rest are architectural constraints the code is designed to support.
§164.312 Technical Safeguards — code-level crosswalk
| Standard | Bodh code |
|---|---|
| Audit Controls (R) | pkg/audit — append-only Event records for every message, governance decision, HITL review, and agent error |
| Access Control (R) — unique user ID | Planned — OIDC sub becomes the user ID, persisted in audit |
| Integrity (R) | Append-only audit today; Merkle chain-hash for tamper evidence is planned |
| Person/Entity Auth (R) | Planned — OIDC / SMART on FHIR / mTLS |
| Transmission Security (R) | Planned — TLS 1.3 end-to-end |
Below is what’s actually shipping today, with code.
Governance policies = enforceable contracts at the boundary
These run before any agent’s HandleMessage is invoked. Here’s RequireCaseIDPolicy in full:
```go
type RequireCaseIDPolicy struct{}

func (RequireCaseIDPolicy) Evaluate(_ context.Context, msg protocol.Message) (PolicyResult, error) {
	clinicalTypes := map[string]bool{
		"presentation": true, "case_state": true, "ask_question": true,
		"order_test": true, "diagnosis": true, "verify_reasoning": true,
	}
	if !clinicalTypes[msg.Type] {
		return allow("non-clinical message"), nil
	}
	if msg.Metadata == nil {
		return deny("missing metadata with case_id"), nil
	}
	if _, ok := msg.Metadata["case_id"].(string); !ok {
		return deny("case_id required for clinical messages"), nil
	}
	return allow("case_id present"), nil
}
```
Under twenty lines. It’s the simplest possible expression of “fail closed on missing correlation.” The orchestrator drops a clinical message without a case_id before any agent sees it, and the denial itself is recorded as a policy_decision audit event.
MaxDiagnosticCostPolicy is the same shape:
```go
func (MaxDiagnosticCostPolicy) Evaluate(_ context.Context, msg protocol.Message) (PolicyResult, error) {
	if msg.Type != "order_test" {
		return allow("not a test order"), nil
	}
	// A failed type assertion yields 0, so an order with a positive cost
	// but no max_cost_usd is denied rather than waved through.
	maxCost, _ := msg.Metadata["max_cost_usd"].(float64)
	testCost, _ := msg.Metadata["test_cost_usd"].(float64)
	spent, _ := msg.Metadata["spent_usd"].(float64)
	if spent+testCost > maxCost {
		return deny("test would exceed diagnostic budget"), nil
	}
	return allow("within diagnostic budget"), nil
}
```
The composer ties them together:
```go
policy := governance.NewComposite(
	governance.MaxContentLengthPolicy{Max: 65536}, // DoS / prompt-bomb cap
	governance.RequireCaseIDPolicy{},              // correlation enforced
	governance.MaxDiagnosticCostPolicy{},          // per-case budget
	governance.ResearchDisclaimerPolicy{},         // intent marker
)
```
CompositePolicy denies if any child denies (fail fast). Adding multi-tenancy, SMART scopes, rate limiting, or 42 CFR Part 2 SUD segmentation is “implement a Policy, add it to the composer.” The orchestrator doesn’t change. The agents don’t change.
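The composer's behaviour fits in one loop. The sketch below is a self-contained approximation with local stand-in types (`Message`, `PolicyResult`, `PolicyFunc`, and the two toy policies are all invented for the example); the real governance package is only described, not shown, above.

```go
package main

import (
	"context"
	"fmt"
)

type Message struct {
	Type, Content string
	Metadata      map[string]any
}

type Decision int

const (
	DecisionDeny  Decision = iota // zero value is deny, so a blank result fails closed
	DecisionAllow
)

type PolicyResult struct {
	Decision Decision
	Reason   string
}

type Policy interface {
	Evaluate(ctx context.Context, msg Message) (PolicyResult, error)
}

// PolicyFunc adapts a plain function to the Policy interface.
type PolicyFunc func(ctx context.Context, msg Message) (PolicyResult, error)

func (f PolicyFunc) Evaluate(ctx context.Context, msg Message) (PolicyResult, error) {
	return f(ctx, msg)
}

// Composite evaluates children in order and stops at the first non-allow.
type Composite struct{ children []Policy }

func NewComposite(children ...Policy) Composite { return Composite{children} }

func (c Composite) Evaluate(ctx context.Context, msg Message) (PolicyResult, error) {
	for _, p := range c.children {
		res, err := p.Evaluate(ctx, msg)
		if err != nil {
			return PolicyResult{DecisionDeny, "policy error"}, err
		}
		if res.Decision != DecisionAllow {
			return res, nil // fail fast: later policies are not consulted
		}
	}
	return PolicyResult{DecisionAllow, "all policies allowed"}, nil
}

// maxContentLength and requireCaseID are toy policies for the demo.
func maxContentLength(max int) Policy {
	return PolicyFunc(func(_ context.Context, msg Message) (PolicyResult, error) {
		if len(msg.Content) > max {
			return PolicyResult{DecisionDeny, "content too long"}, nil
		}
		return PolicyResult{DecisionAllow, "content length ok"}, nil
	})
}

func requireCaseID() Policy {
	return PolicyFunc(func(_ context.Context, msg Message) (PolicyResult, error) {
		if id, _ := msg.Metadata["case_id"].(string); id == "" {
			return PolicyResult{DecisionDeny, "case_id required"}, nil
		}
		return PolicyResult{DecisionAllow, "case_id present"}, nil
	})
}

// reasonFor runs both policies against a message and returns the reason.
func reasonFor(content, caseID string) string {
	policy := NewComposite(maxContentLength(16), requireCaseID())
	msg := Message{Type: "presentation", Content: content,
		Metadata: map[string]any{"case_id": caseID}}
	res, _ := policy.Evaluate(context.Background(), msg)
	return res.Reason
}

func main() {
	fmt.Println(reasonFor("cough", "demo-case-001"))
	fmt.Println(reasonFor("cough", ""))
	fmt.Println(reasonFor("a very long presentation text", ""))
}
```

Order matters under fail-fast: the third call violates both policies but reports only the first denial. Putting cheap, broad checks (size caps) ahead of expensive ones (scope lookups) keeps the hot path short.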
The broader policy set. Per the status table at the end of this article, the tenant and scope policies ship today; the rest are planned:
| Policy | Purpose |
|---|---|
| `RequireTenantPolicy` | Multi-tenant isolation: `tenant_id` must be present so downstream stores (PostgreSQL row-level security) can enforce it |
| `RequireScopePolicy{Required}` | SMART-on-FHIR scope check, e.g. `cdm_request` requires `user/CarePlan.write` |
| `RedactNarrativePolicy` | Strip clinical narrative from log-bound copies |
| `Part2ConsentPolicy` | 42 CFR Part 2 substance-use records: demand explicit consent token |
| `BreakGlassPolicy` | Audited emergency cross-patient reads |
| `RateLimitPolicy` | Per-tenant / per-clinician QPS cap |
RequireScopePolicy is configured with a map from message type to required scope:
```go
RequireScopePolicy{Required: map[string]string{
	"cdm_request":     "user/CarePlan.write",
	"discharge_event": "user/Encounter.read",
}}
```
PHI redaction at the logger seam
pkg/phi ships a LoggerWrapper that sits in front of the standard logger and runs Safe Harbor patterns over every format string and string-typed argument before it reaches stdout, Loki, or any third-party sink. The default pattern set:
```go
func DefaultPatterns() []Pattern {
	return []Pattern{
		{Name: "ssn", Re: regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`), Replace: "[REDACTED-SSN]"},
		{Name: "mrn", Re: regexp.MustCompile(`(?i)\bMRN[:\s]*[A-Z0-9-]{4,}\b`), Replace: "[REDACTED-MRN]"},
		{Name: "phone", Re: regexp.MustCompile(`\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b`), Replace: "[REDACTED-PHONE]"},
		{Name: "email", Re: regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`), Replace: "[REDACTED-EMAIL]"},
		{Name: "iso_date", Re: regexp.MustCompile(`\b(19|20)\d{2}-\d{2}-\d{2}\b`), Replace: "[REDACTED-DATE]"},
		{Name: "us_date", Re: regexp.MustCompile(`\b\d{1,2}/\d{1,2}/(19|20)\d{2}\b`), Replace: "[REDACTED-DATE]"},
	}
}
```
This is deliberately a first line of defense, not a replacement for a Data Loss Prevention service. In regulated production, layer it in front of Google Cloud DLP, AWS Macie, or Microsoft Presidio with custom patterns for the field shapes you actually see (internal MRN formats, payer claim IDs, etc.).
The wrapper composes with anything that satisfies observability.Logger:
```go
inner := observability.NewStdLogger()
logger := &phi.LoggerWrapper{
	Redactor: phi.NewRedactor(),
	Inner:    inner,
}
// All Logf calls are now redacted before they reach the inner logger.
```
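For readers who want to see the seam end to end, here is a compact approximation of how such a wrapper can work: redact the format string and every string-typed argument, then delegate. The types below are local stand-ins written for this example (only two of the default patterns are reproduced), so treat it as a sketch of the idea rather than pkg/phi itself.

```go
package main

import (
	"fmt"
	"regexp"
)

type Pattern struct {
	Name    string
	Re      *regexp.Regexp
	Replace string
}

// samplePatterns reuses two of the default expressions shown above.
func samplePatterns() []Pattern {
	return []Pattern{
		{Name: "ssn", Re: regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`), Replace: "[REDACTED-SSN]"},
		{Name: "email", Re: regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`), Replace: "[REDACTED-EMAIL]"},
	}
}

type Redactor struct{ Patterns []Pattern }

// Redact applies every pattern in order.
func (r Redactor) Redact(s string) string {
	for _, p := range r.Patterns {
		s = p.Re.ReplaceAllString(s, p.Replace)
	}
	return s
}

// Logger is the minimal logging seam the wrapper composes with.
type Logger interface {
	Logf(format string, args ...any)
}

type LoggerWrapper struct {
	Redactor Redactor
	Inner    Logger
}

// Logf redacts the format string and string-typed arguments, then delegates.
// Non-string arguments (ints, structs, errors) pass through untouched.
func (w LoggerWrapper) Logf(format string, args ...any) {
	safe := make([]any, len(args))
	for i, a := range args {
		if s, ok := a.(string); ok {
			safe[i] = w.Redactor.Redact(s)
		} else {
			safe[i] = a
		}
	}
	w.Inner.Logf(w.Redactor.Redact(format), safe...)
}

// captureLogger records formatted lines so the demo can inspect them.
type captureLogger struct{ lines []string }

func (c *captureLogger) Logf(format string, args ...any) {
	c.lines = append(c.lines, fmt.Sprintf(format, args...))
}

// redactedLine logs once through the wrapper and returns what the sink saw.
func redactedLine(format string, args ...any) string {
	sink := &captureLogger{}
	w := LoggerWrapper{Redactor: Redactor{Patterns: samplePatterns()}, Inner: sink}
	w.Logf(format, args...)
	return sink.lines[0]
}

func main() {
	fmt.Println(redactedLine("contact %s about SSN %s", "jane@example.com", "123-45-6789"))
}
```

One limitation worth knowing about in this design: an `error` or a struct whose printed form contains PHI slips past a string-only check. A stricter variant would format the line first and redact the result, at the cost of running the patterns over every byte logged.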
Audit events — “enough to reconstruct, never enough to leak”
pkg/audit.Event is what gets persisted. Field selection is deliberate:
```go
type Event struct {
	ID         string    `json:"id"`
	At         time.Time `json:"at"`
	Kind       Kind      `json:"kind"` // message | policy_decision | review_decision | agent_error
	AgentID    string    `json:"agent_id,omitempty"`
	MessageID  string    `json:"message_id,omitempty"`
	From       string    `json:"from,omitempty"`
	To         string    `json:"to,omitempty"`
	Type       string    `json:"type,omitempty"`
	TenantID   string    `json:"tenant_id,omitempty"`  // correlation, not PHI
	PatientID  string    `json:"patient_id,omitempty"` // opaque ID, not MRN
	CaseID     string    `json:"case_id,omitempty"`    // opaque ID
	TraceID    string    `json:"trace_id,omitempty"`
	Decision   string    `json:"decision,omitempty"` // allow | deny | approve | reject
	Reason     string    `json:"reason,omitempty"`
	ReviewerID string    `json:"reviewer_id,omitempty"`
	ErrorText  string    `json:"error,omitempty"`
}
```
What’s deliberately absent: no narrative, no diagnosis text, no patient name, no DOB, no test result content. The event is a pointer into the system — enough to reconstruct who-did-what-when, never enough to leak PHI if the audit sink is ever compromised. The regulated data lives behind it under access control.
Today’s implementation is a thread-safe in-memory recorder plus a file-backed FileStore that appends one JSON line per event. Production swaps this for a WORM sink: S3 Object Lock, Azure Immutable Blob, or Loki + retention lock. Add Merkle chain-hashing for tamper evidence and you have what HIPAA §164.312(b) and 21 CFR Part 11 both expect.
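Tamper evidence can start simpler than a full Merkle tree. The sketch below is a linear hash chain, where each link's SHA-256 covers the previous link's hash plus the event's JSON, so editing any past event invalidates that link on verification. It is an illustration of the idea with a trimmed-down `Event`, not the planned Bodh pipeline.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Event is a trimmed-down stand-in for pkg/audit.Event.
type Event struct {
	ID     string `json:"id"`
	Kind   string `json:"kind"`
	CaseID string `json:"case_id,omitempty"`
	Reason string `json:"reason,omitempty"`
}

// Link binds an event to the hash of the link before it.
type Link struct {
	Event    Event  `json:"event"`
	PrevHash string `json:"prev_hash"`
	Hash     string `json:"hash"`
}

// hashLink computes SHA-256(prevHash || JSON(event)) as lowercase hex.
func hashLink(prev string, e Event) (string, error) {
	body, err := json.Marshal(e) // struct fields marshal in declaration order
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(append([]byte(prev), body...))
	return hex.EncodeToString(sum[:]), nil
}

type Chain struct{ links []Link }

// Append adds an event, chaining it to the current tail.
func (c *Chain) Append(e Event) error {
	prev := ""
	if n := len(c.links); n > 0 {
		prev = c.links[n-1].Hash
	}
	h, err := hashLink(prev, e)
	if err != nil {
		return err
	}
	c.links = append(c.links, Link{Event: e, PrevHash: prev, Hash: h})
	return nil
}

// Verify recomputes every hash and returns the index of the first link
// that does not match, or -1 if the chain is intact.
func (c *Chain) Verify() int {
	prev := ""
	for i, l := range c.links {
		h, err := hashLink(prev, l.Event)
		if err != nil || l.PrevHash != prev || l.Hash != h {
			return i
		}
		prev = l.Hash
	}
	return -1
}

// tamperDemo verifies a three-event chain before and after editing event 2.
func tamperDemo() (before, after int) {
	var c Chain
	for i, kind := range []string{"message", "policy_decision", "review_decision"} {
		_ = c.Append(Event{ID: fmt.Sprintf("evt-%d", i+1), Kind: kind, CaseID: "demo-case-001"})
	}
	before = c.Verify()
	c.links[1].Event.Reason = "edited after the fact"
	after = c.Verify()
	return before, after
}

func main() {
	before, after := tamperDemo()
	fmt.Println("first bad link before tampering:", before)
	fmt.Println("first bad link after tampering: ", after)
}
```

A chain like this only detects edits if the latest hash is anchored somewhere the attacker cannot rewrite, which is where the WORM sink comes in: publish the tail hash to the immutable store on a schedule.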
HIPAA Privacy Rule — the operator side and the BAA chain
If a covered entity deploys Bodh, the operator becomes a business associate under 45 CFR §164.504(e). The operator must sign BAAs with both the covered entity and every PHI-touching subcontractor:
| Subcontractor type | What you need |
|---|---|
| Cloud provider (AWS / Azure / GCP) | BAA — only use services on the HIPAA-eligible list |
| SMS / email (Twilio, SendGrid) | BAA — engagement notifier must respect opt-out and never sell PHI |
| Observability (Datadog, New Relic) | BAA before piping logs or traces with PHI shapes |
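| LLM provider | BAA, as for any other PHI-touching subcontractor. OpenAI: BAA on request for API use, typically paired with a Zero Data Retention agreement. Azure OpenAI: standard Azure BAA. Anthropic: BAA available via API on request. Confirm current terms with each vendor. |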
The minimum-necessary principle propagates down this chain. If your engagement notifier sends “Reminder: lab on 2026-05-30 at 9:00 AM” via Twilio, that string transits a third party — and it had better be under a BAA with no analytics retention.
HIPAA Breach Notification — the Safe Harbor exclusion is why encryption isn’t optional
| Audience | Deadline | Source |
|---|---|---|
| Affected individuals | 60 days from discovery | §164.404 |
| HHS Secretary | 60 days (≥500 affected) or annually (<500) | §164.408 |
| Prominent media (>500 residents of a state or jurisdiction) | 60 days | §164.406 |
| Business associate → covered entity | “without unreasonable delay” and ≤60 days | §164.410 |
Safe Harbor exclusion: the notification rules apply only to unsecured PHI. Data encrypted per HHS guidance (FIPS 140-2/3 validated cryptographic modules, NIST-approved algorithms), with keys that were not themselves compromised, is not unsecured, so its loss does not trigger breach notification. This is the practical reason TLS 1.3 in transit + KMS-backed AES-256-GCM at rest isn’t optional: encryption is the difference between an incident response and a regulator-facing breach notification.
Bodh’s audit log is the forensic record that lets the operator determine breach scope: what PHI was touched, whose, by which users, during which window. If your audit log can’t answer those four questions within the 60-day deadline, you don’t have a breach response; you have a guess.
Adjacent regimes — what bites if you only think HIPAA
- HITECH + Information Blocking Rule (the latter comes from the 21st Century Cures Act, 45 CFR Part 171): your FHIR adapter must not artificially obstruct legitimate access requests. The 2020 final rule named eight exceptions (Preventing Harm, Privacy, Security, Infeasibility, Health IT Performance, Content & Manner, Fees, Licensing); later ONC rulemaking has revised the list, so check the current text.
- 42 CFR Part 2: substance-use treatment records require explicit consent for many disclosures HIPAA allows. Bodh’s planned `Part2ConsentPolicy` is the policy-shaped hook for this.
- State laws: California CMIA (private right of action), NY SHIELD, Texas HB 300, Illinois BIPA (biometrics), Washington My Health My Data Act (consumer health data outside HIPAA), CCPA/CPRA.
- GDPR Article 9: special-category data. Article 22’s “right not to be subject to a decision based solely on automated processing” is exactly the use case Bodh’s verifier + HITL pattern is designed for. Article 35 will typically require a DPIA for AI-based clinical decision support, since it is high-risk processing of health data.
- NIST AI RMF (AI 600-1): the generative-AI profile is voluntary, but it is the natural framework to align with if Bodh ever runs LLM-backed agents in production. The audit log + per-call `LLMTrace` records are designed to support it.
De-identification — two paths under §164.514
Either Safe Harbor (strip the 18 named identifiers, including names, geography smaller than state, date elements more specific than year, SSN, MRN, IP addresses, biometrics, and photos) or Expert Determination (a qualified expert, typically a statistician, certifies that re-identification risk is very small). Re-identification risk rises with:
- High-resolution dates (use year only)
- Rare diagnoses combined with demographics
- ZIP5 vs ZIP3
- Free-text narrative (clinical notes often contain names, addresses, employer)
In Bodh, the rule is: de-identification runs before records leave the regulated boundary. Not after, not during export, not “in the warehouse” — at the boundary.
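As a sketch of what "at the boundary" can mean in code, the function below maps a regulated record to an export-safe one: birth date becomes year only, ZIP5 becomes ZIP3 (or `000` for sparsely populated prefixes, which Safe Harbor requires for areas of 20,000 people or fewer), and free text is dropped because the output type has no field for it. The `Record` and `DeidRecord` types are hypothetical, the `restrictedZIP3` entry is a placeholder rather than the real Census-derived list, and the sketch does not implement the Safe Harbor rule that aggregates ages over 89.

```go
package main

import (
	"fmt"
	"time"
)

// Record is a hypothetical in-boundary record containing PHI.
type Record struct {
	PatientID string // opaque routing token, not an MRN
	BirthDate string // ISO date, e.g. "1967-03-14"
	ZIP       string
	Diagnosis string
	Note      string // free-text narrative
}

// DeidRecord is what may cross the boundary. It has no field for
// narrative, so free text cannot leak by accident.
type DeidRecord struct {
	PatientID string
	BirthYear int
	ZIP3      string
	Diagnosis string
}

// restrictedZIP3 lists prefixes that must be replaced with "000".
// PLACEHOLDER: populate from current Census data before real use.
var restrictedZIP3 = map[string]bool{"036": true}

// Deidentify coarsens dates and geography and drops the narrative.
func Deidentify(r Record) (DeidRecord, error) {
	dob, err := time.Parse("2006-01-02", r.BirthDate)
	if err != nil {
		return DeidRecord{}, fmt.Errorf("parse birth date: %w", err)
	}
	zip3 := "000" // default when the ZIP is too short or restricted
	if len(r.ZIP) >= 3 && !restrictedZIP3[r.ZIP[:3]] {
		zip3 = r.ZIP[:3]
	}
	return DeidRecord{
		PatientID: r.PatientID,
		BirthYear: dob.Year(),
		ZIP3:      zip3,
		Diagnosis: r.Diagnosis,
	}, nil
}

func main() {
	d, err := Deidentify(Record{
		PatientID: "p-001", BirthDate: "1967-03-14", ZIP: "94110",
		Diagnosis: "CAP", Note: "Lives near the clinic; works nights.",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", d)
}
```

Making the output a separate type, rather than blanking fields on the input, means the compiler enforces the boundary: code on the far side cannot reach `Note` because it does not exist there. Rare diagnoses combined with year and ZIP3 can still single someone out, which is the case Expert Determination exists for.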
What ships today vs. what’s planned
| Status | Components |
|---|---|
| Done | MARA platform · 17 agents · sequential diagnosis demo · care services demo · HTTP API · operator CLI · HITL gate · PHI redaction · append-only audit (in-memory + file-backed) · multi-tenant + scope policies · cost-rework loop · SD-Bench runner · LLM stack (Anthropic / OpenAI / Ollama) · hybrid RAG (vector + BM25) · LLM-as-judge · safety guardrails · OTel-ready tracing · FHIR R4 (Patient / Observation / Encounter / Condition) · cohort analytics |
| Planned | HL7 v2 · SMART on FHIR auth · FHIR Bulk Data export · NEJM case loader · PostgreSQL persistence · case timeline UI · immutable audit pipeline (S3 Object Lock / Azure Immutable Blob with Merkle chain-hash) · 21 CFR Part 11 e-signature · ONC Certification artifacts |
Five takeaways
- HIPAA is an architecture problem, not a paperwork problem. Encryption, audit, minimum necessary, breach forensics, and the right to delete all show up as concrete interfaces. If the architecture doesn’t express them, no amount of policy documentation will.
- The orchestrator must stay thin. Clinical logic belongs in agents. Governance belongs in policies. Routing belongs in the bus. Discovery belongs in the registry. Mix any of these and the system gets harder to test, harder to audit, and harder to evolve.
- Human-in-the-loop is the regulatory hinge. The 21st Century Cures Act §3060 CDS carve-out and GDPR Article 22 both turn on whether a clinician can independently review each recommendation. That’s not a UI feature; it’s an architectural commitment that touches every gated agent, every audit event, and every review queue.
- The fallback is the feature. Every LLM-backed agent has a deterministic rule-based fallback. The case always finalizes. Production-grade clinical AI is fail-soft, not fail-hard.
- Compliance composes. `governance.NewComposite(...)` is the operational shape of HIPAA / GDPR / 42 CFR Part 2 / SMART scopes / rate limiting / tenancy. New regulation? New policy. The orchestrator doesn’t change.
Try it
```bash
git clone https://github.com/PratikDhanave/bodh.git
cd bodh
go run ./cmd/demo                # Sequential diagnosis (rule-based)
go run ./cmd/caredemo            # Care services with HITL on
go run ./cmd/care -addr=:8088    # HTTP API
go run ./cmd/bench               # SD-Bench runner

# LLM-backed diagnostician
export ANTHROPIC_API_KEY=sk-...
go run ./cmd/demo -diagnostician=llm-anthropic

# Side-by-side provider compare
go run ./cmd/bench -providers=rule,llm-anthropic,llm-openai,llm-ollama
```
Repo: github.com/PratikDhanave/bodh
If you’re building clinical AI, working on HIPAA-aligned infrastructure, or thinking about how Microsoft’s MAI-DxO / SD Bench ideas translate to production-ready systems — issues, PRs, and conversations are welcome.
Bodh is a research and engineering reference. It is not a medical device, not approved for clinical use, and not under a Business Associate Agreement. The HIPAA architecture described above is the target for a production deployment — the codebase is not certified or audited.