Building a HIPAA-Aware Medical AI Platform in Go: An Architecture & Compliance Deep Dive

How Bodh implements Microsoft's Multi-Agent Reference Architecture for sequential diagnosis, and what HIPAA looks like when it's expressed as Go interfaces instead of Confluence pages.

May 19, 2026 · Compliance + engineering

HIPAA · Compliance · Go · Privacy Engineering

Why this article exists

Two patterns have been on my mind since Microsoft published The Path to Medical Superintelligence in 2025:

MAI-DxO + the Sequential Diagnosis Benchmark (SD Bench) — a virtual physician panel that asks targeted questions, orders cost-aware tests, and self-verifies before committing to a diagnosis. Reportedly ~85.5% accuracy on 304 NEJM cases versus ~20% for experienced physicians in their study setup.
The Multi-Agent Reference Architecture (MARA) — Microsoft’s vendor-neutral blueprint for orchestrators, agents, registries, governance, memory, and observability.

The clinical results are the headline. The architecture is the part you can actually build on. So I built Bodh (Sanskrit: insight) — an open-source Go reference implementation of MARA, tuned for medicine, with HIPAA-aware governance baked in.

This article is a knowledge dump in two halves:

Part 1 — Architecture. The patterns that make a multi-agent clinical system auditable, swappable, and safe to evolve.
Part 2 — Compliance. How those same patterns map to HIPAA, the 21st Century Cures Act, and adjacent regimes — with code, not bullet points.

Disclaimer up front: Bodh is a research and engineering reference. Not approved for clinical use. Demo agents use simulated answers and rule-based inference. Thresholds are illustrative, not clinical guidelines.

Part 1 — Architecture

The core insight: a monolithic LLM call is not the right shape for clinical work

A single big prompt cannot be:

Audited — there’s no per-step record of who decided what
Cost-capped — the cost is whatever the model decides to spend
Regression-tested — when a guideline changes, you can’t isolate the affected step
Stopped mid-flow — there’s no place for a clinician to intervene
Swapped per step — you can’t use a cheap model for triage and a strong model for diagnosis

A multi-agent system, with the right boundary discipline, can do all five.

The MARA pattern, in one sentence

A user (or API call) publishes a clinical message → the orchestrator delivers it to the right agent → the agent returns zero or more new messages → the orchestrator publishes those → governance gates each hop → the workflow continues until a terminal agent decides we’re done.

That’s it. Every clinically interesting behaviour in Bodh is a consequence of that one loop.

Five interfaces hold the whole platform together

Everything in Bodh — 17 agents, the bus, the registry, the governance composer, the HITL gate — composes from these five contracts:

type Agent interface {
    ID() string
    Name() string
    Capabilities() []string
    HandleMessage(ctx context.Context, msg Message, env Environment) ([]Message, error)
}

type Bus interface {
    Subscribe(agentID string, h Handler) (unsubscribe func())
    Publish(ctx context.Context, msg protocol.Message)
}

type Registry interface {
    Register(ctx context.Context, a Agent) error
    Get(ctx context.Context, id string) (Agent, error)
    List(ctx context.Context) []Agent
    FindByCapability(ctx context.Context, capability string) []Agent
}

type Policy interface {
    Evaluate(ctx context.Context, msg protocol.Message) (PolicyResult, error)
}

type Environment interface {
    Now() time.Time
    Logf(format string, args ...any)
}

Five interfaces. Each one has an in-memory implementation today and a clear production swap path (Kafka or NATS for Bus, PostgreSQL for the persistence layer behind Environment, an OIDC-aware policy composer for Policy).

Design rule #1: agents never call each other directly. They emit Messages with a To field. The bus routes. The orchestrator runs governance before any HandleMessage. This single rule is what makes the system testable, auditable, and safe to refactor.

The orchestrator is the only place where governance runs

Here’s the exact subscription closure that wires each agent to the bus. It’s ~30 lines and it’s the most important code in the platform:

func (o *Orchestrator) subscribeAgentLocked(a agent.Agent) {
    agentID := a.ID()
    o.bus.Subscribe(agentID, func(c context.Context, msg agent.Message) {
        // Apply governance *before* the agent sees the message.
        // Note: a nil policy skips governance entirely.
        if o.policy != nil {
            res, err := o.policy.Evaluate(c, msg)
            if err != nil {
                // Fail closed: a policy error drops the message.
                o.env.Logf("policy error for msg %s: %v", msg.ID, err)
                return
            }
            if res.Decision == governance.DecisionDeny {
                o.env.Logf("message %s denied by policy: %s", msg.ID, res.Reason)
                return  // ← agent never sees the message
            }
        }
        }

        o.env.Logf("agent %s handling message %s from %s", agentID, msg.ID, msg.From)

        out, err := a.HandleMessage(c, msg, o.env)
        if err != nil {
            o.env.Logf("agent %s error: %v", agentID, err)
            return
        }
        for _, m := range out {
            o.bus.Publish(c, m)  // ← one agent's output is another agent's input
        }
    })
}

Three properties fall out of this:

Fail closed. A denied message never reaches an agent, and neither does one whose policy evaluation returns an error. With a policy configured, there is no fallback path that bypasses it.
Single audit point. Every governance decision and every agent invocation passes through this closure. If you want a HIPAA audit log, you instrument here once, not in 17 agents.
Pluggability. Want to add multi-tenancy, rate limiting, scope checks, or SUD record segmentation? Add a policy. The orchestrator doesn’t change.

Sequential diagnosis as a state machine, not a prompt

The diagnostic panel is eight agents:

Agent	Action	Output
`intake`	Parse presentation	`case_state`
`rpm_monitor`	Evaluate remote monitoring alerts	`case_state`
`diagnostic_supervisor`	Route the case	(delegates)
`questioner`	History + symptoms	`questions_complete`
`test_planner`	Order tests from a virtual catalog	`case_state`
`cost_guardian`	Check `spent + test_cost <= max_cost`	`tests_complete` or rework
`diagnostician`	Emit `DiagnosisProposal` with confidence	`diagnosis_proposal`
`reasoning_verifier`	Validate rationale; optionally route to HITL	`diagnosis` (final)

A full case for demo-case-001 (58-year-old man, cough and fever; gold diagnosis: community-acquired pneumonia, CAP) produces this trace:

Step	From	To	Type	Notes
1	`user`	`intake`	`presentation`	chief complaint + HPI
2	`intake`	`rpm_monitor`	`case_state`	no RPM data, passthrough
3	`rpm_monitor`	`diagnostic_supervisor`	`case_state`
4	`supervisor`	`questioner`	`case_state`	start history
5	`questioner`	`supervisor`	`questions_complete`	4 Q&A, 3 hypotheses
6	`supervisor`	`test_planner`	`case_state`	order labs/imaging
7	`test_planner`	`cost_guardian`	`case_state`	total $220
8	`cost_guardian`	`supervisor`	`tests_complete`	budget OK ($220 / $500)
9	`supervisor`	`diagnostician`	`case_state`	infer
10	`diagnostician`	`reasoning_verifier`	`diagnosis_proposal`	CAP @ 87%
11	`reasoning_verifier`	`supervisor`	`diagnosis`	verified
12	`supervisor`	—	(terminal)	logs final + gold match

Twelve messages. Twelve audit events. Twelve governance evaluations. No hidden state, no inter-agent function calls, no place where a clinician can’t intervene.

Every test has a virtual USD cost ($35 to $850). The Pareto frontier of accuracy vs. spend is the right way to evaluate sequential-diagnosis systems — Microsoft’s blog cites up to 25% of US health spending as low-value or wasted, so the cost dimension isn’t hypothetical.
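The accuracy-versus-spend evaluation can be made concrete with a small Pareto-frontier helper. This is a sketch, not Bodh's bench code: RunResult and its fields are hypothetical names. A run is dropped when another run costs no more, is at least as accurate, and is strictly better on one of the two axes.

```go
package main

import "sort"

// RunResult is a hypothetical summary of one benchmark configuration.
type RunResult struct {
	Name     string
	Accuracy float64 // fraction of cases matching the gold diagnosis
	MeanCost float64 // mean virtual USD spent per case
}

// ParetoFrontier returns the non-dominated runs, sorted by ascending cost.
// Exact duplicates (same cost and accuracy) are kept only once.
func ParetoFrontier(runs []RunResult) []RunResult {
	sorted := append([]RunResult(nil), runs...)
	// Cheapest first; among equal costs, most accurate first.
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].MeanCost != sorted[j].MeanCost {
			return sorted[i].MeanCost < sorted[j].MeanCost
		}
		return sorted[i].Accuracy > sorted[j].Accuracy
	})
	var frontier []RunResult
	best := -1.0 // highest accuracy among runs that cost no more
	for _, r := range sorted {
		// Every earlier run costs no more, so r survives only if it is
		// strictly more accurate than all of them.
		if r.Accuracy > best {
			frontier = append(frontier, r)
			best = r.Accuracy
		}
	}
	return frontier
}
```

Plotting the frontier rather than a single accuracy number makes it visible when a more expensive configuration buys no extra accuracy.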

Nine more agents add value-based care workflows on the same platform:

CDM (Chronic Disease Management) — care plan templates for HTN, DM, CHF, CKD, COPD
TCM (Transitional Care Management) — discharge → nurse task queue + medication reconciliation
Panel Management — gap-in-care detection across the patient roster (A1c overdue, BP uncontrolled, annual visit due, CHF weight alert, post-discharge follow-up open)
Virtual Office — pre-visit prep, inbox triage, refill management, prior auth
Patient engagement — SMS / email / IVR notifier with adherence tracking
Readmission tracking — ADT feed + transparent LACE-inspired risk score

These aren’t bolted on. They subscribe to the same bus, share the same PatientRegistry and tasks.Queue, pass through the same governance and audit. Adding a new clinical workflow is “implement Agent, register it, optionally compose a new policy.”
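Bodh's readmission score is described only as "LACE-inspired", so its exact weights aren't shown here. For reference, the sketch below implements the published LACE index (Length of stay, Acuity of admission, Comorbidity via the Charlson index, ED visits in the prior six months). The type and function names are hypothetical, and the risk-band cut-offs are illustrative assumptions, not clinical guidance.

```go
package main

// LACEInput holds the four LACE components (hypothetical type).
type LACEInput struct {
	LengthOfStayDays int  // L: days in hospital for the index admission
	EmergentAdmit    bool // A: admitted through the emergency route
	CharlsonIndex    int  // C: Charlson comorbidity index
	EDVisits6Mo      int  // E: ED visits in the previous six months
}

// LACEScore returns the LACE index, from 0 to 19.
func LACEScore(in LACEInput) int {
	score := 0
	switch d := in.LengthOfStayDays; {
	case d < 1:
		// under one day scores 0
	case d <= 3:
		score += d // 1, 2, or 3 points
	case d <= 6:
		score += 4
	case d <= 13:
		score += 5
	default:
		score += 7
	}
	if in.EmergentAdmit {
		score += 3
	}
	if in.CharlsonIndex >= 4 {
		score += 5
	} else if in.CharlsonIndex > 0 {
		score += in.CharlsonIndex
	}
	if in.EDVisits6Mo >= 4 {
		score += 4
	} else if in.EDVisits6Mo > 0 {
		score += in.EDVisits6Mo
	}
	return score
}

// RiskBand maps a score to a band. Cut-offs are illustrative only.
func RiskBand(score int) string {
	switch {
	case score >= 10:
		return "high"
	case score >= 5:
		return "moderate"
	default:
		return "low"
	}
}
```

Because every point is traceable to one input, a reviewer can see exactly why a discharge landed in a given band, which is what "transparent" buys over an opaque model.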

Human-in-the-loop: the 21st Century Cures Act §3060, as code

Under Section 3060 of the Cures Act (21 USC §360j(o)), Clinical Decision Support software can avoid Software-as-a-Medical-Device classification if it meets four criteria. The hardest one — and the one most consequential for AI — is criterion four: the software must enable the clinician to independently review the basis for the recommendations.

Bodh’s human_review agent is that criterion expressed in code.

When HITL=true on an upstream agent, that agent doesn’t address its output directly to the next stage. It wraps it in a review envelope and addresses it to human_review:

Envelope key	Purpose
`hitl_kind`	Review category (`diagnosis_review`, `care_plan_review`, etc.)
`hitl_next_to`	Agent to forward to on approval
`hitl_next_type`	Message type used on resume
`hitl_priority`	`low` / `normal` / `high` / `urgent`
`hitl_subject`	Short label for the queue UI
`hitl_sla_minutes`	Deadline; expired reviews never resume

The gate records a ReviewRequest, emits nothing, and waits. A clinician decides via POST /reviews/{id}/decide with approve | approve_with_modifications | reject plus an optional rationale. On approve, the original payload is republished to its intended recipient. On reject, the payload is dropped and nothing is forwarded, but the decision is still logged, so the audit trail stays complete. No half-completed clinical action can leak past the gate.

Seven agents are gated when HITL is on:

Agent	Gated output	Review kind
`reasoning_verifier`	Verified diagnosis	`diagnosis_review`
`cdm_planner`	Care plan	`care_plan_review`
`tcm_coordinator`	TCM welcome message	`tcm_welcome_review`
`panel_manager`	Per-gap outreach	`outreach_review`
`refill_manager`	Refill response	`refill_review`
`prior_auth`	PA status	`prior_auth_review`
`readmission_tracker`	Discharge handoff (priority scales with risk band)	`discharge_review`

Approvals cascade. A discharge_review approval triggers TCM, which itself emits a tcm_welcome_review. The cmd/caredemo auto-approver drains the queue across rounds to demonstrate this; in production a clinician’s UI shows cascaded reviews as they appear.

The fallback contract for LLM agents

Every LLM-backed agent in Bodh has a deterministic rule-based fallback. If the model times out, returns malformed JSON, exceeds latency budget, or trips a safety check, the case still finalizes.

// LLMDiagnostician wraps an llm.Client and a rule-based fallback.
// On any per-call failure, the case still finalises via the fallback.
type LLMDiagnostician struct {
    Client        llm.Client
    Fallback      *DiagnosticianAgent
    MinConfidence float64   // below this, fall back
}

This isn’t a workaround. It’s the contract. Failure modes degrade rather than break.

Provider matrix:

Provider	Chat	Embeddings	Vision	Key env var
Anthropic	yes	(via Voyage)	yes (Sonnet, Opus)	`ANTHROPIC_API_KEY`
OpenAI	yes	yes	yes (`gpt-4o`)	`OPENAI_API_KEY`
Ollama (local)	yes	yes	yes (`llama3.2-vision`, `bakllava`)	(none)
Voyage	—	yes	—	`VOYAGE_API_KEY`

Prompt caching is on by default for Anthropic and OpenAI: the diagnostic system prompt is marked cacheable, so repeated calls within the cache TTL bill the cached prefix at a reduced rate and pay full price only for the new tokens. The [llm-trace] log line reports cached_tokens per call so you can see cache hit rates in production.

RAG done with care: vector + BM25

A pure vector retriever loses on rare clinical terms — drug names, ICD codes, acronyms like BNP or LACE. So Bodh’s HybridStore blends vector cosine with Okapi BM25 lexical scoring using reciprocal-rank fusion (k=60). Useful when “BNP > 400” needs to win on the lexical match even if the vector similarity is lukewarm.
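Reciprocal-rank fusion itself is only a few lines. The sketch below covers just the fusion step (BM25 and vector scoring are not shown) and uses a hypothetical function name. Each ranked list contributes 1/(k + rank) per document, so raw cosine and BM25 scores never need to be put on a common scale.

```go
package main

import "sort"

// ReciprocalRankFusion merges best-first lists of document IDs. A document
// at 1-based rank r in a list gains 1/(k+r); lists that omit it add nothing.
func ReciprocalRankFusion(k float64, rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}
```

With k=60, a document ranked first lexically but last by vector similarity can still overtake one that only the vector retriever found, which is the "BNP > 400" case above.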

Citations are first-class — every RAG-backed DiagnosisProposal carries a Citations[] array with source, title, snippet, and score. The hallucination check in pkg/safety then asserts that at least half of the significant terms in the diagnosis rationale appear in those citations’ snippets — if grounding fails, the agent falls back to deterministic rules.
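A term-overlap grounding check of this kind might look like the sketch below. What counts as a "significant term" in pkg/safety isn't specified here, so this version assumes lowercase alphanumeric tokens of three or more characters minus a tiny illustrative stopword list; Grounded and terms are hypothetical names.

```go
package main

import (
	"strings"
	"unicode"
)

// stopwords is a tiny illustrative list, not a clinical-grade one.
var stopwords = map[string]bool{
	"the": true, "and": true, "with": true, "for": true,
	"are": true, "was": true, "this": true, "that": true, "from": true,
}

// terms lowercases text, splits on anything that is not a letter or digit,
// and keeps tokens of 3+ characters that are not stopwords.
func terms(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var out []string
	for _, f := range fields {
		if len(f) >= 3 && !stopwords[f] {
			out = append(out, f)
		}
	}
	return out
}

// Grounded reports whether at least minFraction of the rationale's distinct
// significant terms appear in the cited snippets. An empty rationale fails.
func Grounded(rationale string, snippets []string, minFraction float64) bool {
	support := map[string]bool{}
	for _, s := range snippets {
		for _, t := range terms(s) {
			support[t] = true
		}
	}
	seen := map[string]bool{}
	hits := 0
	for _, t := range terms(rationale) {
		if seen[t] {
			continue
		}
		seen[t] = true
		if support[t] {
			hits++
		}
	}
	if len(seen) == 0 {
		return false
	}
	return float64(hits)/float64(len(seen)) >= minFraction
}
```

Calling it with minFraction 0.5 matches the "at least half" rule; a false result is the signal to drop to the rule-based path.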

Part 2 — Compliance, expressed as code

Most HIPAA work I’ve seen is documentation. Policies in Confluence. Risk assessments in spreadsheets. SOC 2 evidence in a folder. The actual code is often surprisingly silent on PHI.

The interesting question is: what does HIPAA look like when you express it as Go interfaces?

The eight PHI handling principles, baked into the architecture

Minimum necessary — agents see only the fields they need for their task.
Opaque identifiers — case_id and patient_id are routing tokens, never MRNs or SSNs.
Fail closed — governance denies messages missing correlation IDs; a denied message never reaches an agent.
Auditable everything — every interaction is a Message with a UUID; every governance decision is logged with a reason.
No PHI in third-party telemetry — redact Content and Metadata before any OTel or log export.
De-identify for evaluation — CI uses synthetic data only; gold_diagnosis is never production PHI.
Encrypted at rest and in transit — TLS 1.3 + KMS-backed AES-256-GCM in production.
Right to delete — the persistence layer must support point-deletion by patient_id (GDPR Article 17, several US state laws).

Three of those (#3 fail closed, #4 auditable everything, #6 de-identify for evaluation) are enforceable at the platform layer. The rest are architectural constraints the code is designed to support.

§164.312 Technical Safeguards — code-level crosswalk

Standard	Bodh code
Audit Controls (R)	`pkg/audit` — append-only `Event` records for every message, governance decision, HITL review, and agent error
Access Control (R) — unique user ID	Planned — OIDC `sub` becomes the user ID, persisted in audit
Integrity (R)	Append-only audit today; periodic Merkle chain-hash for tamper evidence is planned
Person/Entity Auth (R)	Planned — OIDC / SMART on FHIR / mTLS
Transmission Security (R)	Planned — TLS 1.3 end-to-end

Below is what’s actually shipping today, with code.

Governance policies = enforceable contracts at the boundary

These run before any agent’s HandleMessage is invoked. Here’s RequireCaseIDPolicy in full:

// clinicalTypes lists the message types that must carry a case_id.
var clinicalTypes = map[string]bool{
    "presentation": true, "case_state": true, "ask_question": true,
    "order_test": true, "diagnosis": true, "verify_reasoning": true,
}

type RequireCaseIDPolicy struct{}

// allow and deny build a PolicyResult (helpers not shown).
func (RequireCaseIDPolicy) Evaluate(_ context.Context, msg protocol.Message) (PolicyResult, error) {
    if !clinicalTypes[msg.Type] {
        return allow("non-clinical message"), nil
    }
    if msg.Metadata == nil {
        return deny("missing metadata with case_id"), nil
    }
    // An empty string is not a usable correlation ID, so treat it as missing.
    if id, ok := msg.Metadata["case_id"].(string); !ok || id == "" {
        return deny("case_id required for clinical messages"), nil
    }
    return allow("case_id present"), nil
}

It’s the simplest possible expression of “fail closed on missing correlation.” A clinical message without a case_id never reaches an agent: the orchestrator drops it, and the denial itself is recorded as a policy_decision audit event.

MaxDiagnosticCostPolicy is the same shape:

func (MaxDiagnosticCostPolicy) Evaluate(_ context.Context, msg protocol.Message) (PolicyResult, error) {
    if msg.Type != "order_test" {
        return allow("not a test order"), nil
    }
    // Fail closed: a missing budget or test cost is a denial, not a silent zero.
    maxCost, okMax := msg.Metadata["max_cost_usd"].(float64)
    testCost, okTest := msg.Metadata["test_cost_usd"].(float64)
    if !okMax || !okTest {
        return deny("missing or non-numeric budget metadata"), nil
    }
    spent, _ := msg.Metadata["spent_usd"].(float64) // nothing spent yet defaults to 0
    if spent+testCost > maxCost {
        return deny("test would exceed diagnostic budget"), nil
    }
    return allow("within diagnostic budget"), nil
}

The composer ties them together:

policy := governance.NewComposite(
    governance.MaxContentLengthPolicy{Max: 65536},   // DoS / prompt-bomb cap
    governance.RequireCaseIDPolicy{},                // correlation enforced
    governance.MaxDiagnosticCostPolicy{},            // per-case budget
    governance.ResearchDisclaimerPolicy{},           // intent marker
)

CompositePolicy denies if any child denies (fail fast). Adding multi-tenancy, SMART scopes, rate limiting, or 42 CFR Part 2 SUD segmentation is “implement a Policy, add it to the composer.” The orchestrator doesn’t change. The agents don’t change.

The broader policy set (RequireTenantPolicy and RequireScopePolicy already ship; the rest are planned):

Policy	Purpose
`RequireTenantPolicy`	Multi-tenant isolation — `tenant_id` must be present so downstream stores (PostgreSQL row-level security) can enforce
`RequireScopePolicy{Required}`	SMART-on-FHIR scope check — e.g. `cdm_request` requires `user/CarePlan.write`
`RedactNarrativePolicy`	Strip clinical narrative from log-bound copies
`Part2ConsentPolicy`	42 CFR Part 2 substance-use records — demand explicit consent token
`BreakGlassPolicy`	Audited emergency cross-patient reads
`RateLimitPolicy`	Per-tenant / per-clinician QPS cap

RequireScopePolicy already ships:

// Message type → SMART scope the caller must hold.
scopes := governance.RequireScopePolicy{Required: map[string]string{
    "cdm_request":     "user/CarePlan.write",
    "discharge_event": "user/Encounter.read",
}}
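How the granted scopes reach the policy isn't described here, so the sketch below assumes the caller passes the scopes from an already-validated access token. It matches SMART v1-style scopes (context/Resource.permission) and treats "*" in the granted resource or permission as a wildcard. ScopeAllowed and its helpers are hypothetical names.

```go
package main

import "strings"

// scope is a parsed SMART v1-style scope such as "user/CarePlan.write".
type scope struct{ context, resource, perm string }

func parseScope(s string) (scope, bool) {
	c, rest, ok := strings.Cut(s, "/")
	if !ok {
		return scope{}, false
	}
	r, p, ok := strings.Cut(rest, ".")
	if !ok {
		return scope{}, false
	}
	return scope{c, r, p}, true
}

// grants reports whether one granted scope covers the required one.
func grants(granted, required string) bool {
	g, ok1 := parseScope(granted)
	r, ok2 := parseScope(required)
	if !ok1 || !ok2 || g.context != r.context {
		return false
	}
	return (g.resource == "*" || g.resource == r.resource) &&
		(g.perm == "*" || g.perm == r.perm)
}

// ScopeAllowed checks a message type against the required-scope table.
// Types missing from the table need no scope.
func ScopeAllowed(required map[string]string, msgType string, granted []string) bool {
	need, gated := required[msgType]
	if !gated {
		return true
	}
	for _, s := range granted {
		if grants(s, need) {
			return true
		}
	}
	return false
}
```

SMART v2 scopes use a different permission syntax (for example `.cruds`), so a production check would need to parse both forms.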

PHI redaction at the logger seam

pkg/phi ships a LoggerWrapper that sits in front of the standard logger and runs Safe Harbor patterns over every format string and string-typed argument before it reaches stdout, Loki, or any third-party sink. The default pattern set:

func DefaultPatterns() []Pattern {
    return []Pattern{
        {Name: "ssn",      Re: regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),                  Replace: "[REDACTED-SSN]"},
        {Name: "mrn",      Re: regexp.MustCompile(`(?i)\bMRN[:\s]*[A-Z0-9-]{4,}\b`),         Replace: "[REDACTED-MRN]"},
        {Name: "phone",    Re: regexp.MustCompile(`\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b`),        Replace: "[REDACTED-PHONE]"},
        {Name: "email",    Re: regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`), Replace: "[REDACTED-EMAIL]"},
        {Name: "iso_date", Re: regexp.MustCompile(`\b(19|20)\d{2}-\d{2}-\d{2}\b`),           Replace: "[REDACTED-DATE]"},
        {Name: "us_date",  Re: regexp.MustCompile(`\b\d{1,2}/\d{1,2}/(19|20)\d{2}\b`),       Replace: "[REDACTED-DATE]"},
    }
}

Deliberately a “first line” — not a Data Loss Prevention service replacement. In regulated production, layer it in front of Google Cloud DLP, AWS Macie, or Microsoft Presidio with custom patterns for the field shapes you actually see (internal MRN formats, payer claim IDs, etc.).

The wrapper composes with anything that satisfies observability.Logger:

inner := observability.NewStdLogger()
logger := &phi.LoggerWrapper{
    Redactor: phi.NewRedactor(),
    Inner:    inner,
}
// All Logf calls are now redacted before they reach the inner logger.
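A self-contained variant of the redacting wrapper is sketched below, reusing three of the default patterns. One deliberate difference from the LoggerWrapper described above: instead of redacting the format string and each string argument separately, it renders the line first and redacts the result, which also catches error values and fmt.Stringer arguments. Redact and RedactingLogf are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern pairs a regex with its replacement token.
type Pattern struct {
	Name    string
	Re      *regexp.Regexp
	Replace string
}

// A subset of the default patterns shown above.
var patterns = []Pattern{
	{Name: "ssn", Re: regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`), Replace: "[REDACTED-SSN]"},
	{Name: "mrn", Re: regexp.MustCompile(`(?i)\bMRN[:\s]*[A-Z0-9-]{4,}\b`), Replace: "[REDACTED-MRN]"},
	{Name: "email", Re: regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`), Replace: "[REDACTED-EMAIL]"},
}

// Redact applies every pattern in order. ReplaceAllLiteralString avoids
// any "$" expansion in the replacement text.
func Redact(s string) string {
	for _, p := range patterns {
		s = p.Re.ReplaceAllLiteralString(s, p.Replace)
	}
	return s
}

// RedactingLogf wraps a sink with a Logf-style signature: format first,
// then redact the rendered line before it leaves the process.
func RedactingLogf(sink func(string)) func(format string, args ...any) {
	return func(format string, args ...any) {
		sink(Redact(fmt.Sprintf(format, args...)))
	}
}
```

Opaque identifiers such as case IDs pass through untouched, which is exactly why the platform routes on them instead of MRNs.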

Audit events — “enough to reconstruct, never enough to leak”

pkg/audit.Event is what gets persisted. Field selection is deliberate:

type Event struct {
    ID         string    `json:"id"`
    At         time.Time `json:"at"`
    Kind       Kind      `json:"kind"`         // message | policy_decision | review_decision | agent_error
    AgentID    string    `json:"agent_id,omitempty"`
    MessageID  string    `json:"message_id,omitempty"`
    From       string    `json:"from,omitempty"`
    To         string    `json:"to,omitempty"`
    Type       string    `json:"type,omitempty"`
    TenantID   string    `json:"tenant_id,omitempty"`   // correlation, not PHI
    PatientID  string    `json:"patient_id,omitempty"`  // opaque ID, not MRN
    CaseID     string    `json:"case_id,omitempty"`     // opaque ID
    TraceID    string    `json:"trace_id,omitempty"`
    Decision   string    `json:"decision,omitempty"`    // allow | deny | approve | reject
    Reason     string    `json:"reason,omitempty"`
    ReviewerID string    `json:"reviewer_id,omitempty"`
    ErrorText  string    `json:"error,omitempty"`
}

What’s deliberately absent: no narrative, no diagnosis text, no patient name, no DOB, no test result content. The event is a pointer into the system — enough to reconstruct who-did-what-when, never enough to leak PHI if the audit sink is ever compromised. The regulated data lives behind it under access control.

Today’s implementation is a thread-safe in-memory recorder plus a file-backed FileStore that appends one JSON line per event. Production swaps this for a WORM sink: S3 Object Lock, Azure Immutable Blob, or Loki + retention lock. Add Merkle chain-hashing for tamper evidence and you have what HIPAA §164.312(b) and 21 CFR Part 11 both expect.

HIPAA Privacy Rule — the operator side and the BAA chain

If Bodh is operated on behalf of a covered entity, the operator is a business associate (defined at 45 CFR §160.103), and the required contract terms are set by §164.504(e). The operator must sign a BAA with the covered entity and flow BAAs down to every PHI-touching subcontractor:

Subcontractor type	What you need
Cloud provider (AWS / Azure / GCP)	BAA — only use services on the HIPAA-eligible list
SMS / email (Twilio, SendGrid)	BAA — engagement notifier must respect opt-out and never sell PHI
Observability (Datadog, New Relic)	BAA before piping logs or traces with PHI shapes
LLM provider	OpenAI: BAA available for API use, typically paired with zero data retention. Azure OpenAI: covered by Microsoft’s standard BAA. Anthropic: BAA available for the API on request.

The minimum-necessary principle propagates down this chain. If your engagement notifier sends “Reminder: lab on 2026-05-30 at 9:00 AM” via Twilio, that string transits a third party — and it had better be under a BAA with no analytics retention.

HIPAA Breach Notification — the Safe Harbor exclusion is why encryption isn’t optional

Audience	Deadline	Source
Affected individuals	60 days from discovery	§164.404
HHS Secretary	60 days (≥500 affected) or within 60 days after the end of the calendar year (<500)	§164.408
Prominent media (>500 residents of a state)	60 days	§164.406
Business associate → covered entity	“without unreasonable delay” and ≤60 days	§164.410

Safe Harbor exclusion: breach notification applies only to unsecured PHI. Data encrypted in line with HHS guidance (FIPS 140-2/3 validated cryptographic modules, NIST-approved algorithms, keys not compromised) is not unsecured, so its loss does not trigger notification. This is the practical reason TLS 1.3 in transit and KMS-backed AES-256-GCM at rest aren’t optional: encryption is what makes the difference between an incident response and a regulator-facing breach notification.

Bodh’s audit log is the forensic record that lets the operator determine breach scope: what PHI was touched, whose, by which users, during which window. If your audit log can’t answer those questions within the 60-day window, you don’t have a breach response; you have a guess.

Adjacent regimes — what bites if you only think HIPAA

HITECH and the Cures Act Information Blocking Rule: your FHIR adapter must not artificially obstruct legitimate access requests. The rule defines named exceptions, originally eight (Preventing Harm, Privacy, Security, Infeasibility, Health IT Performance, Content & Manner, Fees, Licensing); later ONC rules have revised and extended that list.
42 CFR Part 2 — substance-use treatment records require explicit consent for many disclosures HIPAA allows. Bodh’s planned Part2ConsentPolicy is the policy-shaped hook for this.
State laws — California CMIA (private right of action), NY SHIELD, Texas HB 300, Illinois BIPA (biometrics), Washington My Health My Data Act (consumer health data outside HIPAA), CCPA/CPRA.
GDPR Article 9: health data is special-category data. Article 22’s “right not to be subject to a decision based solely on automated processing” is exactly the use case Bodh’s verifier + HITL pattern is designed for. Article 35 requires a DPIA for high-risk processing, which large-scale AI-based clinical decision support on health data will almost always be.
NIST AI RMF and NIST AI 600-1 (its Generative AI Profile): voluntary frameworks, but the natural benchmark if Bodh ever runs LLM-backed agents in production. The audit log and per-call LLMTrace records are designed to support them.

De-identification — two paths under §164.514

Either Safe Harbor (remove the 18 listed identifiers, including names, geographic subdivisions smaller than a state, all elements of dates except the year, SSNs, MRNs, IP addresses, biometric identifiers, and full-face photographs) or Expert Determination (a qualified expert certifies that the risk of re-identification is very small). Re-identification risk rises with:

High-resolution dates (use year only)
Rare diagnoses combined with demographics
ZIP5 vs ZIP3
Free-text narrative (clinical notes often contain names, addresses, employer)

In Bodh, the rule is: de-identification runs before records leave the regulated boundary. Not after, not during export, not “in the warehouse” — at the boundary.

What ships today vs. what’s planned

Status	Components
Done	MARA platform · 17 agents · sequential diagnosis demo · care services demo · HTTP API · operator CLI · HITL gate · PHI redaction · append-only audit (in-memory + file-backed) · multi-tenant + scope policies · cost-rework loop · SD-Bench runner · LLM stack (Anthropic / OpenAI / Ollama) · hybrid RAG (vector + BM25) · LLM-as-judge · safety guardrails · OTel-ready tracing · FHIR R4 (Patient / Observation / Encounter / Condition) · cohort analytics
Planned	HL7 v2 · SMART on FHIR auth · FHIR Bulk Data export · NEJM case loader · PostgreSQL persistence · case timeline UI · immutable audit pipeline (S3 Object Lock / Azure Immutable Blob with Merkle chain-hash) · 21 CFR Part 11 e-signature · ONC Certification artifacts


Five takeaways

HIPAA is an architecture problem, not a paperwork problem. Encryption, audit, minimum necessary, breach forensics, and the right to delete all show up as concrete interfaces. If the architecture doesn’t express them, no amount of policy documentation will.
The orchestrator must stay thin. Clinical logic belongs in agents. Governance belongs in policies. Routing belongs in the bus. Discovery belongs in the registry. Mix any of these and the system gets harder to test, harder to audit, and harder to evolve.
Human-in-the-loop is the regulatory hinge. The 21st Century Cures Act §3060 CDS carve-out and GDPR Article 22 both turn on whether a clinician can independently review each recommendation. That’s not a UI feature — it’s an architectural commitment that touches every gated agent, every audit event, and every review queue.
The fallback is the feature. Every LLM-backed agent has a deterministic rule-based fallback. The case always finalizes. Production-grade clinical AI is fail-soft, not fail-hard.
Compliance composes. governance.NewComposite(...) is the operational shape of HIPAA / GDPR / 42 CFR Part 2 / SMART scopes / rate limiting / tenancy. New regulation? New policy. The orchestrator doesn’t change.

Try it

git clone https://github.com/PratikDhanave/bodh.git
cd bodh

go run ./cmd/demo                            # Sequential diagnosis (rule-based)
go run ./cmd/caredemo                        # Care services with HITL on
go run ./cmd/care -addr=:8088                # HTTP API
go run ./cmd/bench                           # SD-Bench runner

# LLM-backed diagnostician
export ANTHROPIC_API_KEY=sk-...
go run ./cmd/demo -diagnostician=llm-anthropic

# Side-by-side provider compare
go run ./cmd/bench -providers=rule,llm-anthropic,llm-openai,llm-ollama

Repo: github.com/PratikDhanave/bodh

If you’re building clinical AI, working on HIPAA-aligned infrastructure, or thinking about how Microsoft’s MAI-DxO / SD Bench ideas translate to production-ready systems — issues, PRs, and conversations are welcome.

Bodh is a research and engineering reference. It is not a medical device, not approved for clinical use, and not under a Business Associate Agreement. The HIPAA architecture described above is the target for a production deployment — the codebase is not certified or audited.

#HealthTech #ArtificialIntelligence #Go #Golang #HIPAA #Compliance #ClinicalDecisionSupport #MultiAgent #LLM #OpenSource #HealthcareIT #MedicalAI #FHIR #Interoperability #PrivacyEngineering #SoftwareArchitecture