"Enough to Reconstruct, Never Enough to Leak": The HIPAA Audit Log Design Problem
Why my audit event schema deliberately doesn't carry the patient narrative, the diagnosis text, or the test result content — and how that constraint makes the audit log a credible forensic record under HIPAA §164.312(b).
The question I had to answer
HIPAA §164.312(b) requires “hardware, software, and procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information.”
Easy paraphrase: log everything that touches PHI.
Harder version of the question: what fields should each log event contain?
This sounds trivial. It isn’t. The answer determines whether your audit log is a credible forensic record or a second copy of the PHI you’re trying to protect.
I just shipped audit pipeline wiring for Bodh, the open-source medical multi-agent platform I’ve been building in Go. The audit event schema is the part I spent the most time thinking about. The pattern I landed on, after a few iterations, is “enough to reconstruct, never enough to leak.”
The schema
type Event struct {
	ID         string    `json:"id"` // UUID per event
	At         time.Time `json:"at"`
	Kind       Kind      `json:"kind"` // message | policy_decision | review_decision | agent_error
	AgentID    string    `json:"agent_id,omitempty"`
	MessageID  string    `json:"message_id,omitempty"`
	From       string    `json:"from,omitempty"`
	To         string    `json:"to,omitempty"`
	Type       string    `json:"type,omitempty"`
	TenantID   string    `json:"tenant_id,omitempty"`  // routing token only
	PatientID  string    `json:"patient_id,omitempty"` // opaque ID, NEVER MRN
	CaseID     string    `json:"case_id,omitempty"`    // opaque ID
	TraceID    string    `json:"trace_id,omitempty"`
	Decision   string    `json:"decision,omitempty"` // allow | deny | approve | reject
	Reason     string    `json:"reason,omitempty"`
	ReviewerID string    `json:"reviewer_id,omitempty"`
	ErrorText  string    `json:"error,omitempty"` // err.Error() ONLY, never msg.Content
}
Sixteen fields. Notice what’s deliberately absent:
- No narrative
- No HPI text
- No diagnosis label
- No medication name or dose
- No test result value
- No patient name, DOB, MRN
- No msg.Content, ever
That absence is load-bearing. Let me explain why.
What an audit log is actually for
Three jobs, in priority order:
Job 1: Reconstruct what happened on a specific case
A clinician says “Bodh’s recommendation for case-2026-05-24-001 was wrong.” Or a privacy investigator says “we got a complaint that someone accessed patient pt-9382 inappropriately last Tuesday — what happened?”
The audit log needs to answer:
- Who acted on this case (which agents, which human reviewers)
- When each action happened (RFC3339 timestamps with timezone)
- What kind of action it was (policy decision, agent invocation, review decision, agent error)
- In what order the actions happened (at orders events in time; message_id and trace_id link the events that belong to the same exchange)
- What outcome each step produced (decision: allow / deny / approve / reject)
- What rationale governed each decision (the Reason field, populated by policies and reviewers)
The shipped schema answers all six.
Job 2: Aggregate over many cases to detect patterns
Compliance teams need to ask:
- “What’s our policy-denial rate by tenant this quarter?”
- “Which reviewer rejects refill requests at 3× the team average?”
- “Did this agent’s error rate spike after the last deploy?”
- “Show me every case where a discharge_review was approved with an SLA breach.”
The schema’s structured fields (Kind, TenantID, ReviewerID, Type, Decision) make these queries straightforward. No need to parse free text out of a narrative field.
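As a rough illustration of the reviewer question above, here is a minimal Go sketch that works only from structured fields. The trimmed Event type, the function names, and the choice to define "team average" as the mean of per-reviewer rejection rates are all assumptions made for this example, not Bodh's code.

```go
package main

import "sort"

// Event is a trimmed, illustrative copy of the audit schema.
type Event struct {
	Kind       string
	ReviewerID string
	Decision   string
}

// RejectRates returns, per reviewer, the fraction of review decisions
// that were rejections. Non-review events are ignored.
func RejectRates(events []Event) map[string]float64 {
	total := map[string]int{}
	rejected := map[string]int{}
	for _, e := range events {
		if e.Kind != "review_decision" || e.ReviewerID == "" {
			continue
		}
		total[e.ReviewerID]++
		if e.Decision == "reject" {
			rejected[e.ReviewerID]++
		}
	}
	rates := make(map[string]float64, len(total))
	for id, n := range total {
		rates[id] = float64(rejected[id]) / float64(n)
	}
	return rates
}

// Outliers returns the reviewers whose rejection rate is at least
// factor times the mean of all per-reviewer rates, sorted by ID.
// (Assumption: "team average" means the mean of per-reviewer rates.)
func Outliers(rates map[string]float64, factor float64) []string {
	if len(rates) == 0 {
		return nil
	}
	var sum float64
	for _, r := range rates {
		sum += r
	}
	mean := sum / float64(len(rates))
	var out []string
	for id, r := range rates {
		if mean > 0 && r >= factor*mean {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}
```

Nothing here touches free text: Kind, ReviewerID, and Decision are enough. With very small teams the outlier's own rate pulls the mean up, so a pooled rate or a median may be a better baseline.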
Job 3: Be a defensible forensic record
When the audit log is examined as evidence — in a HIPAA breach investigation, in litigation, in a 21 CFR Part 11 inspection — it needs to:
- Show no tampering (append-only at the database GRANT level, plus Merkle chain-hash in production)
- Show no PHI leakage if the audit sink itself is compromised
- Survive a court-ordered subpoena that demands “all activity on this patient” without requiring redaction passes
The third point is where the “never enough to leak” rule pays off. A regulator can request, and you can produce, the full audit history for a specific patient ID — and that history reveals routing pointers but not regulated content.
Why narrative belongs somewhere else
The narrative — chief complaint, HPI, test results, diagnosis, rationale — is regulated PHI. It needs:
- TLS in transit
- AES-256-GCM at rest (with KMS-managed keys)
- Access control per HIPAA Privacy Rule §164.502(b) (minimum necessary)
- BAA coverage on every subcontractor that touches it
- Right-to-delete on patient request (GDPR Article 17)
- Retention per HIPAA (6 years from creation or from the date it was last in effect, whichever is later)
Bodh’s pattern: the narrative lives in pkg/persistence/postgres.interaction_records.payload_ct (BYTEA, encryption-ready, plaintext today, ciphertext after the column-encryption PR). The audit event references it via message_id and case_id. To reconstruct a case, you join the audit table to the persistence table — under the access controls that govern the persistence table.
This separation is what lets the audit log be less sensitive than the data it indexes. The two sinks have different access controls, different retention policies, different export paths to forensic services. A compromised audit sink leaks routing pointers. A compromised persistence sink leaks PHI. The blast radius of an audit-sink incident is recoverable; the blast radius of a persistence-sink incident is a HIPAA breach.
If you put the narrative in the audit log, you’ve conflated the two sinks. Now an audit-log compromise is a PHI breach. That’s worse.
Three examples of the field-selection logic
Policy denial event
A clinical message arrives without a case_id. The RequireCaseIDPolicy denies it. The audit event:
{
  "id": "01HXVY...",
  "at": "2026-05-24T07:02:11Z",
  "kind": "policy_decision",
  "agent_id": "orchestrator",
  "message_id": "6ab0f882-...",
  "from": "api-gateway",
  "to": "intake",
  "type": "presentation",
  "tenant_id": "tenant-mercy-north",
  "decision": "deny",
  "reason": "case_id required for clinical messages"
}
What’s there: enough to reconstruct who tried to do what, where the request originated, and why it was denied. What’s missing: the narrative body of the rejected message. The denied content never gets stored — both because it might be malformed and because we don’t want to retain content we explicitly refused.
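To make the denial path concrete, here is a minimal sketch of what a policy like RequireCaseIDPolicy could look like. Only the policy name and the reason string come from the example above; the Message and Decision types, the Clinical flag, and the Evaluate method are hypothetical stand-ins, not Bodh's actual interfaces.

```go
package main

import "strings"

// Message is a hypothetical stand-in for the platform's message type.
type Message struct {
	ID       string
	Type     string
	CaseID   string
	Clinical bool   // hypothetical flag marking clinical traffic
	Content  string // PHI; the policy below never reads it
}

// Decision is a hypothetical policy result that maps onto the
// Decision and Reason fields of the audit event.
type Decision struct {
	Outcome string // "allow" or "deny"
	Reason  string
}

// RequireCaseIDPolicy denies clinical messages that carry no case_id.
type RequireCaseIDPolicy struct{}

// Evaluate looks only at routing metadata, so the resulting reason
// string cannot contain message content.
func (RequireCaseIDPolicy) Evaluate(m Message) Decision {
	if m.Clinical && strings.TrimSpace(m.CaseID) == "" {
		return Decision{Outcome: "deny", Reason: "case_id required for clinical messages"}
	}
	return Decision{Outcome: "allow"}
}
```

Keeping the reason a fixed string, rather than formatting parts of the message into it, is what keeps the denial event free of the rejected content.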
Agent error event
The LLM diagnostician times out. The audit event:
{
  "id": "01HXVY...",
  "at": "2026-05-24T07:14:23Z",
  "kind": "agent_error",
  "agent_id": "diagnostician/llm-anthropic",
  "message_id": "6ab0f882-...",
  "tenant_id": "tenant-mercy-north",
  "case_id": "case-2026-05-24-001",
  "trace_id": "trace-abc123",
  "error": "context deadline exceeded after 30s"
}
What’s there: which agent failed, when, on which case (opaque ID), with what error message. What’s missing: the prompt, the response, the case state. The error text is err.Error() ONLY — never msg.Content. The audit code path explicitly never reads msg.Content into the event.
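One way to make "never reads msg.Content" structural rather than a convention is to build the event in a function that copies identifiers only. The sketch below assumes a hypothetical Message type, a trimmed Event, and a sample error value; none of these names are taken from the Bodh codebase.

```go
package main

import (
	"errors"
	"time"
)

// Message is a hypothetical stand-in for the platform's message type.
type Message struct {
	ID, TenantID, CaseID, TraceID string
	Content                       string // PHI; never copied below
}

// Event is a trimmed, illustrative copy of the audit schema.
type Event struct {
	At        time.Time
	Kind      string
	AgentID   string
	MessageID string
	TenantID  string
	CaseID    string
	TraceID   string
	ErrorText string
}

// errAgentTimeout is a sample error used only for illustration.
var errAgentTimeout = errors.New("context deadline exceeded after 30s")

// NewAgentErrorEvent copies routing identifiers and the error text.
// It has no code path that reads msg.Content.
func NewAgentErrorEvent(agentID string, msg Message, err error) Event {
	e := Event{
		At:        time.Now().UTC(),
		Kind:      "agent_error",
		AgentID:   agentID,
		MessageID: msg.ID,
		TenantID:  msg.TenantID,
		CaseID:    msg.CaseID,
		TraceID:   msg.TraceID,
	}
	if err != nil {
		e.ErrorText = err.Error()
	}
	return e
}
```

The guarantee is only as strong as the errors themselves: if an upstream library formats request bodies into its error strings, err.Error() can still carry content, so that boundary is worth checking separately.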
When the operator investigates “why did this case use the rule-based fallback?”, the trace_id correlates to the [llm-trace] JSON line (which has token counts, stop_reason, hashed case_id). For the actual case state, you go to the persistence table under separate access controls.
Review decision event
A clinician approves a care plan with modifications:
{
  "id": "01HXVY...",
  "at": "2026-05-24T11:42:08Z",
  "kind": "review_decision",
  "agent_id": "human_review",
  "message_id": "6ab0f882-...",
  "from": "rn-rachel",
  "to": "cdm_planner",
  "type": "care_plan_review",
  "tenant_id": "tenant-mercy-north",
  "case_id": "case-2026-05-24-001",
  "reviewer_id": "rn-rachel",
  "decision": "approve_with_modifications",
  "reason": "Spirometry frequency reduced to monthly given patient burden"
}
What’s there: who decided, what kind of review, what decision, what rationale. What’s missing: the care plan content itself, the patient demographics, the diagnosis label. The reviewer’s rationale is preserved (it’s their reasoning, not patient PHI per se) — but it’s expected to be sanitised by the reviewer (per UI guidance / training) to avoid leaking PHI into the rationale field.
The Reason field is the one place narrative-y content lives in the audit log. Reviewers are explicitly trained: rationales are for clinical reasoning, not patient identifiers. The phi.Redactor runs on rationales as a defence-in-depth measure.
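For a sense of what a defence-in-depth pass over rationales might do, here is a deliberately small, pattern-based sketch. It is not the phi.Redactor implementation, whose API and rules are not shown in this post; the function name, the patterns, and the placeholder tokens are all assumptions.

```go
package main

import "regexp"

// rationalePatterns is a hypothetical, intentionally short rule set.
// Real PHI detection needs far more than a handful of regexes.
var rationalePatterns = []struct {
	re   *regexp.Regexp
	repl string
}{
	{regexp.MustCompile(`(?i)\bMRN[:#\s]*\d+`), "[MRN]"},
	{regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`), "[SSN]"},
	{regexp.MustCompile(`\b\d{4}-\d{2}-\d{2}\b`), "[DATE]"},
	{regexp.MustCompile(`\b\d{1,2}/\d{1,2}/\d{2,4}\b`), "[DATE]"},
}

// RedactRationale replaces identifier-shaped substrings with
// placeholder tokens and leaves the clinical reasoning intact.
func RedactRationale(s string) string {
	for _, p := range rationalePatterns {
		s = p.re.ReplaceAllString(s, p.repl)
	}
	return s
}
```

A rationale such as "Spirometry frequency reduced to monthly given patient burden" passes through unchanged, while an accidental "MRN: 1234567" is replaced before the event is written. Patterns like these miss names and free-form identifiers, which is why reviewer training stays the primary control.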
What “fail-open on audit” means
if err := o.recorder.Record(event); err != nil {
	o.env.Logf("audit record error: %v", err)
	// CONTINUE PROCESSING: never block on audit failure
}
Audit failures (sink down, disk full, network partition) are logged but do not block message processing. This is deliberate and worth defending.
The alternative is fail-closed: if audit fails, refuse to process the clinical message. The cost: the clinician now has nothing to look at, because the audit sink is temporarily down. Is this safer?
No. It’s worse.
- The clinician’s workflow blocks for an operational issue.
- The clinical decision they were about to make happens outside the system — pen and paper, or a different tool, with no audit trail at all.
- The original message is dropped, so even after audit recovers, the case isn’t replayable.
Fail-open on audit means: the clinical workflow continues, the audit gap is logged (in the operational logs, which are separately persisted), and the operator can investigate the gap after the fact. The case still has some trail — the operational logs — even if the audit-specific sink is temporarily unavailable.
This is the principle: audit failures are operational issues, not safety issues. Blocking the clinical flow because the audit sink is down is the worst outcome.
The HIPAA Security Rule §164.312(b) doesn’t require zero audit gaps; it requires “mechanisms that record and examine activity.” A small gap with a logged explanation is compatible with that language, and nothing in the rule requires halting clinical work while the audit sink is down.
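The fail-open behaviour can be exercised in isolation. The sketch below is a simplified model under stated assumptions: the Recorder interface, the Orchestrator fields, the Handle method, and the failingRecorder test double are hypothetical, and the in-memory Logs slice stands in for the separately persisted operational log.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a trimmed, illustrative copy of the audit schema.
type Event struct {
	Kind      string
	MessageID string
}

// Recorder is a hypothetical audit sink interface.
type Recorder interface {
	Record(Event) error
}

// failingRecorder simulates an audit sink that is down.
type failingRecorder struct{}

func (failingRecorder) Record(Event) error {
	return errors.New("audit sink unavailable")
}

// Orchestrator is a simplified model; it is not goroutine-safe.
type Orchestrator struct {
	recorder  Recorder
	AuditGaps int      // number of events the sink failed to accept
	Logs      []string // stand-in for the operational log
}

func NewOrchestrator(r Recorder) *Orchestrator {
	return &Orchestrator{recorder: r}
}

// Handle records an audit event and then delivers the message.
// An audit failure is counted and logged, but delivery still runs.
func (o *Orchestrator) Handle(messageID string, deliver func() error) error {
	ev := Event{Kind: "message", MessageID: messageID}
	if err := o.recorder.Record(ev); err != nil {
		o.AuditGaps++
		o.Logs = append(o.Logs,
			fmt.Sprintf("audit record error: message_id=%s err=%v", messageID, err))
	}
	return deliver()
}
```

Logging the message_id alongside the error is what makes the gap investigable later: the operational log then names exactly which events are missing from the audit sink.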
What “append-only at DB GRANTs” means
The Postgres backend’s audit_events table has grants:
GRANT SELECT, INSERT ON audit_events TO bodh_app;
-- Notably absent: UPDATE, DELETE
The application role cannot modify or delete audit events. Append-only is enforced by the database, not by the application’s intentions.
Why this matters:
- A compromised application credential can write garbage events but cannot rewrite history.
- An application bug that issues an UPDATE audit_events fails with a permission error.
- A bad migration that tries to “clean up” old audit rows under the application role fails with a permission error.
- Only privileged DBA roles can run retention deletes — those are governed by separate access controls, audit-logged in a different system, and time-limited.
This is defense in depth in literal Postgres GRANTs. The audit log’s integrity doesn’t depend on the application code being correct; it depends on the database role’s privileges being correct, which is much easier to audit (\dp audit_events in psql).
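The privilege check can also run as an automated startup or CI assertion. The sketch below assumes the privilege list has already been fetched, for example with the query in the comment against Postgres's information_schema.role_table_grants view; the function name and the error wording are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// grantsQuery is one way to list a role's table privileges.
// It is shown for reference and is not executed in this sketch.
const grantsQuery = `
SELECT privilege_type
FROM information_schema.role_table_grants
WHERE table_name = 'audit_events' AND grantee = $1`

// CheckAppendOnly returns an error if the role holds any privilege
// on audit_events other than SELECT and INSERT.
func CheckAppendOnly(privileges []string) error {
	allowed := map[string]bool{"SELECT": true, "INSERT": true}
	var extra []string
	for _, p := range privileges {
		p = strings.ToUpper(strings.TrimSpace(p))
		if !allowed[p] {
			extra = append(extra, p)
		}
	}
	if len(extra) > 0 {
		sort.Strings(extra)
		return fmt.Errorf("audit_events is not append-only: unexpected privileges %s",
			strings.Join(extra, ", "))
	}
	return nil
}
```

Treat this as a starting point: privileges that arrive through PUBLIC, role membership, table ownership, or superuser status do not show up as direct grants to bodh_app and need their own checks.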
What you do with this audit log
Three queries that fall out of the schema naturally:
“Show me every action on case X”
SELECT at, kind, agent_id, decision, reason
FROM audit_events
WHERE tenant_id = current_setting('app.tenant_id')
  AND case_id = 'case-2026-05-24-001'
ORDER BY at ASC;
Returns the full chronological trail for a case — every governance decision, every agent invocation, every review. The narrative isn’t there, but the shape of what happened is fully reconstructible.
“Show me policy denial spikes by tenant”
SELECT tenant_id, date_trunc('hour', at) AS hour, count(*) AS denials
FROM audit_events
WHERE kind = 'policy_decision' AND decision = 'deny'
  AND at > now() - interval '24 hours'
GROUP BY tenant_id, hour
ORDER BY denials DESC;
Operational query. Spikes indicate misconfigured callers, attacks, or policy changes that broke a workflow.
“Reviewer rationale audit for the last month”
SELECT reviewer_id, kind, decision, reason
FROM audit_events
WHERE kind = 'review_decision'
  AND at > now() - interval '30 days'
ORDER BY reviewer_id, at;
Compliance review. Are reviewers documenting their reasoning? Are rationales pattern-matched (copy/paste) or unique? Are there reviewers who never reject? Are there reviewers who never approve?
Because Reason is the only free-text field in the audit log, operators can analyse rationale text for drift, missing reasoning, or copy-paste approvals without ever needing the patient narrative or test results.
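The copy-paste question can be approximated by counting identical rationales per reviewer after light normalisation. This is a minimal sketch; the trimmed Event type, the Repeat type, the function names, and the normalisation rule (lowercase, collapsed whitespace) are assumptions for illustration.

```go
package main

import (
	"sort"
	"strings"
)

// Event is a trimmed, illustrative copy of the audit schema.
type Event struct {
	Kind       string
	ReviewerID string
	Reason     string
}

// Repeat describes one rationale a reviewer has reused.
type Repeat struct {
	ReviewerID string
	Reason     string // normalised text
	Count      int
}

// normalize lowercases the text and collapses runs of whitespace.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// RepeatedRationales returns rationales used at least minCount times
// by the same reviewer, most frequent first.
func RepeatedRationales(events []Event, minCount int) []Repeat {
	type key struct{ reviewer, reason string }
	counts := map[key]int{}
	for _, e := range events {
		if e.Kind != "review_decision" {
			continue
		}
		r := normalize(e.Reason)
		if r == "" {
			continue
		}
		counts[key{e.ReviewerID, r}]++
	}
	var out []Repeat
	for k, n := range counts {
		if n >= minCount {
			out = append(out, Repeat{ReviewerID: k.reviewer, Reason: k.reason, Count: n})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		if out[i].ReviewerID != out[j].ReviewerID {
			return out[i].ReviewerID < out[j].ReviewerID
		}
		return out[i].Reason < out[j].Reason
	})
	return out
}
```

Exact-match counting only catches verbatim reuse. Near-duplicates would need a similarity measure, and a repeated rationale is a prompt for review rather than proof of a problem.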
Production add-ons not in this PR
The shipped audit pipeline is complete enough for HIPAA §164.312(b) compliance at the data-collection layer. Three production additions are tracked:
1. WORM sink
For long-term retention (HIPAA: 6 years), forward audit events to a write-once-read-many sink: S3 Object Lock, Azure Immutable Blob, or Loki with retention lock. The application database is for hot access (90 days typical); the WORM sink is the cold-storage forensic record.
2. Merkle chain-hash
Periodic export jobs compute hash[n] = sha256(hash[n-1] || serialize(events[n])) and write the chain into a separate signed manifest. Strictly speaking this is a linear hash chain rather than a Merkle tree. Any tampering with historical events breaks the chain and is detectable.
This addresses the “tamper-evident” expectation in 21 CFR Part 11 and the Joint Commission’s audit-log integrity guidance.
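The chain formula above can be sketched in a few lines of Go. The assumptions here are mine: batches are serialised with encoding/json, the chain starts from an all-zero hash, and the Event type is trimmed. Manifest signing is out of scope for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
)

// Event is a trimmed, illustrative copy of the audit schema.
type Event struct {
	ID       string `json:"id"`
	Kind     string `json:"kind"`
	Decision string `json:"decision,omitempty"`
}

// ChainBatches computes hash[n] = sha256(hash[n-1] || serialize(batch[n])),
// starting from an all-zero hash (an assumption of this sketch).
func ChainBatches(batches [][]Event) ([][32]byte, error) {
	var prev [32]byte
	hashes := make([][32]byte, 0, len(batches))
	for _, batch := range batches {
		payload, err := json.Marshal(batch)
		if err != nil {
			return nil, err
		}
		buf := append(append([]byte{}, prev[:]...), payload...)
		prev = sha256.Sum256(buf)
		hashes = append(hashes, prev)
	}
	return hashes, nil
}

// FirstMismatch recomputes the chain and returns the index of the
// first batch whose hash differs from the manifest, or -1 if the
// chain is intact. A length difference counts as a mismatch.
func FirstMismatch(batches [][]Event, manifest [][32]byte) (int, error) {
	got, err := ChainBatches(batches)
	if err != nil {
		return -1, err
	}
	n := len(got)
	if len(manifest) < n {
		n = len(manifest)
	}
	for i := 0; i < n; i++ {
		if got[i] != manifest[i] {
			return i, nil
		}
	}
	if len(got) != len(manifest) {
		return n, nil
	}
	return -1, nil
}
```

Because each hash folds in its predecessor, editing any historical batch changes that batch's hash and every hash after it. The scheme depends on a stable serialisation: if the encoding of an event can change between export and verification, intact data will look tampered.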
3. Real-time alerting
Stream from the Postgres LISTEN/NOTIFY (or replace with NATS / Kafka for higher throughput) to a real-time alerting layer: spike-on-deny, anomaly-on-reviewer-pattern, integrity-check-failure. Detection latency goes from “discovered next quarter” to “discovered in minutes.”
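A spike-on-deny rule can be as simple as a per-tenant sliding window. The sketch below is transport-agnostic and leaves out the LISTEN/NOTIFY or Kafka wiring; the detector type, its fields, and the threshold semantics are assumptions made for illustration.

```go
package main

import "time"

// Event is a trimmed, illustrative copy of the audit schema.
type Event struct {
	At       time.Time
	Kind     string
	TenantID string
	Decision string
}

// DenySpikeDetector counts policy denials per tenant inside a
// sliding time window. It is not goroutine-safe.
type DenySpikeDetector struct {
	Window    time.Duration
	Threshold int
	recent    map[string][]time.Time
}

func NewDenySpikeDetector(window time.Duration, threshold int) *DenySpikeDetector {
	return &DenySpikeDetector{
		Window:    window,
		Threshold: threshold,
		recent:    map[string][]time.Time{},
	}
}

// Observe feeds one event to the detector and reports whether the
// event's tenant has reached Threshold denials within Window.
// It assumes events arrive in non-decreasing At order per tenant.
func (d *DenySpikeDetector) Observe(e Event) bool {
	if e.Kind != "policy_decision" || e.Decision != "deny" {
		return false
	}
	cutoff := e.At.Add(-d.Window)
	kept := d.recent[e.TenantID][:0]
	for _, t := range d.recent[e.TenantID] {
		if t.After(cutoff) {
			kept = append(kept, t) // still inside the window
		}
	}
	kept = append(kept, e.At)
	d.recent[e.TenantID] = kept
	return len(kept) >= d.Threshold
}
```

The detector needs only At, Kind, TenantID, and Decision, so the alerting layer can run without any access to the persistence table.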
None of these are required for the §164.312(b) data-collection baseline. All of them are part of a mature audit-log operation.
Try it
git clone https://github.com/PratikDhanave/bodh.git
cd bodh
# Start cmd/care
go run ./cmd/care -addr=:8088 &
# Trigger a clinical flow
curl -X POST localhost:8088/cdm \
  -H 'content-type: application/json' \
  -d '{"patient_id":"pt-001","condition":"dm","case_id":"case-001"}'
# Inspect the audit log (URL quoted so the shell does not treat ? as a glob)
curl -s 'localhost:8088/audit?limit=20' | jq '.[] | {kind, agent_id, decision, reason}'
Schema details in docs/governance.md, Postgres backend details in docs/deployment.md, full audit pipeline architecture in docs/operations.md.
Repo: github.com/PratikDhanave/bodh
If you’re building HIPAA-aligned audit pipelines and want to compare schema decisions, retention models, or the WORM-sink integration story — issues, PRs, and DMs welcome.
Bodh is a research and engineering reference. Not under a Business Associate Agreement. The audit pipeline described here is the architectural target for HIPAA-aligned clinical deployments; the codebase is not certified.