The Right-to-Explanation Handler: GDPR Article 22 as a Go HTTP Endpoint

How a 200-line Go handler turns an audit log and an eval store into a regulator-friendly answer to "why did the AI decide that?" — without leaking a single byte of PHI.

The regulation, in one paragraph

GDPR Article 22(3) gives a data subject the right to “obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.” Article 15(1)(h) adds the right of access to “meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing.” Read together, they describe a feature, not a policy memo: when an automated system makes a consequential decision about a person, that person must be able to (a) ask what happened, (b) get a meaningful answer, and (c) contest it through a human.

In a hospital, that translates to a clinician calling on behalf of a patient and saying: “Why did your platform recommend this diagnosis? Show me the steps. Show me who reviewed it. Show me whether anyone could have overridden it.”

In Bodh, the medical multi-agent platform I’ve been writing about, the answer is GET /explain/{case_id}.

What the endpoint returns

The response is a single JSON document. Here’s the struct (cmd/care/explain.go):

type Explanation struct {
    CaseID          string                  `json:"case_id"`
    TenantID        string                  `json:"tenant_id,omitempty"`
    GeneratedAt     time.Time               `json:"generated_at"`
    Status          string                  `json:"status"`          // pending | completed | failed | unknown
    FinalDiagnosis  string                  `json:"final_diagnosis,omitempty"`
    Accuracy        float64                 `json:"accuracy_vs_gold,omitempty"`
    TotalCostUSD    float64                 `json:"total_cost_usd,omitempty"`
    UnderBudget     *bool                   `json:"under_budget,omitempty"`
    AgentHops       []AgentHop              `json:"agent_hops"`
    HumanReviews    []HumanReviewSummary    `json:"human_reviews"`
    PolicyDecisions []PolicyDecisionSummary `json:"policy_decisions"`
    EventCounts     map[string]int          `json:"event_counts"`
    Caveats         []string                `json:"caveats"`
}

A clinician (or a regulator with the right credentials) gets back:

The decision — final diagnosis label, accuracy against the gold answer (if it’s a bench case), and whether it stayed under the cost budget.
The reasoning chain — AgentHops, an ordered list of which agents emitted what messages in what sequence. Hypothesist, test planner, challenger, cost guardian, judge, human review — every step that touched the case.
The human-in-the-loop record — every ReviewRequest that was raised, who decided what, when, and the rationale. Pending reviews surface too.
The governance trail — every policy decision (allow / deny / defer) with the reason. If the cost-guardian policy rejected a $5000 workup, that decision is here.
The audit summary — EventCounts shows how many message events, how many policy decisions, how many errors. The full audit log is the source of truth; the explanation is the projection.
The caveats — explicit disclaimers (“Bodh is a research and engineering reference. Not a medical device, not approved for clinical use”) plus any case-specific caveats, baked into the response so they cannot be stripped by an over-eager UI layer.
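To make the shape concrete, here is a minimal sketch of the smallest response the struct can describe: a case ID with nothing behind it. The struct is a trimmed copy, and `newUnknown` and `render` are hypothetical helpers, not functions from the Bodh codebase. The one caveat string is the disclaimer quoted above; any others the real handler attaches are not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Explanation is a trimmed, illustrative copy of the struct above. The real
// AgentHops field is []AgentHop; []string stands in for it here.
type Explanation struct {
	CaseID      string         `json:"case_id"`
	GeneratedAt time.Time      `json:"generated_at"`
	Status      string         `json:"status"`
	AgentHops   []string       `json:"agent_hops"`
	EventCounts map[string]int `json:"event_counts"`
	Caveats     []string       `json:"caveats"`
}

// newUnknown is a hypothetical constructor for a case with no data behind it.
// Slices and maps are initialised to empty values so they encode as [] and {}
// rather than null (these fields carry no omitempty tag).
func newUnknown(caseID string) Explanation {
	return Explanation{
		CaseID:      caseID,
		GeneratedAt: time.Now().UTC(),
		Status:      "unknown",
		AgentHops:   []string{},
		EventCounts: map[string]int{},
		Caveats: []string{
			"Bodh is a research and engineering reference. Not a medical device, not approved for clinical use.",
		},
	}
}

// render encodes the explanation as indented JSON.
func render(e Explanation) (string, error) {
	b, err := json.MarshalIndent(e, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := render(newUnknown("case-123"))
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because the caveats are set in the constructor, every response carries them regardless of what the stores return.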

What the endpoint does not return

This is the part that took the most design work.

The response carries zero PHI. No chief complaint, no HPI narrative, no lab values, no medication names with doses, no patient name, no DOB, no MRN, no SSN. Not because we run a redactor over it — because the data source itself does not contain PHI.

The handler reads from two stores:

pkg/audit.Recorder — the immutable append-only audit log. Its field selection follows the “enough to reconstruct, never enough to leak” rule: IDs, kinds, timestamps, agent IDs, decision verdicts, redacted reasons. Never clinical content.
pkg/eval.Store — case-level metrics: final diagnosis label, accuracy, cost. The clinical narrative lives in the persistence layer behind row-level-security policies; the eval store records the outcome, not the story.

So the right-to-explanation surface inherits its privacy guarantee from architecture, not from a regex pass. The handler doesn’t need to know what PHI looks like; it can’t reach it.
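One way to picture "it can’t reach it" in Go is to give the handler only narrow read interfaces whose types have no field for clinical content. This is a sketch under assumptions: `AuditReader`, `EvalReader`, `explainDeps`, and the record types are hypothetical names, not the actual pkg/audit or pkg/eval API.

```go
package main

import (
	"fmt"
	"time"
)

// AuditEvent is a hypothetical PHI-free audit record: IDs, kinds, timestamps,
// verdicts, and already-redacted reasons. There is no field for narrative.
type AuditEvent struct {
	CaseID         string
	Kind           string
	AgentID        string
	At             time.Time
	Verdict        string
	RedactedReason string
}

// EvalRecord is a hypothetical case-level outcome: label, accuracy, cost.
type EvalRecord struct {
	CaseID         string
	FinalDiagnosis string
	Accuracy       float64
	TotalCostUSD   float64
}

// AuditReader and EvalReader are the only doors the handler gets.
type AuditReader interface {
	EventsForCase(caseID string) []AuditEvent
}

type EvalReader interface {
	Lookup(caseID string) (EvalRecord, bool)
}

// In-memory fakes, useful for tests.
type memAudit map[string][]AuditEvent

func (m memAudit) EventsForCase(id string) []AuditEvent { return m[id] }

type memEval map[string]EvalRecord

func (m memEval) Lookup(id string) (EvalRecord, bool) {
	r, ok := m[id]
	return r, ok
}

// explainDeps holds everything the handler may touch. The persistence layer
// that stores clinical narrative is deliberately absent.
type explainDeps struct {
	Audit AuditReader
	Eval  EvalReader
}

// summary shows that the handler can only combine PHI-free inputs.
func (d explainDeps) summary(caseID string) string {
	events := d.Audit.EventsForCase(caseID)
	rec, ok := d.Eval.Lookup(caseID)
	if !ok {
		return fmt.Sprintf("%d audit events, no eval record", len(events))
	}
	return fmt.Sprintf("%d audit events, diagnosis label %q", len(events), rec.FinalDiagnosis)
}

func main() {
	deps := explainDeps{
		Audit: memAudit{"c1": {{CaseID: "c1", Kind: "message", AgentID: "judge"}}},
		Eval:  memEval{},
	}
	fmt.Println(deps.summary("c1"))
}
```

The guarantee then rests on the types: adding clinical content to the response would require changing what the stores record, not just what the handler selects.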

The test that guards this is short:

func TestExplainHandler_PHIFreeFieldSelection(t *testing.T) {
    // ... drive the handler against a populated audit + eval store ...
    body := strings.ToLower(rec.Body.String())

    forbiddenKeys := []string{
        `"chief_complaint":`, `"hpi":`,
        `"patient_name":`, `"dob":`, `"mrn":`, `"ssn":`,
        `"narrative_text":`, `"test_result_value":`,
        `"medication_name":`, `"medication_dose":`,
    }
    for _, k := range forbiddenKeys {
        if strings.Contains(body, k) {
            t.Errorf("response contains forbidden JSON key %q — PHI surface leaked", k)
        }
    }
}

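If anyone adds a “rich” explanation field under one of these keys, say one that quotes the chief complaint, this test fails in CI before the PR can merge. It is a denylist of key names: it guards the response schema and does not inspect values.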
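A structural variant of the same check decodes the body and walks every object key at any nesting depth, rather than searching the raw string. This is a sketch, not the test in cmd/care/explain_test.go; `forbiddenKeysIn` is a hypothetical helper. Like the original, it checks key names only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// forbiddenKeysIn decodes a JSON body and returns, sorted, every object key
// (at any depth) that appears in the denylist. Matching is case-insensitive.
func forbiddenKeysIn(body []byte, denylist []string) ([]string, error) {
	var doc any
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	deny := make(map[string]bool, len(denylist))
	for _, k := range denylist {
		deny[strings.ToLower(k)] = true
	}

	found := map[string]bool{}
	var walk func(v any)
	walk = func(v any) {
		switch t := v.(type) {
		case map[string]any:
			for k, child := range t {
				if deny[strings.ToLower(k)] {
					found[k] = true
				}
				walk(child)
			}
		case []any:
			for _, child := range t {
				walk(child)
			}
		}
	}
	walk(doc)

	out := make([]string, 0, len(found))
	for k := range found {
		out = append(out, k)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	body := []byte(`{"case_id":"c1","agent_hops":[{"agent_id":"judge","mrn":"x"}]}`)
	leaked, err := forbiddenKeysIn(body, []string{"mrn", "dob", "chief_complaint"})
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("forbidden keys found:", leaked)
}
```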

Why “audit log + eval store” is the right design

The first design instinct is to store an “explanation document” per case — a separate record summarising what the AI thought. That design has three problems:

It’s a second source of truth. Now there are two records that can diverge. Audit log says X happened; explanation document says Y was decided. Which is right? The audit log, always — so the explanation document is redundant.
It can drift from reality. If someone fixes a bug in the diagnostician three months later, the historical explanation document still reflects the old reasoning. Reconstructing from the audit log + eval store at query time means the explanation always matches what actually happened.
It expands the PHI surface. Storing a narrative explanation creates a new place where PHI can land, where access controls must be re-enforced, where breach notification must be re-considered. Computing the explanation from PHI-free sources sidesteps the whole problem.

Reconstructing at read time also makes the right-to-erasure path (GDPR Article 17) cleaner: there’s no separate explanation cache to invalidate. Delete the patient’s clinical record from the persistence layer (via the documented hard-delete path in pkg/persistence/postgres/), and the explanation handler now returns status: unknown for any future query. No background job, no eventual consistency.

The four kinds of audit event the explanation surfaces

Bodh’s audit log uses four event kinds, and the explanation handler maps each to a different part of the response:

Audit kind	Maps to	What it records
`KindMessage`	`AgentHops`	An agent received and processed a message. Records the sending and receiving agent IDs, the message type, and timestamps. Never the content.
`KindPolicyDecision`	`PolicyDecisions`	The governance layer evaluated a policy and returned a verdict. Records the policy name, verdict (allow/deny/defer), and a redacted reason.
`KindReviewDecision`	`HumanReviews`	A clinician decided on a HITL review. Records the review ID, decision (approve/reject/modify), and a redacted reason.
`KindAgentError`	flag on `AgentHops` (`HadError=true`) + `Status="failed"`	An agent encountered an error. Records the error class and a redacted error text.

The handler iterates the audit log once, partitions by kind, deduplicates by timestamp where appropriate (a KindReviewDecision event might already appear in the pending-reviews queue), and assembles the response in chronological order.
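The single pass described above can be sketched as follows. The `Event` type, the string values of the kind constants, and the `partition` function are assumptions for illustration; only the four kind names come from the text. Deduplication against the pending-reviews queue is left out of the sketch.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

type Kind string

// The constant names follow the table; the string values are assumed.
const (
	KindMessage        Kind = "message"
	KindPolicyDecision Kind = "policy_decision"
	KindReviewDecision Kind = "review_decision"
	KindAgentError     Kind = "agent_error"
)

// Event is a hypothetical minimal audit event.
type Event struct {
	ID   string
	Kind Kind
	At   time.Time
}

// Partitioned holds one bucket per kind plus the per-kind counts.
type Partitioned struct {
	Messages, Policies, Reviews, Errors []Event
	Counts                              map[Kind]int
}

// partition walks the log once, buckets events by kind, then sorts each
// bucket chronologically. The sort is stable, so events with equal
// timestamps keep their log order.
func partition(events []Event) Partitioned {
	p := Partitioned{Counts: map[Kind]int{}}
	for _, e := range events {
		p.Counts[e.Kind]++
		switch e.Kind {
		case KindMessage:
			p.Messages = append(p.Messages, e)
		case KindPolicyDecision:
			p.Policies = append(p.Policies, e)
		case KindReviewDecision:
			p.Reviews = append(p.Reviews, e)
		case KindAgentError:
			p.Errors = append(p.Errors, e)
		}
	}
	for _, bucket := range [][]Event{p.Messages, p.Policies, p.Reviews, p.Errors} {
		b := bucket
		sort.SliceStable(b, func(i, j int) bool { return b[i].At.Before(b[j].At) })
	}
	return p
}

// at builds a UTC timestamp from Unix seconds, to keep examples short.
func at(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

func main() {
	p := partition([]Event{
		{ID: "m2", Kind: KindMessage, At: at(20)},
		{ID: "p1", Kind: KindPolicyDecision, At: at(15)},
		{ID: "m1", Kind: KindMessage, At: at(10)},
	})
	for _, m := range p.Messages {
		fmt.Println(m.ID, m.At.Format(time.RFC3339))
	}
	fmt.Println("policy decisions:", p.Counts[KindPolicyDecision])
}
```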

Status transitions

The status field is computed, not stored:

pending — audit events exist but no eval record yet. The case is still flowing through the pipeline.
completed — audit events exist and an eval record exists. The case finished.
failed — at least one KindAgentError event exists. The case errored.
unknown — no audit events for this case ID. Either it never existed or it was erased.
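The four rules above can be sketched as a pure function. Two things are assumptions, since the text does not say how overlapping conditions resolve: that failed takes precedence when an error event and an eval record both exist, and that a case with no audit events is unknown even if an eval record is present. `computeStatus` and the trimmed `Event` type are hypothetical names.

```go
package main

import "fmt"

type Kind string

// Only the kind that affects status is needed here; its string value is assumed.
const (
	KindMessage    Kind = "message"
	KindAgentError Kind = "agent_error"
)

// Event is trimmed to the one field the status computation reads.
type Event struct {
	Kind Kind
}

// computeStatus derives the status from the audit events and the presence of
// an eval record. Nothing is stored, so the result always reflects the
// current contents of the two stores.
func computeStatus(events []Event, hasEval bool) string {
	if len(events) == 0 {
		return "unknown"
	}
	for _, e := range events {
		if e.Kind == KindAgentError {
			return "failed" // assumed to win over "completed"
		}
	}
	if hasEval {
		return "completed"
	}
	return "pending"
}

func main() {
	fmt.Println(computeStatus(nil, false))
	fmt.Println(computeStatus([]Event{{Kind: KindMessage}}, false))
	fmt.Println(computeStatus([]Event{{Kind: KindMessage}}, true))
	fmt.Println(computeStatus([]Event{{Kind: KindMessage}, {Kind: KindAgentError}}, true))
}
```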

This is intentionally observable from the outside. A regulator should not need to understand Bodh’s internal state machine; they should see a status and have it match their mental model.

The auth question, honestly

Production deployments need authentication on this endpoint, scoped by tenant and possibly by role (a clinician can see their own cases; a compliance officer can see all cases for their tenant; nobody can see another tenant’s cases). The current implementation has no auth on the route. It’s a research reference; the binary scopes the audit log it reads from via BODH_TENANT_ID, but per-request tenant routing — ?tenant_id= or a JWT claim — is future work documented in the handler’s comments.

I’m calling this out because right-to-explanation is exactly the kind of feature where “we’ll add auth later” is the wrong answer. If you stand up Bodh in front of any system that touches a real patient, wrap this endpoint with your auth middleware on day one. The contract of the response is PHI-safe; the existence of the response for a given case ID is still a fact you don’t want to expose to anyone who can guess case IDs.

The right answer for a production fork is probably a small middleware:

mux.Handle("GET /explain/{case_id}",
    authMiddleware(
        tenantScopingMiddleware(
            explainHandler(...))))

The tenantScopingMiddleware would read the verified JWT, take the tenant ID from its claim (per request, in place of the process-wide BODH_TENANT_ID), and refuse the request if the case’s tenant doesn’t match.

What this maps to in the regulatory landscape

Regulation	Article / Section	What the handler satisfies
GDPR	Article 22(3)	Right to human intervention — the `HumanReviews` field surfaces the HITL gate.
GDPR	Article 15(1)(h)	“Meaningful information about the logic involved” — `AgentHops` + `PolicyDecisions`.
GDPR	Article 17	Right to erasure — the handler reads from PHI-free stores, so erasure of the patient’s clinical record automatically downgrades future explanations to `status=unknown`.
21st Century Cures Act	§3060 CDS carve-out, criterion 4	“Independent review by a healthcare professional” — the `HumanReviews` field is the field-level proof of that.
HIPAA Privacy Rule	§164.524 (access of individuals to PHI)	Orthogonal — the handler is PHI-free by design, so it doesn’t fall under §164.524’s PHI access scope. Clinical content access is handled separately by the persistence layer.
HIPAA Security Rule	§164.312(b) (audit controls)	The audit log this handler reads from is the §164.312(b) artefact.
EU AI Act	Article 13 (transparency for high-risk systems)	Per-decision transparency, computable from logs without bespoke explanation infrastructure.

What it doesn’t try to be

This isn’t a global model explanation. It doesn’t explain why the LLM weights make Pattern X more likely than Pattern Y in general. It explains this case — what messages flowed, what policies fired, what humans decided.

That’s deliberate. Model-level explanations are a research problem; per-decision explanations are an engineering problem. The regulatory frameworks that matter today (GDPR Article 22, Cures Act §3060) ask for the engineering version. The research version is best left to the bench, the red-team catalogue, and the model cards under docs/model-cards/ — which are written manually, not generated per case.

Six test cases

The handler ships with six test cases. They’re worth listing because they’re what regulators would want to see:

Unknown case — returns status=unknown, three caveats, empty hops. Confirms the endpoint doesn’t 500 on a missing case.
Reconstructs agent hops — three KindMessage events become three chronologically-sorted hops. Status is pending (no eval record). Policy decisions are deduplicated.
Completed case from eval store — looks up Accuracy=1.0, TotalCostUSD=220.0, UnderBudget=true from the eval record.
Failed case — a KindAgentError event flips the case to status=failed and stamps HadError=true on the affected hop.
HITL decisions and pending reviews — a decided review (from the audit log) and a pending review (from the queue) both surface, in chronological order, without duplication.
PHI-free field selection — the response body, lowercased, contains none of chief_complaint, hpi, patient_name, dob, mrn, ssn, narrative_text, test_result_value, medication_name, medication_dose.

That last test is the one that matters in a review. The first five prove the handler works; the sixth catches a careless schema change that reintroduces one of the known PHI-bearing keys.

What I’d build next

Per-request tenant routing — JWT-claim-driven tenant scoping, replacing the binary-level BODH_TENANT_ID.
Signed responses — a JWS signature on the explanation document so it can be archived and replayed in a regulatory proceeding.
PDF rendering — the same JSON document rendered as a one-page PDF for the patient’s record, with the caveats on a footer.
An /explain index — GET /explain (no case ID) returns a paginated list of explainable cases for the tenant, again PHI-free (just IDs, kinds, statuses).

None of these change the contract. The handler’s invariants — PHI-free, audit-sourced, computed-not-stored — already make it forward-compatible with all four.

The takeaway

GDPR Article 22 reads like a compliance burden. In a system where the audit log is already PHI-free and append-only, where governance decisions are already structured records, where HITL is already a queue with an audit trail — the right-to-explanation handler is 200 lines of Go that project existing data onto a JSON shape.

The work is upstream. Build the audit log right, build governance as code, build the HITL gate as a first-class queue, and GET /explain/{case_id} is almost free. Build them wrong, and no amount of handler engineering will give you a regulator-friendly answer.

Code: cmd/care/explain.go (handler), cmd/care/explain_test.go (six-case suite), docs/aigp-governance-mapping.md (regulatory cross-reference).

Bodh is a research and engineering reference. Not a medical device, not approved for clinical use.