The 21st Century Cures Act, Expressed in Go
How one architectural commitment — letting a clinician independently review every recommendation — moves software out of FDA Software-as-a-Medical-Device classification and into the Clinical Decision Support carve-out.
The line nobody talks about
There’s a sentence in 21 USC §360j(o)(1)(E), the provision the 21st Century Cures Act added in 2016, that decides whether a piece of clinical software gets regulated by the FDA as a medical device. It’s deceptively short:
The software is intended for the purpose of … enabling [the] health care professional to independently review the basis for [the] recommendations …
That’s criterion four of the four CDS carve-out criteria. The first three are about scope (no signal processing, displays patient information, supports HCP recommendations). The fourth is about architecture.
If your AI proposes a diagnosis and acts on it — sends the prescription, books the imaging, places the order — you’re a medical device. If your AI proposes a diagnosis and a clinician approves it before any action happens, you might be Clinical Decision Support under the carve-out.
The difference is not a checkbox. It’s a different shape of system.
What “independently review” actually means
Three things, on close reading of FDA guidance documents and the rulemaking discussion:
- The recommendation has a basis — explainable rationale, not just a confident assertion.
- The clinician can see that basis — visible in the UI, queryable in the system.
- The clinician’s review is consequential — clicking through doesn’t count; the system must actually wait for human disposition before any consequence flows.
The third is the hard one. Most “human in the loop” AI systems treat humans as a speed bump, not a gate. The recommendation appears; the clinician sees it; the system has already cached its consequences and acted; the human can override but rarely does. That’s not the carve-out. That’s a regulated device with a UI veneer.
What the architecture looks like in Bodh
I’ve been building Bodh, an open-source medical multi-agent platform in Go. The whole point of Bodh’s human_review agent — and the design of the platform around it — is to express that third requirement in code.
Here’s the pattern:
```
upstream agent ──► human_review ──► (queued)
                                       │
                         reviewer decides (HTTP / API)
                                       │
                           ┌───────────┴───────────┐
                           │                       │
                        approve                 reject
                           │                       │
                           ▼                       ▼
                  next agent (resume)     (no resume; logged)
```
Three properties make this load-bearing:
- The upstream agent emits no message to the next stage. It emits a review_request envelope addressed to human_review.
- The review queue persists. Until a human decides, nothing happens. There’s no path that bypasses the gate.
- Reject is lossless. On reject, the original payload is logged and dropped. No half-completed clinical action can leak past.
That’s it. Not a UX layer. Not a notification. A queue that stops the system until a human acts.
The envelope
When an upstream agent wants to gate its output, it doesn’t send the payload to the next agent. It wraps the payload and addresses it to human_review. The wrapping is six metadata keys:
| Key | Required | Purpose |
|---|---|---|
| hitl_kind | yes | review category (diagnosis_review, care_plan_review, …) |
| hitl_next_to | yes | agent to forward to on approval |
| hitl_next_type | yes | message type used on resume |
| hitl_priority | no | low / normal / high / urgent |
| hitl_subject | no | short label for the queue UI |
| hitl_sla_minutes | no | deadline; expired reviews never resume |
The original payload sits in msg.Content. The human_review agent doesn’t parse it — it just records the ReviewRequest and waits. When a reviewer decides via POST /reviews/{id}/decide, the agent’s ApplyDecision method either republishes the original payload to hitl_next_to (on approve) or logs it (on reject).
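To make the approve/reject branch concrete, here is a minimal Go sketch. It assumes a simplified Message type with a Metadata map and a Content byte slice; Message, Review, NewReview, and this ApplyDecision signature are illustrative stand-ins, not Bodh’s actual pkg/hitl API, and the agent and message names in main are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a simplified, hypothetical bus message: routing fields, string
// metadata for the envelope keys, and an opaque payload.
type Message struct {
	To       string
	Type     string
	Metadata map[string]string
	Content  []byte
}

// Review is the queued record. It keeps the original message untouched and
// never parses Content.
type Review struct {
	ID       string
	Kind     string
	NextTo   string
	NextType string
	Original Message
}

// requiredKeys are the three mandatory envelope keys.
var requiredKeys = []string{"hitl_kind", "hitl_next_to", "hitl_next_type"}

// NewReview validates the envelope and records it without reading Content.
func NewReview(id string, msg Message) (*Review, error) {
	for _, k := range requiredKeys {
		if msg.Metadata[k] == "" {
			return nil, fmt.Errorf("envelope missing %s", k)
		}
	}
	return &Review{
		ID:       id,
		Kind:     msg.Metadata["hitl_kind"],
		NextTo:   msg.Metadata["hitl_next_to"],
		NextType: msg.Metadata["hitl_next_type"],
		Original: msg,
	}, nil
}

// ApplyDecision returns the message to republish on approve and nil on
// reject; the caller publishes only a non-nil result and logs the reject.
// Other decision values (the audit example uses approve_with_modifications)
// are out of scope for this sketch.
func ApplyDecision(r *Review, decision string) (*Message, error) {
	switch decision {
	case "approve":
		return &Message{To: r.NextTo, Type: r.NextType, Content: r.Original.Content}, nil
	case "reject":
		return nil, nil
	default:
		return nil, errors.New("unsupported decision: " + decision)
	}
}

func main() {
	msg := Message{
		To:   "human_review",
		Type: "review_request",
		Metadata: map[string]string{
			"hitl_kind":      "care_plan_review",
			"hitl_next_to":   "next_agent",      // hypothetical agent name
			"hitl_next_type": "care_plan_ready", // hypothetical message type
		},
		Content: []byte(`{"plan":"..."}`),
	}
	r, err := NewReview("rev-1", msg)
	if err != nil {
		panic(err)
	}
	out, _ := ApplyDecision(r, "approve")
	fmt.Println(out.To, out.Type) // next_agent care_plan_ready
	out, _ = ApplyDecision(r, "reject")
	fmt.Println(out == nil) // true
}
```

Returning the outgoing message instead of publishing inside ApplyDecision keeps the function pure, so both branches can be tested without a bus.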
This pattern is opt-in per agent. Adding HITL to a new flow doesn’t touch the bus, the orchestrator, or the governance composer. The agent gains a HITL bool field and wraps its output when that’s set.
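A sketch of what the opt-in switch might look like on an upstream agent, under the same simplifying assumptions: Message and Agent are hypothetical types, and using review_request as the message Type is an assumption based on the envelope’s name. With HITL off, output goes straight to the next agent; with it on, the same payload is addressed to human_review and the routing moves into the envelope keys.

```go
package main

import "fmt"

// Message is a hypothetical, simplified bus message.
type Message struct {
	To       string
	Type     string
	Metadata map[string]string
	Content  []byte
}

// Agent is an illustrative upstream agent; HITL is the opt-in switch.
type Agent struct {
	HITL     bool
	Kind     string // hitl_kind used when gated
	NextTo   string // downstream agent
	NextType string // message type the downstream agent expects
}

// Output addresses the payload directly to the next agent, or, when HITL is
// set, to human_review with the routing moved into the envelope keys.
func (a Agent) Output(content []byte) Message {
	if !a.HITL {
		return Message{To: a.NextTo, Type: a.NextType, Content: content}
	}
	return Message{
		To:   "human_review",
		Type: "review_request", // assumed type name for the envelope
		Metadata: map[string]string{
			"hitl_kind":      a.Kind,
			"hitl_next_to":   a.NextTo,
			"hitl_next_type": a.NextType,
		},
		Content: content, // carried as-is; human_review does not parse it
	}
}

func main() {
	a := Agent{Kind: "refill_review", NextTo: "next_agent", NextType: "refill_ready"}
	fmt.Println(a.Output(nil).To) // next_agent
	a.HITL = true
	fmt.Println(a.Output(nil).To) // human_review
}
```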
Seven gated agents (so far)
| Agent | What gets gated | Review kind |
|---|---|---|
| reasoning_verifier | Verified diagnosis | diagnosis_review |
| cdm_planner | Chronic disease care plan | care_plan_review |
| tcm_coordinator | Transitional care welcome message | tcm_welcome_review |
| panel_manager | Per-gap outreach | outreach_review |
| refill_manager | Refill approval | refill_review |
| prior_auth | Prior auth status | prior_auth_review |
| readmission_tracker | Discharge handoff to TCM | discharge_review |
Notice the pattern: anything that becomes consequential (a diagnosis recorded, a refill sent, a prior auth submitted, a TCM cascade triggered) is gated. Inbox triage (which already routes urgent items to a human via the nurse task queue), pre-visit chart prep (read-only), and RPM ingest (advisory, not action-taking) are intentionally not gated. Each gating decision is a deliberate choice about whether human disposition adds safety.
Five agents are intentionally not gated. Each “not gated” choice is documented in the agent’s contract.
Cascading approvals
The pattern composes. An approval can trigger downstream agents that themselves emit review requests:
```
readmission_tracker emits discharge_review
  → reviewer approves
    → tcm_coordinator handles discharge_event
      → emits tcm_welcome_review
        → reviewer approves
          → engagement sends welcome SMS
```
Two reviews per high-risk discharge. The reviewer sees the cascade in their queue; the audit log records each decision with reviewer ID, timestamp, decision, and reason.
For demonstration, the auto-approver in cmd/caredemo drains the queue over several rounds. Production UIs surface cascaded reviews as they appear so the reviewer can see the chain.
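The round-by-round drain can be sketched as a breadth-first walk: every pending review is approved, and any review its downstream agent emits lands in the next round. The followUp map is a hypothetical stand-in for the real agents, wired to the discharge cascade above.

```go
package main

import "fmt"

// followUp is a hypothetical stand-in for the downstream agents: approving a
// review of the key kind causes the agent it resumes to emit a review of the
// value kind ("" means the approved action is terminal).
var followUp = map[string]string{
	"discharge_review":   "tcm_welcome_review", // tcm_coordinator resumes
	"tcm_welcome_review": "",                   // engagement sends the SMS
}

// drain mimics a demo auto-approver: each round approves every pending
// review, and reviews created by those approvals wait for the next round.
// It returns the kinds approved in each round.
func drain(pending []string) [][]string {
	var rounds [][]string
	for len(pending) > 0 {
		var next []string
		for _, kind := range pending {
			if f := followUp[kind]; f != "" {
				next = append(next, f)
			}
		}
		rounds = append(rounds, pending)
		pending = next
	}
	return rounds
}

func main() {
	for i, kinds := range drain([]string{"discharge_review"}) {
		fmt.Println("round", i+1, kinds)
	}
	// round 1 [discharge_review]
	// round 2 [tcm_welcome_review]
}
```

If followUp contained a cycle, drain would never terminate, which is exactly the runaway case the cascade depth limit discussed under the trade-offs is meant to stop.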
GDPR Article 22 gets it for free
The same architecture also satisfies a different regulatory regime: GDPR Article 22.
The data subject shall have the right not to be subject to a decision based solely on automated processing … which produces legal effects … or similarly significantly affects him or her.
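“Solely” is the operative word. A system where every consequential recommendation is reviewed by a human before action is, by construction, not “solely” automated, provided that review is meaningful rather than a token gesture, a condition the EU’s guidance on Article 22 spells out.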
So one architectural pattern, the HITL gate, addresses the relevant requirement in both the 21st Century Cures Act §3060 (US) and GDPR Article 22 (EU). Build it once, satisfy both.
Article 35 (the DPIA, or Data Protection Impact Assessment, typically required for AI-based clinical decision support in the EU) is also easier to write when the system has explicit gates: the DPIA can point at the queue, the audit log, and the reviewer’s role definition rather than asserting that “the human is in the loop somewhere.”
The audit log is the forensic record
Every review decision becomes an audit.Event of kind review_decision with:
```go
Event{
	Kind:       "review_decision",
	AgentID:    "human_review",
	MessageID:  msg.ID,
	From:       "rn-rachel",   // reviewer ID
	To:         "cdm_planner", // hitl_next_to from envelope
	Type:       "care_plan_review",
	ReviewerID: "rn-rachel",
	Decision:   "approve_with_modifications",
	Reason:     "Spirometry frequency reduced to monthly given patient burden",
	TenantID:   "tenant-mercy-north",
	CaseID:     "case-2026-05-24-001",
	TraceID:    "trace-...",
}
```
Notice what’s deliberately absent: no diagnosis text, no medication dose, no patient name, no MRN. The audit event is a pointer into the system: enough to reconstruct who did what when, and to pull the regulated data from a separate store under its own access control.
This matters because audit logs go to compliance teams, regulators, and discovery requests. The shape of the event determines what’s exposed in those contexts. Bodh’s rule: enough to reconstruct, never enough to leak.
“Lossless on reject” is the key property
This is the part most “HITL” systems get wrong.
When a reviewer rejects, what happens to the proposed action?
- Common pattern: the system has already partially acted. The notification was already queued. The patient already got the appointment text. The reject just adds a “this was rejected” note for later.
- Bodh’s pattern: the system has done nothing. The original payload was logged and never republished; the audit trail records kind=review_decision, decision=reject, and the reviewer’s reason. No SMS sent. No PA submitted. No TCM cascade started.
Lossless on reject means the reviewer’s “no” is final. The system has no “wait and see” mode that lets rejected actions slip through later.
For criterion 4 — the clinician’s review must be consequential — this is the only shape that actually delivers.
What this costs
Five concrete trade-offs to walk into with eyes open:
1. Latency
A review queue introduces wall-clock time between the AI’s recommendation and the consequence. Hours, sometimes days. For routine chronic-disease management, that’s fine — care plans are reviewed at the speed of human clinical work anyway. For urgent flows (sepsis recognition, stroke alerts), HITL gating means the AI is an alert producer, not a treatment initiator.
The hitl_priority and hitl_sla_minutes envelope keys let agents tier their requests. Urgent reviews can page a clinician immediately; routine reviews wait in the queue.
2. Reviewer workload
If you gate seven agents, a panel of N patients can generate on the order of 7×N reviews in the worst case. The system has to be tuned so reviewers spend their time on cases where their judgment changes the outcome, not rubber-stamping every refill.
Two mitigations:
- High-confidence auto-approve with required audit and periodic sampling. The refill_manager already auto-approves on-plan medications; only off-plan refills hit the queue.
- Reviewer triage scoring: a separate model can pre-rank the queue by how likely each item is to need review, putting the easy cases at the bottom.
3. Cascade overflow
Approvals trigger downstream reviews. A high-risk discharge approval cascades into a TCM welcome review, which might cascade into a panel-gap outreach review. One discharge → 3 reviews. A busy hospital → hundreds of reviews per day.
The architecture can accommodate cascade depth limits (planned: a hitl_cascade_depth envelope key with a default of 3). Once a cascade is N reviews deep, the system would either stop (silent failure) or page the on-call clinician (loud failure). We’ve discussed this but not shipped it.
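Since the depth limit is planned rather than shipped, this is only a sketch of how the proposed hitl_cascade_depth key could propagate: a root review counts as depth 1, each review triggered by an approval inherits its parent’s depth plus one, and anything beyond the limit (default 3) is refused. childDepth and Allowed are hypothetical helpers; choosing between stopping and paging is left to the caller.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultMaxDepth mirrors the planned default of 3.
const defaultMaxDepth = 3

// childDepth computes the depth for a review triggered by approving the
// parent. A parent without a valid hitl_cascade_depth is a root (depth 1).
func childDepth(parent map[string]string) int {
	d, err := strconv.Atoi(parent["hitl_cascade_depth"])
	if err != nil || d < 1 {
		d = 1
	}
	return d + 1
}

// Allowed reports whether a review at this depth may be queued.
// maxDepth <= 0 falls back to the planned default.
func Allowed(depth, maxDepth int) bool {
	if maxDepth <= 0 {
		maxDepth = defaultMaxDepth
	}
	return depth <= maxDepth
}

func main() {
	meta := map[string]string{} // a root review, e.g. discharge_review
	for i := 0; i < 3; i++ {
		d := childDepth(meta)
		fmt.Println("next review depth", d, "allowed", Allowed(d, 0))
		meta = map[string]string{"hitl_cascade_depth": strconv.Itoa(d)}
	}
	// next review depth 2 allowed true
	// next review depth 3 allowed true
	// next review depth 4 allowed false
}
```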
4. Reviewer drift
The same reviewer, week 1 vs week 30: are they approving for the same reasons? Are they rubber-stamping by month 12?
Bodh records reviewer rationales — the Reason field on each review decision. Periodic audit can analyse rationale text for drift, missing reasoning, or copied-template approvals. This is also Joint Commission territory if Bodh ever deploys clinically.
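A crude first pass at that analysis, assuming decisions are exported as (reviewer, decision, reason) records: count approvals with no reason and approvals whose reason repeats one the same reviewer already used. Exact-match counting after trimming and lowercasing is a deliberately simple stand-in for real text analysis; Decision, DriftStats, and Drift are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// Decision is a hypothetical exported review decision.
type Decision struct {
	ReviewerID string
	Decision   string
	Reason     string
}

// DriftStats summarises one reviewer's approvals.
type DriftStats struct {
	Approvals   int
	EmptyReason int
	Repeated    int // approvals reusing a reason this reviewer gave before
}

// Drift computes per-reviewer stats over approvals only.
func Drift(ds []Decision) map[string]DriftStats {
	out := map[string]DriftStats{}
	seen := map[string]map[string]bool{} // reviewer -> normalised reasons
	for _, d := range ds {
		if !strings.HasPrefix(d.Decision, "approve") {
			continue // counts approve and approve_with_modifications
		}
		s := out[d.ReviewerID]
		s.Approvals++
		reason := strings.ToLower(strings.TrimSpace(d.Reason))
		switch {
		case reason == "":
			s.EmptyReason++
		case seen[d.ReviewerID][reason]:
			s.Repeated++
		default:
			if seen[d.ReviewerID] == nil {
				seen[d.ReviewerID] = map[string]bool{}
			}
			seen[d.ReviewerID][reason] = true
		}
		out[d.ReviewerID] = s
	}
	return out
}

func main() {
	decisions := []Decision{
		{"rn-a", "approve", "Vitals consistent"},
		{"rn-a", "approve", "vitals consistent "},
		{"rn-a", "approve", ""},
		{"rn-a", "reject", "dose too high"},
	}
	s := Drift(decisions)["rn-a"]
	fmt.Println(s.Approvals, s.EmptyReason, s.Repeated) // 3 1 1
}
```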
5. Reject-rate paradox
If reviewers reject 0% of recommendations, you’re rubber-stamping. If they reject 30%, your AI isn’t useful enough to ship.
The healthy band is somewhere in between. Tracking per-hitl_kind reject rates over time tells you which gated flows are tuned correctly. We expose this on the audit endpoint; production deployments will graph it.
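Computing the per-kind rate is simple arithmetic over decision records. This sketch assumes each record carries its hitl_kind and decision string and counts only an exact reject as a rejection; Decision and RejectRates are illustrative names, and tracking the rate over time would mean bucketing the same computation by week or month.

```go
package main

import (
	"fmt"
	"sort"
)

// Decision is a hypothetical exported review decision.
type Decision struct {
	Kind     string // hitl_kind of the review
	Decision string // "approve", "reject", ...
}

// RejectRates returns rejects/total for each hitl_kind seen.
func RejectRates(ds []Decision) map[string]float64 {
	total := map[string]int{}
	rejects := map[string]int{}
	for _, d := range ds {
		total[d.Kind]++
		if d.Decision == "reject" {
			rejects[d.Kind]++
		}
	}
	rates := make(map[string]float64, len(total))
	for k, n := range total {
		rates[k] = float64(rejects[k]) / float64(n)
	}
	return rates
}

func main() {
	ds := []Decision{
		{"refill_review", "approve"}, {"refill_review", "approve"},
		{"refill_review", "approve"}, {"refill_review", "reject"},
		{"diagnosis_review", "approve"},
	}
	rates := RejectRates(ds)
	kinds := make([]string, 0, len(rates))
	for k := range rates {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds) // map order is random; sort for stable output
	for _, k := range kinds {
		fmt.Printf("%s %.2f\n", k, rates[k])
	}
	// diagnosis_review 0.00
	// refill_review 0.25
}
```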
Where this fits in the regulatory shift
The FDA’s regulatory framework for AI/ML SaMD is itself a moving target. The PCCP (Predetermined Change Control Plan) guidance, GMLP (Good Machine Learning Practice), and the ongoing 510(k) clearance process for AI-enabled devices are all evolving.
What’s relatively stable is the architectural commitment: if you want to stay out of SaMD classification under the Cures Act CDS carve-out, criterion 4 is the load-bearing constraint. The other three are easier to assert; criterion 4 has to be visible in the system architecture.
Most “human in the loop” claims I’ve reviewed for clinical AI products don’t survive contact with this. They have UIs that show recommendations to clinicians, but the system has already acted by the time the clinician sees them — the “review” is a confirmation dialog, not a gate.
Bodh’s human_review agent — and the seven gated agents wrapping their outputs in review_request envelopes — is what criterion 4 looks like when you actually implement it.
Try it
```sh
git clone https://github.com/PratikDhanave/bodh.git
cd bodh

# Diagnostic flow with HITL on (default in caredemo)
go run ./cmd/caredemo

# Inspect the queue (URL quoted so the shell does not glob the "?")
go run ./cmd/care -addr=:8088 &
curl -s 'localhost:8088/reviews?status=pending' | jq .

# Approve via HTTP (replace {id} with a review ID from the pending list)
curl -s -X POST localhost:8088/reviews/{id}/decide \
  -H 'content-type: application/json' \
  -d '{"reviewer_id":"rn-rachel","decision":"approve","reason":"vitals consistent"}'
```
Repo: github.com/PratikDhanave/bodh
If you’re building clinical decision support and wrestling with the same regulatory question — what does criterion 4 actually require? — I’d love to compare implementations. The audit envelope and the lossless-on-reject pattern aren’t claims; they’re code in pkg/hitl and agents/human_review.go. Open issues, PRs, or DMs.
Bodh is a research and engineering reference. Not a medical device. Not approved for clinical use. This article describes architectural patterns aligned with the 21st Century Cures Act CDS carve-out criteria; it is not legal advice. Production deployments require independent regulatory review.