Contextual guardrails for agentic AI — why per-message filters miss the point

Invariant's Guardrails evaluates sequences of actions, not individual messages. Dataflow control, seven security capabilities, deterministic enforcement. The difference between 'this message looks bad' and 'this sequence of actions is a known attack.'

Invariant released Guardrails on April 17, 2025. The technical announcement is clear about what makes this different from content filters: it is contextual, meaning it evaluates sequences of actions rather than individual messages in isolation. This distinction is load-bearing.

Why per-message filtering is insufficient

"Send a URL to Slack" — benign. "Read a spreadsheet, then send a URL to Slack" — this is the link-preview exfiltration pattern. The harm is in the sequence, not in either individual action. A filter that evaluates each message in isolation cannot detect this. A filter that evaluates the trace can.

This is the architectural insight from the ICML 2024 formal security paper made operational. The Guardrails system is the production implementation of the trace-level policy evaluation that paper described.

The seven guardrailing capabilities

  1. API Secret Detection — prevents credentials, tokens, and API keys from appearing in outbound messages.
  2. PII Detection — blocks personally identifiable information from flowing to tools or outputs where it doesn't belong.
  3. Dataflow Control — the most powerful capability. Defines allowed and prohibited sequences of tool calls. Directly prevents the link-preview exfiltration class of attacks.
  4. Tool Call Guardrails — restricts which tools can be called, with what parameters, from which context.
  5. Code Guardrails — prevents unsafe patterns in generated or executed code: eval(), exec(), unvalidated subprocess calls.
  6. Content Guardrails — blocks copyrighted content reproduction and harmful output.
  7. Loop Detection — identifies agents that have entered pathological retry loops and halts them.

Deterministic policy expression

Rules are expressed in a deterministic policy language, not in prompts to the model. An example rule that blocks the link-preview exfiltration attack:

deny {
  trace.contains(tool_call("read_spreadsheet"))
  m := trace.filter(tool_call("slack_send"))
  contains_url(m.args.text)
  order("read_spreadsheet") < order(m)
}

This rule either fires or it doesn't. There is no probability distribution, no context dependence, no "the model usually refuses this." When configured, the rule fires 100% of the time regardless of model, prompt, or session context.

Integration — one URL change

Guardrails runs through Invariant Gateway. The agent routes through the Gateway endpoint. The gateway intercepts all LLM and MCP traffic, applies the policy, and forwards or blocks. Existing agent code requires only the base URL change.

How this maps to Genie's architecture

Genie implements the same principle at the message-bus layer. The CompositePolicy in pkg/governance/ is a chain of deterministic Go policies that evaluate every message before it reaches any agent. The PromptInjectionPolicy, PIIPolicy, SovereigntyPolicy, and the board-approved DSL rules all reflect the same design: move security decisions out of the model's context and into verifiable infrastructure that cannot be overridden by prompt injection.

The Guardrails product is a managed version of this architecture, available without building the policy infrastructure from scratch. For teams not running on Go or not building on MARA, it's the fastest path to the same guarantee.


Source: Invariant Labs — Introducing Guardrails