· 13 min read · ← All posts
Governance FREE-AI Compliance Multi-Agent AI

AI Governance: From Credential to Codebase

The field is becoming a profession. Certifications are landing, bodies of knowledge are crystallising. The harder question: what does AI governance look like when it actually ships?


The professionalisation moment

Something quiet is happening to AI governance. Over the last 18 months it has stopped being a section in a CISO’s annual report and started being a discipline — with bodies of knowledge, formal certifications, and (the surest sign of a real profession) a recognisable career path.

The IAPP — best known for the CIPP family of privacy certifications — launched the Artificial Intelligence Governance Professional (AIGP) credential. Its body of knowledge has four pillars:

  1. Foundations of AI governance
  2. Laws, standards and frameworks that apply to AI
  3. Governing AI development
  4. Governing AI deployment and use

That’s a remarkable shift in framing. AI governance is no longer “a thing the legal team helps with.” It’s a defined practice with assessable competencies. The four pillars are exactly the right four.

But the credential answers one question — does this professional understand the principles — and not the other one — does the organisation they work for actually live by them. The gap between knowing what governance should look like and shipping a system that embodies it is where most teams fail.

This article is about closing that gap. For each of the four pillars, I’ll show what the principle is — and what it looks like when it lands in a codebase, with concrete patterns from a multi-agent financial assistant we built called Genie.

The TL;DR: AI governance is what you ship, not what you know. Both halves matter.


Pillar I — Foundations: define what AI governance even means

The first pillar of any AI governance practice is having a clear answer to two questions: what is AI in your context, and what does “good governance” of it mean for your organisation?

These sound philosophical. They’re load-bearing.

If “AI” includes traditional ML models, your governance perimeter is much larger than if it covers only LLM-based agents. If “good governance” means “we have a policy document,” you’re at a different maturity level than if it means “every AI-driven action is auditable to a specific named rule and a specific commit hash.”

The principle: an organisation must define what AI governance means for it — usually as a set of guiding principles, a defined scope, and explicit decision authority.

What this looks like in a codebase:

Not a slide deck. A YAML file. A board-approved YAML file, with the principles named explicitly and version-controlled like the rest of the system.

For Genie, this is config/constitution.yaml and config/ai-policy.example.yaml:

# constitution.yaml — derived from RBI FREE-AI's 7 Sutras
principles:
  - "Trust is the Foundation"
  - "People First"
  - "Innovation over Restraint"
  - "Fairness and Equity"
  - "Accountability"
  - "Understandable by Design"
  - "Safety, Resilience, and Sustainability"

The principles aren’t a slogan. They’re loaded at boot into pkg/constitution, prefixed onto every LLM system prompt, and scored by an LLM-as-judge auditor on every outbound message. “How aligned were our outputs to People First in Q1?” becomes a SQL query.

The organisation’s principles become a measurement. Foundations stop being aspirational.

The credential teaches: what principles a mature framework should have. The codebase enforces: which principles your organisation actually obeys.


Pillar II — Laws, standards and frameworks: meeting the regulators where they are

The second pillar is fluency in the regulatory landscape. Privacy laws that already apply (GDPR, DPDP), AI-specific laws emerging (EU AI Act, Korea AI Basic Act, Brazil PL 2338, RBI FREE-AI in India), and the standards bodies’ frameworks (NIST AI RMF, ISO/IEC 42001, the OECD AI Principles).

The professional needs to know which laws apply to which AI systems, in which jurisdictions, under which conditions. The organisation needs to demonstrate compliance — not in a PowerPoint, on demand, when the regulator asks.

The principle: AI systems must operate within applicable laws, standards and frameworks — and the operator must be able to prove it.

What this looks like in a codebase:

A traceable cross-walk between every regulatory recommendation and the file path that implements it. Not “we comply with RBI FREE-AI.” Instead:

FREE-AI Rec Implementation
Rec 4 (Indigenous AI Models) pkg/llm/ollama.go declares Region() == "on-prem"; pkg/governance/sovereignty.go denies PII bound for non-home regions
Rec 14 (Board-Approved Policy) config/ai-policy.example.yaml with board_approved_on field, hashed at boot
Rec 20 (Red Teaming) cmd/red-team/ runs an adversarial probe corpus against the active composite policy in CI
Rec 22 (Annexure VI) pkg/incidents/ auto-generates the structured incident form on policy deny, agent panic, budget breach
Rec 23 (AI Inventory) GET /v1/ai-inventory reads from the live registry; cannot drift from what’s actually running
Rec 25 (Disclosures) GET /v1/disclosures is public and unauthenticated

The cross-walk fits on one page. Every claim has a file path. Every file path has a test. The regulator can verify any claim by reading the source.

The professional knows there are 26 FREE-AI recommendations. The codebase shows which ones are ✅, which are 🟡 partial, and which are ⚪ explicitly outside scope (regulator/SRO actions). Honesty about partials is part of governance.

The credential teaches: what the laws and frameworks require. The codebase shows: how each requirement is met, file by file.


Pillar III — Governing AI development: from notebook to production

The third pillar covers the build-side governance — how AI systems are designed, what data goes into them, how they’re tested, and how they’re released into production.

This is where most organisations have the biggest gap between policy and practice. The policy document says “all AI models must undergo bias testing prior to release.” The reality is that a data scientist trained the model in a Jupyter notebook, the deployment ran through a Slack DM, and the bias test was “we eyeballed it.”

The principle: AI systems must be designed, built and tested under controlled processes that produce evidence of safety, fairness, and quality before release.

What this looks like in a codebase:

Multiple, layered:

Design-time: per-agent contract

Every agent — there are 60+ in Genie — implements the same minimal interface:

type Agent interface {
    ID() string
    Name() string
    Capabilities() []string
    HandleMessage(ctx, msg, env) ([]Message, error)
}

type RiskAware interface {
    RiskLevel() RiskClass  // Low | Medium | High
}

That risk level isn’t a comment — it’s enforced. A RiskHigh agent (KYC orchestrator, payment orchestrator, AML monitor) cannot execute on a message that lacks advisor or admin role on metadata.user_roles. The orchestrator denies before dispatch and records an incident.

Design-time governance becomes a compile-time + runtime contract.

Build-time: tests that gate merge

Every agent ships with _test.go covering its decision tree. The cross-cutting tests/agents_registry/ enforces:

CI runs all of this on every PR. The “we forgot to test the new agent” failure mode is structurally impossible.

Release-time: red team + BCP drill

make red-team    # adversarial probe corpus vs the ACTIVE policy
make bcp-drill   # forces an agent failure; asserts fallback fires

Both run in CI on every PR. The red-team probes target the same composite policy that’s in production — not a mock. A new attack class is discovered: add a probe, see the failure, fix the policy, the test is permanent. The BCP drill forces a portfolio_advisor failure and asserts the deterministic fallback agent fires — proving that LLM-provider outages won’t cause customer-facing 500s.

These aren’t compliance theatre. They’re code paths that the build won’t pass without.

The credential teaches: what controls a mature build process should have. The codebase enforces: which controls actually run on every PR.


Pillar IV — Governing deployment and use: the runtime story

The fourth pillar is what happens after release — selection of models for specific use cases, deployment patterns that minimise risk, monitoring, incident response, and the ongoing question “is this still safe and aligned, six months in?”

This is where the operational rubber meets the regulatory road. A model that was safe at launch can drift. An agent that was correctly scoped at deployment can have its scope quietly broadened by a “minor refactor” that adds a new tool. A policy that was board-approved last quarter can be silently bypassed by a hot-patch.

The principle: AI systems in production must be continuously monitored, periodically reassessed, and capable of demonstrating their current safety posture on demand.

What this looks like in a codebase:

Every message passes through a composite governance policy before dispatch

Not after. Before. The orchestrator evaluates ~10 policies on every bus message, regardless of which agent handles it:

Any deny short-circuits the message, records a structured incident, and returns an error. Deployment-time governance lives in a single audit surface, not scattered across 60+ agent handlers.

Every deployment exposes its current posture publicly

GET /v1/disclosures — unauthenticated — returns:

{
  "agent_counts": { "high": 7, "low": 18, "medium": 9, "total": 34 },
  "home_region": "in",
  "policy_version": "0.1.0",
  "policy_approved_on": "2025-08-13",
  "principles": [ "Trust is the Foundation", ... ]
}

A regulator, journalist, or customer can verify the current governance posture without authenticating. The endpoint is generated from the live policy, so it cannot drift from what the system is actually running.

Every action emits a structured incident payload, not a log line

When a guardrail fires, the system writes a structured payload (Annexure VI-shaped for the Indian regulator) into a hash-chained audit log. The payload conforms to the regulator’s schema at the moment of detection, not as a post-hoc log-scraping job.

When the regulator asks “show me every high-grade incident in the last 90 days affecting customer onboarding,” the answer is a SQL query. Five minutes. Not the weekend.

Every agent has a deterministic fallback

When the LLM provider is down, agents fall back to deterministic implementations that don’t need the network. The user gets a degraded but truthful answer. The on-call gets paged. The system stays up.

CI proves the fallback fires (make bcp-drill). Operational governance becomes a test, not a hope.

The credential teaches: what runtime monitoring and incident response should look like. The codebase enforces: that every message, every agent, every deployment is in this posture.


The shape of the gap

The IAPP AIGP credential, and credentials like it, do something important: they raise the floor on the conversation. Three years ago “AI governance” was something a junior compliance analyst figured out from a Notion doc. Today it’s a defined practice with a body of knowledge and a peer community.

But the credential is necessary, not sufficient. A team with three AIGP-certified governance professionals on staff can still ship a system that has zero of the patterns above. The certification proves the people understand the principles. The codebase proves the organisation lives by them.

Both halves are needed. The shape of the gap I see in most organisations:

Has Lacks
Board-approved AI policy The system loading that policy at boot and obeying it
Risk-classification framework Agent-level RiskLevel() declarations enforced at the orchestrator
AI inventory in a spreadsheet Live /v1/ai-inventory endpoint built from the registry
Incident-reporting procedure Auto-generated structured incident payloads at the source
Red-team plan Red-team probe corpus running against the active policy in CI
BCP plan including AI A forced-failure drill that runs every PR and proves the fallback fires
Quarterly residency review Region() declared on every LLM provider with a policy that denies at the bus
Quarterly bias audit Bias scorers in pkg/safety with composite output gating per output

Every left column is what mature organisations have. Every right column is what mature organisations ship. The credential teaches you to ask for the left. The codebase is the right.


The shape of the bridge

Closing the gap requires three things working together:

  1. Certified governance professionals who understand the body of knowledge. The AIGP and its peers do this work. Hire and train.

  2. Engineering practices that translate principles into enforceable patterns. The patterns aren’t novel. They’re standard distributed-systems and security-engineering, applied to the AI surface. Composite governance at the bus. Risk class per agent. Auto-incident at the source. Public disclosures from the live policy.

  3. A shared language between the two. The governance professional says “we need data residency.” The engineer says “we’ll add a DataResidencyPolicy to the composite.” Same concept, different vocabulary. Teams that close the gap build the translation layer deliberately.

The hardest part is usually the third. Compliance teams and engineering teams default to their own languages. The compliance team’s “we need a board-approved policy” becomes the engineer’s “another YAML file to load.” The engineer’s “we have circuit breakers on the LLM” becomes the compliance team’s “what does that mean?” Building shared documentation that explains both perspectives — what the principle is, what the file path is — is most of the work.

That’s the bet behind everything in docs/ of Genie’s repo: every package explained in two layers, every FREE-AI recommendation cross-walked to a file. The credential teaches the principle. The doc teaches the bridge. The code is the truth.


What this looks like end-to-end

The complete story for one decision — a user’s KYC application — passing through a well-governed system:

  1. User submits application via authenticated HTTP. JWT carries identity + roles.
  2. Governance composite evaluates: length OK, metadata complete, role authorised, classification within ceiling, residency satisfied, consent live, no PII regex hit, no injection markers, schema valid, explainability noted.
  3. KYC orchestrator (declared RiskHigh) runs a pure deterministic function: PAN structural check → Aadhaar offline KYC → name match → liveness → PEP/sanctions → tier assignment.
  4. Sanctions hit auto-generates an Annexure VI incident payload, hash-chains it into the audit log.
  5. LLM-as-judge auditor (broadcast subscriber) scores the outgoing verdict against the 7-Sutra constitution.
  6. Customer receives the verdict paraphrased into their preferred language, with an AI disclosure banner.
  7. Trace spans every step in OpenTelemetry, with OpenInference semantic conventions on every LLM call.
  8. Warehouse sink writes one row to the BigQuery genie_events table for long-horizon analytics.
  9. Public disclosures endpoint reflects the system’s current posture.

Every step is governable. Every step is auditable. Every step is reproducible. That’s what the four pillars look like when they’re code, not slides.


Where the field goes next

Two predictions I’m willing to bet on:

1. AI governance becomes a board-level role. Just as Chief Information Security Officer became standard in the 2010s, Chief AI Governance Officer (or equivalent) becomes standard by 2028. The AIGP and its peers are the credentialing path for that role.

2. “Show me the file” becomes the test. Regulators are realising that policy documents aren’t enforceable. The next generation of supervisory letters will ask for the active configuration: the policy version that was running on the day of the incident, the hash of the YAML that the board approved, the test that proved the guardrail fires.

Organisations that have shipped the patterns above are ready for that conversation. Organisations that haven’t will be retrofitting in 2027 under pressure.


The repo

Genie is open source under MIT. The principles-to-code mapping:

Full docs in docs/ — including 13 detailed agent pages, 7 package pages, a FREE-AI cross-walk, an architecture deep-dive, and an operations runbook.

git clone https://github.com/c2siorg/genie.git
go test ./...
curl http://localhost:8080/v1/disclosures | jq .

If you’re building AI governance as a practice — whether you’re studying for the AIGP, ramping a governance function, or trying to translate principles into shipped patterns — which of the four pillars is hardest to operationalise in your shop? For us it was Pillar IV (deployment governance); the runtime patterns took the longest to converge. Always interested in how others sliced this.

AIGovernance #ResponsibleAI #IAPP #AIGP #RBI #FREEAI #DPDP #BankingAI #FinTechIndia #DataPrivacy #SecurityArchitecture

← Back to all posts