Bloom — Terraform for regulated banks
SC Ventures (Standard Chartered’s innovation arm) needed cloud provisioning that bank engineering teams could self-serve, but that didn’t break SOC 2 or ISO 27001 controls. The answer was Terraform plus policy. Here is what the architecture looked like and the patterns that survived.
The constraint
A bank engineering team wants a new project. They need a network, some compute, a database, some IAM. The naive answer is “give them console access.” The compliance answer is “no.”
Console access means uncontrolled changes. Uncontrolled changes mean failed audits. Failed audits mean regulator letters. The bank will not give engineering teams console access for that exact reason — but engineering teams need to ship.
Bloom was the bridge: a self-service platform where teams could request a stack, the platform provisioned it via Terraform under controlled IAM, and the resulting state was always audit-traceable.
The shape
┌──────────────────┐
│ engineer files   │
│ a "stack request"│  ← web form, Slack command, or API
└────────┬─────────┘
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│ policy gate      │ ───► │ approvers (auto  │
│ (OPA / Sentinel) │      │ or human)        │
└────────┬─────────┘      └──────────────────┘
         │ approved
         ▼
┌──────────────────┐
│ Terraform runner │  ← runs as a non-engineer service principal
│ (Atlantis-style) │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ stack provisioned│
│ + state in S3    │
│ + audit row in   │
│   the platform   │
└──────────────────┘
The engineer never touched cloud credentials. The Terraform runner held the credentials and applied changes only after the policy gate said yes.
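To make the control flow concrete, here is a minimal Python sketch of the request path: the gate decides, only the runner applies, and every decision leaves an audit row. This is an illustration, not Bloom's actual code. The names StackRequest, AuditRow, and handle_request are hypothetical, the gate and the runner are passed in as plain callables, and the human-approver branch is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class StackRequest:
    team: str
    stack_class: str  # e.g. "regulated"
    plan: dict        # whatever the policy gate inspects


@dataclass
class AuditRow:
    team: str
    decision: str     # "applied" or "rejected"
    reasons: list[str]
    at: datetime


def handle_request(
    request: StackRequest,
    policy_gate: Callable[[StackRequest], list[str]],
    apply_stack: Callable[[StackRequest], None],
    audit_log: list[AuditRow],
) -> bool:
    """Apply the stack only if the gate reports no violations; always audit."""
    violations = policy_gate(request)
    if not violations:
        # apply_stack stands in for the runner, which holds the cloud
        # credentials. The engineer's identity never appears in this flow.
        apply_stack(request)
    audit_log.append(
        AuditRow(
            team=request.team,
            decision="rejected" if violations else "applied",
            reasons=violations,
            at=datetime.now(timezone.utc),
        )
    )
    return not violations
```

The point of the shape is that there is exactly one code path that reaches apply, and it sits behind the gate. A real runner would also record failed applies, which this sketch does not handle.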
The policy gate
OPA policies enforced the boring-but-load-bearing rules:
- All S3 buckets have encryption at rest.
- All RDS instances are in private subnets.
- All IAM roles use least-privilege boilerplate.
- No public IPs without an approval annotation.
- Region restrictions per stack class (a “regulated” stack can only land in approved regions).
Policies were version-controlled and reviewed by the security team. Engineering teams couldn’t bypass them; the runner refused to plan a non-compliant change.
Pattern that worked well: each policy had a human-readable failure message. When the runner rejected a plan, the engineer got “S3 bucket foo-bar is missing server-side encryption (rule s3-encryption-required). Add server_side_encryption_configuration to the resource.” rather than a stack trace.
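The real gate was written as OPA policies; the Python below is only a stand-in to show the shape of "rule plus human-readable message". It assumes the gate receives the JSON form of a Terraform plan (the resource_changes list produced by terraform show -json). The rule ID s3-encryption-required comes from the message above; the second rule ID and the tag name bloom:public-ip-approved are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

APPROVAL_TAG = "bloom:public-ip-approved"  # hypothetical annotation name


@dataclass(frozen=True)
class Violation:
    rule_id: str
    address: str
    message: str


def s3_encryption_required(rc: dict) -> Optional[Violation]:
    """Flag S3 buckets planned without an encryption configuration."""
    if rc["type"] != "aws_s3_bucket":
        return None
    after = rc["change"].get("after") or {}
    if after.get("server_side_encryption_configuration"):
        return None
    return Violation(
        rule_id="s3-encryption-required",
        address=rc["address"],
        message=(
            f"S3 bucket {rc['name']} is missing server-side encryption "
            "(rule s3-encryption-required). Add "
            "server_side_encryption_configuration to the resource."
        ),
    )


def public_ip_needs_approval(rc: dict) -> Optional[Violation]:
    """Flag instances with a public IP and no approval annotation."""
    if rc["type"] != "aws_instance":
        return None
    after = rc["change"].get("after") or {}
    if not after.get("associate_public_ip_address"):
        return None
    if (after.get("tags") or {}).get(APPROVAL_TAG):
        return None
    return Violation(
        rule_id="public-ip-needs-approval",  # hypothetical rule ID
        address=rc["address"],
        message=(
            f"Instance {rc['name']} requests a public IP without approval "
            f"(rule public-ip-needs-approval). Add the {APPROVAL_TAG} tag "
            "once security has signed off."
        ),
    )


RULES: list[Callable[[dict], Optional[Violation]]] = [
    s3_encryption_required,
    public_ip_needs_approval,
]


def evaluate_plan(plan: dict) -> list[Violation]:
    """Run every rule against every resource the plan creates or changes."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc["change"]["actions"] in (["no-op"], ["delete"]):
            continue  # nothing new to check
        for rule in RULES:
            violation = rule(rc)
            if violation is not None:
                violations.append(violation)
    return violations
```

Each rule returns either nothing or a message that names the resource, the rule, and the fix. Keeping the message next to the check is what makes the rejection actionable for the engineer.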
SOC 2 controls as Terraform modules
For each SOC 2 control we cared about, the platform shipped a Terraform module that implemented it. Engineering teams composed their stack from these modules; the controls came along for free.
Examples:
- bloom_logging: wires CloudTrail → S3 → KMS-encrypted bucket with object-lock, configures log retention to 7 years, sets up the SIEM forwarder.
- bloom_iam_boundary: applies a permissions boundary to every role the stack creates, preventing privilege escalation.
- bloom_secrets: wires Vault or Secret Manager with rotation policies and a default 90-day rotation cadence.
The audit team reviewed the modules once. Every engineering team that used them inherited the review. SOC 2 audits became fast because the auditor could check that team X used module Y, instead of inspecting team X’s per-resource configuration.
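The "check that team X used module Y" step reduces to a set difference. The sketch below assumes the platform can list the module names each stack uses, and it treats all three example modules as mandatory, which is an assumption for illustration rather than something stated about Bloom. The function name is hypothetical.

```python
from typing import Iterable

# Assumed for illustration: every stack must include these reviewed modules.
REQUIRED_CONTROL_MODULES = frozenset(
    {"bloom_logging", "bloom_iam_boundary", "bloom_secrets"}
)


def missing_control_modules(
    stack_modules: Iterable[str],
    required: frozenset = REQUIRED_CONTROL_MODULES,
) -> list[str]:
    """Return the required control modules a stack does not use, sorted."""
    return sorted(required - set(stack_modules))
```

For example, missing_control_modules(["bloom_logging", "bloom_secrets", "team_app"]) returns ["bloom_iam_boundary"]. An empty list is the evidence the auditor needs for that stack.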
ISO 27001 — what code can prove
ISO 27001 cares about controls being in place AND being documented. Code can prove they’re in place; documentation has to come from somewhere else.
For each Bloom module, the README documented:
- Which ISO 27001 control it satisfied.
- The threat the control mitigated.
- The failure modes if the control was disabled.
- The change-control process for the module itself.
The README was the documentation the auditor read. The Terraform was the evidence the control was applied. Together they took audit cycles from weeks to days.
The push-to-deploy self-service capability
Once a stack was provisioned, engineering teams needed to deploy their application into it. The Bloom platform extended into a CD pipeline:
- Push to a feature branch → runs tests, lint, security scan.
- Merge to main → builds the image, signs it, pushes to ECR.
- The runner deploys the image into the stack’s runtime (ECS or AKS depending on the stack class).
The pipeline shipped with the stack. Teams got it for free, with the same controls baked in. Onboarding time dropped 20%, measured across the first six teams that adopted the platform.
What broke
Three recurring pains:
- Module version pinning. Teams pinned to old module versions to avoid the breaking changes. The platform had to either support old versions indefinitely or force upgrades. We landed on a 6-month deprecation window with active migration support.
- Terraform state corruption. Two engineers ran apply simultaneously. The Atlantis-style locking prevented this in theory; in practice it broke under high-concurrency onboarding waves. We added a global lock per stack with a visible queue.
- The “I just need to look at it” problem. Engineers wanted read-only console access to debug. We compromised: read-only IAM roles, time-bound (4-hour TTL), audited, requested through the platform. Not perfect; it was the workable shape.
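The per-stack lock with a visible queue from the second item can be sketched as one FIFO queue per stack, where the head of the queue holds the lock. This is an in-memory, single-process illustration with hypothetical names; a real runner would need the queue in shared storage so that every worker sees the same state.

```python
from collections import defaultdict, deque
from typing import Optional


class StackLockQueue:
    """One FIFO queue per stack; the run at the head holds the lock."""

    def __init__(self) -> None:
        self._queues: dict[str, deque[str]] = defaultdict(deque)

    def request(self, stack: str, run_id: str) -> int:
        """Join the stack's queue and return the position (0 = holds lock)."""
        queue = self._queues[stack]
        if run_id not in queue:
            queue.append(run_id)
        return queue.index(run_id)

    def holder(self, stack: str) -> Optional[str]:
        """Return the run that currently holds the lock, if any."""
        queue = self._queues.get(stack)
        return queue[0] if queue else None

    def waiting(self, stack: str) -> list[str]:
        """Return the runs queued behind the holder, in order."""
        return list(self._queues.get(stack, ()))[1:]

    def release(self, stack: str, run_id: str) -> None:
        """Release the lock; only the current holder may do so."""
        queue = self._queues.get(stack)
        if not queue or queue[0] != run_id:
            raise ValueError(f"{run_id} does not hold the lock for {stack}")
        queue.popleft()
```

The waiting method is the "visible" part: an engineer whose apply is blocked can see who is ahead of them instead of retrying blindly.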
What I learned
Provisioning a regulated cloud environment via Terraform is the right shape. The hard parts are not the Terraform — the hard parts are:
- The policy review cadence (security has to engage early or the modules drift from controls).
- The deprecation discipline (you’ll keep supporting old shapes longer than you want; budget for it).
- The “edge case” that’s not actually an edge case (teams find three new edge cases a week; the platform has to absorb them without breaking the controls).
The platform pattern itself was right. The Terraform-as-control-implementation pattern was right. The OPA-policy-gate-with-human-readable-errors was right. The push-to-deploy bundle shipping with the stack was right. The pieces that survived years of bank engineering use were the boring, opinionated parts.