The 57% number
One large customer. Four levers. A year-plus engagement. The headline number is the easy part of the story. The harder part is which lever to pull first and why.
The four levers
In rough order of contribution to the 57%:
- Capacity-based slot reservation. Moving the predictable batch workload off on-demand pricing into a reservation with committed slots. Single biggest one-time cut.
- Query refactoring. MERGE rewrites (separate post), join reordering, predicate pushdown, materialised-view extraction.
- Storage tiering. Long-tail tables moved from active storage to long-term storage; cold partitions to archive; column-level retention.
- Dashboard pre-aggregation. Looker dashboards that scanned billion-row fact tables on every refresh, replaced with pre-aggregated summary tables refreshed hourly.
What surprised me
I went in thinking query refactoring would be the largest lever. It wasn’t. The capacity transition was. Reservation pricing is flat-rate; on-demand is bytes-billed. For a workload with stable diurnal patterns, the math is straightforward — but the organisational lift to commit to a reservation is real, and the finance conversation has to come before the engineering one.
The transition order mattered:
month 0 → audit on-demand spend, establish baseline
month 1 → first wave of query rewrites (the loudest 20)
month 2 → reservation purchase (sized against the now-reduced workload)
month 3+ → storage + dashboard + the long tail
If you do reservation first against the un-rewritten workload, you buy too many slots. If you do query first and stop, you leave 30%+ on the table because the remaining bytes-billed cost stays high until you commit.
The dashboard line item
A dashboard that scans 200 GB on every refresh costs about ₹1 per refresh on standard on-demand pricing. One product manager hits refresh 30 times a day. Ten product managers, 300 refreshes, ₹300/day, ₹9,000/month — for one dashboard. We found 40+ dashboards in the same shape.
The fix was a dashboard_facts table refreshed hourly via a scheduled
query and a one-line Looker change to point at it instead of the raw
fact table. Refresh cost dropped from 200 GB to 12 MB. The hourly
refresh itself cost more than zero, but capped: ₹100/day for the table
regardless of how many people opened the dashboard.
The materialised view audit
BigQuery’s materialised views are valuable but easy to mis-configure. Two recurring mistakes we untangled:
- MV that the planner never used. The MV was created against a table whose schema had drifted; the planner couldn’t prove equivalence so the original query ran. The MV was just sitting there, costing storage and refresh.
- MV refreshed more than read. A view refreshed every five minutes that downstream consumers read once an hour. Move the refresh to align with the consumer cadence.
Eight MVs deprecated, six recreated against the current schema, three with refresh cadence aligned to consumer reads.
What we documented
For each lever:
- The before-state metric (bytes-billed by job, by table, by reservation).
- The change (PR / config diff).
- The after-state metric, measured at +24h, +7d, +30d.
- The estimated annual run-rate saving.
The +30d measurement was important — slot reservation transitions and storage tier changes have settle-in effects. A win at +24h that disappears at +30d wasn’t a win.
The handoff
The engagement became a recurring Searce managed service. The handoff artefact: a runbook with the four levers, the INFORMATION_SCHEMA queries to detect each anti-pattern, and the escalation paths for each. Other Tata business units kicked off similar engagements off the same playbook within six months.
What the customer asked us next
Once the bill stabilised at the new run-rate, the next question was not “save more.” It was “now that we have headroom, what should we build?” The conversation moved from FinOps to enablement — vector search, BigQuery Knowledge Graph, ML inference at SQL boundaries. The cost work was the entry ticket; the data-platform conversation was the durable engagement.
What I’d do differently
Three things, in retrospect:
- Establish the baseline more carefully. I underweighted storage in the first audit. Storage was the second-biggest line by month 6.
- Get finance in the room earlier. The reservation transition needed CFO-level sign-off. We had the engineering decision two months before the finance decision; that gap was avoidable.
- Build the dashboard for the customer’s own FinOps team. We ran the reporting; we should have shipped it as a permanent Looker dashboard the customer’s FinOps lead owned. They didn’t need us in the room to see the savings.
The 57% is the headline. The durable artefact is the runbook plus the handoff dashboard. Both are reusable. Both are why the engagement productised.