Storage tiering — the line item most teams ignore

Compute gets the FinOps attention because it’s loud. Storage is quiet and accretive. On a year-plus engagement, the storage line item was 32% of the bill by month six. Here is what cut it.

Active vs long-term storage

BigQuery automatically moves a table partition from active storage to long-term storage 90 days after the last modification. Long-term is half the price.

The catch: any UPDATE, MERGE, or partition rewrite resets the 90-day clock. Pipelines that touch old partitions for any reason keep them in active storage indefinitely.

Audit query:

SELECT
  table_id,
  ROUND(SUM(IF(storage_class='LONG_TERM', size_bytes, 0)) / POW(10, 12), 2) AS long_term_tb,
  ROUND(SUM(IF(storage_class='ACTIVE',    size_bytes, 0)) / POW(10, 12), 2) AS active_tb,
  ROUND(SUM(IF(storage_class='ACTIVE',    size_bytes, 0))
        / NULLIF(SUM(size_bytes), 0) * 100, 1) AS pct_active
FROM `region-us`.INFORMATION_SCHEMA.PARTITIONS
WHERE table_id IN (SELECT table_name FROM `dataset.__TABLES__`)
GROUP BY table_id
ORDER BY active_tb DESC
LIMIT 50;

If a table is years old and 100% active, something is rewriting historical partitions. Find what, fix what.

Physical vs logical billing

In late 2023 BigQuery introduced physical-byte billing. Logical billing (the default) charges by uncompressed size; physical billing charges by the actual storage footprint after compression and clustering.

For columnar, well-clustered tables, physical billing is 3-5× cheaper than logical for the same data. For sparse / poorly-compressing tables it’s about the same or slightly worse.

The switch is per-dataset. You can A/B test by enabling physical billing on one dataset and comparing the storage line item month-over-month.

We moved 18 of 24 datasets to physical billing across the engagement. The six we didn’t move were either too small to matter or had compression ratios under 2× where the savings didn’t justify the operational change.

Column-level retention

BigQuery doesn’t have column-level retention. But you can fake it with a scheduled job that nulls old columns:

UPDATE my_table
SET pii_column = NULL
WHERE event_date < CURRENT_DATE() - 90
  AND pii_column IS NOT NULL;

Three reasons this is worth doing:

Storage. A nulled column compresses to nearly nothing.
Audit. The historical row still exists for analytics; the PII is gone. Privacy review likes that.
Cost of breach. A column that doesn’t exist after 90 days is a column that can’t be in a breach disclosure.

The UPDATE itself costs bytes scanned; do it in monthly batches over partitions that have already reached long-term storage so the cost is bounded.

Clustering after the fact

Many old tables were created without clustering keys because the team didn’t know which columns would dominate query filters yet. Clustering can be added retroactively to a table; the existing partitions don’t auto-recluster but new writes do.

For the hot tables we identified, we:

Added clustering on the top filter column (usually user_id or tenant_id).
Created a new table with the same schema + clustering, copied the historical partitions into it, swapped table names.

The swap-and-rename approach has the advantage that every partition ends up clustered. The simple ALTER TABLE approach only clusters new writes; old partitions stay un-clustered until natively rewritten.

After clustering, query costs on the same workload drop because filtered queries read fewer blocks. The same engagement saw 22% reduction in bytes-billed on the top 10 fact tables after a backfill-and-cluster pass.

Snapshots vs copies

Backups are a storage line item too. The native answer: time-travel (automatic, 7 days) + snapshots (manual, billed at storage rates).

Snapshots are cheap because they’re stored as deltas from the source. A nightly snapshot of a 14 TB table that changes 100 GB per day costs ~100 GB of incremental storage per snapshot, not 14 TB. Configure a snapshot retention schedule that matches your RPO; resist the urge to keep daily snapshots for years.

The 30-day cycle

Storage savings are slower to land than query savings. A query rewrite shows up in the bill within 24 hours. A storage tier transition takes the 90-day clock to fully settle. A physical-billing migration shows up in the next monthly bill.

Plan a storage audit every quarter, not every month. The signal-to- noise ratio is too low at higher frequency.

The numbers

On the engagement:

Long-term storage backfill (pipelines stopped touching old partitions): ~₹50 L / month removed.
Physical billing on 18 datasets: ~₹1.2 Cr / month removed.
Column-level retention on PII columns: ~₹15 L / month removed.
Snapshot retention rationalisation: ~₹8 L / month removed.

About 18-22% of the engagement’s eventual 57% came from storage work alone. It was the quietest contributor and the most durable — nobody had to maintain the savings the way they have to maintain query rewrites.