· 3 min read · ← All posts
BigQuery FinOps SQL Cost Optimisation

The MERGE pattern that cost ten times more

One client. One pattern. ₹10 Cr / month in avoidable spend. Here is exactly what the bad MERGE looked like, why it was 10× the cost of the obvious alternative, and the rewrite that landed.

The pattern in the wild

The team had standardised on MERGE INTO target USING source for every nightly upsert job. The SQL looked like the textbook example. The bill did not.

MERGE INTO sales_target t
USING sales_source s
ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET amount = s.amount, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, amount, updated_at)
  VALUES (s.order_id, s.amount, s.updated_at);

sales_target was a 14 TB partitioned table. sales_source was the last hour’s writes — maybe 200 MB. Every nightly MERGE was scanning the full target table because the ON clause did not include the partition column. BigQuery had no way to prune partitions; the full 14 TB went through the bytes-billed meter every single run.

What the planner sees

The explain plan made it explicit. The MERGE engine needs to look at every row in sales_target to decide if the source row matches. Without a partition predicate on the target side of the join, the planner reads everything.

The fix is to put the partition column on both sides of the ON clause:

MERGE INTO sales_target t
USING sales_source s
ON t.order_id = s.order_id
   AND t.partition_date = s.partition_date          -- ← this line
   AND t.partition_date >= CURRENT_DATE() - 7       -- ← and this
WHEN MATCHED THEN
  UPDATE SET amount = s.amount, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, amount, updated_at, partition_date)
  VALUES (s.order_id, s.amount, s.updated_at, s.partition_date);

Two added predicates and the scan dropped from 14 TB to ~300 GB. Same correctness; 47× less data through the meter.

When MERGE is still the wrong tool

Not every upsert wants a MERGE. If the source is large relative to the target window, the planner often does better with INSERT INTO a staging table, then DELETE + INSERT against the target inside a transaction. Three reasons:

  1. Shuffles. MERGE forces a shuffle on the ON columns. A straight INSERT into a partitioned table writes directly.
  2. Streaming buffer interactions. Recently-streamed rows can confuse MERGE matching for tens of minutes after ingest.
  3. Materialised view refresh. MERGE invalidates the dependent materialised views even when the rewrite was a no-op for the subset the view cares about.

For one Tata pipeline we replaced six MERGE jobs with INSERT-then- DELETE flows and dropped daily cost by another 18%.

The accidental architecture

The biggest lesson from the engagement: an SQL pattern that compiles and gives correct answers is not a pattern that is cheap. The “idiomatic” MERGE was idiomatic for the textbook example (small source, small target). At our scale it was the most expensive query shape we had. We caught it because the bill made it caught us.

What landed

This was one lever out of about a dozen. The 57% figure on the engagement was the sum of MERGE rewrites, capacity-based slot adoption, storage tier changes, dashboard pre-aggregation, and a materialised-view audit. The MERGE rewrite was the loudest single contributor.

What to do tomorrow

If you run BigQuery at scale:

  1. SELECT user_email, job_id, total_bytes_billed FROM region-XX.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE statement_type='MERGE' AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) ORDER BY total_bytes_billed DESC LIMIT 50;
  2. For each top hit, check the ON clause. If the partition column isn’t there, fix it before the next billing window.
  3. Add the lint rule. The pattern keeps coming back unless you stop it at PR time.

The savings compound across every nightly run, every backfill, every re-run after a failure. A single afternoon of grep and rewrite paid for a quarter of engineering time.

← Back to all posts