The MERGE pattern that cost ten times more
One client. One pattern. ₹10 Cr / month in avoidable spend. Here is exactly what the bad MERGE looked like, why it was 10× the cost of the obvious alternative, and the rewrite that landed.
The pattern in the wild
The team had standardised on MERGE INTO target USING source for every
nightly upsert job. The SQL looked like the textbook example. The bill
did not.
MERGE INTO sales_target t
USING sales_source s
ON t.order_id = s.order_id
WHEN MATCHED THEN
UPDATE SET amount = s.amount, updated_at = s.updated_at
WHEN NOT MATCHED THEN
INSERT (order_id, amount, updated_at)
VALUES (s.order_id, s.amount, s.updated_at);
sales_target was a 14 TB partitioned table. sales_source was the
last hour’s writes — maybe 200 MB. Every nightly MERGE was scanning
the full target table because the ON clause did not include the
partition column. BigQuery had no way to prune partitions; the full
14 TB went through the bytes-billed meter every single run.
What the planner sees
The explain plan made it explicit. The MERGE engine needs to look at
every row in sales_target to decide if the source row matches.
Without a partition predicate on the target side of the join, the
planner reads everything.
The fix is to put the partition column on both sides of the ON clause:
MERGE INTO sales_target t
USING sales_source s
ON t.order_id = s.order_id
AND t.partition_date = s.partition_date -- ← this line
AND t.partition_date >= CURRENT_DATE() - 7 -- ← and this
WHEN MATCHED THEN
UPDATE SET amount = s.amount, updated_at = s.updated_at
WHEN NOT MATCHED THEN
INSERT (order_id, amount, updated_at, partition_date)
VALUES (s.order_id, s.amount, s.updated_at, s.partition_date);
Two added predicates and the scan dropped from 14 TB to ~300 GB. Same correctness; 47× less data through the meter.
When MERGE is still the wrong tool
Not every upsert wants a MERGE. If the source is large relative to the target window, the planner often does better with INSERT INTO a staging table, then DELETE + INSERT against the target inside a transaction. Three reasons:
- Shuffles. MERGE forces a shuffle on the ON columns. A straight INSERT into a partitioned table writes directly.
- Streaming buffer interactions. Recently-streamed rows can confuse MERGE matching for tens of minutes after ingest.
- Materialised view refresh. MERGE invalidates the dependent materialised views even when the rewrite was a no-op for the subset the view cares about.
For one Tata pipeline we replaced six MERGE jobs with INSERT-then- DELETE flows and dropped daily cost by another 18%.
The accidental architecture
The biggest lesson from the engagement: an SQL pattern that compiles and gives correct answers is not a pattern that is cheap. The “idiomatic” MERGE was idiomatic for the textbook example (small source, small target). At our scale it was the most expensive query shape we had. We caught it because the bill made it caught us.
What landed
- 19 MERGE jobs rewritten with partition-aware predicates.
- 6 MERGE jobs replaced with INSERT-then-DELETE.
- ~₹6 Cr / month removed from the BigQuery bytes-billed line.
- Lint rule in CI: no MERGE without a partition predicate on the target.
This was one lever out of about a dozen. The 57% figure on the engagement was the sum of MERGE rewrites, capacity-based slot adoption, storage tier changes, dashboard pre-aggregation, and a materialised-view audit. The MERGE rewrite was the loudest single contributor.
What to do tomorrow
If you run BigQuery at scale:
SELECT user_email, job_id, total_bytes_billed FROM region-XX.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE statement_type='MERGE' AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) ORDER BY total_bytes_billed DESC LIMIT 50;- For each top hit, check the ON clause. If the partition column isn’t there, fix it before the next billing window.
- Add the lint rule. The pattern keeps coming back unless you stop it at PR time.
The savings compound across every nightly run, every backfill, every re-run after a failure. A single afternoon of grep and rewrite paid for a quarter of engineering time.