Spanner interleaving — when and when not

Interleaving is one of Spanner’s most useful features and one of its most over-used. Here is the decision matrix I use, with the trade-offs both directions.

What it does

A normal parent-child relationship in Spanner puts the parent and child on separate ranges, often on separate servers. A query that joins them does a distributed join.

An interleaved child stores its rows physically next to the parent row. Joins become local reads. Throughput on join-heavy workloads goes up. Latency on those joins goes down.

CREATE TABLE customers (
  customer_id INT64 NOT NULL,
  ...
) PRIMARY KEY (customer_id);

CREATE TABLE orders (
  customer_id INT64 NOT NULL,
  order_id    INT64 NOT NULL,
  ...
) PRIMARY KEY (customer_id, order_id),
  INTERLEAVE IN PARENT customers ON DELETE CASCADE;

orders rows for customer_id = 42 live physically next to customers row 42.

When to interleave

Use it when:

Joins on the parent-child relationship are hot. The most common query pattern is “customer + their orders.” Co-location makes those queries cheap.
Child cardinality is bounded. A customer with a few hundred orders is fine. A customer with millions of orders puts too much weight on the parent’s storage row.
The lifecycle is shared. When a customer goes, the orders should too. ON DELETE CASCADE makes this clean.
The application reads them together. If 90% of order reads include the customer context, interleaving pays. If 90% don’t, you’re paying for co-location you don’t use.

When NOT to interleave

Don’t, when:

Child cardinality is unbounded. One parent with millions of children creates a hot row.
The child has its own access patterns. If “all orders for product P” is a common query and you’ve interleaved orders into customers, you’ll do a fan-out scan across every customer shard. A non-interleaved child with a secondary index on product_id is faster.
The relationship might change. Interleaving is a physical layout decision. Un-interleaving later requires a table rebuild. If you’re not certain the relationship is permanent, don’t commit physical layout to it.
You want independent backup / restore. Interleaved tables travel with the parent. If you need to restore just the order table without touching customers, the interleaving makes that awkward.

The migration tool’s recommendation

The Spanner Migration Tool’s schema review surfaces interleaving candidates based on:

Foreign key relationship in the source schema.
Child table size relative to parent (bounded if child rows per parent < ~10K).
Whether the source had cascading deletes.

The tool’s recommendation is a starting point. The migration is the right moment to make the decision — afterwards it’s a table rebuild.

The hidden tax: secondary indexes

Interleaved tables can have interleaved secondary indexes:

CREATE INDEX orders_by_status
  ON orders (status)
  INTERLEAVE IN customers;

This co-locates the index with the parent. Good for “all orders with status X for customer Y” queries. Bad for “all orders with status X across all customers” — the latter is a fan-out across every parent shard.

A non-interleaved index on status alone is faster for the cross-customer query but doesn’t have the locality for the per-customer query.

Sometimes the right answer is both indexes. It costs storage; it buys flexibility.

The numbers from one migration

For one client we migrated 14 tables. The schema review identified 6 strong interleaving candidates. After migration, on representative workloads:

Query pattern	Before	After	Note
Single-customer history	180ms	22ms	Interleaved orders + payments
Cross-customer reporting	1.2s	1.4s	Slight regression — the index fan-out cost
Bulk export	8.4s	6.1s	Locality helped the scan

We accepted the cross-customer regression because it ran once a day off-hours and the single-customer wins ran thousands of times per minute.

What to verify before you commit

Run the actual top-10 queries against a test cluster with both shapes (interleaved and not) before deciding. Spanner’s behaviour on small datasets does not predict its behaviour at scale; do the test with at least the realistic production volume.

The test query I run first:

SELECT *
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = @id;

Look at the execution plan. With interleaving, Spanner should report a local read. Without, a distributed scan. The latency difference is usually 5-10× on workloads where the join is hot.

The summary

Interleaving is the right call about a third of the time for parent-child relationships. It’s wrong half the time because the child has independent access patterns; it’s wrong the rest of the time because the cardinality isn’t bounded.

When you get it right, it’s the cheapest performance win in your schema. When you get it wrong, you’ve baked a wrong physical layout into a system that’s hard to change. The migration moment is the easiest time to decide; do it carefully, and ship the decision in the schema review PR with the reasoning written down.