Spanner interleaving — when and when not
Interleaving is one of Spanner’s most useful features and one of its most over-used. Here is the decision matrix I use, with the trade-offs both directions.
What it does
A normal parent-child relationship in Spanner puts the parent and child on separate ranges, often on separate servers. A query that joins them does a distributed join.
An interleaved child stores its rows physically next to the parent row. Joins become local reads. Throughput on join-heavy workloads goes up. Latency on those joins goes down.
CREATE TABLE customers (
customer_id INT64 NOT NULL,
...
) PRIMARY KEY (customer_id);
CREATE TABLE orders (
customer_id INT64 NOT NULL,
order_id INT64 NOT NULL,
...
) PRIMARY KEY (customer_id, order_id),
INTERLEAVE IN PARENT customers ON DELETE CASCADE;
orders rows for customer_id = 42 live physically next to
customers row 42.
When to interleave
Use it when:
- Joins on the parent-child relationship are hot. The most common query pattern is “customer + their orders.” Co-location makes those queries cheap.
- Child cardinality is bounded. A customer with a few hundred orders is fine. A customer with millions of orders puts too much weight on the parent’s storage row.
- The lifecycle is shared. When a customer goes, the orders
should too.
ON DELETE CASCADEmakes this clean. - The application reads them together. If 90% of order reads include the customer context, interleaving pays. If 90% don’t, you’re paying for co-location you don’t use.
When NOT to interleave
Don’t, when:
- Child cardinality is unbounded. One parent with millions of children creates a hot row.
- The child has its own access patterns. If “all orders for
product P” is a common query and you’ve interleaved orders
into customers, you’ll do a fan-out scan across every customer
shard. A non-interleaved child with a secondary index on
product_idis faster. - The relationship might change. Interleaving is a physical layout decision. Un-interleaving later requires a table rebuild. If you’re not certain the relationship is permanent, don’t commit physical layout to it.
- You want independent backup / restore. Interleaved tables travel with the parent. If you need to restore just the order table without touching customers, the interleaving makes that awkward.
The migration tool’s recommendation
The Spanner Migration Tool’s schema review surfaces interleaving candidates based on:
- Foreign key relationship in the source schema.
- Child table size relative to parent (bounded if child rows per parent < ~10K).
- Whether the source had cascading deletes.
The tool’s recommendation is a starting point. The migration is the right moment to make the decision — afterwards it’s a table rebuild.
The hidden tax: secondary indexes
Interleaved tables can have interleaved secondary indexes:
CREATE INDEX orders_by_status
ON orders (status)
INTERLEAVE IN customers;
This co-locates the index with the parent. Good for “all orders with status X for customer Y” queries. Bad for “all orders with status X across all customers” — the latter is a fan-out across every parent shard.
A non-interleaved index on status alone is faster for the
cross-customer query but doesn’t have the locality for the
per-customer query.
Sometimes the right answer is both indexes. It costs storage; it buys flexibility.
The numbers from one migration
For one client we migrated 14 tables. The schema review identified 6 strong interleaving candidates. After migration, on representative workloads:
| Query pattern | Before | After | Note |
|---|---|---|---|
| Single-customer history | 180ms | 22ms | Interleaved orders + payments |
| Cross-customer reporting | 1.2s | 1.4s | Slight regression — the index fan-out cost |
| Bulk export | 8.4s | 6.1s | Locality helped the scan |
We accepted the cross-customer regression because it ran once a day off-hours and the single-customer wins ran thousands of times per minute.
What to verify before you commit
Run the actual top-10 queries against a test cluster with both shapes (interleaved and not) before deciding. Spanner’s behaviour on small datasets does not predict its behaviour at scale; do the test with at least the realistic production volume.
The test query I run first:
SELECT *
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = @id;
Look at the execution plan. With interleaving, Spanner should report a local read. Without, a distributed scan. The latency difference is usually 5-10× on workloads where the join is hot.
The summary
Interleaving is the right call about a third of the time for parent-child relationships. It’s wrong half the time because the child has independent access patterns; it’s wrong the rest of the time because the cardinality isn’t bounded.
When you get it right, it’s the cheapest performance win in your schema. When you get it wrong, you’ve baked a wrong physical layout into a system that’s hard to change. The migration moment is the easiest time to decide; do it carefully, and ship the decision in the schema review PR with the reasoning written down.