Performance — Blog — Pratik Dhanave

May 12, 2026 · Engineering

Primary-key design for Cloud Spanner — preventing write hotspots, 40-60% performance gains

Spanner splits data by primary-key range. A monotonically increasing key, such as a timestamp or UUID v1, funnels every insert to the split at the end of the key range, so one server absorbs the whole write load. The fix touches everything from your sequence strategy to your tenant model.
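As a rough illustration of the idea, here is a minimal Go sketch of two common ways to break up a monotonic key: reversing the bits of a sequence value, and deriving a shard prefix from a hash of the natural key. The names `BitReversed`, `ShardFor`, and `numShards` are hypothetical and chosen for this example; the post itself may use a different strategy.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
)

// numShards is a hypothetical shard count, chosen here only for illustration.
const numShards = 16

// BitReversed maps a sequential ID to a key whose high-order bits change on
// every increment, so consecutive inserts land far apart in the key range.
func BitReversed(seq int64) int64 {
	return int64(bits.Reverse64(uint64(seq)))
}

// ShardFor derives a stable shard prefix from a natural key. Using
// (shard, timestamp) as the primary key spreads writes over numShards ranges.
func ShardFor(naturalKey string) int64 {
	h := fnv.New32a()
	h.Write([]byte(naturalKey))
	return int64(h.Sum32() % numShards)
}

func main() {
	for seq := int64(1); seq <= 4; seq++ {
		fmt.Printf("seq=%d key=%d\n", seq, BitReversed(seq))
	}
	fmt.Println("shard:", ShardFor("tenant-42"))
}
```

The trade-off in both cases is that range scans in insertion order are no longer a single contiguous read, so the shard count is usually kept small.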

Spanner · Database Design · Performance · Go

May 05, 2026 · Engineering

Picnic — cutting API latency 47% by consolidating microservices behind protobuf contracts

The Picnic social platform served 1M+ users across a graph of Go microservices behind a GraphQL gateway. The latency win came from a counter-intuitive move: fewer services, tighter contracts.

Go · gRPC · GraphQL · Microservices · Performance

Mar 26, 2026 · Engineering

Ardan Ultimate AI #19 — Speculative decoding with a draft model

Run a small draft model to propose several tokens ahead, then verify them in a single pass with the large model. Latency drops and output quality does not. Production LLM serving relies on this technique, but most application engineers never see it.
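The draft-then-verify loop can be sketched in a few lines of Go. This is a simplified greedy variant under stated assumptions: `NextFn`, `Step`, `verify`, and the two toy models are hypothetical names invented for this example, and `verify` only simulates the single batched pass of the large model. Production systems typically accept or reject draft tokens probabilistically rather than by exact match.

```go
package main

import "fmt"

// NextFn is a hypothetical stand-in for a model's greedy next-token function.
type NextFn func(ctx []int) int

// verify simulates one batched pass of the large model: it returns the
// target's greedy token at each of the len(draft)+1 positions.
func verify(target NextFn, ctx, draft []int) []int {
	out := make([]int, 0, len(draft)+1)
	cur := append([]int(nil), ctx...)
	for i := 0; i <= len(draft); i++ {
		out = append(out, target(cur))
		if i < len(draft) {
			cur = append(cur, draft[i])
		}
	}
	return out
}

// Step runs one speculative step: the draft proposes k tokens, the target
// checks them, and the longest agreeing prefix is accepted. The target's own
// token is appended at the first disagreement (or after all k are accepted),
// so the output matches what the target alone would have produced.
func Step(draft, target NextFn, ctx []int, k int) []int {
	proposal := make([]int, 0, k)
	cur := append([]int(nil), ctx...)
	for i := 0; i < k; i++ {
		t := draft(cur)
		proposal = append(proposal, t)
		cur = append(cur, t)
	}
	want := verify(target, ctx, proposal)
	accepted := make([]int, 0, k+1)
	for i, t := range proposal {
		if t != want[i] {
			return append(accepted, want[i])
		}
		accepted = append(accepted, t)
	}
	return append(accepted, want[k])
}

// toyTarget counts upward modulo 10 from the last token.
func toyTarget(ctx []int) int { return (ctx[len(ctx)-1] + 1) % 10 }

// toyDraft agrees with toyTarget except after token 3, where it guesses wrong.
func toyDraft(ctx []int) int {
	if ctx[len(ctx)-1] == 3 {
		return 0
	}
	return toyTarget(ctx)
}

func main() {
	fmt.Println(Step(toyDraft, toyTarget, []int{1}, 4))
}
```

Each step yields at least one token from the large model, so a poor draft costs extra compute but never changes the output in this greedy setting.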

Ardan Labs · Go · LLM Ops · Performance

Mar 25, 2026 · Engineering

Ardan Ultimate AI #18 — Incremental message caching (IMC) for chat

A long chat reprocesses the entire history on every turn. With prefix caching, the LLM server reuses the KV cache computed for the previous turn's prefix and computes only the new suffix. The result is a large latency win on long conversations.
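To make the prefix-reuse idea concrete, here is a toy Go sketch that only counts how many messages could be served from cache versus recomputed. It is an assumption-laden model, not a real inference server: `PrefixCache` and `Process` are hypothetical names, and a chained SHA-256 hash stands in for the stored KV state of each prefix.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// PrefixCache remembers which conversation prefixes have been processed.
// A real server would store KV tensors per prefix; this toy stores only keys.
type PrefixCache struct {
	seen map[[32]byte]bool
}

func NewPrefixCache() *PrefixCache {
	return &PrefixCache{seen: map[[32]byte]bool{}}
}

// Process walks the message list, reporting how many leading messages hit the
// cache and how many had to be computed. Newly computed prefixes are cached.
func (c *PrefixCache) Process(messages []string) (reused, computed int) {
	var key [32]byte // chained hash: key_i = sha256(key_{i-1} || message_i)
	hit := true
	for _, m := range messages {
		key = sha256.Sum256(append(key[:], m...))
		if hit && c.seen[key] {
			reused++
			continue
		}
		hit = false // once the prefix diverges, everything after is new work
		c.seen[key] = true
		computed++
	}
	return reused, computed
}

func main() {
	c := NewPrefixCache()
	fmt.Println(c.Process([]string{"system prompt", "hi"}))
	fmt.Println(c.Process([]string{"system prompt", "hi", "hello!", "tell me more"}))
}
```

Because each key depends on every earlier message, editing or reordering the history invalidates the cache from that point onward, which is the property that makes reuse safe.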

Ardan Labs · Go · LLM Ops · Performance

Mar 18, 2026 · Engineering

Ardan Ultimate AI #11 — RAG performance: parallel and batched embeddings, response cache

A simple RAG pipeline embeds documents one at a time. The performant version batches the embeddings, parallelises the chunks, and caches the responses. Throughput goes up 5-10×.
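The batching and parallelism can be sketched as follows, assuming an embedding backend that accepts a batch of texts per call. `EmbedFn`, `EmbedAll`, and `toyEmbed` are hypothetical names for this example; error handling and the response cache are left out to keep the sketch short.

```go
package main

import (
	"fmt"
	"sync"
)

// EmbedFn is a hypothetical stand-in for an embedding API that takes a batch
// of texts and returns one vector per text, in the same order.
type EmbedFn func(batch []string) [][]float32

// EmbedAll splits chunks into batches of batchSize and embeds up to `workers`
// batches concurrently, writing results back in the original order.
func EmbedAll(embed EmbedFn, chunks []string, batchSize, workers int) [][]float32 {
	if batchSize < 1 {
		batchSize = 1
	}
	if workers < 1 {
		workers = 1
	}
	out := make([][]float32, len(chunks))
	sem := make(chan struct{}, workers) // bounds the number of in-flight batches
	var wg sync.WaitGroup
	for start := 0; start < len(chunks); start += batchSize {
		end := start + batchSize
		if end > len(chunks) {
			end = len(chunks)
		}
		wg.Add(1)
		sem <- struct{}{}
		go func(start, end int) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes a disjoint range of out, so no lock is needed.
			copy(out[start:end], embed(chunks[start:end]))
		}(start, end)
	}
	wg.Wait()
	return out
}

// toyEmbed returns a one-dimensional "embedding" holding the text length.
func toyEmbed(batch []string) [][]float32 {
	vecs := make([][]float32, len(batch))
	for i, s := range batch {
		vecs[i] = []float32{float32(len(s))}
	}
	return vecs
}

func main() {
	chunks := []string{"a", "bb", "ccc", "dddd", "eeeee"}
	fmt.Println(EmbedAll(toyEmbed, chunks, 2, 2))
}
```

Batch size and worker count are tuning knobs: larger batches amortise per-request overhead, while the worker limit keeps the pipeline within the backend's rate limits.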

Ardan Labs · Go · RAG · Performance
