Picnic — the protobuf consolidation win
The Picnic social platform served 1M+ users on a graph of Go microservices behind a GraphQL gateway. The headline performance win was a 47% reduction in median API response latency. The counter-intuitive part: we got there with fewer services, not more.
The starting point
The system had grown organically into ~15 microservices. Each one spoke gRPC; each one also kept its own JSON-over-HTTP endpoints for legacy callers; the GraphQL gateway translated front-end queries into fan-out gRPC calls.
The median front-end query touched 4-6 services. P95 latency was dominated by the slowest service in the chain plus the serialisation overhead at each hop.
What we measured
Before changing anything, we instrumented and measured. The latency budget for the median front-end query broke down roughly like this:
Serialisation: 32% ← JSON ↔ Go structs at every hop
Network: 18% ← inter-service RPC, mostly intra-pod
Application: 28% ← actual business logic
DB query: 15% ← Spanner reads
Misc: 7%
Almost a third of the latency was JSON serialisation. The system had grown up speaking JSON between services because that’s what the original Node.js prototype used. The Go rewrite preserved the JSON-over-HTTP shape because changing it felt big.
The first move — JSON to protobuf, everywhere
We migrated every inter-service call to protobuf-over-gRPC. The gRPC stack was already there; the JSON paths were ripped out.
This took about three months of careful work because every service had multiple callers, the rollout had to be staged, and the conversion functions had to be tested per-field. We used buf for schema management; the schema repo became the authoritative contract.
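Per-field testing of conversion functions can look like the table-driven sketch below. `legacyProfile`, `Profile` and the field mapping are all hypothetical. `Profile` stands in for a buf-generated protobuf message so the example runs with the standard library alone. Each case sets exactly one field, so a failure names the field whose mapping is wrong.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// legacyProfile is a hypothetical JSON shape an old endpoint returned.
type legacyProfile struct {
	ID          string `json:"id"`
	DisplayName string `json:"display_name"`
	FollowerCnt int64  `json:"follower_count"`
}

// Profile stands in for a protobuf-generated message (not generated here).
type Profile struct {
	Id            string
	DisplayName   string
	FollowerCount int64
}

func fromLegacy(l legacyProfile) *Profile {
	return &Profile{Id: l.ID, DisplayName: l.DisplayName, FollowerCount: l.FollowerCnt}
}

func toLegacy(p *Profile) legacyProfile {
	return legacyProfile{ID: p.Id, DisplayName: p.DisplayName, FollowerCnt: p.FollowerCount}
}

// roundTrip pushes p through the legacy JSON wire format and back.
func roundTrip(p *Profile) (*Profile, error) {
	raw, err := json.Marshal(toLegacy(p))
	if err != nil {
		return nil, err
	}
	var l legacyProfile
	if err := json.Unmarshal(raw, &l); err != nil {
		return nil, err
	}
	return fromLegacy(l), nil
}

// perFieldCases sets exactly one field per case.
var perFieldCases = map[string]*Profile{
	"Id":            {Id: "u_123"},
	"DisplayName":   {DisplayName: "Ada"},
	"FollowerCount": {FollowerCount: 1 << 40},
}

func checkAllFields() error {
	for field, want := range perFieldCases {
		got, err := roundTrip(want)
		if err != nil {
			return fmt.Errorf("%s: %w", field, err)
		}
		if !reflect.DeepEqual(got, want) {
			return fmt.Errorf("%s: got %+v, want %+v", field, got, want)
		}
	}
	return nil
}

func main() {
	if err := checkAllFields(); err != nil {
		fmt.Println("conversion bug:", err)
		return
	}
	fmt.Println("all fields round-trip")
}
```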
Serialisation's share of the latency budget afterwards: ~9%, down from 32%. End-to-end median latency dropped ~24%.
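The two figures hang together under a simple assumption: only serialisation got faster, and every other stage stayed fixed. Normalise the old budget to 100 units and let serialisation shrink until it is 9% of the new total; the implied end-to-end saving then falls out. A back-of-envelope Go sketch:

```go
package main

import "fmt"

// impliedDrop assumes only the serialisation stage changed. From a budget
// of 100 units where serialisation is sharePre percent, it finds the new
// total T at which serialisation is sharePost percent of T, with all other
// stages unchanged, and returns T and the end-to-end drop in percent.
func impliedDrop(sharePre, sharePost float64) (newTotal, dropPct float64) {
	other := 100 - sharePre                 // non-serialisation time, held fixed
	newTotal = other / (1 - sharePost/100) // other = (1 - sharePost) * T
	return newTotal, 100 - newTotal
}

func main() {
	total, drop := impliedDrop(32, 9)
	fmt.Printf("new total %.1f units, drop %.1f%%\n", total, drop)
}
```

This prints `new total 74.7 units, drop 25.3%`, close to the measured ~24%. Holding every other stage exactly fixed is a simplification.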
The second move — service consolidation
The serialisation win exposed the next pattern: several services were so tightly coupled that they were really one service split across deployment boundaries.
The classic example: a “user profile” service that read from a “user accounts” service on every call. They were always deployed together; they always scaled together; the only thing separating them was a process boundary. That boundary cost us a network hop and added failure modes (one could be up while the other was down).
We merged them, though not back into a monolith: services with genuinely different scaling profiles stayed separate. But the artificially split services collapsed into single processes, each exposing a unified gRPC interface to the gateway.
The consolidation took the average front-end query from 4-6 hops to 3-4. End-to-end median latency dropped by a further ~23 percentage points of the original baseline, bringing the total reduction to ~47%.
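The 24% + 23% = 47% arithmetic works only if both figures are measured against the original baseline, that is, as percentage points. Read that way, the consolidation cut about 30% of the latency that remained after the protobuf move. Compounding the two figures as relative drops would instead give only about 41.5%. A small Go sketch of both readings:

```go
package main

import "fmt"

// secondStepRelative returns the second change's cut measured against the
// latency left after the first change, when both drops are given as
// percentages of the original baseline.
func secondStepRelative(firstDrop, totalDrop float64) float64 {
	afterFirst := 100 - firstDrop // e.g. 76 units left after protobuf
	afterBoth := 100 - totalDrop  // e.g. 53 units left after consolidation
	return 100 * (afterFirst - afterBoth) / afterFirst
}

// compoundedDrop is the total you would get by reading both figures as
// relative drops and compounding them instead.
func compoundedDrop(first, second float64) float64 {
	return 100 * (1 - (1-first/100)*(1-second/100))
}

func main() {
	fmt.Printf("second move, relative: %.1f%%\n", secondStepRelative(24, 47))
	fmt.Printf("if compounded instead: %.1f%%\n", compoundedDrop(24, 23))
}
```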
What I’d been wrong about
I’d been carrying the assumption that “more services = better microservices.” Microservices are right when service boundaries match team boundaries, scaling boundaries, or failure boundaries. The Picnic services that hurt us were split along none of those axes — they were split because someone started writing the next piece of functionality as a new service out of habit.
The consolidation didn’t remove any team’s autonomy; it removed imaginary boundaries that cost us in latency and complexity.
The contract repo
The buf-managed schema repo became the most-discussed artefact on the team. Every cross-service contract change went through it.
Two patterns that helped:
- Buf breaking-change linting in CI. A PR that breaks backwards compatibility fails CI unless the author explicitly marks the change as breaking. The friction is high enough that accidental breaks don’t ship.
- Schema review as a separate PR. A change that touched a service contract was always two PRs: schema first (reviewed by the affected teams), implementation second (smaller, less contested). The two-PR flow halved review time per change.
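The sketch below is a toy version of the breaking-change rule, showing what the CI gate checks. It is not buf's implementation or rule set: buf works on full descriptors and has many more rules. This version only treats a removed field number or a changed field type as breaking. `markedBreaking` is a hypothetical stand-in for however the author flags an intentional break.

```go
package main

import (
	"fmt"
	"sort"
)

// Field is a simplified view of one protobuf field.
type Field struct {
	Name string
	Type string // e.g. "string", "int64"
}

// Schema maps field numbers (the wire contract) to fields for one message.
type Schema map[int32]Field

// breakingChanges reports field numbers that were removed or changed type.
// Adding a new field number is treated as safe.
func breakingChanges(prev, next Schema) []string {
	var problems []string
	for num, f := range prev {
		g, ok := next[num]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("field %d (%s) removed", num, f.Name))
		case g.Type != f.Type:
			problems = append(problems, fmt.Sprintf("field %d (%s) changed type %s -> %s", num, f.Name, f.Type, g.Type))
		}
	}
	sort.Strings(problems) // map iteration order is random; keep output stable
	return problems
}

// gate mimics the CI rule: fail on breaking changes unless the author has
// explicitly marked the change as breaking.
func gate(prev, next Schema, markedBreaking bool) error {
	problems := breakingChanges(prev, next)
	if len(problems) == 0 || markedBreaking {
		return nil
	}
	return fmt.Errorf("unmarked breaking changes: %v", problems)
}

func main() {
	v1 := Schema{1: {Name: "id", Type: "string"}, 2: {Name: "display_name", Type: "string"}}
	v2 := Schema{1: {Name: "id", Type: "string"}, 3: {Name: "bio", Type: "string"}}
	fmt.Println(gate(v1, v2, false)) // field 2 removed: returns an error
	fmt.Println(gate(v1, v2, true))  // same change, explicitly marked: nil
}
```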
What didn’t help
Tempting moves we considered and skipped:
- Service mesh. The team was tempted to adopt Istio for observability and traffic management. The cost-benefit on a pre-consolidation 15-service deployment was unclear; on a post-consolidation 8-service deployment, it was negative. Native Go libraries + OpenTelemetry covered our needs.
- GraphQL stitching. The gateway was a custom Go implementation; someone proposed adopting Apollo Federation. We tried it; the added latency from the federation gateway negated the consolidation wins. We kept the custom gateway.
- Caching layer. Redis in front of the slow services. We were about to ship this when the consolidation work removed the need: the formerly slow services now lived in the same Go process as their callers, one in-memory call away.
Test coverage as the safety net
The migration to protobuf and the service consolidation were both changes with a broad blast radius. The team had pushed test coverage above 80% before the work started; that coverage was the only reason the migrations were safe.
A typical migration day:
1. Open PR with the schema change.
2. CI runs full integration test suite (~10 minutes).
3. Land schema.
4. Open PR with the implementation change.
5. CI runs full suite again.
6. Land implementation.
7. Canary deploy to 5% of pods.
8. Monitor for an hour.
9. Roll out fully.
Without the test coverage, step 5 would have been “hope”. With coverage, step 5 was a useful signal.
The Prometheus story
The other measurement piece: Prometheus dashboards for every service. Before the dashboards, incident detection took days. After, it took minutes — partly because of the dashboards, partly because the consolidated architecture had fewer signals to triangulate.
The dashboards became the team’s home page. Every standup started with a glance at the dashboards. Issues that would have lived for days as “the user reported X” became “the dashboard shows X” within minutes.
What I’d carry forward
Two convictions from Picnic:
- Serialisation cost is invisible until you measure it. Run a pprof CPU profile against a few real production requests. Serialisation will be in the top 5; it usually is.
- Microservice boundaries should match team, scaling, or failure boundaries. Anything else is a tax. When in doubt, keep code in the same process. Process boundaries are easy to add later if the shape is wrong; they’re hard to remove.
The 47% latency win was the headline. The deeper learning was about the cost of imaginary boundaries.