Field notes from working through example 20 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you, Bill and Florin, for teaching this material. The patterns in this post are derived from the course; the production reflections at the end are mine.
What the example teaches
Traditional cache: key on the literal prompt string, so every paraphrase is a miss.
Semantic cache: embed the prompt, run a nearest-neighbour search against the cached embeddings, and return the cached answer if the similarity (typically cosine) is above a threshold.
What it looks like
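```go
// embed, cache and llm are the example's embedding model, vector store and
// chat client; the call shapes here are illustrative, not the course's exact API.
const similarityThreshold = 0.92 // minimum similarity for a cache hit (tuning discussed below)

// cached embeds the prompt and returns a stored answer if a semantically
// similar prompt has been answered before.
func cached(ctx context.Context, prompt string) (string, bool) {
	queryEmb := embed.Generate(prompt)
	hit := cache.NearestNeighbour(queryEmb, similarityThreshold)
	if hit != nil {
		return hit.Response, true
	}
	return "", false
}

// generate serves from the semantic cache when it can; on a miss it calls
// the LLM and stores the (embedding, response) pair for future paraphrases.
// The prompt is embedded twice on a miss; production code should reuse the
// embedding computed during the lookup.
func generate(ctx context.Context, prompt string) string {
	if resp, ok := cached(ctx, prompt); ok {
		return resp
	}
	resp := llm.Chat(ctx, prompt)
	cache.Insert(embed.Generate(prompt), resp)
	return resp
}
```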
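The code above leans on a `cache` with `NearestNeighbour` and `Insert`. A minimal in-memory version might look like the following, assuming `[]float64` embeddings and cosine similarity; `SemanticCache` and `Entry` are hypothetical names, and the linear scan is a deliberate simplification.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// Entry pairs a cached prompt embedding with the response generated for it.
type Entry struct {
	Embedding []float64
	Response  string
}

// SemanticCache is a hypothetical in-memory store offering the two
// operations the example relies on. The RWMutex allows concurrent lookups.
type SemanticCache struct {
	mu      sync.RWMutex
	entries []Entry
}

// Insert stores a new (embedding, response) pair.
func (c *SemanticCache) Insert(emb []float64, resp string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = append(c.entries, Entry{Embedding: emb, Response: resp})
}

// NearestNeighbour returns a copy of the most similar entry whose similarity
// to query is above threshold, or nil if no entry qualifies.
func (c *SemanticCache) NearestNeighbour(query []float64, threshold float64) *Entry {
	c.mu.RLock()
	defer c.mu.RUnlock()
	bestIdx, bestSim := -1, threshold
	for i, e := range c.entries {
		if sim := cosine(query, e.Embedding); sim > bestSim {
			bestIdx, bestSim = i, sim
		}
	}
	if bestIdx < 0 {
		return nil
	}
	hit := c.entries[bestIdx] // copy, so callers never alias the slice
	return &hit
}

// cosine returns dot(a, b) / (|a| * |b|), or 0 if either vector is all zeros.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	var c SemanticCache
	c.Insert([]float64{1, 0, 0}, "Go to Settings > Security > Reset password.")
	// A paraphrase lands close to the cached embedding; an unrelated query does not.
	if hit := c.NearestNeighbour([]float64{0.98, 0.2, 0}, 0.92); hit != nil {
		fmt.Println("hit:", hit.Response)
	}
	if hit := c.NearestNeighbour([]float64{0, 1, 0}, 0.92); hit == nil {
		fmt.Println("miss")
	}
}
```

A linear scan costs O(n) per lookup, which is acceptable for a small cache; for large caches an approximate nearest-neighbour index is the usual replacement.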
What I learned
The threshold is the entire game. Set it too high and almost nothing hits; set it too low and the cache returns “close enough” answers that are actually wrong. 0.90–0.95 is the working range; tune it per workload, since similarity scores also shift with the embedding model.
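One way to tune the threshold per workload is to hand-label a sample of query pairs and count, at each candidate threshold, how many served hits would be wrong and how many valid reuses would be rejected. The sketch below assumes such a labelled set exists; `LabeledPair`, `Score`, `evaluate`, and the sample values are hypothetical illustrations, not measurements.

```go
package main

import "fmt"

// LabeledPair is one hand-labelled comparison between a new query and a
// cached prompt: the similarity of their embeddings, and whether the cached
// answer is actually correct for the new query.
type LabeledPair struct {
	Similarity float64
	SameAnswer bool
}

// Score summarises what a given threshold would do to the labelled sample.
type Score struct {
	Hits        int // pairs served from cache
	FalseHits   int // hits whose cached answer was wrong
	MissedReuse int // correct reuses the threshold rejected
}

// evaluate applies the hit rule (similarity > threshold) to every pair.
func evaluate(pairs []LabeledPair, threshold float64) Score {
	var s Score
	for _, p := range pairs {
		if p.Similarity > threshold {
			s.Hits++
			if !p.SameAnswer {
				s.FalseHits++
			}
		} else if p.SameAnswer {
			s.MissedReuse++
		}
	}
	return s
}

func main() {
	// Hypothetical labels for illustration only, not measured data;
	// a real set would come from sampled production traffic.
	pairs := []LabeledPair{
		{0.97, true}, {0.94, true}, {0.93, false}, {0.91, true}, {0.88, false},
	}
	for _, t := range []float64{0.90, 0.92, 0.95} {
		fmt.Printf("threshold %.2f: %+v\n", t, evaluate(pairs, t))
	}
}
```

Raising the threshold trades false hits for missed reuse; the right cut-off is wherever wrong answers become rare enough for the workload's tolerance.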
TTL matters because answers go stale. Static FAQ → long TTL. Recent-data queries → short TTL or no cache. Set a per-entry expiry on insert and skip expired entries on lookup.
Production connection
Genie’s pkg/llm has a per-prompt response cache (exact match). Adding the semantic layer is on the roadmap; for a customer-support workload, my estimate is that it would add a further 25–35% to the hit rate. The example is the cleanest implementation of the pattern I’ve found in Go.
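Layering a semantic tier behind an existing exact-match cache could look like the sketch below: the exact map is checked first because it needs no embedding call, and the semantic scan runs only on an exact miss. This is not Genie's actual `pkg/llm` API; `TieredCache`, `Embedder`, and the bag-of-words `toyEmbed` are hypothetical, the last existing only so the example runs without a model.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Embedder is a hypothetical stand-in for a real embedding model.
type Embedder func(text string) []float64

// TieredCache (hypothetical) checks an exact-match map first, which needs no
// embedding call, and falls back to a semantic scan only on an exact miss.
type TieredCache struct {
	embed     Embedder
	threshold float64
	exact     map[string]string // tier 1: literal prompt -> response
	embs      [][]float64       // tier 2: prompt embeddings,
	resps     []string          // and their responses, index-aligned
}

func NewTieredCache(embed Embedder, threshold float64) *TieredCache {
	return &TieredCache{embed: embed, threshold: threshold, exact: map[string]string{}}
}

// Get returns a cached response and the tier that served it.
func (c *TieredCache) Get(prompt string) (resp, tier string, ok bool) {
	if r, found := c.exact[prompt]; found {
		return r, "exact", true
	}
	q := c.embed(prompt)
	best, bestSim := -1, c.threshold
	for i, e := range c.embs {
		if s := cosine(q, e); s > bestSim {
			best, bestSim = i, s
		}
	}
	if best < 0 {
		return "", "", false
	}
	return c.resps[best], "semantic", true
}

// Put writes the response into both tiers.
func (c *TieredCache) Put(prompt, resp string) {
	c.exact[prompt] = resp
	c.embs = append(c.embs, c.embed(prompt))
	c.resps = append(c.resps, resp)
}

// cosine returns dot(a, b) / (|a| * |b|), or 0 if either vector is all zeros.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// toyEmbed is a crude bag-of-words vector over a fixed vocabulary, used only
// so the sketch runs without a model. Real embeddings capture far more.
func toyEmbed(text string) []float64 {
	vocab := []string{"reset", "password", "invoice", "refund"}
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return r < 'a' || r > 'z'
	})
	v := make([]float64, len(vocab))
	for _, w := range words {
		for i, term := range vocab {
			if w == term {
				v[i]++
			}
		}
	}
	return v
}

func main() {
	c := NewTieredCache(toyEmbed, 0.92)
	c.Put("How do I reset my password?", "Use Settings > Security.")
	for _, p := range []string{"How do I reset my password?", "password reset please", "Where is my invoice?"} {
		resp, tier, ok := c.Get(p)
		fmt.Printf("%q -> ok=%v tier=%q resp=%q\n", p, ok, tier, resp)
	}
}
```

Keeping the exact tier in front means repeated identical prompts never pay for an embedding call; only paraphrases reach the semantic scan.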
Credit & reference. This post is field notes on example 20 from Ardan Labs’ Ultimate AI by Bill Kennedy and Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example20-semantic-cache/. My fork with notes: PratikDhanave/ai-training. I highly recommend the course for anyone building AI applications in Go: the material is rigorous, and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.