Field notes from working through example 09 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.
What the example teaches
Most RAG failures look like “the LLM hallucinated.” In my experience, about 80% of the time the real problem is that retrieval surfaced the wrong chunks (or none at all), and the LLM did the best it could with bad inputs.
The debugging shape: query → retrieval → print the top-K chunks with scores → human eyeballs the result → tune K and the score threshold.
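The two knobs at the end of that loop, K and the score threshold, can be sketched as a small filter. This is a minimal illustration, not the course's code: the `Hit` type, the `filterHits` name, and the 0.60 threshold are all assumptions, and it presumes hits arrive sorted by descending score with higher meaning more similar.

```go
package main

import "fmt"

// Hit is a hypothetical retrieval result. The field names mirror the
// debug output in this post but are assumptions, not the course's types.
type Hit struct {
	DocID string
	Score float64
	Text  string
}

// filterHits keeps at most k hits whose score is at least minScore.
// It assumes hits are already sorted by descending score, so it can
// stop at the first hit that falls below the threshold.
func filterHits(hits []Hit, k int, minScore float64) []Hit {
	var kept []Hit
	for _, h := range hits {
		if len(kept) >= k || h.Score < minScore {
			break
		}
		kept = append(kept, h)
	}
	return kept
}

func main() {
	hits := []Hit{
		{DocID: "a", Score: 0.85},
		{DocID: "b", Score: 0.65},
		{DocID: "c", Score: 0.40},
	}
	// 0.60 is an illustrative threshold; tune it against real queries.
	for _, h := range filterHits(hits, 5, 0.60) {
		fmt.Printf("%s %.2f\n", h.DocID, h.Score)
	}
}
```

Tuning then means rerunning the same queries with different `k` and `minScore` values and checking which chunks survive.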
What it looks like
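// debugRetrieve prints the top-k retrieved chunks with their scores so a
// human can judge whether retrieval, rather than the LLM, is at fault.
func debugRetrieve(query string, k int) {
	emb := embed.Generate(query) // embed the query text
	hits := db.NearestK(emb, k)  // k nearest chunks to the query embedding

	fmt.Printf("Query: %s\n\n", query)
	for i, h := range hits {
		// Rank, similarity score, and source document for each hit.
		fmt.Printf("[%d] score=%.3f doc=%s\n", i+1, h.Score, h.DocID)
		// A short preview of the chunk is enough to eyeball relevance.
		fmt.Printf(" %s\n\n", truncate(h.Text, 200))
	}
}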
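The snippet calls `truncate`, which is not shown here. A minimal sketch of what such a helper might look like follows; the behavior (cut to a rune count and append "...") is my assumption, not necessarily what the course code does.

```go
package main

import "fmt"

// truncate returns s cut to at most limit runes, appending "..." when
// anything was dropped. Hypothetical helper: counting runes instead of
// bytes avoids splitting multi-byte characters. A negative limit is
// treated as "no limit".
func truncate(s string, limit int) string {
	r := []rune(s)
	if limit < 0 || len(r) <= limit {
		return s
	}
	return string(r[:limit]) + "..."
}

func main() {
	fmt.Println(truncate("retrieval debugging", 9))
}
```

Slicing the string by bytes would be shorter, but it can cut a UTF-8 character in half and print garbage in the debug output.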
What I learned
The first three results should be obviously relevant. If they’re not, retrieval is broken. No prompt engineering will save you; fix retrieval first.
Score distributions matter more than the top score. If the top hit is 0.85 and the rest are 0.84, 0.83, 0.83, retrieval is fuzzy: the chunks are too similar to tell apart. If the top is 0.85 and the rest are 0.65, 0.63, 0.60, retrieval has a clear winner and you can trust the top hit.
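One rough way to put a number on "fuzzy" versus "clear winner" is the gap between the best and second-best score. The sketch below uses the two score lists from above; the function names and the 0.10 margin are assumptions for illustration, and a real cutoff would depend on the embedding model and distance metric.

```go
package main

import "fmt"

// topGap returns the difference between the best and second-best score.
// It assumes scores are sorted in descending order and returns 0 when
// there are fewer than two scores.
func topGap(scores []float64) float64 {
	if len(scores) < 2 {
		return 0
	}
	return scores[0] - scores[1]
}

// hasClearWinner reports whether the top hit beats the runner-up by at
// least margin. margin is a hypothetical tuning knob, not a standard value.
func hasClearWinner(scores []float64, margin float64) bool {
	return topGap(scores) >= margin
}

func main() {
	fuzzy := []float64{0.85, 0.84, 0.83, 0.83}
	sharp := []float64{0.85, 0.65, 0.63, 0.60}
	const margin = 0.10 // illustrative only

	fmt.Printf("fuzzy: gap=%.2f winner=%t\n", topGap(fuzzy), hasClearWinner(fuzzy, margin))
	fmt.Printf("sharp: gap=%.2f winner=%t\n", topGap(sharp), hasClearWinner(sharp, margin))
}
```

Printing the gap next to each query in the debug output makes it easier to spot the fuzzy cases without reading every score.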
Production connection
This is the single most useful debugging tool I have for working on a RAG system. I run it on every “the answer is wrong” report before opening a prompt or model issue. About 80% of the time the fix is in retrieval (chunk size, embedding model, query rewriting) and the LLM is fine.
Credit & reference. This post is field notes on example 09 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example09-retrieval-debug/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.