Ardan Ultimate AI #09 — Debugging retrieval in isolation (K and threshold)

Field notes from working through example 09 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

Most RAG failures look like “the LLM hallucinated.” In roughly 80% of the cases I see, the real problem is that retrieval surfaced the wrong chunks (or none at all) and the LLM did the best it could with bad inputs.

The debugging shape: query → retrieval → print the top-K chunks with scores → a human eyeballs the result → tune K (how many chunks are returned) and the score threshold (the minimum similarity a chunk must reach to be kept).

What it looks like

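// debugRetrieve prints the top-k chunks for a query, with scores, so a human
// can judge retrieval quality. embed, db, and truncate are assumed to be
// defined elsewhere in the example.
func debugRetrieve(query string, k int) {
    emb := embed.Generate(query) // embed the query text
    hits := db.NearestK(emb, k)  // fetch the k nearest chunks

    fmt.Printf("Query: %s\n\n", query)
    for i, h := range hits {
        // Rank, similarity score, and source document for each hit.
        fmt.Printf("[%d] score=%.3f doc=%s\n", i+1, h.Score, h.DocID)
        // Shortened chunk text, enough to judge relevance at a glance.
        fmt.Printf("    %s\n\n", truncate(h.Text, 200))
    }
}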
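
To make the K and threshold knobs concrete, here is a minimal self-contained sketch of the same loop. It is not the course's code: the in-memory chunk slice, the nearestK function, its minScore parameter, and the toy 2-dimensional vectors are all my assumptions, standing in for a real embedding model and vector store. Cosine similarity is used as the score.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Hit is one retrieved chunk with its similarity score.
type Hit struct {
	DocID string
	Text  string
	Score float64
}

// chunk is a stored chunk with a precomputed embedding (hypothetical layout).
type chunk struct {
	DocID string
	Text  string
	Vec   []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearestK scores every chunk against the query vector, drops hits scoring
// below minScore, and returns at most k hits, best first.
func nearestK(query []float64, chunks []chunk, k int, minScore float64) []Hit {
	hits := make([]Hit, 0, len(chunks))
	for _, c := range chunks {
		if s := cosine(query, c.Vec); s >= minScore {
			hits = append(hits, Hit{DocID: c.DocID, Text: c.Text, Score: s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if k >= 0 && len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	// Toy vectors standing in for real embeddings.
	chunks := []chunk{
		{"a.md", "how to reset a password", []float64{1, 0}},
		{"b.md", "password policy overview", []float64{0.8, 0.6}},
		{"c.md", "office lunch menu", []float64{0, 1}},
	}
	query := []float64{1, 0}

	// Same query, two thresholds: compare what survives each one.
	for _, minScore := range []float64{0.0, 0.7} {
		fmt.Printf("k=3 minScore=%.1f\n", minScore)
		for i, h := range nearestK(query, chunks, 3, minScore) {
			fmt.Printf("  [%d] score=%.3f doc=%s\n", i+1, h.Score, h.DocID)
		}
	}
}
```

With the threshold at 0.0 the unrelated lunch-menu chunk still fills the third slot; at 0.7 it is dropped and only the two password chunks remain. That is the tuning loop in miniature: K caps how much reaches the prompt, and the threshold keeps irrelevant chunks out even when K has room for them.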

What I learned

The first three results should be obviously relevant. If they’re not, retrieval is broken. No prompt engineering will save you; fix retrieval first.

Score distributions matter more than the top score. If the top hit is 0.85 and the rest are 0.84, 0.83, 0.83, retrieval is fuzzy: the chunks are too similar for the ranking to mean much. If the top hit is 0.85 and the rest are 0.65, 0.63, 0.60, retrieval has a clear winner and you can trust the top hit.
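
One way to put a number on that distinction is the gap between the top score and the mean of the rest. The sketch below is my own illustration, not part of the course example: topGap, describe, and the 0.10 cut-off are hypothetical names and values that would need tuning per corpus and embedding model.

```go
package main

import "fmt"

// topGap returns the best score minus the mean of the remaining scores.
// scores must be sorted best first. With fewer than two scores there is
// nothing to compare against, so the gap is reported as 0.
func topGap(scores []float64) float64 {
	if len(scores) < 2 {
		return 0
	}
	var sum float64
	for _, s := range scores[1:] {
		sum += s
	}
	return scores[0] - sum/float64(len(scores)-1)
}

// describe labels a score list using minGap, a hypothetical cut-off.
func describe(scores []float64, minGap float64) string {
	if topGap(scores) >= minGap {
		return "clear winner"
	}
	return "fuzzy"
}

func main() {
	// The two score lists from the paragraph above.
	fuzzy := []float64{0.85, 0.84, 0.83, 0.83}
	clear := []float64{0.85, 0.65, 0.63, 0.60}

	fmt.Printf("gap=%.3f %s\n", topGap(fuzzy), describe(fuzzy, 0.10))
	fmt.Printf("gap=%.3f %s\n", topGap(clear), describe(clear, 0.10))
}
```

For the two lists above, the first gap is under 0.02 and the second is over 0.22, so any cut-off between those values separates them. Printing the gap next to each query in the debug output makes the fuzzy cases easy to spot without reading every score.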

Production connection

This is the single most useful debugging tool I have when working on a RAG system. I run it on every “the answer is wrong” report before opening a prompt or model issue. About 80% of the time the fix is in retrieval (chunk size, embedding model, query rewriting) and the LLM is fine.

Credit & reference. This post is field notes on example 09 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example09-retrieval-debug/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.