Field notes from working through example 08 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.
What the example teaches
The full RAG pipeline, stitched together against a real document (the Ardan Go notebook). All the pieces from previous examples now run end-to-end:
- Chunk the notebook by section.
- Embed each chunk.
- Insert into pgvector with metadata (page number, section title).
- At query time: embed the query, run a nearest-K search, build a prompt from the retrieved chunks with source labels, and let the LLM answer.
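The first step in the list, chunking by section, can be sketched as below. This is a minimal illustration rather than the course's code: the `Chunk` type, the `chunkBySection` function, and the assumption that sections start with a `## ` heading line are all mine, and the page-number metadata is left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical record: one section of the document plus the
// metadata that would be stored next to its embedding.
type Chunk struct {
	Section string // heading text of the section
	Text    string // body text of the section
}

// chunkBySection splits doc into one chunk per heading. It assumes a
// heading is any line starting with "## "; text before the first heading
// and sections with an empty body are dropped.
func chunkBySection(doc string) []Chunk {
	var chunks []Chunk
	var cur *Chunk
	var body []string

	// flush finishes the section being collected, if there is one.
	flush := func() {
		if cur == nil {
			return
		}
		cur.Text = strings.TrimSpace(strings.Join(body, "\n"))
		if cur.Text != "" {
			chunks = append(chunks, *cur)
		}
		cur, body = nil, nil
	}

	for _, line := range strings.Split(doc, "\n") {
		if title, ok := strings.CutPrefix(line, "## "); ok {
			flush()
			cur = &Chunk{Section: strings.TrimSpace(title)}
			continue
		}
		if cur != nil {
			body = append(body, line)
		}
	}
	flush()
	return chunks
}

func main() {
	doc := "## Variables\nZero values matter.\n\n## Slices\nA slice has length and capacity.\n"
	for i, c := range chunkBySection(doc) {
		fmt.Printf("%d %q: %s\n", i+1, c.Section, c.Text)
	}
}
```

Each resulting chunk is what would then be embedded and inserted, with the section title carried along as metadata.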
What it looks like
The orchestration is straightforward once the pieces work:
func answer(ctx context.Context, query string) (Answer, error) {
	queryEmb, err := embed.Generate(ctx, query)
	if err != nil {
		return Answer{}, err
	}

	// Retrieve the 5 nearest chunks (assumes NearestK also returns an error).
	hits, err := db.NearestK(ctx, queryEmb, 5)
	if err != nil {
		return Answer{}, err
	}

	// Named sources, not context, to avoid shadowing the context package.
	sources := buildContext(hits)
	prompt := fmt.Sprintf(answerPromptTemplate, query, sources)

	resp, err := llm.Generate(ctx, prompt)
	if err != nil {
		return Answer{}, err
	}

	return Answer{
		Text:      resp.Text,
		Citations: extractCitations(resp.Text, hits),
	}, nil
}
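The `buildContext` call is where retrieved chunks become numbered sources the model can cite. A minimal sketch of one way to write it, assuming a hypothetical `Hit` type with section, page, and text fields and a `[source N]` label format (neither is taken from the course code):

```go
package main

import (
	"fmt"
	"strings"
)

// Hit is a hypothetical retrieval result: the chunk text plus the
// metadata stored with it.
type Hit struct {
	Section string
	Page    int
	Text    string
}

// buildContext renders hits as numbered sources. The 1-based number in
// "[source N]" is the position of the hit in the slice, so a citation in
// the answer can later be mapped back to hits[N-1].
func buildContext(hits []Hit) string {
	var b strings.Builder
	for i, h := range hits {
		fmt.Fprintf(&b, "[source %d] (%s, p.%d)\n%s\n\n", i+1, h.Section, h.Page, h.Text)
	}
	return strings.TrimSpace(b.String())
}

func main() {
	hits := []Hit{
		{Section: "Slices", Page: 12, Text: "A slice has length and capacity."},
		{Section: "Maps", Page: 20, Text: "Map iteration order is not specified."},
	}
	fmt.Println(buildContext(hits))
}
```

Keeping the numbering tied to slice position is a deliberate choice here: it means no extra lookup table is needed when mapping citations back to chunks.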
What I learned
The plumbing is straightforward; the prompt is where the work is. A bad prompt template makes the LLM ignore the context. A good one makes it cite specifically. Iterate on the prompt with a small evaluation set before scaling.
Citation extraction is its own problem. The LLM produces text that references “[source 2]”; the application has to map “[source 2]” back to the original chunk to render the citation as a link. The example shows a robust regex-based extractor.
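The mapping from “[source 2]” back to a chunk can be sketched as follows. This is not the extractor from the course example; it is a minimal version under my own assumptions: the `Hit` and `Citation` types are hypothetical, source numbers are 1-based positions in the hits slice, and references to sources that were never supplied are silently dropped.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Hit is a hypothetical retrieval result.
type Hit struct {
	Section string
	Page    int
	Text    string
}

// Citation links a source number found in the answer to its chunk.
type Citation struct {
	Source int
	Hit    Hit
}

// sourceRef matches markers such as "[source 2]".
var sourceRef = regexp.MustCompile(`\[source (\d+)\]`)

// extractCitations returns one Citation per distinct, in-range source
// number, in order of first appearance in text.
func extractCitations(text string, hits []Hit) []Citation {
	seen := make(map[int]bool)
	var out []Citation
	for _, m := range sourceRef.FindAllStringSubmatch(text, -1) {
		n, err := strconv.Atoi(m[1])
		// Skip unparsable numbers, sources the model invented, and repeats.
		if err != nil || n < 1 || n > len(hits) || seen[n] {
			continue
		}
		seen[n] = true
		out = append(out, Citation{Source: n, Hit: hits[n-1]})
	}
	return out
}

func main() {
	hits := []Hit{
		{Section: "Slices", Page: 12, Text: "A slice has length and capacity."},
		{Section: "Maps", Page: 20, Text: "Map iteration order is not specified."},
	}
	answer := "Iteration order is unspecified [source 2]; slices differ [source 1]."
	for _, c := range extractCitations(answer, hits) {
		fmt.Printf("source %d -> %s, p.%d\n", c.Source, c.Hit.Section, c.Hit.Page)
	}
}
```

The range check matters in practice: a model can emit a source number that was never in the prompt, and rendering that as a link would point nowhere.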
Production connection
This is the shape of Genie’s RAG for the document Q&A endpoint. Same chunking, same embedding model, same prompt template. Direct lift.
Credit & reference. This post is field notes on example 08 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example08-rag-pipeline/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.