Ardan Ultimate AI #10 — Interactive RAG REPL end-to-end

Field notes from working through example 10 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

Pull together ingestion, retrieval, the LLM, and chat history into one interactive Go REPL. The user types a question; the program retrieves matching chunks, calls the LLM, prints the answer with citations, and accepts follow-ups.

What it looks like

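// Sketch of the REPL loop. db, embed, llm, and buildContext stand in for
// whatever the example provides; error handling is omitted for brevity.
scanner := bufio.NewScanner(os.Stdin)
var history []Message

for {
    fmt.Print("> ")
    if !scanner.Scan() {
        break // EOF (Ctrl-D) or a read error ends the session
    }
    question := scanner.Text()

    // Retrieve the 5 chunks nearest to the question's embedding.
    hits := db.Search(ctx, embed.Generate(question), 5)
    ragContext := buildContext(hits)

    // The current turn carries the question plus the retrieved context.
    userMsg := Message{Role: "user", Content: question + "\n\n" + ragContext}
    messages := append(history, userMsg)

    resp := llm.Chat(ctx, messages)
    fmt.Println(resp.Content)

    // Record both sides of the turn so follow-ups see the conversation.
    history = append(history, userMsg, Message{Role: "assistant", Content: resp.Content})
}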
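
The loop calls buildContext without showing it. Below is a minimal sketch of what such a helper might look like, assuming each hit carries a source label and the chunk text. The Hit type, its field names, and the "[source N]" layout are my assumptions for illustration, not the course's definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// Hit is a hypothetical retrieval result; the course's own type may differ.
type Hit struct {
	Source string // where the chunk came from, e.g. a file name
	Text   string // the chunk text itself
}

// buildContext renders hits as numbered blocks so the model can refer
// to them as [source 1], [source 2], and so on.
func buildContext(hits []Hit) string {
	var b strings.Builder
	b.WriteString("Context:\n")
	for i, h := range hits {
		fmt.Fprintf(&b, "[source %d] (%s)\n%s\n\n", i+1, h.Source, h.Text)
	}
	// Drop the trailing blank lines left by the last block.
	return strings.TrimRight(b.String(), "\n")
}

func main() {
	hits := []Hit{
		{Source: "refunds.md", Text: "Refunds are issued within 14 days."},
		{Source: "shipping.md", Text: "Orders ship in 2 business days."},
	}
	fmt.Println(buildContext(hits))
}
```

Numbering the chunks at this stage gives the model something stable to cite, and lets the program map a citation back to the hit it came from.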

What I learned

Conversation history makes RAG dramatically better. Retrieval on a standalone question finds the right chunks as long as the question is self-contained. History-aware retrieval also handles a follow-up such as “and what about X?”, which only makes sense in light of the previous turn. The cost is some context juggling; the usability gain is large.
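
One way to make retrieval history-aware is to fold recent user turns into the text that gets embedded. The sketch below is my own illustration, not the course's code: retrievalQuery and maxTurns are hypothetical names, and it assumes the history holds the bare questions. In the loop above the stored user message also includes the retrieved context, so in practice the bare question would need to be kept separately.

```go
package main

import (
	"fmt"
	"strings"
)

// Message mirrors the shape used in the loop; the fields are assumed.
type Message struct {
	Role    string
	Content string
}

// retrievalQuery is a hypothetical helper: it prepends the most recent
// user turns to the new question so the embedding sees what a follow-up
// refers to. maxTurns caps how many earlier user turns are included.
func retrievalQuery(history []Message, question string, maxTurns int) string {
	var prev []string
	for i := len(history) - 1; i >= 0 && len(prev) < maxTurns; i-- {
		if history[i].Role == "user" {
			prev = append(prev, history[i].Content)
		}
	}
	// prev was collected newest-first; reverse to chronological order.
	for i, j := 0, len(prev)-1; i < j; i, j = i+1, j-1 {
		prev[i], prev[j] = prev[j], prev[i]
	}
	return strings.Join(append(prev, question), "\n")
}

func main() {
	history := []Message{
		{Role: "user", Content: "How long do refunds take?"},
		{Role: "assistant", Content: "Refunds are issued within 14 days."},
	}
	fmt.Println(retrievalQuery(history, "and what about exchanges?", 1))
}
```

The returned string would be passed to the embedding call in place of the bare question. Concatenation is the simplest option; asking the LLM to rewrite the follow-up into a standalone question is a common alternative, at the cost of an extra call.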

Citations as part of the prompt template. The LLM has to be instructed to cite the chunks it uses. Without that instruction it will synthesise an answer with no visible link to the retrieved chunks, so the chunks are decorative. With the instruction, answers read “according to [source 3]…” and each claim can be traced to a chunk.
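
As a rough illustration, the instruction can live in a system prompt, and the program can then check which sources the answer actually cited. The prompt wording, the "[source N]" marker format, and the citedSources helper below are all my assumptions, not the course's template.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// citePrompt is an assumed system instruction, not the course's wording.
const citePrompt = `Answer using only the numbered sources in the context.
After each claim, cite the source it came from as [source N].
If the sources do not contain the answer, say so.`

// citeRe matches markers such as [source 3].
var citeRe = regexp.MustCompile(`\[source (\d+)\]`)

// citedSources returns the distinct source numbers cited in answer,
// in ascending order, dropping numbers outside 1..numSources.
func citedSources(answer string, numSources int) []int {
	seen := map[int]bool{}
	var out []int
	for _, m := range citeRe.FindAllStringSubmatch(answer, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil || n < 1 || n > numSources || seen[n] {
			continue
		}
		seen[n] = true
		out = append(out, n)
	}
	sort.Ints(out)
	return out
}

func main() {
	answer := "Refunds take 14 days [source 1]. Exchanges are free [source 3] [source 1]."
	fmt.Println(citePrompt)
	fmt.Println(citedSources(answer, 5))
}
```

An empty result is a useful signal: the model either ignored the instruction or answered without using the retrieved chunks.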

Production connection

Bancnet’s customer-support copilot (UAE / Saudi open banking) uses this exact shape — REPL replaced with a chat UI, citations rendered as expandable references the operator can verify. The Ardan REPL is the dev-mode equivalent of the production UX.

Credit & reference. This post is field notes on example 10 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example10-rag-end-to-end/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.