Ardan Labs Go RAG Cost Optimisation

Field notes from working through example 21 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

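Default RAG pipelines run retrieval on every query, but roughly half the queries don’t need it (“hi”, “thanks”, “can you explain that?”). The example puts a lightweight classifier in front of the retriever and lets it decide, per query, whether retrieval runs at all.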

What it looks like

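class := classifier.Classify(query)

// resp is declared outside the switch so it survives past the case bodies.
var resp string

switch class.Kind {
case KindGreeting, KindClarification, KindGeneralKnowledge:
    // Conversational or general-knowledge query: skip RAG.
    resp = llm.Chat(ctx, messages)

case KindFactual, KindDocumentSpecific:
    // Needs grounding: fetch the top 5 hits and answer with them as context.
    hits := retriever.Search(ctx, query, 5)
    resp = llm.ChatWithContext(ctx, messages, hits)
}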

What I learned

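A small classifier (~50 ms) saves a big retrieval (200-500 ms) every time it routes a query past the retriever, so the latency math favors the gate as long as enough queries get skipped to pay for the classifier call. The token math favors it too: RAG-extended prompts are 3-5× larger.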
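To make that arithmetic concrete, here is a small sketch of the expected-cost formulas. The function names are mine, and the model is deliberately simple: it assumes every query pays for the classifier, and only non-skipped queries pay for retrieval and the larger prompt.

```go
package main

// ExpectedLatencyMs is the mean classifier-plus-retrieval time per query
// behind the gate. skipRate is the fraction of queries routed past retrieval.
// Generation time is left out, which understates the saving, since skipped
// queries also send a shorter prompt.
func ExpectedLatencyMs(classifierMs, retrievalMs, skipRate float64) float64 {
	return classifierMs + (1-skipRate)*retrievalMs
}

// BreakEvenSkipRate is the skip rate above which the gate beats always
// retrieving: classifierMs + (1-p)*retrievalMs < retrievalMs  <=>  p > c/r.
func BreakEvenSkipRate(classifierMs, retrievalMs float64) float64 {
	return classifierMs / retrievalMs
}

// ExpectedPromptTokens is the mean prompt size when retrieved context
// multiplies the base prompt by ragMultiplier (3-5 in the notes above).
func ExpectedPromptTokens(baseTokens, ragMultiplier, skipRate float64) float64 {
	return baseTokens * (skipRate + (1-skipRate)*ragMultiplier)
}
```

With a 50 ms classifier, a 200 ms retrieval, and half the queries skipped, `ExpectedLatencyMs(50, 200, 0.5)` comes to 150 ms against 200 ms ungated, and the gate breaks even once 25% of queries are skipped. At a 4× multiplier and the same skip rate, the mean prompt falls from 4,000 to 2,500 tokens per 1,000 base tokens.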

Misclassification cost is asymmetric. Skipping RAG when you should have run it → wrong answer (bad). Running RAG when you didn’t need to → slightly slower correct answer (fine). Bias the classifier toward running RAG on the uncertain cases.
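One way to encode that bias is a skip threshold derived from the two error costs. This is a sketch under assumptions: it presumes the classifier exposes a probability that the query needs no retrieval (the hypothetical `SkipProb` field), and the cost values are illustrative weights, not measurements.

```go
package main

// Prediction is a hypothetical classifier output.
type Prediction struct {
	SkipProb float64 // estimated P(query needs no retrieval), in [0, 1]
}

// ThresholdFromCosts returns the SkipProb above which skipping is the
// cheaper bet. Skipping costs (1-s)*costMissed in expectation, retrieving
// costs s*costUnneeded, so skipping wins when
// s > costMissed / (costMissed + costUnneeded).
func ThresholdFromCosts(costMissed, costUnneeded float64) float64 {
	return costMissed / (costMissed + costUnneeded)
}

// ShouldRetrieve runs RAG unless the classifier clears the skip threshold,
// so uncertain queries land on the retrieval side.
func ShouldRetrieve(p Prediction, skipThreshold float64) bool {
	return p.SkipProb < skipThreshold
}
```

If a wrong answer is weighted nine times worse than a slow one, `ThresholdFromCosts(9, 1)` gives a threshold of 0.9: a query the classifier scores at 0.6 still goes through retrieval, and only one scored at 0.9 or above skips it.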

Production connection

For a customer-support copilot we built, the adaptive gate cut average response latency by ~40% and average cost by ~55%. The classifier was a tiny fine-tuned BERT model running locally. The pattern comes from the Ardan course; the production tuning is ours.


Credit & reference. This post is field notes on example 21 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example21-adaptive-retrieval/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.
