Three RAG philosophies
Naive RAG. Every query → retrieve → answer. Simple; uniform cost; sometimes retrieves when it shouldn’t (chitchat) and sometimes doesn’t retrieve enough (the first hits weren’t great).
Self-RAG. The LLM is trained (or prompted) to emit special tokens that signal “I should retrieve now” or “I have what I need.” Retrieval becomes a runtime decision the LLM makes.
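As a rough sketch of the prompted variant, the retrieve-or-not decision can be a single gate in front of the retriever. Everything named here is hypothetical: the `[RETRIEVE]` / `[NO_RETRIEVE]` tokens, the `Generator` and `Retriever` interfaces, and the stub implementations are illustrative assumptions, not the vocabulary or API of any particular Self-RAG model.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical control tokens; a real Self-RAG model defines its own.
const (
	tokRetrieve   = "[RETRIEVE]"
	tokNoRetrieve = "[NO_RETRIEVE]"
)

// Generator and Retriever are assumed interfaces for this sketch.
type Generator interface {
	// Decide returns a control token saying whether retrieval is needed.
	Decide(query string) string
	// Answer generates a reply, optionally grounded in passages.
	Answer(query string, passages []string) string
}

type Retriever interface {
	Search(query string, k int) []string
}

// Respond lets the model decide at runtime whether to retrieve.
// Anything other than an explicit "no" falls back to retrieving,
// so malformed model output degrades to naive RAG.
func Respond(g Generator, r Retriever, query string) (reply string, retrieved bool) {
	if strings.TrimSpace(g.Decide(query)) == tokNoRetrieve {
		return g.Answer(query, nil), false
	}
	return g.Answer(query, r.Search(query, 5)), true
}

// stubModel imitates a model that skips retrieval for greetings.
type stubModel struct{}

func (stubModel) Decide(query string) string {
	switch strings.ToLower(strings.TrimSpace(query)) {
	case "hi", "hello", "thanks":
		return tokNoRetrieve
	}
	return tokRetrieve
}

func (stubModel) Answer(query string, passages []string) string {
	return fmt.Sprintf("answer to %q using %d passages", query, len(passages))
}

// stubRetriever returns a single placeholder passage.
type stubRetriever struct{}

func (stubRetriever) Search(query string, k int) []string {
	return []string{"placeholder passage about refunds"}
}

func main() {
	for _, q := range []string{"hi", "what's the refund policy"} {
		reply, retrieved := Respond(stubModel{}, stubRetriever{}, q)
		fmt.Printf("retrieved=%v reply=%s\n", retrieved, reply)
	}
}
```

Defaulting to retrieval when the token is missing or malformed is a design choice in this sketch: a bad decision then costs one extra retrieval rather than an ungrounded answer.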
CRAG (Corrective RAG). Retrieve normally; then a small classifier grades the retrieved content as “correct”, “ambiguous”, or “incorrect”. If incorrect, fall back to a web search (or an alternate corpus). If ambiguous, merge the original hits with the fallback results.
When each pays back
Self-RAG is the right pick when:
- Query distribution mixes chitchat with substantive questions (“hi” doesn’t need retrieval; “what’s the refund policy” does).
- LLM cost dwarfs retrieval cost; saving retrieval is cheap, saving LLM calls is expensive.
- You have a model that supports the special tokens or can be prompted into the pattern reliably.
CRAG is the right pick when:
- Retrieval quality is variable. Some queries get great hits; others get nonsense.
- You have a fallback corpus (web search, second-tier KB) that’s worth the cost on the bad-retrieval cases.
- You can run a small classifier cheaply (a fine-tuned small model, or a prompted small model).
What CRAG looks like
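```go
// Retrieve, grade the hits, then pick the context to answer from.
// Error handling is omitted for brevity.
hits := retriever.Search(ctx, query, 5) // k = 5
score := classifier.Score(ctx, query, hits)

switch score.Class {
case "correct":
	// Hits are trusted: answer from the primary corpus.
	return llm.Answer(ctx, query, hits)
case "incorrect":
	// Hits are rejected: answer from the fallback source instead.
	fallback := webSearch.Search(ctx, query)
	return llm.Answer(ctx, query, fallback)
case "ambiguous":
	// Unclear: answer from primary and fallback hits combined.
	merged := mergeHits(hits, webSearch.Search(ctx, query))
	return llm.Answer(ctx, query, merged)
}
```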
The classifier is the load-bearing part: it has to be cheap to run, precise enough to trust, and tunable. We used a fine-tuned BERT for one engagement and a Gemini Flash prompt for another. Both worked.
Production numbers
For a customer-support copilot:
- Naive RAG: 78% correct answers
- Self-RAG: 81% correct, 22% reduction in LLM calls (chitchat skipped retrieval)
- CRAG: 87% correct, 31% increase in cost (web search on misses)
The right choice depends on what you optimise for. Cost-sensitive deployment → Self-RAG. Quality-sensitive → CRAG. Cheapest baseline that works → Naive.
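To put the quality and cost figures on one scale, a small calculation helps. This sketch assumes the 31% cost increase for CRAG is measured per query against naive RAG (baseline 1.00); Self-RAG is left out because its figure is a reduction in LLM calls, which does not convert to total cost without further assumptions.

```go
package main

import "fmt"

// CostPerCorrect returns the relative cost of one correct answer,
// given cost per query (naive RAG = 1.0) and the share of correct answers.
func CostPerCorrect(relativeCost, accuracy float64) float64 {
	return relativeCost / accuracy
}

// ErrorReduction returns the relative drop in wrong answers when
// accuracy moves from baseline to improved.
func ErrorReduction(baseline, improved float64) float64 {
	return (improved - baseline) / (1 - baseline)
}

func main() {
	fmt.Printf("naive: %.2f\n", CostPerCorrect(1.00, 0.78)) // 1.28
	fmt.Printf("CRAG:  %.2f\n", CostPerCorrect(1.31, 0.87)) // 1.51
	fmt.Printf("wrong answers removed by CRAG: %.0f%%\n",
		100*ErrorReduction(0.78, 0.87)) // 41%
}
```

Under that assumption, CRAG pays roughly 17% more per correct answer while removing about 41% of the wrong ones, which is why it fits the quality-sensitive case.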
Combine them
The “Adaptive RAG” pattern combines both: Self-RAG for the retrieve-or-not decision, then CRAG for the retrieved content quality. Three classifier runs total (whether to retrieve, retrieved quality, answer-supported), all small.
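The combined flow might be wired together as below. This is a minimal sketch: the `Checks` and `Backend` interfaces, the `Result` type, and the stubs are all hypothetical, and what to do with an unsupported answer (retry, escalate, refuse) is left to the caller because the pattern as described does not specify it.

```go
package main

import "fmt"

// Checks groups the three small classifiers (hypothetical interface).
type Checks interface {
	NeedsRetrieval(query string) bool
	Grade(query string, hits []string) string // "correct" | "ambiguous" | "incorrect"
	Supported(answer string, evidence []string) bool
}

// Backend groups retrieval, fallback search, and generation (hypothetical).
type Backend interface {
	Search(query string) []string
	Fallback(query string) []string
	Generate(query string, evidence []string) string
}

type Result struct {
	Answer   string
	Path     string // which branch produced the evidence
	Grounded bool   // true only if the support check ran and passed
}

// Adaptive runs the Self-RAG gate, then CRAG grading, then a support check.
func Adaptive(c Checks, b Backend, query string) Result {
	if !c.NeedsRetrieval(query) {
		// No evidence to check against, so Grounded stays false.
		return Result{Answer: b.Generate(query, nil), Path: "no-retrieval"}
	}
	hits := b.Search(query)
	evidence, path := hits, "primary"
	switch c.Grade(query, hits) {
	case "incorrect":
		evidence, path = b.Fallback(query), "fallback"
	case "ambiguous":
		evidence = append(append([]string{}, hits...), b.Fallback(query)...)
		path = "merged"
	}
	answer := b.Generate(query, evidence)
	return Result{Answer: answer, Path: path, Grounded: c.Supported(answer, evidence)}
}

// Stubs for demonstration only.
type stubChecks struct{ grade string }

func (stubChecks) NeedsRetrieval(q string) bool        { return q != "hi" }
func (s stubChecks) Grade(q string, h []string) string { return s.grade }
func (stubChecks) Supported(a string, e []string) bool { return len(e) > 0 }

type stubBackend struct{}

func (stubBackend) Search(q string) []string   { return []string{"kb passage"} }
func (stubBackend) Fallback(q string) []string { return []string{"web passage"} }
func (stubBackend) Generate(q string, e []string) string {
	return fmt.Sprintf("answer from %d passages", len(e))
}

func main() {
	fmt.Printf("%+v\n", Adaptive(stubChecks{"ambiguous"}, stubBackend{}, "refund policy?"))
	fmt.Printf("%+v\n", Adaptive(stubChecks{"correct"}, stubBackend{}, "hi"))
}
```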
Genie’s pkg/rag ships naive + hybrid + reranking. CRAG-style corrective retrieval is on the roadmap for the high-stakes use cases (Bancnet, KYC) where wrong retrieval has downstream cost.