The vector mismatch problem
In a RAG system, the documents are answers. The queries are questions. An embedding model maps similar-meaning text to similar vectors — but a question and its answer aren’t worded similarly.
Question: “What are the symptoms of vitamin D deficiency?”
Answer in the corpus: “Hypocalcaemia, bone pain, muscle weakness, fatigue, and depression are clinical features of inadequate cholecalciferol levels.”
The question vector and the answer vector aren’t very close, so retrieval either misses the passage or ranks it too low.
What HyDE does
- Ask the LLM to generate a hypothetical answer to the question.
- Embed the hypothetical answer.
- Retrieve documents similar to the hypothetical answer.
- Pass the retrieved documents to the LLM for the real answer.
Even when the hypothetical answer is wrong on facts, its shape — vocabulary, structure, technical level — matches real answers in the corpus better than the question did.
What it looks like
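    // hydeRetrieve retrieves chunks using a hypothetical answer as the search key.
    // llm, embed, and db stand in for your own LLM, embedding, and vector-store clients.
    func hydeRetrieve(ctx context.Context, query string) ([]Chunk, error) {
        // Step 1: hypothetical answer
        hypothetical, err := llm.Generate(ctx,
            fmt.Sprintf("Write a 3-sentence answer to: %s\n\nAnswer:", query))
        if err != nil {
            return nil, fmt.Errorf("generate hypothetical: %w", err)
        }
        // Step 2: embed the hypothetical
        hypEmb, err := embed.Generate(ctx, hypothetical)
        if err != nil {
            return nil, fmt.Errorf("embed hypothetical: %w", err)
        }
        // Step 3: retrieve on the hypothetical (k = 5 nearest chunks)
        return db.NearestK(ctx, hypEmb, 5)
    }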
HyDE makes two LLM calls per query (hypothetical + final answer) versus one for plain RAG. The cost-to-quality trade is usually favourable when retrieval quality is what limits your answers.
When the win is biggest
- Medical / scientific / legal domains where the question’s vocabulary and the answer’s vocabulary differ sharply.
- Multilingual retrieval where the question is in one language and the corpus is in another (the hypothetical answer can match the corpus’s language).
- Short questions over long answers. “What’s the policy?” is too vague for direct retrieval; a hypothetical answer disambiguates by domain.
When it doesn’t help
- Direct lookup questions (“show me document XYZ”). The question’s vocabulary matches the corpus already.
- Very large corpora where retrieval is already near-perfect. The marginal improvement isn’t worth the extra LLM call.
Numbers from a client engagement
For Bancnet’s customer-support copilot, switching from direct retrieval to HyDE improved citation-relevance scores by about 18% on a 200-question evaluation set. Latency went up by ~120ms per query (the hypothetical generation), which the team accepted given the quality bump.
Cost: ~30% more LLM tokens per query. For a low-volume specialist workload, fine. For a high-volume general-purpose chatbot, the per-token cost matters more.
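A quick back-of-envelope sketch shows why volume changes the picture. Only the ~30% overhead comes from the engagement above; the query volumes, the 2,000 tokens per query, and the price of 1 currency unit per million tokens are hypothetical placeholders to replace with your own numbers.

```go
package main

import "fmt"

// extraMonthlyCost estimates the added monthly spend from HyDE's token overhead.
// overhead is the fractional increase in tokens per query (0.30 means 30% more).
func extraMonthlyCost(queriesPerMonth, baseTokensPerQuery, pricePerMillionTokens, overhead float64) float64 {
	extraTokens := queriesPerMonth * baseTokensPerQuery * overhead
	return extraTokens / 1e6 * pricePerMillionTokens
}

func main() {
	const overhead = 0.30 // ~30% more tokens per query, from the engagement

	// Hypothetical workloads: same tokens per query and price, different volume.
	specialist := extraMonthlyCost(10_000, 2_000, 1.0, overhead)
	chatbot := extraMonthlyCost(10_000_000, 2_000, 1.0, overhead)

	fmt.Printf("specialist workload: +%.2f per month\n", specialist)
	fmt.Printf("general chatbot:     +%.2f per month\n", chatbot)
}
```

The overhead scales linearly with volume, so under these assumptions the same 30% is a rounding error for the specialist workload and a real budget line for the chatbot.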
The rule I use: try HyDE if retrieval quality is your bottleneck. If your bottleneck is latency or cost, skip it.