HyDE — generate a hypothetical answer to improve retrieval

The vector mismatch problem

In a RAG system, the documents are answers. The queries are questions. An embedding model maps similar-meaning text to similar vectors — but a question and its answer aren’t worded similarly.

Question: “What are the symptoms of vitamin D deficiency?”
Answer in the corpus: “Hypocalcaemia, bone pain, muscle weakness, fatigue, and depression are clinical features of inadequate cholecalciferol levels.”

The question vector and the answer vector aren’t very close, so retrieval either misses the answer or ranks it poorly.
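“Close” here usually means cosine similarity between the two embedding vectors. The sketch below shows how that number is computed. The three-dimensional vectors are invented for illustration and do not come from any real embedding model; real embeddings have hundreds or thousands of dimensions.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b.
// It returns 0 if the lengths differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Toy vectors, made up for this example only.
	question := []float64{0.9, 0.1, 0.2}   // worded like a question
	answer := []float64{0.2, 0.8, 0.5}     // worded like the corpus
	paraphrase := []float64{0.3, 0.7, 0.6} // answer-shaped text, different wording

	fmt.Printf("question vs answer:   %.2f\n", cosine(question, answer))
	fmt.Printf("paraphrase vs answer: %.2f\n", cosine(paraphrase, answer))
}
```

With these toy values the answer-shaped text scores much higher against the corpus answer than the question does, which is the gap the rest of this post is about.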

What HyDE does

1. Ask the LLM to generate a hypothetical answer to the question.
2. Embed the hypothetical answer.
3. Retrieve documents similar to the hypothetical answer.
4. Pass the retrieved documents to the LLM for the real answer.

Even when the hypothetical answer is wrong on facts, its shape — vocabulary, structure, technical level — matches real answers in the corpus better than the question did.

What it looks like

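import (
    "context"
    "fmt"
)

// llm, embed, and db are the application's own clients, defined elsewhere.
func hydeRetrieve(ctx context.Context, query string) ([]Chunk, error) {
    // Step 1: hypothetical answer
    hypothetical, err := llm.Generate(ctx,
        fmt.Sprintf("Write a 3-sentence answer to: %s\n\nAnswer:", query))
    if err != nil {
        return nil, fmt.Errorf("generating hypothetical answer: %w", err)
    }

    // Step 2: embed the hypothetical
    hypEmb := embed.Generate(ctx, hypothetical)

    // Step 3: retrieve the 5 chunks nearest to the hypothetical
    return db.NearestK(ctx, hypEmb, 5), nil
}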

End to end, that’s two LLM calls per query (hypothetical + final answer) versus one for plain RAG. When retrieval quality is what limits answer quality, the cost-to-quality trade is usually favourable.
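The function above stops at retrieval. To make the two LLM calls visible, here is a self-contained sketch of the whole pipeline, including the final answer step. The `LLM`, `Embedder`, and `Store` interfaces and the `fake*` stubs are hypothetical stand-ins so the example runs without any external service; the final prompt wording is likewise an assumption. Swap in your own clients.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Chunk is a retrieved passage. All types here are hypothetical stand-ins.
type Chunk struct{ Text string }

type LLM interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

type Store interface {
	NearestK(ctx context.Context, vec []float32, k int) ([]Chunk, error)
}

// hydeAnswer runs all four steps: hypothetical, embed, retrieve, answer.
func hydeAnswer(ctx context.Context, llm LLM, emb Embedder, db Store, query string) (string, error) {
	// LLM call 1: the hypothetical answer.
	hyp, err := llm.Generate(ctx, "Write a 3-sentence answer to: "+query+"\n\nAnswer:")
	if err != nil {
		return "", fmt.Errorf("hypothetical: %w", err)
	}
	vec, err := emb.Embed(ctx, hyp)
	if err != nil {
		return "", fmt.Errorf("embed: %w", err)
	}
	chunks, err := db.NearestK(ctx, vec, 5)
	if err != nil {
		return "", fmt.Errorf("retrieve: %w", err)
	}
	var sb strings.Builder
	for _, c := range chunks {
		sb.WriteString(c.Text + "\n")
	}
	// LLM call 2: the real answer, grounded in the retrieved chunks.
	prompt := "Context:\n" + sb.String() + "\nQuestion: " + query + "\nAnswer using only the context:"
	return llm.Generate(ctx, prompt)
}

// fakeLLM counts calls and returns a canned response.
type fakeLLM struct{ calls int }

func (f *fakeLLM) Generate(_ context.Context, _ string) (string, error) {
	f.calls++
	return fmt.Sprintf("response %d", f.calls), nil
}

// fakeEmbedder returns a one-dimensional vector based on text length.
type fakeEmbedder struct{}

func (fakeEmbedder) Embed(_ context.Context, text string) ([]float32, error) {
	return []float32{float32(len(text))}, nil
}

// fakeStore always returns a single stub chunk.
type fakeStore struct{}

func (fakeStore) NearestK(_ context.Context, _ []float32, _ int) ([]Chunk, error) {
	return []Chunk{{Text: "stub chunk"}}, nil
}

// runDemo wires the stubs together and reports the answer and LLM call count.
func runDemo() (string, int) {
	llm := &fakeLLM{}
	out, err := hydeAnswer(context.Background(), llm, fakeEmbedder{}, fakeStore{},
		"What are the symptoms of vitamin D deficiency?")
	if err != nil {
		panic(err)
	}
	return out, llm.calls
}

func main() {
	out, calls := runDemo()
	fmt.Println(out, "| LLM calls:", calls) // prints "response 2 | LLM calls: 2"
}
```

Passing the clients in as interfaces, rather than using package-level globals, is a design choice of this sketch: it lets the stubs count calls and makes the pipeline testable offline.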

When the win is biggest

Medical / scientific / legal domains where the question’s vocabulary and the answer’s vocabulary differ sharply.
Multilingual retrieval where the question is in one language and the corpus is in another (the hypothetical answer can match the corpus’s language).
Short questions over long answers. “What’s the policy?” is too vague for direct retrieval; a hypothetical answer disambiguates by domain.
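For the multilingual case, the only change is in the prompt: ask for the hypothetical answer in the corpus’s language. A minimal sketch, assuming a hypothetical helper `hydePrompt` and that you know the corpus language up front; the prompt wording is illustrative, not tuned.

```go
package main

import "fmt"

// hydePrompt builds the prompt for the hypothetical answer.
// If corpusLang is non-empty, the model is asked to answer in that language
// so the hypothetical lands closer to the corpus in embedding space.
func hydePrompt(query, corpusLang string) string {
	if corpusLang == "" {
		return fmt.Sprintf("Write a 3-sentence answer to: %s\n\nAnswer:", query)
	}
	return fmt.Sprintf("Write a 3-sentence answer in %s to: %s\n\nAnswer:", corpusLang, query)
}

func main() {
	// English question, German corpus (example values only).
	fmt.Println(hydePrompt("What are the symptoms of vitamin D deficiency?", "German"))
}
```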

When it doesn’t help

Direct lookup questions (“show me document XYZ”). The question’s vocabulary matches the corpus already.
Very large corpora where retrieval is already near-perfect. The marginal improvement isn’t worth the extra LLM call.
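One way to act on the first case is to route direct lookups around HyDE and save the extra LLM call. The heuristic below is a rough sketch, not a recommendation: the prefixes and the ID pattern (`DOC-1234`-style tokens) are assumptions about what lookup queries look like and would need adapting to your own traffic.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// idPattern matches ID-like tokens such as "DOC-1234". The shape of the IDs
// is an assumption; adjust it to the identifiers your corpus actually uses.
var idPattern = regexp.MustCompile("\\b[A-Z]{2,}-\\d+\\b")

// looksLikeDirectLookup reports whether the query names a specific document,
// in which case its wording likely matches the corpus already.
func looksLikeDirectLookup(query string) bool {
	q := strings.ToLower(strings.TrimSpace(query))
	if strings.HasPrefix(q, "show me document") || strings.HasPrefix(q, "open document") {
		return true
	}
	return idPattern.MatchString(query)
}

func main() {
	for _, q := range []string{
		"show me document XYZ",
		"Where is DOC-1234?",
		"What are the symptoms of vitamin D deficiency?",
	} {
		fmt.Printf("%q -> direct lookup: %v\n", q, looksLikeDirectLookup(q))
	}
}
```

Queries flagged as direct lookups would go straight to plain retrieval; everything else takes the HyDE path.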

Numbers from a client engagement

For Bancnet’s customer-support copilot, switching from direct retrieval to HyDE improved citation-relevance scores by about 18% on a 200-question evaluation set. Latency went up by ~120ms per query (the hypothetical generation), which the team accepted given the quality bump.
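If you want to run the same kind of comparison on your own data, a small harness is enough. The sketch below uses hit rate at k (did the labelled chunk appear in the top k?) as a simple stand-in metric; it is not the citation-relevance score used in the engagement, and the `evalCase` and `retriever` types and the stub retrievers are hypothetical.

```go
package main

import "fmt"

// evalCase pairs a question with the ID of the chunk a human marked relevant.
type evalCase struct {
	Question   string
	RelevantID string
}

// retriever returns the IDs of the top-k chunks for a question.
type retriever func(question string, k int) []string

// hitRate is the fraction of cases whose relevant chunk appears in the top k.
func hitRate(cases []evalCase, r retriever, k int) float64 {
	if len(cases) == 0 {
		return 0
	}
	hits := 0
	for _, c := range cases {
		for _, id := range r(c.Question, k) {
			if id == c.RelevantID {
				hits++
				break
			}
		}
	}
	return float64(hits) / float64(len(cases))
}

func main() {
	// Two made-up cases; a real evaluation set would have many more.
	cases := []evalCase{
		{Question: "q1", RelevantID: "a"},
		{Question: "q2", RelevantID: "b"},
	}
	// Stub retrievers standing in for direct retrieval and HyDE retrieval.
	direct := func(string, int) []string { return []string{"a"} }
	hyde := func(string, int) []string { return []string{"a", "b"} }

	fmt.Printf("direct hit rate@5: %.2f\n", hitRate(cases, direct, 5))
	fmt.Printf("HyDE hit rate@5:   %.2f\n", hitRate(cases, hyde, 5))
}
```

Run both retrievers over the same labelled questions and compare the two numbers before deciding whether the extra call pays for itself.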

Cost: ~30% more LLM tokens per query. For a low-volume specialist workload, fine. For a high-volume general-purpose chatbot, the per-token cost matters more.
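To see why volume changes the answer, it helps to put the ~30% overhead into money. In the sketch below only the 30% figure comes from the engagement above; the tokens per query, the price per million tokens, and the query volumes are made-up inputs, so substitute your own.

```go
package main

import "fmt"

// monthlyCost returns the LLM spend in USD for one month.
// pricePerMTok is the price per million tokens.
func monthlyCost(queries int, tokensPerQuery, pricePerMTok float64) float64 {
	return float64(queries) * tokensPerQuery * pricePerMTok / 1e6
}

func main() {
	const (
		tokensPlain = 2000.0 // hypothetical tokens per plain-RAG query
		price       = 5.0    // hypothetical USD per million tokens
		overhead    = 0.30   // ~30% more tokens with HyDE, from the text
	)
	// A low-volume specialist workload and a high-volume chatbot.
	for _, q := range []int{10_000, 10_000_000} {
		plain := monthlyCost(q, tokensPlain, price)
		hyde := monthlyCost(q, tokensPlain*(1+overhead), price)
		fmt.Printf("%d queries/month: plain $%.2f, HyDE $%.2f, extra $%.2f\n",
			q, plain, hyde, hyde-plain)
	}
}
```

With these inputs the extra spend is $30 a month at 10,000 queries and $30,000 a month at 10 million: the same percentage, but a very different conversation with whoever owns the budget.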

The rule I use: try HyDE if retrieval quality is your bottleneck. If your bottleneck is latency or cost, skip it.