RAG HyDE Retrieval

The vector mismatch problem

In a RAG system, the documents are answers. The queries are questions. An embedding model maps similar-meaning text to similar vectors — but a question and its answer aren’t worded similarly.

Question: “What are the symptoms of vitamin D deficiency?”
Answer in the corpus: “Hypocalcaemia, bone pain, muscle weakness, fatigue, and depression are clinical features of inadequate cholecalciferol levels.”

The question vector and the answer vector aren’t very close, so retrieval either misses the right passage or ranks it poorly.

What HyDE does

  1. Ask the LLM to generate a hypothetical answer to the question.
  2. Embed the hypothetical answer.
  3. Retrieve documents similar to the hypothetical answer.
  4. Pass the retrieved documents to the LLM for the real answer.

Even when the hypothetical answer is wrong on facts, its shape — vocabulary, structure, technical level — matches real answers in the corpus better than the question did.

What it looks like

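// hydeRetrieve returns the 5 chunks nearest to a hypothetical answer for query.
// llm, embed, and db are clients defined elsewhere in the package.
func hydeRetrieve(ctx context.Context, query string) ([]Chunk, error) {
    // Step 1: hypothetical answer
    hypothetical, err := llm.Generate(ctx,
        fmt.Sprintf("Write a 3-sentence answer to: %s\n\nAnswer:", query))
    if err != nil {
        // Don't embed an empty string if generation failed.
        return nil, fmt.Errorf("generate hypothetical: %w", err)
    }

    // Step 2: embed the hypothetical
    hypEmb := embed.Generate(ctx, hypothetical)

    // Step 3: retrieve on the hypothetical
    return db.NearestK(ctx, hypEmb, 5), nil
}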

Two LLM calls per query (hypothetical + final answer) versus one for plain RAG. The cost-to-quality trade is usually favourable.

When the win is biggest

When it doesn’t help

Numbers from a client engagement

For Bancnet’s customer-support copilot, switching from direct retrieval to HyDE improved citation-relevance scores by about 18% on a 200-question evaluation set. Latency went up by ~120ms per query (the hypothetical generation), which the team accepted given the quality bump.

Cost: ~30% more LLM tokens per query. For a low-volume specialist workload, fine. For a high-volume general-purpose chatbot, the per-token cost matters more.
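A quick back-of-envelope sketch of why volume changes the answer. Only the 30% overhead comes from the engagement above; the query volumes, the 2,000 tokens per query, and the $5 per million tokens are hypothetical inputs I picked for illustration, so plug in your own.

```go
package main

import "fmt"

// hydeOverhead estimates the extra monthly spend from HyDE.
// tokenOverhead is the fractional increase in tokens per query (0.30 for ~30%).
func hydeOverhead(queriesPerMonth, baseTokensPerQuery, pricePerMillionTokens, tokenOverhead float64) float64 {
	extraTokens := queriesPerMonth * baseTokensPerQuery * tokenOverhead
	return extraTokens / 1e6 * pricePerMillionTokens
}

func main() {
	// Hypothetical workload: 2,000 tokens per query at $5 per million tokens.
	const baseTokens, price, overhead = 2000.0, 5.0, 0.30
	for _, q := range []float64{10_000, 10_000_000} {
		fmt.Printf("%.0f queries/month: +$%.2f\n", q, hydeOverhead(q, baseTokens, price, overhead))
	}
}
```

Under those assumed inputs, 10,000 queries a month adds about $30, while 10 million adds about $30,000. Same percentage, very different conversation with whoever owns the budget.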

The rule I use: try HyDE if retrieval quality is your bottleneck. If your bottleneck is latency or cost, skip it.
