askill
hyde-retrieval

hyde-retrievalSafety 90Repository

HyDE (Hypothetical Document Embeddings) for improved semantic retrieval. Use when queries don't match document vocabulary, retrieval quality is poor, or implementing advanced RAG patterns.

0 stars
1.2k downloads
Updated 2/5/2026

Package Files

Loading files...
SKILL.md

HyDE (Hypothetical Document Embeddings)

Generate hypothetical answer documents to bridge vocabulary gaps in semantic search.

The Problem

Direct query embedding often fails due to vocabulary mismatch:

Query: "scaling async data pipelines"
Docs use: "event-driven messaging", "Apache Kafka", "message brokers"
→ Low similarity scores despite high relevance

The Solution

Instead of embedding the query, generate a hypothetical answer document:

Query: "scaling async data pipelines"
→ LLM generates: "To scale asynchronous data pipelines, use event-driven
   messaging with Apache Kafka. Message brokers provide backpressure..."
→ Embed the hypothetical document
→ Now matches docs using similar terminology

Implementation

from openai import AsyncOpenAI
from pydantic import BaseModel, Field

class HyDEResult(BaseModel):
    """Result of HyDE generation."""
    original_query: str
    hypothetical_doc: str
    embedding: list[float]

async def generate_hyde(
    query: str,
    llm: AsyncOpenAI,
    embed_fn: callable,
    max_tokens: int = 150,
) -> HyDEResult:
    """Generate hypothetical document and embed it."""

    # Generate hypothetical answer
    response = await llm.chat.completions.create(
        model="gpt-4o-mini",  # Fast, cheap model
        messages=[
            {"role": "system", "content":
                "Write a short paragraph that would answer this query. "
                "Use technical terminology that documentation would use."},
            {"role": "user", "content": query}
        ],
        max_tokens=max_tokens,
        temperature=0.3,  # Low temp for consistency
    )

    hypothetical_doc = response.choices[0].message.content

    # Embed the hypothetical document (not the query!)
    embedding = await embed_fn(hypothetical_doc)

    return HyDEResult(
        original_query=query,
        hypothetical_doc=hypothetical_doc,
        embedding=embedding,
    )

With Caching

from functools import lru_cache
import hashlib

class HyDEService:
    def __init__(self, llm, embed_fn):
        self.llm = llm
        self.embed_fn = embed_fn
        self._cache: dict[str, HyDEResult] = {}

    def _cache_key(self, query: str) -> str:
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> HyDEResult:
        key = self._cache_key(query)

        if key in self._cache:
            return self._cache[key]

        result = await generate_hyde(query, self.llm, self.embed_fn)
        self._cache[key] = result
        return result

Per-Concept HyDE (Advanced)

For multi-concept queries, generate HyDE for each concept:

async def batch_hyde(
    concepts: list[str],
    hyde_service: HyDEService,
) -> list[HyDEResult]:
    """Generate HyDE embeddings for multiple concepts in parallel."""
    import asyncio

    tasks = [hyde_service.generate(concept) for concept in concepts]
    return await asyncio.gather(*tasks)

Overview

ScenarioUse HyDE?
Abstract/conceptual queriesYes
Exact term searchesNo (use keyword)
Code snippet searchesNo
Natural language questionsYes
Vocabulary mismatch suspectedYes

Fallback Strategy

async def hyde_with_fallback(
    query: str,
    hyde_service: HyDEService,
    embed_fn: callable,
    timeout: float = 3.0,
) -> list[float]:
    """HyDE with fallback to direct embedding on timeout."""
    import asyncio

    try:
        async with asyncio.timeout(timeout):
            result = await hyde_service.generate(query)
            return result.embedding
    except TimeoutError:
        # Fallback to direct query embedding
        return await embed_fn(query)

Performance Tips

  • Use fast model (gpt-4o-mini, claude-3-haiku) for generation
  • Cache aggressively (queries often repeat)
  • Set tight timeouts (2-3s) with fallback
  • Keep hypothetical docs concise (100-200 tokens)
  • Combine with query decomposition for best results

Related Skills

  • rag-retrieval - Core RAG patterns that HyDE enhances for better retrieval
  • embeddings - Embedding models used to embed hypothetical documents
  • query-decomposition - Complementary technique for multi-concept queries
  • semantic-caching - Cache HyDE results to avoid repeated LLM calls

Key Decisions

DecisionChoiceRationale
Generation modelgpt-4o-mini / claude-3-haikuFast and cheap for hypothetical doc generation
Temperature0.3Low temperature for consistent, factual hypothetical docs
Max tokens100-200Concise docs match embedding sweet spot
Timeout with fallback2-3 secondsGraceful degradation to direct query embedding

References

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

95/100Analyzed 2/12/2026

A high-quality, comprehensive guide to implementing Hypothetical Document Embeddings (HyDE) for RAG. Includes clear code examples, caching strategies, and performance considerations.

90
95
90
95
95

Metadata

Licenseunknown
Version1.0.0
Updated2/5/2026
Publishermajiayu000

Tags

llm