GraphRAG — when a knowledge graph beats vector search

Where vector RAG falls short

Vector search retrieves the chunks most similar to the query. That works well for chunk-local questions such as “what does the contract say about late fees” and poorly for corpus-spanning questions such as “what’s the relationship between the parties named in the contract.”

Answering the second question requires knowing that party A and party B are both named, that they’re in a vendor-customer relationship, and that clauses elsewhere in the corpus govern that relationship. Vector search models similarity, not relationships.
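
To make the difference concrete, here is a minimal sketch of what "modelling relationships" buys you. It assumes an extraction step has already produced subject-relation-object triples from separate chunks; the triples, the relation names, and the helpers build_graph and find_path are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical triples, each of which could come from a different chunk.
TRIPLES = [
    ("Party A", "vendor_of", "Party B"),
    ("Party B", "bound_by", "Clause 14"),
    ("Clause 14", "governs", "Late fees"),
]


def build_graph(triples):
    """Index triples as an adjacency list, keeping the relation label on each edge."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        # Store the reverse edge too so the graph can be walked in both directions.
        graph[obj].append((f"inverse:{relation}", subject))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: shortest chain of (relation, node) hops, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(relation, neighbour)]))
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "Party A", "Late fees")
for relation, node in path:
    print(f"--{relation}--> {node}")
```

No single triple links Party A to late fees; the answer only appears by walking the edges, which is the part a similarity ranking has no way to do.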

What GraphRAG does

GraphRAG (Microsoft Research’s name; the pattern existed earlier) adds three ingestion steps and one query-time step:

Entity extraction. Run the corpus through an LLM to identify entities (people, companies, concepts) and the relationships between them.
Community detection. Cluster entities into communities (Leiden algorithm, typically).
Community summaries. Generate an LLM summary of each community.
At query time. Classify the query as global (use community summaries), local (use vector retrieval), or hybrid (use both and merge the results).
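
The ingestion steps above can be sketched end to end in a few lines. This is a toy under stated assumptions, not a real implementation: extract_relations stands in for the LLM extraction call by parsing "A | relation | B" lines, detect_communities uses plain connected components as a simple stand-in for Leiden, and summarise stands in for the LLM summary by listing the relations inside a community. All function names and the sample corpus are hypothetical.

```python
from collections import defaultdict


def extract_relations(doc_text):
    """Placeholder for LLM extraction: parses 'A | relation | B' lines."""
    relations = []
    for line in doc_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):
            relations.append(tuple(parts))
    return relations


def build_adjacency(relations):
    """Undirected entity graph: entity -> set of neighbouring entities."""
    adjacency = defaultdict(set)
    for source, _relation, target in relations:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def detect_communities(adjacency):
    """Connected components, used here only as a stand-in for Leiden."""
    communities, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarise(community, relations):
    """Placeholder for the LLM summary: joins the relations inside the community."""
    members = set(community)
    facts = [f"{s} {r} {t}" for s, r, t in relations if s in members and t in members]
    return "; ".join(facts)


corpus = [
    "Acme | supplies | Globex\nGlobex | audited_by | Initech",
    "Umbrella | acquires | Hooli",
]
relations = [rel for doc in corpus for rel in extract_relations(doc)]
communities = detect_communities(build_adjacency(relations))
summaries = {tuple(c): summarise(c, relations) for c in communities}
```

In a real pipeline the two placeholders are where the cost sits: one LLM pass over every document for extraction, and another over every community for summaries.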

What it looks like

Corpus → Entity extraction → Entity graph + Relationships
                                    ↓
                             Community detection
                                    ↓
                             Community summaries
                                    ↓
                             Stored alongside vector index

Query → classifier → if global: search community summaries
                    if local: vector retrieval
                    if hybrid: both, merged
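
The query-side flow could be sketched roughly as follows. The keyword heuristic in classify is only a stand-in for whatever LLM or trained classifier does the real routing, the cue lists are invented for illustration, and merging by concatenation with de-duplication is an assumption; the two retriever callables are hypothetical stubs.

```python
# Hypothetical cue lists; a production classifier would likely be an LLM call.
GLOBAL_CUES = ("overall", "themes", "summarise", "summarize", "across")
LOCAL_CUES = ("what does", "say about", "clause", "section")


def classify(query):
    """Return 'global', 'local', or 'hybrid' from simple keyword cues."""
    text = query.lower()
    is_global = any(cue in text for cue in GLOBAL_CUES)
    is_local = any(cue in text for cue in LOCAL_CUES)
    if is_global and is_local:
        return "hybrid"
    if is_global:
        return "global"
    return "local"  # default to vector retrieval when nothing suggests otherwise


def route(query, search_summaries, search_chunks):
    """Dispatch to the matching retriever(s) and merge results, preserving order."""
    mode = classify(query)
    results = []
    if mode in ("global", "hybrid"):
        results += search_summaries(query)
    if mode in ("local", "hybrid"):
        results += search_chunks(query)
    return mode, list(dict.fromkeys(results))  # drop duplicates, keep first occurrence


def fake_summaries(query):
    return ["summary: vendor cluster"]


def fake_chunks(query):
    return ["chunk: late fee clause"]


mode, hits = route(
    "Summarise the themes, and what does clause 4 say about fees?",
    fake_summaries,
    fake_chunks,
)
print(mode, hits)
```

Defaulting to local keeps the cheap path as the fallback, so a misclassified query costs one ordinary vector lookup rather than a scan of community summaries.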

The wins

For a private-equity client analysing acquisition targets, vector search returned passages about each company. GraphRAG returned: “Company A and B share three board members; A depends on B as a supplier (52% of A’s COGS); B is undergoing regulatory scrutiny in the EU.” None of those facts existed in a single chunk; they emerged from the graph.

For a legal-discovery use case, GraphRAG found that two parties had a prior contractual relationship that wasn’t named in the immediate documents but was inferred from the entity graph built across the broader corpus.

When NOT to use it

Small corpus. GraphRAG’s ingestion cost is significant; for a few hundred documents, it doesn’t pay back.
High-velocity corpus. Re-running entity extraction on every update is expensive; if your corpus changes hourly, the graph is always stale.
Question shape is uniformly chunk-local. “What does this say about X” — vector wins.

Genie’s roadmap

The current pkg/rag ships hybrid search: vector and BM25 results fused with reciprocal rank fusion (RRF). GraphRAG is the next-tier addition for the legal and compliance workloads customers are asking about. The planned location is pkg/rag/graphrag/; the entity extraction layer uses Gemini or the local Ollama model, depending on the sovereignty class of the corpus.
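
For readers unfamiliar with RRF, here is a minimal sketch of the general technique, not the code in pkg/rag. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The value k = 60 is the commonly cited default and is an assumption here, as are the function name and the sample document IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; best fused score first.

    k=60 is a commonly used default, assumed here rather than taken from pkg/rag.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc3", "doc1", "doc7"]  # hypothetical vector-search ranking
bm25_hits = ["doc1", "doc9", "doc3"]    # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Because RRF works on ranks rather than raw scores, it needs no calibration between cosine similarities and BM25 scores, which is why it is a common choice for fusing the two.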

For your project: if you’re answering “what’s the relationship between X and Y” or “summarise this corpus thematically” questions, GraphRAG is worth the ingestion cost. If you’re answering “what does the document say about X,” vector is enough.