Knowledge Graphs + LLMs: Why This Pairing Is Stalling in 2025
AdmissusCase Research Team
September 7, 2025
18 min read
Introduction
The combination of knowledge graphs and large language models (LLMs) in 2023–2025 was promoted as the next step in the evolution of question‑answering and search systems. The idea sounded appealing: structured knowledge graphs (graph‑based knowledge bases) would supply LLMs with verified facts and links between entities, reducing "hallucinations" and enabling models to reason along chains of facts. However, as of September 2025, such pairings remain mostly laboratory experiments, far from large‑scale production solutions. In this article, in plain language, we'll look at why integrating knowledge graphs with LLMs has not yet lived up to expectations, what objective research results exist, and why many experts are skeptical. We will pay special attention to examples from the legal domain and technical literature, where these systems struggle the most.
The Promises of Integrating Knowledge Graphs and LLMs
Why try to combine knowledge graphs with LLMs at all? Because each approach has its strengths. A knowledge graph stores information as nodes (objects) and edges (relationships), i.e., in a strictly structured form. Such a graph easily answers simple factual queries (for example, "Who is the CEO of company X?"), provides source traceability, and supports complex links between disparate data. An LLM, on the other hand, is a model trained on a gigantic text corpus that can process natural language and generate coherent answers. LLMs are flexible in understanding phrasing, but they do not guarantee reliability—they can invent facts (hallucinate) and know nothing about data that appeared after their training cutoff.
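To make the contrast concrete, here is a minimal sketch in Python (all entities and relations invented) of what the graph side does well: exact, traceable lookups over stored triples, and nothing beyond them.

```python
# A minimal illustration (invented entities and relations) of the lookup
# strength described above: facts stored as (subject, predicate, object)
# triples, queried exactly and traceably.

TRIPLES = [
    ("AcmeCorp", "has_ceo", "Jane Smith"),
    ("AcmeCorp", "founded_in", "1998"),
    ("Jane Smith", "board_member_of", "Widget Ltd"),
]

def lookup(subject: str, predicate: str) -> list[str]:
    """Exact match: every object linked to (subject, predicate)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

print(lookup("AcmeCorp", "has_ceo"))  # ['Jane Smith'] -- traceable to a stored fact
print(lookup("AcmeCorp", "has_cto"))  # [] -- a graph cannot invent what it lacks
```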
A natural proposal emerged: use the knowledge graph as the source of facts, and the LLM as the "frontend" that generates human‑readable answers based on those facts. This hybrid was named Graph‑enhanced RAG (Retrieval‑Augmented Generation with a graph), or GraphRAG for short. In a GraphRAG scheme, the system first retrieves related nodes and facts in the graph database in response to a query, then the LLM takes this subset and composes the final answer. It was assumed that:
The graph structure would enable multi‑step reasoning—the LLM could follow chains of links to find answers to complex questions where data are scattered across different documents.
LLM answers would become more accurate and verifiable, since the model would rely on facts from the graph rather than on hidden associations inside its neural network.
Hallucinations would decrease: if a fact is not in the graph, the model is less likely to make it up.
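To make the assumed scheme concrete, here is a deliberately naive sketch: match entities named in the question, expand a small subgraph around them, and pass those facts to the model. The triples are invented, and the LLM call is a stub standing in for any chat‑completion API; a production system would use a graph database and real entity linking.

```python
# A naive sketch of the GraphRAG flow: entity match -> subgraph expansion ->
# LLM generation. Triples are invented; call_llm() is a stub.

TRIPLES = [
    ("Case A", "cites", "Case B"),
    ("Case B", "interprets", "Article 5"),
    ("Article 5", "part_of", "Law Z"),
]

def retrieve_subgraph(question: str, hops: int = 3) -> list[tuple]:
    """Collect triples reachable within `hops` steps of entities in the question."""
    frontier = {s for s, _, _ in TRIPLES if s in question}
    frontier |= {o for _, _, o in TRIPLES if o in question}
    facts: list[tuple] = []
    for _ in range(hops):
        new = [t for t in TRIPLES
               if (t[0] in frontier or t[2] in frontier) and t not in facts]
        facts += new
        frontier |= {t[0] for t in new} | {t[2] for t in new}
    return facts

def call_llm(prompt: str) -> str:
    return "(model output would appear here)"  # stand-in for any chat API

def answer(question: str) -> str:
    context = "\n".join(f"{s} {p} {o}" for s, p, o in retrieve_subgraph(question))
    return call_llm(f"Answer using only these facts:\n{context}\n\nQ: {question}")

print(answer("Which law does Case A ultimately depend on?"))
```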
Major players and startups embraced the idea. For example, Microsoft in 2024 released a GraphRAG library for developers. Well‑known graph platforms Neo4j and TigerGraph integrated with LLM agents, trying to adapt their offerings to the new AI wave. Vector databases also began converging: in 2024–2025, Weaviate and the frameworks LlamaIndex and LangChain added modules for working with knowledge graphs. It seemed the market was preparing to move from simple vector search to more "intelligent" graph knowledge.
Lab Successes vs. Reality
Despite high expectations, in practice the pairing of LLMs with knowledge graphs shows mixed results. Most existing solutions are early‑stage prototypes, often built for academic papers or demos and not proven effective in the field. And a growing number of experts voice healthy skepticism. Here's why.
First, research benchmarks show that GraphRAG yields gains only in certain scenarios. For tasks requiring multi‑step reasoning (multi‑hop questions), graphs can indeed help—the model more easily finds a chain of related facts. But for most simple one‑fact queries, traditional RAG based on text search works better. In a recent study (Han et al., 2025), researchers conducted the first systematic comparison of RAG vs. GraphRAG on several standard QA datasets. The results are unambiguous: for single‑hop questions, RAG outperforms GraphRAG, and the advantages of the graph approach emerge only on specially constructed composite questions.
For example, on Natural Questions (Wikipedia‑based QA), classic RAG (vector text search) with a Llama‑70B model achieved F1 ≈ 68%, while the GraphRAG variant using only triples scored just ≈ 28–34% F1. Even hybrid GraphRAG (triples + text) reached only ≈ 54% F1 there, noticeably worse than the approach without a graph. Only on HotpotQA (multi‑fact questions) did an advanced GraphRAG variant slightly surpass RAG (≈ 64.6% vs 63.9% F1 on Llama‑70B). In other words, graph methods do not provide substantial gains on most real‑world questions and sometimes even worsen metrics compared to "naive" text search. As the authors note, RAG and GraphRAG are complementary: each has its strengths, but neither fully replaces the other.
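For context on these percentages: QA benchmarks such as Natural Questions and HotpotQA typically score answers with token‑overlap F1. A simplified version of the metric (real evaluations also lowercase the text and strip articles and punctuation) looks like this:

```python
from collections import Counter

def qa_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 behind the percentages above (simplified: real
    evaluations also normalize case, articles, and punctuation)."""
    pred, gold_t = prediction.split(), gold.split()
    overlap = sum((Counter(pred) & Counter(gold_t)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold_t)
    return 2 * precision * recall / (precision + recall)

print(qa_f1("the 1998 merger agreement", "1998 merger agreement"))  # ≈ 0.857
```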
Second, there are impressive enthusiast case studies, but they should be treated cautiously. For example, Lettria (an AWS partner) claimed that in its internal tests GraphRAG raised answer accuracy from ~50% to 80%, processing complex documents from finance, medicine, engineering, and law. However, this result was obtained on a special hybrid system (graph + vectors) with manual expert evaluation of answers. It's important to understand that such dramatic superiority of graphs is not observed in open academic competitions. On the contrary, independent experiments often record only a minor gain—or parity at best. As one practitioner aptly put it, "graph RAG is still great for blog posts, but still works poorly in the real world". This wry remark reflects the mood of part of the community: there are nice demos and concepts, but real‑world payoff is scarce so far.
Key Problems of Graph+LLM Systems
Why has it been so hard to bring the knowledge‑graph + LLM pairing to life? Here are the main hurdles (many identified in developers' hands‑on attempts and confirmed by studies):
Speed: In practice, a graph‑based RAG system is slow. Building and traversing the graph takes time, and reports describe answers taking tens of seconds or even minutes, which is unacceptable in production. One experimenter ran GraphRAG locally on a laptop using Microsoft's open‑source implementation and found that indexing ~20 files took 36 hours! He concluded the process was "not fit for real use," given that a working corpus contains thousands of documents (a back‑of‑envelope sketch follows this list).
Cost: Automatically extracting a graph from text relies on LLM calls (e.g., GPT‑4) to parse each fragment and find links. This is expensive. Estimates show that on moderate data volumes, building a graph can cost tens of thousands of dollars in API bills alone. On top of that, hosting a graph database in the cloud (to say nothing of enterprise licenses) adds its own costs. In short, "chatting" with GPT about every paragraph is too pricey for most companies. Enthusiasts look for workarounds (as in the SemDB project, where 80–90% of the work is kept local, without LLM calls), but there's no universal solution.
Data Scalability: A knowledge graph is good for a small, well‑structured set of facts but does not scale well to large document corpora. An automatically built graph often splits into clusters ("islands") of connected nodes. The more data, the more such clusters—and the harder it is to merge them and process queries crossing multiple regions of the graph. Scaling such a system is a non‑trivial engineering task. Some approaches (e.g., community‑based GraphRAG) try to structure a text‑derived graph hierarchically, but that adds yet another layer of complexity. By comparison, vector DBs scale more easily: adding new documents is simply adding new embedding vectors without restructuring the entire system.
Accuracy and Completeness: The most delicate issue is the quality of the automatically built graph. Modern LLMs can indeed extract "triples" (subject–predicate–object) and facts from text. But they do so with errors and omissions. Researchers note that even the best models reliably extract only the most obvious entities and relationships, missing subtler details. In specialized domains (legal texts, scientific articles), an out‑of‑the‑box model simply lacks the necessary vocabulary and gets confused by domain phrasing. Serious fine‑tuning is required for each such task. Without it, the automatically built graph will be incomplete. And incompleteness means the LLM won't find an answer simply because the needed fact wasn't added as nodes/edges. Ironically, sometimes good old full‑text search will find the answer, while the graph won't—because the answer is in the text, but never "landed" in the graph.
Fragility (Validation Required): If an error or false link creeps into the knowledge graph, the LLM will err on that basis too. A graph does not guarantee truth—it's just structured data. Without manual verification, such a system cannot be trusted. Knowledge engineers have always painstakingly cleaned and checked corporate graphs, and with LLMs this need hasn't gone away. Automation sped up graph construction, but expert validation remains the bottleneck. Essentially, we're back to the original problem: for a system's answer to be reliable, a human expert must first ensure the knowledge base is correct. Otherwise, the LLM+graph combo can confidently present a demonstrably incorrect answer, citing bogus nodes. In domains requiring accuracy (law, medicine, finance), that makes the system unacceptable without constant oversight.
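The speed, cost, and scalability complaints above are easy to sanity‑check. A back‑of‑envelope sketch, in which every number is an assumption chosen for illustration rather than a measurement:

```python
import networkx as nx

# Speed and Cost: all numbers below are assumptions for illustration only.
docs             = 10_000   # a modest working corpus
chunks_per_doc   = 50       # assumed chunks per document
calls_per_chunk  = 2        # assumed extraction + entity-resolution passes
cost_per_call    = 0.02     # assumed USD per LLM call
seconds_per_call = 3.0      # assumed latency per call, sequential

calls = docs * chunks_per_doc * calls_per_chunk
print(f"API bill:   ~${calls * cost_per_call:,.0f}")             # ~$20,000
print(f"Wall clock: ~{calls * seconds_per_call / 3600:,.0f} h")  # ~833 h sequential

# Scalability: the "islands" problem is just connected components of the
# extracted graph; disconnected clusters cannot support cross-cluster hops.
g = nx.Graph()
g.add_edges_from([("A", "B"), ("B", "C"), ("X", "Y")])  # toy extraction output
print(nx.number_connected_components(g))  # 2 -> two islands
```

Even with generous assumptions, extraction by LLM lands in the tens of thousands of dollars and hundreds of sequential hours, consistent with the reports cited above.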
To sum up: GraphRAG has turned out to be complex, costly, and temperamental. Even its proponents admit this. For example, Verdantix experts note that GraphRAG pays off only in very complex, multi‑step tasks requiring meticulous audit of reasoning, and only if a company has the resources to maintain an up‑to‑date knowledge graph. In all simpler cases, "naive" RAG based on vector search remains far more practical and cheaper. In other words, the effort is justified only where deep, explicit modeling of relationships is unavoidable; for routine questions, it's needless overhead.
Example: Legal Texts and Technical Documentation
It's especially telling to consider Graph+LLM in the context of legal documents and technical literature—these are exactly the areas where such solutions are often attempted, and where their limitations show.
Take the legal field. One would think law is an ideal candidate for a knowledge graph: there are clear entities (laws, articles, courts, cases, parties) and relationships between them (citation, repeal, mention, etc.). Unsurprisingly, there have been attempts to build legal assistants based on graphs of statutes or case law. In practice, however, the language of statutes and court decisions is too complex for raw automatic parsing. LLMs without fine‑tuning get confused by legal phrasing. They can extract basic links (e.g., "Ivanov appears in case No. X"), but struggle to capture specifics: who represents whom, what decision was issued, which precedents are cited and with what effect. For the graph to truly reflect important links (e.g., "Case A cites Case B on jurisdiction" or "Article 10 of Law Y amended paragraph 2 of Article 5 of Law Z"), you must manually define the required entity types and relations for the model. That effectively means hand‑crafting an ontology for each sub‑field of law—extremely labor‑intensive.
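What "manually defining entity types and relations" means in practice is a hand‑written schema that extraction is constrained to. A toy sketch with invented types and relation names (a real legal ontology would be far larger and drafted with domain experts):

```python
from dataclasses import dataclass

# A toy version of a hand-written legal schema. All types and relation
# names are invented for illustration.

ENTITY_TYPES = {"Case", "Statute", "Article", "Court", "Party"}

RELATIONS = {  # (relation, allowed subject type, allowed object type)
    ("cites",      "Case",    "Case"),
    ("interprets", "Case",    "Article"),
    ("amends",     "Article", "Article"),
    ("decided_by", "Case",    "Court"),
    ("party_to",   "Party",   "Case"),
}

@dataclass
class Triple:
    subject: str
    subject_type: str
    relation: str
    obj: str
    obj_type: str

def is_valid(t: Triple) -> bool:
    """Reject extractions that fall outside the hand-crafted schema."""
    return (t.subject_type in ENTITY_TYPES and t.obj_type in ENTITY_TYPES
            and (t.relation, t.subject_type, t.obj_type) in RELATIONS)

print(is_valid(Triple("Case A", "Case", "cites", "Case B", "Case")))       # True
print(is_valid(Triple("Case A", "Case", "married_to", "Case B", "Case")))  # False
```

Every sub‑field of law needs its own version of this table, which is exactly the hand‑crafted ontology work the paragraph above describes.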
Moreover, legal consulting demands completeness and accuracy. A missed link between cases or a wrong association between statutes can lead to an incorrect conclusion. Imagine a system answering a lawyer's query based on an unverified graph: it might overlook a key precedent or, conversely, cite an unrelated fact. As a result, such products today are used at most as assistants that provide draft references—but not as autonomous search systems. A lawyer must be present to check and correct the AI's results. Without that, by common consensus, such "Graph+LLM" solutions are non‑viable in law—the cost of error is too high and too much manual effort is needed at the knowledge‑preparation stage. No wonder researchers say outright: in critical systems (law, finance, medicine), automatically extracted graphs must be carefully validated by hand. Otherwise, they cannot be trusted.
The situation is similar with technical literature—say, scientific papers or engineering manuals. Here a knowledge graph could link concepts (materials, methods, experimental results) and help answer complex questions like: "Which 2021 studies examined the impact of temperature on alloy X, and what conclusions did they reach?" To some extent, such systems have been built (e.g., for navigating medical research). But difficulties arise immediately: scientific text is rich in nuances that are hard to formalize as triples. An automatic parser will extract entities (names of materials, processes) and perhaps standalone facts (who invented what, which parameter equals what). But logical relations—causal links, constraints, assumptions of the study—escape simple structuring. The result is a graph that is either too shallow (only general facts, with subtleties lost) or bloated to monstrous complexity as it tries to account for all dependencies. For example, to represent an entire experimental cycle for a material, you'd create dozens of nodes (sample, temperature, result, analysis method, etc.) and edges between them. An LLM can try to generate such a structure, but the likelihood of errors is huge. The outcome is either an incomplete graph or one cluttered with noisy extra nodes, which makes retrieval difficult.
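A small illustration of that explosion, using an invented experimental sentence: even after half a dozen triples, the causal and conditional logic the sentence carries has not been captured.

```python
# An invented experimental sentence, forced into triples. Six triples later,
# the causal and conditional logic it carries has still not been captured.

sentence = ("At 450 °C, alloy X showed a 12% drop in tensile strength, "
            "provided samples were annealed and measured per method M.")

triples = [
    ("experiment_1", "studies",         "alloy X"),
    ("experiment_1", "temperature",     "450 °C"),
    ("experiment_1", "measures",        "tensile strength"),
    ("experiment_1", "observed_change", "-12%"),
    ("experiment_1", "sample_prep",     "annealed"),
    ("experiment_1", "method",          "method M"),
    # Still missing: causality, the conditionality on sample preparation,
    # uncertainty, and how this result relates to other studies.
]
print(f"1 sentence -> {len(triples)} triples, logic still lost")
```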
Of course, in both science and engineering, manually curated knowledge graphs are useful (as fact databases). But automatically building them over large text corpora is a problem not yet solved at a satisfactory quality level. That's why search systems over scientific papers still rely on a combination of classic keyword search and semantic (vector) search, sometimes adding manual labeling. Attempts to create a fully automated Graph+LLM literature search remain experiments of the "bold but naive" variety. Companies are in no hurry to invest, recognizing the amount of fine‑tuning required and the limited payoff.
Skepticism and Investment Questions
Against this backdrop, many in the industry ask: is this a dead‑end branch? It seems we're trying to apply tools from the pre‑LLM era of NLP (structured knowledge bases, ontologies), developed under a very different understanding of language, to modern LLMs that operate on probabilistic text distributions. Conceptually, an LLM is a "statistical oracle" that predicts the most likely answer based on a huge latent store of knowledge learned from data. Vector databases emerged as a way to partly compensate for the oracle's limitations with up‑to‑date data, but they too turned out to be a palliative: pure vector search suffers from loss of exact details and low recall. Adding graphs attempts to compensate for the vector approach's existing limitations (lack of explicit links, logical inference) with even more complex layers. The result is a wobbly tower of technologies: a neural network, with a vector index on top, and a graph on top of that… It's all hard to maintain and explain. Unsurprisingly, business reacted warily. If a solution requires such a complex infrastructure and doesn't deliver a breakthrough in quality, is it worth investing in? Many think no.
Common sense suggests: before building multi‑story combinations of LLMs and old technologies, we should improve the models themselves or find simpler hybrid methods. For example, instead of a bulky ontology‑style knowledge graph, teams increasingly use pragmatic approaches: a combination of classic search (BM25) and embeddings to boost recall, or storing "memory" as raw text, which LLMs can effectively filter. The RAG trend in 2024 showed that we didn't have to discard old methods entirely: vector DBs were combined with good old keywords and ranking, and that worked. Perhaps graphs will also find their niche—for example, in complex autonomous agent systems that need to store long‑term knowledge about their environment (that's exactly where Verdantix saw GraphRAG's potential). But these are specific cases that prove the rule: 90% of tasks are solved perfectly well by simpler means.
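One such pragmatic hybrid fits in a dozen lines: fuse a BM25 ranking with an embedding ranking via reciprocal rank fusion (RRF), a standard merging trick, with no graph anywhere. The document IDs and rankings below are placeholders for real retriever output.

```python
# Reciprocal rank fusion: merge a keyword (BM25) ranking and an embedding
# ranking. Rankings below are placeholders for real retriever output.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Documents ranked highly by either retriever float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc3", "doc1", "doc7"]  # keyword retriever output (assumed)
vector_hits = ["doc1", "doc9", "doc3"]  # embedding retriever output (assumed)
print(rrf([bm25_hits, vector_hits]))    # ['doc1', 'doc3', 'doc9', 'doc7']
```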
It's also worth mentioning maintenance and freshness. A knowledge graph is not a one‑time data dump; it requires constant enrichment, error correction, and fact updates. In an era where data go stale quickly, companies find it hard to justify large amounts of manual work for a graph if they can do without it. LLMs, by contrast, have the advantage that they can be retrained on fresh data or pick up updates via RAG. As a result, the out‑of‑the‑box graph approach has inherited all the downsides of the old approach (need for experts, static nature) and added new ones (technical complexity). It's no surprise that investors and product managers are lukewarm about funding large Graph+LLM initiatives, preferring solutions that deliver tangible ROI here and now. Skepticism is further fueled by the fact that big breakthrough claims often don't hold up outside demo conditions—as we saw above in the numbers.
Conclusions
While the knowledge graph + LLM pairing is conceptually attractive, in reality it has run into many obstacles. It's September 2025, and we still don't see broad industrial adoption of GraphRAG: almost all such systems are either prototypes or niche deployments with a huge amount of manual work. Healthy skepticism is justified. Essentially, Graph+LLM today is a laboratory experiment with an uncertain outcome. There are serious signs that trying to combine new generative models with methods born in the pre‑LLM era may be a dead end. At least for now, the gains from the graph do not offset the associated complexity and risks.
This does not mean the idea is doomed: research continues, improved algorithms for building graphs are appearing, and hybrid strategies for choosing between RAG and GraphRAG by task are being devised. There will likely be niches (high‑stakes agent systems, detailed knowledge modeling in specific domains) where such solutions will find use. However, expecting a knowledge graph + LLM to become a universal solution for search and analytics is naive, at least at the current technology level. Without expert involvement and thorough validation, such systems are not ready for the real world, and with expert involvement they lose a significant share of their raison d'être.
To sum up, knowledge graphs + LLMs have not become the "Holy Grail" of AI systems. Rather, the industry has come full circle and returned to recognizing the value of simpler, more reliable approaches. As developers joke, the path from keyword search to vector RAG, then to graph RAG, has been a useful journey that revealed the limitations of each method. Perhaps the future will bring a new generation of knowledge representations (some speak of "ontologies 2.0" or semantic bases like the aforementioned SemDB). But investing effort in a solution cobbled together from past‑era technologies without convincing progress is a dubious proposition. Skepticism is warranted: as of 2025, the graph+LLM combo is still more about experiments and papers than about ready‑made products. Until new evidence proves otherwise, we should approach this direction with caution and a sober assessment of risks.