So I've done a ton of work in this area.
A few learnings I've collected:
1. Lexical search with BM25 alone gives you very relevant results if you can do some work with an LLM at ingestion time.
2. Embeddings only work well when the query is roughly the same size (order of magnitude) as what you're actually storing in the embedding store.
3. Generating a hypothetical answer to the query with an LLM, then using that hypothetical answer to query the embeddings, works really well (rough sketch below).
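For what it's worth, learning 3 is basically the HyDE idea and the core of it fits in a few lines. A minimal sketch; the model names and the in-memory store are placeholders, not what we actually run:

```python
# Minimal HyDE-style sketch: embed a hypothetical answer instead of the raw query.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Pretend these chunks were extracted at ingestion time.
chunks = ["Scrooge is Bob Cratchit's employer.", "Marley died seven years ago."]
chunk_vecs = embed(chunks)

def retrieve(query, k=2):
    # 1. Ask the LLM for a plausible (hypothetical) answer to the query.
    hypo = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short, plausible answer to: {query}"}],
    ).choices[0].message.content
    # 2. Embed the hypothetical answer and use *that* vector for the lookup.
    qv = embed([hypo])[0]
    sims = chunk_vecs @ qv / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(qv))
    return [chunks[i] for i in np.argsort(-sims)[:k]]
```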
So, combining all three learnings, we landed on a knowledge decomposition and extraction step very similar to yours, but we add a metaprompt step to auto-generate the domain / entity types.
Out of the box, LLMs are bad at identifying the correct level of granularity for the decomposed knowledge. One trick we found: ask the LLM to output a mermaid.js mindmap that hierarchically breaks the input down into a tree, and at the end of that output, ask it to state which level is the appropriate root for a knowledge node.
Each node is then used to generate questions that could be answered from the knowledge it contains. We index the text of these questions and also embed them (rough sketch below).
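Concretely, the lexical side of the question index can be as simple as this toy sketch (the questions are made up here; in the real pipeline they come from the LLM and each one keeps a pointer back to its node):

```python
# Toy sketch: BM25 over LLM-generated questions (questions hard-coded here).
from rank_bm25 import BM25Okapi

# In the real pipeline these come from the LLM, one batch per knowledge node,
# and each question keeps a pointer back to its node.
questions = [
    "How does Scrooge treat Bob Cratchit?",
    "Who visits Scrooge on Christmas Eve?",
    "What does Scrooge think about Christmas?",
]
bm25 = BM25Okapi([q.lower().split() for q in questions])

user_query = "how is cratchit treated by scrooge"
scores = bm25.get_scores(user_query.lower().split())
best = max(range(len(questions)), key=lambda i: scores[i])
print(questions[best])  # follow this question's pointer to its node / raw text
```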
You can match the user's query directly against these questions using BM25 alone and get good outputs. A hybrid approach works even better, though not by that much.
Not using LLMs at query time also means we can hierarchically walk down from the root into deeper and deeper nodes, using embedding similarity as the cost function for the traversal (sketch below).
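The traversal is essentially a best-first walk down the tree, something like this sketch (the `embed_fn` callable, the dict node shape, and the stopping rule are placeholders, not our actual implementation):

```python
# Sketch: best-first walk down the knowledge tree, scored by embedding similarity.
# `embed_fn` and the {"text": ..., "children": [...]} node shape are placeholders.
import heapq
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def walk(root, query_vec, embed_fn, max_nodes=5):
    visited = []
    heap = [(-cosine(embed_fn(root["text"]), query_vec), id(root), root)]
    while heap and len(visited) < max_nodes:
        neg_sim, _, node = heapq.heappop(heap)
        visited.append((node, -neg_sim))
        # Always expand the most promising frontier node next.
        for child in node.get("children", []):
            sim = cosine(embed_fn(child["text"]), query_vec)
            heapq.heappush(heap, (-sim, id(child), child))
    return visited  # nodes in visit order with their similarity scores
```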
PageRank for better centrality seems neat, but it still doesn't address what is probably the unsolvable flaw with RAG, the reason RAG basically can't work: all RAG DBs under-perform expectations because RAG fundamentally can't find the relationships between words needed to locate the information the user cares about. Weird, right? Isn't this exactly what the 'attention' mechanism is supposed to be good at? It just isn't good enough.
Example: say you're searching an article and want to know the occupation of a person it mentions. The person, 'Sharon,' is said to have attended several physical chemistry conferences, but her occupation is never explicitly stated. There's a very good chance every single RAG approach will fail to return correct results, because it fails to make the connection from 'occupation' to 'attends conferences' to the type of conference, and to infer 'chemist'. I could go on, but this sort of error is pervasive across all types of information when trying to retrieve with RAG. In the end, solutions like the above seem to just reinvent other query methods (SQL, PageRank, etc.) with extra steps... there's little point in vectorization at that point.
This is very cool, I signed up and uploaded a few docs (PDFs) to the dashboard
Our use case: we have been looking at farming out this work (analyzing compliance documents, i.e. manufacturing paperwork) for our AI startup. However, we need to understand the scale this can operate at and the cost model for it to be useful to us.
We will have about 300K PDF documents per client and expect about a 10% change in that document set month to month. Any GraphRAG system has to handle documents at scale. We can use S3 as an ingestion mechanism, but we have to understand the cost and processing time needed for the system to be ready to use during:
1. initial loading
2. regular updates (and how do we delete data from the system, for example?)
Cool framework, btw.
Super interesting, thanks for sharing. How large a corpus of domain specific text do you need to obtain a useful knowledge graph?
Aider has been doing PageRank on the call graph of code repos since forever. All non-trivial code has lots of graph structure to support PageRank, so it works really well for finding the most relevant context in the project for the currently active task.
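Not Aider's actual implementation, but the gist is just this (a toy sketch with networkx and a made-up call graph):

```python
# Toy sketch: PageRank over a made-up call graph with networkx.
import networkx as nx

g = nx.DiGraph()
# Edge means "caller -> callee"; a real tool builds this by parsing the repo.
g.add_edges_from([
    ("main", "load_config"),
    ("main", "run_pipeline"),
    ("run_pipeline", "parse_file"),
    ("run_pipeline", "write_output"),
    ("tests.test_parse", "parse_file"),
])
for name, score in sorted(nx.pagerank(g).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {name}")
```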
This is cool! How is the graph stored and queried? I’m familiar with graph databases, but I don’t see that as a dependency.
Have you tried the sciphi triplex model for extraction? I’ve tried to do some extraction before, but got inconsistent results if I extracted the chunks multiple times consecutively.
Cool idea. IMHO traditional information retrieval is the way to go with RAG. Vector search is nice, but it's also slow and expensive, and people seem to use it as magic pixie dust. It works nicely for unstructured data but not necessarily that well for structured data.
And unless tuned very well, vector search is not actually a whole lot better than a good old well-tuned query. Putting everything together, the practice of turning structured data into unstructured data just so you can do vector search or prompt engineering on it, which I've seen teams do, feels a bit backwards. It kind of works, but there are probably smarter ways to get the same results. Graph RAG is essentially about making use of the structure of the data. Whether that's through SQL joins or by querying some graph database doesn't really matter much.
There is probably some value in teaching LLMs how to query as well, or letting them interface with existing search/query APIs. And you can compensate for poor ranking with larger context sizes: simply fetch a few hundred or even more results with multiple queries. That's going to be a lot faster and cheaper to scale than vector search.
Looks great. But having been burned by other "abstractions", e.g. LangChain, I'm wary of oversimplification. How are you not going to make those same mistakes?
Do you have any retrieval and generation metric scores (e.g., on the KILT or NQ datasets)?
I know benchmark datasets are not the be-all and end-all, but a halfway decent score and inference time would really help sell your framework (or help engineers make the choice).
In any case, very cool work. I've built a lot of RAG pipelines as a freelance NLP engineer and I will try this out.
Hi,
I’m currently building a Q&A chatbot and facing challenges in addressing the following scenario:
When a user asks:
"What do you mean in your previous statement?"
How does your framework handle retrieving the correct small subset of "raw knowledge" and integrating it into the LLM for a relevant response?
Without relying on external frameworks, I’ve struggled with this issue - https://www.reddit.com/r/LocalLLaMA/comments/1gtzdid/d_optim...
I’d love to know how your framework solves this and whether it can streamline the process.
Thank you!
Are there any serious LLM engineering communities that have emerged post-hype cycle? Spaces where people are actually pushing boundaries with exploratory engineering, not just theorizing. Somewhere the focus is on testing limits and validating novel approaches - figuring out what's genuinely achievable with this tech.
I assume there's got to be, but I don't have the capacity these days to root around and find it, and I'm genuinely worried about missing out on some really cool shit.
I'm curious how the implementation compares with LightRAG (https://github.com/HKUDS/LightRAG) ?
I might be the wrong target audience (I do have a great interest in this, but I am not doing it at a professional level) but I feel the GitHub could explain things a bit better — now I need to go read someone else's research paper to see what you guys are doing!
(Also, the README says to see the examples folder, but it's basically empty?)
How do the domain and example queries help construct the knowledge graph, or are they just context for executing queries?
Very cool. Have you considered whether incorporating any of the new-ish unsupervised or semi-supervised keyphrase extraction algorithms could give this a boost? TeKET (graph-based) and SIFRank come to mind, but I'm also wondering if AutoPhrase + an LLM could be powerful.
So I went ahead and tried running the example script with "A CHRISTMAS CAROL" using the "meta-llama-3.1-8b-instruct" and "text-embedding-nomic-embed-text-v1.5" models locally. How long should it take to extract the subgraphs with this kind of setup?
Cool! But I'm confused about your pricing. The GitHub page says the first 100 requests are free, but the landing page says to self-host if you want to use it for free. I signed up and used the dashboard, but I don't see a billing section or an option to upgrade the account.
Do you fear that some big company will just host your system for cheaper than you can if you catch a lot of success?
That is, the same thing that Amazon did to Mongo will happen to you?
Do you think working in the open enables you to spend more time on engineering and less on sales and marketing?
Looking forward to someone adapting this for Obsidian and other similar tools. As a low-effort user of Obsidian I would love to reap the benefits of appropriate knowledge graphs, but don't want to put that much effort into creating one myself.
What solutions are folks using to solve queries like "How many of these 1000 podcast transcripts have a positive view of Hillary Clinton"? Seems like you would need a way to map reduce and count? And some kind of agent assigner/router on top of it?
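The naive version I can picture is a straight map-reduce like this sketch (the model and prompt here are placeholders), but I'm wondering if there's something smarter:

```python
# Naive map-reduce sketch: one yes/no LLM call per transcript, then count.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()

def is_positive(transcript: str) -> bool:
    # Map step (prompt and model are illustrative placeholders).
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Does this transcript express a positive view of "
                              "Hillary Clinton? Answer only yes or no.\n\n" + transcript}],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def count_positive(transcripts):
    # Reduce step: sum the booleans.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return sum(pool.map(is_positive, transcripts))
```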
What is the difference from HippoRAG, which seems to be the same approach but came out earlier?
So what is the answer to "Who is Scrooge?" and is it different / better than another approach?
(Like putting the whole thing in the context window, for instance?)
Is this approach just for cost savings, or does it help get better answers, and if so, how?
Could you share a specific example?
Thanks for releasing this! Have you gotten a chance to run any multi-hop retrieval benchmarks?
It would be very useful to be able to compare this method to other established RAG techniques.
Just out of interest: why is every python file prefixed with an underscore? I’ve never seen it before. Is it to avoid collisions with package imports? e.g. “types”
How is it at answering broad questions? Communities and clustering were specifically for that purpose of agglomerating, right?
Could this be used for context retrieval and generative understanding of codebases?
It looks awfully similar to nano-graphrag, but I fail to see any credit to it.
Does FastRAG integrate with other graph databases like Neo4j?
Neat, we are doing something similar with cognee, but are letting users define graph ingestion, generation, and retrieval themselves instead of making assumptions: https://github.com/topoteretes/cognee
Can this method return references to source documents?
Can this be used with LLMs other than the OpenAI API?
Is there a way to use this just as a retriever?
Is there any sense of tenancy?
From what I can tell, at least from the examples, there is one global graph.
Thanks!
Gosh, I miss the days when Google was using PageRank and not whatever the heck kind of crap their service has turned into.
Please tell me I’m not the only one that sees the irony in AI relying on classic search.
Obviously LLMs are good at some semantic understanding of the prompt context and are useful, but the irony is hilarious
What is RAG, please?
I wonder why all of this - here on HN - is not part of the README.md, which says absolutely nothing about how and why any of it would work.
The whole way the work is presented, including the writing here, screams marketing, and the paid offering is the only thing made absolutely clear.
P.S. I absolutely understand why a knowledge graph is essential and THE right approach for RAG, particularly where vector DBs on their own are subpar. But so do many others, and from the way the repo is presented it gives absolutely no clue why yours is _something_ with respect to other/common-sense graph-RAG-somethings.
You see, there are hundreds of smart people out there who can easily come to the conclusion that data needs to be represented as knowledge in a graph-ontological way, and then the context fed with only the relevant subgraph. You could have said that much, rather than asking .0084 cents or whatever for APIs as the headline of a presumably open repo.
Since when does “good old PageRank” demand an OpenAI API key?
“You may not: use Output to develop models that compete with OpenAI” => they’re gonna learn from you and you can’t learn from them.
Glad we're all so cool with the long-term economic downfall of natural humans. Our grandkids might not be so glad about it!
PageRank is one of several interesting centrality metrics that could be applied to a graph to influence RAG on structural data. Another is Triangle Centrality, which counts the triangles around nodes to figure out their centrality, based on the idea that triangles close relationships into a strong bond, whereas open bonds dilute centrality by drawing weight away from the center:
https://arxiv.org/abs/2105.00110
The paper shows high efficiency compared to other centralities like PageRank. However, in some research using the GraphBLAS, my coauthors and I found that TC was slower than our sparse formulation of PR on a variety of sparse graphs with up to 1.8 billion edges, but that TC appears to scale better as graphs get larger and is likely more efficient in the trillion-edge realm.
https://fossies.org/linux/SuiteSparse/GraphBLAS/Doc/The_Grap...
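If you just want to play with the idea, here is a toy networkx sketch that uses plain per-node triangle counts as a crude proxy; to be clear, this is not the exact TC formula from the paper, and nothing like the GraphBLAS formulation:

```python
# Crude illustration: per-node triangle counts as a proxy (NOT the paper's exact
# TC formula, and nothing like a GraphBLAS formulation).
import networkx as nx

g = nx.karate_club_graph()            # stand-in for any undirected graph
tri = nx.triangles(g)                 # number of triangles through each node
total = sum(tri.values()) / 3         # each triangle is counted at its 3 vertices
proxy = {v: (t / total if total else 0.0) for v, t in tri.items()}

pr = nx.pagerank(g)
print("top by triangle proxy:", sorted(proxy, key=proxy.get, reverse=True)[:5])
print("top by pagerank:      ", sorted(pr, key=pr.get, reverse=True)[:5])
```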