Show HN: PageIndex – Vectorless RAG

page_index | 192 points

>"Retrieval based on reasoning — say goodbye to approximate semantic search ("vibe retrieval"

How is this not precisely "vibe retrieval", and much more approximate, where the approximation in this case is uncertainty over the precise reasoning?

Similarity with conversion to high-dimensional vectors and then something like kNN seems significantly less approximate, less "vibe" based, than this.

This also appears to be completely predicated on pre-enriching the documents by adding structure through API calls to (in the example) OpenAI.

It doesn't at all seem accurate to:

1: Toss out mathematical similarity calculations

2: Add structure with LLMs

3: Use LLMs to traverse the structure

4: Label this as less vibe-ish

Also, for any sufficiently large set of documents, or sufficiently fine granularity on smaller sets, scaling will become problematic as the document structure approaches the context limit of the LLM doing the retrieval.

ineedasername | 5 days ago

So if I understand this correctly it goes over every possible document with an LLM each time someone performs a search?

I might have misunderstood of course.

If so, then the use cases for this would be fairly limited since you'd have to deal with lots of latency and costs. In some cases (legal documents, medical records, etc) it might be worth it though.

An interesting alternative I've been meaning to try out is inverting this flow. Instead of using an LLM at search time to find pieces relevant to the query, you flip it around: at ingest time you let an LLM note all of the possible questions a given text can answer and store those in an index. You could then use traditional full-text search or other algorithms (BM25?) to find relevant documents and pieces of text. You could even go for a hybrid approach with vectors on top of or next to this. Maybe vectors first and then more ranking with something more traditional.
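
A rough sketch of that inverted flow, assuming a hypothetical generate_questions() LLM call at ingest time and the rank_bm25 package at query time:

    # pip install rank-bm25
    from rank_bm25 import BM25Okapi

    def generate_questions(passage: str) -> list[str]:
        # Placeholder: wire this to an LLM prompted to list the questions
        # the passage can answer; it just echoes the passage here.
        return [passage]

    chunks = [
        "BM25 ranks documents by term frequency and inverse document frequency.",
        "Hybrid retrieval combines lexical and vector scores.",
    ]

    # Ingest: expand each chunk into the questions it can answer and index those.
    question_tokens, chunk_for_row = [], []
    for chunk in chunks:
        for q in generate_questions(chunk):
            question_tokens.append(q.lower().split())   # naive whitespace tokenizer
            chunk_for_row.append(chunk)

    bm25 = BM25Okapi(question_tokens)

    # Query time: plain BM25 over the generated questions, no LLM in the loop.
    def search(query: str, k: int = 5) -> list[str]:
        scores = bm25.get_scores(query.lower().split())
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [chunk_for_row[i] for i in top]

    print(search("how does bm25 rank documents"))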

What appeals to me with that setup is low latency and good debug-ability of the results.

But as I said, maybe I've misunderstood the linked approach.

mosselman | 5 days ago

So, this has already been done plenty: Serena MCP and Codanna MCP both do this with AST source graphs, and Codanna even gives hints in the MCP response to guide the agent to walk up/down the graph. There might be some small efficiency gain in having a separate agent walk the graph in terms of context savings, but you also lose solution fidelity, so I'm not sure it's a win. Also, it's not a replacement for RAG; it's just another piece in the pipeline that you merge over (rerank+cut or LLM distillate).

CuriouslyC | 5 days ago

Not sure if I fully understand it, but this seems highly inefficient?

Instead of using embeddings, which are easy to make and cheap to compare, you use summarized sections of documents and process them with an LLM? LLMs are slower and more expensive to run.

mikeve | 5 days ago

My approach in "LLM-only RAG for small corpora" [0] was to mechanically make an outline version of all the documents _without_ an LLM, feed that to an LLM with the prompt to tell which docs are likely relevant, and then feed the entirety of those relevant docs to a second LLM call to answer the prompt. It only works with markdown and asciidoc files, but it's surprisingly solid for, for example, searching a local copy of the jj or helix docs. And if the corpus is small enough and your model is on the cheap side (like Gemini 2.5 Flash), you can of course skip the retrieval step and just send the entire thing every time.

[0]: https://crespo.business/posts/llm-only-rag/
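
For reference, roughly the shape of that two-pass flow; the llm() helper and the heading-only outline are placeholders, not the post's actual code:

    import re
    from pathlib import Path

    def llm(prompt: str) -> str:
        raise NotImplementedError  # stand-in for whatever chat-completion call you use

    def outline(path: Path) -> str:
        # Mechanical outline: filename plus markdown headings, no LLM involved.
        headings = [l for l in path.read_text().splitlines() if re.match(r"#{1,6} ", l)]
        return path.name + "\n" + "\n".join(headings)

    def answer(question: str, docs: list[Path]) -> str:
        # Pass 1: given only the outlines, ask which documents look relevant.
        outlines = "\n\n".join(outline(d) for d in docs)
        picks = llm("Question: " + question + "\n\nOutlines:\n" + outlines
                    + "\n\nList the filenames likely to contain the answer, one per line.")
        chosen = [d for d in docs if d.name in picks]
        # Pass 2: answer from the full text of just those documents.
        full_text = "\n\n".join(d.read_text() for d in chosen)
        return llm("Answer using only these documents:\n" + full_text
                   + "\n\nQuestion: " + question)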

dcre | 5 days ago

There are good reasons to do this. Embedding similarity is _not_ a reliable method of determining relevance.

I did some measurements and found you can't even really tell if two documents are "similar" or not. Here: https://joecooper.me/blog/redundancy/

One common way is to mix approaches, e.g. take a large top-K from ANN on embeddings as a preliminary shortlist, then run a tuned LLM or cross-encoder to evaluate relevance.
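
A minimal sketch of that shortlist-then-rerank pattern with sentence-transformers; the model names are just common defaults, not a recommendation:

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    docs = ["GDPR fines are enforced by national data protection authorities.",
            "BM25 is a lexical ranking function."]
    query = "who enforces GDPR fines?"

    # Stage 1: cheap bi-encoder embeddings give a preliminary top-K shortlist.
    bi = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = bi.encode(docs, convert_to_tensor=True)
    hits = util.semantic_search(bi.encode(query, convert_to_tensor=True), doc_emb, top_k=50)[0]

    # Stage 2: a cross-encoder scores each (query, doc) pair directly and reranks.
    ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = ce.predict([(query, docs[h["corpus_id"]]) for h in hits])
    reranked = sorted(zip(hits, scores), key=lambda p: p[1], reverse=True)
    print([docs[h["corpus_id"]] for h, _ in reranked[:5]])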

I'll link here these guys' paper which you might find fun: https://arxiv.org/pdf/2310.08319

At the end of the day you just want a way to shortlist and focus information that's cheaper, computationally, and more reliable, than dumping your entire corpus into a very large context window.

So what we're doing is fitting the technique to the situation. Price of RAM; GPU price; size of dataset; etc. The "ideal" setup will evolve as the cost structure and model quality evolves, and will always depend on your activity.

But for sure, ANN-on-embedding as your RAG pipeline is a very blunt instrument and if you can afford to do better you can usually think of a way.

thatjoeoverthr | 5 days ago

The thing is — for very long documents, it's actually pretty hard for humans to find things, even with a hierarchical structure. This is why we made indexes — the original indexes! — on paper. What you're saying makes pretty hard assumptions about document content, and of course doesn't start to touch multiple documents.

My feeling is that what you're getting at is actually the fact that it's hard to get semantic chunks and when embedding them, it's hard to have those chunks retain context/meaning, and then when retrieving, the cosine similarity of query/document is too vibes-y and not strictly logical.

These are all extremely real problems with the current paradigm of vector search. However, my belief is that one can fix each of these problems vs abandoning the fundamental technology. I think that we've only seen the first generation of vector search technology and there is a lot more to be built.

At Vectorsmith, we have some novel takes on both the computation and storage architecture for vector search. We have been working on this for the last 6 months and have seen some very promising results.

Fundamentally my belief is that the system is smarter when it mostly stays latent. All the steps of discretization that are implied in a search system like the above lose information in a way that likely hampers retrieval.

joshua_s_penman | 5 days ago

> It moves RAG away from approximate "semantic vibes" and toward explicit reasoning about where information lives. That clarity can help teams trust outputs and debug workflows more effectively.

Wasn't this a feature of RAGs, though? That they could match semantics instead of structure, while we mere balls of flesh need to rely on indexes. I'd be interested in benchmarks of this versus traditional vector-based RAGs. Is something to that effect planned?

mvieira38 | 5 days ago

Very cool. These days I’m building RAG over a large website, and when I look at the results being fed into the LLM, most of them are so silly it’s surprising the LLM even manages to extract something meaningful. Always makes me wonder if it’s just using prior knowledge even though it’s instructed not to do so (which is hacky).

I like your approach because it seems like a very natural search process, like a human would navigate a website to find information. I imagine the tradeoff is performance of both indexing and search, but for some use cases (like mine) it’s a good sacrifice to make.

I wonder if it’s useful to merge the two approaches. Like you could vectorize the nodes in the tree to give you a heuristic that guides the search. Could be useful in cases where information is hidden deep in a subtree, in a way that the document’s structure doesn’t give away.
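
A rough sketch of that hybrid, using node-summary embeddings as the priority in a best-first walk of the tree; the Node shape is an assumption, not PageIndex's actual structure:

    import heapq
    from dataclasses import dataclass, field
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    @dataclass
    class Node:
        summary: str                            # generated section summary
        text: str = ""                          # leaf content, empty for internal nodes
        children: list["Node"] = field(default_factory=list)

    def best_first_search(root: Node, query: str, budget: int = 20) -> list[Node]:
        # Expand nodes in order of embedding similarity between the query and
        # each node's summary; collect visited leaves as candidate sections.
        q_emb = model.encode(query, convert_to_tensor=True)
        def score(n: Node) -> float:
            return float(util.cos_sim(q_emb, model.encode(n.summary, convert_to_tensor=True)))
        frontier, leaves, tie = [(-score(root), 0, root)], [], 1
        while frontier and budget > 0:
            _, _, node = heapq.heappop(frontier)
            budget -= 1
            if not node.children:
                leaves.append(node)             # hand these to the LLM to read in full
                continue
            for child in node.children:
                heapq.heappush(frontier, (-score(child), tie, child))
                tie += 1
        return leaves

Here score() is the cheap embedding heuristic; the LLM then only reads the shortlisted leaves in full, instead of reasoning over the whole tree.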

brap | 5 days ago

For the folks who are using RAG: what's the SOTA for extracting text from PDF documents? I have been following discussions on HN and have seen a few promising solutions that involve converting PDF to PNG and then doing extraction. However, for my application this looks a bit risky because my PDFs have tons of tables and I can't afford to get incorrect or made-up numbers in return.

The original documents are in HTML format, and although I don't have access to them I can obtain them if I want. Is it better to just use these HTML documents instead? Previously I tried converting HTML to markdown and then using those for RAG. I wasn't too happy with the results, although I fear I might be doing something wrong.

malshe | 5 days ago

I have a RAG built on a 10000+ doc knowledge base. On a vector store, of course (Qdrant, hybrid search). It works smoothly and is quite reliable.

I wonder how this "vectorless" engine would deal with that. Simply put, I can't see this tech scaling.

huqedato | 5 days ago

Sounds a bit like generative retrieval (e.g. this Google paper here: https://arxiv.org/abs/2202.06991)

Koaisu | 5 days ago

This will work when you have a single document, or a small set of documents, and want your questions answered.

When you have a question and you don't know which of the million documents in your dataspace contains the answer, I'm not sure how this approach will perform. In that case we are looking at either feeding an enormously large tree as context to the LLM or looping through potentially thousands of iterations between the tree and the LLM.

That said, this really is a good idea for a small search space (like a single document).

lewisjoe | 5 days ago

A suspicious lack of any performance metrics on the many standard RAG/QA benchmarks out there, except for their highly fine-tuned and dataset-specific MAFIN2.5 system. I would love to see this approach vs. a similarly well-tuned structured hybrid retriever (vector similarity + text matching), which is the common way of building domain-specific RAG. The FinanceBench GPT-4o+Search system never mentions what the retrieval approach is [1,2], so I will have to assume it is the dumbest retriever possible to oversell the improvement.

PageIndex does not state to what degree the semantic structuring is rule-based (document structure) or also inferred by an ML model. In any case, structuring chunks by semantic document structure is nothing new and is pretty common, as is adding generated titles and summaries to the chunk nodes. But I find it dubious that prompt-based retrieval on structured chunk metadata works robustly, and if it does perform well it is because of the extra prompt-engineering work done on chunk metadata generation and retrieval. This introduces two LLM-based components that can lead to highly variable output versus a traditional vector chunker and retriever. There are many more knobs to tune in a text prompt and an LLM-based chunker than in a sentence/paragraph chunker and a vector+text similarity hybrid retriever.

You will have to test retrieval and generation performance for your application regardless, but with so many LLM-based components this will lead to increased iteration time and cost vs. embeddings. An advantage of PageIndex is that you can probably make it really domain-specific. Claims of improved retrieval time are dubious: vector databases (even with hybrid search) are highly efficient, definitely more efficient than prompting an LLM to select relevant nodes.

1. https://pageindex.ai/blog/Mafin2.5
2. https://github.com/VectifyAI/Mafin2.5-FinanceBench

gillesjacobs | 5 days ago

>Instead of relying on vector databases or artificial chunking, it builds a hierarchical tree structure from documents and uses reasoning-based tree search to locate the most relevant sections.

So are we creating a tree for each document on the fly? Even if it's a batch process, don't you think we are pointing back to something which is a graph (an approximation-vs-latency sort of framework)?

Looks like you are talking more along the lines of an LLM-driven outcome where the "semantic" part is replaced with LLM intelligence.

I tried similar approaches a few months back, but those often result in poor scalability, predictability, and quality.

gogeta_99999 | 5 days ago

I did something like this myself. Take a large PDF and summarize each page. Make sure to include the titles of the previous 3 pages; it helps with consistency and with detecting transitions from one part to another. Then you take all the page summaries in a list and do another call to generate the table of contents. When you want to use it, you add the TOC to the prompt and use a tool to retrieve sections on demand. This works better than embeddings, which are blind to relations and larger context.

It was for a complex scenario of QA on long documents, like 200-page earnings reports.
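
Roughly the shape of that pipeline; llm() and the prompt wording are placeholders, not the actual code:

    def llm(prompt: str) -> str:
        raise NotImplementedError  # stand-in for your chat-completion call

    def summarize_pages(pages: list[str]) -> list[dict]:
        # One summary per page, carrying the previous three page titles as
        # context so transitions between sections stay consistent.
        out = []
        for i, page in enumerate(pages):
            recent = [p["title"] for p in out[-3:]]
            reply = llm("Previous page titles: " + ", ".join(recent)
                        + "\n\nPage text:\n" + page
                        + "\n\nReturn 'TITLE: ...' on the first line, then a short summary.")
            title, _, summary = reply.partition("\n")
            out.append({"page": i, "title": title.removeprefix("TITLE: "), "summary": summary.strip()})
        return out

    def build_toc(page_summaries: list[dict]) -> str:
        listing = "\n".join(f"{p['page']}: {p['title']} - {p['summary']}" for p in page_summaries)
        return llm("Group these pages into a table of contents with page ranges:\n" + listing)

    # At answer time, the TOC goes into the prompt and a tool such as
    # get_pages(start, end) hands the model the raw pages it asks for.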

visarga | 5 days ago

Context and prompt engineering is the most important part of AI, hands down.

There are plenty of lightweight retrieval options that don't require a separate vector database (I'm the author of txtai [https://github.com/neuml/txtai], which is one of them).

It can be as simple as this in Python: you pass the index operation a data generator and save the index to a local folder. Then use that for RAG.
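
Something like the following, going from memory of txtai's README (check the project docs for exact parameters):

    # pip install txtai
    from txtai import Embeddings

    data = ["PageIndex builds a reasoning tree over documents",
            "txtai can index content without a separate vector database"]

    # Index a generator of (id, text, tags) tuples, then persist to a local folder.
    embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
    embeddings.index((i, text, None) for i, text in enumerate(data))
    embeddings.save("rag-index")

    # Later: load the local folder and query it for RAG context.
    embeddings = Embeddings()
    embeddings.load("rag-index")
    print(embeddings.search("what does txtai do", 1))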

dmezzetti | 5 days ago

We've done this for a while with cognee, where we have graph completion retrieval that does that plus many other things, like weighting, self-improving feedback, and more: https://github.com/topoteretes/cognee

vasa_ | 4 days ago

This is good for applications where background, queue-based RAG is acceptable. You upload a file, set the expectation with the user that processing will take a few hours, and then after X hours you deliver the results. Great for manuals, documentation, and larger content.

But for on-demand, near instant RAG (like say in a chat application), this won't work. Speed vs accuracy vs cost. Cost will be a really big one.

neya | 5 days ago

vectorless rag? I think I have one of those in my kitchen

monster_truck | 5 days ago

an effective "vectorless RAG" is to have an LLM write search queries against the documents. e.g. if you store your documents in postgres, allow the LLM to construct a regex string that will find relevant matches. If you were searching for “Martin Luther King Jr.”, it might write something like:

    SELECT id, body
    FROM docs
    WHERE body ~* E'(?x)                                     # x = expanded mode: ignore whitespace, # starts a comment
      (?:\\m(?:dr|rev(?:erend)?)\\.?\\M[\\s.]+)?             # optional title: Dr., Rev., Reverend
      (                                                      # name forms
        (?:\\mmartin\\M[\\s.]+(?:\\mluther\\M[\\s.]+)?\\mking\\M)  # "Martin (Luther)? King"
      | (?:\\mm\\.?\\M[\\s.]+(?:\\ml\\.?\\M[\\s.]+)?\\mking\\M)     # "M. (L.)? King" / "M L King"
      | (?:\\mmlk\\M)                                       # "MLK"
      )
      (?:[\\s.,-]*\\m(?:jr|junior)\\M\\.?)*                  # optional suffix(es): Jr, Jr., Junior
    ';

mritchie712 | 5 days ago

This seems really interesting but I can't quite figure out if this is like a SaaS product or an OSS library? The code sample seems to indicate that it uses some sort of "client" to send the document somewhere and then wait to retrieve it later.

But the home page doesn't indicate any sort of sign up or pricing.

So I'm a little confused.

Edit: OK, I found a sign-up flow, but the verification email never came :(

rco8786 | 5 days ago

I don't see this scaling: https://deepwiki.com/search/how-is-the-tree-formed-and-tra_9...

I'd do some large scale benchmarks before doubling down on this approach.

esafak | 5 days ago

This is like a semantic version of B+ trees.

cantor_S_drug | 5 days ago

Looks like this should scale spectacularly poorly.

Might be useful for a few hundred documents max though.

jdthedisciple | 5 days ago

What about latency?

koakuma-chan | 5 days ago

I just realized that the whole Hacker News discussion is formalized as a tree, and I am using my eyes to tree search through the tree to retrieve ideas from the insightful comments.

mingtianzhang | 5 days ago

I let a bot do a free-text search over an indexed database. Works OK. I've also tried keyword-based retrieval and vector search.

I've found all leave something to be desired, sadly.

nathan_compton | 5 days ago

"Human-like Retrieval: Simulates how human experts navigate and extract knowledge from complex documents." - pretty sure I use control-f when I look for stuff

geedzmo | 5 days ago

Second attempt to get away from vectors and embeddings I’ve seen here recently. Are people really struggling that much with their RAG systems?

petesergeant | 5 days ago

Unrelated: why is chat search in Claude so bad?

dr_dshiv | 5 days ago

Wondering if this is related to llms.txt?

raytang | 5 days ago
