Show HN: A Vectorless LLM-Native Document Index Method
Seems very situational.
1. What happens when the TOC is too long? -- This is why we chose the tree structure. If the ToC is too long, it performs a hierarchical search: it searches over the parent-level nodes first, selects one, and then searches that node's children.
2. How does the index handle near misses, and how do you disambiguate between close titles? -- For each node, we generate a description or summary to provide more context than the title alone.
3. For documents without a hierarchy, the index simply degenerates into a flat list structure, which you can still search through.
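The hierarchical search in point 1 can be sketched roughly as follows. This is a minimal illustration, not PageIndex's actual implementation: the node names are made up, and `select_node` is a keyword-overlap stand-in for what would really be an LLM call reading each node's title and summary.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    title: str
    summary: str  # generated description, used to disambiguate close titles
    children: List["Node"] = field(default_factory=list)

def select_node(query: str, nodes: List[Node]) -> Node:
    # Stand-in for the LLM selection step: pick the node whose
    # title + summary shares the most words with the query.
    words = set(query.lower().split())
    def score(n: Node) -> int:
        return sum(w in words for w in (n.title + " " + n.summary).lower().split())
    return max(nodes, key=score)

def hierarchical_search(query: str, root: Node) -> Node:
    # Search parent-level nodes first, pick one, then descend
    # into its children until a leaf is reached.
    node = root
    while node.children:
        node = select_node(query, node.children)
    return node

# Hypothetical ToC tree for a financial filing.
toc = Node("10-K", "annual report", [
    Node("Risk Factors", "market and credit risk disclosures", [
        Node("Credit Risk", "counterparty default exposure"),
        Node("Market Risk", "interest rate and FX sensitivity"),
    ]),
    Node("Financial Statements", "balance sheet and income statement"),
])

result = hierarchical_search("interest rate sensitivity", toc)
# result.title == "Market Risk"
```

The point of descending level by level is that each LLM call only ever sees one level's worth of nodes, so a long ToC never has to fit into a single prompt.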
We also wrote up how it can be combined with a reasoning process, with some comparisons to vector DBs: https://vectifyai.notion.site/PageIndex-for-Reasoning-Based-....
We found our MCP service works well in general for financial/legal/textbook/research-paper cases; see https://pageindex.ai/mcp for some examples.
We do agree that in some cases, like recommendation systems, you need semantic similarity and a vector DB, so we wouldn't recommend this approach there. Keen to learn about more cases we haven't thought through!