Show HN: A Vectorless LLM-Native Document Index Method
Seems very situational.
1. What happens when the TOC is too long? -- This is why we chose the tree structure. If the ToC is too long, it performs a hierarchical search: it searches over the parent-level nodes first, selects one, and then searches that node's children.
2. How does the index handle near misses, and how do you disambiguate between close titles? -- For each node, we generate a description or summary to provide more context than the title alone.
3. For documents without a hierarchy, the index simply degenerates into a flat list structure, which you can still search through.
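The hierarchical search in point 1 can be sketched roughly as follows. This is a minimal illustration, not PageIndex's actual implementation: the node names are made up, and `select_node` is a keyword-overlap stand-in for what would really be an LLM call reading each node's title and summary.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    title: str
    summary: str  # generated description, used to disambiguate close titles
    children: List["Node"] = field(default_factory=list)

def select_node(query: str, nodes: List[Node]) -> Node:
    # Stand-in for the LLM selection step: pick the node whose
    # title + summary shares the most words with the query.
    words = set(query.lower().split())
    def score(n: Node) -> int:
        return sum(w in words for w in (n.title + " " + n.summary).lower().split())
    return max(nodes, key=score)

def hierarchical_search(query: str, root: Node) -> Node:
    # Search parent-level nodes first, pick one, then descend
    # into its children until a leaf is reached.
    node = root
    while node.children:
        node = select_node(query, node.children)
    return node

# Hypothetical ToC tree for a financial filing.
toc = Node("10-K", "annual report", [
    Node("Risk Factors", "market and credit risk disclosures", [
        Node("Credit Risk", "counterparty default exposure"),
        Node("Market Risk", "interest rate and FX sensitivity"),
    ]),
    Node("Financial Statements", "balance sheet and income statement"),
])

result = hierarchical_search("interest rate sensitivity", toc)
# result.title == "Market Risk"
```

The point of descending level by level is that each LLM call only ever sees one level's worth of nodes, so a long ToC never has to fit into a single prompt.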
We also wrote up how it can be combined with a reasoning process, with some comparisons to vector DBs: https://vectifyai.notion.site/PageIndex-for-Reasoning-Based-....
We found our MCP service works well in general for financial/legal/textbook/research-paper cases; see https://pageindex.ai/mcp for some examples.
We do agree that in some cases, like recommendation systems, you need semantic similarity and a vector DB, so we wouldn't recommend this approach there. Keen to learn about more cases we haven't thought through!