Ask HN: How would you architect a RAG system for 10M+ documents today?
Use a strong embedding model and keep it fixed for as long as possible; re-embedding 10M items is expensive. Run the embedding service behind a small API, and track model versions so you know the age of each vector. For retrieval, stick with hybrid search: BM25 plus an ANN index. Merge both result sets and rerank with a cross-encoder. This produces better output than single-mode retrieval and avoids the overhead of graph systems.
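A minimal sketch of the merge-and-rerank step above, using reciprocal rank fusion (one common way to join BM25 and ANN result lists; the comment doesn't specify a merge method). The hit lists, chunk texts, and the term-overlap scorer are stand-ins; in a real system the scorer would be a learned cross-encoder.

```python
# Hybrid retrieval sketch: merge two ranked lists with reciprocal rank
# fusion (RRF), then rerank the merged candidates with a scorer.

def rrf_merge(ranked_lists, k=60):
    """Merge ranked result lists with reciprocal rank fusion."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def rerank(query, doc_ids, texts, scorer, top_n=3):
    """Rescore merged candidates with a cross-encoder-style scorer."""
    scored = sorted(((scorer(query, texts[d]), d) for d in doc_ids),
                    reverse=True)
    return [d for _, d in scored[:top_n]]

# Hypothetical hit lists from the two retrievers.
bm25_hits = ["doc3", "doc1", "doc7"]
ann_hits = ["doc1", "doc9", "doc3"]
texts = {
    "doc1": "postgres tuning",
    "doc3": "hybrid search",
    "doc7": "bm25 internals",
    "doc9": "ann recall",
}

# Toy scorer: count of query terms appearing in the chunk text.
def overlap_scorer(query, text):
    return len(set(query.split()) & set(text.split()))

merged = rrf_merge([bm25_hits, ann_hits])
top = rerank("hybrid search recall", merged, texts, overlap_scorer)
# merged favors docs that appear high in BOTH lists; rerank then reorders
# by query-chunk relevance.
```

RRF is attractive here because it needs no score calibration between the lexical and vector retrievers; only ranks matter.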
LightRAG and GraphRAG help only when you have clear entities and relations across your corpus, and they bring heavy maintenance work. Entity extraction, graph updates, and link revisions add load with small gains unless your domain needs multi-hop lookups. Most teams skip graph methods until they confirm real gaps in answer quality. Hybrid retrieval with reranking delivers stable output at this scale with lower upkeep.
Keep RAG logic inside one service. The chat layer sends a query. The service runs the retrieval steps, builds a context window, and sends a compact prompt to the model. This avoids coupling the LLM tier with indexes or pipelines. Incremental updates remain simple because all maintenance lives in the ingestion workers.
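The flow described above can be sketched as one service entry point. Everything here is illustrative: `retrieve` and `complete` are injected placeholders for the retrieval pipeline and the LLM client, and the character budget is a stand-in for real token counting.

```python
# Thin RAG service sketch: the chat layer sends a query string; the
# service retrieves, packs a bounded context, and prompts the model.

MAX_CONTEXT_CHARS = 2000  # illustrative budget, not a recommendation

def build_context(chunks, budget=MAX_CONTEXT_CHARS):
    """Pack retrieved chunks into a bounded context window, in order."""
    out, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            break
        out.append(chunk)
        used += len(chunk)
    return "\n---\n".join(out)

def answer(query, retrieve, complete):
    """One call: retrieve -> build context -> prompt the model."""
    chunks = retrieve(query)
    context = build_context(chunks)
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {query}")
    return complete(prompt)

# Stubbed dependencies for a local demo.
def fake_retrieve(query):
    return ["chunk about indexes", "chunk about sharding"]

def fake_complete(prompt):
    return f"(model saw {len(prompt)} chars of prompt)"

reply = answer("when to shard?", fake_retrieve, fake_complete)
```

Because the chat layer only ever sees `answer(query)`, the indexes, rerankers, and ingestion workers behind `retrieve` can change without touching the LLM tier.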
At this scale, the main threats are drift between PostgreSQL and the indexes, slow indexing pipelines, and uneven query latency. Run periodic checks that compare PostgreSQL counts with your search indexes. Log every query with its retrieved chunks and model output. Replay those logs after each change to catch regressions. Use sharding only when a single node reaches memory limits.
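A sketch of the periodic drift check mentioned above. The counts are hypothetical snapshots; in practice `pg_count` would come from a `SELECT count(*)` against PostgreSQL and `index_counts` from each index's stats endpoint.

```python
# Drift check sketch: compare the source-of-truth row count in PostgreSQL
# against the document counts reported by each search index.

def check_drift(pg_count, index_counts, tolerance=0):
    """Return names of indexes whose counts diverge beyond tolerance."""
    return [name for name, n in index_counts.items()
            if abs(n - pg_count) > tolerance]

# Hypothetical snapshot: the vector index is missing 120 documents,
# likely from a stalled ingestion worker.
drifted = check_drift(
    10_000_000,
    {"bm25": 10_000_000, "vectors": 9_999_880},
)
# drifted -> ["vectors"]; alert and re-run ingestion for the gap.
```

A nonzero `tolerance` is useful when ingestion is continuous, so in-flight writes don't page anyone; the same comparison run on a schedule catches silent index corruption before users do.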
The short answer is to keep the system simple. PostgreSQL for storage. A vector index for embeddings. A lexical index for text. A small rerank model. A thin RAG service that binds them together. Graph methods stay out unless your domain depends on structured relations. Regular updates, a clean pipeline, and strong logging will give you a system that behaves well with 10M documents and grows without surprises.