Ask HN: Scaling a local FAISS + LLM RAG system (356k chunks): architectural advice
Mood: informative
Sentiment: neutral
Category: Ask HN
Key topics: FAISS, RAG, local AI, scaling, architecture
Problems I’m running into:
Metadata pickle file loads entirely into RAM
No incremental indexing: every update forces a full FAISS index rebuild
Query performance degrades with concurrent use
Want to scale to 1M+ chunks but not sure FAISS + pickle is the right long-term architecture
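For the pickle problem specifically, a common pattern is to move chunk metadata out of a single in-memory pickle and into SQLite, keyed by the same int64 id you give FAISS, so lookups after a search touch only the rows you need. A minimal sketch (the table layout and helper names here are illustrative, not a specific library's API):

```python
import json
import sqlite3


def open_meta(path=":memory:"):
    """Open (or create) a metadata store keyed by the FAISS integer id."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " id INTEGER PRIMARY KEY,"   # same int64 id passed to FAISS
        " source TEXT,"              # originating document
        " text TEXT,"                # the chunk text itself
        " extra TEXT)"               # JSON blob for anything else
    )
    return con


def put_chunk(con, chunk_id, source, text, extra=None):
    """Insert or update one chunk's metadata."""
    con.execute(
        "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
        (chunk_id, source, text, json.dumps(extra or {})),
    )
    con.commit()


def get_chunks(con, ids):
    """Fetch metadata for the ids returned by index.search()."""
    qmarks = ",".join("?" * len(ids))
    rows = con.execute(
        f"SELECT id, source, text, extra FROM chunks WHERE id IN ({qmarks})",
        [int(i) for i in ids],
    ).fetchall()
    return {r[0]: {"source": r[1], "text": r[2], **json.loads(r[3])} for r in rows}
```

With a file-backed path instead of `:memory:`, nothing is resident in RAM except SQLite's page cache, and 356k (or 1M+) rows is well within what a single SQLite file handles comfortably.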
My questions for those who’ve scaled local or offline RAG systems:
How do you store metadata efficiently at this scale?
Is there a practical pattern for incremental FAISS updates?
Would a vector DB (Qdrant, Weaviate, Milvus) be a better fit for offline use?
Any lessons learned from running large FAISS indexes on consumer hardware?
Not looking for product feedback — just architectural guidance from people who’ve built similar systems.
Discussion activity
Light discussion: 2 comments loaded.
Story posted: Nov 25, 2025 at 6:42 AM EST
First comment: Nov 25, 2025 at 7:56 AM EST (1h after posting)
Peak activity: 1 comment in hour 2
Latest activity: Nov 25, 2025 at 10:45 AM EST