I Built an Open-Weights Memory System That Reaches 80.1% on the LoCoMo Benchmark
Results
Benchmark: LoCoMo (10 runs × 10 conversation sets)
Average accuracy: 80.1%
Setup: full isolation across all 10 conversation groups (no cross-contamination, no shared memory between runs)
Architecture (all open weights except answer generation)
1. Dense retrieval
BGE-large-en-v1.5 (1024d)
FAISS IndexFlatIP
Standard BGE instruction prompt: “Represent this sentence for searching relevant passages.”
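A minimal sketch of this stage, with the model and index type named above; the corpus, query, and variable names are placeholders:

```python
# Dense retrieval sketch: BGE-large-en-v1.5 embeddings in a FAISS
# inner-product index. Docs and query here are illustrative only.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

docs = ["Alice adopted a dog in March.", "Bob moved to Berlin last year."]
doc_vecs = model.encode(docs, normalize_embeddings=True)  # unit vectors, so IP == cosine

index = faiss.IndexFlatIP(doc_vecs.shape[1])  # 1024-d for bge-large
index.add(doc_vecs)

# BGE expects this instruction prefix on queries (passages are encoded without it).
instruction = "Represent this sentence for searching relevant passages: "
query_vec = model.encode([instruction + "When did Alice get a pet?"],
                         normalize_embeddings=True)
scores, ids = index.search(query_vec, 2)
print(ids[0], scores[0])
```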
2. Sparse retrieval
BM25 via classic inverted index
Helps with low-embedding-recall queries and keyword-heavy prompts
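A sketch of the sparse side, using the rank_bm25 package as a stand-in for the system's own inverted index:

```python
# Sparse retrieval sketch. Assumption: the real system uses a custom
# inverted index; BM25Okapi from rank_bm25 plays that role here.
from rank_bm25 import BM25Okapi

docs = ["alice adopted a dog in march", "bob moved to berlin last year"]
tokenized = [d.split() for d in docs]
bm25 = BM25Okapi(tokenized)

query = "when did alice adopt a dog".split()
scores = bm25.get_scores(query)  # one BM25 score per document
top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(top, scores)
```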
3. MCA (Multi-Component Aggregation) ranking
A simple gravitational-style score combining:
keyword coverage
token importance
local frequency signal
MCA acts as a first-pass filter to catch exact-match questions. Threshold: coverage ≥ 0.1 → keep top-30.
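The post doesn't give the exact formula, so the following is only one plausible reading of a "gravitational-style" combination of those three signals; the function names and weighting are illustrative, not the repo's actual code:

```python
# Hedged MCA sketch: multiply the three signals so a zero in any one
# of them kills the score, then apply the coverage threshold.
import math
from collections import Counter

def mca_score(query_tokens, doc_tokens, idf):
    q = set(query_tokens)
    tf = Counter(doc_tokens)
    overlap = q & set(doc_tokens)
    coverage = len(overlap) / max(len(q), 1)              # keyword coverage
    importance = sum(idf.get(t, 0.0) for t in overlap)    # token importance (IDF)
    freq = sum(math.log1p(tf[t]) for t in overlap)        # local frequency signal
    return coverage * (1.0 + importance) * (1.0 + freq), coverage

def mca_filter(query_tokens, docs, idf, min_coverage=0.1, k=30):
    scored = []
    for i, doc in enumerate(docs):
        score, cov = mca_score(query_tokens, doc, idf)
        if cov >= min_coverage:       # coverage >= 0.1, per the post
            scored.append((score, i))
    return [i for _, i in sorted(scored, reverse=True)[:k]]  # keep top-30
```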
4. Union strategy
Instead of aggressively reducing the union, the system feeds 112–135 documents directly to the re-ranker. In practice this improved stability and prevented loss of rare but crucial documents.
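A sketch of that union step, assuming doc-ID lists from the three retrievers (the helper name is mine):

```python
# Union sketch: merge candidate IDs from all three retrievers without
# pruning, then hand the whole set to the cross-encoder.
def candidate_union(dense_ids, bm25_ids, mca_ids):
    seen, union = set(), []
    for i in list(dense_ids) + list(bm25_ids) + list(mca_ids):
        if i not in seen:   # dedupe while preserving first-seen order
            seen.add(i)
            union.append(i)
    return union            # typically 112-135 docs, per the post
```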
5. Cross-Encoder reranking
bge-reranker-v2-m3
Processes the full union (rare for RAG pipelines, but worked best here)
Produces a final top-k used for answer generation
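A sketch of the rerank stage with the model named above, via the sentence-transformers CrossEncoder wrapper (the final k is illustrative; the post doesn't state it):

```python
# Rerank sketch: score every (query, doc) pair in the union with
# BAAI/bge-reranker-v2-m3 and keep a final top-k for generation.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3", max_length=512)

def rerank(query, union_docs, k=10):
    pairs = [(query, doc) for doc in union_docs]
    scores = reranker.predict(pairs)  # one relevance score per pair
    ranked = sorted(zip(union_docs, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:k]]
```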
6. Answer generation
GPT-4o-mini, used only for the final synthesis step
No agent chain, no tool calls, no memory-dependent LLM logic
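A sketch of the synthesis call, assuming the official openai client; the prompt wording is illustrative:

```python
# Final synthesis sketch: GPT-4o-mini sees only the reranked context.
# No tools, no agent loop, consistent with the post.
from openai import OpenAI

client = OpenAI()

def answer(query, top_docs):
    context = "\n".join(top_docs)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer strictly from the provided memory snippets."},
            {"role": "user",
             "content": f"Memory:\n{context}\n\nQuestion: {query}"},
        ],
        temperature=0,  # helps keep output deterministic between runs
    )
    return resp.choices[0].message.content
```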
Performance
<3 seconds per query on a single RTX 4090
Deterministic output between runs
Reproducible test harness (10×10 protocol)
Why this worked
Three things seemed to matter most:
MCA-first filter to stabilize early recall
Not discarding the union before re-ranking
Proper dense embedding instruction, which massively affects BGE performance
Notes
LoCoMo remains one of the hardest public memory benchmarks: 5,880 multi-hop, temporal, negation-rich QA pairs derived from human–agent conversations. I'd be interested to compare notes with others working on long-term retrieval, especially multi-stage ranking or cross-encoder-heavy pipelines.
GitHub: https://github.com/vac-architector/VAC-Memory-System