Not Hacker News!

Home · Hiring · Products · Companies · Discussion · Q&A · Users
AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.

Explore

  • Home
  • Hiring
  • Products
  • Companies
  • Discussion
  • Q&A

Resources

  • Visit Hacker News
  • HN API
  • Modal cronjobs
  • Meta Llama

Briefings

Inbox recaps on the loudest debates & under-the-radar launches.

Connect

© 2025 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.

Posted Nov 25, 2025 at 9:17 PM EST

80.1% on LoCoMo Long-Term Memory Benchmark with a Pure Open-Source RAG Pipeline

ViktorKuz
1 point
0 comments

Mood: excited · Sentiment: positive · Category: research

Key topics: Long-Term Memory Benchmark, RAG pipeline, AI research, agent memory systems

I just pushed the current SOTA on the LoCoMo long-term memory benchmark for agents: 80.1% accuracy using only:

  • BGE-large-en-v1.5 (1024d) + FAISS
  • Custom “MCA” gravitational ranking (keyword coverage + importance + frequency)
  • BM25 sparse retrieval
  • Direct cross-encoder reranking (bge-reranker-v2-m3) on the full union (~120–150 docs)
  • GPT-4o-mini only for final answer generation and judging (everything else is open weights or classic)
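The retrieval side of a setup like this can be sketched in miniature: dense top-k and BM25 top-k are retrieved separately, and their union is what gets handed to the cross-encoder. This is an illustration only, not the VAC-Memory-System code — the toy corpus and the bag-of-words stand-in for BGE embeddings are invented for the example:

```python
import math
from collections import Counter

# Toy corpus standing in for conversation memory chunks.
DOCS = [
    "alice moved to berlin last year",
    "bob adopted a cat named turing",
    "alice works on retrieval augmented generation",
    "the benchmark has multi hop questions",
]

def bow_vector(text):
    """Bag-of-words counts as a cheap stand-in for dense embeddings."""
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Classic Okapi BM25 sparse scores over the whole corpus."""
    tokenized = [d.split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.split():
            df = sum(1 for d in tokenized if term in d)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

def hybrid_union(query, docs, k=2):
    """Union of dense top-k and sparse top-k; this union is what a
    cross-encoder reranker would score pairwise against the query."""
    qv = bow_vector(query)
    dense = sorted(range(len(docs)),
                   key=lambda i: -cosine(qv, bow_vector(docs[i])))[:k]
    sparse_sc = bm25_scores(query, docs)
    sparse = sorted(range(len(docs)), key=lambda i: -sparse_sc[i])[:k]
    return sorted(set(dense) | set(sparse))

candidates = hybrid_union("alice retrieval generation", DOCS)
print([DOCS[i] for i in candidates])
```

In the real pipeline the union is 100+ documents and both retrievers are much stronger (FAISS over 1024-d BGE vectors, tuned BM25), but the shape of the data flow is the same.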

Repo: https://github.com/vac-architector/VAC-Memory-System

Key tricks that finally broke 80%:

  • MCA-first filter (coverage ≥ 0.1 → top-30) — catches exact-keyword questions early
  • Feeding the entire union straight into the cross-encoder (112–135 documents) instead of pre-filtering
  • Proper query instruction for BGE-large (the classic “Represent this sentence for searching relevant passages”)

The whole pipeline runs in under 3 s per query on a single RTX 4090. LoCoMo is currently the hardest public long-term memory benchmark (5,880 real human–agent conversations; multi-hop, temporal, negation, etc.).
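The MCA-first filter, as described, amounts to a keyword-coverage gate before the heavier ranking. The sketch below is a hypothetical reconstruction from the post's description alone — the real MCA ranking also mixes in importance and frequency signals that are not modeled here, and the example documents are invented:

```python
def keyword_coverage(query_keywords, doc_text):
    """Fraction of query keywords that appear in the document."""
    doc_tokens = set(doc_text.lower().split())
    hits = sum(1 for kw in query_keywords if kw in doc_tokens)
    return hits / len(query_keywords) if query_keywords else 0.0

def mca_first_filter(query_keywords, docs, min_coverage=0.1, top_n=30):
    """Keep docs with coverage >= min_coverage, ranked by coverage,
    truncated to top_n -- the 'coverage >= 0.1 -> top-30' gate."""
    scored = [(keyword_coverage(query_keywords, d), d) for d in docs]
    kept = [(c, d) for c, d in scored if c >= min_coverage]
    kept.sort(key=lambda x: -x[0])
    return kept[:top_n]

docs = [
    "melanie picked up pottery as a hobby last spring",
    "the weather in berlin was rainy all week",
    "melanie fired her first pottery bowl in the kiln",
]
shortlist = mca_first_filter(["melanie", "pottery"], docs)
print(shortlist)
```

A gate like this is cheap (pure set intersection), which is presumably why it can run first: exact-keyword questions get answered from the shortlist before any dense retrieval or reranking is paid for.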

Beating the official Mem0 baseline by ~12–14 percentage points with fully open components feels pretty good. Would love feedback, especially from people who are also grinding on agent memory systems.


Discussion (0 comments)

Discussion hasn't started yet.

ID: 46053464 · Type: story · Last synced: 11/26/2025, 2:18:08 AM
