REFRAG Explained
Key topics
By making clever use of how context vectors are integrated with LLM decoding, REFRAG (from Meta Superintelligence Labs) makes TTFT (Time-to-First-Token) 31x faster and TTIT (Time-to-Iterative-Token) 3x faster, improving overall LLM throughput by roughly 7x! REFRAG can also process much longer input contexts than standard LLMs.
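To unpack the metrics: TTFT is the delay before the first generated token arrives (dominated by prefill over the input context), while TTIT is the gap between subsequent tokens. Here is a minimal sketch of how you might measure both against a streaming LLM client; `stream_tokens` is a hypothetical generator that yields decoded tokens one at a time, so substitute your own client.

```python
import time

def measure_latency(stream_tokens, prompt: str):
    """Return (TTFT, mean TTIT) in seconds for one streamed generation."""
    start = time.perf_counter()
    prev = start
    ttft = None
    gaps = []

    for token in stream_tokens(prompt):
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start        # TTFT: prefill + first decode step
        else:
            gaps.append(now - prev)   # time between successive tokens
        prev = now

    ttit = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, ttit
```

Shrinking the prefill work is what moves TTFT the most, which is why REFRAG's biggest speedup shows up there.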
How does it work?
Most RAG systems today that are built on vector databases, such as Weaviate, throw away the vectors associated with retrieved search results and make use of only the text content. REFRAG instead passes these vectors to the LLM in place of the text content!
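As a rough illustration of that idea (not the paper's actual code), here is a minimal PyTorch sketch in which each retrieved chunk contributes one precomputed vector, projected into the decoder's token-embedding space, instead of hundreds of text tokens. The names and dimensions (`proj`, `chunk_vecs`, 768, 4096) are assumptions for the example.

```python
import torch
import torch.nn as nn

d_retriever, d_model = 768, 4096   # retriever vs. decoder hidden sizes (illustrative)

# Learned projection from retriever-embedding space into decoder space.
proj = nn.Linear(d_retriever, d_model)

def build_inputs(question_token_embs: torch.Tensor,
                 chunk_vecs: torch.Tensor) -> torch.Tensor:
    """question_token_embs: (q_len, d_model) token embeddings of the query.
    chunk_vecs: (k, d_retriever) vectors fetched from the vector DB.
    Returns a (k + q_len, d_model) input sequence: one "token" per chunk
    instead of the chunk's full text tokens."""
    chunk_embs = proj(chunk_vecs)                      # (k, d_model)
    return torch.cat([chunk_embs, question_token_embs], dim=0)
```

Because the decoder now prefills over k chunk "tokens" plus the question rather than the full retrieved text, the attention cost of prefill shrinks dramatically, which is where the TTFT win comes from.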
This is further enhanced with a fine-grained chunk-encoding strategy and a four-stage training algorithm, which includes a selective chunk-expansion policy trained with GRPO / PPO (sketched below).
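To make the expansion policy concrete, here is a hedged sketch of the control flow: a small scoring head decides which compressed chunks matter enough to be expanded back into their full text tokens, while the rest stay as single embeddings. In the paper this policy is trained with RL (GRPO / PPO, per the post); the untrained `ExpansionPolicy` below is purely illustrative.

```python
import torch
import torch.nn as nn

class ExpansionPolicy(nn.Module):
    def __init__(self, d_model: int, expand_fraction: float = 0.25):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # per-chunk importance score
        self.expand_fraction = expand_fraction

    def forward(self, chunk_embs: torch.Tensor) -> torch.Tensor:
        """chunk_embs: (k, d_model). Returns a boolean mask of shape (k,)
        marking which chunks to expand into their original text tokens."""
        scores = self.scorer(chunk_embs).squeeze(-1)             # (k,)
        k_expand = max(1, int(len(scores) * self.expand_fraction))
        top = torch.topk(scores, k_expand).indices
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask[top] = True
        return mask   # True -> feed full text tokens; False -> keep one vector
```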
Here is my review of the paper! I hope you find it useful!
YouTube: https://www.youtube.com/watch?v=Ek0tZootK00