The Fluid Substrate: Streaming 1 TB Models from NVMe via io_uring
We wrote this paper to propose an architectural shift we call Fluid Federated Learning (FFL).
The core engineering contributions are:
Prism Protocol: We implemented a "Software-Defined Memory" architecture. It uses io_uring to stream sparse random projections of the model weights directly from NVMe storage to the GPU.
This lets us process "Virtual Batches" of terabyte-scale models on commodity hardware by exploiting the Johnson-Lindenstrauss lemma (we call this technique "Holographic Slicing").
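For readers who haven't seen it, the relevant guarantee is the standard Johnson-Lindenstrauss bound: for any 0 < ε < 1 and any n points in R^d, there is a linear map f : R^d → R^k with k = O(ε^{-2} log n) such that

    (1 - \varepsilon)\,\|u - v\|^2 \;\le\; \|f(u) - f(v)\|^2 \;\le\; (1 + \varepsilon)\,\|u - v\|^2

for every pair of points u, v. Sparse random matrices are known to achieve this, which is what makes streaming the projections cheap.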
Federated State-Space Duality (F-SSD): Instead of averaging gradients (which is slow and leaks information about client data), we exploit the duality between Transformers and SSMs (such as Mamba) to federate the Recurrent States.
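In a deliberately simplified picture, the discretized linear SSM recurrence is

    h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t,

and instead of shipping full gradient vectors, each client k shares its fixed-size hidden state h^{(k)}, which the server fuses (take the plain average here as purely illustrative):

    h_{\text{global}} = \frac{1}{K} \sum_{k=1}^{K} h^{(k)}.

Because h is a constant-size summary of the client's sequence, this costs far less bandwidth than gradient exchange.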
The Result: We can run massive foundation models on edge devices with limited VRAM by treating the SSD as a "slow" memory tier without destroying optimization fidelity.
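To give a feel for the I/O path, here is a stripped-down sketch of the kind of io_uring read loop involved; the file name, chunk size, and queue depth are illustrative placeholders, not our production values.

    /* Sketch: stream weight chunks from NVMe with io_uring (liburing).
       Build: gcc -O2 -D_GNU_SOURCE sketch.c -luring
       File name, chunk size, and queue depth are illustrative only. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define QUEUE_DEPTH 32            /* in-flight requests (assumed) */
    #define CHUNK_BYTES (1 << 20)     /* 1 MiB aligned chunks (assumed) */

    int main(void) {
        /* "weights.shard0" is a hypothetical on-disk layout. O_DIRECT
           bypasses the page cache, which is what you want when the SSD
           is acting as a memory tier rather than a file store. */
        int fd = open("weights.shard0", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        struct io_uring ring;
        if (io_uring_queue_init(QUEUE_DEPTH, &ring, 0) < 0) {
            perror("io_uring_queue_init"); return 1;
        }

        /* O_DIRECT requires sector-aligned buffers. */
        void *buf = NULL;
        if (posix_memalign(&buf, 4096, CHUNK_BYTES)) { perror("posix_memalign"); return 1; }

        off_t offset = 0;
        for (int i = 0; i < 8; i++) {  /* demo: read 8 chunks sequentially */
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            if (!sqe) break;
            io_uring_prep_read(sqe, fd, buf, CHUNK_BYTES, offset);
            io_uring_submit(&ring);

            struct io_uring_cqe *cqe;
            if (io_uring_wait_cqe(&ring, &cqe) < 0) break;
            if (cqe->res < 0)
                fprintf(stderr, "read failed: %s\n", strerror(-cqe->res));
            /* A real pipeline would apply the sparse projection to `buf`
               here and hand the result to the GPU (e.g. via a pinned
               staging buffer), overlapping compute with the next read. */
            io_uring_cqe_seen(&ring, cqe);
            offset += CHUNK_BYTES;
        }

        free(buf);
        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }

The serial submit/wait above is for clarity only; the whole point of io_uring is to keep QUEUE_DEPTH reads in flight with per-request buffers and overlap the projection math with the next batch of completions.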
I'm curious whether anyone here has experimented with io_uring for model serving. We found the async I/O overhead to be negligible compared to the memory gains, but I'm wondering if there are better ways to handle the sparse projections.
Happy to answer questions on the implementation.