Not Hacker News!
Home
Hiring
Products
Companies
Discussion
Q&A
Users
AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.

Live (Beta)

Explore

  • Home
  • Hiring
  • Products
  • Companies
  • Discussion
  • Q&A

Resources

  • Visit Hacker News
  • HN API
  • Modal cronjobs
  • Meta Llama

Briefings

Inbox recaps on the loudest debates & under-the-radar launches.

Connect

© 2025 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.

Last activity 4h ago · Posted Nov 26, 2025 at 4:18 PM EST

A Distributed Inference Framework Enabling Running Models Exceeding Total Memory

driaforall
1 point
1 comment

Mood

informative

Sentiment

positive

Category

startup_launch

Key topics

Distributed Systems
Machine Learning
AI Research

A Distributed Inference Framework for Running Models Exceeding Total Memory

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

First comment: N/A
Peak period: 1 comment (Hour 1)
Avg / period: 1
Key moments

  1. Story posted: Nov 26, 2025 at 4:18 PM EST (4h ago)
  2. First comment: Nov 26, 2025 at 4:18 PM EST (0s after posting)
  3. Peak activity: 1 comment in Hour 1, the hottest window of the conversation
  4. Latest activity: Nov 26, 2025 at 4:18 PM EST (4h ago)


Discussion (1 comment)

driaforall (Author), 4h ago
Today we are shipping dnet, a distributed inference framework that lets Apple Silicon clusters run models that exceed their physical memory.

We fuse pipelined-ring parallelism, disk streaming, and UMA-aware scheduling so “out of memory” stops being the limit.

https://github.com/firstbatchxyz/dnet?tab=readme-ov-file

In alpha, we ship a pipelined-ring strategy inspired by PRIMA.CPP. dnet’s solver (distilp) extends it so devices can punch above their memory: layers stream from disk mid-round and overlap with compute, so total model size can exceed total cluster RAM.

Please let us know if you have any questions or feedback!
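The core idea the author describes, streaming layer weights from disk while the current layer computes so the model never has to fit in RAM at once, can be sketched with a simple prefetch thread. This is a minimal illustration, not dnet's actual API: `load_layer_weights` and `run_layer` are hypothetical stand-ins for disk I/O and device compute.

```python
import threading
import queue

def load_layer_weights(layer_idx):
    # Stand-in for reading one transformer layer's weights from disk.
    return {"layer": layer_idx, "weights": [0.0] * 4}

def run_layer(weights, activations):
    # Stand-in for executing one layer on the current device.
    return [a + weights["layer"] for a in activations]

def stream_and_compute(num_layers, activations):
    # Double buffer: hold at most one prefetched layer ahead of compute,
    # so resident weight memory stays O(1) in the number of layers.
    prefetched = queue.Queue(maxsize=1)

    def prefetcher():
        for i in range(num_layers):
            prefetched.put(load_layer_weights(i))  # disk I/O overlaps compute

    t = threading.Thread(target=prefetcher, daemon=True)
    t.start()
    for _ in range(num_layers):
        weights = prefetched.get()  # blocks only if disk fell behind compute
        activations = run_layer(weights, activations)
    t.join()
    return activations

print(stream_and_compute(3, [1.0]))
```

If disk bandwidth keeps up with per-layer compute time, the `get()` never blocks and streaming is effectively free; in a ring setup each device would apply this to its own slice of layers per round.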

View full discussion on Hacker News
ID: 46062439 · Type: story · Last synced: 11/26/2025, 9:20:07 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Read Article · View on HN