Not Hacker News!
Home
Hiring
Products
Companies
Discussion
Q&A
Users
AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.

Live (Beta)

Explore

  • Home
  • Hiring
  • Products
  • Companies
  • Discussion
  • Q&A

Resources

  • Visit Hacker News
  • HN API
  • Modal cronjobs
  • Meta Llama

Briefings

Inbox recaps on the loudest debates & under-the-radar launches.

Connect

© 2025 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.

Last activity 4h ago · Posted Nov 26, 2025 at 4:18 PM EST

A Distributed Inference Framework Enabling Running Models Exceeding Total Memory

driaforall
1 point
1 comment

Mood

informative

Sentiment

positive

Category

startup_launch

Key topics

Distributed Systems
Machine Learning
AI Research

A Distributed Inference Framework for Running Models Exceeding Total Memory

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

First comment: N/A
Peak period: 1 comment (Hour 1)
Avg / period: 1
Key moments

  1. Story posted: Nov 26, 2025 at 4:18 PM EST (4h ago)
  2. First comment: Nov 26, 2025 at 4:18 PM EST (0s after posting)
  3. Peak activity: 1 comment in Hour 1, the hottest window of the conversation
  4. Latest activity: Nov 26, 2025 at 4:18 PM EST (4h ago)


Discussion (1 comment)

driaforall (Author), 4h ago
Today we are shipping dnet, a distributed inference framework that lets Apple Silicon clusters run models that exceed their physical memory.

We fuse pipelined-ring parallelism, disk streaming, and UMA-aware scheduling so “out of memory” stops being the limit.

https://github.com/firstbatchxyz/dnet?tab=readme-ov-file

In alpha, we ship a pipelined-ring strategy inspired by PRIMA.CPP. dnet’s solver (distilp) extends it so devices can punch above their memory: layers stream from disk mid-round and overlap with compute, so total model size can exceed total cluster RAM.

Please let us know if you have any questions or feedback!
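The core idea the author describes, streaming layer weights from disk while the current layer computes so the model never has to fit in RAM at once, can be sketched with a simple prefetch thread. This is a minimal illustration, not dnet's actual API: `load_layer_weights` and `run_layer` are hypothetical stand-ins for disk I/O and device compute.

```python
import threading
import queue

def load_layer_weights(layer_idx):
    # Stand-in for reading one transformer layer's weights from disk.
    return {"layer": layer_idx, "weights": [0.0] * 4}

def run_layer(weights, activations):
    # Stand-in for executing one layer on the current device.
    return [a + weights["layer"] for a in activations]

def stream_and_compute(num_layers, activations):
    # Double buffer: hold at most one prefetched layer ahead of compute,
    # so resident weight memory stays O(1) in the number of layers.
    prefetched = queue.Queue(maxsize=1)

    def prefetcher():
        for i in range(num_layers):
            prefetched.put(load_layer_weights(i))  # disk I/O overlaps compute

    t = threading.Thread(target=prefetcher, daemon=True)
    t.start()
    for _ in range(num_layers):
        weights = prefetched.get()  # blocks only if disk fell behind compute
        activations = run_layer(weights, activations)
    t.join()
    return activations

print(stream_and_compute(3, [1.0]))
```

If disk bandwidth keeps up with per-layer compute time, the `get()` never blocks and streaming is effectively free; in a ring setup each device would apply this to its own slice of layers per round.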

View full discussion on Hacker News
ID: 46062439 · Type: story · Last synced: 11/26/2025, 9:20:07 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Read Article · View on HN