Not Hacker News!
LLM Inference | Trending Topic on Hacker News | Not Hacker News!
LLM Inference
19 stories • 24h: 0% • 7d: 0 • 199 comments
Top contributors: alexandercheema, CShorten, simonpure, jxmorris12, alecco
Related Stories
19 stories tagged with "LLM inference"
Defeating Nondeterminism in LLM Inference
345 points • 130 comments • by jxmorris12 • posted 4 months ago • active about 1 month ago
Tags: LLM inference, determinism, nondeterminism, AI
Adaptive-Learning Speculator System (ATLAS): Faster LLM Inference
198 points • 47 comments • by alecco • posted 3 months ago • active about 1 month ago
Tags: LLM inference, speculative decoding, AI optimization
Nvidia DGX Spark and Apple Mac Studio = 4x Faster LLM Inference with Exo 1.0
61 points • 20 comments • by edelsohn • posted 3 months ago • active about 1 month ago
Tags: LLM inference, Nvidia DGX, Apple Mac Studio, AI hardware
Inferencer – Run and Deeply Control Local AI Models (macOS Release)
15 points • 1 comment • by xcreate • posted 3 months ago • active about 1 month ago
Tags: artificial intelligence, macOS app, LLM inference
Clustering Nvidia DGX Spark and M3 Ultra Mac Studio for 4x Faster LLM Inference
8 points • 1 comment • by alexandercheema • posted 3 months ago • active about 1 month ago
Tags: LLM inference, Nvidia DGX, Apple M3 Ultra
Clustering Nvidia DGX Spark and M3 Ultra Mac Studio for 4x Faster LLM Inference
5 points • 0 comments • by alexandercheema • posted 3 months ago • active about 1 month ago
Tags: LLM inference, Nvidia DGX, distributed computing
T-MAC: Low-Bit LLM Inference on CPU/NPU with Lookup Table
5 points • 0 comments • by nateb2022 • posted 3 months ago • active about 1 month ago
Tags: LLM inference, CPU/NPU optimization, AI acceleration
Benchmarking Prefill–Decode Ratios: Fixed vs. Dynamic
5 points • 0 comments • by latchkey • posted 3 months ago • active about 1 month ago
Tags: benchmarking, LLM inference, AI performance optimization
Interview with the Lead Author of REFRAG (Meta)
4 points • 0 comments • by CShorten • posted 2 months ago • active about 1 month ago
Tags: vector databases, LLM inference, RAG-based decoding
Where to Buy or Rent GPUs for LLM Inference: The 2026 GPU Procurement Guide
3 points • 0 comments • by sherlockxu • posted 2 months ago • active about 1 month ago
Tags: GPU procurement, LLM inference, AI hardware, cloud computing
Distributed Storage System to 8x LLM Inference, GPU Training Efficiency
3 points • 0 comments • by hackerpanda123 • posted 2 months ago • active about 1 month ago
Tags: distributed storage, LLM inference, GPU training
Combining Nvidia DGX Spark and Apple Mac Studio for 4x Faster LLM Inference
3 points • 0 comments • by simonpure • posted 3 months ago • active about 1 month ago
Tags: Nvidia DGX, Apple Mac Studio, LLM inference, hardware acceleration
H100 PCIe – 1.86 TB/s Memcpy Roofline and 8× Uplift
3 points • 0 comments • by GPUrouter • posted 4 months ago • active about 1 month ago
Tags: GPU optimization, LLM inference, CUDA kernels
TileRT: Tile-Based Runtime for Ultra-Low-Latency LLM Inference
1 point • 0 comments • by simonpure • posted about 1 month ago • active about 1 month ago
Tags: LLM inference, low-latency, tile-based runtime, AI optimization
Scheduling in LLM Inference
1 point • 0 comments • by somnial • posted about 2 months ago • active about 1 month ago
Tags: LLM inference, scheduling algorithms, machine learning
REFRAG Explained
1 point • 0 comments • by CShorten • posted 3 months ago • active about 1 month ago
Tags: vector databases, LLM inference, RAG systems, AI optimization
Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer
1 point • 0 comments • by tanelpoder • posted 4 months ago • active about 1 month ago
Tags: LLM inference, NVIDIA, AI optimization
vLLM with torch.compile: Efficient LLM Inference on PyTorch
1 point • 0 comments • by matt_d • posted 4 months ago • active about 1 month ago
Tags: PyTorch, LLM inference, AI optimization, machine learning
vLLM: Anatomy of a High-Throughput LLM Inference System
1 point • 0 comments • by vinhnx • posted 4 months ago • active about 1 month ago
Tags: LLM inference, high-throughput systems, AI optimization