Not Hacker News!
LLM Inference | Trending Topic on Hacker News | Not Hacker News!
LLM Inference
19 stories • 24h: 0% • 7d: 0 • 199 comments
Top contributors: alexandercheema, CShorten, simonpure, jxmorris12, alecco
Related Stories
19 stories tagged with "LLM inference"
Defeating Nondeterminism in LLM Inference
345 points • 130 comments • by jxmorris12 • posted 4 months ago • active about 1 month ago
Tags: LLM inference, determinism, nondeterminism, AI
Adaptive-Learning Speculator System (ATLAS): Faster LLM Inference
198 points • 47 comments • by alecco • posted 3 months ago • active about 1 month ago
Tags: LLM inference, speculative decoding, AI optimization
Nvidia DGX Spark and Apple Mac Studio = 4x Faster LLM Inference with Exo 1.0
61 points • 20 comments • by edelsohn • posted 3 months ago • active about 1 month ago
Tags: LLM inference, Nvidia DGX, Apple Mac Studio, AI hardware
Inferencer – Run and Deeply Control Local AI Models (macOS Release)
15 points • 1 comment • by xcreate • posted 3 months ago • active about 1 month ago
Tags: artificial intelligence, macOS app, LLM inference
Clustering Nvidia DGX Spark and M3 Ultra Mac Studio for 4x Faster LLM Inference
8 points • 1 comment • by alexandercheema • posted 3 months ago • active about 1 month ago
Tags: LLM inference, Nvidia DGX, Apple M3 Ultra
Clustering Nvidia DGX Spark and M3 Ultra Mac Studio for 4x Faster LLM Inference
5 points • 0 comments • by alexandercheema • posted 3 months ago • active about 1 month ago
Tags: LLM inference, Nvidia DGX, distributed computing
T-MAC: Low-Bit LLM Inference on CPU/NPU with Lookup Table
5 points • 0 comments • by nateb2022 • posted 3 months ago • active about 1 month ago
Tags: LLM inference, CPU/NPU optimization, AI acceleration
Benchmarking Prefill–Decode Ratios: Fixed vs. Dynamic
5 points • 0 comments • by latchkey • posted 3 months ago • active about 1 month ago
Tags: benchmarking, LLM inference, AI performance optimization
Interview with the Lead Author of REFRAG (Meta)
4 points • 0 comments • by CShorten • posted 2 months ago • active about 1 month ago
Tags: vector databases, LLM inference, RAG-based decoding
Where to Buy or Rent GPUs for LLM Inference: The 2026 GPU Procurement Guide
3 points • 0 comments • by sherlockxu • posted 2 months ago • active about 1 month ago
Tags: GPU procurement, LLM inference, AI hardware, cloud computing
Distributed Storage System to 8x LLM Inference, GPU Training Efficiency
3 points • 0 comments • by hackerpanda123 • posted 2 months ago • active about 1 month ago
Tags: distributed storage, LLM inference, GPU training
Combining Nvidia DGX Spark and Apple Mac Studio for 4x Faster LLM Inference
3 points • 0 comments • by simonpure • posted 3 months ago • active about 1 month ago
Tags: Nvidia DGX, Apple Mac Studio, LLM inference, hardware acceleration
H100 PCIe – 1.86 TB/s Memcpy Roofline and 8× Uplift
3 points • 0 comments • by GPUrouter • posted 4 months ago • active about 1 month ago
Tags: GPU optimization, LLM inference, CUDA kernels
TileRT: Tile-Based Runtime for Ultra-Low-Latency LLM Inference
1 point • 0 comments • by simonpure • posted about 1 month ago • active about 1 month ago
Tags: LLM inference, low-latency, tile-based runtime, AI optimization
Scheduling in LLM Inference
1 point • 0 comments • by somnial • posted about 2 months ago • active about 1 month ago
Tags: LLM inference, scheduling algorithms, machine learning
REFRAG Explained
1 point • 0 comments • by CShorten • posted 3 months ago • active about 1 month ago
Tags: vector databases, LLM inference, RAG systems, AI optimization
Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer
1 point • 0 comments • by tanelpoder • posted 4 months ago • active about 1 month ago
Tags: LLM inference, NVIDIA, AI optimization
vLLM with torch.compile: Efficient LLM Inference on PyTorch
1 point • 0 comments • by matt_d • posted 4 months ago • active about 1 month ago
Tags: PyTorch, LLM inference, AI optimization, machine learning
vLLM: Anatomy of a High-Throughput LLM Inference System
1 point • 0 comments • by vinhnx • posted 4 months ago • active about 1 month ago
Tags: LLM inference, high-throughput systems, AI optimization