Show HN: Small hardware box that runs local LLMs and exposes an OpenAI API
Mood: calm
Sentiment: positive
Category: tech
Key topics: LLMs, AI Hardware, OpenAI API
Right now the box boots into a very simple web UI where you choose a model and start using it. The API follows the OpenAI format for chat completions and embeddings. It can run different models depending on the hardware you pick: either a Jetson Orin Nano or an x86 mini-PC with a GPU. It stores data locally, supports basic RAG indexing, and is only exposed on the LAN by default.
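For concreteness, here is a minimal sketch in Python of what "follows the OpenAI format" would mean for a client on the same LAN: it points the official OpenAI SDK at the box instead of api.openai.com. The address, port, and model names are placeholders I made up, not the box's actual defaults.

    # Minimal sketch: point the official OpenAI Python client at the box
    # instead of api.openai.com. Host, port, and model names below are
    # illustrative placeholders, not the box's real defaults.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://192.168.1.50:8080/v1",  # LAN address of the box (example)
        api_key="not-needed-locally",            # a local server may ignore the key
    )

    # Chat completion against whichever model is currently loaded
    chat = client.chat.completions.create(
        model="llama3",  # example model name
        messages=[{"role": "user", "content": "Summarize our on-call runbook."}],
    )
    print(chat.choices[0].message.content)

    # Embeddings through the same OpenAI-shaped endpoint
    emb = client.embeddings.create(
        model="nomic-embed-text",  # example embedding model
        input="local-first inference appliance",
    )
    print(len(emb.data[0].embedding))

Because the shapes match the OpenAI spec, existing SDKs and tools that accept a base URL override should work without code changes.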
A few things still aren’t working. There’s no multi-user rate limiting yet. The RAG quality is basic and I’m still improving chunking and reranking. The Orin runs hot under heavy load, so thermal performance needs work. It’s also still a prototype rather than a finished consumer product.
On the technical side, it runs containerized model servers using Ollama and some custom runners. Models load through GGUF or TensorRT-LLM depending on the hardware. The API layer follows the OpenAI spec. The RAG pipeline uses local embeddings and a vector database. The software stack is a mix of TypeScript and Python.
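To make the retrieve-then-generate flow concrete, the sketch below is my own rough approximation rather than the box's actual pipeline: it embeds a query through the local OpenAI-style embeddings endpoint, uses a naive in-memory cosine-similarity lookup standing in for the real vector database, and passes the best-matching chunk into a chat completion. Endpoint, model names, and documents are illustrative.

    # Rough sketch of the RAG flow: local embeddings + retrieval + chat.
    # The in-memory cosine search stands in for the real vector database;
    # endpoint, model names, and documents are placeholders.
    import numpy as np
    from openai import OpenAI

    client = OpenAI(base_url="http://192.168.1.50:8080/v1", api_key="local")

    docs = [
        "The Orin Nano variant has limited unified memory for larger models.",
        "RAG indexing runs locally; nothing leaves the LAN by default.",
    ]

    def embed(texts):
        resp = client.embeddings.create(model="nomic-embed-text", input=texts)
        return np.array([d.embedding for d in resp.data])

    doc_vecs = embed(docs)

    def top_chunk(question):
        # Cosine similarity between the query vector and each document vector
        q = embed([question])[0]
        scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        return docs[int(scores.argmax())]

    question = "Does my data stay on the local network?"
    context = top_chunk(question)
    answer = client.chat.completions.create(
        model="llama3",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    print(answer.choices[0].message.content)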
I’m looking for feedback from anyone who has built or deployed local inference before. I’m trying to understand what thermal and power issues you’ve run into, whether a drop-in OpenAI-compatible box is actually useful to small teams, what hardware setups I should consider, and any honest critiques of the idea.
The author is showcasing a small hardware box that runs local LLMs and exposes an OpenAI API, but there is no discussion or feedback from the community yet.