Control LLM Spend and Access with Any-LLM-Gateway
Posted about 2 months ago · Active about 2 months ago
blog.mozilla.ai · Tech · story
Key topics
Large Language Models
Artificial Intelligence
API Gateway
The post introduces any-llm-gateway, a tool for controlling LLM spend and access.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 7d after posting
Peak period: 25 comments in the 168-180h window
Based on 25 loaded comments
Key moments
- Story posted: Nov 12, 2025 at 1:06 PM EST (about 2 months ago)
- First comment: Nov 19, 2025 at 2:01 PM EST (7d after posting)
- Peak activity: 25 comments in the 168-180h window, the hottest stretch of the conversation
- Latest activity: Nov 20, 2025 at 12:12 AM EST (about 2 months ago)
ID: 45903485 · Type: story · Last synced: 11/20/2025, 7:50:24 PM
litellm is a great library, but one team using litellm-proxy reported many issues with it to me. I haven't tried it myself yet.
This one has very little on monitoring and no reference to OpenTelemetry (OTEL) in the docs.
We are actively looking to switch away from it, so it was nice to stumble on a post like this. Something as simple as a proxy with per-key budgeting should not be such a tangled mess.
Bugs include, but are not limited to, multiple ways budget limits aren't enforced, parameter-handling issues, configuration/state mismatches, etc.
What makes this worse is that if you come to the devs with the problem, a solution, and even a PR, it's very difficult to get them to understand or act on it, let alone treat critical things like major budget blowouts as a priority.
This is a classic case of an over-enthusiastic engineer who says yes or raises a hand for everything but doesn't do any one thing properly. At some point, you have to sit them down and tell them to focus on one thing and do it well.
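To make concrete what "a proxy with per-key budgeting" has to get right, here is a minimal sketch of the core enforcement step: record spend atomically and refuse requests once a key's budget is exhausted. This is an illustration of the general pattern, not any-llm-gateway's or litellm-proxy's actual implementation; the key name and budget figures are made up.

```python
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class KeyBudget:
    limit_usd: float
    spent_usd: float = 0.0
    lock: Lock = field(default_factory=Lock)

    def charge(self, cost_usd: float) -> bool:
        """Atomically record spend; refuse if it would exceed the budget."""
        with self.lock:
            if self.spent_usd + cost_usd > self.limit_usd:
                return False
            self.spent_usd += cost_usd
            return True


# Illustrative key and limit, not real configuration.
budgets = {"team-a-key": KeyBudget(limit_usd=100.0)}


def handle_request(api_key: str, estimated_cost_usd: float) -> str:
    budget = budgets.get(api_key)
    if budget is None:
        return "401 unknown key"
    if not budget.charge(estimated_cost_usd):
        return "429 budget exceeded"
    return "200 forwarded to provider"
```

The check-and-increment has to happen under one lock (or one atomic store operation); checking the balance and recording the spend as two separate steps is exactly the kind of race that lets budgets blow out under concurrent load.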
It shows how to use it async or sync, and even handles calling async code from a sync context.
It's hard to write a good CLI without also writing most of a Python API, and llm went the rest of the way by documenting it. I think llm has the best Python API docs of the three.
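For readers unfamiliar with the "async in a sync context" pattern the comment mentions, here is a hedged sketch. `acomplete` is a hypothetical coroutine standing in for an async client call, not a documented API from any of the libraries discussed; the point is the wrapper, which starts a fresh event loop when called from plain sync code and falls back to a worker thread when a loop is already running.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def acomplete(prompt: str) -> str:
    """Stand-in for a hypothetical async LLM client call."""
    await asyncio.sleep(0)
    return f"response to: {prompt}"


def complete(prompt: str) -> str:
    """Synchronous wrapper that works whether or not a loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Plain sync code, no loop running: safe to start one.
        return asyncio.run(acomplete(prompt))
    # Already inside a running loop (e.g. a Jupyter cell): run the
    # coroutine on a separate thread with its own loop instead of
    # re-entering the current one.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, acomplete(prompt)).result()


if __name__ == "__main__":
    print(complete("hello"))
```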
I couldn't find anything, so I rolled my own based on Redis and job queues. It works decently well, but I'd prefer to use something better if it exists.
Does anyone know of something like this that isn't completely over-engineered or over-abstracted?
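As a rough illustration of the Redis-based approach described above, here is a minimal sketch of an atomic per-key spend counter using redis-py (the job-queue side is omitted). The key name and budget figure are assumptions for the example, not details from the comment.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

BUDGET_USD = 50.0                      # assumed monthly budget for this key
SPEND_KEY = "spend:team-a:2025-11"     # illustrative key name


def try_charge(cost_usd: float) -> bool:
    """Record spend atomically; refuse and roll back if over budget.

    INCRBYFLOAT is atomic on the Redis server, so concurrent workers
    can't double-spend past the limit unnoticed.
    """
    new_total = float(r.incrbyfloat(SPEND_KEY, cost_usd))
    if new_total > BUDGET_USD:
        r.incrbyfloat(SPEND_KEY, -cost_usd)  # undo the over-budget charge
        return False
    return True


if try_charge(0.02):
    pass  # enqueue the LLM job here (queue side not shown)
else:
    print("budget exhausted for this key")
```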
1 more comment available on Hacker News