Adaptive LLM Routing Under Budget Constraints
Posted 4 months ago · Active 4 months ago
arxiv.org · Tech story · High profile
Sentiment: calm / mixed
Debate: 60/100
Key topics
LLM Optimization
Cost Reduction
AI Research
The paper proposes an adaptive LLM routing algorithm under budget constraints, sparking discussion on the effectiveness and practicality of such approaches in real-world applications.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 49m after posting
Peak period: 38 comments in 0-3h
Avg / period: 8.7 comments
Comment distribution: 78 data points (based on 78 loaded comments)
Key moments
- 01 Story posted: Sep 1, 2025 at 12:57 PM EDT (4 months ago)
- 02 First comment: Sep 1, 2025 at 1:46 PM EDT (49m after posting)
- 03 Peak activity: 38 comments in 0-3h (hottest window of the conversation)
- 04 Latest activity: Sep 3, 2025 at 4:37 AM EDT (4 months ago)
ID: 45094421 · Type: story · Last synced: 11/20/2025, 7:50:26 PM
A.k.a. wisdom. No, LLMs don't have that. Me neither; I usually have to step into the rabbit holes in order to detect them.
Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.
But why does this paper affect your thinking on it? It's about budgets and recognizing that different LLMs have different cost structures; it's not really an attempt to improve LLM performance in absolute terms.
It's mostly hand waving, hype and credulity, and unproven claims of scalability right now.
You can't move the goal posts because they don't exist.
Doesn't mean there aren't practical definitions depending on the context.
In essence, teaching an AI using resources meant for humans, and nothing more, would be considered AGI. That could be a practical definition, without needing much more rigour.
There is indeed no evidence we'll get there. But there is also no evidence that LLMs should work as well as they do.
It'll be a while until the ability to move the goalposts of "actual intelligence" is exhausted entirely.
And most would have accepted the recommendation, because the model sold it as a less common tactic while sounding very logical.
Once you've started to argue with an LLM you're already barking up the wrong tree. Maybe you're right, maybe not, but there's no point in arguing it out with an LLM.
So many people just want to believe, instead of accepting the reality that LLMs are quite unreliable.
Personally, it's usually fairly obvious to me when LLMs are bullshitting, probably because I have lots of experience detecting it in humans.
In this case I just happened to be a domain expert and knew it was wrong. It would have required significant effort for a less experienced person to verify everything.
And the kind of automation brought by LLMs is decidedly different from automation in the past, which almost always created new (usually better) jobs. LLMs won't do this (at least not to an extent where it would matter), I think. Most people in ten years will have worse jobs (more physically straining, longer hours, less pay) unless there is a political intervention.
arXiv is essentially a blog in an academic format, popular amongst Asian and South Asian academic communities.
Currently you can launder reputation with it, just like "white papers" in the crypto world allowed for capital for some time.
This ability will diminish as more people catch on.
While technically true, why would you want to use it when OpenAI itself provides a bunch of models that are many times cheaper and better?
"To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output. Google takes steps to protect your privacy as part of this process. This includes disconnecting this data from your Google Account, API key, and Cloud project before reviewers see or annotate it. Do not submit sensitive, confidential, or personal information to the Unpaid Services."
Reference: https://ai.google.dev/gemini-api/terms
So the answer is no then, because I don't put reasoning and non-reasoning models in the same ballpark when it comes to token usage. You can just turn off reasoning.
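For a rough sense of why reasoning tokens change the comparison, here is a hedged back-of-the-envelope sketch; the prices and token counts below are made up for illustration and are not taken from any real price sheet.

```python
def request_cost(prompt_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost of one request in dollars, given per-million-token prices."""
    return (prompt_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# Hypothetical numbers only: a reasoning model with lower per-token prices can
# still cost more per request once its hidden reasoning tokens are billed as output.
plain = request_cost(2_000, 500, price_in_per_m=1.00, price_out_per_m=4.00)
reasoning = request_cost(2_000, 4_000, price_in_per_m=0.50, price_out_per_m=2.00)

print(f"non-reasoning: ${plain:.4f} per request")      # ~$0.0040
print(f"reasoning:     ${reasoning:.4f} per request")  # ~$0.0090
```

The point is only that per-token price and per-request cost can rank differently once output volume is factored in, which is why turning reasoning off (or on) shifts the comparison.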
I heard the best way is through valuations
Rather than the much more obvious: Preference-prior Informed Linucb For Adaptive Routing (PILFAR)
Academics are pretty creative at naming their creations
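For readers who haven't met LinUCB: it's a contextual-bandit algorithm, and a budget-aware router built on it might look roughly like the sketch below. This is an illustrative toy under assumed interfaces (made-up model names, per-call costs, feature vectors, and reward signal), not the paper's PILFAR algorithm.

```python
import numpy as np

class LinUCBRouter:
    """Toy LinUCB-style router: each candidate model is a bandit arm with a cost."""

    def __init__(self, model_costs, dim, alpha=1.0):
        # model_costs: dict of model name -> assumed cost per call (hypothetical values)
        self.costs = model_costs
        self.alpha = alpha
        self.A = {m: np.eye(dim) for m in model_costs}    # per-arm design matrices
        self.b = {m: np.zeros(dim) for m in model_costs}  # per-arm reward vectors

    def route(self, x, budget_left):
        """Pick the affordable model with the highest upper confidence bound."""
        best, best_score = None, -np.inf
        for m, cost in self.costs.items():
            if cost > budget_left:
                continue  # skip models we can no longer afford
            A_inv = np.linalg.inv(self.A[m])
            theta = A_inv @ self.b[m]
            score = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if score > best_score:
                best, best_score = m, score
        return best  # None means every model is now over budget

    def update(self, model, x, reward):
        """Feed back the observed quality (e.g. a 0-1 judge score) for the routed call."""
        self.A[model] += np.outer(x, x)
        self.b[model] += reward * x

# Usage sketch with made-up costs and a random stand-in for a query embedding.
router = LinUCBRouter({"small-model": 0.001, "large-model": 0.01}, dim=8)
x = np.random.rand(8)
choice = router.route(x, budget_left=0.5)
if choice is not None:
    router.update(choice, x, reward=0.8)  # pretend quality score for the chosen model
```

The "preference-prior informed" part of the acronym suggests the arm statistics would be warm-started from preference data rather than from the identity initialization used here; the affordability check is the budget-constrained part.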
So far, my experience has been that it's just too early for most people/applications to worry about cost; at most, I've seen AI account for 10% of cloud costs. But I'm very curious whether others have different experiences.
Obviously we don't use the super expensive ones like GPT-4.5. But we don't really bother with mini models, because GPT-4.1 etc. are cheap enough.
Stuff like speech-to-text is still way more expensive, and yes, there we do focus on cost optimization. We have no large-scale image generation use cases (yet).