A Guide to Local Coding Models
Key topics
The debate around local coding models is heating up, with a recent guide sparking discussion on the cost-effectiveness of self-hosting versus relying on cloud services like Claude. Commenters share their experiences and tips for setting up local models, with some suggesting affordable hardware configurations, such as dual 3060 GPUs, to get decent performance. While some users are optimistic about the potential of local models, others point out that their effectiveness depends on the specific use case and the data they were trained on: tasks farther from the training data require more specificity in prompting to get good results. As the capabilities of local models continue to improve, the conversation highlights the trade-offs between cost, performance, and customization.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 33m after posting
- Peak period: 73 comments in 0-6h
- Avg / period: 17.8 comments
- Based on 160 loaded comments
Key moments
- 01 Story posted: Dec 21, 2025 at 3:55 PM EST (12 days ago)
- 02 First comment: Dec 21, 2025 at 4:28 PM EST (33m after posting)
- 03 Peak activity: 73 comments in 0-6h, the hottest window of the conversation
- 04 Latest activity: Dec 24, 2025 at 1:15 AM EST (10 days ago)
I am still toying with the notion of assembling an LLM tower with a few old GPUs but I don't use LLMs enough at the moment to justify it.
Cheap tier is dual 3060 12G. Runs 24B Q6 and 32B Q4 at 16 tok/sec. The limitation is VRAM for large context: 1000 lines of code is ~20k tokens, and a 32k-token context is ~10G of VRAM.
Expensive tier is dual 3090 or 4090 or 5090. You'd be able to run 32B Q8 with large context, or a 70B Q6.
For software, llama.cpp and llama-swap. GGUF models from HuggingFace. It just works.
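For a concrete picture of what that stack looks like, here is a minimal sketch of serving one of those GGUF models with llama.cpp's llama-server; the model filename is illustrative rather than something named in the comment, and llama-swap would sit in front of this to switch models on demand.

    # Serve a 32B Q4 GGUF with a 32k context, offloading all layers to the GPUs.
    # llama-server exposes an OpenAI-compatible API that coding tools can point at.
    llama-server -m qwen2.5-coder-32b-instruct-q4_k_m.gguf -c 32768 -ngl 99 --port 8080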
If you need more than that, you're into enterprise hardware with 4+ PCIe slots, which costs as much as a car and draws the power of a small country. You're better off just paying for Claude Code.
https://vast.ai/hosting#gpu-farms-homelabs
https://simonwillison.net/
Indeed, his self-hosting inspired me to get Qwen3:32B working locally with Ollama. It fits nicely on my M1 Pro 32GB (running Asahi). Output is a nice read-along speed and I haven't felt the need for anything more powerful.
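If you want to reproduce that setup, the Ollama side is essentially a one-liner; the model tag below matches the comment, but adjust it if your local catalog names it differently.

    # Pulls the model on first run, then drops into an interactive chat.
    # Ollama also serves an API on localhost:11434 that editors and agents can use.
    ollama run qwen3:32b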
I'd be more tempted by a maxed-out M2 Ultra as an upgrade, versus a tower with dedicated GPU cards. The unified memory just feels right for this task. Although I noticed the second-hand value of those machines jumped massively in the last few months.
I know that people turn their noses up at local LLMs, but it more than does the job for me. Plus I decided on a New Year's resolution of no more subscriptions / Big-AdTech freebies.
I've noticed that I need to be a lot more specific in those cases, up to the point where being more specific is slowing me down, partially because I don't always know what the right thing is.
Are people really doing that?
If that's you, know that you can get a LONG way on the $20/month plans from OpenAI and Anthropic. The OpenAI one in particular is a great deal, because Codex usage costs a whole lot less than Claude's.
The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.
YMMV based on the kinds of side projects you do, but it's definitely been cheaper for me in the long run to pay by token, and the flexibility it offers is great.
(I also have the same MBP the author has and have used Aider with Qwen locally.)
Sonnet 4.5 is great for vibe coding. You can give it a relatively vague prompt and it will take the initiative to interpret it in a reasonable way. This is good for non-programmers who just want to give the model a vague idea and end up with a working, sensible product.
But I usually do not want that. I do not want the model to take liberties and be creative; I want the model to do precisely what I tell it and nothing more. In my experience, the GPT-5.x models are a better fit for that way of working.
I just can't accept how slow codex is, and that you can't really use it interactively because of that. I prefer to just watch Claude code work and stop it once I don't like the direction it's taking.
Codex models tend to be extremely good at following instructions, to the point that they won't do any additional work unless you ask for it. GPT-5.1 and GPT-5.2, on the other hand, are a little bit more creative.
Models from Anthropic are a lot more loosey-goosey with instructions, and you need to keep an eye on them much more often.
I'm using models from both providers interchangeably all the time, depending on the task at hand. No real preference for one over the other; they're just specialized for different things.
> The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.
These are the same people, by and large. What I have seen is users who purely vibe code everything, run into the limits of the $20/mo plans, and pay up for the more expensive ones. Essentially they're trading learning to code (and time, in some cases; it's not always faster to vibe code than to do it yourself) for money.
I don't pay $100 to "vibe code" and "learn to program" or "avoid learning to program."
I pay $100 so I can get my projects done without having to hire people.
Restoring a bit of balance to things.
I review all of it, but hand write little of it. It's bizarre how I've ended up here, but yep.
That said, I wouldn't / don't trust it with something from scratch, I only trust it to do that because I built -- by hand -- a decent foundation for it to start from.
But I've not found that to be true at all. My actually-engineered processes, where I care the most, are where I push tokens the hardest, mostly because I'm using LLMs in many places in the SDLC.
When I'm vibing it's just a single agent sort of puttering along. It uses far fewer tokens.
I said "by and large" ie generally speaking. As I mentioned before, the exception does not invalidate the trend. I assume HN is more heavily weighted towards non-vibe-coders using up tokens like me and you but again, that's the exception to what I see online elsewhere.
Programming has always been about levels of abstraction, and the people who see LLM-generated code as “cheating” are the same people who argued that you can't write good code with a compiler. Luddites, who will time and time again be proven wrong by the passage of time.
If you're doing mostly smaller changes, you can go all day on the $20 Claude plan without hitting the limits, especially if you need to thoroughly review the AI's changes for correctness instead of relying on automated tests.
Claude Code is a whole lot less generous though.
I haven't tried agentic coding as I haven't set it up in a container yet, and I'm not going to YOLO my system (doing stuff via chat and a utility to copy and paste directories and files has gotten me pretty far over the last year and a half).
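The comment doesn't describe that utility, but a minimal sketch of the idea (a hypothetical script, not the commenter's actual tool) is just concatenating files with headers so the result can be pasted into a chat:

    #!/bin/sh
    # dump-files: print every file under the given paths with a header line,
    # producing one blob of text that can be pasted into a chat window.
    find "$@" -type f | while read -r f; do
        printf '===== %s =====\n' "$f"
        cat "$f"
        echo
    done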
If I wasn't only using it for side projects I'd have to cough up the $200 out of necessity.
https://geminicli.com/docs/faq/
> What is the privacy policy for using Gemini Code Assist or Gemini CLI if I’ve subscribed to Google AI Pro or Ultra?
> To learn more about your privacy policy and terms of service governed by your subscription, visit Gemini Code Assist: Terms of Service and Privacy Policies.
> https://developers.google.com/gemini-code-assist/resources/p...
The last page only links to generic Google policies. If they didn't train on it, they could've easily said so, which they've done in other cases - e.g. for Google Studio and CLI they clearly say "If you use a billed API key we don't train, else we train". Yet for the Pro and Ultra subscriptions they don't say anything.
If any Googlers read this, and you don't train on paying Pro/Ultra, you need to state this clearly somewhere as you've done with other products. Until then the assumption should be that you do train on it.
https://docs.github.com/en/copilot/reference/ai-models/model...
I originally thought they only supported the previous generation models i.e. Claude Opus 4.1 and Gemini 2.5 Pro based on the copy on their pricing page [1] but clicking through [2] shows that they support far more models.
[1] https://github.com/features/copilot#pricing
[2] https://github.com/features/copilot/plans#compare
Lately Copilot has been getting access to new frontier models the same day they release elsewhere. That wasn't the case months ago (GPT 5.1). But annoyingly you have to explicitly enable each new model.
And this is for hobby / portfolio projects.
Do you mean that users should start a new chat for every new task, to save tokens? Thanks.
That hasn't been true with Opus 4.5. I usually hit my limit after an hour of intense sessions.
1. Do you start off using the Claude Code CLI, then when you hit limits, you switch to the GitHub Copilot CLI to finish whatever it is you are working on?
2. Or, you spend most of your time inside VSCode so the model switching happens inside an IDE?
3. Or, you are more of a strict browser-only user, like antirez :)?
The $20 Anthropic plan is only enough to whet my appetite; I can't finish anything.
I pay for the $100 Anthropic plan, and keep a $20 Codex plan in my back pocket for getting additional review and analysis on top of what Opus cooks up.
And I have a few small $ of misc credits in DeepSeek and Kimi K2 AI services mainly to try them out, and for tasks that aren't as complicated, and for writing my own agent tools.
$20 Claude doesn't go very far.
My monthly spend on ai models is < $1
I'm not cheap, just ahead of the curve. With the collapse in inference costs, everything will get to this point eventually.
Also I've put in my 30 years of tech learning, so I might not need them as much as others. I'll basically pipe a command's output, or even a whole man page, into my tool and ask a question. Things I used to do intensively I now do lazily. My tool can read stdin, send it to an LLM, and do a couple of nice things with the reply. Not exactly RAG, but most man pages fit into the context window so it's okay.
If you aren't using coding models you aren't ahead of the curve.
There are free coding models. I use them heavily. They are ok but only partial substitutes for frontier models.
Alright.
Some people, with some tasks, get great results
But me, with my tasks, I need to maintain provenance and accountability over the code. I can't just have AI fly by the seat of its pants.
I can get into lots of detail on this. If you have seen tools and setups I have done you'd realize why it doesn't work for me.
I've spent money, the results for me, with my tasks, have not been the right decision.
Could you please elaborate on this? Do I get this right that you can set up your command line so that you can pipe something to a command that sends it, together with a question, to an LLM? Or did you just mean that metaphorically? Sorry if this is a stupid question.
Actually for many cases the LLM already knows enough. For more obscure cases, piping in a --help output is also sometimes enough.
Example: pipe a tool's --help output into a small ai command along with your question, where ai could be a simple shell script combining the argument with stdin.
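A minimal sketch of such a script, assuming the llm CLI (the same tool used in the .gitignore example below) is installed and configured; the usage line is illustrative:

    #!/bin/sh
    # ai: combine the question given as arguments with whatever arrives on stdin.
    # usage: tar --help | ai "how do I extract a single file from an archive?"
    llm "$*"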
> I'm not cheap
You're cheap. It's okay. We're all developers here. It's a safe space.
llm 'output a .gitignore file for typical python project that I can pipe into the actual file ' > .gitignore
I'm not convinced.
I'm convinced you don't value your time. As Simon said, throw $20-$100/mo at it, get the best state-of-the-art models with "near 0" setup, and move on.
0: https://tealdeer-rs.github.io/tealdeer/
Not a serious question but I thought it's an interesting way of looking at value.
I used to sell cars in SF. Some people wouldn't negotiate over $50 on a $500 a month lease because their apartment was $4k anyway.
Other people WOULD negotiate over $50 because their apartment was $4k.
On the other hand, Claude has been nothing but productive for me.
I’m also confused why you don’t assume people have the intelligence to only upgrade when needed. Isn’t that what we’re all doing? Why would you assume people would immediately sign up for the most expensive plan that they don’t need?
For most of my work I only need the LLM to perform a structured search of the codebase or to refactor something faster than I can type, so the $20/month plan is fine for me.
But for someone trying to get the LLM to write code for them, I could see the $20/month plans being exhausted very quickly. My experience with trying “vibecoding” style app development, even with highly detailed design documents and even providing test case expected output, has felt like lighting tokens on fire at a phenomenal rate. If I don’t interrupt every couple of commands and point out some mistake or wrong direction it can spin seemingly for hours trying to deal with one little problem after another. This is less obvious when doing something basic like a simple React app, but becomes extremely obvious once you deviate from material that’s represented a lot in training materials.
With Gemini/Antigravity, there’s the added benefit of switching to Claude Code Opus 4.5 and Google is waaaay more generous than Claude.
So having subscribed to all three at their lowest subscriptions (for $60/mo) I get the best of each one and never run out of quota. I’ve also got a couple of open-source model subscriptions but I’ve barely had the chance to use them since Codex and Gemini got so good (and generous).
The fact that OpenAI is only spending 30% of their revenue on servers and inference despite being so generous is just mind boggling to me. I think the good times are likely going to last.
This entire comment is confusing. Why are you buying the $200/month plan if you’re only using 10% of it?
I rotate providers. My comment above applies to all of them. It really depends on the work you’re doing and the codebase. There are tasks where I can get decent results and barely make the usage bar move. There are other tasks where I’ve seen the usage bar jump over 20% for the session before I get any usable responses back. It really depends.
This is why it’s confusing, though. Why start with the highest plan as the starting point when it’s so easy to upgrade?
I’m just a simple dude trying to optimize his life.
For context, this was a few months ago when GPT 5 was still new and I was constantly hitting o3 limits. It was an experiment to see if it could pay for itself. It most certainly can but I realized that I just don’t need it.
You should also queue up many "continue ur work" type messages.
Note: I’m using the $20 plan for this! With codex-5.2-medium most of the time (previously codex-5.1-max-medium). For my work projects, Gemini 3 and Antigravity Claude Opus 4.5 are doing the heavy lifting at the moment, which frees up codex :) I usually have it running constantly in a second tab.
The only way I can now justify Pro is if I am developing multiple parallel projects with codex alone. But that isn’t the case for me. I am happier having a mix of agents to work with.
I've been doing something like this with the basic Gemini subscription using Antigravity. I end up hitting the Gemini 3 Pro High quota many times but then I can still use Claude Opus 4.5 on it!
Ah, I missed this part. Yes, this is basically what I would recommend today as well. Buy a couple of different frontier model provider basic subscriptions. See which works better on what problems. For me, I use them all. For someone else it might be codex alone. Ymmv but totally worth exploring!
It's worth noting that the Claude subscription seems notably less generous than the others.
Also there are good free options for code review.
It could take longer, but save your subscription tokens.
leo dicaprio snapping gif
These kinds of articles should focus on the use case, because mileage may vary depending on the maturity of the idea, testing, and a host of other factors.
If the app, service, or whatever is unproven, that's a sunk cost on a MacBook versus 4 weeks to validate an idea, which is a pretty long time.
If the idea is sound, then run it on the MacBook :)
Incidentally, wondering if anyone has seen this approach of asking Claude to manage Codex:
https://www.reddit.com/r/codex/comments/1pbqt0v/using_codex_...
And when pressed on “this doesn't make sense, are you sure this works?” they ask the model to answer, it gets it wrong, and they leave it at that.
In my experience Cursor is nicer to work with than the OpenAI/Anthropic CLI tools.
When I consider it against my other hobbies, $100 is pretty reasonable for a month of supply. That being said, I wouldn’t do it every month. Just the months I need it.
Sure am. Capacity to finish personal projects has tripled for a mere $200/month. Would purchase again.
From what my team tells me, it's not a great deal since it's so far behind Claude in capabilities and IDE integration.
LM Studio can run both MLX and GGUF models but does so from an Ollama style (but more full-featured) macOS GUI. They also have a very actively maintained model catalog at https://lmstudio.ai/models
but people should use llama.cpp instead
and why should that affect usage? it's not like ollama users fork the repo before installing it.