Opus 4.5 Is Not the Normal AI Agent Experience That I Have Had Thus Far
Key topics
The AI coding revolution is here, or so it seems, as users rave about Opus 4.5's impressive capabilities, with some commenters hailing it as a game-changer for the programming profession. However, not everyone is convinced, with critics pointing out that the current output isn't production-ready and lacks edge case testing, security audits, and maintainability. As some commenters debate the potential for continued progress, others are already exploring the implications of relying on AI-generated code, with some arguing that it's a seismic shift and others warning that it's being oversold. Amidst the discussion, a surprising consensus emerges: even if progress stalls, the programming landscape has already been forever altered.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 7m after posting
- Peak period: 116 comments in 0-6h
- Avg / period: 20 comments
Based on 160 loaded comments
Key moments
- Story posted: Jan 6, 2026 at 12:45 PM EST (3d ago)
- First comment: Jan 6, 2026 at 12:52 PM EST (7m after posting)
- Peak activity: 116 comments in 0-6h (the hottest window of the conversation)
- Latest activity: Jan 9, 2026 at 1:41 PM EST (4h ago)
Yes, Opus 4.5 seems great, but most of the time it tries to vastly overcomplicate a solution. Its answer will be 10x harder to maintain and debug than the simpler solution a human would have created by thinking about the constraints of keeping the code working.
Over time, I imagine even cloud providers, app stores etc can start doing automated security scanning for these types of failure modes, or give a more restricted version of the experience to ensure safety too.
I didn't write off an entire field of research, but rather want to highlight that these aren't intractable problems for AI research, and that we can actually start codifying many of these things today using the skills framework to close up edges in the model training. It may not be 100% but it's not 0%.
I predict in 2026 we're going to see agents get better at running their own QA, and also get better at not just disabling failing tests. We'll continue to see advancements that will improve quality.
And it turns out the quality of output you get from both the humans and the models is highly correlated with the quality of the specification you write before you start coding.
Letting a model run amok within the constraints of your spec is actually great for specification development! You get instant feedback of what you wrongly specified or underspecified. On top of this, you learn how to write specifications where critical information that needs to be used together isn't spread across thousands of pages - thinking about context windows when writing documentation is useful for both human and AI consumers.
I can’t get past that by the time I write up an adequate spec and review the agents code, I probably could have done it myself by hand. It’s not like typing was even remotely close to the slow part.
AI, agents, etc are insanely useful for enhancing my knowledge and getting me there faster.
Stuff that seems basic, but that I haven't always been able to count on in my teams' "production" code.
Maintain and debug by who? It's just going to be Opus 4.5 (and 4.6...and 5...etc.) that are maintaining and debugging it. And I don't think it minds, and I also think it will be quite good at it.
something like code-simplifier is surprisingly useful (as is /review)
https://x.com/bcherny/status/2007179850139000872
It means that it is going to be as easy to create software as it is to create a post on TikTok, and making your software commercially successful will be basically the same task (with the same uncontrollable dynamics) as whether or not your TikTok post goes viral.
If anything, this example shows that these CLI tools give regular devs much higher leverage.
There's a lot of software labor that is like, go to the lowest cost country, hire some mediocre people there and then hire some US guy to manage them.
That's the biggest target of this stuff, because now that US guy can just get code of equal or higher quality and output without the coordination cost.
But unless we get to the point where you can do what I call "hypercode" I don't think we'll see SWEs as a whole category die.
Just like we don't understand assembly but still need technical skills when things go wrong, there's always value in low level technical skills.
This is also my take. When the printing press came out, I bet there were scribes who thought, "holy shit, there goes my job!" But I bet there were other scribes who thought, "holy shit, I don't have to do this by hand any more?!"
It's one thing when something like weaving or farming gets automated. We have a finite need for clothes and food. Our desire for software is essentially infinite, or at least, it's not clear we have anywhere close to enough of it. The constraint has always been time and budget. Those constraints are loosening now. And you can't tell me that being able to wield a tool that makes me 10X more productive somehow diminishes my value.
So a lot of people will end up doing something different. Some of it will be menial and be shit, and some of it will be high level. New hierarchies and industries will form. Hard to predict the details, but history gives us good parallels.
I don't understand this argument. Surely the skill set involved in being a scribe isn't the same as being a printer, and possibly the personality that makes a good scribe doesn't translate to being a good printer.
So I imagine many of the scribes lost their income, and other people made money on printing. Good for the folks who make it in the new profession, sucks for those who got shafted. How many scribes transitioned successfully to printers?
Genuinely asking, I don't know.
If AI datacenters' hungry need for energy gets us to nuclear power, which gets us the energy to run desalination plants as the lakes dry up because the Earth is warming, hopefully we won't die of thirst.
I think people have been saying for a while that, as development tools have gotten better, the idea that a developer is a person who turns requirements into code is dead. You have to be able to operate at a higher level: do some of the work of developing requirements, figure out how to make two pieces of software work together, etc.
But the point is: obviously, at the extreme end, one CTO can't run Google, and probably one PM or engineer per product can't either. The question is how much mental load people can now take on. Google may start hiring fewer engineers (or maybe it becomes more cutthroat: hire the same number of engineers but keep them for much shorter stints, brutal up-or-out).
But essentially we're talking about complexity and mental load. So maybe it ends up being essentially the same number of teams, because teams exist because they're the right size, but the teams themselves are a lot smaller.
Not in terms of knowledge. That was already phenomenal. But in its ability to act independently: to make decisions, collaborate with me to solve problems, ask follow-up questions, write plans and actually execute them.
You have to experience it yourself on your own real problems and over the course of days or weeks.
Every coding problem I was able to define clearly enough within the limits of the context window, the chatbot could solve and these weren’t easy. It wasn’t just about writing and testing code. It also involved reverse engineering and cracking encoding-related problems. The most impressive part was how actively it worked on problems in a tight feedback loop.
In the traditional sense, I haven’t really coded privately at all in recent weeks. Instead, I’ve been guiding and directing, having it write specifications, and then refining and improving them.
Curious how this will perform in complex, large production environments.
How do you stop it from over-engineering everything?
It may end up working, but the thing is going to convolute APIs and abstractions and mix patterns basically everywhere
I can't teach it taste.
It's verbose by default but a few hours of custom instructions and you can make it code just like anyone
Once more people realize how easy it is to customize and personalize your agent, I hope they will move beyond what cookie-cutter Big AI like Anthropic and Google give you.
I suspect most won't though, because (1) it means you have to write human language, communication, and this weird form of persuasion, and (2) AI is gonna make a bunch of them lazy, and Big AI sold them on magic solutions that require no effort on your part (not true; there is a lot of customizing, and it pays huge dividends)
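Concretely, the customization being described usually lives in a project-level instructions file. A minimal, made-up sketch following Claude Code's CLAUDE.md convention (the rules below are illustrative, not a recommended set):

    # CLAUDE.md (excerpt)
    - Prefer the simplest solution that passes the tests; no speculative abstractions.
    - Keep functions under ~40 lines; extract helpers only when they are reused.
    - Never add a dependency without asking first.
    - Match the existing file layout and naming; do not reformat untouched code.
    - When unsure about intent, ask a clarifying question instead of guessing.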
When I use Claude Code I find that it *can* add a tremendous amount of value due to its ability to see my entire codebase at once, but the issue is that if I'm doing something where seeing my entire codebase would help, it blasts through my quota too fast. And if I'm tightly scoping it, it's just as easy and faster for me to use the website.
Because of this I've shifted back to the website. I find that I get more done faster that way.
Yes, feel free to blame me for the fact that these aren’t very business-realistic.
[0] https://github.com/s-macke/coding-agent-benchmark
[1] https://github.com/s-macke/weltendaemmerung
This is basically all my side projects.
If AI cost nothing and wasn't absolutely decimating our economy, I'd find what you've shared cute. However, we are putting literally all of our eggs, and the next generation's eggs, and the one after that, AND the one after that, into this one thing, which, I'm sorry, is so far away from everything that keeps on being promised to us that I can't help but feel extremely depressed.
1) Modern LLMs are an inflection point for coding.
2) The current LLM ecosystem is unsustainable.
This submission discussion is only about #1, which #2 does not invalidate. Even if the ecosystem crashes, then open-source LLMs that leverage the same tricks Opus 4.5 does will just be used instead.
So cool bro, you managed to ship a useless (except for your specific use-case) app to your iphone in an hour :O
What I think this is doing is it's pitting people against the fact that most jobs in the modern economy (mine included btw) are devoid of purpose. This is something that, as a person on the far left, I've understood for a long time. However, a lot (and I mean a loooooot) of people have never even considered this. So when they find that an AI agent is able to do THEIR job for them in a fraction of the time, they MUST understand it as the AI being some finality to human ingenuity and progress given the self-importance they've attributed to themselves and their occupation - all this instead of realizing that, you know, all of our jobs are useless, we all do the exact same useless shit which is extremely easy to replicate quickly (except for a select few occupations) and that's it.
I'm sorry to tell anyone who's reading this with a differing opinion, but if AI agents have proven revolutionary to your job, you produced nothing of actual value for the world before their advent, and still don't. I say this, again, as someone who beyond their PhD thesis (and even then) does not produce anything of value to the world, while being paid handsomely for it.
Definitely, it’s an unhealthy fixation.
> I'm sorry to tell anyone who's reading this with a differing opinion, but if AI agents have proven revolutionary to your job, you produced nothing of actual value for the world before their advent, and still don't.
I agree with this, but I think my take on it is a lot less nihilistic than yours. I think people vastly undersell how much effort they put into doing something, even if that something is vibecoding a slop app that probably exists. But if people are literally prompting claude with a few sentences and getting revolutionary results, then yes, their job was meaningless and they should find something to do that they’re better at.
But what frustrates me the most about this whole hype wave isn't just that the powers that be have bet the entire economy on a fake technology, it's that it's sucking all of the air out of the room. I think most people's jobs can actually provide value, and there's so much work to be done to make _real_ progress. But instead of actually improving the world, all the time, money, and energy is being thrown into such a wasteful technology that is actively making the world a worse place. I'm sure it's always been like this and I was just too naive to see it, but I much preferred it when at least the tech companies pretended they cared about the impact their products had on society rather than simply trying to extract the most value out of the same 5 ideas.
I really think we're just cooked at this point. The amount of people (some great friends whom I respect) that have told me in casual conversation that if their LLM were taken from them tomorrow, they wouldn't know how to do their work (or some flavour of that statement) has made me realize how deep the problem is.
We could go on and on about this, but let's both agree to try and look inward more and attempt to keep our own things in order, while most other people get hooked on the absolute slop machine that is AI. Eventually, the LLM providers will need to start ramping up the costs of their subscriptions, and maybe then people will start to click that the shitty code that was generated for their pointless/useless app is not worth the actual cost of inference (which some conservative estimates put at thousands of dollars per month on a subscription basis). For now, people are just putting their heads in the sand and assuming that physicists will somehow find a way to use quantum computers to speed up inference by a factor of 10^20 in the next years, while simultaneously slashing its costs (lol).
But hey, Opus 4.5 can cook up a functional app that goes into your emails and retrieves all outstanding orders - revolutionary. Definitely worth the many kWh and thousands of liters of water required, eh?
Cheers.
GPT-3 Da Vinci cost $20/million tokens for both input and output.
GPT-5.2 is $1.75/million for input and $14/million for output
I'd call that pretty strong evidence that they've been able to dramatically increase quality while slashing costs, over just the past ~4 years.
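Running the arithmetic on those quoted prices (a quick sketch; the figures are just the ones from the comment above):

    // $ per million tokens, as quoted above.
    const gpt3Davinci = { input: 20.0, output: 20.0 };
    const gpt52 = { input: 1.75, output: 14.0 };

    // How many times cheaper the newer model is per token.
    console.log((gpt3Davinci.input / gpt52.input).toFixed(1));   // "11.4"
    console.log((gpt3Davinci.output / gpt52.output).toFixed(1)); // "1.4"

So input tokens are roughly an order of magnitude cheaper, while output tokens dropped less sharply; the asymmetry matters most for workloads that generate a lot of output.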
1. The AI water issue is fake: https://andymasley.substack.com/p/the-ai-water-issue-is-fake (This one goes into OCD-levels of detail with receipts to debunk that entire issue in all aspects.)
2. LLMs are far, far more efficient than humans in terms of resource consumption for a given task: https://www.nature.com/articles/s41598-024-76682-6 and https://cacm.acm.org/blogcacm/the-energy-footprint-of-humans...
The studies focus on a single representative task, but in a thread about coding entire apps in hours as opposed to weeks, you can imagine the multiples involved in terms of resource conservation.
The upshot is, generating and deploying a working app that automates a bespoke, boring email workflow will be way, way, wayyyyy more efficient than the human manually doing that workflow every time.
Hope this makes you feel better!
I want to push back on this argument, as it seems suspect given that none of these tools are creating profit, and so require funds / resources that are essentially coming from the combined economic efforts of entire countries. I.e. the energy externalities here are monstrous and never factored into these things, even though these models could never have gotten off the ground if not for the massive energy expenditures that were (and continue to be) needed to sustain the funding for these things.
To simplify, LLMs haven't clearly created the value they have promised, but have eaten up massive amounts of value produced by anyone else. But that value produced by everyone else had energy costs too. Whether or not all this AI stuff ends up being more energy efficient than people needs to be measured on whether it actually delivers on its promises.
The thing is, in a vacuum this stuff is actually kinda cool. But hundreds of billions in debt-financed capex that will never see a return, and this is the best we've got? Absolutely cooked indeed.
This doesn’t logically follow. AI agents produce loads of value. Cotton picking was and still is useful. The cotton gin didn’t replace useless work. It replaced useful work. Same with agents.
If the AI thing does indeed come crashing down I expect there will be a whole lot of second-hand GPUs going for pennies on the dollar.
Or I'll start one myself, if the market fails to provide!
The number of projects being submitted to Product Hunt is 4x the year before.
The market is shrinking rapidly because more people now make their own apps.
Even when you make a typo and land on a website, there's a good chance it's selling more AI snake oil, yet none of these apps are feature complete, and they're easily beaten by apps made by guys in the 2010s (tldr & sketchbook for the drawing space).
The only way to excite the investors is to fake the ARR by giving free trials and selling before the recurring billing event occurs.
https://news.ycombinator.com/newsguidelines.html
That describes half of the current unicorn startups nowadays.
I have some old, borderline-senile relatives writing apps (asking LLMs to write them) for their own personal use. Stuff they surely couldn't have done on their own (or had the energy to do). The extent of their programming background: shitty VBScript macros for Excel.
It also helps people to pick up programming and helps with the initial push of getting started. Getting over the initial hump, getting something on the screen so to speak.
Most things people want from their computers are simple shit that LLMs usually manage quite well.
Good question whether or not this (outsourcing their thinking) actually just accelerates their senility or not.
As someone who likes to solve hard or interesting technical problems, I'd often been disappointed, long before LLMs, that most of the time what people want from programmers is simple stupid shit (i.e., stuff I don't find interesting to work on).
The article is a fine opinion piece, but at what point are we going to either:
a) establish benchmarks that make sense and are reliable, or
b) stop with the hypecycle stuff?
How aren't current LLM coding benchmarks reliable?
If you can figure out how to create benchmarks that make sense, are reliable, correlate strongly to business goals, and don't get immediately saturated or contorted once known, you are well on your way to becoming a billionaire.
https://news.ycombinator.com/item?id=46443767
When I'm dying of dehydration because humanity has depleted all fresh water deposits, I'll think of you and your stupid NES emulator which is just an LLM-produced copy of many ones that had already existed.
I don't think OpenOffice/LibreOffice etc. have access to the source code for MS Office, and if they did, MS would be on them like a rash.
https://andymasley.substack.com/p/the-ai-water-issue-is-fake
However, Opus 4.5 is incredible when you give it everything it needs: a direction, what you have versus what you want. It will make it work; really, it will work. The code might be ugly, undesirable, and might only work for that one condition, but with further prompting you can evolve it and produce something you can be proud of.
Opus is only as good as the user and the tools the user gives to it. Hmm, that's starting to sound kind-of... human...
I now write very long specifications and this helps. I haven't figured out a bulletproof workflow, I think that will take years. But I often get just amazing code out of it.
It's always just the "Fibonacci" equivalent
They're going to find a lot of stuff to fix.
The few times I did go to a shop and ask for a check-up they didn’t find anything. Just an anecdote.
This reminds me of that example where someone asked an agent to improve a codebase in a loop overnight and they woke up to 100,000 lines of garbage [0]. Similarly you see people doing side-by-side of their implementation and what an AI did, which can also quite effectively show how AI can make quite poor architecture decisions.
This is why I think the "plan modes" and spec-driven development are so effective for agents: they help avoid one of their main weaknesses.
[0] https://gricha.dev/blog/the-highest-quality-codebase
I had this long discussion today with a co-worker about the merits of detailed queries with lots of guidance .md documents, vs just asking fairly open ended questions. Spelling out in great detail what you want, vs just generally describing what you want the outcomes to be in general then working from there.
His approach was to write a lot of agent files spelling out all kinds of things like code formatting style, well defined personas, etc. And here's me asking vague questions like, "I'm thinking of splitting off parts of this code base into a separate service, what do you think in general? Are there parts that might benefit from this?"
I have a hunch if you asked which approach we took based on background, you'd think I was the one using the detailed prompt approach and him the vague.
Your approach is also very similar to spec driven development. Your spec is just a conversation instead of a planning document. Both approaches get ideas from your brain into the context window.
So you gave it a poorly defined task, and it failed?
Have you tried the planning mode? Ask it to review the codebase and identify defects, but don't let it make any changes until you've discussed each one or each category and planned out what to do to correct them. I've had it refactor code perfectly, but only when given examples of exactly what you want it to do, or given clear direction on what to do (or not to do).
I wouldn't be surprised to find out that they will find issues infinitely, if looped with fixes.
The above is for vibe coding; for taking the wheel, I can only use Opus because I suck at prompting codex (it needs very specific instructions), and codex is also way too slow for pair programming.
Why not the other way around? Have the quick brown fox churn out code, and have codex review it, guide changes, and loop?
I've actually gone one step further down the delegation. I use Opus/Gemini 3 to plan, review, and edit the plan for a few steps. Then I write it out to .md files. Then I have GLM implement it (I got a cheap plan for like $28 for a year at Christmas). Then the code this produces gets reviewed and fixed if needed by Opus. Final review by Codex (for some reason it's very good at review, especially if you have solid checkboxes for it to check during review). Seems to work so far.
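A rough sketch of that relay as a script; every command name and flag below is an assumption standing in for the Opus/GLM/Codex steps described, so substitute whatever your agent CLIs actually accept:

    // Hypothetical plan -> implement -> review relay, shelling out to agent CLIs.
    import { execFileSync } from "node:child_process";

    function runAgent(cmd: string, args: string[]): string {
      // Run one agent CLI step and capture its output.
      return execFileSync(cmd, args, { encoding: "utf8" });
    }

    runAgent("claude", ["-p", "Plan the change; write the steps to plan.md"]);
    runAgent("glm",    ["-p", "Implement plan.md exactly; do not redesign"]);
    runAgent("claude", ["-p", "Review the resulting diff against plan.md and fix defects"]);
    runAgent("codex",  ["exec", "Final review: check the diff against review-checklist.md"]);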
Currently I don’t let GLM or Opus near my codebases unsupervised because I’m convinced that the better the foundation, the better the end result will be. Is the first draft not pretty crappy with GLM?
Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time.
For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box.
We also had Claude Code create a bunch of ESLint automation, including custom ESLint rules and lint checks that catch and auto-handle a lot of stuff before it even hits review.
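To illustrate the kind of custom rule being described (this specific rule, its name, and the apiClient convention are hypothetical examples, not the poster's actual lint setup):

    // Custom ESLint rule sketch: flag raw fetch() calls so code goes through
    // a shared API client instead. Written against ESLint's standard rule API.
    import type { Rule } from "eslint";

    const noRawFetch: Rule.RuleModule = {
      meta: {
        type: "suggestion",
        docs: { description: "use apiClient instead of raw fetch()" },
        messages: { noRawFetch: "Call apiClient.request() instead of fetch()." },
      },
      create(context) {
        return {
          CallExpression(node) {
            // Report any bare call to the global fetch identifier.
            if (node.callee.type === "Identifier" && node.callee.name === "fetch") {
              context.report({ node, messageId: "noRawFetch" });
            }
          },
        };
      },
    };

    export default noRawFetch;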
Then we take it further: we have a deep code review agent Claude Code runs after changes are made. And when a PR goes up, we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we’ve written for it.
On top of that, we’ve got like five other Claude Code GitHub workflow agents that run on a schedule. One of them reads all commits from the last month and makes sure docs are still aligned. Another checks for gaps in end-to-end coverage. Stuff like that. A ton of maintenance and quality work is just… automated. It runs ridiculously smoothly.
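As a sketch of what one of those scheduled agents might look like as a GitHub Actions workflow (the workflow name, cron, prompt, and secret wiring are illustrative assumptions, not the poster's setup; `claude -p` runs Claude Code non-interactively):

    # Hypothetical scheduled doc-drift checker.
    name: docs-alignment-check
    on:
      schedule:
        - cron: "0 6 1 * *"   # monthly
    jobs:
      check:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
            with: { fetch-depth: 0 }
          - run: npm install -g @anthropic-ai/claude-code
          - run: claude -p "Read the commits from the last month and flag docs that drifted"
            env:
              ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}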
We even use Claude Code for ticket triage. It reads the ticket, digs into the codebase, and leaves a comment with what it thinks should be done. So when an engineer picks it up, they’re basically starting halfway through already.
There is so much low-hanging fruit here that it honestly blows my mind people aren’t all over it. 2026 is going to be a wake-up call.
(used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)
Why on earth would you be hunting $20 a month subscriptions from random assed people? Peanuts.
Lockheed-Martin could be, but isn’t, opening lemonade stands outside their offices… they don’t because of how buying a Ferrari works.
LLMs do not result in bosses firing people; they result in more projects / faster-completed projects, which in turn means more $$$ for a company.
It's hard to say if Opus 4.5 itself will change everything given the cost/latency issues, but now that all the labs will have very good synthetic agentic data thanks to Opus 4.5, I will be very interested to see what the LLM releases this year will be able to do. A Sonnet 4.7 that can do agentic coding as well as Opus 4.5 but at Sonnet's speed/price would be the real game-changer: with Claude Code on the $20/mo plan, you can barely do more than one or two prompts with Opus 4.5 per session.