2025: the Year in Llms

Posted6d agoActive2d ago

simonw

922 points

591 comments

simonwillison.netTech DiscussionstoryHigh profile

informativepositive

Debate

20/100

Large Language ModelsAI2025 Predictions

Key topics

Large Language Models

2025 Predictions

As the AI landscape continues to evolve, a post reflecting on "2025: The Year in LLMs" sparked a lively discussion, with commenters waxing nostalgic about the rapid progress in the field. Some reminisced about the slower pace of development in the past, joking about the days when a year's worth of progress in Java was just a vote on adding syntactic sugar, while others fondly recalled the optimism surrounding Rust. The conversation took a humorous turn as commenters playfully debated the relative dangers of "AI bros" versus "crypto bros," with some pointing out the environmental and economic impacts of cryptocurrency mining operations in rural America. Amidst the lighthearted banter, a few commenters also touched on more serious topics, like the economic implications of large-scale AI infrastructure development.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

56m

Peak period

107

0-12h

Avg / period

22.9

Comment distribution160 data points

Loading chart...

Based on 160 loaded comments

Key moments

01Story posted
Dec 31, 2025 at 6:54 PM EST
6d ago
Step 01
02First comment
Dec 31, 2025 at 7:50 PM EST
56m after posting
Step 02
03Peak activity
107 comments in 0-12h
Hottest window of the conversation
Step 03
04Latest activity
Jan 5, 2026 at 2:16 AM EST
2d ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (591 comments)

Showing 160 comments of 591

AndyNemmity

6d ago

1 reply

These are excellent every year, thank you for all the wonderful work you do.

tkgally

6d ago

1 reply

[delayed]

password4321

6d ago

1 reply

Don't forget you can pay Simon to keep up with less!

> At the end of every month I send out a much shorter newsletter to anyone who sponsors me for $10 or more on GitHub

https://simonwillison.net/about/#monthly

th0ma5

5d ago

I was assured by many on here that they makes no money from all of this and that's why it's okay for them to constantly spam the site?

waldrews

6d ago

4 replies

Remember, back in the day, when a year of progress was like, oh, they voted to add some syntactic sugar to Java...

throwup238

6d ago

1 reply

> they voted to add some syntactic sugar to Java...

I remember when we just wanted to rewrite everything in Rust.

Those were the simpler times, when crypto bros seemed like the worst venture capitalism could conjure.

OGEnthusiast

6d ago

3 replies

Crypto bros in hindsight were so much less dangerous than AI bros. At least they weren't trying to construct data centers in rural America or prop up artificial stocks like $NVDA.

SauntSolaire

6d ago

1 reply

Instead they were building crypto mining warehouses in rural America and propping up artificial currencies like BTC.

ryandrake

6d ago

1 reply

Crazy how the two most hyped and funded technologies of the decade were: energy wasting fake money for criminals and energy wasting plagiarism machines.

scotty79

6d ago

This sober look at reality costed you one million percent returns (from $1 to $100k) over last decade.

I wonder what's going to cost you being this sober about AI? Your career ar minimum, I'm guessing.

zahlman

6d ago

1 reply

Speaking of which, we never found out the details (strike price/expiration) of Michael Burry's puts, did we? It seems he could have made bank if he'd waited one more month...

kamranjon

6d ago

I think they expire in March 2026 if the NVIDIA stock drops to $140 a share? Something close to that I think.

mgfist

6d ago

3 replies

It's funny how people complain about the rust belt dying and factories leaving rural communities and so on, then when someone wants to build something that can provide jobs and tax revenue, everyone complains.

jakeydus

6d ago

1 reply

How many people are employed at the average data center? A few dozen? Versus a steel mill, that’s nothing. A chicken plant in Nebraska closed down this last month. 3200 people lost their jobs. You think Meta will fill it with GPUs and the whole town will have jobs again?

scotty79

6d ago

1 reply

Many more are employed while building it. And they will never stop building. It's modern version of rail. But instead of distances it will cover the area.

uxcolumbo

6d ago

1 reply

Will local folks get those jobs to build the data center?

And if so, what happens to those builders once the data center is built?

scotty79

4d ago

1 reply

> Will local folks get those jobs to build the data center?

Yes. At some point the demand will be so high that imported workers won't suffice and local population will need to be trained and hired.

> And if so, what happens to those builders once the data center is built?

They are going to be moved to a new place where the datacenters will need to be built next. Mobility if the workforce was often cited as one of the greatest strengths of US economy.

uxcolumbo

4d ago

1 reply

So local people in town 1 who are getting these jobs to build the data center will then have to move to town 2 to build a data center there? What happens to the local people in town 2 who are also looking for construction jobs?

scotty79

3d ago

Local people in town 2 share the same fate that people in town 1 alread had. If there's not enough imported workers, from town 1 or elsewere people from town 2 will need to be trained and employed.

More and more data centers (and power sources) are going to be built at the same time so more and more workers will be needed. This is going to be THE job. I think there are going to be many similarities with the age when railroads were being developed. Hopefully with less worker deaths this time.

lostlogin

6d ago

I’ve heard about the risk of AI leading to job losses and wealth concentration.

I haven’t heard about new businesses, job creation and growth in former industrial towns. What have I missed?

techpression

6d ago

As if any taxes will be paid to the areas affected, and add to that the billions in taxes used to subsidize everything before a single cent is a net positive.

nrhrjrjrjtntbt

6d ago

1 reply

More like 6 different new nosql databases and js frameworks.

dotancohen

6d ago

A Wordpress zero day and Linux not on the desktop. Netcraft confirms it.

crystal_revenge

6d ago

4 replies

That must have been a long time back. Having lived through the time when web pages were served through CGI and mobile phones only existed in movies, when SVMs where the new hotness in ML and people would write about how weird NNs were, I feel like I've seen a lot more concrete progress in the last few decades than this year.

This year honestly feels quite stagnant. LLMs are literally technology that can only reproduce the past. They're cool, but they were way cooler 4 years ago. We've taken big ideas like "agents" and "reinforcement learning" and basically stripped them of all meaning in order to claim progress.

I mean, do you remember Geoffrey Hinton's RBM talk at Google in 2010? [0] That was absolutely insane for anyone keeping up with that field. By the mid-twenty teens RBMs were already outdated. I remember when everyone was implementing flavors of RNNs and LSTMs. Karpathy's character 2015 RNN project was insane [1].

This comment makes me wonder if part of the hype around LLMs is just that a lot of software people simply weren't paying attention to the absolutely mind-blowing progress we've seen in this field for the last 20 years. But even ignoring ML, the world's of web development and mobile application development have gone through incredible progress over the last decade and a half. I remember a time when JavaScript books would have a section warning that you should never use JS for anything critical to the application. Then there's the work in theorem provers over the last decade... If you remember when syntactic sugar was progress, either you remember way further back than I do, or you weren't paying attention to what was happening in the larger computing world.

0. https://www.youtube.com/watch?v=VdIURAu1-aU

1. https://karpathy.github.io/2015/05/21/rnn-effectiveness/

handoflixue

6d ago

5 replies

> LLMs are literally technology that can only reproduce the past.

Funny, I've used them to create my own personalized text editor, perfectly tailored to what I actually want. I'm pretty sure that didn't exist before.

It's wild to me how many people who talk about LLM apparently haven't learned how to use them for even very basic tasks like this! No wonder you think they're not that powerful, if you don't even know basic stuff like this. You really owe it to yourself to try them out.

crystal_revenge

6d ago

3 replies

> You really owe it to yourself to try them out.

I've worked at multiple AI startups in lead AI Engineering roles, both working on deploying user facing LLM products and working on the research end of LLMs. I've done collaborative projects and demos with a pretty wide range of big names in this space (but don't want to doxx myself too aggressively), have had my LLM work cited in HN multiple times, have LLM based github projects with hundreds of stars, appeared on a few podcasts talking about AI etc.

This gets to the point I was making. I'm starting to realize that part of the disconnect between my opinions on the state of the field and others is that many people haven't really been paying much attention.

I can see if recent LLMs are your first intro to the state of the field, it must feel incredible.

CamperBob2

6d ago

3 replies

That's all very impressive, to be sure. But are you sure you're getting the point? As of 2025, LLMs are now very good at writing new code, creating new imagery, and writing original text. They continue to improve at a remarkable rate. They are helping their users create things that didn't exist before.

So it is absurdly incorrect to say "they can only reproduce the past."

crystal_revenge

6d ago

8 replies

I think the confusion is people's misunderstanding of what 'new code' and 'new imagery' mean. Yes, LLMs can generate a specific CRUD webapp that hasn't existed before but only based on interpolating between the history of existing CRUD webapps. I mean traditional Markov Chains can also product 'new' text in the sense that "this exact text" hasn't been seen before, but nobody would argue that traditional Markov Chains aren't constrained by "only producing the past".

This is even more clear in the case of diffusion models (which I personally love using, and have spent a lot of time researching). All of the "new" images created by even the most advanced diffusion models are fundamentally remixing past information. This is really obvious to anyone who has played around with these extensively because they really can't produce truly novel concepts. New concepts can be added by things like fine-tuning or use of LoRAs, but fundamentally you're still just remixing the past.

LLMs are always doing some form of interpolation between different points in the past. Yes they can create a "new" SQL query, but it's just remixing from the SQL queries that have existed prior. This still makes them very useful because a lot of engineering work, including writing a custom text editor, involve remixing existing engineering work. If you could have stack-overflowed your way to an answer in the past, an LLM will be much superior. In fact, the phrase "CRUD" largely exists to point out that most webapps are fundamentally the same.

A great example of this limitation in practice is the work that Terry Tao is doing with LLMs. One of the largest challenges in automated theorem proving is translating human proofs into the language of a theorem prover (often Lean these days). The challenge is that there is not very much Lean code currently available to LLMs (especially with the necessary context of the accompanying NL proof), so they struggle to correctly translate. Most of the research in this area is around improving LLM's representation of the mapping from human proofs to Lean proofs (btw, I personally feel like LLMs do have a reasonably good chance of providing major improvements in the space of formal theorem proving, in conjunction with languages like Lean, because the translation process is the biggest blocker to progress).

When you say:

> So it is absurdly incorrect to say "they can only reproduce the past."

It's pretty clear you don't have a solid background in generative models, because this is fundamentally what they do: model an existing probability distribution and draw samples from that. LLMs are doing this for a massive amount of human text, which is why they do produce some impressive and useful results, but this is also a fundamental limitation.

But a world where we used LLMs for the majority of work, would be a world with no fundamental breakthroughs. If you've read The Three Body Problem, it's very much like living in the world where scientific progress is impeded by sophons. In that world there is still some progress (especially with abundant energy), but it remains fundamentally and deeply limited.

threethirtytwo

6d ago

1 reply

> It's pretty clear you don't have a solid background in generative models, because this is fundamentally what they do

You don’t have a solid background.

This is fundamentally what humans and what every function in existence does as well. It is impossible to have any sort of deterministic function, process or anything produce new information from old information. You can combine information you can transform information you can lose information. But producing new information from deterministic intelligence is fundamentally impossible.

New information can literally only arise through stochastic processes. It must be randomly generated and then selected and filtered. That’s essentially what creativity is. There is literally no other logical way to generate “new information”. Purely random is never really useful so useful information arrives only after it is filtered.

LLMs do have stochastic aspects to them so we know for a fact it is generating new things and not just drawing on the past.

The fundamental limitation with LLMs is not that it can’t create new things. It’s that the context window is too small to create new things beyond that. Whatever it can create it is limited to the possibilities within that window.

jheez3

5d ago

1 reply

"You don’t have a solid background.

If you want to go around huffing and puffing your chest about a subject area, you kinda do fella. Credibility.

threethirtytwo

5d ago

1 reply

Not only is what he saying in direct contradiction to what people with credibility have said, but his claimed credentials can be utter bullshit.

This is the internet bro. Credibility is irrelevant because identities can never be verified. So the only thing that matters is the strength and rationality of an argument.

That’s the point of hacker news substantive content not some battle of comparison of credentials or useless quips (like yours) with zero substance. Say something worth reading if you have anything to say at all, otherwise nobody cares.

jheez3

5d ago

Sounds like you need some Kleenex ‘bro’.

signatoremo

6d ago

Lot of impressive points. They are also irrelevant. The majority of people also only extrapolate from the knowledge they acquired in the past. That’s why there is the concept of inventor, someone who comes up with new ideas. Many new inventions are also based on existing ideas. Is that the reason to dismiss those achievements?

Do you only take LLM seriously if it can be another Einstein?

Can you give examples of some recent, truly new, not in the past ideas?

PeterHolzwarth

6d ago

Just an innocent bystander here, so forgive me, but I think the flack you are getting is because you appear to be responding to claims that these tools will reinvent everything and introduce a new halcyon age of creation - when, at least on hacker news, and definitely in this thread, no one is really making such claims.

Put another way, and I hate to throw in the now over-used phrase, but I feel you may be responding to a strawman that doesn't much appear in the article or the discussion here: "Because these tools don't achieve a god-like level of novel perfection that no one is really promising here, I dismiss all this sorta crap."

Especially when you I think you are also admitting that the technology is quite a useful tool - a stance which I think represents the bulk of the feelings that supporters of the tech here on HN are describing.

uxcolumbo

6d ago

How do human brains create something novel and what will it take for AIs to do the same?

oedemis

6d ago

as architectures evolve, i think it can be that we learn more "side effects".. back in 2020 openai researchers said "GPT-3 is applied without any gradient updates or fine-tuning" the model emerges at a certain level of scale...

orbital-decay

6d ago

Yes, LLMs are pure functions, they map inputs to outputs, and mode collapse in current models makes the output distribution needle-like so even random sampling doesn't get you much variety. But so are you! You're also a pure function at each moment in time. There's no "magic of novelty" in humans. "Novel" is pretty ill-defined and subjective, and novelty is red herring. The real difference is that you have a lot more external input and sophistication/nuance than an LLM.

aoeusnth1

5d ago

> It's pretty clear you don't have a solid background in generative models, because this is fundamentally what they do: model an existing probability distribution and draw samples from that.

After post-training, this is definitively NOT what an LLM does.

throwaway7783

6d ago

Would you say that LLMs can discover patterns hitherto unknown? It would still be generating from the past, but patterns/connections not made before.

windexh8er

6d ago

6 replies

> They are helping their users create things that didn't exist before.

That is a derived output. That isn't new as in: novel. It may be unique but it is derived from training data. LLMs legitimately cannot think and thus they cannot create in that way.

Kerrick

6d ago

2 replies

That is a pedantic distinction. You can create something that didn't exist by combining two things that did exist, in a way of combining things that already existed. For example, you could use a blender to combine almond butter and sawdust. While this may not be "novel", and it may be derived from existing materials and methods, you may still lay claim to having created something that didn't exist before.

For a more practical example, creating bindings from dynamic-language-A for a library in compiled-language-B is a genuinely useful task, allowing you to create things that didn't exist before. Those things are likely to unlock great happiness and/or productivity, even if they are derived from training data.

windexh8er

6d ago

> That is a pedantic distinction. You can create something that didn't exist by combining two things that did exist, in a way of combining things that already existed.

This is the definition of a derived product. Call it a derivative work if we're being pedantic and, regardless, is not any level of proof that LLMs "think".

threethirtytwo

5d ago

Pedantic and not true. The LLM has stochastic processes involved. Randomness. That’s not old information. That’s newly generated stuff.

zingar

6d ago

1 reply

Could you give us an idea of what you’re hoping for that is not possible to derive from training data of the entire internet and many (most?) published books?

techpression

6d ago

1 reply

This is the problem, the entire internet is a really bad set of training data because it’s extremely polluted.

Also the derived argument doesn’t really hold, just because you know about two things doesn’t mean you’d be able to come up with the third, it’s actually very hard most of the time and requires you to not do next token prediction.

threethirtytwo

6d ago

1 reply

The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data. It can figure the world out just as we can figure it out when we are as well inundated with bullshit data. The pathways exist in the LLM but it won’t necessarily reveal that to you unless you tune it with RL.

ahtihn

6d ago

> The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data.

I don't believe they can. LLMs have no concept of truth.

What's likely is that the "truth" for many subjects is represented way more than fiction and when there is objective truth it's consistently represented in similar way. On the other hand there are many variations of "fiction" for the same subject.

6d ago

1 reply

What does "think" mean?

Why is that kind of thinking required to create novel works?

Randomness can create novelty.

Mistakes can be novel.

There are many ways to create novelty.

Also I think you might not know how LLMs are trained to code. Pre-training gives them some idea of the syntax etc but that only gets you to fancy autocomplete.

Modern LLMs are heavily trained using reinforcement data which is custom task the labs pay people to do (or by distilling another LLM which has had the process performed on it).

windexh8er

6d ago

> Also I think you might not know how LLMs are trained to code.

What's clear here is that you have zero idea what you're talking about while simultaneously telling me and poorly mansplaining.

closewith

6d ago

1 reply

[delayed]

windexh8er

6d ago

2 replies

Wow.

You’re using ‘derived’ to imply ‘therefore equivalent.’ That’s a category error. A cookbook is derived from food culture. Does an LLM taste food? Can it think about how good that cookie tastes?

A flight simulator is derived from aerodynamics - yet it doesn’t fly.

Likewise, text that resembles reasoning isn’t the same thing as a system that has beliefs, intentions, or understanding. Humans do. LLMs don't.

CamperBob2

5d ago

1 reply

Ask an LLM what's the difference between a human brain and an LLM. If an LLM could "think" it wouldn't give you the answer it just did.

I imagine that sounded more profound when you wrote it than it did just now, when I read it. Can you be a little more specific, with regard to what features you would expect to differ between LLM and human responses to such a question?

Right now, LLM system prompts are strongly geared towards not claiming that they are humans or simulations of humans. If your point is that a hypothetical "thinking" LLM would claim to be a human, that could certainly be arranged with an appropriate system prompt. You wouldn't know whether you were talking to an LLM or a human -- just as you don't now -- but nothing would be proved either way. That's ultimately why the Turing test is a poor metric.

windexh8er

5d ago

> Right now, LLM system prompts are strongly geared towards not claiming that they are humans or simulations of humans. If your point is that a hypothetical "thinking" LLM would claim to be a human, that could certainly be arranged with an appropriate system prompt. You wouldn't know whether you were talking to an LLM or a human -- just as you don't now -- but nothing would be proved either way. That's ultimately why the Turing test is a poor metric.

The mental gymnastics here is entertainment at best. Of course the thinking LLM would give feedback on how it's actually just a pattern model over text - well, we shouldn't believe that! The LLM was trained to lie about it's true capabilities in your own admission?

How about these...

What observable capability would you expect from "true cognitive thought" that a next-token predictor couldn’t fake?

Where are the system’s goals coming from—does it originate them, or only reflect the user/prompt?

How does it know when it’s wrong without an external verifier? If the training data says X and the answer is Y - how will it ever know it was wrong and reach the correct conclusion?

closewith

5d ago

1 reply

[delayed]

windexh8er

5d ago

Oh yes, they are.

And beyond people claiming that LLMs are basically sentient you have people like CamperBob2 who made this wild claim:

"""There's no such thing as people without language, except for infants and those who are so mentally incapacitated that the answer is self-evidently "No, they cannot."

Language is the substrate of reason. It doesn't need to be spoken or written, but it's a necessary and (as it turns out) sufficient component of thought."""

Let that sink. They literally think that there's no such thing as people without language. Talk about a wild and ignorant take on life in general!

ordersofmag

6d ago

2 replies

I will find this often-repeated argument compelling only when someone can prove to me that the human mind works in a way that isn't 'combining stuff it learned in the past'.

5 years ago a typical argument against AGI was that computers would never be able to think because "real thinking" involved mastery of language which was something clearly beyond what computers would ever be able to do. The implication was that there was some magic sauce that human brains had that couldn't be replicated in silicon (by us). That 'facility with language' argument has clearly fallen apart over the last 3 years and been replaced with what appears to be a different magic sauce comprised of the phrases 'not really thinking' and the whole 'just repeating what it's heard/parrot' argument.

I don't think LLM's think or will reach AGI through scaling and I'm skeptical we're particularly close to AGI in any form. But I feel like it's a matter of incremental steps. There isn't some magic chasm that needs to be crossed. When we get there I think we will look back and see that 'legitimately thinking' wasn't anything magic. We'll look at AGI and instead of saying "isn't it amazing computers can do this" we'll say "wow, was that all there is to thinking like a human".

windexh8er

6d ago

1 reply

> 5 years ago a typical argument against AGI was that computers would never be able to think because "real thinking" involved mastery of language which was something clearly beyond what computers would ever be able to do.

Mastery of words is thinking? In that line of argument then computers have been able to think for decades.

Humans don't think only in words. Our context, memory and thoughts are processed and occur in ways we don't understand, still.

There's a lot of great information out there describing this [0][1]. Continuing to believe these tools are thinking, however, is dangerous. I'd gather it has something to do with logic: you can't see the process and it's non-deterministic so it feels like thinking. ELIZA tricked people. LLMs are no different.

[0] https://archive.is/FM4y8 [0] https://www.theverge.com/ai-artificial-intelligence/827820/l... [1] https://www.raspberrypi.org/blog/secondary-school-maths-show...

CamperBob2

5d ago

3 replies

Mastery of words is thinking?

That's the crazy thing. Yes, in fact, it turns out that language encodes and embodies reasoning, if you pile up enough of it in a high-dimensional space and add some feedback in the form of RL.

No one had the faintest clue. So many people not only don't understand what just happened, they don't think anything happened at all.

meindnoch

5d ago

So people without language cannot reason? I don't think so.

windexh8er

5d ago

> ELIZA, ROFL. How'd ELIZA do at the IMO last year?

What's funny is the failure to grasp any contextual framing of ELIZA. When it came out people were impressed by it's reasoning, it's responses. And in your line of defense it could think because it had mastery of words!

But fast forward the current timeline 30 years. You will have been of the same camp that argued on behalf of ELIZA when the rest of the world was asking, confusingly: how did people think ChatGPT could think?

svieira

4d ago

> Yes, in fact, it turns out that language encodes and embodies reasoning ... No one had the faintest clue

Funnily enough, they did, if you go back far enough. It's only the deconstructionists and the solipsists who had the audacity to think otherwise.

arcatech

6d ago

> I will find this often-repeated argument compelling only when someone can prove to me that the human mind works in a way that isn't 'combining stuff it learned in the past'.

This is the definition of the word ‘novel’.

jama211

6d ago

Yeah you’ve lost me here I’m sorry. In the real world humans work with AI tools to create new things. What you’re saying is the equivalent of “when a human writes a book in English, because they use words and letters that already exist and they already know they aren’t creating anything new”.

weatherlite

6d ago

> So it is absurdly incorrect to say "they can only reproduce the past."

Also , a shitton of what we do economically is reproducing the past with slight tweaks and improvements. We all do very repetitive things and these tools cut the time / personnel needed by a significant factor.

handoflixue

6d ago

4 replies

Seriously, all that familiarity and you think an LLM "literally" can't invent anything that didn't already exist?

Like, I'm sorry, but you're just flat-out wrong and I've got the proof sitting on my hard drive. I use this supposedly impossible program daily.

bigyabai

6d ago

1 reply

FWIW, your "evidence" is a text editor. I'm glad you made a tool that works for you, but the parent's point stands; this is a 200-level course homework assignment. Tens of thousands of homemade editors exist, in various states of disrepair and vain overengineering.

least

6d ago

1 reply

The difference between those is the person is actually using this text editor that they built with the help of LLMs. There's plenty of people creating novel scripts and programs that can accommodate their own unique specifications.

If a programmer creating their own software (or contracting it out to a developer) would be a bespoke suit and using software someone or some company created without your input is an off the rack suit, I'd liken these sorts of programs as semi-bespoke, or made to measure.

"LLMs are literally technology that can only reproduce the past" feels like an odd statement. I think the point they're going for is that it's not thinking and so it's not going to produce new ideas like a human would? But literally no technology does that. That is all derived from some human beings being particularly clever.

LLMs are tools. They can enable a human to create new things because they are interfacing with a human to facilitate it. It's merging the functional knowledge and vision of a person and translating it into something else.

resize2996

5d ago

compilers can only produce machine code. so unorginal.

windexh8er

6d ago

1 reply

Do you also think LLMs "think"?

From what you've described an LLM has not invented anything. LLMs that can reason have a bit more slight of hand but they're not coming up with new ideas outside of the bounds of what a lot of words have encompassed in both fiction and non.

Good for you that you've got a fun token of code that's what you've always wanted, I guess. But this type of fantasy take on LLMs seems to be more and more prevalent as of late. A lot of people defending LLMs as if they're owed something because they've built something or maybe people are getting more and more attached to them from the conversational angle. I'm not sure, but I've run across more people in 2025 that are way too far in the deep end of personifying their relationships with LLMs.

Kerrick

6d ago

2 replies

Hang on, you're now saying that if something has ever been described in fiction it doesn't count as invention? So if somebody literally developed a working photon torpedo, that isn't new because "Star Trek Did It"?

phatfish

6d ago

1 reply

Is there any danger an LLM is going to create a working photo torpedo?

ben_w

6d ago

Well, they can use tools, and tools includes physics simulations, so if it is possible (and FWIW the tool-free "intuition" of ChatGPT is "there will never be an age of antimatter"), then why couldn't LLMs grind those tools to get a solution?

windexh8er

6d ago

1 reply

You seem to be pretty far down the rabbit hole. How about this... You task an LLM to create a photon torpedo. If it can truly think then it should be able to provide you with something tangible. When you've got that in hand let us all know.

Back to the land of reality... Describing something in fiction doesn’t magically make it "not an invention". Fiction can anticipate an idea, but invention is about producing a working, testable implementation and usually involves novel technical methods. "Star Trek did it" is at most prior art for the concept, not a blueprint for the mechanism. If you can't understand that differential then maybe go ask an LLM.

Kerrick

5d ago

I didn't say anything about an LLM. I said "somebody" not "some predictive text engine."

ctxc

6d ago

Some people cannot be convinced simply because their expectation of "novel" is something that appears in an Asimov novel.

I for one think your work is pretty cool - even though I haven't seen it, using something you built everyday is a claim not many can make!

9rx

6d ago

When a computer is able to invent things, we’ve achieved AGI. Do you believe we are already in the AGI era, or is the inventor in this case actually you?

threethirtytwo

6d ago

1 reply

Over half of HN still thinks it’s a stochastic parrot and that it’s just a glorified google search.

The change hit us so fast a huge number of people don’t understand how capable it is yet.

Also it certainly doesn’t help that it still hallucinates. One mistake and it’s enough to set someone against LLMs. You really need to push through that hallucinations are just the weak part of the process to see the value.

CamperBob2

5d ago

1 reply

The big problem I see, over and over, is that people pose poorly-formed questions to the free ChatGPT and Google models, laugh at the resulting half-baked answers that are often full of errors and hallucinations, and draw sweeping conclusions about the technology as a whole.

Either that, or they tried it "a year or two ago" and have no concept of how far things have gone in the meantime.

It's like they wandered into a machine shop, cut off a finger or two, and concluded that their grandpa's hammer and hacksaw were all anyone ever needed.

habinero

5d ago

3 replies

No, frankly it's the difference between actual engineers and hobbyists/amateurs/non-SWEs.

SWEs are trained to discard surface-level observations and be adversarial. You can't just look at the happy path, how does the system behave for edge cases? Where does it break down and how? What are the failure modes?

The actual analogy to a machine shop would be to look at whether the machines were adequate for their use case, the building had enough reliable power to run and if there were any safety issues.

It's easy to Clever Hans yourself and get snowed by what looks like sophisticated effort or flat out bullshit. I had to gently tell a junior engineer that just because the marketing claims something will work a certain way, that doesn't mean it will.

CamperBob2

5d ago

1 reply

You sound pretty certain. There's often money in taking the contrarian view, where you have insights that the so-called "smart money" lacks. What are some good investments to make in the extreme-bear case?

habinero

5d ago

1 reply

My dude, I assure you "humans are really good at convincing themselves of things that are not true" is a very, very well known fact. I don't know what kind of arbitrage you think exists in this incredibly anodyne statement lol.

If you want a financial tip, don't short stock and chase market butterflies. Instead, make real professional friends, develop real skills and learn to be friendly and useful.

I made my money in tech already, partially by being lucky and in the right place at the right time, and partially because I made my own luck by having friends who passed the opportunity along.

Hope that helps!

threethirtytwo

4d ago

That answer is basically an admission that you don’t actually hold a strong contrarian belief about the technology at all.

The question wasn’t “are humans sometimes self-delusional?” Everyone agrees with that. The question was whether, in this specific case, the prevailing view about LLM capability is meaningfully wrong in a way that has implications. If you really believed this was mostly Clever Hans, there would be concrete consequences. Entire categories of investment, hiring, and product strategy would be mispriced.

Instead you retreated to “don’t short stocks” and generic career advice. That’s not skepticism, it’s risk-free agnosticism. You get to sound wise without committing to any falsifiable position.

Also, “I made my money already” doesn’t strengthen the argument. It sidesteps it. Being right once, or being lucky in a good cycle, doesn’t confer epistemic authority about a new technology. If anything, the whole point of contrarian insight is that it forces uncomfortable bets or at least uncomfortable predictions.

Engineers don’t evaluate systems by vibes or by motivational aphorisms. They ask: if this hypothesis is true, what would we expect to see? What would fail? What would be overhyped? What would not scale? You haven’t named any of that. You’ve just asserted that people fool themselves and stopped there.

threethirtytwo

5d ago

1 reply

What you’re describing is just competent engineering, and it’s already been applied to LLMs. People have been adversarial. That’s why we know so much about hallucinations, jailbreaks, distribution shift failures, and long-horizon breakdowns in the first place. If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.

The key point you’re missing is the type of failure. Search systems fail by not retrieving. Parrots fail by repeating. LLMs fail by producing internally coherent but factually wrong world models. That failure mode only exists if the system is actually modeling and reasoning, imperfectly. You don’t get that behavior from lookup or regurgitation.

This shows up concretely in how errors scale. Ambiguity and multi-step inference increase hallucinations. Scaffolding, tools, and verification loops reduce them. Step-by-step reasoning helps. Grounding helps. None of that makes sense for a glorified Google search.

Hallucinations are a real weakness, but they’re not evidence of absence of capability. They’re evidence of an incomplete reasoning system operating without sufficient constraints. Engineers don’t dismiss CNC machines because they crash bits. They map the envelope and design around it. That’s what’s happening here.

Being skeptical of reliability in specific use cases is reasonable. Concluding from those failure modes that this is just Clever Hans is not adversarial engineering. It’s stopping one layer too early.

habinero

5d ago

2 replies

> If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.

Absolutely not true. I cannot express how strongly this is not true, haha. The tech is neat, and plenty of real computer scientists work on it. That doesn't mean it's not wildly misunderstood by others.

> Concluding from those failure modes that this is just Clever Hans is not adversarial engineering.

I feel like you're maybe misunderstanding what I mean when I refer to Clever Hans. The Clever Hans story is not about the horse. It's about the people.

A lot of people -- including his owner-- were legitimately convinced that a horse could do math, because look, literally anyone can ask the horse questions and it answers them correctly. What more proof do you need? It's obvious he can do math.

Except of course it's not true lol. Horses are smart critters, but they absolutely cannot do arithmetic no matter how much you train them.

The relevant lesson here is it's very easy to convince yourself you saw something you 100% did not see. (It's why magic shows are fun.)

CamperBob2

5d ago

Except of course it's not true lol. Horses are smart critters, but they absolutely cannot do arithmetic no matter how much you train them.

How can anyone choose to remain so willfully ignorant in the face of irrefutable evidence that they're wrong?

https://arxiv.org/abs/2507.15855

threethirtytwo

4d ago

You’re leaning very hard on the Clever Hans story, but you’re still missing why the analogy fails in a way that should matter to an engineer.

Clever Hans was exposed because the effect disappeared under controlled conditions. Blind the observers, remove human cues, and the behavior vanished. The entire lesson of Clever Hans is not “people can fool themselves,” it’s “remove the hidden channel and see if the effect survives.” That test is exactly what has been done here, repeatedly.

LLM capability does not disappear when you remove human feedback. It does not disappear under automatic evaluation. It does not disappear across domains, prompts, or tasks the model was never trained or rewarded on. In fact, many of the strongest demonstrations people point to are ones where no human is in the loop at all: program synthesis benchmarks, math solvers, code execution tasks, multi-step planning with tool APIs, compiler error fixing, protocol following. These are not magic tricks performed for an audience. They are mechanically checkable outcomes.

Your framing quietly swaps “some people misunderstand the tech” for “therefore the tech itself is misunderstood in kind.” That’s a rhetorical move, not an argument. Yes, lots of people are confused. That has no bearing on whether the system internally models structure or just parrots. The horse didn’t suddenly keep solving arithmetic when the cues were removed. These systems do.

The “it’s about the people” point also cuts the wrong way. In Clever Hans, experts were convinced until adversarial controls were applied. With LLMs, the more adversarial the evaluation gets, the clearer the internal structure becomes. The failure modes sharpen. You start seeing confidence calibration errors, missing constraints, reasoning depth limits, and brittleness under distribution shift. Those are not illusions created by observers. They’re properties of the system under stress.

You’re also glossing over a key asymmetry. Hans never generalized. He didn’t get better at new tasks with minor scaffolding. He didn’t improve when the problem was decomposed. He didn’t degrade gracefully as difficulty increased. LLMs do all of these things, and in ways that correlate with architectural changes and training regimes. That’s not how self-deception looks. That’s how systems with internal representations behave.

I’ll be blunt but polite here: invoking Clever Hans at this stage is not adversarial rigor, it’s a reflex. It’s what you reach for when something feels too capable to be comfortable but you don’t have a concrete failure mechanism to point at. Engineers don’t stop at “people can be fooled.” They ask “what happens when I remove the channel that could be doing the fooling?” That experiment has already been run.

If your claim is “LLMs are unreliable for certain classes of problems,” that’s true and boring. If your claim is “this is all an illusion caused by human pattern-matching,” then you need to explain why the illusion survives automated checks, blind evaluation, distribution shift, and tool-mediated execution. Until then, the Hans analogy isn’t skeptical. It’s nostalgic.

jheez3

5d ago

1 reply

I wish there was a way to discern posts from legit clever people from the not-so.

Its annoying to see posts from people who lag behind in intelligence and just dont get it - people learn at different rates. Some see way further ahead.

threethirtytwo

5d ago

A good way to filter is to look in the mirror. Only the person in the mirror sees further ahead than anyone else.

Greduan

6d ago

1 reply

Text editors in a thousand flavours has indeed already been programmed though. I don't think you understood what op meant.

Curious, does it perform at the limit of the hardware? Was it programmed in a tools language (like C++, Rust, C, etc.) or in a web tech?

zingar

6d ago

What is the point that you believe would be demonstrated by a new text editor running at the limit of hardware in a compiled editor? Would that point apply to every other text editor that exists already?

nsxwolf

5d ago

The LLM didn't invent any new technology to do that, though. You used the LLM to reorganize Lego building blocks of knowledge into something new.

Without you, there was nothing.

fmbb

6d ago

Is your new text editor open source?

noosphr

6d ago

>Funny, I've used them to create my own personalized text editor, perfectly tailored to what I actually want. I'm pretty sure that didn't exist before.

Unless you time travelled from 1946 you should be aware that there has been at least one text editor written by human that LLMs were trained on.

ako

6d ago

1 reply

> This year honestly feels quite stagnant. LLMs are literally technology that can only reproduce the past.

Is this such a big limitation? Most jobs are basically people trained on past knowledge applying it today. No need to generate new knowledge.

mr_toad

5d ago

Most people are capable of long-term learning. Some people are capable of discovering and inventing new things. I think the two are related, and current NN architecture doesn’t allow this. An AI that can cobble together a CRUD application to spec is one thing. An AI that can come up with a new idea for a successful app on its own is a completely different ball game.

HarHarVeryFunny

5d ago

> LLMs are literally technology that can only reproduce the past.

That's incorrect on many levels. They are drawing upon, and reproducing, language patterns from "the past", but they are combining those patterns in ways that may have never have been seen before. They may not be truly creative, but they are still capable of generating novel outputs.

> They're cool, but they were way cooler 4 years ago.

Maybe this year has been more about incremental progress with LLMs than the shock/coolness factor of talking to an LLM for the first time, but the utility of them, especially for programming, has dramatically increased this year, really in the last 6 months.

The improvement in "AI" image and video generation has also been impressive, to the point now that fake videos on YouTube can often only be identified as such by common sense rather that the fact that they don't look real.

Incremental improvement can often be more impressive that innovation, whose future importance can be hard to judge when it first appears. How many people read "Attention is all you need" in 2017 and thought "Wow! This is going to change the world!". Not even the authors of the paper thought that.

waldrews

6d ago

I'm being hyperbolic of course, but I'm a little dismissive of the progress that happened since the days of BBS's and car based cell phones - we just got more connectivity, more capacity, more content, bigger/faster. Likewise, my attitude toward machine learning before 2023 is a smug 'heh, these computer scientists are doing undisciplined statistics at scale, how nice for them.' Then all of a sudden the machines woke up and started arguing with me, coherently, even about niche topics I have a PhD in. I can appreciate in retrospect how much of the machine learning progress ultimately went into that, but, like fusion, the magic payoff was supposed to be decades away and always remain decades away. This wasn't supposed to happen in my lifetime. 2025 progress isn't the 2023 shock, but this was the year LLM's-as-programmers (and LLM's-as-mathematicians, and...) went from 'isn't that cute, the machine is trying' to 'an expert with enough time would make better choices than the machine did,' and that makes for a different world. More so than, going from a Commodore Vic 20 with 4k of RAM and a modem to the latest Macbook.

odiroot

6d ago

2 replies

I'm very relieved we've moved away from rewriting everything in Rust.

jll29

6d ago

1 reply

There's no reason not to use Rust for LLM-generated code in the longer term (other than lack of Rust code to learn from in the shorter term).

The stricter typing of Rust would make sematic errors in generated code come out more quickly than in e.g. Python because using static typing the chances are that some of the semantic errors are also type violations.

yencabulator

4d ago

[delayed]

michaelcampbell

6d ago

Have we though? I'm glad we're not shouting about it from the rooftops like it's some magical "win" button as much, but TBH the things I use routinely that HAVE been rewritten in rust are generally much better. That could also just be because they're newer and have the errors of the past to not repeat.

sanreau

6d ago

4 replies

> Vendor-independent options include GitHub Copilot CLI, Amp, OpenHands CLI, and Pi

...and the best of them all, OpenCode[1] :)

[1]: https://opencode.ai

simonwAuthor

6d ago

1 reply

Good call, I'll add that. I think I mentally scrambled it with OpenHands.

the_mitsuhiko

6d ago

Thanks for adding pi to it though :)

d4rkp4ttern

6d ago

2 replies

Can OpenCode be used with the Claude Max or ChatGPT Pro subscriptions, i.e., without per-token API charges?

simonwAuthor

6d ago

1 reply

Apparently it does work with Claude Max: https://opencode.ai/docs/providers/#anthropic

I don't see a similar option for ChatGPT Pro. Here's a closed issue: https://github.com/sst/opencode/issues/704

williamstein

6d ago

There's a plugin that evidently supports ChatGPT Pro with Opencode: https://github.com/sst/opencode/issues/1686#issuecomment-349...

ewoodrich

6d ago

Yes, I use it with a regular Claude Pro subscription. It also supports using GitHub Copilot subscriptions as a backend.

nineteen999

6d ago

How did I miss this until now! Thank you for sharing.

logicprog

6d ago

I don't know why you're downloaded, OpenCode is by far the best.

the_mitsuhiko

6d ago

4 replies

> The (only?) year of MCP

I like to believe, but MCP is quickly turning into an enterprise thing so I think it will stick around for good.

simonwAuthor

6d ago

1 reply

I think it will stick around, but I don't think it will have another year where it's the hot thing it was back in January through May.

Alex-Programs

6d ago

1 reply

I never quite got what was so "hot" about it. There seems to be an entire parallel ecosystem of corporates that are just begging to turn AI into PowerPoint slides so that they can mould it into a shape that's familiar.

9dev

6d ago

One reason may be that it makes it a lot easier to open up a product to AI. Instead of adding a bad ChatGPT UI clone into your app, you inverse control and let external AI tools interact with your application and its data, thus giving your customers immediate benefits, while simultaneously sating your investors/founders/managers desire to somehow add AI.

MitziMoto

6d ago

1 reply

MCP isn't going anywhere. Some developers can't seem to see past their terminal or dev environment when it comes to MCP. Skills, etc do not replace MCP and MCP is far more than just documentation searching.

MCP is a great way for an LLM to connect to an external system in a standardized way and immediately understand what tools it has available, when and how to use them, what their inputs and outputs are,etc.

For example, we built a custom MCP server for our CRM. Now our voice and chat agents that run on elevenlabs infrastructure can connect to our system with one endpoint, understand what actions it can take, and what information it needs to collect from the user to perform those actions.

I guess this could maybe be done with webhooks or an API spec with a well crafted prompt? Or if eleven labs provided an executable environment with tool calling? But at some point you're just reinventing a lot of the functionality you get for free from MCP, and all major LLMs seem to know how to use MCP already.

simonwAuthor

6d ago

1 reply

Yeah, I don't think I was particularly clear in that section.

I don't think MCP is going to go away, but I do think it's unlikely to ever achieve the level of excitement it had in early 2025 again.

If you're not building inside a code execution environment it's a very good option for plugging tools into LLMs, especially across different systems that support the same standard.

But code execution environments are so much more powerful and flexible!

I expect that once we come up with a robust, inexpensive way to run a little Bash environment - I'm still hoping WebAssembly gets us there - there will be much less reason to use MCP even outside of coding agent setups.

brabel

6d ago

1 reply

I disagree. MCP will remain the best way to do most things for the same reason REST APIs are the main way to access non local services: they provide a way to secure and audit access to systems in a way that a coding environment cannot. And you can authorize actions depending on the well defined inputs and outputs. You can’t do that using just a bash script unless said script actually does SSO and calls REST APIs but then you just have a worse MCP client without any interoperability.

the_mitsuhiko

6d ago

I find it very hard to pick winners and losers in this environment where everything changes so quickly. Right now a lot of people are using bash as a glue environment for agents, even if they are not for developers.

cloudking

6d ago

1 reply

For connecting agents to third-party systems I prefer CLI tools, less context bloat and faster. You can define the CLI usage in your agent instructions. If the MCP you're using doesn't exist as a CLI, build one with your agent.

martinald

5d ago

Totally agree - wrote this over the holidays which sums it all pretty well https://martinalderson.com/posts/why-im-building-my-own-clis...

nrhrjrjrjtntbt

6d ago

MCP or skills? Can a skill negate the need for MCP. In addition there was a YC startup who is looking at searching docs for LLMs or similar. I think MCP may be less needed once you have skills, openapi specs, and other things that LLMs can call directly.

npalli

6d ago

1 reply

Great summary of the year in LLMs. Is there a predictions (for 2026) blogpost as well?

simonwAuthor

6d ago

2 replies

Given how badly my 2025 predictions aged I'm probably going to sit that one out! https://simonwillison.net/2025/Jan/10/ai-predictions/

zahlman

6d ago

1 reply

Making predictions is useful even when they turn out very wrong. Consider also giving confidence levels, so that you can calibrate going forward.

jjude

6d ago

I use predictions to prepare rather than to plan.

Planing depends on deterministic view of the future. I used to plan (esp annual plans) until about 5 years. Now I scan for trends and prepare myself for different scenarios that can come in the future. Even if you get it approximately right, you stand apart.

For tech trends, I read Simon, Benedict Evans, Mary Meeker etc. Simon is in a better position make these predictions than anyone else having closely analyzed these trends over the last few years.

Here I wrote about my approach: https://www.jjude.com/shape-the-future/

DANmode

6d ago

Don’t be a bad sport, now!!

skydhash

6d ago

3 replies

Pretty much a whole year of nothing really. Just coming with a bunch of abstraction and ideas trying to solve an unsolvable problem.

MattRix

6d ago

2 replies

I’m not sure how to tell you how obvious it is you haven’t actually used these tools.

skydhash

6d ago

4 replies

Why do people assume negative critique is ignorance?

dmd

6d ago

5 replies

People denied that bicycles could possibly balance even as others happily pedaled by. This is the same thing.

measurablefunc

6d ago

3 replies

Bicycles don't balance, the human on the bicycle is the one doing the balancing.

dmd

6d ago

1 reply

Yes, that is the analogy I am making. People argued that bicycles (a tool for humans to use) could not possibly work - even as people were successfully using them.

measurablefunc

6d ago

People use drugs as well but I'm not sure I'd call that successful use of chemical compounds. There are many analogies one can apply here that would be equally valid.

duchef

6d ago

1 reply

Bicycles do balance at sufficient speed via a self steering and correction mechanism of the front axle..

measurablefunc

5d ago

So does a tire rolling down a hill.

moralestapia

6d ago

Yikes.

skydhash

6d ago

2 replies

Please tell me which one of the headings is not about increased usage o LLMs and derived tools and is about some improvement in the axes of reliability or or any kind of usefulness.

Here is the changelog for OpenBSD 7.8:

https://www.openbsd.org/78.html

There's nothing here that says: We make it easier to use it more of it. It's about using it better and fixing underlying problems.

simonwAuthor

6d ago

2 replies

The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.

Mistakes and hallucinations matter a whole lot less if a reasoning LLM can try the code, see that it doesn't work and fix the problem.

skydhash

6d ago

2 replies

> The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.

Does it? It's all prompt manipulation. Shell script are powerful yes, but not really huge improvement over having a shell (REPL interface) to the system. And even then a lot of programs just use syscalls or wrapper libraries.

> can try the code, see that it doesn't work and fix the problem.

Can you really say that does happens reliably?

dham

6d ago

You're welcome to try the LLM's yourself and come up with your own conclusions. By what you've posted it doesn't look like you've tried the anything in the last 2 years. Yes LLM's can be annoying, but there has been progress.

simonwAuthor

6d ago

Depends on what you mean by "reliably".

If you mean 100% correct all of the time then no.

If you mean correct often enough that you can expect it to be a productive assistant that helps solve all sorts of problems faster than you could solve them without it, and which makes mistakes infrequently enough that you waste less time fixing them than you would doing everything by yourself then yes, it's plenty reliable enough now.

walt_grata

6d ago

If it actually does that without an argument. I can't believe I have to say that about a computer program

noodletheworld

6d ago

1 reply

I know it seems like forever ago, but claude code only came out in 2025.

Its very difficult to argue the point that claude code:

1) was a paradigm shift in terms of functionality, despite, to be fair, at best, incremental improvements in the underlying models.

2) The results are an order of magnitude, I estimate, better in terms of output.

I think its very fair to distill “AI progress 2025” to: you can get better results (up to a point; better than raw output anyway) without better models with clever tools and loops. (…and video/image slop infests everuthing :p).

bandrami

6d ago

2 replies

Did more software ship in 2025 than in 2024? I'm still looking for some actual indication of output here. I get that people feel more productive but the actual metrics don't seem to agree.

skydhash

6d ago

I'm still waiting for the Linux drivers to be written because of all the 20x improvements that AI hypers are touting. I would even settle for Apple M3 and M4 computers to be supported by Asahi.

noodletheworld

6d ago

I am not making any argument about productivity about using AI vs. not using AI.

My point is purely that, compared to 2024, the quality of the code produced by LLM inference agent systems is better.

To say that 2025 was a nothing burger is objectively incorrect.

Will it scale? Is it good enough to use professionally? Is this like self driving cars where the best they ever get is stuck with an odd shaped traffic cone? Is it actually more productive?

Who knows?

Im just saying… LLM coding in 2024 sucked. 2025 was a big year.

rhubarbtree

6d ago

1 reply

It’s possible this is correct.

It’s also possible that people more experienced, knowledgable and skilled than you can see fundamental flaws in using LLMs for software engineering that you cannot. I am not including myself in that category.

I’m personally honestly undecided. I’ve been coding for over 30 years and know something like 25 languages. I’ve taught programming to postgrad level, and built prototype AI systems that foreshadowed LLMs, I’ve written everything from embedded systems to enterprise, web, mainframes, real time, physics simulation and research software. I would consider myself an 7/10 or 8/10 coder.

A lot of folks I know are better coders. To put my experience into context: one guy in my year at uni wrote one of the world’s most famous crypto systems; another wrote large portions of some of the most successful games of the last few decades. So I’ve grown up surrounded by geniuses, basically, and whilst I’ve been lectured by true greats I’m humble enough to recognise I don’t bleed code like they do. I’m just a dabbler. But it irks me that a lot of folks using AI profess it’s the future but don’t really know anything about coding compared to these folks. Not to be a Luddite - they are the first people to adopt new languages and techniques, but they also are super sceptical about anything that smells remotely like bullshit.

One of the most wise insights in coding is the aphorism“beware the enthusiasm of the recently converted.” And I see that so much with AI. I’ve seen it with compilers, with IDEs, paradigms, and languages.

I’ve been experimenting a lot with AI, and I’ve found it fantastic for comprehending poor code written by others. I’ve also found it great for bouncing ideas. And the code it writes, beyond boiler plate, is hot garbage. It doesn’t properly reason, it can’t design architecture, it can’t write code that is comprehensible to other programmers, and treating it as a “black box to be manipulated by AI” just leads to dead ends that can’t be escaped, terrible decisions that will take huge amounts of expert coding time to undo, subtle bugs that AI can’t fix and are super hard to spot, and often you can’t understand their code enough to fix them, and security nightmares.

Testing is insufficient for good code. Humans write code in a way that is designed for general correctness. AI does not, at least not yet.

I do think these problems can be solved. I think we probably need automated reasoning systems, or else vastly improved LLMs that border on automated reasoning much like humans do. Could be a year. Could be a decade. But right now these tools don’t work well. Great for vibe coding, prototyping, analysis, review, bouncing ideas.

CamperBob2

5d ago

1 reply

But right now these tools don’t work well. Great for vibe coding, prototyping, analysis, review, bouncing ideas.

What are some of the models you've been working with?

rhubarbtree

2d ago

All the major models from Anthropic, OpenAI, Google. I’ve probably used Gemini the least.

blibble

6d ago

1 reply

people also said that selling jpegs of monkeys for millions of dollars was a pump and dump scam, and would collapse

they were right

sothatsit

6d ago

JPEGs with no value other than fake scarcity is very different to coding agents that people actively use to ship real code.

tehnub

6d ago

People did?

kakapo5672

6d ago

1 reply

Whenever someone tells me that AI is worthless, does nothing, scam/slop etc, I ask them about their own AI usage, and their general knowledge about what's going on.

Invariably they've never used AI, or at most very rarely. (If they used AI beyond that, this would be admission that it was useful at some level).

Therefore it's reasonable to assume that you are in that boat. Now that might not be true in your case, who knows, but it's definitely true on average.

snigsnog

6d ago

It's not worthless, it's just not worldchanging as is even in the fields where it's most useful, like programming. If the trajectory changes and we reach AGI then this changes too but right now it's just a way to

- fart out demos that you don't plan on maintaining, or want to use as a starting place

- generate first-draft unit tests/documentation

- generate boilerplate without too much functionality

- refactor in a very well covered codebase

It's very useful for all of the above! But it doesn't even replace a junior dev at my company in its current state. It's too agreeable, makes subtle mistakes that it can't permanently correct (GEMINI.md isn't a magic bullet, telling it to not do something does not guarantee that it won't do it again), and you as the developer submitting LLM-generated code for review need to review it closely before even putting it up (unless you feel like offloading this to your team) to the point that it's not that much faster than having written it yourself.

sothatsit

6d ago

You did not make a negative critique. You completely dismissed the value of coding agents. Your argument is completely devoid of any substance, and that is why people will assume you have not actually used these tools at all.

LewisVerstappen

6d ago

because your "negative critique" is just idiotic and wrong

dang

5d ago

Please don't respond to a bad comment by breaking the site guidelines yourself. That only makes things worse.

https://news.ycombinator.com/newsguidelines.html

senordevnyc

6d ago

1 reply

This comment is legitimately hilarious to me. I thought it was satire at first. The list of what has happened in this field in the last twelve months is staggering to me, while you write it off as essentially nothing.

Different strokes, but I’m getting so much more done and mostly enjoying it. Can’t wait to see what 2026 holds!

ronsor

6d ago

People who dislike LLMs are generally insistent that they're useless for everything and have infinitely negative value, regardless of facts they're presented with.

Anyone that believes that they are completely useless is just as deluded as anyone that believes they're going to bring an AGI utopia next week.

n2d4

6d ago

This is extremely dismissive. Claude Code helps me make a majority of changes to our codebase now, particularly small ones, and is an insane efficiency boost. You may not have the same experience for one reason or another, but plenty of devs do, so "nothing happened" is absolutely wrong.

2024 was a lot of talk, a lot of "AI could hypothetically do this and that". 2025 was the year where it genuinely started to enter people's workflows. Not everything we've been told would happen has happened (I still make my own presentations and write my own emails) but coding agents certainly have!

431 more comments available on Hacker News

View full discussion on Hacker News

ID: 46449643Type: storyLast synced: 1/3/2026, 11:30:35 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN