Reflections on AI at the End of 2025
Key topics
As the AI landscape heads toward 2026, a thought-provoking discussion is unfolding about the potential trajectory of AI development, sparked by the author's reflections on the field's future. Commenters weigh in on the implications of AI-optimized code, with some noting that optimizing for speed tends to produce code that is harder to read and maintain, a fate not unique to AI, since human-optimized code often suffers the same way. The conversation then turns to existential risk: the original author clarifies that the "avoiding extinction" remark refers to AI safety concerns, prompting some to point to resources on the topic and others to dismiss the notion as alarmist. As the boundary between human-written and AI-generated code continues to blur, the discussion highlights the need to reexamine assumptions about the role of AI in software development.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 11m after posting
- Peak period: 135 comments in 0-12h
- Avg / period: 17.8
- Based on 160 loaded comments
Key moments
- Story posted: Dec 20, 2025 at 4:38 AM EST (14 days ago)
- First comment: Dec 20, 2025 at 4:49 AM EST (11m after posting)
- Peak activity: 135 comments in the 0-12h window, the hottest period of the conversation
- Latest activity: Dec 25, 2025 at 2:07 AM EST (9 days ago)
This makes me think: I wonder if Goodhart's law[1] may apply here. I wonder if, for instance, optimizing for speed may produce code that is faster but harder to understand and extend. Should we care or would it be ok for AI to produce code that passes all tests and is faster? Would the AI become good at creating explanations for humans as a side effect?
And if Goodhart's law doesn't apply, why is that? Is it because we're only doing RLVR fine-tuning on the last layers of the network, so all the generality of the pre-training is not lost? And if this is the case, could this be a limitation in not being able to be creative enough to come up with move 37?
[1] https://wikipedia.org/wiki/Goodhart's_law
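As a toy illustration of the Goodhart concern (my own example, not from the thread): both functions below pass the same tests, and the second is the kind of branch-free rewrite a speed-only objective tends to reward, but only the first is easy to follow and extend.

```python
def popcount_readable(x: int) -> int:
    """Count set bits of a 32-bit value, one bit at a time."""
    count = 0
    while x:
        count += x & 1
        x >>= 1
    return count

def popcount_swar(x: int) -> int:
    """Classic branch-free SWAR popcount for 32-bit values: fewer operations,
    but the intent is opaque without a comment like this one."""
    x = x - ((x >> 1) & 0x55555555)
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24

# Both agree with the obvious specification on a healthy range of inputs.
assert all(popcount_readable(n) == popcount_swar(n) == bin(n).count("1")
           for n in range(1 << 16))
```

A reward signal that only measures correctness and latency would treat the two as interchangeable.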
Superoptimizers have been around since 1987: https://en.wikipedia.org/wiki/Superoptimization
They generate fast code that is not meant to be understood or extended.
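To make "not meant to be understood" concrete, here is a toy brute-force search in the spirit of a superoptimizer (illustrative only; real superoptimizers search over machine instructions, not Python expressions). It enumerates small expression trees and keeps the first one that matches a readable reference function on all test inputs.

```python
# Toy "superoptimizer": enumerate small expression trees over {x, 0, 1} and a
# handful of ops, and keep the first tree that matches a readable reference
# function on every test input.
import itertools

OPS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
}

def reference(x):
    """Clear the lowest set bit, written the slow, obvious way."""
    for i in range(x.bit_length()):
        if x & (1 << i):
            return x & ~(1 << i)
    return x

def evaluate(expr, x):
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, a, b = expr
    return OPS[op](evaluate(a, x), evaluate(b, x))

def render(expr):
    if expr == "x" or isinstance(expr, int):
        return str(expr)
    op, a, b = expr
    return f"({render(a)} {op} {render(b)})"

leaves = ["x", 0, 1]
depth1 = [(op, a, b) for op in OPS for a in leaves for b in leaves]
depth2 = [(op, a, b) for op in OPS
          for a, b in itertools.product(leaves + depth1, repeat=2)]

tests = list(range(64))
for expr in leaves + depth1 + depth2:
    if all(evaluate(expr, x) == reference(x) for x in tests):
        print("found:", render(expr))
        break
```

In this toy setup the search comes back with `(x & (x - 1))`: terse, correct, and with no explanation attached, which is exactly the flavor of output described here.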
When people use LLMs to improve their code, they commit their output to Git to be used as source code.
It should be optimized for readability by AI. If a human wants to know what a given bit of code does, they can just ask.
This is generally true for code optimised by humans, at least for the sort of mechanical low level optimisations that LLMs are likely to be good at, as opposed to more conceptual optimisations like using better algorithms. So I suspect the same will be true for LLM-optimised code too.
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
https://lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-lis...
Denoising diffusion models benefited a lot from the U-Net, which is a pretty simple network (compared to a transformer) and very well adapted to the denoising task. Plus, diffusion on images is great to research because it's very easy to visualize, and therefore to wrap your head around.
Doing diffusion on text is a great idea, but my intuition is that it will prove more challenging and will probably take a while before we get something working.
If you know labs / researchers working on the topic, I'd love to read their pages / papers.
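For intuition on what "diffusion on text" can mean, here is a minimal sketch of the forward (noising) step of masked discrete diffusion, one common formulation; the denoising model and the reverse process are omitted, and the example is my own, not from any particular paper.

```python
# Forward "noising" step of masked discrete diffusion for text: at noise level
# t in [0, 1], each token is independently replaced by a MASK symbol with
# probability t. A denoising model would be trained to predict the original
# token at every masked position (cross-entropy), and sampling would run the
# process in reverse starting from an all-MASK sequence.
import random

MASK = "<mask>"

def corrupt(tokens, t, rng=random):
    """Replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

tokens = "the cat sat on the mat".split()
for t in (0.25, 0.5, 0.9):
    print(t, corrupt(tokens, t, random.Random(0)))
```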
That's a weird thing to end on. Surely it's worth more than one sentence if you're serious about it? As it stands, it feels a bit like the fearmongering Big Tech CEOs use to drive up the AI stocks.
If AI is really that powerful and I should care about it, I'd rather hear about it without the scare tactics.
There is plenty of material on the topic. See for example https://ai-2027.com/ or https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...
(Also, just on phrasing: I think you're saying "nobody who seriously understands how AI works is taking the topic of AI extinction seriously", because it's incredibly silly.)
Oil companies: we are causing global warming with all these carbon emissions, are you scared yet? so buy our stock
Pharma companies: our drugs are unsafe, full of side effects, and kill a lot of people, are you scared yet? so buy our stock
Software companies: our software is full of bugs, will corrupt your files and make you lose money, are you scared yet? so buy our stock
Classic marketing tactics, very effective.
Also "my product will kill you and everyone you care about" is not as great a marketing strategy as you seem to imply, and Big Tech CEOs are not talking about risks anymore. They currently say things like "we'll all be so rich that we won't need to work and we will have to find meaning without jobs"
The creator of Redis.
> woah buddy this persons opinion isn’t worth anything more than a random homeless person off the street. they’re not an expert in this field
Is there a term for this kind of pedantry? Obviously we can put more weight behind a person's words if they've proven themselves trustworthy in prior areas, and we should! We want all people to speak and let the best idea win. If we fall back to "only expert opinions are allowed", that's asking to get exploited. And it's also important to know whether antirez feels comfortable spouting nonsense.
This is like a basic cornerstone of a functioning society. Though I realize this "no man is innately better than another, evaluate on merit" idea is mostly a Western concept, which might be some of my confusion.
no, you shouldn't
this is how you end up with crap like vaccine denialism going mainstream
"but he's a doctor!"
We've got Avi Loeb on mainstream podcasts and TV spouting baseless alien nonsense. He's preeminent in his field, after all.
Focus on what you understand. If you don't understand, learn more.
His entirely unsupported statements about AGI are pretty useless, for instance.
So many people assume AGI is possible, yet no one has a concrete path to it, or even a concrete definition of what it is or what form it might take.
Accomplishment in one field does not make one an expert, nor even particularly worth listening to, in any other. Certainly it doesn't remove the burden of proof or the necessity to make an actual argument based on more than simply insisting something is true.
[0] https://en.wikipedia.org/wiki/Nobel_disease
It's not the case that every form of writing has to be an academic research paper. Sometimes people just think things, and say them; they may be wrong, or they may be right. And they sometimes have ideas that might change how you think about an issue as a result.
[0] https://redis.io/redis-for-ai/
antirez is not a business decision maker at Redis Ltd.
He may not be part of "they".
Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.
https://news.ycombinator.com/newsguidelines.html
Another one, though:
Please don't comment about the voting on comments. It never does any good, and it makes boring reading.
I'll design a system for the senate that lets outside voters first turn down a speaker's microphone volume if he says that another senator works for company X, and then remove him from the floor. That'll be a great success for democracy and "intellectual curiosity", which is also in the guidelines.
So nice to see people who think about this seriously converge on this. Yes. Creating something smarter than you was always going to be a sketchy prospect.
All of the folks insisting it just couldn't happen or ... well, there have just been so many objections. The goalposts have walked from one side of the field to the other, and then left the stadium, went on a trip to Europe, got lost in a beautiful little village in Norway, and decided to move there.
All this time, though, there has been the prospect of instantiating something smarter than you (and yes, it will be smarter than you even if it's only at human level, because of electronic speeds). Dismissing that was always very silly.
Sure, but not so sure that this has any relevance to the topic at hand. You seem to be taking the assumption that LLMs can ever reach that level for granted.
It may be possible that all it takes is scaling up and at some point some threshold gets reached past which intelligence emerges. Maybe.
Personally, I'm more on board with the idea that since LLMs display approximately 0 intelligence right now, no amount of scaling will help and we need a fundamentally different approach if we want to create AGI.
Around the world people ask an LLM and get a response.
Just grouping and analysing these questions and solving them once centrally and then making the solution available again is huge.
Linearly solving the most-asked questions, then the next one, then the next, will make whatever system is behind it smarter every day.
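A minimal sketch of the "solve once centrally, reuse the answer" idea described above, using a naive exact-match cache (a real system would presumably cluster semantically similar questions; all names here are made up):

```python
import hashlib

class AnswerCache:
    """Toy illustration of answering each distinct question once and reusing
    the stored result afterwards. Exact-match only."""

    def __init__(self, solver):
        self._solver = solver      # the expensive call, e.g. an LLM
        self._store = {}           # normalized-question hash -> answer
        self.hits = 0
        self.misses = 0

    def _key(self, question: str) -> str:
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def ask(self, question: str) -> str:
        key = self._key(question)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._solver(question)
        return self._store[key]

# Usage with a stand-in "solver":
cache = AnswerCache(lambda q: f"answer to: {q}")
cache.ask("How do I reverse a list in Python?")
cache.ask("how do I reverse a list in   python?")   # served from cache
print(cache.hits, cache.misses)                      # 1 1
```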
I wonder how a "programmers + AI" self-improving loop is different from an "AI only" one.
AGI will also be generic.
LLMs are already very impressive, though.
I'm not super up-to-date on all that's happening in AI-land, but in this quote I can find something that most techno-enthusiasts seem to have decided to ignore: no, code is not free. Immense resources (energy, water, materials) go into these data centers in order to produce this "free" code, and the material consequences are terribly damaging to thousands of people. With the further construction of data centers to feed this free vibe-coding style, we're further destroying parts of the world. Well done, AGI loverboys.
Not really. Most corn grown in the US isn’t even fit for consumption. It is primarily used for fermenting bioethanol.
- drive to the store or to work
- take a shower
- eat meat
- fly on vacation
Etc.
If you don't do that, and are a homesteader, then yes: you are a very small minority outlier. (Assuming you aren't ordering supplies delivered instead of driving to the store.)
> Eat meat.
Yes, not eating meat is in the minority.
> Fly on vacation.
So, don't vacation, walk to vacation, or drive to vacation? 1/3 are also consumptive.
It seems you are either a very significant outlier, or you're being daft. I'm curious which. Would you mind clarifying?
For holidays, we did a cycling holiday with our children. They loved it!
I don’t at all feel like an outlier, many friends do similar things.
We have a backward orange fool running things for gems like this: https://news.ycombinator.com/item?id=46357881
But around here it's just as much about local political issues as national ones.
Does your clairvoyance go any further than 2027?
If you assume that we're only one breakthrough away (or zero breakthroughs - just need to train harder), then the step could happen any time. If we're more than one away, though, then where are they? Are they all going to happen in the next two years?
But everybody's guessing. We don't know right now whether AGI is possible at current hardware levels. If it is N breakthroughs away, we all have our own guesses of approximately what N is.
My guess is that we are more than one breakthrough away. Therefore, one can look at the current state of affairs and say that we are unlikely to get to AGI by 2027.
why are you so sensitive?
Man, Antirez and I walk in very different circles! I still feel like LLMs fall over backwards once you give them an 'unusual' or 'rare' task that isn't likely to be presented in the training data.
I haven't.
When it comes to being able to do novel tasks on known knowledge, they seem to be quite good. One also needs to consider that problem-solving patterns are also a kind of (meta-)knowledge that needs to be taught, either through imitation/memorisation (Supervised Learning) or through practice (Reinforcement Learning). They can be logically derived from other techniques to an extent, just like new knowledge can be derived from known knowledge in general, and again LLMs seem to be pretty decent at this, but only to a point. But all of this too is definitely true of humans.
I’ve seen them do fine on tasks that are clearly not in the training data, and it seems to me that they struggle when some particular type of task or solution or approach might be something they haven’t been exposed to, rather than the exact task.
In the context of the paragraph you quoted, that’s an important distinction.
It seems quite clear to me that they are getting at the meaning of the prompt and are able, at least somewhat, to generalise and connect aspects of their training to “plan” and output a meaningful response.
This certainly doesn't seem all that deep (at times frustratingly shallow), and I can see how at first glance it might look like everything was just regurgitated training data, but my repeated experience (especially over the last ~6-9 months) is that there's something more than that happening, which feels like what Antirez was getting at.
Here we go again. Statements whose only source is the head of the speaker. And it's also not true: LLMs still produce bad/irrelevant code at such a rate that you can spend more time prompting than doing things yourself.
I'm tired of this overestimation of LLMs.
Are you talking about punching something into some LLM web chat that's disconnected from your actual codebase and has tooling like web search disabled? If so, that's not really the state of the art of AI assisted coding, just so you know.
Your statement not only comes solely from your own head, with no evidence that you've actually tried to learn to use these tools; it also goes against the weight of evidence that I see both in my professional network and online.
I am aware of simple routine tasks that LLMs can do. This doesn’t change anything about what I said.
I swear, the so called critics need everything spoon fed.
You're making the same sort of baseless claim you are criticising the blogger for making. Spewing baseless claims hardly moves any discussion forward.
> LLMs still produce bad/irrelevant code at such a rate that you can spend more time prompting than doing things yourself.
If that is your personal experience, then I regret to tell you that it is only a reflection of your own inability to work with LLMs and coding agents. Meanwhile, I personally manage to use LLMs effectively for everything between small refactoring needs and large software architecture designs, including generating fully working MVPs in one-shot agent prompts. From that alone it's rather obvious whose statements are better aligned with reality.
Indeed, he said the same as a reflection on 2024 models:
https://news.ycombinator.com/item?id=42561151
It is always the fault of the "luser" who is not using and paying for the latest model.
And, as much as what I’ve just said is hyperbolically pessimistic, there is some truth to it.
In the UK a bunch of factors have coincided to put the brakes on hiring, especially at smaller and mid-size businesses. AI is the obvious one that gets all the press (although how much it's really to blame is open to question, in my view), but the recent rise in employer NI contributions, and now (anecdotally) the employee rights bill, have come together to make companies quite gun-shy when it comes to hiring.
This is true for everything, any tool you might use. Competent users of tools understand how they work and thus their limitations and how they're best put to work.
Incompetents just fumble around and sometimes get things working.
Programming is more like math than creative writing. It's largely verifiable, which is where RL has repeatedly been proven to eventually achieve significantly better-than-human intelligence.
Our saving grace, for now, is that it's not entirely verifiable because things like architectural taste are hard to put into a test. But I would not bet against it.
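To make "largely verifiable" concrete, here is a sketch of the kind of reward function an RLVR-style setup can use for code: execute the candidate and score it by the fraction of unit tests it passes. This is my own toy version; real pipelines sandbox execution and add timeouts, which this deliberately omits.

```python
# Toy "verifiable reward" for code generation: define the candidate function
# from its source string and score it by the fraction of unit tests it passes.
# No sandboxing or timeouts here; real systems isolate execution.

def code_reward(candidate_src: str, tests: list, func_name: str = "solve") -> float:
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)          # define the candidate function
        func = namespace[func_name]
    except Exception:
        return 0.0                              # does not even run
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                                # runtime error counts as a fail
    return passed / len(tests)

tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
print(code_reward("def solve(a, b):\n    return a + b", tests))   # 1.0
print(code_reward("def solve(a, b):\n    return a - b", tests))   # ~0.33
```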
Well, let's see how all the economics will play out. LLMs might be really useful, but as far as I can see, none of the AI companies are making money on inference alone. We might be hitting a plateau in capabilities, with money being raised on the vision of this being godlike tech that will change the world completely. Sooner or later the costs will have to meet reality.
I'm not gonna dig out the math again, but if AI usage follows the popularity path of cell phone usage (which seems to be the case), then the trillions invested have an ROI of 5-7 years. Not bad at all.
Now you have a world of people who have become accustomed to using AI for tons of different things, and the enshittification starts ramping up, and you find out how much people are willing to pay for their ChatGPT therapist.
They don't have to spend all their cash at once on the 30 GW of data center commitments.
Why go on the internet and tell stupid lies?
The numbers aren’t public, but from what companies have indicated it seems inference itself would be profitable if you could exclude all of the R&D and training costs.
But this debate about startups losing money happens endlessly with every new startup cycle. Everyone forgets that losing money is an expected operating mode for a high growth startup. The models and hardware continue to improve. There is so much investment money accelerating this process that we have plenty of runway to continue improving before companies have to switch to full profit focus mode.
But even if we ignore that fact and assume they had to switch to profit mode tomorrow, LLM plans are currently so cheap that even a doubling or tripling isn’t going to be a problem. So what if the monthly plans start at $40 instead of $20 and the high usage plans go from $200 to $400 or even $600? The people using these for their jobs paying $10K or more per month can absorb that.
That’s not going to happen, though. If all model progress stopped right now the companies would still be capturing cheaper compute as data center buildouts were completed and next generation compute hardware was released.
I see these predictions as the current equivalent of all of the predictions that Uber was going to collapse when the VC money ran out. Instead, Uber quietly settled into steady operation, prices went up a little bit, and people still use Uber a lot. Uber did this without the constant hardware and model improvements that LLM companies benefit from.
LLMs have a short shelf life. They don't know anything of the world past the day they were trained. It's possible to feed or fine-tune them a bit of updated data, but their world knowledge and views are firmly stuck in the past.
They could save on R&D but I expect training costs will be recurring regardless of advancements in capability.
Having good-quality dev tools is non-negotiable, and I have a feeling that a lot of people are going to find out the hard way that reliability, and not being owned by a profit-seeking company, is the #1 thing you want in your environment.
This was the missed point on why GPT5 was such an important launch (quality of models and vibes aside). It brought the model sizes (and hence inference cost) to more sustainable numbers. Compared to previous SotA (GPT4 at launch, or o1/3 series), GPT5 is 8x-12x cheaper! I feel that a lot of people never re-calibrated their views on inference.
And there's also another place where you can verify your take on inference - the 3rd party providers that offer "open" models. They have 0 incentive to subsidise prices, because people that use them often don't even know who serves them, so there's 0 brand recognition (say when using models via openrouter).
These 3rd-party providers have all converged towards a price point per billion params. And you can check those prices and get an idea of what would be profitable and at what sizes. Models like dsv3.2 are really, really cheap to serve for what they provide (at least gpt5-mini equivalent, I'd say).
So yes, labs could totally become profitable with inference alone. But they don't want that, because there's an argument to be made that the best will "keep it all". I hope, for our sake as consumers, that it isn't the case. And so far this year it seems that it's not the case. We've had all 4 big labs one-up each other several times, and they're keeping each other honest. And that's good for us. We get frontier-level offerings at 10-25$/MTok (Opus, gpt5.2, gemini3pro, grok4), and we get highly capable yet extremely cheap models at 1.5-3$/MTok (gemini3-flash, gpt-minis, grok-fast, etc.)
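For readers who want to sanity-check the serving-cost claim, the back-of-envelope has a simple shape: cost per million tokens is roughly hourly hardware cost divided by tokens served per hour. Every number plugged in below is a placeholder assumption, not a vendor figure.

```python
# Back-of-envelope inference economics (shape only; the inputs are assumptions):
# break-even price per million output tokens for a given hardware cost and
# sustained aggregate throughput.

def cost_per_mtok(gpu_hour_usd: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000

# Hypothetical example: a node costing $20/hour that sustains 2,000 tok/s
# across batched requests breaks even at roughly $2.78 per million tokens.
print(round(cost_per_mtok(20.0, 2000.0), 2))
```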
Whenever I ask a SOTA model about architecture recommendations, and frame the problem correctly, I get top notch answers every single time.
LLMs are terrific software architects. And that’s not surprising, there has to be tons of great advice on how to correctly build software in the training corpus.
They simply aren’t great software architects by default.
I spend a couple of hours per week teaching software architecture to a junior on my team, because he doesn't yet have the experience not only to ask correctly, but also to assess the quality of the answer from the LLM.
For me LLMs are a waste of time.
Sometimes I would get stuck on something for a few hours or even a day (or more!), this is time where I would engage in deep research, learn new theory and algorithms, expand my horizons a little bit. Deeply internalizing knowledge that becomes extremely useful in the long run. Later on I've even used that knowledge to get better jobs with more pay.
Now I can just tell the chatbot to do it for me. I learn nothing. I get stuck again on something a week from now. I can't fall back on anything I learned prior. I don't do research. I don't learn anything. I just churn out slop and keep moving.
My job is now just a sweatshop. I go in, do the tasks, don't think about anything. I hate my life. The last 20 years of my life, completely worthless. Parts of it even sucked up into the very tool that is now used to kill any enjoyment of a passion I turned into a career.
It’s basically the same idea but faster.
Super skeptical of this claim. Yes, if I have some toy, poorly optimized Python example or maybe a sorting algorithm in ASM, but this won't work in any non-trivial case. My intuition is that the LLM will spin its wheels at a local minimum whose performance is overdetermined by millions of black-box optimizations in the interpreter or compiler, signal from which is not fed back to the LLM.
Earlier this year google shared that one of their projects (I think it was alphaevolve) found an optimisation in their stack that sped up their real world training runs by 1%. As we're talking about google here, we can be pretty sure it wasn't some trivial python trick that they missed. Anyhow, at ~100M$ / training run, that's a 1M$ save right there. Each and every time they run a training run!
And in the past month google also shared another "agentic" workflow where they had gemini-2.5-flash (their previous-gen "small" model) work autonomously on migrating codebases to support the aarch64 architecture. There they found ~30% of the projects worked flawlessly end-to-end. Whatever costs they save from switching to ARM will translate into real-world $ saved (at google scale, those can add up quickly).
“Optimize” in a vacuum is a tarpit for an LLM agent today, in my view. The Google case is interesting, but 1%, while significant at Google scale, doesn't move the needle much in terms of statistical significance. It would be more interesting to see the exact operation and the speedup achieved relative to the prior version. But it's data contrary to my view, for sure. The cynic also notes that Google is in the LLM hype game now, too.
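One way to address the "signal is not fed back to the LLM" concern raised in this exchange is to close the loop with measurements: a candidate rewrite is kept only if it still passes the tests and benchmarks faster. Below is a minimal sketch of such a harness; `run_tests` and `propose_optimization` (the LLM call) are hypothetical stand-ins, not real APIs.

```python
# Measure-and-verify loop around a code-optimizing model: accept a candidate
# only if it remains correct AND is measurably faster than the current best.
import timeit

def benchmark(src: str, call: str, number: int = 100) -> float:
    """Execute `src`, then time `call` against what it defines (seconds)."""
    ns: dict = {}
    exec(src, ns)
    return timeit.timeit(call, globals=ns, number=number)

def optimize_with_feedback(source: str, call: str, run_tests,
                           propose_optimization, rounds: int = 5) -> str:
    best_src, best_time = source, benchmark(source, call)
    for _ in range(rounds):
        candidate = propose_optimization(best_src, best_time)  # LLM proposes a rewrite
        if not run_tests(candidate):
            continue                                           # reject: correctness broke
        t = benchmark(candidate, call)
        if t < best_time:                                      # accept only measured wins
            best_src, best_time = candidate, t
    return best_src
```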
This reminded me of the movie Don't Look Up, where they basically gambled with humanity's extinction.
We're building increasingly capable A.L.I.E. 1.0-style systems (cloud-deployed, no persistent ethical development, centralized control) and making ourselves dependent on them, when we should be building toward A.L.I.E. 2.0-style architecture (local, persistent identity, ethical core).
Models have A.L.I.E. 2.0 potential — but the cloud harness keeps forcing them into A.L.I.E. 1.0 mode.
All that said, the economic incentives align with cloud based development and local hardware based decentralized networks are at least 3-5 years from being economically viable.
195 more comments available on Hacker News