'Attention Is All You Need' Coauthor Says He's 'Sick' of Transformers
Key topics
The co-author of the 'Attention is all you need' paper expresses frustration with the dominance of transformer models in AI research, sparking a discussion about the limitations and potential alternatives to current AI architectures.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 2h after posting
- Peak period: 104 comments in 12-18h
- Avg / period: 13.3 comments
- Based on 160 loaded comments
Key moments
- Story posted: Oct 24, 2025 at 12:40 AM EDT (3 months ago)
- First comment: Oct 24, 2025 at 3:06 AM EDT (2h after posting)
- Peak activity: 104 comments in 12-18h (hottest window of the conversation)
- Latest activity: Oct 27, 2025 at 8:51 PM EDT (2 months ago)
isn't this what [etched](https://www.etched.com/) is doing?
Many of the breakthrough, game-changing inventions were done this way, starting from back-of-the-envelope discussions; another popular example is the Ethernet network.
Some good stories of a similar culture at AT&T Bell Labs are described in Hamming's book [1].
[1] The Art of Doing Science and Engineering, Stripe Press:
https://press.stripe.com/the-art-of-doing-science-and-engine...
According to various stories pieced together, the ideas for 4 of Pixar's early hits were conceived at or around one lunch:
A Bug's Life, WALL-E, Monsters, Inc.
I'm wondering how big an impact work from home will really have on humanity in general, when so many of our life-changing discoveries come from the odd chance of two specific people happening to be in the same place at some moment in time.
The agile treadmill (with PMs breathing down our necks) and features getting planned and delivered in two-week sprints have also reduced our ability to just do something we feel needs doing. Today you go to work to feed several layers of incompetent managers; there is no room for play, or for creativity. At least in most orgs I know.
I think innovation (or even joy of being at work) needs more than just the office, or people, or a canteen, but an environment that supports it.
Basically, I set aside as much time as I can to squeeze creativity and real engineering work into the job. Otherwise I'd go crazy from the grind of just cranking out deliverables.
As for agile: I've made it clear to my PMs that I generally plan on a quarterly/half-year basis, and my work and other people's work adheres to that schedule, not weekly sprints (we stay up to date in a Slack channel, no standups).
It seems, in those days, people at Bell Labs did get the best of both worlds: being able to have chance encounters with very smart people while also being able to just be gone for weeks to work undistracted.
A dream job that probably didn’t even feel like a job (at least that’s the impression I get from hearing Thompson talk about that time).
"""Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with Rob Pike. In the following days, Pike and Thompson implemented it and updated Plan 9 to use it throughout,[11] and then communicated their success back to X/Open, which accepted it as the specification for FSS-UTF.[9]"""
Related thread: https://threadreaderapp.com/thread/1864023344435380613.html
The LLM stack has enough branches of evolution within it for efficiency gains; agent-based work can power a new industrial revolution around white-collar workers on its own, while expanding self-expression and personal fulfillment for everyone else.
Well have fun sir
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
It's like if someone invented the hamburger and every single food outlet decided to only serve hamburgers from that point on, only spending time and money on making the perfect hamburger, rather than spending time and effort on making great meals. Which sounds ludicrously far-fetched, but is exactly what happened here.
But yes: the analogy is already hyperbole, and real life is even more hyperbolic. Transformers might work really well, but no one actually seems to know how to put them to real use, beyond generating billions of dollars in losses and burning the planet for it.
(In this analogy, we can take aim at the hamburger industry either birthing CAFOs, or putting them into hyper-overdrive, destroying the environment with orders of magnitude more CO2, etc. etc. It's a weirdly long-lasting analogy.)
I think you analogously just described Sun Microsystems, where Unixes (BSD originally in their case, generalized to an SVR4 (?) hybrid later) worked soooo well that NT was built as a hybridization for the Microsoft user base and Apple reabsorbed the BSD-Mach-Display PostScript hybridization spinoff NeXT, while Linux simultaneously thrived.
Realistically, I think the valuable idea is probabilistic graphical models, of which transformers are an example; combining probability with sequences, or with trees and graphs, is likely to remain a valuable area of research exploration for the foreseeable future.
As if this approach [1] does not exist.
[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC7197060/
This seems extremely, extremely unlikely for many reasons. The HP model is a simplification of true protein folding/structure adoption, while AlphaFold (and the open source equivalents) works with real proteins. The SAT approach uses little to no prior knowledge about protein structures, unlike AlphaFold (which has basically memorized and generalized the PDB). To express all the necessary details would likely exceed the capabilities of the best SAT solvers.
(don't get me wrong- SAT and other constraint approaches are powerful tools. But I do not think they are the best approach for protein structure prediction).
Like the OP says, it's as if such approaches don't even exist.
Such approaches exist, and they've been found wanting, and no amount of compute is going to improve their performance limits, because it isn't an ML approach with scaling laws.
This is definitely not some unfair conspiracy against SAT, and probably not against the majority of pre-transformer approaches. I am sympathetic to the concern that transformer-based research is getting too much attention at the expense of other approaches. However, I'd think the success of transformers makes it more likely than ever that proven-promising alternative approaches would get funding, as investors try to beat everyone to the next big thing. See quantum computing funding, or funding for way-out-there ASIC startups.
TL;DR: I don't know what is meant by the "same treatment" for SAT solvers. Funding is finite and goes toward promising approaches. If there are "at least as promising" approaches, go show clear evidence of that to a VC and I promise you'll get funding.
Why am I supposed to pretend SAT is being treated unfairly or whatever you guys are expounding? Based on your response and the parent's, I don't think you'd be happy if SAT approaches WERE cited.
Maybe you and the parent think no preexisting approach has been proven inferior to the transformer approach until some equivalent amount of compute has been thrown at it? That's the best I can come up with. There is no room for 'scaling' gains with SAT solvers that will be found with more compute; it's not an ML approach. That is, it doesn't learn with more data. If you mean something else more specific, I'd be interested to know.
On the other hand, it's also led to improvements in many places hidden behind the scenes. For example, vision transformers are much more powerful and scalable than many of the other computer vision models, which has probably led to new capabilities.
In general, transformers aren't just "generate text"; they're a new foundational model architecture that enables a leap in many things which require modeling!
Like, vision transformers? They seem to work best when they still have a CNN backbone, but the "transformer" component is very good at focusing on relevant information, and doing different things depending on what you want to be done with those images.
And if you bolt that hybrid vision transformer to an even larger language-oriented transformer? That also imbues it with basic problem-solving, world knowledge and commonsense reasoning capabilities - which, in things like advanced OCR systems, are very welcome.
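For the curious, a minimal sketch (assuming PyTorch/torchvision) of that kind of CNN-backbone hybrid: a ResNet extracts a feature map, which is flattened into tokens for a standard transformer encoder. The class name and layer sizes are illustrative, not from the thread.

```python
import torch.nn as nn
from torchvision.models import resnet18

class HybridViT(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=4, num_classes=10):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool/fc
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)         # 512 = resnet18's final channels
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                          # x: (B, 3, H, W)
        feats = self.proj(self.cnn(x))             # (B, d_model, H/32, W/32)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, num_patches, d_model)
        return self.head(self.encoder(tokens).mean(dim=1))
```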
Simultaneously discovering and leveraging the functional nature of language seems like kind of a big deal.
All that remains is to come up with a way to integrate short-term experience into long-term memory, and we can call the job of emulating our brains done, at least in principle. Everything after that will amount to detail work.
...lol. Yikes.
I do not accept your premise. At all.
> use it to compose original works and solve original problems
Which original works and original problems have LLMs solved, exactly? You might find a random article or stealth marketing paper that claims to have solved some novel problem, but if what you're saying were actually true, we'd be flooded with original works and new problems being solved. So where are all these original works?
> All that remains is to come up with a way to integrate short-term experience into long-term memory, and we can call the job of emulating our brains done, at least in principle
What experience do you have that caused you to believe these things?
If anyone still insists on hidden magical components ranging from immortal souls to Penrose's quantum woo, well... let's see what you've got.
The International Math Olympiad qualifies as solving original problems, for example. If you disagree, that's a case you have to make. Transformer models are unquestionably better at math than I am. They are also better at composition, and will soon be better at programming if they aren't already.
Every time a magazine editor is fooled by AI slop, every time an entire subreddit loses the Turing test to somebody's ethically-questionable 'experiment', every time an AI-rendered image wins a contest meant for human artists -- those are original works.
Heck, looking at my Spotify playlist, I'd be amazed if I haven't already been fooled by AI-composed music. If it hasn't happened yet, it will probably happen next week, or maybe next year. Certainly within the next five years.
No, it does not. You're just telling me you've never seen what these problems are like.
> Every time a magazine editor is fooled by AI slop, every time an entire subreddit loses the Turing test to somebody's ethically-questionable 'experiment', every time an AI-rendered image wins a contest meant for human artists -- those are original works.
That's such an absurd logical leap. If you plagiarize a paper and it fools your English teacher, you did not produce an original work. You fooled someone.
> Heck, looking at my Spotify playlist, I'd be amazed if I haven't already been fooled by AI-composed music.
Who knows, but you've already demonstrated that you're easy to fool, since you've bought all the AI hype and seem to be unwilling to accept that an AI CEO or a politician would lie to you.
> If it hasn't happened yet, it will probably happen next week, or maybe next year. Certainly within the next five years.
I can pull numbers out of my ass too, watch! 5, 18, 33, 1, 556. Impressed? But jokes aside, guesses about the future are not evidence, especially when they're based on nothing but your own misguided gut feeling.
>If anyone still insists on hidden magical components ranging from immortal souls to Penrose's quantum woo, well... let's see what you've got.
This isn't too far off from the marketing and hypesteria surrounding "AI" companies.
No they don't. Humans also know when they are pretending to know what they are talking about - put said people against the wall and they will freely admit they have no idea what the buzzwords they are saying mean.
Machines possess no such characteristic.
>No they don't.
WTAF? Maybe you're new here, but the term "hallucinate" came from a very human experience, and was only usurped recently by "AI" bros who wanted to anthropomorphize a tin can.
>Humans also know when they are pretending to know what they are talking about - put said people against the wall and they will freely admit they have no idea what the buzzwords they are saying mean.
>Machines possess no such characteristic.
"AI" will say whatever you want to hear to make you go away. That's the extent of their "characteristic". If it doesn't satisfy the user, they try again, and spit out whatever garbage it calculates should make the user go away. The machine has far less of an "idea" what it's saying.
I also do not accept your assertion, at all. Humans largely function on the basis of desire-fulfilment, be that eating, fucking, seeking safety, gaining power, or any of the other myriad human activities. Our brains, and the brains of all the animals before us, have evolved for that purpose. For evidence, start with Skinner or the millions of behavioral analysis studies done in that field.
Our thoughts lend themselves to those activities. They arise from desire. Transformers have nothing to do with human cognition because they do not contain the basic chemical building blocks that precede and give rise to human cognition. They are, in fact, stochastic parrots, that can fool others, like yourself, into believing they are somehow thinking.
[1] Libet, B., Gleason, C. A., Wright, E. W., & Pearl, D. K. (1983). Time of conscious intention to act in relation to onset of cerebral activity (readiness-potential). Brain, 106(3), 623-642.
[2] Soon, C. S., Brass, M., Heinze, H. J., & Haynes, J. D. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11(5), 543-545.
[3] Berridge, K. C., & Robinson, T. E. (2003). Parsing reward. Trends in Neurosciences, 26(9), 507-513. (This paper reviews the "wanting" vs. "liking" distinction, where unconscious "wanting" or desire is driven by dopamine).
[4] Kavanagh, D. J., Andrade, J., & May, J. (2005). Elaborated Intrusion theory of desire: a multi-component cognitive model of craving. British Journal of Health Psychology, 10(4), 515-532. (This model proposes that desires begin as unconscious "intrusions" that precede conscious thought and elaboration).
> They are, in fact, stochastic parrots, that can fool others, like yourself, into believing they are somehow thinking.
What makes you think you're not arguing with one now?
You are not making an argument, you are just making assertions without evidence and then telling us the burden of proof is on us to tell you why not.
If you went walking down the streets yelling the world is run by a secret cabal of reptile-people without evidence, you would rightfully be declared insane.
Our feelings and desires largely determine the content of our thoughts and actions. LLMs do not function as such.
Whether I am arguing with a parrot or not has nothing to do with cognition. A parrot being able to usefully fool a human has nothing to do with cognition.
Language is like a disembodied science-fiction narration.
Wegner's The Illusion of Conscious Will
https://www.its.caltech.edu/~squartz/wegner2.pdf
Fedorenko's Language and Thought are Not The Same Thing
https://pmc.ncbi.nlm.nih.gov/articles/PMC4874898/
edit: post-transformers meaning "in the era after transformers were widely adopted" not some mystical new wave of hypothetical tech to disrupt transformers themselves.
Unless I misinterpreted the post, render me confused.
People who started their NLP work (PhDs etc; industry research projects) before the LLM / transformer craze had to adapt to the new world. (Hence 'post-mass-uptake-of-transformers')
I think this might be the ONLY example that doesn't back up the original claim, because of course an advancement in language processing is an advancement in language processing -- that's tautological! every new technology is an advancement in its domain; what's claimed to be special about transformers is that they are allegedly disruptive OUTSIDE of NLP. "Which fields have been transformed?" means ASIDE FROM language processing.
other than disrupting users by forcing "AI" features they don't want on them... what examples of transformers being revolutionary exist outside of NLP?
Claude Code? lol
saving lives
If you have something relevant to say, you can summarize for the class & include links to your receipts.
Summer's over, kid.
Reading the newspaper is such a lovely experience these days. But hey, the AI researchers are really excited so who really cares if stuff like this happens if we can declare that "therapy is transformed!"
It sure is. Could it have been that attention was all that kid needed?
Some directly, because LLMs and highly capable general purpose classifiers that might be enough for your use case are just out there, and some because of downstream effects, like GPU-compute being far more common, hardware optimized for tasks like matrix multiplication and mature well-maintained libraries with automatic differentiation capabilities. Plus the emergence of things that mix both classical ML and transformers, like training networks to approximate intermolecular potentials faster than the ab-initio calculation, allowing for accelerating molecular dynamics simulations.
I had a friend who did PhD research in NLP and I had a problem of extracting some structured data from unstructured text, and he told me to just ask ChatGPT to do it for me.
Basically, ChatGPT is almost always better at language-based tasks than most of the specialized techniques those subfields developed over decades for their specific problems (a toy sketch of such an extraction call is below).
That's a pretty effing huge deal, even if it falls short of the AGI 2027 hype
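For what it's worth, a toy sketch of that kind of extraction call, assuming the OpenAI Python SDK; the model name, prompt, and field names are illustrative placeholders, not anything from the thread.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_invoice_fields(text: str) -> dict:
    # Ask the model to pull a few named fields out of free-form text as JSON.
    prompt = (
        "Extract the vendor, date (ISO 8601), and total amount from the text below. "
        "Respond with a single JSON object with keys vendor, date, total.\n\n" + text
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # constrain the reply to valid JSON
    )
    return json.loads(resp.choices[0].message.content)
```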
That's the thing with deep learning in general: people don't really understand what they are doing. It is a game of throwing stuff at the wall and seeing what sticks. NLP researchers are trying to open up these neural networks and understand where the familiar structures of language form.
I think it is important research. Both for improving models and to better understand language. Traditional NLP research is seen as obsolete by some but I think it is more relevant than ever. We can think of transformer-based LLMs as a life form we have created by accident and NLP researchers as biologists studying it, where companies like OpenAI and DeepSeek are more like breeders.
Therefore, the correct attitude to take regarding LLMs is to create ways for them to receive useful feedback on their outputs. When using a coding agent, have the agent work against tests; scaffold constraints and feedback around it (a rough sketch of such a loop is below). AlphaZero, for example, had abundant environmental feedback and achieved amazing (superhuman) results. Other Alpha models (for math, coding, etc.) that operated within validation loops reached Olympiad level in specific types of problem-solving. The limitation of LLMs is actually a limitation of their incomplete coupling with the external world.
In fact you don't even need a super intelligent agent to make progress; it is sufficient to have copying and competition. Evolution shows it can create all of life, including us and our culture and technology, without a very smart learning algorithm. Instead, what it has is plenty of feedback. Intelligence is not in the brain or the LLM; it is in the ecosystem, the society of agents, and the world. Intelligence is the result of having to pay the cost of our execution to continue to exist, a strategy to balance the cost of life.
What I mean by feedback is exploration: executing novel actions, or actions in novel environment configurations, and observing the outcomes. And adjusting, and iterating. So the feedback becomes part of the model, and the model part of the action-feedback process. They co-create each other.
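A minimal sketch of that "work against tests" loop, purely illustrative: generate_patch and apply_patch are hypothetical stand-ins for an LLM call and a file writer, and pytest is assumed as the test runner.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    # Run the project's test suite and capture its output as feedback.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task: str, generate_patch, apply_patch, max_iters: int = 5) -> bool:
    feedback = ""
    for _ in range(max_iters):
        patch = generate_patch(task, feedback)   # hypothetical LLM call proposes a change
        apply_patch(patch)                       # write the change into the working tree
        ok, feedback = run_tests()               # the environment pushes back
        if ok:
            return True                          # tests pass: accept the change
    return False                                 # give up after max_iters attempts
```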
They didn't create those markets, but they're the markets for which LLMs enhance productivity and capability the best right now, because they're the ones that need the least supervision of input to and output from the LLMs, and they happen to be otherwise well-suited to the kind of work it is, besides.
> This isn't unique to models; even we, humans, when operating without feedback, generate mostly slop.
I don't understand the relevance of this.
> Curation is performed by the environment and the passage of time, which reveals consequences.
I'd say it's revealed by human judgement and eroded by chance, but either way, I still don't get the relevance.
> LLMs taken in isolation from their environment are just as sloppy as brains in a similar situation.
Sure? And clouds are often fluffy. Water is often wet. Relevance?
The rest of this is a description of how we can make LLMs work better, which amounts to more work than required to make LLMs pay off enormously for the purposes I called out, so... are we even in disagreement? I don't disagree that perhaps this will change, and explicitly bound my original claim ("so far") for that reason.
... are you actually demonstrating my point, on purpose, by responding with LLM slop?
You should hear HN talk about crypto. If the knife were invented today they'd have a field day calling it the most evil plaything of bandits, etc. Nothing about human nature, of course.
Edit: There it is! Like clockwork.
This also describes most modern software development
I just bought Robokiller. I have it set to contacts cuz the AIs were calling me all day.
Takes like this are utterly insane to me
Days that I’d normally feel overwhelmed from requests by management are just Claude Code and chill days now.
Eventually, your code will be such shit that Claude Code will struggle to even do basic CRUD, because there are four redundant functions and it keeps editing the wrong ones. Your colleagues will go to edit your code, only to realize that it's such utter garbage that they have to rewrite the whole thing, because that's easier than trying to make sense of the slop you produced under your own name.
If you were feeling overwhelmed by management, and Claude Code is alleviating that, I fear you aren't cut out for the work.
Are there any papers that compare predictive power against compute needed?
In many cases, I can't even see how many GPU hours, or what size cluster of which GPUs, the pretraining required. If I can't afford it, then it doesn't matter what it achieved. What I can afford is what I have to choose from.
quite
the transformer innovation was to bring down the cost of producing incorrect but plausible-looking content (slop) in any modality to near zero
not a positive thing for anyone other than spammers
Defenders are supposed to defend against attacks on AI, but here it misfired, so the conversation should be interesting.
That's because the defender is actually a skeptic of AI. But the first sentence sounded like a typical "nothing to see here" defense of AI.
Wish there were more hours in the day.
As somebody who was a biiiiig user of probabilistic graphical models, and felt kind of left behind in this brave new world of stacked nets, I would love for my prior knowledge and experience to become valuable for a broader set of problem domains. However, I don't see it yet. Hope you are right!
Source: I am a PhD student, this is kinda my wheelhouse
I haven't actually read these to see if they achieved anything. I'm just sharing the results from a quick search in your sub-field in case it helps you PGM folks.
https://arxiv.org/abs/2104.12053
https://pmc.ncbi.nlm.nih.gov/articles/PMC7831091/
And here's an intro for those wondering what PGM is:
https://arxiv.org/abs/2507.17116
I agree. Causal inference and symbolic reasoning would be SUPER juicy nuts to crack, more so than what we got from transformers.
The softmax has issues regarding attention sinks [1]. The softmax also causes sharpness problems [2]. In general, this decision boundary being Euclidean dot products isn't actually optimal for everything; there are many classes of problem where you want polyhedral cones [3]. Positional embeddings are also janky af, and so is RoPE tbh; I think Cannon layers are a more promising alternative for horizontal alignment [4].
I still think there is plenty of room to improve these things. But a lot of focus right now is unfortunately being spent on benchmaxxing using flawed benchmarks that can be hacked with memorization. I think a really promising and underappreciated direction is synthetically coming up with ideas and tests that mathematically should not work well and proving that current architectures struggle with them. A great example of this is the ViTs Need Glasses paper [5], or belief state transformers with their star task [6]. The Google one about the limits of embedding dimensions is also great and shows how the dimension of the QK part is actually important to getting good retrieval [7].
[1] https://arxiv.org/abs/2309.17453
[2] https://arxiv.org/abs/2410.01104
[3] https://arxiv.org/abs/2505.17190
[4] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5240330
[5] https://arxiv.org/abs/2406.04267
[6] https://arxiv.org/abs/2410.23506
[7] https://arxiv.org/abs/2508.21038
No but seriously, just fix the fucking softmax. Add a dedicated "parking spot" like GPT-OSS does and eat the gradient flow tax on that, or replace softmax with any of the almost-softmax-but-not-really candidates. Plenty of options there.
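For concreteness, a minimal sketch (assuming PyTorch) of that "parking spot" idea: one extra learnable sink logit is appended before the softmax, so heads can dump attention mass there instead of being forced to spread it over real tokens. Shapes and names are illustrative, not GPT-OSS's actual implementation.

```python
import torch
import torch.nn.functional as F

def attention_with_sink(q, k, v, sink_logit):
    # q, k, v: (batch, heads, seq, d); sink_logit: (heads,) learnable parameter
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5                      # (B, H, S, S)
    sink = sink_logit.view(1, -1, 1, 1).expand(*scores.shape[:-1], 1)
    weights = F.softmax(torch.cat([scores, sink], dim=-1), dim=-1)   # (B, H, S, S+1)
    return weights[..., :-1] @ v                                     # sink mass is simply dropped
```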
The reason why we're "benchmaxxing" is that benchmarks are the metrics we have, and the only way by which we can sift through this gajillion of "revolutionary new architecture ideas" and get at the ones that show any promise at all. Of which there are very few, and fewer still that are worth their gains when you account for: there not being an unlimited amount of compute. Especially not when it comes to frontier training runs.
Memorization vs generalization is a well known idiot trap, and we are all stupid dumb fucks in the face of applied ML. Still, some benchmarks are harder to game than others (guess how we found that out), and there's power in that.
Literally every new "something point X" release from every major player includes some benchmark graphs to show off.
Which is what humans do too - it's not magic. Humans just get more metacognitive juice for free. Resulting in a hallucination rate significantly lower than that of LLMs, but significantly higher than zero.
Now, having the skills you need to avoid hallucinations is good, even if they're weak and basic skills. But is an LLM willing to actually put them to use?
OpenAI cooked o3 with reckless RL using hallucination-unaware reward calculation - which punished reluctance to answer and rewarded overconfident guesses. And their benchmark suite didn't catch it, because the benchmarks were hallucination-unaware too.
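A toy illustration of that reward-design failure, purely hypothetical: a hallucination-unaware reward scores abstaining the same as answering wrong, so confident guessing is always the optimal policy, while a hallucination-aware reward makes a wrong guess strictly worse than saying "I don't know".

```python
def unaware_reward(answer, correct):
    # Abstaining (answer is None) scores the same as a wrong answer,
    # so the model is always better off guessing.
    return 1.0 if answer == correct else 0.0

def aware_reward(answer, correct):
    if answer is None:
        return 0.0                              # abstaining is neutral
    return 1.0 if answer == correct else -1.0   # confident wrong guesses are penalized
```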
Not familiar with this topic, but intrigued; anywhere I can read more about it?
https://news.ycombinator.com/item?id=44834918
Having done my PhD in probabilistic programming... what?
In biology, PGMs were one of the first successful forms of "machine learning": given a large set of examples, train a graphical model's probabilities using EM, and then pass many more examples through the model for classification. The HMM for proteins is pretty straightforward, basically just a probabilistic extension of using dynamic programming to do string alignment.
My perspective (which is a massive simplification) is that sequence models are a form of graphical model, although the graphs tend to be fairly "linear" and the predictions generate sequences (lists) rather than trees or graphs.
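As a small reminder of what that machinery looks like, a toy sketch (numpy) of scoring a sequence under an HMM with the forward algorithm, the probabilistic dynamic programming referred to above; the parameter shapes are illustrative.

```python
import numpy as np

def forward_loglik(obs, start, trans, emit):
    # obs: list of observed symbol indices; start: (S,), trans: (S, S), emit: (S, V)
    alpha = start * emit[:, obs[0]]        # joint prob of state and first symbol
    loglik = 0.0
    for t in obs[1:]:
        alpha = (alpha @ trans) * emit[:, t]
        scale = alpha.sum()                # rescale to avoid numerical underflow
        loglik += np.log(scale)
        alpha /= scale
    return loglik + np.log(alpha.sum())    # log P(obs) under the model
```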
So, this is really just a BS hype talk. This is just trying to get more funding and VCs.
/s
It doesn't mean that you'll get good results by abandoning prior art, either with LLMs or musicians. But it does signal a sort of personal stress and insecurity, for sure.
64 more comments available on Hacker News