Openai Researcher Announced GPT-5 Math Breakthrough That Never Happened

Posted3 months agoActive3 months ago

Topfi

430 points

234 comments

the-decoder.comTechstoryHigh profile

heatednegative

Debate

80/100

AIOpenaiGPT-5HypeMisinformation

Key topics

Openai

GPT-5

Hype

Misinformation

OpenAI researcher made a misleading announcement about GPT-5 solving unsolved math problems, sparking controversy and criticism about the company's hype and transparency.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

Peak period

0-3h

Avg / period

12.3

Comment distribution160 data points

Loading chart...

Based on 160 loaded comments

Key moments

01Story posted
Oct 19, 2025 at 7:30 AM EDT
3 months ago
Step 01
02First comment
Oct 19, 2025 at 8:46 AM EDT
1h after posting
Step 02
03Peak activity
90 comments in 0-3h
Hottest window of the conversation
Step 03
04Latest activity
Oct 21, 2025 at 5:36 AM EDT
3 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (234 comments)

Showing 160 comments of 234

amelius

3 months ago

1 reply

> Summary (from the article)

* OpenAI researchers claimed or suggested that GPT-5 had solved unsolved math problems, but in reality, the model only found known results that were unfamiliar to the operator of erdosproblems.com.

* Mathematician Thomas Bloom and Deepmind CEO Demis Hassabis criticized the announcement as misleading, leading the researchers to retract or amend their original claims.

* According to mathematician Terence Tao, AI models like GPT-5 are currently most helpful for speeding up basic research tasks such as literature review, rather than independently solving complex mathematical problems.

HarHarVeryFunny

3 months ago

> GPT-5 had only surfaced existing research that Bloom had missed

So GPT-5 didn't derive anything itself - it was just an effective search engine for prior research, which is useful, but not any sort of breakthough whatsoever.

bbor

3 months ago

1 reply

This is just tit-for-tat clickbait. The researcher’s wording was a bit unclear for sure, but far from incorrect.

resoluteteeth

3 months ago

1 reply

I disagree. There is no way to interpret "GPT-5 just found solutions to 10 (!) previously unsolved Erdos problems" as saying something other than GPT-5 having solved them.

If it just found existing solutions then they obviously weren't "previously unsolved" so the tweet is wrong.

He clearly misunderstood the situation and jumped to the conclusion that GPT-5 had actually solved the problems because that's what he wanted to believe.

That said, the misunderstanding is understandable because the tweet he was responding to said they had been listed as "open", but solving unsolved erdos problems by itself would be such a big deal that he probably should have double checked it.

bbor

3 months ago

But… it found them… and they were previously marked unsolved…

Note that “solved” does not equal “found”

andrewstuart

3 months ago

8 replies

Humans hallucinating about AI.

MattGaiser

3 months ago

6 replies

Humans "hallucinate" in the AI way constantly, which is why I don't see them as a barrier to LLMs replacing humans in many contexts. It really isn't unusual for a human to make stuff up or be unaware of stuff.

zeknife

3 months ago

2 replies

A human being informed of a mistake will usually be able to resolve it and learn something in the process, whereas an LLM is more likely to spiral into nonsense

MattGaiser

3 months ago

2 replies

You must know people without egos. Humans are better at correcting their mistakes, but far worse at admitting them.

But yes, as an edge case handler humans still have an edge.

topaz0

3 months ago

2 replies

LLMs by contrast love to admit their mistakes and self-flagellate, and then go on to not correct them. Seems like a worse tradeoff.

skeeter2020

3 months ago

Not when your goal is to create ASI: Artificial Sycophant Intelligence

thaumasiotes

3 months ago

It's true that the big public-facing chatbots love to admit to mistakes.

It's not obvious to me that they're better at admitting their mistakes. Part of being good at admitting mistakes is recognizing when you haven't made one. That humans tend to lean too far in that direction shouldn't suggest that the right amount of that behavior is... less than zero.

tonyhart7

3 months ago

and this is why LLM is getting cooked

they feed an internet data into that shit, they basically "told" LLM to behave because surprise surprise, human sometimes can be more nasty

alimw

3 months ago

You must know better humans than I do.

lapcat

3 months ago

> Humans "hallucinate" in the AI way constantly

This claim is ambiguous. The use of the word "Humans" here obscures rather than clarifies the issue. Individual humans typically do not "hallucinate" constantly, especially not on the job. Any individual human who is as bad at their job as an LLM should indeed be replaced, by a more competent individual human, not by an equally incompetent LLM. This was true long before LLMs were invented.

In the movie "Bill and Ted's Excellent Adventure," the titular characters attempt to write a history report by asking questions of random strangers in a convenience store parking lot. This of course is ridiculous and more a reflection of the extreme laziness of Bill and Ted than anything else. Today, the lazy Bill and Ted would ask ChatGPT instead. It's equally ridiculous to defend the wild inaccuracy and hallucinations of LLMs by comparing them to average humans. It's not the job of humans to answer random questions on any subject.

Human subject matter experts are not perfect, but they’re much better than average and don’t hallucinate on their subjects. They also have accountability and paper trails, can be individually discounted for gross misconduct, unlike LLMs.

skeeter2020

3 months ago

Do you think the OpenAI human, when informed of their "oopsie" replied "You're right, there is existing evidence that this problem has already been solved. Blah Blah Blah ... and that's why our new model has made a huge breakthrough against previously unsolved math problems!"

topaz0

3 months ago

> Humans "hallucinate" in the AI way constantly

This is more and more clearly false. Humans get things wrong certainly, but the manner in which they get things wrong is just not comparable to how the LLMs get things wrong, beyond the most superficial comparison.

pas

3 months ago

it's the same thing with self-driving, if you can make it safer than a good human driver it's enough. but the bar is pretty low with driving (as evidenced by the hundreds of thousands of collisions and deaths and permanent disabilities each year). and rather high in scientific publishing.

zeroonetwothree

3 months ago

Humans are a bit better at knowing which things are important and doing more research. Also better at being honest when directly pressed. And infinitely better at learning from errors.

(Yes, not everyone, but we do have some mechanisms to judge or encourage)

random9749832

3 months ago

1 reply

Best case: Hallucination

Worst case (more probable): Lying

MPSimmons

3 months ago

2 replies

Hanlon's Razor

forgetfulness

3 months ago

1 reply

Lying is a stupid way of selling something and making money

reaperducer

3 months ago

Lying is a stupid way of selling something and making money

Works for Elon.

random9749832

3 months ago

They are expanding into the adult market because they are running out of ideas. I think common sense is enough to decide what is what here.

JKCalhoun

3 months ago

3 replies

"OpenAI Researcher Hallucinates GPT-5 Math Breakthrough" could be a headline from The Onion.

reaperducer

3 months ago

"OpenAI Researcher Hallucinates GPT-5 Math Breakthrough" could be a headline from The Onion.

Off topic, but I saw The Onion on sale in the magazine rack of Barnes and Noble last month.

For those who miss when it was a free rag in sidewalk newsstands, and don't want to pony up for a full subscription, this is an option.

antegamisou

3 months ago

Seriously those headlines are getting DailyMail sensationalism levels of ridiculous.

nicce

3 months ago

I the old world we would just use the word bullshit.

pera

3 months ago

1 reply

Heh stockholders are not hallucinating: They know very well what they are doing.

skeeter2020

3 months ago

1 reply

retail investors? no way. The fever-dream may continue for a while but eventually it will end. Meanwhile we don't even know our full exposure to AI. It's going to be ugly and beyond burying gold in my backyard I can't even figure out how to hedge against this monster.

pera

3 months ago

Yeah no I didn't mean retail investors, OpenAI is not publicly traded, but yeah I do share your concern...

alkyon

3 months ago

1 reply

They started believing the very lies they invented.

moffkalast

3 months ago

"The truth is usually just an excuse for a lack of imagination."

Palmik

3 months ago

More like humans hallucinating about humans hallucinating about AI, see here: https://news.ycombinator.com/item?id=45634120

rixed

3 months ago

These days AI just obsequiously praise whatever stupid ideas the human throw at them, which encourage humans into hallucinating breakthroughs.

But it's only a matter of time before AI gets better at prompt engineering.

/s?

Kiboneu

3 months ago

No no, openai is actually secretly run by AI.

mentalgear

3 months ago

1 reply

Another instance of openAI manipulating results to prolong their unsustainable circular hype bubble.

The inevitable collapse could be even more devastating than the 2008 financial crisis.

All while so vast resources are being wasted on non-verifiable gen AI slob, while real approaches (neuro-symbolic like DeepMind's AlphaFold) are mostly ignored financially because they don't generate the quick stock market increases that hype does.

the_duke

3 months ago

5 replies

People keep spouting this, but I don't see how the AI bubble bursting would be all that devastating.

2008 was a systemic breakdown rippling through the foundations of the financial system.

It would lead to a market crash (80% of gains this year were big tech/AI) and likely a full recession in the US, but nothing nearly as dramatic as a global systemic crisis.

In contrast to the dot com bubble, the huge AI spending is also concentrated on relatively few companies, many with deep pockets from other revenue sources (Google, Meta, Microsoft, Oracle), and the others are mostly private companies that won't have massive impact on the stock market.

A sudden stop in AI craze would be hard for hardware companies and a few big AI only startups , but the financial fallout would be much more contained than either dot com or 2008.

TopfiAuthor

3 months ago

1 reply

Isn't the dot com bubble a far better proxy? Notably, todays spending is both higher and more concentrated in a few companies that a large part of the population has exposure to (most dot com companies weren't publicly traded and far smaller vs MSFT, Alphabet, Meta, Oracle, NVDA making up most investment today) by way of pension funds, ETFs, etc.?

the_duke

3 months ago

1 reply

Sure, but all of the above have solid businesses that rake in lots of money, revenue based on AI is a small percentage for them.

An AI bust would take the stock price down a good deal, but the stock gains have been relatively moderate. Year on year: Microsoft +14%, Meta +24%, Google +40, Oracle +60%, ... And a notable chunk of those gains have indirectly come from the dollar devaluing.

Nvidia would be hit much harder of course.

There is a good amount of smaller AI startups, but a lot of the AI development is concentrated on the big dogs, it's not nearly as systemic as in dot com, where a lot of businesses went under completely.

And even with an AI freeze, there is plenty of value and usage there already that will not go away, but will keep expanding (AI chat, AI coding, etc) which will mitigate things.

rhetocj23

3 months ago

The problem is do you know much 14% etc is? We are talking about valuations in the trillions my friend!

techblueberry

3 months ago

2 replies

I think my theory into contagion would be that There’s been lots of talk about these companies starting to rack up debt, and I think AI is so tied into the US GDP that things like -

If the stock market crashes, there’s lots of talk about how wealth and debt are interlinked. Could the crash be general enough to start calls on debt backed by stocks.

My recollection in 2008 was that we didn’t learn about how bad it was until after. The tech companies have been so desperate for a win, I wonder if some of them are over their skis in some way, and if there are banks that are risking it all on AI. (We know for some tech bros think the bet on AI is a longtermist like bet; closer to religion than reason and that it’s worth risking everything because the payback could be in the hundreds of trillions)

Combine this with the fact that AI is like what - 30% of the US economy? Magnificent 7 are 60%?

What happens if sustainable PE ratios in tech collapse. Does it take out Tesla?

Maybe the contagion is just the impact on the US economy which, classically anyways has been intermingled with everything.

I would bet almost everything that there is some lie at the center of this thing that we aren’t really aware of yet.

SJC_Hacker

3 months ago

1 reply

> Combine this with the fact that AI is like what - 30% of the US economy? Magnificent 7 are 60%?

Nowhere close. US GDP is like $30 trillion. Open AI revenue is ~$4 billion. All the other AI companies revenue might amount to $10 billion at most, and that is being generous. $10 billion/ $30 trillaion is not even 1%.

You are forgetting all those "boring" sectors that form the basis of economies like agriculture and energy. They have always been bigger than the tech sector at any point, but they aren't "sexy" because there isn't the potential "exponential growth" that tech companies

ben_w

3 months ago

Small quibble, doesn't challenge your overall point, but as I understand it their revenue is somewhat higher than you say.

The Open AI revenue was ~$4 billion for the first half of the year; Anthropic recently reported a rate (which isn't total revenue, I know) equivalent to about $10 billion/year; NVIDIA's sales are supposed to be up 78% this quarter due to AI sales, reaching $39.33 billion, so plausibly ($39.33/1.78)*0.78 ~= $17 billion from AI in that quarter (rate, again yes I know, of $68 billion/year). So I can believe AI is order-of $100 billion/year economically… to US businesses with customers almost everywhere important except possibly China.

But just to re-iterate, this doesn't change your point. Even 100 B / 30 T is only one third of a percent.

the_duke

3 months ago

It may well be that an AI bubble burst is the tipping point, but I think that tipping point was coming either way.

The US admin has been (almost desperately) trying to prop up markets and an already struggling economy. If it wasn't AI, it could have been another industry.

I think AI is more of a sideshow in this context. The bigger story is the dollar losing its dominant position , money draining out into Gold/Silver/other stock markets, India buying oil from Russia in Yen, a global economy that has for years been propped up by government spending (US/China/Europe/...), large and lasting geopolitical power balance shifts, ...

These things don't happen overnight, and in fact over many years for USD holdings, but the effects will materialize.

Some of the above (dollar devaluation) is actually what the current admin wanted, which I would see as an admission of global shifts. We might see much larger changes to the whole financial system in the coming decades, which will have a lot of effects.

Theodores

3 months ago

1 reply

When America sneezes, the rest of the world catches a cold. This was said after the OG 1929 crash and I can remember it said in the 80s. Nobody says it anymore.

Due to exorbitant privilege, with the dollar as the only currency that matters, every country that trades with America is swapping goods and services for 'bits of green paper'. Unless buying oil from Russia, these bits of green paper are needed to buy oil. National currencies and the Euro might as well be casino chips, mere proxies for dollars.

Just last week the IMF issued a warning regarding AI stocks and the risk they pose to the global economy if promises are not delivered.

With every hiccup, whether that be the dot com boom, 2008 or the pandemic, the way out is to print more money, with this money going in at the top, for the banks, not the masses. This amounts to devaluation.

When the Ukraine crisis started, the Russian President stopped politely going along with Western capitalism and called the West out for printing too much money during the pandemic. Cut off from SWIFT and with many sanctions, Russia started trading in other currencies with BRICS partners. We are now at a stage of the game where the BRICS countries, of which there are many, already have a backup plan for when the next US financial catastrophe happens. They just won't use the dollar anymore. Note that currently, China doesn't want any dollars making it back to its own economy, since that would cause inflation. So they invest their dollars in Belt and Road initiatives, keeping those green bits of paper safely away from China. They don't even need exports to the USA or Europe since they have a vast home market to develop.

Note that Russia's reserve of dollars and euros was confiscated. They have nothing to lose so they aren't going to come back into the Western financial system.

Hence, you are right. A market crash won't be a global systematic crisis, it just means that Shanghai becomes the financial capital of the world, with no money printing unless it is backed up by mineral, energy or other resources that have tangible value. This won't be great for the collective West, but pretty good for the rest of the world.

the_duke

3 months ago

I have similar views on many points, see my response to a sibling comment.

I just think that effects of the AI bubble bursting would be at most a symptom or trigger of much larger geopolitical and financial shifts that would happen anyway.

MattGaiser

3 months ago

> People keep spouting this, but I don't see how the AI bubble bursting would be all that devastating.

Well, an enormous amount of debt is being raised and issued for AI and US economic growth is nearly entirely AI. Crypto bros showed the other day that they were leveraged to the hilt on coins and it wouldn't surprise me if people are the same way on AI. It is pretty heavily tied to the financial system at this point.

jcranmer

3 months ago

There's a few variables which can make it much worse.

The first is how much of the capital expenditures are being fueled by debt that won't be repaid, and how much that unpaid debt harms lending institutions. This is fundamentally how a few bad debts in 2008 broke the entire financial system: bad loans felled Lehman Brothers, which caused one money market fund to break the buck, which spurred a massive exodus from the money markets rather literally overnight.

The second issue is the psychological impact of 40% of market value just evaporating. A lot of people have indirect exposure to the stock market and these stocks in particular (via 401(k)s or pensions), and seeing that much of their wealth evaporate will definitely have some repercussions on consumer confidence.

kif

3 months ago

3 replies

This honestly doesn’t surprise me. We have reached a point where it’s becoming clearer and clearer that AGI is nowhere to be seen, whereas advances in LLM ability to ‘reason’ have slowed down to (almost?) a halt.

dawnerd

3 months ago

1 reply

But if you ask an AI hype person they’ll say we’re almost there we just need a bit more gigawatts of compute!

rhetocj23

3 months ago

I hate to say this but I think the LLM story is going to go the same way as Teslas stock - everyone knows its completely detached from fundamentals and driven by momentum and hype but nobody wants to do the right thing.

vbezhenar

3 months ago

3 replies

In my book, chat-based AGI has been reached years ago, when I couldn't reliably distinguish computer from human.

Solving problems that humanity couldn't solve is super-AGI or something like that. It's not there indeed.

3836293648

3 months ago

1 reply

Beating the Turing Test is not AGI, but it is beating the Turing Test and that was impressive enough when it happened

mrguyorama

3 months ago

1 reply

So you were impressed by ELIZA right? Because that's what first "beat the turing test"

Which, actually is not a real thing. Nor has it ever really been meaningful.

Trolls on IRC "beat the turing test" with bots that barely even had any functionality.

3836293648

3 months ago

I wasn't alive back then, but I was absolutely impressed by it the first time I heard about it. I don't know how that is supposed to be a gotcha.

jdiff

3 months ago

1 reply

We're not even solving problems that humanity can solve. There's been several times where I've posed to models a geometry problem that was novel but possible for me to solve on my own, but LLMs have fallen flat on executing them every time. I'm no mathematician, these are not complex problems, but they're well beyond any AI, even when guided. Instead, they're left to me, my trusty whiteboard, and a non-negligible amount of manual brute force shuffling of terms until it comes out right.

They're good at the Turing test. But that only marks them as indistinguishable from humans in casual conversation. They are fantastic at that. And a few other things, to be clear. Quick comprehension of an entire codebase for fast queries is horribly useful. But they are a long way from human-level general intelligence.

vbezhenar

3 months ago

1 reply

I'm pretty sure there are billions of people on the Earth unable to solve your geometry problem. That doesn't make them less human. It's not a benchmark. You should think about something almost any human can do, not selected few. That's the bar. Casual conversation is one of the examples that almost any human can do.

jdiff

3 months ago

Any human could do it, given the training. Humans largely choosing not to specialize in this way doesn't make them less human, nor did I imply that. Humans have the capacity for it, LLMs fall short universally.

gitaarik

3 months ago

What do you mean reliably distinguish a computer from a human? I haven't been surprised one time yet. I always find out eventually when I'm talking to an AI. It's easy usually, they get into loops, forget about conversation context, don't make connections between obvious things and do make connections between less obvious things. Etc.

Of course they can sound very human like, but you know you shouldn't be that naive these days.

Also you should of course not judge based on a few words.

steveBK123

3 months ago

Hence the pivot into ads, shop-in-chat and umm.. adult content.

strangescript

3 months ago

6 replies

This entire thing has been pretty disingenuous on both sides of the fence. All the anti-AI (or anti OpenAI) people are doing victory laps, but what GPT-5 Pro did is still very valuable.

1) What good is your open problem set if really its a trivial "google search" away from being solved. Why are they not catching any blame here?

2) These answers still weren't perfectly laid out for the most part. GPT-5 was still doing some cognitive lifting to piece it together.

If a human would have done this by hand it would have made news and instead the narrative would have been inverted to ask serious questions about the validity of some these style problem sets and/or ask the question how many other solutions are out there that just need pieced together from pre-existing research.

But, you know, AI Bad.

puttycat

3 months ago

1 reply

This is a strawman argument. No anti-AI sentiment was involved here. Simply the fact that finding and matching text on the Internet is several orders of magnitude easier than finding novel solutions to hard math problems.

strangescript

3 months ago

You didn't read the X replies if you believe that

lukev

3 months ago

1 reply

Framing this question as "AI good" OR "AI bad" is culture-war thinking.

The real problem here is that there's clearly a strong incentive for the big labs to deceive the public (and/or themselves) about the actual scientific and technical capabilities of LLMs. As Karpathy pointed out on the recent Dwarkesh podcast, LLMs are quite terrible at novel problems, but this has become sort of an "Emperor's new clothes" situation where nobody with a financial stake will actually admit that, even though it's common knowledge if you actually work with these things.

And this directly leads to the misallocation of billions of dollars and potentially trillions in economic damage as companies align their 5-year strategies towards capabilities that are (right now) still science fiction.

The truth is at stake.

strangescript

3 months ago

1 reply

Except they weren't intentionally trying to deceive anyone. They made the faulty assumption that these problems were non-trivial to solve and didn't think it was simply GPT-5 aggregating solutions in the wild.

lukev

3 months ago

Knowing what I know about LLMs, from their internal architecture and from extensive experience working with them daily, I would find this kind of result highly surprising and in a clear violation of my mental model of how these things work. And I'm very far from an expert.

If a purported expert in the field can is willing to credulously publish this kind of result, it's not unreasonable to assume that either they're acting in bad faith, or (at best) are high on their own supply regarding what these things can actually do.

andrepd

3 months ago

> 1) What good is your open problem set if really its a trivial "google search" away from being solved. Why are they not catching any blame here?

Please explain how this is in any way related to the matter at hand. What is the relation between the incompleteness of an math problem database, and AI hypesters lying about the capabilities of GPT5? I fail to see the relevance.

> If a human would have done this by hand it would have made news

If someone updated information on an obscure math problem aggregator database this would be news?? Again, I fail to see your point here.

TopfiAuthor

3 months ago

> What good is your open problem set if really its a trivial "google search" away from being solved. Why are they not catching any blame here?

They are a community run database, not the sole arbiter and source of this information. We learned the most basic research back in highschool, I'd hope researchers from top institutions now working for one of the biggest frontier labs can do the same prior to making a claim, but microblogging has and continues to be a blight on any accurate information so nothing new there.

> GPT-5 was still doing some cognitive lifting to piece it together.

Cognitive lifting? It's a model, not a person, but besides that fact, this was already published literature. Handy that a LLM can be a slightly better search, but calling claims of "solving maths problems" out as irresponsible and inaccurate is the only right choice in this case.

> If a human would have done this by hand it would have made news [...]

"Researcher does basic literature review" isn't news in this or any other scenario. If we did a press release every journal club, there wouldn't be enough time to print a single page advert.

> [...] how many other solutions are out there that just need pieced together from pre-existing research [...]

I am not certain you actually looked into the model output or why this was such an embarrassment.

> But, you know, AI Bad.

AI hype very bad. AI anthropomorphism even worse.

nurettin

3 months ago

AI great, but AI not creative, yet.

matsemann

3 months ago

You're moving the goal post.

random9749832

3 months ago

1 reply

You are telling me a language model trained on Reddit can't solve novel problems? Shocking.

Edit: we are in peak damage control phase of the hype cycle.

creativeCak3

3 months ago

Can't wait for the Buble to burst so we can get back to solving real problems (like the fact that we have very little competition in the CPU market right now, AMD is getting way too comfortable...). I do think though that when this bubble bursts it will hurt the machine learning field (which does have people doing practically useful stuff like protein folding) and investors might pull all funding because this generative nonsense(which has no real use beyond generating porn) will taint the entire field, even the parts of it that are actually useful.

phplovesong

3 months ago

1 reply

How fing obvious was it that AI slop did not do anything other than scarpe some websites.

Jweb_Guru

3 months ago

1 reply

I felt like I was going crazy when people uncritically accepted the original claim from OpenAI. Have people actually used these models?

phplovesong

3 months ago

1 reply

From what i have seen, people using AI somehow get the mindset that the AI generated result is "godlike" and the "ultimate truth". Its really, really scary and im not that hopeful for what we will see int he next decade.

Once i told a coworker that a piece if his code looked rather funky (without doing a more deep CR), and he told me its "proven correct by AI". I was stunned, and asked him if he knows how LLMs generate their responses? He was genuinely in the belief that it was in fact "artificial intelligence" and was some sort of "all knowing entity".

Jweb_Guru

3 months ago

It just drives me nuts when I see people say things like "yeah I asked ChatGPT about this extremely famous open problem, wish me luck!" Like what do you expect to happen exactly with an engine that can't even consistently keep track of what you wrote ten thousand tokens ago?

amirhirsch

3 months ago

4 replies

The sad truth about this incident is that it reveals that OpenAI does not have a serious effort to actually work on unsolved math problems.

rowanG077

3 months ago

2 replies

How so? I wouldn't put much stock into a roque employee announcing something wrong.

mrbungie

3 months ago

1 reply

That's not any employee, its their VP of Science.

amirhirsch

3 months ago

The people involved are very smart and must know that AI doing novel math is a canary for AGI. A serious effort around solving open problems would not fuck up this kind of announcement.

coldtea

3 months ago

"rogue employee"

jebarker

3 months ago

1 reply

That’s a non sequitur. They’re a fairly large organization, I’d be amazed if they don’t have multiple research sub-teams pursuing all sorts of different directions.

amirhirsch

3 months ago

It is not about whether some team somewhere is dabbling in math, it’s about institutional emphasis. When serious math work exists, it tends to be checked before being broadcast. The fact that this didn’t happen suggests that genuine mathematical rigor isn’t a central focus.

grafmax

3 months ago

3 replies

I realized they jumped the shark when they announced the pivots to ads and porn. Markets haven’t caught on yet.

zeroonetwothree

3 months ago

2 replies

They know where the money is.

grafmax

3 months ago

It’s standard practice for VC companies to enshittify after building a moat, relying on user lock-in. What’s remarkable is how quickly they’ve had to shift gears. And with this rapid pivot it’s questionable how large that moat really is.

raincole

3 months ago

I think people hugely overestimate how profitable porn (at least "actual" porn) is. Aylo (the owner of Pornhub) makes peanuts compared to Youtube or Disney.

HarHarVeryFunny

3 months ago

2 replies

The porn / sex-chat one is really disappointing. It seems they've given up even pretending that they are trying to do something beneficial for society. This is just a pure society-be-damned money grab.

bradly

3 months ago

2 replies

My hunch is that they don't have a way to stop anything, so they are creating verticals to at least contain porn, medical, higher-ed users.

HarHarVeryFunny

3 months ago

1 reply

I'm pretty sure that if they didn't deliberately chose to train on sex chat/stories, etc, then the LLM wouldn't be any good at it. The model isn't getting this capability by training on WikiPedia or Reddit.

So, it's not a matter of them not being able to do a good job of preventing the model from doing it, therefore giving up and instead encouraging it to do it (which anyways makes no sense), but rather them having chosen to train the model to do this. OpenAI is targetting porn as one of their profit centers.

derektank

3 months ago

1 reply

>The model isn't getting this capability by training on WikiPedia or Reddit

I don't know about the former, but the latter absolutely has sexually explicit material that could make the model more likely to generate erotic stories, flirty chats, etc.

HarHarVeryFunny

3 months ago

OK, maybe bad example, but it would be easy to create a classifier to identify stuff like that and omit it from the training data if they wanted to, and now that they are going to be selling this I'd assume they are explicitly seeking out and/or paying for creation of training material of this type.

j_maffe

3 months ago

Ah... The classic "If we don't do it, someone else will"

Tell that to the thousands of 18 year olds who'll be captured by this predatory service and get AI psychosis

disgruntledphd2

3 months ago

They've raised far too much money for those kinda ethics, unfortunately.

goalieca

3 months ago

4 replies

The porn pivot makes perfect sense. Porn is already quite fake and unconvincing and none of that matters.

throwacct

3 months ago

1 reply

Unfortunately, the porn pivot might be their path to "profitability".

goalieca

3 months ago

Global porn industry revenue is 100B. They won’t take 10% of that. Real humans are already selling themselves pretty cheap or free en masse.

mrbombastic

3 months ago

2 replies

It might not matter as far as profitability is concerned, ethically the second order effects will be very problematic. I am no puritan but the widespread availability of porn has already affected peoples sexual expectations greatly. AI generated porn is going to remove even more guardrails for behavior previously considered deviant, people will view and bring those expectations back to real life.

swat535

3 months ago

1 reply

This is the same argument that people used for video games, "rock music" and violent movies.

I would argue that AI generated porn might be more ethical than traditional porn because the risk of the models being abused or trafficked is virtually zero.

malfist

3 months ago

1 reply

> because the risk of the models being abused or trafficked is virtually zero.

That's not really true. Look at one if the more common uses for AI porn: taking a photo of someone and making them nude.

Deepfake porn exists and it does harm

derektank

3 months ago

2 replies

The harms associated with someone creating a deep fake of you are real but they're pretty insignificant compared to the harms associated with being sex trafficked or being exposed to an STI or being unable to find traditional employment after working in the industry.

ummonk

3 months ago

1 reply

Would you support installing public spy cams in everyone's bedrooms so as to end the demand for human trafficking in porn?

derektank

3 months ago

No? And I didn't suggest deepfakes should be legal.

I was just pointing out that when you're talking about the scale of harm caused by the existing sex industry compared to the scale of harm caused by AI generated pornographic imagery, one far outweighs the other.

throwaway-0001

3 months ago

2 replies

You couldn’t just photoshop that before ai came out?

What if you get a model that is 99% similar to your “target” - what we do with that?

malfist

3 months ago

1 reply

Sure, someone skilled could spend an hour or so photoshoping someone nude. But any teenager can do that to a classmate in 30 seconds with ai

throwaway-0001

3 months ago

So just because the poor can do what the rich could do before what it means?

Before only rich can afford to pay a pro to do photoshop. Now any poor person can get.

So why when rich can is fine and when everyone can is a problem?

acdha

3 months ago

Think about the change we saw in combat death tolls when things went from flintlock muskets to machine guns, or when battleships gave way to aircraft, and how many people died unnecessarily due to the generals who were slow to update their tactics. Deepfakes are like that because they lower the cost and improve the success rates enough to be transformative and they cause harm which can’t easily be countered. We’re not going to instantly train society to be better at media literacy and the police can’t just ignore reports of sex crimes, so we’re just having to accept that it’s easier to hurt people than it used to be.

glenstein

3 months ago

To perhaps make the same point as you in a different way, I have no issue with "deviancy" but I think it can accelerate the cycle of chasing a sugar high.

grafmax

3 months ago

People spin up ablated models for pennies. You don’t need advanced reasoning for this crap. OpenAI has 8 billion plus in burn. I guess it’s all effectively paying for brand awareness?

chanux

3 months ago

And there's no escape. The Internet was built for gambling and this.

make3

3 months ago

they're hiring people for that right now if you look at their LinkedIn posts

llm_nerd

3 months ago

2 replies

Yann LeCun's "Hoisted by their own GPTards" is fantastic.

frays

3 months ago

4 replies

I might be missing context here, but I'm surprised to see Yann using language that plays on 'retard.'

That seems out of character for him - more like something I'd expect from Elon Musk. What's the context I'm missing?

nova22033

3 months ago

1 reply

It's a play on the word petard

microtonal

3 months ago

I found this background useful as a non-native speaker: https://en.wikipedia.org/wiki/Hoist_with_his_own_petard

znkr

3 months ago

1 reply

I don’t think it’s a wordplay with the r-word, but rather a reference to the famous Shakespeare quote: “Hoist with his own petard”. It’s become an English proverb. (A petard is a smallish bomb)

card_zero

3 months ago

From péter, to fart.

Possibly entered the language as a saying due to Shakespeare being scurrilous.

blitzar

3 months ago

You have been Hoisted with your own retard

grey-area

3 months ago

Hoist (thrown in the air) by your own petard (bomb) is a common phrase.

NitpickLawyer

3 months ago

3 replies

While Yann is clearly brilliant, and has a deeper understanding of the roots of the filed than many of us mortals, I think he's been on a debbie downer trend lately, and more importantly, some of his public stances have been proven wrong in mere months / years after he made them.

I remember a public talk, where he was on the stage with some young researcher from MS. (I think it was one of the authors of the "sparks of brilliance in gpt4" paper, but not sure).

Anyway, throughout that talk he kept talking above the guy, and didn't seem to listen, even though he obviously didn't try the "raw", "unaligned" model that the folks at MS were talking about.

And he made 2 big claims:

1) LLMs can't do math. He went on to "argue" that LLMs trick you with poetry that sounds good, but is highly subjective, and when tested on hard verifiable problems like math, they fail.

2) LLMs can't plan.

Well, merely one year later, here we are. AIME is saturated (with tool use), gold at IMO, and current agentic uses clearly can plan (and follow up with the plan, re-write parts, finish tasks, etc etc).

So, yeah, I'd take everything any one singular person says with a huge grain of salt. No matter how brilliant said individual is.

Edit: oh, and I forgot another important argument that Yann made at that time:

3) because of the nature of LLMs, errors compound. So the longer you go in a session, the more errors accumulate so they devolve in nonsense.

Again, mere months later the o series of models came out, and basically proved this point moot. Turns out RL + long context mitigate this fairly well. And a year later, we have all SotA models being able to "solve" problems 100k+ tokens deep.

mrbungie

3 months ago

1 reply

Pretty sure you can fill a room with serious researchers that at the very least will doubt about 2) being solved with LLMs, especially when talking about formal planning with pure LLMs and without a planning framwork.

PS: So just we're clear: formal planning in AI </> making a coding plan in Cursor.

NitpickLawyer

3 months ago

3 replies

> with pure LLMs and without a planning framwork.

Sure, but isn't that moving the goalposts? Why shouldn't we use LLMs + tools if it works? If anything it shows that the early detractors weren't even considering this could work. Yann in particular was skeptical that long-context things can happen in LLMs at all. We now have "agents" that can work a problem for hours, with self context trimming, planning to md files, editing those plans and so on. All of this just works, today. We used to dream about it a year ago.

badsectoracula

3 months ago

1 reply

> Sure, but isn't that moving the goalposts? Why shouldn't we use LLMs + tools if it works?

Personally i do not see it like that at all as one is referring to LLMs specifically while the other is referring to LLMs plus a bunch of other stuff around them.

It is like person A claiming that GIF files can be used to play Doom deathmatches, person B responding that, no, a GIF file cannot start a Doom deathmatch, it is fundamentally impossible to do so and person A retorting that since the GIF format has a provision for advancing a frame on user input, a GIF viewer can interpret that input as the user wanting to launch Doom in deathmatch mode - ergo, GIF files can be used to play Doom deathmatches.

NitpickLawyer

3 months ago

1 reply

At the end of the day LLM + tools is asking the LLM to create a story with very specific points where "tool calls" are parts of the story, and "tool results" are like characters that provide context. The fact that they can output stories like that, with enough accuracy to make it worthwhile is, IMO, proof that they can "do" whatever we say they can do. They can "do" math by creating a story where a character takes NL and invokes a calculator, and another character provides the actual computation. Cool. It's still the LLM driving the interaction. It's still the LLM creating the story.

badsectoracula

3 months ago

I think you have that last part backwards, it is not the LLM driving the interaction, it is the program that uses the LLM to generate the instructions that does the actual driving - that is the bit that makes the LLM start doing things. Though that is just splitting hairs.

The original point was about the capabilities LLMs themselves since the context was about the technology itself, not what you can do by making them part of a larger system that combines LLMs (perhaps more than one) with other tools.

Depending on the use case and context this distinction may or may not matter, e.g. if you are trying to sell the entire system, it probably is not any more important how the individual parts of the system work than what libraries you used to make the software.

However it can be important in other contexts, like evaluating the abilities of LLMs themselves.

For example i have written a script on my PC that my window manager calls to grab whatever text i have selected on whatever application i'm running and passes it to a program i've written in llama.cpp to load Mistral Small with a prompt that makes it check for spelling and grammar mistakes which in turn produces some script-readable input that another script displays in a window.

This, in a way, is an entire system. This system helps me find grammar and spelling mistakes in the text i have selected when i'm writing documents where i care about finding such mistakes. However it is not Mistral Small that has the functionality of finding grammar and spelling mistakes in my selected text, it only provides the part that does the text checking, the rest is done by other external non-LLM pieces. An LLM cannot intercept keystrokes in my computer, it cannot grab my selected text nor can create a window on my desktop, it doesn't even understand these concepts. In a way this can be thought as a limitation from the perspective of the end result i want, but i work around it with the other software i have attached to it.

mrbungie

3 months ago

> Sure, but isn't that moving the goalposts?

It can be considered as that, sure, but anytime I see Lecun talking about this, he does recognize that you can patch your way around LLMs, the point is that you are going to hit limits eventually anyways. Specific planning benchmarks like Blockworld and the like show that LLMs (with frameworks) hit limits when they're exposed to out-of-distribution problems, and that's a BIG problem.

> We now have "agents" that can work a problem for hours, with self context trimming, planning to md files, editing those plans and so on. All of this just works, today. We used to dream about it a year ago.

I use them everyday but I still woulnd't really let them work for hours in greenfield projects. And we're seeing big vibe coders like Karpathy say the same.

pessimizer

3 months ago

> Why shouldn't we use

So weird that you immediately move the goalposts after accusing somebody of moving the goalposts. Nobody on the planet told you not to use "LLMs + tools if they work." You've moved onto an entirely different discussion with a made-up person.

> All of this just works, today.

Also, it definitely doesn't "just work." It slops around, screws up, reinserts bugs, randomly removes features, ignores instructions, lies, and sometimes you get a lucky result or something close enough that you can fix up. Nothing that should be in production.

Not that they're not very cool and very helpful in a lot of ways. But I've found them more helpful in showing me how they would do something, and getting me so angry that they nerd-snipe me into doing it correctly. I have to admit, 1) however, that sometimes I'm not sure that I'd have gotten there if I hadn't seen it not getting there, and 2) sometimes "doing it correctly" involves dumping the context and telling it almost exactly how I want something implemented.

TopfiAuthor

3 months ago

1 reply

> AIME is saturated (with tool use) [...]

But isn't tool use kinda the crux here?

Correct me if I'm mistaken, but wasn't the argument back then on whether LLMs could solve maths problems without e.g. writing python to solve? Cause when "Sparks of AGI" came out in March, prompting gpt-3.5-turbo to code solutions to assist solving maths problems over just solving them directly was already established and seemed like the path forward. Heck, it is still the way to go, despite major advancements.

Given that, was he truly mistaken on his assertions regarding LLMs solving maths? Same for "planning".

NitpickLawyer

3 months ago

AIME was saturated with tool use (i.e. 99%) for SotA models, but pure NL, no tool still perform "unreasonably well" on the task. Not 100% but still within 90%. And with lots of compute it can reach 99% as well, apparently [1] (@512 rollouts, but still)

[1] - https://arxiv.org/pdf/2508.15260

goalieca

3 months ago

3 replies

> LLMs can't do math. He went on to "argue" that LLMs trick you with poetry that sounds good, but is highly subjective, and when tested on hard verifiable problems like math, they fail.

They really can’t. Token prediction based on context does not reason. You can scramble to submit PRs to ChatGPT to keep up with the “how many Rs in blueberry” kind of problems but it’s clear they can’t even keep up with shitposters on reddit.

And your 2nd and third point about planning and compounding errors remain challenges.. probably unsolvable with LLM approaches.

NitpickLawyer

3 months ago

1 reply

> They really can’t. Token prediction based on context does not reason.

Debating about "reasoning" or not is not fruitful, IMO. It's an endless debate that can go anywhere and nowhere in particular. I try to look at results:

https://arxiv.org/pdf/2508.15260

Abstract:

> Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages modelinternal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking.

goalieca

3 months ago

1 reply

> Debating about "reasoning" or not is not fruitful, IMO.

Thats kind of the whole need isn’t it? Humans can automate simple tasks very effectively and cheaply already. If I ask my pro versions of LLM what the Unicode value of a seahorse is, and it shows a picture of a horse and gives me the Unicode value for a third completely related animal then it’s pretty clear it can’t reason itself out of a wet paper bag.

NitpickLawyer

3 months ago

1 reply

Sorry perhaps I worded that poorly. I meant debating about if context stuffing is or isn't "reasoning". At the end of the day, whatever RL + long context does to LLMs seems to provide good results. Reasoning or not :)

goalieca

3 months ago

Well that’s my point and what I think the engineers are screaming at the top of their lungs these days.. that it’s net negative. It makes a really good demo but hasn’t won anything except maybe translating and simple graphics generation.

travelalberta

3 months ago

> LLMs can't do math.

Ignoring conversations about 'reasoning', at a fundamental level LLMs do not 'do math' in the way that a calculator or a human does math. Sure we can train bigger and bigger models that give you the impression of this but there are proofs out there that with increased task complexity (in this case multi-digit multiplication) eventually the probability of incorrect predictions converges to 1 (https://arxiv.org/abs/2305.18654)

> And your 2nd and third point about planning and compounding errors remain challenges.. probably unsolvable with LLM approaches.

The same issue applies here, really with any complex multi-step problem.

> Again, mere months later the o series of models came out, and basically proved this point moot. Turns out RL + long context mitigate this fairly well. And a year later, we have all SotA models being able to "solve" problems 100k+ tokens deep.

If you go hands on in any decent size codebase with an agent session length and context size become noticeable issues. Again, mathematically error propagation eventually leads to a 100% chance of error. Yann isn't wrong here, we've just kicked the can a little further down the road. What happens at 200k+ tokens? 500k+ tokens? 1M tokens? The underlying issue of a stochastic system isn't addressed.

>While Yann is clearly brilliant, and has a deeper understanding of the roots of the filed than many of us mortals, I think he's been on a debbie downer trend lately

As he should be. Nothing he said was wrong at a fundamental level. The transformer architecture we have now cannot scale with task complexity. Which is fine, by nature it was not designed for such tasks. The problem is that people see these models work on a subset of small scope complex projects and make claims that go against the underlying architecture. If a model is 'solving' complex or planning tasks but then fails to do similar tasks at a higher complexity it's a sign that there is no underlying deterministic process. What is more likely: the model is genuinely 'planning' or 'solving' complex tasks, or that the model has been trained with enough planning and task related examples that it can make a high probability guess?

> So, yeah, I'd take everything any one singular person says with a huge grain of salt. No matter how brilliant said individual is.

If anything, a guy like Yann with a role such as his at a Mag7 company being realistic (bearish if you are a LLM evangelist) about what the transformer architecture can do is a relief. I'm more inclined to listen to him than a guy like Altman who touts LLMs as the future of humanity meanwhile is path to profitability is AI Tik-Tok, sex chatbots, and a third party way to purchase things from Walmart during a recession.

astrange

3 months ago

> You can scramble to submit PRs to ChatGPT to keep up with the “how many Rs in blueberry” kind of problems but it’s clear they can’t even keep up with shitposters on reddit.

Nobody does that. You can't "submit PRs" to an LLM. Although if you pick up new pretraining data you do get people discussing all newly discovered problems, which is a bit of a neat circularity.

> And your 2nd and third point about planning and compounding errors remain challenges.. probably unsolvable with LLM approaches.

Unsolvable in the first place. "Planning" is GOFAI metaphor-based development where they decided humans must do "planning" on no evidence and therefore if they coded something and called it "planning" it would give them intelligence.

Humans don't do or need to do "planning". Much like they don't have or need to have "world models", the other GOFAI obsession.

Palmik

3 months ago

2 replies

The original tweet was clearly misunderstood...

https://x.com/SebastienBubeck/status/1977181716457701775:

> gpt5-pro is superhuman at literature search:

> it just solved Erdos Problem #339 (listed as open in the official database https://erdosproblems.com/forum/thread/339) by realizing that it had actually been solved 20 years ago

https://x.com/MarkSellke/status/1979226538059931886:

> Update: Mehtaab and I pushed further on this. Using thousands of GPT5 queries, we found solutions to 10 Erdős problems that were listed as open: 223, 339, 494, 515, 621, 822, 883 (part 2/2), 903, 1043, 1079.

It's clearly talking about finding existing solutions to "open" problems.

The main mistake is by Kevin Weil, OpenAI CTO, who misunderstood the tweet:

https://x.com/kevinweil/status/1979270343941591525:

> you are totally right—I actually misunderstood @MarkSellke's original post, embarrassingly enough. Still very cool, but not the right words. Will delete this since I can't edit it any longer I think.

Obviously embarassing, but completely overblown reaction. Just another way for people to dunk on OpenAI :)

zozbot234

3 months ago

1 reply

"you are totally right—I actually misunderstood" ...like, seriously? Did an AI come up with this retraction, or are humans actually talking like robots now?

TopfiAuthor

3 months ago

Guess even the CTO of OpenAI relies on Anthropic models in a pinch...

TopfiAuthor

3 months ago

If holding the CTO of OpenAI accountable for his wildly inaccurate statement constitutes "dunking on OpenAI", then I'd say dunk away.

He, more than anyone else, should be able to for one parse the original statements correctly and for another maybe realize that if one of their models had accomplished what he seemed to think GPT-5 had, that may require some more scrutiny and research before posting it. That would have, after all, been a clear and incredibly massive development for the space, something the CTO of OpenAI should recognize instantly.

The amount of people that told me this is clear and indisputable proof that AGI/ASI/whatever is either around the corner or already here is far more than zero and arguing against their misunderstanding was made all the more challenging because "the CTO of OpenAI knows more than you" is quite a solid appeal to authority.

I'd recommend maybe a waiting period of 48h before any authority in any field can send a tweet, that might resolve some of the inaccuracies and the incredibly annoying need to just jump on wild bandwagons...

gpjt

3 months ago

4 replies

To be fair to the OpenAI team, if read in context the situation is at worst ambiguous.

The deleted tweet that the article is about said "GPT-5 just found solutions to 10 (!) previously unsolved Erdös problems, and made progress on 11 others. These have all been open for decades." If it had been posted stand-alone then I would certainly agree that it was misleading, but it was not.

It was a quote-tweet of this: https://x.com/MarkSellke/status/1979226538059931886?t=OigN6t..., where the author is saying he's "pushing further on this".

The "this" in question is what this second tweet is in turn quote-tweeting: https://x.com/SebastienBubeck/status/1977181716457701775?t=T... -- where the author says "gpt5-pro is superhuman at literature search: [...] it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thread/3…) by realizing that it had actually been solved 20 years ago"

So, reading the thread in order, you get

  * SebastienBubeck: "GPT-5 is really good at literature search, it 'solved' an apparently-open problem by finding an existing solution"
  * MarkSellke: "Now it's done ten more"
  * kevinweil: "Look at this cool stuff we've done!"

I think the problem here is the way quote-tweets work -- you only see the quoted post and not anything that it in turn is quoting. Kevin Weil had the two previous quotes in his context when he did his post and didn't consider the fact that readers would only see the first level, so wouldn't have Sebastien Bubek's post in mind when they read his.

That seems like an easy mistake to entirely honestly make, and I think the pile-on is a little unfair.

card_zero

3 months ago

1 reply

So the first guy said "solved [...] by realizing that it had actually been solved 20 years ago", and the second guy said "found solutions to 10 (!) previously unsolved Erdös problems".

Previously unsolved. The context doesn't make that true, does it?

glenstein

3 months ago

1 reply

Right, and I would even go a step further and say the context from SebastienBubeck is stretching "solved" past its breaking point by equating literature research with self-bootsrapped problem solving. When it's later characterized as "previously unsolved" it's doubling down on the same equivocation.

Don't get me wrong, effectively surfacing unappreciated research is great and extremely valuable. So there's a real thing here but with the wrong headline attached to it.

watwut

3 months ago

2 replies

> Don't get me wrong, effectively surfacing unappreciated research is great and extremely valuable. So there's a real thing here but with the wrong headline attached to it.

If I said that I solved a problem, but actually I took a solution for an old book, people would call me a liar. If I was prominent person, it would be academic fraud incident. No one would be saying that "I did extremely valuable thing" or "there was a real thing here".

3form

3 months ago

If you said you "solved", yes - if you said "found a solution" however, there's ambiguity to it, which is part of the confusion here.

glenstein

3 months ago

Some of the most important advancements in the history of science came from reviewing underappreciated discoveries that already existed in the literature. Mendel's work on genetics went under appreciated for decades before being effectively rediscovered, and proved to be integral to the modern synthesis, which provided a genetic basis for evolution, and is the most important development in the history of our understanding of evolution since Darwin and Wallace's original formulation.

Henrietta Leavitt's work on the relation between a stars period of pulsation and brightness was tucked away in a Harvard Journal, which had revolutionary potential not appreciated until Hubbel recalled and applied her work years later to demonstrate galactic redshift in Andromeda, understanding that it was an entirely separate galaxy, that it was receding away from us and contributing to the bedrock of modern cosmology.

The pathogenic basis for ulcers was proposed in the 1940s, which later became instrumental to explaining data in the 1980s and led to a Nobel prize in 2005.

It is and has always been fundamental to the progress of human knowledge to not just propose new ideas but to pull pertinent ones from the literature and apply them in new contexts, and depending on the field, the research landscape can be inconceivably vast, so efficiencies in combing through it can create the scaffolding for major advancements in understanding.

So there's more going on here than "lying".

moefh

3 months ago

1 reply

> Kevin Weil had the two previous quotes in his context when he did his post and didn't consider the fact that readers would only see the first level, so wouldn't have Sebastien Bubek's post in mind when they read his.

No, Weil said he himself misunderstood Sellke's post[1].

Note Weil's wording (10 previously unsolved Erdos problems) vs. Sellke's wording (10 Erdos problems that were listed as open).

[1] https://x.com/kevinweil/status/1979270343941591525

GodelNumbering

3 months ago

Also, previous comment omitted the part that now-deleted tweet from Bubeck begins with "Science revolution via AI has officially begun...".

Frieren

3 months ago

3 replies

> "GPT-5 is really good at literature search, it 'solved' an apparently-open problem by finding an existing solution"

Survivor bias.

I can assure you that GPT-5 fucks up even relatively easy searches. I need to have a very good idea how the results looks like and the ability to test it to be able to use any result from GPT-5.

If I throw the dice 1000 times and post about it each time that I got a double six. Am I the best dice thrower that there is?

saghm

3 months ago

One time when I was a kid my dad and I were playing Yahtzee, and he rolled five 5s on his first roll of the turn. He was absolutely stunned, and at the time I was young enough that I didn't understand just how unlikely it was. If I only I knew that I was playing against the best dice thrower!

zacmps

3 months ago

For literature search that might be ok. It doesn't need to replace any other tools, and if 1/10 it surfaces something you wouldn't have found otherwise it could be worth the time on the dud attempts.

wasabi991011

3 months ago

I'm not really sure what you mean. Literature search is about casting a wide net to make a reading list that is relevant to your research.

It is pretty hard to fuck that up, since you aren't expected to find everything anyway. The idea of "testing" and "using any result from GPT" is just, like, reading the papers and seeing if they are tangentially related.

If I may speak to my own experience, literature search has been the most productive application I've personally used, more than coding, and I've found many interesting papers and research directions with it.

OtherShrezzing

3 months ago

2 replies

Am I correct in thinking this is the 2nd such fumble by a major lab? DeepMind released their “matrix multiplication better than SOTA” paper a few months back, which suggested Gemini had uncovered a new way to optimally multiply two matrices in fewer steps than previously known. Then immediately after their announcement, mathematicians pointed out that their newly discovered SOTA had been in the literature for 30-40 years, and was almost certainly in Gemini’s training set.

glenstein

3 months ago

It's an interesting type of fumble too, because it's easy to (mistakenly!) read it as "LLM tries and fails to solve problem but thinks it solved it" when really it's being credited with originality for discovering or reiterating solutions already out there in the literature.

It sounds like the content of the solutions themselves are perfectly fine, so it's unfortunate that the headline will leave the impression that these are just more hallucinations. They're not hallucinations, they're not wrong, they're just wrongly assigned credit for existing work. Which, you know, where have we heard that one before? It's like the stylistic "borrowing" from artists, but in research form.

card_zero

3 months ago

Well, it is important that we have some technology to prevent us from going round in circles by reinventing things, such as search.

JKCalhoun

3 months ago

I try not to lose sight of the first time that I heard (some years back) that people were using this new LLM thing for DM'ing ("dungeon mastering", leading) a game of Dungeons and Dragons. I thought, this must be bullshit or some kind of witchcraft.

Definitely not anti-AI here. I think I have been disappointed though, since then, to slowly learn that they're (still) little beyond that.

Still amazing though. And better than a Google search (IMHO).

827a

3 months ago

This happening the same week as DeepMind’s seemingly legitimate AI-assisted cancer treatment breakthrough is a startlingly bad look for OpenAI.

My boss always used to say “our only policy is, don’t be the reason we need to create a new policy”. I suspect OpenAI is going to have some new public communication policies going forward.

Analemma_

3 months ago

“AGI achieved internally”

Another case of culture flowing from the top I guess.

jgalt212

3 months ago

After the circular financing schemes involving hundreds of billions of dollars were uncovered, nothing I read about the AI business and its artificial hype machine surprises me anymore.

74 more comments available on Hacker News

View full discussion on Hacker News

ID: 45633482Type: storyLast synced: 11/20/2025, 8:28:07 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN