The Wall Confronting Large Language Models
Mood: heated
Sentiment: mixed
Category: other
Key topics: A research paper argues that large language models face fundamental limitations, sparking debate among commenters about the paper's validity, the authors' credentials, and the implications for AI research.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 6h after posting
Peak period: 129 comments (Day 1)
Avg / period: 26.7
Based on 160 loaded comments
Key moments
- 01 Story posted: Sep 3, 2025 at 7:40 AM EDT (3 months ago)
- 02 First comment: Sep 3, 2025 at 1:58 PM EDT (6h after posting)
- 03 Peak activity: 129 comments in Day 1 (hottest window of the conversation)
- 04 Latest activity: Sep 9, 2025 at 4:58 PM EDT (3 months ago)
The authors are computer scientists and people who work with large-scale dynamical systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite lots of practical progress in deep learning/transformers/etc. systems, all the theory involved is just analogies and equations of a similar sort. It's all alchemy, and so the people really good at producing these models seem to be using a bunch of effective rules of thumb and not any full or established models (despite books claiming to offer a mathematical foundation for the enterprise, etc.).
Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.
Applied demon summoning is ruled by empiricism and experimentation. The best summoners in the field are the ones who have a lot of practical experience and a sharp, honed intuition for the bizarre dynamics of the summoning process. And even those very summoners, specialists worth their weight in gold, are slaves to the experiment! Their novel ideas and methods and refinements still fail more often than they succeed!
One of the first lessons you have to learn in the field is that of humility. That your "novel ideas" and "brilliant insights" are neither novel nor brilliant - and the only path to success lies through things small and testable, most of which do not survive the test.
With that, can you trust the demon summoning knowledge of someone who has never drawn a summoning diagram?
> One of the first lessons you have to learn in the field is that of humility.
I suggest then that you make your statements less confidently.
1. Sequence models relying on a Markov chain, with and without summarization to extend beyond fixed-length horizons.
2. All forms of attention mechanisms/dense layers.
3. A specific Transformer architecture.
That there exists a limit on the representation or prediction powers of the model, for tasks of all input/output token lengths or of fixed-size N input tokens / M output tokens, *based on* a derived cost-growth schedule for model size, data size, and compute budgets.
Separately, I would have expected a clear literature review of existing mathematical studies on LLM capabilities and limitations, of which there are *many*, including studies that purport to show that Transformers can represent any program of finite, pre-determined execution length.
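For reference on the kind of "cost growth schedule" mentioned above, one widely cited example (not from the paper under discussion) is the Chinchilla scaling law of Hoffmann et al. (2022); the constants below are the approximate published fits, quoted purely for illustration:

```python
# Illustrative example of a cost/loss "growth schedule" relating model size and data size:
# the Chinchilla scaling law L(N, D) = E + A/N^alpha + B/D^beta (Hoffmann et al., 2022).
# Constants are the approximate published fits, not anything derived in the paper under discussion.
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / n_params**alpha + B / n_tokens**beta

# e.g. roughly Chinchilla's own operating point: 70B parameters, 1.4T training tokens
print(chinchilla_loss(70e9, 1.4e12))   # ~1.94
```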
While it's not a requirement to have published in a field before publishing in it, having a coauthor who is from the target field, or a peer-reviewed venue in that field as an entry point, certainly raises credibility.
From my limited claim to expertise in either Machine Learning or Large Language Models, the paper does not appear to demonstrate what it claims. The authors' language addresses the field of Machine Learning and LLM development as you would a young student, which does not help make their point.
I'm not saying anything about the content, merely making a remark.
Seth Lloyd, Wolpert, Landauer, Bennett, Fredkin, Feynman, Sejnowski, Hopfield, Zecchina, Parisi, Mézard and Zdeborová, Crutchfield, Preskill, Deutsch, Manin, Szilard, MacKay...
I wish someone told them to shut up about computing. And I wouldn't dare claim von Neumann as merely a physicist, but that's where he was coming from. Oh and as much as I dislike him, Wolfram.
But today, most people hold opinions about LLMs, both as to their limits and their potential, without any real knowledge of computational linguistics nor of deep learning.
Here's another example in case you still don't get the point: Schrödinger had no business talking about biology because he wasn't trained in it, right? Never mind him being ahead of the entire field in understanding the role of "DNA" (yet undiscovered, but he correctly posited the crystal-ish structure) and of information in evolution, and inspiring Watson's quest to figure out DNA.
Judge ideas on the merit of the idea itself. It's not about whether they have computing backgrounds, it's about the ideas.
Hell, look at the history of deep learning with Minsky's book. Sure glad everyone listened to the linguistics expert there...
> Lots of chemists and physicists like to talk about computation without having any background in it.
I'm confused. Physicists deal with computation all the time. Are you confusing computation with programming? There's a big difference. Physicists and chemists frequently run up against the limits of computability. Remember, Turing, Church, and even Knuth obtained degrees in mathematics. The divide isn't so clear-cut and there's lots of overlap. I think if you go look at someone doing their PhD in Programming Languages, you could easily mistake them for a mathematician.
Looking at the authors, I don't see why this is out of their domain. Succi[0] looks like he deals a lot with fluid dynamics and has a big focus on lattice Boltzmann methods. Modern fluid dynamics is all about computability and its limits. A lot of this goes into the Navier–Stokes problem (even Terry Tao talks about this[1]), which is largely about computational reproducibility.
Coveney[2] is a harder read for me, but doesn't seem suspect. Lots of work in molecular dynamics, so shares a lot of tools with Succi (seems like they like to work together too). There's a lot of papers there, but sorting by year there's quite a few that scream "limits of computability" to me.
I can't make strong comments without more intimate knowledge of their work, but nothing here is a clear red flag. I think you're misinterpreting because this is a position paper, written in the style you'd expect from a more formal field, but it's also kinda scattered. I've only done a quick read -- don't get me wrong, I have critiques -- but there are no red flags that warrant quick dismissal. (My background: physicist -> computational physics -> ML) There are things they are pointing to that are discussed more within the mathematically inclined sides of ML (it's a big field... even if only a small subset is most visible). I'll at least look at some of their other works on the topic, as it seems they've written a few papers.
[0] https://scholar.google.com/citations?user=XrI0ffIAAAAJ
[1] I suspect this is well above the average HN reader, but pay attention to what they mean by "blowup" and "singularity": https://terrytao.wordpress.com/tag/navier-stokes-equations/
I'm saying that lots of people like to post their opinions of LLMs regardless of whether or not they actually have any competence in either computational linguistics or deep learning.
There's plenty more room to grow with agents and tooling, but the core models are only slightly bumping YoY rather than the rocketship changes of 2022/23.
From Anthropic's press release yesterday after raising another $13 billion:
"Anthropic has seen rapid growth since the launch of Claude in March 2023. At the beginning of 2025, less than two years after launch, Anthropic’s run-rate revenue had grown to approximately $1 billion. By August 2025, just eight months later, our run-rate revenue reached over $5 billion—making Anthropic one of the fastest-growing technology companies in history."
$4 billion increase in 8 months. $1 billion every two months.
If work produced by LLMs forever has to be checked for accuracy, the applicability will be limited.
This is perhaps analogous to all the "self-driving cars" that still have to be monitored by humans, and in that case the self-driving system might as well not exist at all.
Understandable. The real innovation was the process/technique underlying LLMs; the rest is just programmers automating it. Something similar happened with blockchain: everything after was just tinkering with the initial idea.
Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution at the cost of a (much) larger state space. So if you can implement it in a brain or a computer, there is a sufficiently large probabilistic dynamic that can model it. More really is different.
So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.
(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)
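Purely as a toy illustration of that transfer-operator point (trading nonlinearity for a much larger linear state space), here is a minimal Ulam-style sketch assuming the logistic map as the nonlinear system; the bin count, sample counts, and number of steps are arbitrary choices:

```python
import numpy as np

# Approximate the nonlinear logistic map x -> 4x(1-x) by a *linear* Markov transition
# matrix acting on probability vectors over K bins of [0, 1] (Ulam's method).
rng = np.random.default_rng(0)
K = 200
edges = np.linspace(0.0, 1.0, K + 1)
centers = (edges[:-1] + edges[1:]) / 2

def logistic(x):
    return 4.0 * x * (1.0 - x)

# Build the transition matrix by sampling points in each bin and seeing where they land.
P = np.zeros((K, K))
samples_per_bin = 100
for i in range(K):
    xs = rng.uniform(edges[i], edges[i + 1], samples_per_bin)
    js = np.clip(np.digitize(logistic(xs), edges) - 1, 0, K - 1)
    for j in js:
        P[i, j] += 1.0 / samples_per_bin

# Evolve a probability vector linearly, and compare against a direct nonlinear simulation
# of many trajectories started in the same bin.
p = np.zeros(K); p[20] = 1.0                       # mass concentrated near x = 0.1
x = rng.uniform(edges[20], edges[21], 10_000)
for _ in range(5):
    p = p @ P
    x = logistic(x)
print("linear-chain mean:", float(p @ centers), " direct nonlinear mean:", float(x.mean()))
```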
"if you can implement it in a brain"
But we didn't. You have no idea how a brain works. Neither does anyone.
Your line of attack, which is to dismiss from a pretend point of certainty rather than from inquiry and curiosity, seems indicative of the cog-sci/engineering problem in general. There's an imposition based in intuition/folk psychology that suffuses the industry. The field doesn't remain curious about new discoveries in neurobiology, which supplants psychology (psychology is being-based, neuro is neural-based). What this does is remove the intent of rhetoric/being and suggest brains built our external communication. The question is how, and by what regularities. Cog-sci has no grasp of that in the slightest.
We don't understand what LLMs are doing. You can't go from understanding what a transformer is to understanding what an LLM does any more than you can go from understanding what a Neuron is to what a brain does.
I guess that you are most likely going to have cereal for breakfast tomorrow, I also guess that it's because it's your favourite.
vs
I understand that you don't like cereal for breakfast, and I understand that you only have it every day because a Dr told you that it was the only way for you to start the day in a way that aligns with your health and dietary needs.
Meaning, I can guess based on past behaviour and be right, but understanding the reasoning for those choices, that's a whole other ballgame. Further, if we do end up with an AI that actually understands, well, that would really open up creativity, and problem solving.
Why do you need to ask me, isn't a guess based on past answers good enough?
Or, do you understand that you need to know more, you need to understand the reasoning based on what's missing from that post?
There's _always_ something missing, left unsaid in every example, it's the nature of language.
As for your example, the LLM can be trained to know the underlying reasons (doctor's recommendation, etc.). That knowledge is not fundamentally different from the knowledge that someone tends to eat cereal for breakfast. My question to you, was an attempt to highlight that the dichotomy you were drawing, in your example, doesn't actually exist.
Maybe, maybe one is based on correlation, the other causation.
In either case, the results are the same, he's eating cereal for breakfast. We can know this fact without knowing the underlying cause. Many times, we don't even know the cause of things we choose to do for ourselves, let alone what others do.
On top of which, even if you think the "cause" is that the doctor told him to eat a healthy diet, do you really know the actual cause? Maybe the real cause, is that the girl he fancies, told him he's not in good enough shape. The doctor telling him how to get in shape is only a correlation, the real cause is his desire to win the girl.
These connections are vast and deep, but they're all essentially the same type of knowledge, representable by the same data structures.
Yeah, no.
Understanding the causation allows the system to provide a better answer.
If they "enjoy" cereal, what about it do they enjoy, and what other possible things can be had for breakfast that also satisfy that enjoyment.
You'll never find that by looking only at the fact that they have eaten cereal for breakfast.
And the fact that that's not obvious to you is why I cannot be bothered going into any more depth on the topic any more. It's clear that you don't have any understanding on the topic beyond a superficial glance.
Bye :)
If you think there is a threshold at which point some large enough feedforward network develops the capability to backtrack then I'd like to see your argument for it.
Simply have a deterministic Markov chain where each state is a possible value of the tape+state of the TM and which transitions accordingly.
Why does it matter how it does it or whether this is strictly LLM or LLM with tools for any practical purpose?
What you're suggesting is akin to me saying you can't build a house, then you go and hire someone to build a house. _You_ didn't build the house.
Have each of the Markov chain's states be one of 10^81 possible sudoku grids (a 9x9 grid of digits 1-9 and blank), then calculate the 10^81-by-10^81 transition matrix that takes each incomplete grid to the valid complete grid containing the same numbers. If you want you could even have it fill one square at a time rather than jump right to the solution, though there's no need to.
Up to you what you do for ambiguous inputs (select one solution at random to give 1.0 probability in the transition matrix? equally weight valid solutions? have the states be sets of boards and map to set of all valid solutions?) and impossible inputs (map to itself? have the states be sets of boards and map to empty set?).
Could say that's "cheating" by pre-computing the answers and hard-coding them in a massive input-output lookup table, but to my understanding that's also the only sense in which there's equivalence between Markov chains and LLMs.
Edit: I see you added questions for the ambiguities, but modulo those choices your solution will only almost work, because it is not entirely extensionally equivalent. The transition graph and the solver are almost extensionally equivalent, but whereas the Prolog solver will backtrack, there is no backtracking in the Markov chain, and you have to re-run the chain multiple times to find all the solutions.
If you want it to give all possible solutions at once, you can just expand the state space to the power-set of sudoku boards, such that the input board transitions to the state representing the set of valid solved boards.
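As a toy-scale stand-in for the 10^81-state construction above (which obviously can't be materialized), here is a minimal sketch with 2x2 Latin squares over {1, 2} in place of full sudoku; the point is only that a brute-force solver can be baked into a lookup table from each partial grid to its set of valid completions, which is exactly the "degenerate Markov kernel" being discussed:

```python
from itertools import product

# Toy version of "precompute everything into a transition table":
# 2x2 Latin squares over {1, 2}, with 0 meaning a blank cell.
def valid(grid):
    rows = [grid[0:2], grid[2:4]]
    cols = [grid[0::2], grid[1::2]]
    return all(sorted(line) == [1, 2] for line in rows + cols)

solutions = [g for g in product([1, 2], repeat=4) if valid(g)]

def completions(partial):
    """All full solutions consistent with a partial grid (0 = blank)."""
    return {s for s in solutions
            if all(p == 0 or p == s[i] for i, p in enumerate(partial))}

# "Transition table": every possible partial grid maps straight to its solution set.
transition = {g: completions(g) for g in product([0, 1, 2], repeat=4)}

print(transition[(1, 0, 0, 0)])   # -> {(1, 2, 2, 1)}
print(transition[(0, 0, 0, 0)])   # -> both valid Latin squares
print(transition[(1, 1, 0, 0)])   # -> set(): impossible input
```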
I think it can be done. I started a chatbot that works like this some time back (2024), but I've paused work on it since January.
In brief, you shorten the context by discarding the context that didn't work out.
Of course this would be pointless, but it demonstrates that a system where an LLM provides the logic can backtrack, as there's nothing computationally special about backtracking.
That current UIs to LLMs are set up for conversation-style use that makes this harder isn't an inherent limitation of what we can do with LLMs.
General intelligence may not be SAT/SMT solving but it has to be able to do it, hence, backtracking.
Today I had another of those experiences of the weaknesses of LLM reasoning, one that happens a lot when doing LLM-assisted coding. I was trying to figure out how to rebuild some CSS after the HTML changed for accessibility purposes, and I got a good idea for how to do it from talking to the LLM. But by that point the context was poisoned, probably because there was a lot of content in the context describing what we were thinking about at different stages of a conversation that had evolved considerably. It lost its ability to follow instructions: I'd tell it specifically to do this or do that and it just wouldn't do it properly. This happens a lot if a session goes on too long.
My guess is that the attention mechanism is locking on to parts of the conversation which are no longer relevant to where I think we're at. In general, the logic that considers the variation of either a practice (instances) or a theory over time is a very tricky problem, and 'backtracking' is a specific answer for maintaining your knowledge base across a search process.
Back when I was thinking about commonsense reasoning with logic, it was obviously a much more difficult problem to add things like "P was true before time t", "there will be some time t in the future such that P is true", "John believes Mary believes that P is true", "It is possible that P is true", "there is some person q who believes that P is true", particularly when you combine these qualifiers. For one thing, you don't even have a sound and complete strategy for reasoning over first-order logic + arithmetic, but you also have a combinatorial explosion over the qualifiers.
Back in the day I thought it was important to have sound reasoning procedures but one of the reasons none of my foundation models ever became ChatGPT was that I cared about that and I really needed to ask "does change C cause an unsound procedure to get the right answer more often?" and not care if the reasoning procedure was sound or not.
Just to add some more color to this. For problems that completely reduce to formal methods or have significant subcomponents that involve it, combinatorial explosion in state-space is a notorious problem and N variables is going to stick you with 2^N at least. It really doesn't matter whether you think you're directly looking at solving SAT/search, because it's too basic to really be avoided in general.
When people talk optimistically about hallucinations not being a problem, they generally mean something like "not a problem in the final step" because they hope they can evaluate/validate something there, but what about errors somewhere in the large middle? So even with a very tiny chance of hallucinations in general, we're talking about an exponential number of opportunities in implicit state-transitions to trigger those low-probability errors.
The answer to stuff like this is supposed to be "get LLMs to call out to SAT solvers". Fine, definitely moving from state-space to program-space is helpful, but it also kinda just pushes the problem around as long as the unconstrained code generation is still prone to hallucination.. what happens when it validates, runs, and answers.. but the spec was wrong?
Personally I'm most excited about projects like AlphaEvolve that seem fearless about hybrid symbolics / LLMs and embracing the good parts of GOFAI that LLMs can make tractable for the first time. Instead of the "reasoning is dead, long live messy incomprehensible vibes", those guys are talking about how to leverage earlier work, including things like genetic algorithms and things like knowledge-bases.[0] Especially with genuinely new knowledge-discovery from systems like this, I really don't get all the people who are still staunchly in either an old-school / new-school camp on this kind of thing.
[0]: MLST on the subject: https://www.youtube.com/watch?v=vC9nAosXrJw
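To put rough numbers on the "exponential number of opportunities" point above, a back-of-the-envelope sketch with a made-up per-step error rate (and an independence assumption that is itself generous):

```python
# If each implicit reasoning/transition step independently has a small chance p of going
# wrong, the chance the whole chain stays error-free decays exponentially in the number
# of steps. The per-step rate below is illustrative, not measured.
p = 0.001
for steps in (10, 100, 1_000, 10_000):
    print(steps, "steps -> P(no error) =", round((1 - p) ** steps, 4))
# 10 -> 0.99, 100 -> 0.9048, 1000 -> 0.3677, 10000 -> 0.0
```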
Seen that way the text is a set of constraints with a set of variables for all the various choices you make determining it. And of course there is a theory of the world such that "causes must precede their effects" and all the world knowledge about instances such as "Chicago is in Illinois".
The problem is really worse than that because you'll have to parse sentences that weren't generated by sound reasoners or that live in a different microtheory, deal with situations that are ambiguous anyway, etc. Which is why that program never succeeded.
[1] in short: database rows
The fundamental autoregressive architecture is absolutely capable of backtracking… we generate next token probabilities, select a next token, then calculate probabilities for the token thereafter.
There is absolutely nothing stopping you from “rewinding” to an earlier token, making a different selection and replaying from that point. The basic architecture absolutely supports it.
Why then has nobody implemented it? Maybe, this kind of backtracking isn’t really that useful.
https://arxiv.org/html/2502.04404v1
https://arxiv.org/abs/2306.05426
And I was wrong that nobody has implemented it, as these papers prove people have… it's just that the results haven't been sufficiently impressive to support the transition from the research lab to industrial use - or at least, not yet.
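The papers above have their own mechanisms; purely as an illustration of the rewind-and-resample idea described in the comment, here is a minimal sketch. `next_token_logprobs` is a hypothetical stand-in for whatever API exposes per-token log-probabilities, and the confidence threshold and rewind budget are arbitrary choices:

```python
import math

def next_token_logprobs(prefix):
    """Hypothetical stand-in for a model call returning {token: logprob} for a prefix."""
    raise NotImplementedError

def generate_with_backtracking(prompt, max_tokens=100,
                               min_logprob=math.log(0.05), max_rewinds=20):
    """Greedy decoding that rewinds one position when confidence collapses and retries
    that position with its best not-yet-tried token."""
    tokens = []            # generated tokens so far
    tried = []             # tried[i] = set of tokens already attempted at position i
    rewinds = 0
    while len(tokens) < max_tokens:
        pos = len(tokens)
        dist = next_token_logprobs(prompt + "".join(tokens))
        banned = tried[pos] if pos < len(tried) else set()
        options = {t: lp for t, lp in dist.items() if t not in banned}
        stuck = not options or max(options.values()) < min_logprob
        if stuck and tokens and rewinds < max_rewinds:
            rewinds += 1
            tokens.pop()                     # rewind one position...
            tried = tried[:len(tokens) + 1]  # ...keeping its attempt set, dropping later ones
            continue
        if not options:
            break                            # nowhere left to go; give up
        token = max(options, key=options.get)
        if pos < len(tried):
            tried[pos].add(token)
        else:
            tried.append({token})
        tokens.append(token)
        if token == "<eos>":
            break
    return "".join(tokens)
```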
Take a finite tape Turing machine with N states and tape length T and N^T total possible tape states.
Now consider that you have a probability for each state instead of a definite state. The transitions of the Turing machine induce transitions of the probabilities. These transitions define a Markov chain on a N^T dimensional probability space.
Is this useful? Absolutely not. It's just a trivial rewriting. But it shows that high dimensional spaces are extremely powerful. You can trade off sophisticated transition rules for high dimensionality.
You _can_ continue this line of thought though in more productive directions. E.g. what if the input of your machine is genuinely uncertain? What if the transitions are not precise but slightly noisy? You'd expect that the fundamental capabilities of a noisy machine wouldn't be that much worse than those of a noiseless ones (over finite time horizons). What if the machine was built to be noise resistant in some way?
All of this should regularize the Markov chain above. If it's more regular you can start thinking about approximating it using a lower rank transition matrix.
The point of this is not to say that this is really useful. It's to say that there is no reason in my mind to dismiss the purely mathematical rewriting as entirely meaningless in practice.
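A minimal sketch of that construction, assuming a toy machine (the classic 2-state, 2-symbol busy beaver plus a halt state) on a short circular tape so the configuration space stays small: every (state, head, tape) configuration becomes a Markov state, the induced transition matrix is 0/1, and stepping a probability vector by matrix multiplication reproduces the machine's run.

```python
import numpy as np
from itertools import product

# Toy Turing machine rewritten as a Markov chain over full (state, head, tape) configurations.
TAPE_LEN = 4
RULES = {  # (state, symbol) -> (write, move, next_state); "H" = halt
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}
STATES = ["A", "B", "H"]

configs = [(s, h, t) for s in STATES for h in range(TAPE_LEN)
           for t in product((0, 1), repeat=TAPE_LEN)]
index = {c: i for i, c in enumerate(configs)}

P = np.zeros((len(configs), len(configs)))
for (s, h, t), i in index.items():
    if s == "H":
        P[i, i] = 1.0                      # halted configurations are absorbing
        continue
    write, move, nxt = RULES[(s, t[h])]
    tape = list(t); tape[h] = write
    j = index[(nxt, (h + move) % TAPE_LEN, tuple(tape))]
    P[i, j] = 1.0                          # deterministic: a single outgoing probability of 1

# One matrix multiplication of a probability vector is exactly one machine step.
p = np.zeros(len(P)); p[index[("A", 0, (0, 0, 0, 0))]] = 1.0
for _ in range(10):
    p = p @ P
state, head, tape = configs[int(np.argmax(p))]
print("after 10 steps:", state, head, tape)   # halts in 6 steps with tape (1, 1, 1, 1)
```

With 3 machine states, 4 head positions, and 2^4 tapes this is only a 192x192 matrix, but the same bookkeeping blows up exponentially in tape length, which is exactly the trade of sophisticated transition rules for high dimensionality described above.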
This is impossible. When driven by a sinusoid, a linear system will only ever output a sinusoid with exactly the same frequency but a different amplitude and phase regardless of how many states you give it. A non-linear system can change the frequency or output multiple frequencies.
Of course, in practice you don't actually get arbitrary degree polynomials but some finite degree, so the approximation might still be quite bad or inefficient.
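A minimal numerical illustration of the frequency argument, with arbitrary parameter choices: drive a stable linear state-space system and a simple nonlinear (input-squaring) system with the same 2 Hz sinusoid and look at which frequency dominates each output.

```python
import numpy as np

# Linear system x' = Ax + Bu, y = Cx, vs. a nonlinear system that squares its input.
rng = np.random.default_rng(0)
n, steps, dt = 8, 4096, 0.01
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))   # stable-ish linear dynamics
B = rng.standard_normal(n)
C = rng.standard_normal(n)

f_in = 2.0                                            # input frequency in Hz
t = np.arange(steps) * dt
u = np.sin(2 * np.pi * f_in * t)

x, z = np.zeros(n), 0.0
y_lin, y_nl = [], []
for k in range(steps):
    x = x + dt * (A @ x + B * u[k])                   # linear (Euler step)
    y_lin.append(C @ x)
    z = z + dt * (-z + u[k] ** 2)                     # nonlinear: squares the input
    y_nl.append(z)

skip = 1024                                           # discard the initial transient
freqs = np.fft.rfftfreq(steps - skip, dt)
for name, y in (("linear", y_lin), ("nonlinear", y_nl)):
    seg = np.array(y[skip:])
    spec = np.abs(np.fft.rfft(seg - seg.mean()))
    print(name, "dominant output frequency:", round(float(freqs[int(np.argmax(spec))]), 2), "Hz")
# The linear output stays at ~2 Hz; the nonlinear one lands at ~4 Hz (second harmonic).
```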
If you limit yourself to Markov chains where the full transition matrix can be stored in a reasonable amount of space (which is the kind of Markov chain that people usually have in mind when they think that Markov chains are very limited), LLMs cannot be represented as such a Markov chain.
If you want to show limitations of LLMs by reducing them to another system of computation, you need to pick one that is more limited than LLMs appear to be, not less.
This is not true. Do you mean anything that is possible to compute? If yes, then you missed the point entirely.
The whole analogy is just pointless. You might as well call an elephant an Escalade because they weigh the same.
¹ Depending on context window implementation details, but that is the maximum, because the states n tokens back were computed from the n tokens before that. The minimum of course is an order n-1 Markov chain.
I would like to comment that there are a lot of papers out there on what transformers can or can't do that are misleading, often misunderstood, or abstract so far from transformers as implemented and used that they are pure theory.
Ability to win a gold medal as if they were scored similarly to how humans are scored?
or
Ability to win a gold medal as determined by getting the "correct answer" to all the questions?
These are subtly two very different questions. In these kinds of math exams how you get to the answer matters more than the answer itself. i.e. You could not get high marks through divination. To add some clarity, the latter would be like testing someone's ability to code by only looking at their results to some test functions (oh wait... that's how we evaluate LLMs...). It's a good signal but it is far from a complete answer. It very much matters how the code generates the answer. Certainly you wouldn't accept code if it does a bunch of random computations before divining an answer.
The paper's answer to your question (assuming scored similarly to humans) is "Don’t count on it". Not a definitive "no" but they strongly suspect not.
> Any discrete-time computation (including backtracking search) becomes Markov if you define the state as the full machine configuration. Thus “Markov ⇒ no reasoning/backtracking” is a non sequitur. Moreover, LLMs can simulate backtracking in their reasoning chains. -- GPT-5
> The observable reality is that LLMs can do mathematical reasoning
I still can't get these machines to reliably perform basic subtraction[0]. The result is stochastic, so I can get the right answer, but I have yet to reproduce one where the actual logic is correct[1,2]. Both [1,2] make the same mistake, and in [2] you see it just say "fuck it, skip to the answer".
> You cannot counter observable reality
I'd call [0,1,2] "observable". These types of errors are quite common, so maybe I'm not the one with lying eyes.
[0] https://chatgpt.com/share/68b95bf5-562c-8013-8535-b61a80bada...
[1] https://chatgpt.com/share/68b95c95-808c-8013-b4ae-87a3a5a42b...
[2] https://chatgpt.com/share/68b95cae-0414-8013-aaf0-11acd0edeb...
- Gemini 2.5 Pro[0], the top model on LLM Arena. This SOTA enough for you? It even hallucinated Python code!
- Claude Opus 4.1, sharing that chat shares my name, so here's a screenshot[1]. I'll leave that one for you to check.
- Grok4 getting the right answer but using bad logic[2]
- Kimi K2[3]
- Mistral[4]
I'm sorry, but you can fuck off with your goalpost moving. They all do it. Check yourself.
> I am being serious
Don't lie to yourself, you never were. People like you have been using that copy-paste piss-poor logic since the GPT-3 days. The same exact error has existed since those days on all those models, just as it does today. You all were highly disingenuous then, and still are now. I know this comment isn't going to change your mind because you never cared about the evidence. You could have checked yourself! So you and your paperclip cult can just fuck off.
[0] https://g.co/gemini/share/259b33fb64cc
[2] https://grok.com/s/c2hhcmQtNA%3D%3D_e15bb008-d252-4b4d-8233-...
[4] https://chat.mistral.ai/chat/8e94be15-61f4-4f74-be26-3a4289d...
There's lots of people doing theory in ML and a lot of these people are making strides which others stand on (ViT and DDPM are great examples of this). But I never expect these works to get into the public eye as the barrier to entry tends to be much higher[1]. But they certainly should be something more ML researchers are looking at.
That is to say: Marcus is far from alone. He's just loud
[0] I'll never let go how Yi Tay said "fuck theorists" and just spent his time on Twitter calling the KAN paper garbage instead of making any actual critique. There seems to be too many who are happy to let the black box remain a black box because low level research has yet to accumulate to the point it can fully explain an LLM.
[1] You get tons of comments like this (the math being referenced is pretty basic, comparatively. Even if more advanced than what most people are familiar with) https://news.ycombinator.com/item?id=45052227
Besides, it is patently false. Not every Markov chain is an LLM: an actual LLM outputs human-readable English, while the vast majority of Markov chains do not map onto that set of models.
I read your link btw, and I just don't know how someone can do all that work and not establish the Markov property. That's like the first step. Speaking of which, I'm not sure I even understand the first definition in your link. I've never heard the phrase "computably countable" before, but I have heard "computable number", and those numbers are countable. That does seem to be what it is referring to? So I'll assume that? (My dissertation wasn't on models of computation, it was on neural architectures.) In 1.2.2, is there a reason for strictly uniform noise? It also seems to run counter to the deterministic setting.
Regardless, I agree with Calf, it's very clear MCs are not equivalent to LLMs. That is trivially a false statement. But the question of if an LLM can be represented via a MC is a different question. I did find this paper on the topic[0], but I need to give it a better read. Does look like it was rejected from ICLR[1], though ML review is very noisy. Including the link as comments are more informative than the accept/reject signal.
(@Calf, sorry, I didn't respond to your comment because I wasn't trying to make a comment about the relationship of LLMs and MCs. Only that there was more fundamental research being overshadowed)
Neural networks are stateless, the output only depends on the current input so the Markov property is trivially/vacuously true. The reason for the uniform random number for sampling from the CDF¹ is b/c if you have the cumulative distribution function of a probability density then you can sample from the distribution by using a uniformly distributed RNG.
¹https://stackoverflow.com/questions/60559616/how-to-sample-f...
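For concreteness, a minimal sketch of the inverse-CDF trick being referred to, using the Rayleigh distribution that comes up downthread (the scale parameter is an arbitrary choice):

```python
import numpy as np

# Inverse-transform sampling: if U ~ Uniform(0, 1) and F is a CDF, then F^{-1}(U)
# is distributed according to F. For the Rayleigh distribution with scale sigma,
# F(x) = 1 - exp(-x^2 / (2 sigma^2)), so F^{-1}(u) = sigma * sqrt(-2 ln(1 - u)).
rng = np.random.default_rng(0)
sigma = 2.0
u = rng.uniform(size=100_000)
samples = sigma * np.sqrt(-2.0 * np.log1p(-u))

# Sanity check against the known mean and std of a Rayleigh(sigma) distribution.
print("sample mean:", samples.mean(), " expected:", sigma * np.sqrt(np.pi / 2))
print("sample std: ", samples.std(),  " expected:", sigma * np.sqrt(2 - np.pi / 2))
```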
Or the inverse of this? That all Markov Chains are Neural Networks? Sure. Well sure, here's my transition matrix [1].
I'm quite positive an LLM would be able to give you more examples.
> the output only depends on the current input so the Markov property is trivially/vacuously true.
It's pretty clear you did not get your PhD in ML.
> The reason for the uniform random number
I think you're misunderstanding. Maybe I'm misunderstanding. But I'm failing to understand why you're jumping to the CDF. I also don't understand why this answers my question, since there are other ways to sample from a distribution knowing only its CDF, without using the uniform distribution. I mean, you can always convert to the uniform distribution, and there are lots of tricks to do that. Or, the distribution in that SO post is the Rayleigh distribution, so we don't even need to do that. My question was not about whether uniform is clean, but whether it is a requirement. But this just doesn't seem relevant at all.
I would love to tell you that I don't meet many people working in AI who share this sentiment, but I'd be lying.
And just for fun, here's a downvoted comment of mine, despite my follow-up comments that evidence my point being upvoted[1] (I got a bit pissed in that last one). The point here is that most people don't want to hear the truth; they are just glossing over things. But I think the two biggest things I've learned from the modern AI movement are: 1) gradient descent and scale are far more powerful than I thought, 2) I now understand how used-car salesmen are so effective on even people I once thought smart. People love their sycophants...
I swear, we're going to make AGI not by making the AI smarter but by making the people dumber...
GPT5 said it thinks it's fixable when I asked it:
>Marcus is right that LLMs alone are not the full story of reasoning. But the evidence so far suggests the gap can be bridged—either by scaling, better architectures, or hybrid neuro-symbolic approaches.
Unless you either claim that humans can't do logical reasoning, or claim humans exceed the Turing computable, then given you can trivially wire an LLM into a Turing complete system, this reasoning is illogical due to Turing equivalence.
And either of those two claims lack evidence.
> you can trivially wire an LLM into a Turing complete system
Please don't do the "the proof is trivial and left to the reader" thing[0]. If it is so trivial, show it. Don't hand-wave; "put up or shut up". I think if you work this out you'll find it isn't so trivial...
I'm aware of some works but at least every one I know of has limitations that would not apply to LLMs. Plus, none of those are so trivial...
Since temperature zero makes it deterministic, you only need to test one step for each state and symbol combination.
Are you suggesting you don't believe you can't make a prompt that successfully encodes 6 trivial state transitions?
Either you're being intentionally obtuse, or you don't understand just how simple a minimal Turing machine is.
> Are you suggesting you don't believe you can't make a prompt that successfully encodes 6 trivial state transitions?
Please show it instead of doubling down. It's trivial, right? So it is easier than responding to me. That'll end the conversation right here and now.
Do I think you can modify an LLM to be a Turing Machine? Yeah, of course. But at this point it doesn't seem like we're actually dealing with an LLM anymore. In other comments you're making comparisons to humans; are you suggesting humans are deterministic? If not, well, I see a flaw with your proof.
> That'll end the conversation right here and now.
We both know that isn't true, because it is so trivial that if you had any intention of being convinced, you'd have accepted the point already.
Do you genuinely want me to believe that you think an LLM can't act as a simple lookup from 6 keys (3 states, 2 symbols) to 6 tuples?
Because that is all it takes to show that an LLM + a loop can act like a Turing machine given the chance.
If you understand Turing machines, this is obvious. If you don't, even executing the steps personally per the example I gave in another comment is not likely to convince you.
> Do I think you can modify an LLM to be a Turing Machine, yeah. Of course.
There's no need to modify one. This can be done by enclosing an LLM in simple scaffolding, or you can play it out in a chat as long as you can set temperature to 0 (it will work without that as well to an extent, but you can't guarantee that it will keep working)
> But at this point it doesn't seem like we're actually dealing with an LLM anymore.
Are humans no longer human because we can act like a Turing machine?
The point is that anything that is Turing complete is computationally equivalent to anything else that is Turing complete, so demonstrating Turing-completeness is, absent any evidence that it is possible to compute functions outside the Turing computable, sufficient for it to be reasonable to assert equivalence in computational power.
The argument is not that any specific given LLM is capable of reasoning like a human, but to argue that there is no fundamental limit preventing LLMs from reasoning like a human.
> are you suggesting humans are deterministic?
I'm outright claiming we don't know of any mechanism by which we can calculate functions exceeding the Turing computable, nor have we ever seen evidence of it, nor do we know what that would even look like.
If you have any evidence that we can, or any evidence it is even possible - something that'd get you a Nobel Prize if you could show it - then by all means, enlighten us.
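Purely as an illustration of the scaffolding being described (and not a claim about what any particular model reliably does), here is a minimal sketch: `llm_transition` is a hypothetical stand-in for a temperature-0 call prompted with the six rules and the current (state, symbol); it is faked with the lookup table itself so the loop runs as written.

```python
# Sketch of "LLM in a loop as a Turing machine". The six rules below are the classic
# 3-state, 2-symbol busy beaver: 6 (state, symbol) keys -> 6 (write, move, next) tuples.
RULES = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, +1, "HALT"),
    ("B", 0): (0, +1, "C"), ("B", 1): (1, +1, "B"),
    ("C", 0): (1, -1, "C"), ("C", 1): (1, -1, "A"),
}

def llm_transition(state, symbol):
    # In the real setup this would parse the reply to a prompt like:
    # f"Rules: {RULES}. State={state}, symbol={symbol}. Reply with write, move, next."
    # Here it is faked with a plain lookup so the scaffolding is runnable.
    return RULES[(state, symbol)]

def run(max_steps=100):
    tape, head, state = {}, 0, "A"          # blank, unbounded tape via a dict
    for step in range(max_steps):
        if state == "HALT":
            break
        write, move, state = llm_transition(state, tape.get(head, 0))
        tape[head] = write
        head += move
    return state, sum(tape.values()), step

print(run())   # -> ('HALT', 6, 14): halts after 14 steps with six 1s on the tape
```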
It's Searle's Chinese Room scenario all over again, which everyone seems to have forgotten amidst the bs marketing storm around LLMs. A person with no knowledge of Chinese following a set of instructions and reading from a dictionary translating texts is a substitute for hiring a translator who understands chinese, however we would not claim that this person understands Chinese.
An LLM hooked up to a Turing Machine would be similar with respect to logical reasoning. When we claim someone reasons logically, we usually don't imagine they randomly throw ideas at the wall and then consult outputs to determine whether they reasoned logically. Instead, the process of deduction makes the line of reasoning decidedly not stochastic. I can't believe we've gotten to such a mad place that basic notions like that of logical deduction are being confused for stochastic processes. Ultimately, I would agree that it all comes back to the problem of other minds: you either take a fully reductionist stance and claim the brain and intellection are nothing more than probabilistic neural firing, or you take a non-reductionist stance and assume there may be more to it. In either case, I think that claiming LLMs+tools are equivalent to whatever process humans perform is kind of silly and severely underrates what humans are capable of^1.
1: Then again, this has been going on since the dawn of computing, which has always put forth its brain=computer metaphors more on grounds of reducing what we mean by "thought" than by any real substantively justified connection.
I definitely imagine that and I'm surprised to hear you don't. To me it seems obvious that this is how humans reason logically. When you're developing a complex argument, don't you write a sloppy first draft then review to check and clean up the logic?
And you failed to understand my argument. You are a Turing machine. I am a Turing machine. The LLM in a loop is a Turing machine.
Unless you can show evidence that, unlike the LLMs, we can execute more than the Turing computable, the theoretical limits on our reasoning are exactly the same as those of the LLM.
Absent any evidence at all that we can solve anything outside of the Turing computable, or that any computable function exists outside the Turing computable, the burden of proof is firmly on those making such an outrageous assumption to produce at least a single example of such a computation.
This argument doesn't mean any given LLM is capable of reasoning at the level of a human on its own, any more than it means a given person is able to translate Chinese on their own, but it does mean there's no basis in any evidence for claiming no LLM can be made to reason just like a human, any more than there's a basis for claiming no person can learn Chinese.
> When we claim someone reasons logically we usually don't imagine they randomly throw ideas at the wall and then consult outputs to determine if they reasoned logically
This isn't how LLMs work either, so this is entirely irrelevant.
From Wikipedia:
Suppose that the program simulated in fine detail the action of every neuron in the brain of a Chinese speaker. This strengthens the intuition that there would be no significant difference between the operation of the program and the operation of a live human brain.
Not a useful definition of thinking, if you ask me.
Searle replies that such a simulation does not reproduce the important features of the brain—its causal and intentional states. He is adamant that "human mental phenomena [are] dependent on actual physical–chemical properties of actual human brains."[26]
Assertion is not an argument
2. Agentic AI already does this in the way that you do it.
There's a formal equivalence between Markov chains and literally any system. The entire world can be viewed as a Markov chain. This doesn't tell you anything of interest, just that if you expand state without bound you eventually get the Markov property.
Why can't an LLM do backtracking? Not only within its multiple layers but across tokens, as reasoning models already do.
You are a probabilistic generative model (If you object, all of quantum mechanics is). I guess that means you can't do any reasoning!