Chomsky and the Two Cultures of Statistical Learning (2011)
Key topics
A 2011 article by Peter Norvig critiquing Noam Chomsky's views on statistical learning has resurfaced, sparking a lively debate about the renowned linguist's legacy. While some commenters are distracted by Chomsky's association with Jeffrey Epstein, others argue that this doesn't diminish the value of Norvig's original piece. As one commenter astutely points out, discussing a person's work shouldn't be tainted by their personal life or eventual mortality, just as we continue to examine Einstein's contributions long after his passing. The discussion highlights the tension between separating a person's work from their personal actions and the ongoing relevance of their ideas.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 5d after posting
- Peak period: 57 comments (Day 6)
- Avg / period: 28.8 comments
- Based on 115 loaded comments
Key moments
- 01 Story posted: Dec 16, 2025 at 4:33 AM EST (18 days ago)
- 02 First comment: Dec 20, 2025 at 9:50 PM EST (5d after posting)
- 03 Peak activity: 57 comments in Day 6, the hottest window of the conversation
- 04 Latest activity: Dec 24, 2025 at 6:05 PM EST (9 days ago)
The article by Peter Norvig is still interesting.
Honestly, I'm surprised Noam is even still alive (aged 97); he is not long for this world.
But his politics center on the moral failings of the West, so I think yes: if he was involved in the sexual exploitation of trafficked children, then this would devalue his criticism of the morality of the Western political system.
Essentially it can be summed up as: any Western action must be rationalized as evil, and any anti-Western action is therefore good. This is also in line with Christian dualism, so the cultural building blocks are already in place.
Then you get apologism for, or downright support of, the Khmer Rouge, Putin, Hezbollah, and Iran.
It's difficult to summarise so many years of writing in a few sentences, but from my own reading, he pointed out:
a) many things done by the US lead to death or destruction;
b) many of these things are justified in the name of a good that doesn't stand up to scrutiny;
c) the US government is often hypocritical;
d) US citizens are heavily propagandized, both on foreign policy and on domestic policy;
e) as a US citizen, it is his duty to try and oppose these actions, and since he's not a citizen of Iran, he isn't in a position to do anything about Iran;
f) a) through d) explain why he is often seen as an apologist, to use your word, for Iran; he tries to explain, from his point of view, why Iran etc. do the things they do;
g) a strong support of freedom of speech and opposition to censorship, including what he regards as private censorship as opposed to merely government censorship.
He of course has very complex rationalizations, but essentially he assumes the opposite of mainstream Western opinion and then builds ideological structures upon that.
That creates a very simplified version of reality in a nice intellectual wrapper.
For example, during the 2003 US invasion of Iraq, Germany and France were opposed to the invasion, leading to "Freedom Fries" as an insult to French opposition to the war. The British public was also opposed to the war, although the Blair government went along with it anyway. Australia had a similar position: public opposition, but the government went along with it anyway. Canada officially refused to enter the Iraq war. Chomsky was also opposed to the Iraq war. Does this mean that France, Germany, Canada and the British and Australian general public are "anti-Western"? Since Chomsky agreed with these countries, does that make him anti-Western or pro-Western? Does it make the US anti-Western, since they proceeded with a war despite formal or popular opposition in many Western countries?
I fear you have a certain definition of "Western" that simply excludes Western opinions that don't fit your understanding.
As to who Chomsky met: well, as part of this Epstein story, Chomsky met with former Israeli prime minister Ehud Barak. In your opinion, does this make him anti-Western? Indeed, prior to his stroke, Chomsky explained that this kind of meeting is why he associated with Epstein: for the contacts.
I suspect Chomsky is just generally interested in understanding an issue and not bothered by how it's seen, seemingly to his detriment in this Epstein story.
Why would it devalue his criticism assuming he was right?
Morality arguments are social and contextual. That 2+2 is 4 won’t change and captures some sort of eternal truth while what is deemed moral is constantly changing over time and differs across different societies and social groupings.
So morality arguments require and appeal to a particular shared sense of right and wrong. If Chomsky was guilty of sexually abusing children, then I do not share his moral foundation and so his appeals to morality arguments do not convince me.
He's also, for better and mostly worse, one of the most prominent political thinkers on the American hard left for the last half century.
There's been a joke going around for a while now that you either know Chomsky for his politics, or for his work in linguistics and discrete mathematics, and are shocked to discover his moonlighting work. I guess we can extend that to a third category of fame, or infamy.
There's also still a lot to his arguments that we are much more sample-efficient and likely have some built-in capacity from genetic endowment rather than everything being strictly learned.
(I don't like Chomsky for other reasons, but having an obituary ain't no reason to disregard someone's thoughts.)
It's innuendo and guilt by association, mainly by his political opponents on both the left and the right, who are taking advantage of his inability to defend himself due to his stroke. I think many people are being _justly maligned_ by their association with Epstein, but in a way that distracts from the wider issue of what exactly it means when so many powerful and prominent people are found in compromising or potentially compromising situations, and what ends that served. It's US kompromat, and the discussion is largely restricted to maligning people without discussing the significance of it.
In terms of Chomsky himself, given his career spanned both linguistics and politics, an honest critique would deal with its disagreements with Chomsky the way Norvig did in this essay, or the way Hitchens did over the Afghan and Iraq wars, rather than saying "he had dinner with Epstein" or "he had dinner with Bannon".
In terms of the Epstein issue, the best criticism I can see is that his association with Epstein, Bannon, etc. makes him a hypocrite, although I personally don't find this convincing. Part of the problem for me here is that his present infirmities make it difficult for him to defend or explain himself, and I find it poor form to kick the man when he's down, mainly by people who just didn't like that Chomsky disagreed with them. Especially when he made a real contribution to the debates, even if one doesn't agree with him.
https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf
To Explain Or To Predict?
Nice quote
> We note that the practice in applied research of concluding that a model with a higher predictive validity is “truer,” is not a valid inference. This paper shows that a parsimonious but less true model can have a higher predictive validity than a truer but less parsimonious model.
Hagerty+Srinivasan (1991)
They certainly didn't think that a better fit => "truer".
They used the term "truer" to describe a model that more accurately captures the underlying causal structure or "true" relationship between variables in a population.
As for the paper I linked, I still haven't read it closely enough to confirm that the comment below this is a good dismissal.
Unfortunately, studying the behavior of a system doesn't necessarily provide insight into why it behaves that way.
Whether that model provides "insight" (or a "cause"; I still don't know if that's supposed to mean something different) is a deeper question, and e.g. the topic of countless papers trying to make sense of LLM activations. I don't think the answer is obvious, but I found Norvig's discussion to be thoughtful. I'm surprised to see it viewed so negatively here, dismissed with no engagement with his specific arguments and examples.
It isn't much worth engaging with because it is unfortunately quite out of touch with (or just ignores) the core issues, as well as the major advances in causal modeling and causal-modeling theory, i.e. Judea Pearl and do-calculus, structural equation modeling, counterfactuals, etc. [1].
It also, IMO, makes a (highly idiosyncratic) distinction between "statistical" (meaning, trained / fitted to data) and "probabilistic" models, that doesn't really hold up too well.
I.e. probabilistic models in quantum physics are "fit" too in that the values of fundamental constants are determined by experimental data, but these "statistical" models are clearly causal models regardless. Even most quantum physical models can be argued to be causal, just the causality is probabilistic rather than absolute (i.e. A ==> B is fuzzy implication rather than absolute implication).
IMO I don't want to engage much with the arguments because it starts on the wrong foot and begins by making, in my opinion, an incoherent / unsound distinction, while also ignoring (innocently, or deliberately) the actual scientific and philosophical progress already made here.
[1] https://plato.stanford.edu/entries/causal-models/
So in the meantime, Norvig et al. have built statistical models that can do stuff like predicting whether a given sequence of words is a valid English sentence. I can invent hundreds of novel sentences and run their model, checking each time whether their prediction agrees with my human judgement. If it doesn't, then their prediction has been falsified; but these models turned out to be quite accurate. That seems to me like clear evidence of some kind of progress.
You seem unimpressed with that work. So what do you think is better, and what falsifiable predictions has it made? If it doesn't make falsifiable predictions, then what makes you think it has value?
I feel like there's a significant contingent of quasi-scientists that have somehow managed to excuse their work from any objective metric by which to evaluate it. I believe that both Chomsky and Judea Pearl are among them. I don't think every human endeavor needs to make falsifiable predictions; but without that feedback, it's much easier to become untethered from any useful concept of reality.
> You seem unimpressed with that work
I didn't say anything about Norvig's work, I was saying the linked essay is bad. It is correct that Chomsky is wrong, but is a bad essay because it tries to argue against Chomsky with a poorly-developed distinction while ignoring much stronger arguments and concepts that more clearly get at the issues. IMO the essay is also weirdly focused on language and language models, when this is a general issue about causal modeling and scientific and technological progress, and so the narrow focus here also just weakens the whole argument.
Also, Judea Pearl is a philosopher, and do-calculus is just one way to think about and work with causality. Talking about falsifiability here is odd, and sounds almost to me like saying "logic is unfalsifiable" or "modeling the world mathematically is unfalsifiable". If you meant something like "the very concept of causality is incoherent", that would be the more appropriate criticism here, and more arguable.
I feel like Norvig is coming from that standpoint of solving problems well-known to be difficult. This has the benefit that it's relatively easy to reach consensus on what's difficult--you can't claim something's easy if you can't do it, and you can't claim it's hard if someone else can. This makes it harder to waste your life on an internally consistent but useless sidetrack, as you might even agree (?) Chomsky has.
You, Chomsky, and Pearl seem to reject that worldview, instead believing the path to an important truth lies entirely within your and your collaborators' own minds. I believe that's consistent with the ancient philosophers. Such beliefs seem to me halfway to religious faith, accepting external feedback on logical consistency, but rejecting external evidence on the utility of the path. That doesn't make them necessarily bad--lots of people have done things I consider good in service of religions I don't believe in--but it makes them pretty hard to argue with.
I'm not saying LLMs are a particularly good model, just that everything else is worse. This includes Chomsky's formal grammars, which fail to capture the ways humans actually use language per Norvig's many examples. Do you disagree? If so, what model is better and why?
If you believe that some of human cognition is linguistic (even if e.g. inner monologue and spoken language are just the surface of deeper more unconscious processes), then, yes, we might say LLMs can predictively model some aspects of human cognition, but, again, they are certainly not causal models, and they are not predictive models of human cognition generally (as cognition is clearly far, far more than linguistic).
* I avoid calling LLMs "statistical" because they really aren't even that. They are not calibrated, and including a softmax and log-loss in things doesn't magically make your model statistical (especially since other loss functions and simplex mappings, e.g. sparsemax, often work better). LLMs really are more accurately just doing curve/manifold-fitting.
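For the curious, here is a minimal NumPy sketch of the simplex mapping alluded to above: sparsemax (Martins & Astudillo, 2016) next to softmax. The function names are illustrative, not from any particular library. The key difference is that sparsemax can assign exactly zero probability to low-scoring entries, while softmax never can.

```python
import numpy as np

def softmax(z):
    """Standard softmax: every entry gets strictly positive probability."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparsemax: Euclidean projection of z onto the probability simplex.
    Low-scoring entries get probability exactly 0 (Martins & Astudillo, 2016)."""
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = k * z_sorted > cumsum - 1      # prefix of entries kept in the support
    k_z = k[support][-1]                     # size of the support
    tau = (cumsum[support][-1] - 1) / k_z    # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

z = np.array([3.0, 1.0, 0.2, -1.0])
print(softmax(z))    # all four probabilities > 0
print(sparsemax(z))  # [1. 0. 0. 0.] -- trailing entries are exactly 0
```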
As I read Norvig's essay, it's about that tradeoff, of whether a simple and comprehensible but inaccurate model shows more promise than a model that's incomprehensible except in statistical terms with the aid of a computer, but far more accurate. I understand there's a large group of people who think Norvig is wrong or incoherent; but when those people have no accomplishments except within the framework they themselves have constructed, what am I supposed to think?
Beyond that, if I have a model that tells me whether a sentence is valid, then I can always try different words until I find one that makes it valid. Any sufficiently good model is thus capable of generation. Chomsky never proposed anything capable of that; but that just means his models were bad, not that he was working on a different task.
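A toy sketch of that "any validity model yields a generator" argument: local search over single-word substitutions against a scoring function. Both `toy_score` and its bigram whitelist are made-up stand-ins for a real validity model, just to make the loop runnable.

```python
import random

def generate(validity_score, vocab, length=6, steps=200, seed=0):
    """Turn a sentence-validity scorer into a crude generator by local
    search: start from random words, try single-word substitutions,
    and keep any change the scorer prefers."""
    rng = random.Random(seed)
    sentence = [rng.choice(vocab) for _ in range(length)]
    best = validity_score(sentence)
    for _ in range(steps):
        i = rng.randrange(length)            # pick a position to mutate
        candidate = sentence.copy()
        candidate[i] = rng.choice(vocab)     # try a different word there
        score = validity_score(candidate)
        if score > best:                     # keep improvements only
            sentence, best = candidate, score
    return " ".join(sentence)

# Stand-in scorer: count adjacent pairs that look like English bigrams.
ALLOWED = {("the", "cat"), ("cat", "sat"), ("sat", "on"),
           ("on", "the"), ("the", "mat")}

def toy_score(words):
    return sum((a, b) in ALLOWED for a, b in zip(words, words[1:]))

vocab = ["the", "cat", "sat", "on", "mat", "dog"]
print(generate(toy_score, vocab))  # drifts toward "the cat sat on the mat"
```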
As to the relationship between signals from biological neurons and ANN activations, I mean something like the paper linked below, whose authors write:
> Thus, even though the goal of contemporary AI is to improve model performance and not necessarily to build models of brain processing, this endeavor appears to be rapidly converging on architectures that might capture key aspects of language processing in the human mind and brain.
https://www.biorxiv.org/content/10.1101/2020.06.26.174482v3....
I emphasize again that I believe these results have been oversold in the popular press, but the idea that an ANN trained on brain output (including written language) might provide insight into the physical, causal structure of the brain is pretty mainstream now.
https://news.ycombinator.com/item?id=46288415
Pearl defines a ladder of causation:
1. Seeing (association)
2. Doing (intervention)
3. Imagining (counterfactuals)
In his view, most ML algorithms are at level 1: they look at data and draw associations. "Agents" have taken some first steps into level 2, doing.
The smartest humans operate mostly at level 3, where they see things, gain experience, and later build up a "strong causal model" of the world, becoming capable of answering "what if" questions.
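A minimal simulation of the gap between rung 1 and rung 2, assuming a toy structural causal model with one confounder; the variable names and probabilities are invented for illustration. Conditioning on X (seeing) and intervening on X (doing) give different answers because intervening severs the confounder's influence on X.

```python
import random

def run(do_x=None, n=200_000, seed=0):
    """Toy structural causal model with a confounder Z:
        Z ~ Bernoulli(0.5);  X := Z unless intervened;
        Y ~ Bernoulli(0.2 + 0.3*X + 0.4*Z)
    Passing do_x severs the Z -> X edge (Pearl's do-operator).
    Returns the empirical P(Y=1) among units with X = 1."""
    rng = random.Random(seed)
    hits, total = 0, 0
    for _ in range(n):
        z = rng.random() < 0.5
        x = z if do_x is None else bool(do_x)
        y = rng.random() < 0.2 + 0.3 * x + 0.4 * z
        if x:
            hits += y
            total += 1
    return hits / total

print("rung 1, seeing:  P(Y=1 | X=1)     ~", round(run(), 3))        # ~0.9 (confounded)
print("rung 2, doing:   P(Y=1 | do(X=1)) ~", round(run(do_x=1), 3))  # ~0.7 (causal)
```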
> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.
He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.
I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.
The question then becomes one of actual novelty versus the learned joint probabilities of internalised sentences/phrases/etc.
Generation or regurgitation? Is there a difference to begin with?
If we define perplexity in the usual way in NLP, then the probability of a sentence approaches zero as the length of the sequence increases, but it does so smoothly and never reaches exactly zero. This makes it useful for sequences of arbitrary length. This latter metric is so obviously better that it seems ridiculous to me to reject all statistical approaches based on the former. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries did judge correctly that I get that benefit, that LLMs exist, etc.
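To make that concrete, here is a small sketch with an add-one-smoothed bigram model; the corpus and test sentence are invented. A sentence containing a never-seen word pair still gets a finite (nonzero) log-probability, and perplexity gives a length-normalized measure that stays stable as sentences grow.

```python
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = set(corpus) | {"<unk>"}
V = len(vocab)

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def logprob(sentence):
    """Add-one-smoothed bigram log-probability: never exactly -inf,
    even for word pairs absent from the training corpus."""
    words = [w if w in vocab else "<unk>" for w in sentence.split()]
    lp = 0.0
    for a, b in zip(words, words[1:]):
        lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
    return lp

def perplexity(sentence):
    n_transitions = len(sentence.split()) - 1
    return math.exp(-logprob(sentence) / n_transitions)

novel = "the dog sat on the cat ."       # contains the unseen bigram "cat ."
print(logprob(novel))     # finite, i.e. probability > 0
print(perplexity(novel))  # per-token measure, comparable across lengths
```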
And again the question being, whether there is a difference at all between the two? Novelty in the human sense is also often a process of chaining and combining existing tools and thought.
There's no point minimizing his intelligence and achievements, though.
His linguistics work (eg: grammars) is still relevant in computer science, and his cynical view of the West has merit in moderation.
The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP.
As to politics, I don't think it's hard to find critics of the West's atrocities with less history of denying or excusing the West's enemies' atrocities. He's certainly not always wrong, but he's a net unfortunate choice of figurehead.
Chomsky already was very active and well-known by 1960.
He pioneered areas in Computer Science, before Computer Science was a formal field, that we still use today.
His political views haven't changed much, but they were beneficial back when America was more naive. They are harmful now only because we suffer from an absurd excess of cynicism. If Nixon had been president in the current environment, he would have served his full term (just imagine "the tapes are a forgery!" or "why would I believe establishment shills like Woodward and Bernstein?")
How would you feel about Chomsky and his influence if we ignored everything past 1990 (two years after Manufacturing Consent)?
I think Chomsky's political views were pretty terrible, especially before 1990. He spoke favorably of the Khmer Rouge. He dismissed "Murder of a Gentle Land", one of the first Western reports of their mass killing, as a "third rate propaganda tract". As the killing became impossible to completely deny, he downplayed its scale. Concern for human rights in distant lands tends to be a left-leaning concept in the West, but Chomsky's influence neutralized that here. This contributed significantly to the West's indifference, and the killing continued. (The Vietnamese communists ultimately stopped it.)
Anyone who thinks Chomsky had good political ideas should read the opinions of Westerners in Cambodia during that time. I'm not saying he didn't have other good ideas; but how many good ideas does it take to offset 1.5-2M deaths?
Today it would not matter in the least if Nixon were understood to have covered up a conspiracy to break into the DNC headquarters. Most of his party would approve of it and the rest would support him anyway so as not to damage "their side".
It's of rather limited use for natural languages.
That's really subtle, because deciding regex universality (i.e. whether a regex accepts every input) is PSPACE-complete. And since NFAs make it efficient to decide whether a regex matches NO inputs, any attempt to combine NFAs with regex complement would trip on a massive landmine.
The complement of a regular language is a regular language, and for any given regular language we can check whether a string is a member of that language in O(length of the string) time.
Yes, depending on how you represent your regular language, the complement operator might not play nicely with that representation. But e.g. it's fairly trivial for finite state machines or when matching via Brzozowski derivatives. See https://en.wikipedia.org/wiki/Brzozowski_derivative
See also https://github.com/google/redgrep
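For flavor, a compact sketch of matching via Brzozowski derivatives, including a complement operator; the class and function names are illustrative, not redgrep's API. Each input character takes one derivative (a production implementation would also simplify terms as it goes to keep them small).

```python
# Regex matching via Brzozowski derivatives, with complement (Not).
class Rx: pass
class Empty(Rx): pass               # matches nothing
class Eps(Rx): pass                 # matches only the empty string
class Chr(Rx):
    def __init__(self, c): self.c = c
class Seq(Rx):
    def __init__(self, a, b): self.a, self.b = a, b
class Alt(Rx):
    def __init__(self, a, b): self.a, self.b = a, b
class Star(Rx):
    def __init__(self, a): self.a = a
class Not(Rx):                      # complement: matches what `a` does not
    def __init__(self, a): self.a = a

def nullable(r):
    """Does r accept the empty string?"""
    if isinstance(r, Eps):  return True
    if isinstance(r, (Empty, Chr)): return False
    if isinstance(r, Seq):  return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):  return nullable(r.a) or nullable(r.b)
    if isinstance(r, Star): return True
    if isinstance(r, Not):  return not nullable(r.a)

def deriv(r, c):
    """Brzozowski derivative: the language of suffixes of r after reading c."""
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Chr):  return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):  return Alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Star): return Seq(deriv(r.a, c), r)
    if isinstance(r, Not):  return Not(deriv(r.a, c))  # complement commutes
    if isinstance(r, Seq):
        d = Seq(deriv(r.a, c), r.b)
        return Alt(d, deriv(r.b, c)) if nullable(r.a) else d

def matches(r, s):
    for ch in s:                    # one derivative per input character
        r = deriv(r, ch)
    return nullable(r)

ab_star = Star(Alt(Chr('a'), Chr('b')))                              # (a|b)*
no_aa = Not(Seq(ab_star, Seq(Chr('a'), Seq(Chr('a'), ab_star))))     # avoids "aa"
print(matches(no_aa, "abab"))  # True
print(matches(no_aa, "baab"))  # False
```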
https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
I'm not sure it required Chomsky's work.
In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative.
In more detail: Chomsky is/was not concerned with the models themselves, but rather with the distinction between statistical modelling in general, and "clean slate" models in particular, on the one hand, and structural models discovered through human insight on the other.
With "clean slate" I mean models that start with as little linguistically informed structure as possible. E.g., Norvig mentions hybrid models: these can start out as classical rule based models, whose probabilities are then learnt. A random neural network would be as clean as possible.
The title should say (2011), otherwise the whole piece is confusing.
1: https://news.ycombinator.com/item?id=2591154
https://hn.algolia.com/?query=Chomsky%20and%20the%20Two%20Cu...
The oldest submission is from 15 years ago, i.e. 2010.
I resubmitted it thinking that, with the success of LLMs, it was worth a revisit from the "how real-world scientific progress works" point of view.
Chomsky is wrong by the standards of his time and is making things worse rather than better.
It was very much the opposite of Chomsky's ideology as well. So it additionally means he's fake, BOTH on his morals and on his politics/activism, from both sides (i.e. both helping a paedophile and helping/entertaining a billionaire).
So it's (yet another) case of an important figure that supposedly stands for something, not just demonstrating he stands for nothing at all, but being a disgusting human being as well.
On the contrary. Chomsky was open about his civil-libertarian principles: If you are convicted, and you complete your court-ordered obligations, you have a clean slate.
Some of his books are deeply insightful even if you decide to draw the opposite conclusion. I wouldn’t say anything would create disgust unless you had a conclusion you wanted supported before reading the book.
Regarding the Epstein thing, bizarre to bring that up when discussing his works, seems like you hate him on a personal level.
Not sure the approach holds.
Could it be this?
> https://www.youtube.com/watch?v=eIzRV4TxHo8
It's crazy how wrong Chomsky was about machine learning. Maybe the real truth is that humans are stochastic parrots who have an underlying probability distribution - and because gradient descent is so good at reproducing probability distributions - LLMs are incredibly good at reproducing language.
The answers to "why" that Chomsky pushes so hard for are very valuable to adult language learners. There are basic syntactic rules to generating broadly correct language. Having these rules discovered and explained in the simplest possible form is irreplaceable by statistical models. Neural networks, much like native speakers can say "well this just sounds right," but adult learners need a mathematical theory of how and why they can generate sentences. Yes, this changes with time and circumstances, but the simple rules and theories are there if we put the effort in to look for them.
There are many languages with a very small corpus of training data. The LLMs fail miserably at communicating in them or explaining things about their grammar, but if we look hard for the underlying theories Chomsky was looking for, we can make huge leaps and bounds in understanding how to use them.
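As a toy illustration of what explicit rules "in the simplest possible form" look like, a handful of context-free productions can already generate broadly correct English word orders. The grammar below is invented for illustration, not taken from any linguistic reference.

```python
import random

# A deliberately tiny context-free grammar: the kind of explicit,
# human-readable rule an adult learner can study, unlike the implicit
# "sounds right" judgment of a native speaker or a neural network.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "Adj", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "Adj": [["small"], ["hungry"]],
    "N":   [["cat"], ["student"], ["language"]],
    "V":   [["learns"], ["sees"]],
}

def expand(symbol, rng):
    """Recursively expand a symbol by picking one of its productions;
    anything not in GRAMMAR is a terminal word."""
    if symbol not in GRAMMAR:
        return [symbol]
    rule = rng.choice(GRAMMAR[symbol])
    return [word for part in rule for word in expand(part, rng)]

rng = random.Random(1)
for _ in range(3):
    # prints sentences that are grammatical by construction
    print(" ".join(expand("S", rng)))
```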
I have yet to witness a man so smart who ended up being so profoundly wrong on everything he did in his life.
Both on the linguistics side of things and on his politics.
And to see him at such an advanced age still rejecting what is an absolutely clear and painful proof that all he's done in linguistics was wrong ... how sad.
What a terrible waste of an intellect.