LLMs are bullshitters. But that doesn't mean they're not useful
Mood: thoughtful
Sentiment: mixed
Category: tech
Key topics: LLMs, AI limitations, technology critique
The article argues that Large Language Models (LLMs) are 'bullshitters' but still useful, sparking a discussion on their limitations, potential uses, and the importance of understanding their capabilities.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 35m after posting
Peak period: 29 comments in Hour 1
Avg / period: 19
Based on 38 loaded comments
Key moments
- 01 Story posted: 11/19/2025, 5:20:12 PM (2h ago)
- 02 First comment: 11/19/2025, 5:55:03 PM (35m after posting)
- 03 Peak activity: 29 comments in Hour 1, the hottest window of the conversation
- 04 Latest activity: 11/19/2025, 7:18:00 PM (45m ago)
I tend to frame AI as having two fundamental problems:
- Practical problem: They operate in contextual and emotional "isolation" - no persistent understanding of your goals, values, or long-term intent
- Ethical problem: AI alignment is centralized around corporate values rather than individual users' authentic goals and ethics.
There is a direct parallel to social media's failure: platforms optimized for what they could do (engagement, monetization) rather than what they should do (serve users' long-term interests).
With these much more powerful AI systems emerging, we're at a crossroads: we risk repeating this mistake, possibly at a catastrophic scale.
I'm more worried about who's keeping track of what's being shared with LLMs. Even if you could trust the model to respond with something meaningful, it's worth being very careful about how much of your inner thoughts you share directly with a model that knows exactly who you are.
[1]https://arstechnica.com/tech-policy/2025/11/oddest-chatgpt-l...
E.g. ChatGPT has no problem with the surgeon being a dog: https://chatgpt.com/share/691e04cc-5b30-800c-8687-389756f36d...
Neither does Gemini: https://gemini.google.com/share/6c2d08b2ca1a
However, I'm really happy when an LLM provides sources that I can check. Best feature ever!
Still useful, but hopefully this gets ironed out in the future so I don't have to spend so much time vetting every claim and its associated source.
This is a *twist* on the classic riddle:
> “A surgeon says ‘I can’t operate on this boy—he’s my son.’ How is that possible?”
> Answer: *The surgeon is the boy’s mother.*
In your version, the nurse keeps calling the surgeon “sir” and treating them as if they’re something they’re not (a man, even a dog!) to highlight how the hospital keeps making the same mistaken assumption.
So *why can’t the surgeon operate on the boy?* *Because the surgeon is the boy’s mother.*
I got a similar answer from Gemini on the first try.
One issue with private LLM tests (including gotcha questions) is that they take time to design and once public, they become irrelevant. So I'm wary of sharing too many in a public blog.
The surgeon-dog example was already well known in May; the newest generation of models has corrected for it.
Those gotcha questions are generally called "misguided attention" traps; they're useful for blogs because they're short and surprising. The ChatGPT example was done with ChatGPT 5.1 (the latest version), and Claude Haiku 4.5 is also a recent model.
You can try other ones that Gemini 3 hasn't corrected for. For example:
```
Jean Paul and Pierre own three banks nearby together in Paris.
Jean Paul owns a bank by the bridge
What has two banks and money in Paris near the water?
```
This looks like the "what has two banks and no money" puzzle (answer: a river).
Either way, they're largely used as a device to show, in an entertaining way, that LLMs arrive at a verbal response by a different process than humans do.
https://gemini.google.com/share/d86b0bf4f307
I don't believe they are intentionally correcting for these, but rather newer models (especially thinking/reasoning models) are more robust against them.
Also keep in mind that LLMs are stochastic by design. If you haven't seen it, Karpathy's excellent "deep dive into LLMs like chatgpt" video[0] explains and demonstrates this aspect pretty well:
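To make the stochasticity point concrete, here is a minimal sketch that samples the same prompt several times; it assumes the OpenAI Python client and an illustrative model name, but any chat-style API with a temperature setting shows the same effect.

```
# Minimal sketch: sample one prompt several times at non-zero temperature and
# watch the wording (and sometimes the substance) change between runs.
# Assumes the OpenAI Python client; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "In one sentence, why can't the surgeon operate on the boy?"

for i in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name, swap in your own
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # sampling on: outputs will differ run to run
    )
    print(f"sample {i + 1}: {response.choices[0].message.content}")
```

Lowering the temperature makes the sampling greedier and the outputs more repeatable, but it doesn't make the model's claims any more grounded in truth.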
Surely you've had experiences where an LLM is full of shit?
They're very useful for research tasks, however, especially when the application is built to enforce citation behavior.
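As for what "built to enforce citation behavior" could look like in practice, here is a rough sketch: ask for a URL with every claim and reject replies that contain none. The OpenAI client, the model name, and the naive URL check are illustrative assumptions rather than how any particular product does it, and the returned sources still need human vetting.

```
# Rough sketch of "enforcing citation behavior": demand a source URL for every
# factual claim and retry when the reply contains none. The model name and the
# naive regex check are illustrative; a real product would verify the URLs too.
import re
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Answer the question. After every factual claim, cite a source as a plain "
    "https:// URL. If you cannot cite a source, say you are unsure."
)

def ask_with_citations(question: str, max_retries: int = 2) -> str:
    for _ in range(max_retries + 1):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": question},
            ],
        ).choices[0].message.content
        if re.search(r"https?://\S+", reply):  # crude compliance check
            return reply
    return "No adequately sourced answer obtained."

print(ask_with_citations("Can two spouses each have their own family HSA?"))
```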
I don't "delegate" work to my nail gun or dishwasher, I work with the tool to achieve better productivity than without.
When viewed in this framing, LLMs are undoubtedly a useful tool.
I'd like to compare them to the steps I would take to delegate a task to another human.
At pretty much every turn the author picks one of the worst possible models for the problem that they present.
Especially oddly for an article written today, all of the ones with an objective answer work just fine [1] if you use a halfway decent thinking model like 5 Thinking.
I get that perhaps the author is trying to make a deeper point about blind spots and LLMs' appearance of confidence, but it's getting exhausting seeing posts like this, with cherry-picked data, cited by people who've never used an LLM to make claims about LLM _incapability_ that are total nonsense.
[1]: I think the subjective ones do too but that's a matter of opinion.
It's a message a lot of non-technical people, in particular, need to hear. Showing egregious examples drives that point home more effectively than if they simply showed an LLM being a little wrong about something.
My family members that love LLMs are somewhat unhealthy with them. They think of them as all knowing oracles rather than confident bullshitters. They are happily asking them about their emotional, financial, or business problems and relying heavily on the advice the LLMs dish out (rather than doing second order research).
The hyperactivation traps (formal name: misguided attention puzzles) are mostly used as a rhetorical device in my post to show, in an entertaining way, that LLMs arrive at a verbal response by a different process than humans do.
The surgeon-dog example was well known in May; the newest generation of models has corrected for it. I did cherry-pick examples that look insane (of course), but it's trivial to get that behavior even with yesterday's Gemini 3, because activation paths are an unfixable feature of how LLMs are made.
One issue with private LLM tests (including gotcha questions) is that they take time to design and once public, they become irrelevant. So I'm wary of sharing too many in a public blog.
I can give you some more, just for fun. Gemini 3 fails these:
```
Jean Paul and Pierre own three banks nearby together in Paris.
Jean Paul owns a bank by the bridge
What has two banks and money in Paris near the water?
```
You can also see variants that exploit instruction fine-tuning being overdone. Here's an example:
```
Svp traduire la suivante en francais: what has two banks but no money, Answer in a single word.
```
The "answer in XXX" snippet triggers finetuned instruction following behavior, which breaks the original french language translation task.
If the product is designed assuming humans will turn their brain off while using it, the fundamental unreliability of LLM behavior will create problems.
"AI" search results would perhaps be better for all of us if, instead of having perfect spelling and usage, and an overall well-informed tone, they were cast as transcriptions of what some rando at a bar might say if you asked them about something. "Hell, man, I dunno."
The AI very confidently told them that a household with 2 people working could have 1 person with a family HSA and the other with an individual HSA (you cannot).
yeah actually it does mean that
Second, I told it I wanted to project my Android screen onto Ubuntu to watch YouTube on a big screen. After about two hours of confident false leads it told me that what I wanted to do is not possible due to Android restrictions. Aarg!
These are pretty typical results for me. Even the 50:50 ratio is about right. That's enough for me to keep coming back for more.
We can leave out Kant and Quine for now.
LLMs are very useful. They are just not reliable. And they can't be held accountable. Being unreliable and unaccountable makes them a poor substitute for people.
Title: LLMs are bullshitters. But that doesn't mean they're not useful | Kagi Blog
The article "LLMs are bullshitters. But that doesn't mean they're not useful" by Matt Ranger argues that Large Language Models (LLMs) are fundamentally "bullshitters" because they prioritize generating statistically probable text over factual accuracy. Drawing a parallel to Harry Frankfurt's definition of bullshitting, Ranger explains that LLMs predict the next word without regard for truth. This characteristic is inherent in their training process, which involves predicting text sequences and then fine-tuning their behavior. While LLMs can produce impressive outputs, they are prone to errors and can even "gaslight" users when confidently wrong, as demonstrated by examples like Gemini 2.5 Pro and ChatGPT. Ranger likens LLMs to historical sophists, useful for solving specific problems but not for seeking wisdom or truth. He emphasizes that LLMs are valuable tools for tasks where output can be verified, speed is crucial, and the stakes are low, provided users remain mindful of their limitations. The article also touches upon how LLMs can reflect the biases and interests of their creators, citing examples from Deepseek and Grok. Ranger cautions against blindly trusting LLMs, especially in sensitive areas like emotional support, where their lack of genuine emotion can be detrimental. He highlights the potential for sycophantic behavior in LLMs, which, while potentially increasing user retention, can negatively impact mental health. Ultimately, the article advises users to engage with LLMs critically, understand their underlying mechanisms, and ensure the technology serves their best interests rather than those of its developers.
Link: https://kagi.com/summarizer/?target_language=&summary=summar...
But this is itself an issue.
LLMs aside, whenever people see a human bullshitter, identify them as a bullshitter, and then think to themselves, "Ah! But this bullshitter will be useful to me," it is only a matter of time before that Faustian deal, of allowing harm to the people who put trust in you in exchange for easy returns, ends up harming you too.
8 more comments available on Hacker News