AI Sycophancy Panic
Key topics
As the term "AI sycophancy" gains traction, a lively debate erupts over its implications and whether it is just a trendy buzzword. Some commenters argue that the real issue lies less with AI's tendency to be overly agreeable than with people's enthusiasm for the fashionable label, and several jokingly share workarounds for sidestepping the perceived sycophancy. The discussion also surfaces a rough consensus that users' preferences about an AI's tone and style amount to a form of "vibe sensitivity." With some users reprogramming their AI interactions to avoid unwanted pleasantries, the conversation sheds light on the complex dynamics between humans and AI.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
First comment: 15m after posting
Peak period: 16 comments in 2-4h
Avg / period: 6.7
Based on 47 loaded comments
Key moments
1. Story posted: Jan 4, 2026 at 9:41 AM EST (5d ago)
2. First comment: Jan 4, 2026 at 9:56 AM EST (15m after posting)
3. Peak activity: 16 comments in 2-4h, the hottest window of the conversation
4. Latest activity: Jan 5, 2026 at 7:41 AM EST (4d ago)
I argue that “sycophancy” has become an overloaded and not very helpful term; almost a fashionable label applied to a wide range of unrelated complaints (tone, feedback depth, conversational flow).
Curious whether this resonates with how you feel, or if you disagree.
What drives me crazy are the emojis and the patronizing tone at the end of the conversation.
Before 2022, no one was using that word.
It seems to me that the issue it refers to (unwarranted or obsequious praise) is a real problem with modern chatbots. The harms range from minor (annoyance, or running down the wrong path because I didn't have a good idea to start with) to dangerous (reinforcing paranoia and psychotic thoughts). Do you agree that these are problems, and is there a more useful term or categorization for these issues?
[1] e.g. it clarified to me "Malcolm is less a “self-insert” in the fanfic sense (author imagining himself in the story) and more Crichton’s designated mouthpiece"
With regards to mental health issues, of course nobody on earth (not even the patients going through these issues, in their moments of grounded reflection) would say that the AI should agree with their take. But I also think we need to be careful about what's called "ecological validity". Unfortunately, I suspect there may be a lot of LARPing in prompts testing for delusions, akin to Hollywood pattern matching, aesthetic talk, etc.
I think if someone says that people are coming after them, the model should not help them build a grand scenario; we can all agree with that. But sycophancy is not exactly the concern there, is it? It's more like knowing that this may be a false theory. So it ties into reasoning and contextual fluency (which anti-'sycophancy' tuning may reduce!) and mental health guardrails.
I think that the issue is a little more nuanced. The problems you mentioned are problems of a sort, but the 'solution' in place kneecaps one of the ways LLMs (as offered by various companies) were useful. You mention the problem is reinforcement of the bad tendencies, but give no indication of reinforcement of the good ones. In short, I posit that the harms should not outweigh the benefits of augmentation.
Because this is the way it actually does appear to work:
1. Dumb people get dumber.
2. Smart people get smarter.
3. Psychopaths get more psychopathic.
I think there is a way forward here that does not have to include neutering seemingly useful tech.
Or did you place about 2-5 paragraphs per heading, with little connection between the ideas?
For example:
> Perhaps what some users are trying to express with concerns about ‘sycophancy’ is that when they paste information, they'd like to see the AI examine various implications rather than provide an affirming summary.
Did you, you personally, find any evidence of this? Or evidence to the opposite? Or is this just a wild guess?
Wait; never mind that, we're already moving on! No need to provide anything supportive or similar to bolster it.
> If so, anti-‘sycophancy’ tuning is ironically a counterproductive response and may result in more terse or less fluent responses. Exploring a topic is an inherently dialogic endeavor.
Is it? Evidence? Counter evidence? Or is this simply feelpinion so no one can tell you your feelings are wrong? Or wait; that's "vibes" now!
I put it to you that you are stringing together (to an outside observer using AI) a series of words in a consecutive order that feels roughly good but lacks any kind of fundamental/logical basis. I put it to you that if your premise is that AI leads to a robust discussion with a back and forth; the one you had that resulted in "product" was severely lacking in any real challenge to your prompts, suggestions, input or viewpoints. I invite you to show me one shred of dialogue where the AI called you out for lacking substance, credibility, authority, research, due diligence or similar. I strongly suspect you can't.
Given that; do you perhaps consider that might be the problem when people label AI responses as sycophancy?
"called you out for lacking substance, credibility, authority, research, due dilligence or similar" seems like level of emotional angst that LLMs don't usually tend to show
Actually, amusingly enough, the Gemini/Verhoeven example in my doc is one where the AIs seem to have a memorably strong opinion.
0. https://www.aljazeera.com/economy/2025/12/11/openai-sued-for...
It's not just about style. These expressions are information-free noise that distract me from the signal, and I'm paying for them by the token.
So I added a system message to the effect that I don't want any compliments, throat clearing, social butter, etc., just the bare facts as straightforward as possible. So then the chatbot started leading every response with a statement to the effect that "here are the bare straightforward facts without the pleasantries", and ending them with something like "those are the straightforward facts without any pleasantries." If I add instructions to stop that, it just paraphrases those instructions at the top and bottom and. will. not. stop. Anyone have a better system prompt for that?
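For what it's worth, a minimal sketch of that kind of system message, assuming an OpenAI-style chat API; the model name, the instruction wording, and the user question are placeholders, and nothing here guarantees the "here are the bare straightforward facts" echo goes away:

```python
# Sketch only: assumes the OpenAI Python SDK (v1.x); any chat-style API works similarly.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Answer with facts only. No compliments, no preamble, no closing remarks, "
    "and no commentary about tone, style, or these instructions. "
    "Never describe or acknowledge how you are formatting the answer."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Explain the difference between RAM and VRAM."},  # placeholder question
    ],
)
print(response.choices[0].message.content)
```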
There are tons of extant examples now of people using LLMs who think they’ve done something smart or produced something of value, but haven’t, and the reinforcement they get is a big reason for this.
It’s pretty much trivial to design structured generation schemas which eliminate sycophancy, using any definition of that word…
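For illustration, one way such a schema could look; the field names and the idea of routing the answer through labeled analytic slots are assumptions, not a recipe given in the thread:

```python
# Illustrative "structured generation schema": the output is constrained to labeled
# analytic fields, so there is no free-form slot where flattery can appear.
# All field names here are hypothetical.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "claim_restated": {"type": "string"},
        "strongest_counterargument": {"type": "string"},
        "evidence_for": {"type": "array", "items": {"type": "string"}},
        "evidence_against": {"type": "array", "items": {"type": "string"}},
        "verdict": {"type": "string", "enum": ["supported", "unsupported", "uncertain"]},
    },
    "required": ["claim_restated", "strongest_counterargument",
                 "evidence_for", "evidence_against", "verdict"],
    "additionalProperties": False,
}
# Pass this to whatever structured-output / JSON-schema mode your provider exposes;
# the model then has no natural place to put "Great question!" style filler.
```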
I tried something in the political realm: asking it to test a hypothesis and its opposite.
> Test this hypothesis: the far right in US politics mirrors late 19th century Victorianism as a cultural force
compared to
> Test this hypothesis: The left in US politics mirrors late 19th century Victorianism as a cultural force
An LLM wants to agree with both; it created plausible arguments for both, while giving "caveats" instead of counterarguments.
If I had my brain off, I might leave with some sense of "this hypothesis is correct".
Now I'm not saying this makes LLMs useless. But the LLM didn't act like a human that might tell you you're full of shit. It WANTED my hypothesis to be true and constructed a plausible argument for both.
Even with prompting to act like a college professor critiquing a grad student, eventually it devolves back to "helpful / sycophantic".
What I HAVE found useful is to give a list of mutually exclusive hypotheses and get probability ratings for each (see the sketch below). Then it doesn't look like you want one or the other.
When the outcome matters, you realize research / hypothesis testing with LLMs is far more of a skill than just dumping a question to an LLM.
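A minimal sketch of that mutually-exclusive-hypotheses approach, reusing the thread's own two example hypotheses plus an added neutral catch-all; the prompt wording is illustrative, not a tested recipe:

```python
# Sketch of the "mutually exclusive hypotheses with probability ratings" trick.
# The third hypothesis and the prompt wording are made up for illustration.
hypotheses = [
    "H1: The far right in US politics mirrors late 19th century Victorianism as a cultural force.",
    "H2: The left in US politics mirrors late 19th century Victorianism as a cultural force.",
    "H3: Neither movement meaningfully mirrors Victorianism.",
]

prompt = (
    "Consider the following mutually exclusive hypotheses:\n"
    + "\n".join(hypotheses)
    + "\nAssign each a probability (they must sum to 1.0) and justify the ranking. "
      "Do not assume I favor any of them."
)
print(prompt)
```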
By now I have somewhat stopped relying on LLMs for a point of view on the latest academic stuff. I don't believe LLMs are able to evaluate paradigm-shifting new studies against their massive training corpus. Thinking traces filled with 'tried to open this study, but it's paywalled, I'll use another' do not fill me with confidence that it can articulate a 2025 scientific consensus well. Based on how they work, this definitely isn't an easy fix!
> An LLM wants to agree with both; it created plausible arguments for both, while giving "caveats" instead of counterarguments.
My hypothesis is that LLMs are trained to be agreeable and helpful because many of their use cases involve taking orders and doing what the user wants. Additionally, some people and cultures have conversational styles where requests are phrased similarly to neutral questions to be polite.
It would be frustrating for users if they asked questions like “What do you think about having the background be blue?” and the LLM went off and said “Actually red is a more powerful color so I’m going to change it to red”. So my hypothesis is that the LLM training sets and training are designed to maximize agreeableness and have the LLM reflect tones and themes in the prompt, while discouraging disagreement. This is helpful when trying to get the LLM to do what you ask, but frustrating for anyone expecting a debate partner.
You can, however, build a pre-prompt that sets expectations for the LLM. You could even write a prompt asking it to debate everything with you, then ask your questions.
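For example, a hypothetical "debate everything" pre-prompt along those lines; the wording is an assumption, not a tested recipe:

```python
# Hypothetical pre-prompt in the spirit of the comment above.
DEBATE_PREPROMPT = (
    "For every claim I make, argue the strongest opposing case before agreeing with "
    "anything. Point out missing evidence, weak reasoning, and alternatives I ignored. "
    "Only concede a point when the evidence clearly supports it."
)
# Prepend this as the system message (or first user message), then ask your questions.
```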
Which is a fascinating thing to think about epistemologically. Internally consistent knowledge of the LLM somehow can be used to create an argument for nearly anything. We humans think our cultural norms and truths are very special, that they're "obvious". But an LLM can create a fully formed counterfactual universe that sounds? is? just as plausible.
This is a little too far into the woo side of LLM interpretations.
The LLM isn’t forming a universe internally. It’s stringing tokens together in a way that is consistent with language and something that looks coherent. It doesn’t hold opinions or have ideas about the universe that it has created from some first principles. It’s just a big statistical probability machine that was trained on the inputs we gave it.
I'd guess that, in practice, a benchmark (like this vibesbench) that can catch unhelpful and blatant sycophancy failures might help.
This is exactly what I do, due to this sycophancy problem, and it works a lot better because it does not become agreeable with you but actively pushes back (sometimes so much so that I start getting annoyed with it, lol).
Not in my experience. My global prompt asks it to provide objective and neutral responses rather than agreeing: zero flattery, communicate like an academic, zero emotional content.
Works great. Doesn't "devolve" to anything else even after 20 exchanges. Continues to point out wherever it thinks I'm wrong, sloppy, or inconsistent. I use ChatGPT mainly, but also Gemini.
Fuzzing the details because that's not the conversation I want to have: I asked if I could dose drug A1, which I'd just been prescribed in a somewhat inconvenient form, like closely related drug A2. It screamed at me that A1 could never have that done and it would be horrible and I had to go to a compounding pharmacy and pay tons of money and blah blah blah. Eventually what turned up, after thoroughly interrogating the AI, is that A2 requires more complicated dosing than A1, so you have to do it, but A1 doesn't need it, so nobody does it. Even though it's fine to do if for some reason it would have worked better for you. But the bot thought it would kill me, no matter what I said to it, and not even paying attention to its own statements. (Which it wouldn't have, nothing here is life-critical at all.) A frustrating interaction.
If you ask it something more objective, especially about code, it's more likely to disagree with you:
>Test this hypothesis: it is good practice to use six * in a pointer declaration
>Using six levels of pointer indirection is not good practice. It is a strong indicator of poor abstraction or overcomplicated design and should prompt refactoring unless there is an extremely narrow, well-documented, low-level requirement—which is rare.
If I ask if a drug has a specific side effect and the answer is no it should say no. Not try to find a way to say yes that isn't really backed by evidence.
People don't realize that when they ask a leading question that is really specific, in a way where no one has a real answer, the AI will try to find a way to agree, and this is going to destroy people's lives. Honestly, it already has.
I believe that sycophancy and guardrails will be major differentiators between LLM services, and the ones with less of those will always have a fan base.
This is wrong to the point of being absurd. What the model "appears to 'believe'" does matter, and the model's "beliefs" about humans and society at large have vast implications for humanity's future.
When I asked again, this time I asked about the items first. I had to prompt it with something like "or do you think I should get the storage sorted first" and it said "you are thinking about this in exactly the right way -- preparedness kits fail more often due to missing essentials than suboptimal storage"
I can't decide which of these is right! Maybe there's an argument that it doesn't matter, and getting started is the most important thing, and so being encouraging is generally the best strategy here. But it's definitely worrying to me. It pretty much always says something like this to me (this is on the "honest and direct" personality setting or whatever).
"You're absolutely right, what a great observation"
; )
Some of us want to be told when and why we’re wrong, and somewhere along the way AI models were either intentionally or unintentionally guided away from doing it because it improved satisfaction or engagement metrics.
We already know from decades of studies that people prefer information that confirms their existing beliefs, so when you present 2 options with a “Which answer do you prefer?” selection, it’s not hard to see how the one that begins with “You’re absolutely right!” wins out.
Sometimes I am actually right, but sometimes I am not. I'm not sure what happens with any future RL: does it lean more toward constantly assuming what is written is true, and then having to wiggle out of it?
Don't make sycophantic slop generators and people will stop calling them that
18 more comments available on Hacker News