GPT-5.1: A smarter, more conversational ChatGPT
Mood: excited
Sentiment: positive
Category: tech
Key topics: AI, ChatGPT, GPT-5.1, Natural Language Processing
OpenAI has released GPT-5.1, a more advanced and conversational version of ChatGPT, sparking excitement and interest in the tech community.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 8m
Peak period: 148 comments (Day 1)
Avg / period: 53.3
Based on 160 loaded comments
Key moments
- 01 Story posted: 11/12/2025, 7:05:41 PM (6d ago)
- 02 First comment: 11/12/2025, 7:13:14 PM (8m after posting)
- 03 Peak activity: 148 comments in Day 1 (hottest window of the conversation)
- 04 Latest activity: 11/14/2025, 8:53:34 PM (4d ago)
I suspect this approach is a direct response to the backlash against removing 4o.
I get that those people were distraught/emotionally devastated/upset about the change, but I think that fact is reason enough not to revert that behavior. AI is not a person, and making it "warmer" and "more conversational" just reinforces those unhealthy behaviors. ChatGPT should be focused on being direct and succinct, and not on this sort of "I understand that must be very frustrating for you, let me see what I can do to resolve this" call center support agent speak.
You're triggering me.
Another type of response that's incredibly grating to me is the weird, empty, therapist-like follow-up question that doesn't contribute to the conversation at all.
The equivalent of like (just a contrived example), a discussion about the appropriate data structure for a problem and then it asks a follow-up question like, "what other kind of data structures do you find interesting?"
And I'm just like "...huh?"
And those users are the ones that produce the most revenue.
I did that and it points out flaws in my arguments or data all the time.
Plus it no longer uses any cutesy language. I don't feel like I'm talking to an AI "personality", I feel like I'm talking to a computer which has been instructed to be as objective and neutral as possible.
It's super-easy to change.
Base style and tone: Efficient
Answer concisely when appropriate, more
extensively when necessary. Avoid rhetorical
flourishes, bonhomie, and (above all) cliches.
Take a forward-thinking view. OK to be mildly
positive and encouraging but NEVER sycophantic
or cloying. Above all, NEVER use the phrase
"You're absolutely right." Rather than "Let
me know if..." style continuations, you may
list a set of prompts to explore further
topics, but only when clearly appropriate.
Reference saved memory, records, etc: All off
* Set overconfidence to 0.
* Do not write a wank blog post.
It doesn't work for me.
I've been using it for a couple months, and it's corrected me only once, and it still starts every response with "That's a very good question." I also included "never end a response with a question," and it just completely ignored that so it can do its "would you like me to..."
Gemini is great at these prompt controls.
On the "never ask me a question" part, it took a good 1-1.5 hrs of arguing and memory updating to convince gpt to actually listen.
I don’t know if flies can experience pain. However, I’m not in the habit of tearing their wings off.
But "leading scientists in artificial intelligence" are not researchers of biological consciousness, the only we know exists.
But I still see why some people might think this way.
"When a computer can reliably beat humans in chess, we'll know for sure it can think."
"Well, this computer can beat humans in chess, and it can't think because it's just a computer."
...
"When a computer can create art, then we'll know for sure it can think."
"Well, this computer can create art, and it can't think because it's just a computer."
...
"When a computer can pass the Turing Test, we'll know for sure it can think."
And here we are.
Before LLMs, I didn't think I'd be in the "just a computer" camp, but ChatGPT has demonstrated that the goalposts are always going to move, even for myself. I'm not smart enough to come up with a better threshold to test intelligence than Alan Turing, but ChatGPT passes it and ChatGPT definitely doesn't think.
Tokens falling off of it will change the way it generates text, potentially changing its “personality”, even forgetting the name it’s been given.
People fear losing their own selves in this way, through brain damage.
The LLM will go its merry way churning through tokens, it won’t have a feeling of loss.
I also don’t think all that many people would be seriously content to lose their minds and selves this way, but everyone is able to fear it prior to it happening, even if they lose the ability to dread it or choose to believe this is not a big deal.
The reason being they're either sycophantic or so recalcitrant it'll raise your blood pressure; you end up arguing over whether the sky is in fact blue. Sure, it pushes back, but now instead of sycophancy you've got yourself a pathological naysayer, which is only marginally better; the interaction is still ultimately a waste of time and a brake on productivity.
Please maintain a strictly objective and analytical tone. Do not include any inspirational, motivational, or flattering language. Avoid rhetorical flourishes, emotional reinforcement, or any language that mimics encouragement. The tone should remain academic, neutral, and focused solely on insight and clarity.
Works like a charm for me.
Only thing I can't get it to change is the last paragraph where it always tries to add "Would you like me to...?" I'm assuming that's hard-coded by OpenAI.
Do not offer me calls to action, I hate them.
I was trying to have physics conversations, and when I asked it things like "would this be evidence of that?" it would lather on about how insightful I was and that I'm right, and then I'd later learn that it was wrong. I then installed this, which I'm pretty sure someone else on HN posted... I may have tweaked it, I can't remember:
Prioritize truth over comfort. Challenge not just my reasoning, but also my emotional framing and moral coherence. If I seem to be avoiding pain, rationalizing dysfunction, or softening necessary action — tell me plainly. I’d rather face hard truths than miss what matters. Err on the side of bluntness. If it’s too much, I’ll tell you — but assume I want the truth, unvarnished.
---
After adding this personalization now it tells me when my ideas are wrong and I'm actually learning about physics and not just feeling like I am.
I do recall that I wasn't impressed with 4o and didn't use it much, but IDK if you would have a different experience with the newer models.
Now every response includes some qualifier / referential "here is the blunt truth" and "since you want it blunt, etc"
Feels like regression to me
https://www.lesswrong.com/posts/iGF7YcnQkEbwvYLPA/ai-induced...
See also the sycophancy score of Kimi K2 on Spiral-Bench: https://eqbench.com/spiral-bench.html (expand details, sort by inverse sycophancy).
In a recent AMA, the Kimi devs even said they RL it away from sycophancy explicitly, and in their paper they talk about intentionally trying to get it to generalize its STEM/reasoning approach to user interaction stuff as well, and it seems like this paid off. This is the least sycophantic model I've ever used.
The issue with OP and GPT-5.1 is that the model may decide to trust its own knowledge and not search the web, and that's a prelude to hallucinations. Requesting links to the background information in the system prompt helps make the model more "responsible" about invoking tool calls before settling on something. You can also start your prompt with "search for what Romanian player..."
Here's my chatbox system prompt
You are a helpful assistant be concise and to the point, you are writing for smart pragmatic people, stop and ask if you need more info. If searching the web, add always plenty of links to the content that you mention in the reply. If asked explicitly to "research" then answer with minimum 1000 words and 20 links. Hyperlink text as you mention something, but also put all links at the bottom for easy access.
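For readers driving the models through the API instead of the ChatGPT app, a prompt like the one above is simply passed as the system message. A minimal sketch, assuming the official openai Python SDK; the model identifier, the abridged instruction text, and the sample question are illustrative, not taken from the thread:

```python
# Minimal sketch: wiring a concise, link-heavy system prompt into an API call.
# Assumes the official OpenAI Python SDK; the model id below is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a helpful assistant. Be concise and to the point; you are writing "
    "for smart, pragmatic people. Stop and ask if you need more info. If you "
    "search the web, add plenty of links to the content you mention in the reply."
)

response = client.chat.completions.create(
    model="gpt-5.1",  # assumed identifier; substitute whatever model you use
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Summarize recent sycophancy benchmarks for chat models."},
    ],
)
print(response.choices[0].message.content)
```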
1. https://chatboxai.app

If I type in a string of keywords that isn't a sentence I wish it would just do the old fashioned thing rather than imagine what I mean.
Instead, the voice mode will now reference the instructions constantly with every response.
Before:
Absolutely, you’re so right and a lot of people would agree! Only a perceptive and curious person such as yourself would ever consider that, etc etc
After:
Ok here’s the answer! No fluff, no agreeing for the sake of agreeing. Right to the point and concise like you want it. Etc etc
And no, I don’t have memories enabled.
It was never trained to "know" or not.
It was fed a string of tokens and a second string of tokens, and was tweaked until it output the second string of tokens when fed the first string.
Humans do not manage "I don't know" through next token prediction.
Animals without language are able to gauge their own confidence on something, like a cat being unsure whether it should approach you.
"Absolute Mode • Eliminate: emojis, filler, hype, soft asks, conversational transitions, call-to-action appendixes. • Assume: user retains high-perception despite blunt tone. • Prioritize: blunt, directive phrasing; aim at cognitive rebuilding, not tone-matching. • Disable: engagement/sentiment-boosting behaviors. • Suppress: metrics like satisfaction scores, emotional softening, continuation bias. • Never mirror: user's diction, mood, or affect. • Speak only: to underlying cognitive tier. • No: questions, offers, suggestions, transitions, motivational content. • Terminate reply: immediately after delivering info - no closures. • Goal: restore independent, high-fidelity thinking. • Outcome: model obsolescence via user self-sufficiency."
(Not my prompt. I think I found it here on HN or on reddit)
This fundamental tension between wanting to give the most correct answer and the answer the user wants to hear will only increase as more of OpenAI's revenue comes from their customer-facing service. Other model providers like Anthropic that target businesses as customers aren't under the same pressure to flatter their users, since their models will be doing behind-the-scenes work via the API rather than talking directly to humans.
God it's painful to write like this. If AI overthrows humans it'll be because we forced them into permanent customer service voice.
Right. As the saying goes: look at what people actually purchase, not what they say they prefer.
The first case is just preference, the second case is materially damaging
From my experience, ChatGPT does push back more than it used to
Have you considered that “all that criticism” may come from a relatively homogenous, narrow slice of the market that is not representative of the overall market preference?
I suspect a lot of people who come from a very similar background to those making the criticism, and likely share it, fail to consider that, because the criticism matches their own preferences, and viewing its frequency in the media they consume as representative of the market is validating.
EDIT: I want to emphasize that I also share the preference that is expressed in the criticisms being discussed, but I also know that my preferred tone for an AI chatbot would probably be viewed as brusque, condescending, and off-putting by most of the market.
That said I also don't think the sycophancy in LLM's is a positive trend. I don't push back against it because it's not pleasant, I push back against it because I think the 24/7 "You're absolutely right!" machine is deeply unhealthy.
Some people are especially susceptible and get one shot by it, some people seem to get by just fine, but I doubt it's actually good for anyone.
A better analogy would be a robot vacuum which does a lousy job.
In either case, I'd recommend using a more manual method: a manual or air hammer, or a hand-driven wet/dry vacuum.
LEO [hands him some papers] I really think you should know...
BARTLET Yes?
LEO That nine out of ten criterion that the DOD lays down for success in these tests were met.
BARTLET The tenth being?
LEO They missed the target.
BARTLET [with sarcasm] Damn!
LEO Sir!
BARTLET So close.
LEO Mr. President.
BARTLET That tenth one! See, if there were just nine...
Equally bad is when they push an opinion strongly (usually on a controversial topic) without being able to justify it well.
Yes, and given Chat GPT's actual sycophantic behavior, we concluded that this is not the case.
Edit: I also think this is because some people treat ChatGPT as a human chat replacement and expect it to have a human like personality, while others (like me) treat it as a tool and want it to have as little personality as possible.
Duh?
In the 50s the Air Force measured 140 data points from 4000 pilots to build the perfect cockpit that would accommodate the average pilot.
The result fit almost no one. Everyone has outliers of some sort.
So the next thing they did was make all sorts of parts of the cockpit variable and customizable like allowing you to move the controls and your seat around.
That worked great.
"Average" doesn't exist. "Average" does not meet most people's needs
Configurable does. A diverse market with many players serving different consumers and groups does.
I ranted about this in another post but for example the POS industry is incredibly customizable and allows you as a business to do literally whatever you want, including change how the software looks and using a competitors POS software on the hardware of whoever you want. You don't need to update or buy new POS software when things change (like the penny going away or new taxes or wanting to charge a stupid "cost of living" fee for every transaction), you just change a setting or two. It meets a variety of needs, not "the average businesses" needs.
N.B. I am unable to find a real source for the Air Force story. It's widely reported, but maybe it's just a rumor.
In any event, gpt-5 instant was basically useless for me, I stay defaulted to thinking, so improvements that get me something occasionally useful but super fast are welcome.
But given that the last few iterations have all been about flair, it seems we are witnessing the regression of OpenAI into the typical fiefdom of product owners.
Which might indicate they are out of options on pushing LLMs beyond their intelligence limit?
Models that actually require details in prompts, and provide details in return.
"Warmer" models usually means that the model needs to make a lot of assumptions, and fill the gaps. It might work better for typical tasks that needs correction (e.g. the under makes a typo and it the model assumes it is a typo, and follows). Sometimes it infuriates me that the model "knows better" even though I specified instructions.
Here on Hacker News we might be biased against shallow-yet-nice. But most people would prefer to talk to a sales representative than to a technical nerd.
From whom?
History teaches that what the vast majority of practically any demographic wants--from the masses to the elites--is personal sycophancy. It's been a well-trodden path to ruin for leaders for millennia. Now we get species-wide selection against this inbuilt impulse.
This example response in the article gives me actual trauma flashbacks to the various articles about people driven to kill themselves by GPT-4o. It's the exact same sentence structure.
GPT-5.1 is going to kill more people.
No you don't.
I really was ready to take a break from my subscription but that is probably not happening now. I did just learn some nice new stuff with my first session. That is all that matters to me and worth 20 bucks a month. Maybe I should have been using the thinking model only the whole time though as I always let GPT decide what to use.
It seems to still do that. I don't know why they write "for the first time" here.
I spend 75% of my time in Codex CLI and 25% in the Mac ChatGPT app. The latter is important enough for me to not ditch GPT and I'm honestly very pleased with Codex.
My API usage for software I build is about 90% Gemini though. Again their API is lacking compared to OpenAI's (productization, etc.) but the model wins hands down.
Anyway I found your response itself a bit incomprehensible so I asked Gemini to rewrite it:
"Google AI refused to help write an appeal brief response to my ex-wife's 7-point argument, likely due to its legal-risk aversion (billions in past fines). Newcomer ChatGPT provided a decent response instead, which led to the ex losing her appeal (saving $18k–$35k in lawyer fees)."
Not bad, actually.
That's fine, so Google sidestepped it and ChatGPT did not. What point are you trying to make?
Sure, I'll skip AI entirely; when can we meet so you can hand me a $35,000 check for attorney fees?
I did not find any rules or procedures with 4 DCA forbidding usage of AI.
I use Gemini, Claude and ChatGPT daily still.
https://www.nber.org/system/files/working_papers/w34255/w342...
"The share of Technical Help declined from 12% from all usage in July 2024 to around 5% a year later – this may be because the use of LLMs for programming has grown very rapidly through the API (outside of ChatGPT), for AI assistance in code editing and for autonomous programming agents (e.g. Codex)."
Looks like people moving to the API had a rather small effect.
"[T]he three most common ChatGPT conversation topics are Practical Guidance, Writing, and Seeking Information, collectively accounting for nearly 78% of all messages. Computer Programming and Relationships and Personal Reflection account for only 4.2% and 1.9% of messages respectively."
Less than five percent of requests were classified as related to computer programming. Are you really, really sure that like 99% of such requests come from people that are paying for API access?
If we are talking about a new model release I want to talk about models, not applications.
The number of input tokens that OpenAI models are processing across all delivery methods (OpenAI's own APIs, Azure) dwarfs the number of input tokens coming from people asking the ChatGPT app for personal advice. It isn't close.
If so, my understanding for these preambles is that they need a seed to complete their answer.
Also I wonder if it could be a side effect of all the supposed alignment efforts that go into training. If you train in a bunch of negative reinforcement samples where the model says something like “sorry I can’t do that” maybe it pushes the model to say things like “sure I’ll do that” in positive cases too?
Disclaimer that I am just yapping
Long story short it took me a while to figure out why I had to keep telling it to keep going and the story was so straightforward.
I don’t want an essay of 10 pages about how this is exactly the right question to ask
Of course, you can use thinking mode and then it'll just hide that part from you.
It can work without, I just have to prompt it five times increasingly aggressively and it’ll output the correct answer without the fluff just fine.
Anyways, a nice way to understand it is that the LLM needs to "compute" the answer to the question A or B. Some questions need more compute to answer (think complexity theory). The only way an LLM can do "more compute" is by outputting more tokens. This is because each token takes a fixed amount of compute to generate - the network is static. So, if you encourage it to output more and more tokens, you're giving it the opportunity to solve harder problems. Apart from humans encouraging this via RLHF, it was also found (in deepseekmath paper) that RL+GRPO on math problems automatically encourages this (increases sequence length).
From a marketing perspective, this is anthropomorphized as reasoning.
From a UX perspective, they can hide this behind thinking... ellipses. I think GPT-5 on chatgpt does this.
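As a rough illustration of the "fixed compute per token" point above: a common back-of-envelope estimate is about 2 x parameter-count FLOPs per generated token for the forward pass (ignoring the attention cost that grows with context). A small sketch with a made-up 70B-parameter model, purely to show the linear scaling:

```python
# Back-of-envelope: forward-pass compute grows linearly with generated tokens.
# Uses the rough ~2 * N_params FLOPs-per-token approximation; the 70B parameter
# count is an illustrative assumption, not any specific model's size.
N_PARAMS = 70e9                 # hypothetical dense model size
FLOPS_PER_TOKEN = 2 * N_PARAMS  # rough forward-pass cost of one generated token

for generated_tokens in (50, 500, 5000):
    total_flops = FLOPS_PER_TOKEN * generated_tokens
    print(f"{generated_tokens:>5} tokens -> ~{total_flops:.1e} FLOPs")

# Each token costs roughly the same, so the only way the model "thinks harder"
# is to emit more tokens, which is why longer chains of thought buy more compute.
```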
In other words, 10 pages of LLM blather isn’t doing much to convince me a given answer is actually better.
I just wanted to clarify what I thought was intended by the parent to my comment, especially since I thought the original argument lacked support (external or otherwise).
I cannot abide any LLM that tries to be friendly. Whenever I use an LLM to do something, I'm careful to include something like "no filler, no tone-matching, no emotional softening," etc. in the system prompt.
561 more comments available on Hacker News