Why Do LLMs Freak Out Over the Seahorse Emoji?

3 months ago

2 replies

It correctly represents "seahorse emoji" internally AND it has in-built (but factually incorrect) knowledge that this emoji exists.

Example: "Is there a lime emoji?" Since it believes the answer is no, it doesn't attempt to generate it.

catigula

3 months ago

I feel like you're attesting to interior knowledge about an LLM's state that seems impossible to have.

ichik

3 months ago

Was the choice of example meaningful? Lime emoji does exist[0]

[0]: https://emojipedia.org/lime

Gigachad

3 months ago

4 replies

The fact that it's looking back and getting confused about what it just wrote is something I've never seen in LLMs before. I tried this on Gemma3 and it didn't get confused like this. It just said yes there is one and then sends a horse emoji.

Uehreka

3 months ago

2 replies

I’ve definitely seen Claude Code go “[wrong fact], which means [some conclusion]. Wait—hold on, wrong fact is wrong.” On the one hand, this is annoying. On the other hand, if the LLM is going to screw up (presumably preventing this is not in the cards) then I’m glad it can catch its own mistakes.

userbinator

3 months ago

9 replies

> On the other hand, if the LLM is going to screw up (presumably preventing this is not in the cards) then I’m glad it can catch its own mistakes.

The odd thing is why it would output its own mistakes, instead of internally revising until it's actually satisfied.

112233

3 months ago

2 replies

There is no mechanism in the transformer architecture for "internal" thinking ahead, or hierarchical generation. Attention only looks back from the current token, ensuring that the model always falls into a local maximum, even if it only leads to bad outcomes.
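To make the "attention only looks back" point concrete, here is a minimal sketch of a causal mask in plain Python. The function name `causal_mask` and the toy sequence length are assumptions for illustration, not taken from any particular library.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix where mask[i][j] is True
    if position i is allowed to attend to position j."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


mask = causal_mask(4)
for row in mask:
    # 'x' marks a visible position, '.' marks a masked (future) position
    print("".join("x" if visible else "." for visible in row))
```

Each row is one token position: it can see itself and everything before it, and nothing after it.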

[1]: https://www.anthropic.com/research/tracing-thoughts-language...

3 months ago

1 reply

That's what reasoning models are for. You can get most of the benefit by saying an answer once in the reasoning section, because then it can read over it when it outputs it again in the answer section.

It could also have a "delete and revise" token, though you'd have to figure out how to train the model to use it.

112233

3 months ago

Given how badly most models degrade once they reach a particular context size (any whitepapers on this welcome), reasoning does seem like a quick hack rather than a thought-out architecture.

ijk

3 months ago

1 reply

Not strictly true: while this was previously believed to be the case, Anthropic demonstrated that transformers can "think ahead" in some sense, for example when planning rhymes in a poem [1]:

> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

They described the mechanism that it uses internally for planning [2]:

> Language models are trained to predict the next word, one word at a time. Given this, one might think the model would rely on pure improvisation. However, we find compelling evidence for a planning mechanism.

> Specifically, the model often activates features corresponding to candidate end-of-next-line words prior to writing the line, and makes use of these features to decide how to compose the line.

[2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...

112233

3 months ago

Thank you for these links! Their "circuits" research is fascinating. In the example you mention, note how the planned rhyme is piggybacking on the newline token. The internal state that the emergent circuits can use is mapped 1:1 to the tokens. The model cannot trigger an insertion of a "null" token for the purpose of storing this plan-ahead information during inference. Nor are there any sort of "registers" available aside from the tokens. The "thinking" LLMs are not quite that, because the thinking tokens are still forced to become text.

elliotto

3 months ago

4 replies

I do this all the time. I start writing a comment then think about it some more and realize halfway through that I don't know what I'm saying.

I have the luxury of a delete button - the LLM doesn't get that privilege.

VMG

3 months ago

2 replies

Isn't that what thinking mode is?

drdeca

3 months ago

AIUI, they generally do all of that at the beginning. Another approach, I suppose, could be to have it generate a second pass? Though that would probably ~double the inference cost.

elliotto

3 months ago

I tried it with thinking mode and it seems like it spiraled wildly internally, then did a web search and worked it out.

https://chatgpt.com/share/68e3674f-c220-800f-888c-81760e161d...

godshatter

3 months ago

If you didn't have the luxury of a delete button, such as when you're just talking directly to someone IRL, you would probably say something like "no, wait, that doesn't make any sense, I think I'm confusing myself" and then either give it another go or just stop there.

I wish LLMs would do this rather than just bluster on ahead.

What I'd like to hear from the AI about seahorse emojis is "my dataset leads me to believe that seahorse emojis exist... but when I go look for one I can't actually find one."

I don't know how to get there, though.

pixl97

3 months ago

An LLM is kind of like a human where every thought they had comes out of their mouth.

Most of us humans would sound rather crazy if we did that.

krackers

3 months ago

There have been attempts to give LLMs backspace tokens. Since no frontier model uses one, I can only guess it doesn't scale as well as just letting the model correct itself in CoT.

https://arxiv.org/abs/2306.05426
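As a rough illustration of what a backspace token would mean at the output end, here is a toy sketch. The token name `<bksp>` and the helper `apply_backspaces` are hypothetical, and this is only post-processing of a token stream, not the training method from the linked paper.

```python
BACKSPACE = "<bksp>"  # hypothetical special token, named here for illustration only


def apply_backspaces(tokens):
    """Collapse a token stream, letting each BACKSPACE erase the previous visible token."""
    visible = []
    for tok in tokens:
        if tok == BACKSPACE:
            if visible:  # ignore a backspace with nothing to erase
                visible.pop()
        else:
            visible.append(tok)
    return visible


stream = ["Yes", ",", "here", "it", "is", ":", "horse", BACKSPACE, "fish", BACKSPACE]
print(apply_backspaces(stream))
```

The two wrong guesses never reach the reader. The hard part, as noted, is training a model to emit such a token at the right moments.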

Uehreka

3 months ago

1 reply

It’s a lot easier if you (I know I know) stop thinking of them as algorithms and anthropomorphize them more. People frequently say stuff like this, and it’s pretty clear that our minds process thoughts differently when we directly articulate them than if we act on “latent thoughts” or impulses.

Yell at me all you want about how “LLMs don’t think”, if a mental model is useful, I’m gonna use it.

3 months ago

It’s only a useful mental model if it leads to useful predictions. Otherwise it’s a “just so” story.

Swizec

3 months ago

1 reply

> The odd thing is why it would output its own mistakes, instead of internally revising until it's actually satisfied.

Happens to me all the time. Sometimes in a fast-paced conversation you have to keep talking while you’re still figuring out what you’re trying to say. So you say something, realize it’s wrong, and correct yourself. Because if you think silently for too long, you lose your turn.

https://arxiv.org/abs/2502.04404

3 months ago

1 reply

That’s probably not the same reason the LLM is doing so though.

9dev

3 months ago

2 replies

Are you sure? Because LLMs definitely have to respond to user queries in time to avoid being perceived as slow. Therefore, thinking internally for too long isn’t an option either.

Sharlin

3 months ago

1 reply

There’s no "thinking internally" in LLMs. They literally "think" by outputting tokens. The "thinking modes" supported by online services are just the LLM talking to itself.

9dev

3 months ago

1 reply

That's not what I meant. "Thinking internally" referred to the user experience only, where the user is waiting for a reply from the model. And they are definitely optimised to limit that time.

Sharlin

3 months ago

I’m not sure what you meant then.

There’s no waiting for reply, there’s only the wait between tokens output, which is fixed and mostly depends on hardware and model size. Inference is slower on larger models, but so is training, which is more of a bottleneck than user experience.

The model cannot think before it starts emitting tokens, the only way for it to "think" privately is by the interface hiding some of its output from the user, which is what happens in "think longer" and "search the web" modes.

If an online LLM doesn’t begin emitting a reply immediately, more likely the service is waiting for available GPU time or something like that, and/or prioritizing paying customers. Lag between tokens is also likely caused by large demand or throttling.

Of course there are many ways to optimize model speed that also make it less smart, and maybe even SOTA models have such optimizations these days. Difficult to know because they’re black boxes.

rcxdude

3 months ago

LLMs spend a fixed amount of effort on each token they output, and in a feedforward manner. There's no recursion in the network other than through predicting predicated on the token that it just output. So it's not really time pressure in the same way that you might experience it, but it makes sense that sometimes the available compute is not enough for the next token (and sometimes it's excessive). Thinking modes try to improve this by essentially allowing the LLM to 'talk to itself' before sending anything to the user.

kingstnap

3 months ago

1 reply

It can't internally revise. The last generation produces a distribution and sometimes the wrong answer gets sampled.

There is no "backspace" token, although it would be cool and fancy if we had that.

The more interesting thing is why does it revise its mistakes. The answer to that is having training examples of fixing your own mistakes in the training data plus some RL to bring out that effect more.

ijk

3 months ago

There's been a few attempts at training a backspace token, though.

e.g.:

https://arxiv.org/abs/2306.05426

grrowl

3 months ago

You're describing why reasoning is such a big deal. It can do this freakout in a safe, internal environment, and once its recent output is confident enough, flip into the "actual output" mode.

[1]: https://proceedings.neurips.cc/paper_files/paper/2023/file/f...

3 months ago

If you did hide its thinking it could do that. But I'm pretty sure what happens here is that it has to go through those tokens for it to be clear that it's doing things wrong.

What I think happens:

1. There's a question about a somewhat obscure thing.

2. The LLM will never know the answer for sure. It has access to a sort of statistical, probability-based compressed database of all the facts of the world, which lets it store more facts by relating things to each other, but never with 100% certainty.

3. There are particular obscure cases where it hits its initial "statistical intuition" that something is true, so it starts outputting its thoughts as expected for a question where something is likely true. Perhaps you could analyze what it's indicating probabilities on "Yes" vs "No" to estimate its confidence. Perhaps it will show much less likelihood for "Yes", than if the question was for a horse emoji, but in this case "Yes" is still high enough threshold to go through instead of "No".

4. However, when it has to give the exact answer, it's impossible to output a correct one because the premise is false. E.g. the seahorse emoji does not exist, yet it has to output it: the previous tokens were "Yes, it exists, it's X", so X will be whatever answers are semantically closest in meaning.

5. The next token will have context that "Yes, seahorse emoji exists, it is "[HORSE EMOJI]". Now it's clear that there's a conflict here, it's able to see that HORSE emoji is not seahorse emoji, but it had to output it in the line of previous tokens because the previous tokens statistically required an output of something.
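The "Yes" vs "No" comparison from step 3 can be sketched like this. The logits are made up purely for illustration; real values would have to be read out of an actual model, and `yes_probability` is a hypothetical helper.

```python
import math


def yes_probability(yes_logit, no_logit):
    """P("Yes") when only the "Yes" and "No" logits are compared (two-way softmax)."""
    return 1.0 / (1.0 + math.exp(no_logit - yes_logit))


# Made-up logits, purely illustrative.
questions = {
    "horse emoji": (6.0, 0.0),
    "seahorse emoji": (1.0, 0.0),
}

for name, (yes_logit, no_logit) in questions.items():
    p_yes = yes_probability(yes_logit, no_logit)
    greedy = "Yes" if p_yes > 0.5 else "No"
    print(f"{name}: P(Yes) = {p_yes:.3f}, greedy answer = {greedy}")
```

With these numbers both questions get a "Yes", even though the model is much less sure about the seahorse. The visible text hides that difference in confidence.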

captainmuon

3 months ago

LLMs are just the speech center part of the brain, not a whole brain. It's like when you are speaking on autopilot, or reciting something by heart, it just comes out. There is no reflection or inner thought process. Now thinking models do actually do a bit of inner monologue before showing you the output so they have this problem to a much lesser degree.

ijk

3 months ago

So, what I think most people don't realize is that the amount of computation an LLM can do in one pass is strictly bounded. You can see that here with the layers. (This applies to a lot of neural networks [1].)

Remember, they feed in the context on one side of the network, pass it through each layer doing matrix multiplication, and get a value on the other end that we convert back into our representation space. You can view the bit in the middle as doing a kind of really fancy compression, if you like. The important thing is that there are only so many layers, and thus only so many operations.

Therefore, past a certain point they can't revise anything because it runs out of layers. This is one reason why reasoning can help answer more complicated questions. You can train a special token for this purpose [2].

[2]: https://arxiv.org/abs/2310.02226
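A toy sketch of the "only so many layers" point, assuming scalar weights stand in for the real matrix multiplications. The names `forward` and `toy_layers` are made up for illustration.

```python
def forward(hidden, layers):
    """Apply each layer once, in order; there is no loop back to an earlier layer."""
    steps = 0
    for weight, bias in layers:
        # Toy scalar "layer" with a ReLU; a real model would do a matrix multiply here.
        hidden = [max(0.0, weight * h + bias) for h in hidden]
        steps += 1
    return hidden, steps


# Toy stand-in for a network: three layers of made-up scalar weights.
toy_layers = [(0.5, 0.1), (1.5, -0.2), (0.8, 0.0)]

_, easy_steps = forward([1.0, 2.0], toy_layers)
_, hard_steps = forward([-7.3, 42.0], toy_layers)
print(easy_steps, hard_steps)
```

The step count depends only on the number of layers, never on how hard the input is. Extra "thinking" has to come from emitting more tokens and running the whole stack again.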

godshatter

3 months ago

I wonder what would happen if LMs were built a bit at a time by:

  - add in some smallish portion of the data set
  - have LM trainers (actual humans) interact with it and provide feedback about where the LM is factually incorrect and provide it additional information as to why
  - add those chat logs into the remaining data set
  - rinse and repeat until the LM is an LLM

Would they be any more reliable in terms of hallucinations and factual correctness?

This would replicate to some extent how people learn things. Probably would really slow things down (not scale) and the trainers would need to be subject matter experts and not just random people on the net say whatever they want to say to it as it develops or it will just spiral out of control.

taylorlunt

3 months ago

2 replies

I have a pet theory that LLMs being confused about what they just wrote is why they use so many em dashes. It's a good way to conceptually pivot at any point -- or not.

https://futurism.com/commitment-jail-chatgpt-psychosis

3 months ago

I think the em-dashes are OpenAI's way of watermarking their outputs. Makes it easy to spot spambots on Twitter. Well, that and they all have noticeable favorite words.

Timwi

3 months ago

I disagree. I don't think LLMs use em-dashes any more than typical professional writing does. It only seems a lot to someone who only reads online chatrooms and social media.

glenstein

3 months ago

1 reply

I've seen it several times. Here's one where I asked it about baseball teams that don't yet have a hall of fame player:

"Arizona Diamondbacks No inductees wearing their cap.

Randy Johnson is in, but as a Diamondback — so Arizona actually does have one (inducted 2015)."

I asked it to explain what happened and got this, which could be true to how it pulls up info or could be nonsense, not sure:

"Here’s the likely internal sequence:

When you first mentioned the Diamondbacks, my initial retrieval pattern surfaced the “teams without a Hall of Famer” mental list. That list, in my training data, often appears outdated (pre-2015) and still shows Arizona as having none.

A few sentences later, my context pulled up the other knowledge set—Randy Johnson’s induction in 2015 wearing a Diamondbacks cap. That newer, correct fact then overrode the earlier incorrect claim, but because generation is sequential, the earlier error had already been committed to “paper” before the override happened. I corrected mid-stream, but in human terms, it looked like I contradicted myself inside the same paragraph.

This is essentially a knowledge conflict resolution timing issue. I can retrieve and reconcile facts during a single answer, but if the resolution happens after part of the text is already generated, you see the “flip.” It’s not like I “knew it all along but withheld it,” it’s more that multiple memory traces compete, and the most context-relevant one wins—sometimes too late."

Workaccount2

3 months ago

What's fascinating is that these models have excellent knowledge about AI/transformers/LLMs (the labs have clearly been specifically training them in hopes of an automated breakthrough), so they can reason really well about what probably happened.

But it's also just that: what probably happened. They still have no real insight into their own minds; they too are just victims of whatever they output.

petesergeant

3 months ago

The inability to do this before was the lack of self-correcting sentences in the training data. Presumably new training corpuses add many more examples of self-correcting sentences / paragraphs?

diego_sandoval

3 months ago

3 replies

I have no mouth, and I must output a seahorse emoji.

someothherguyy

3 months ago

2 replies

Those are "souls" of humans that an AI is torturing in that story though, not exactly analogous, but it does sound funny.

bmacho

3 months ago

1 reply

They are not souls but normal humans with physical bodies. The story is just a normal torture story (with a cool title), and everyone better stop acting like it was relevant in most conversations, like in this one.

someothherguyy

3 months ago

The machine destroys and recreates characters over and over, and they remember what happens. So, I called them souls.

zenmac

3 months ago

>Those are "souls" of humans that a AI is torturing in that story though, not exactly analogous, but it does sound funny.

Yeah well there seem to be some real concerns regarding how people use AI chat[1]. Of course this could also be the case with these people on social media.

cycomanic

3 months ago

6 replies

That's my favorite short story and your post is the first time I have seen someone reference it online. I think I have never even met anyone who knows the story.

ileonichwiesz

3 months ago

1 reply

? It’s referenced all the time in posts about AI.

prashantsengar

3 months ago

1 reply

It's a reference to a short story "I Have No Mouth, and I Must Scream"

https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...

DonHopkins

3 months ago

1 reply

And then there's "I Have no Grass, and I Must Mow" by Larry Ellison.

IAmBroom

3 months ago

You got me with that lure.

ndsipa_pomu

3 months ago

1 reply

There's literally several of us that like that Harlan Ellison piece. Check out the video/adventure game of the same name, though it's very old.

cycomanic

3 months ago

1 reply

I've heard good things about the game, but never got around to trying it. Maybe I'll take this as a prompt to do so now.

ndsipa_pomu

3 months ago

I gave it a try a couple of months ago, but didn't get very far before getting bored. However, I tend to dismiss games unless they grab me within a couple of minutes of playing.

Maybe I should give it another go as I do love the short story and it used to be my favourite before discovering Ted Chiang's work.

magnusmundus

3 months ago

Really? I’m surprised. The original is quoted relatively often on reddit (I suspect by people unaware of the origin — as I was until I read your comment).

Consider it proof that HN has indeed not become reddit, I guess :)

loloquwowndueo

3 months ago

It’s referenced a lot as the inspiration for The Amazing Digital Circus.

vidarh

3 months ago

It's easy to miss, but it's been referenced many times on HN over the years, both as stories:

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

and fairly often in comments as well:

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

user_of_the_wek

3 months ago

There is also an old point-and-click adventure game based on the story, in case you didn't know.

arnavpraneet

3 months ago

better title for the piece of this post

ModernMech

3 months ago

3 replies

That doesn't explain why it freaks out though:

https://chatgpt.com/share/68e349f6-a654-8001-9b06-a16448c58a...

LostMyLogin

3 months ago

2 replies

To be fair, I’m freaking out now because I swear there used to be a yellow seahorse emoji.

3 months ago

I learned there really is a mermaid/merman/merperson emoji and now I just want to know why.

Melatonic

3 months ago

Someone needs to create one for comedy purposes and start distributing it as a very lightweight small gif with transparency

When I first heard this however I imagined it as brown colored (and not the simpler yellow style)

hexagonwin

3 months ago

404 for me, maybe try archive.is?

3 months ago

For an intuitive explanation see https://news.ycombinator.com/item?id=45487510. For a more precise (but still intuitive) explanation, see my response to that comment.

mkagenius

3 months ago

2 replies

> So it's not really hallucinating - it correctly represents "seahorse emoji" internally, but that concept has no corresponding token. lm_head just picks the closest thing and the model doesn't realize until too late.

Isn't that classic hallucination? Making up something like a plausible truth.

ben_w

3 months ago

2 replies

Except they know it's wrong as soon as they say it and keep trying and trying again to correct themselves.

If normal hallucination is being confidently wrong, this is like a stage hypnotist getting someone to forget the number 4 and then count their fingers.

https://www.youtube.com/watch?v=0e2kaQqxmQ0&t=279s

3 months ago

3 replies

Arguably it's "hallucinating" at the point where it says "Yes, it exists". If hallucination => weights statistically indicating that something is probably true when it's not. Since everything about LLMs can be thought of as compressed, probability based database (at least to me). You take the whole truth of the World and compress all its facts in probabilities. Some truthness gets lost in the compression process. Hallucination is the truthness that gets lost since you don't have storage to store absolutely all World information with 100% accuracy.

In this case:

1. Statistically weights stored indicate Seahorse emoji is quite certain to exist. Through training data it has probably things like Emoji + Seahorse -> 99% probability through various channels. Either it has existed on some other platform, or people have talked about it enough, or Seahorse is something that you would expect to exist due to some other attributes/characteristics of it. There's 4k emojis, but storing all of 4k emojis takes a lot of space, it would be easier to store this information in such a way where you'd rather define it by attributes on how likely humankind would have developed a certain emoji, what is the demand for certain type of emoji, and seahorse seems like something that would be done within first 1000 of these. Perhaps it's anomaly in the sense that it's something that humans would have expected to statistically develop early, but for some reason skipped or went unnoticed.

2. Tokens that follow should be "Yes, it exists"

3. It should output the emoji to show it exists, but since there's no correct emoji, it will have best answers that are as close to it in meaning, e.g. just horse, or something related to sea etc. It will output that since the previous tokens indicate it was supposed to output something.

4. The next token that is generated will have context that it previously said the emoji should exist, but the token output is a horse emoji instead, which doesn't make sense.

5. Here it goes into this tirade.
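Step 3 can be sketched with a toy vocabulary. The three-dimensional vectors and the token names below are invented for illustration; a real output layer works over thousands of dimensions and the whole vocabulary.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Made-up 3-dimensional output rows: (horse-ness, sea-ness, dragon-ness).
vocab = {
    "horse_emoji": [1.0, 0.0, 0.0],
    "fish_emoji": [0.0, 1.0, 0.0],
    "dragon_emoji": [0.0, 0.0, 1.0],
}

# The concept the model "wants" to emit; no vocabulary row matches it exactly.
seahorse_concept = [0.9, 0.7, 0.1]

scores = {token: dot(vector, seahorse_concept) for token, vector in vocab.items()}
picked = max(scores, key=scores.get)
print(picked, scores)
```

With these made-up numbers the horse emoji wins. There is no "nothing fits" option: something has to be emitted, so the closest existing token is.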

But I really dislike thinking of this as "hallucinating", because hallucination to me is sensory processing error. This is more like non perfect memory recall (like people remembering facts slightly incorrectly etc). Whatever happens when people are supposed to tell something detailed about something that happened in their life and they are trained to not say "I don't remember for sure".

What did you eat for lunch 5 weeks ago on Wednesday?

You are rewarded for saying "I ate chicken with rice", but not "I don't remember right now for sure, but I frequently eat chicken with rice during mid week, so probably chicken with rice."

You are not hallucinating, you are just getting brownie points for concise, confident answers if they cross over certain likelihood to be true. Because maybe you eat chicken with rice 99%+ of Wednesdays.

When asked about capital of France, you surely will sound dumb if you were to say "I'm not really sure, but I've been trained to associate Paris really, really close to being capital of France."

"Hallucination" happens on the sweet spot where the statistical threshold seems as if it should be obvious truth, but in some cases there's overlap of obvious truth vs something that seems like obvious truth, but is actually not.

Some have rather called it "Confabulation", but I think that is also not 100% accurate, since confabulation seems a more strict memory malfunction. I think the most accurate thing is that it is a probability based database where output has been rewarded to sound as intelligent as possible. Same type of thing will happen in job interviews, group meetings, high pressure social situations where people think they have to sound confident. People will bluff that they know something, but sometimes making probability based guesses underneath.

Confabulation rather seems like that there was some clear error in how data was stored or how the pathway got messed up. But this is probability based bluffing, because you get rewarded for confident answers.

jjcob

3 months ago

1 reply

When I ask ChatGPT how to solve a tricky coding problem, it occasionally invents APIs that sound plausible but don't exist. I think that is what people mean when they talk about hallucinating. When you tell the model that the API doesn't exist, it apologises and tries again.

I think this is the same thing that is happening with the sea horse. The only difference is that the model detects the incorrect encoding on its own, so it starts trying to correct itself without you complaining first.

nomel

3 months ago

Neat demonstration of simple self awareness.

Melatonic

3 months ago

Associating the capital of France with a niche emoji doesn't seem similar at all - France is a huge, powerful country and a commonly spoken language.

Would anyone really think you sounded dumb for saying "I am not really sure - I think there is a seahorse emoji but it's not commonly used" ?

DonHopkins

3 months ago

>"Yes, it exists"

AAAAAAUUUGH!!!!!! (covers ears)

Jensson

3 months ago

2 replies

> Except they know it's wrong as soon as they say it and keep trying and trying again to correct themselves.

But it doesn't realize that it can't write it, because it can't learn from this experience as it doesn't have introspection the way humans do. A human who can no longer move their finger won't say "here, I can move my finger: " over and over and never learn he can't move it now; after a few times he will figure out he no longer can do that.

I feel this sort of self reflection is necessary to be able to match human level intelligence.

jodrellblank

3 months ago

Humans do that, you need to read some Oliver Sacks, such as hemispheric blindness or people who don’t accept that one of their arms is their arm and think it’s someone else’s arm, or phantom limbs where missing limbs still hurt.

ben_w

3 months ago

> because it can't learn from this experience as it doesn't have introspection the way humans do.

A frozen version number doesn't; what happens between versions certainly includes learning from user feedback on the responses as well as from the chat transcripts themselves.

Until we know how human introspection works, I'd only say Transformers probably do all their things differently than we do.

> A human who can no longer move their finger wont say "here, I can move my finger: " over and over and never learn he can't move it now, after a few times he will figure out he no longer can do that.

Humans are (like other mammals) a mess: https://en.wikipedia.org/wiki/Phantom_limb

nathias

3 months ago

1 reply

more like an artefact of the inability to lie than a hallucination

dotancohen

3 months ago

2 replies

No analogy needed. It's actually because "Yes it exists" is a linguistically valid sentence and each word is statistically likely to follow the former word.

LLMs produce linguistically valid texts, not factually correct texts. They are probability functions, not librarians.

3 months ago

1 reply

Those are not two different things. A transistor is a probability function but we do pretty well pretending it's discrete.

dotancohen

3 months ago

1 reply

Transistors at the quantum level are probability functions just like everything else is. And just like everything else, at the macro level the overall behavior follows a predictable known pattern.

LLMs have nondeterministic properties intrinsic to their macro behaviour. If you've ever tweaked the "temperature" of an LLM, that's what you are tweaking.

3 months ago

Temperature is a property of the sampler, which isn't strictly speaking part of the LLM, though they co-evolve.
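A minimal sketch of that split, assuming the model's job ends at producing logits. The logits are made up and `softmax_with_temperature` is a hypothetical helper, not any library's API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The logits never change between runs. Only the step that turns them into a distribution does, which is why temperature is usually described as a sampler setting.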

LLMs are afaik usually evaluated nondeterministically because they're floating point and nobody wants to bother perfectly synchronizing the order of operations, but you can do that.
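The order-of-operations issue shows up even with three numbers. A small sketch:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one summation order
right = a + (b + c)  # same numbers, different order

print(left == right)
print(left, right)
```

Floating-point addition is not associative, so a parallel reduction that sums in a different order from run to run can produce slightly different logits unless the order is pinned down.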

Or you can do the opposite: https://github.com/EGjoni/DRUGS

nathias

3 months ago

this was no analogy, it really can't lie...

3 months ago

1 reply

I would have thought that the cause is that it statistically has been trained that something like seahorse emoji should exist, so it does the tokens to say "Yes it exists, ..." but when it gets to outputting the token, the emoji does not exist, but it must output something and it outputs statistically closest match. Then the next token that is output has the context of it being wrong and it will go into this loop.

thomasahle

3 months ago

1 reply

You are describing the same thing, but at different levels of explanation. Llamasushi's explanation is "mechanistic / representational", while yours is "behavioral / statistical".

If we have a pipeline: `training => internal representation => behavior`, your explanation argues that the given training setup would always result in this behavior, no matter the internal representation. Llamasushi explains how the concrete learned representation leads to this behavior.

3 months ago

I guess what do we mean by internal representation?

I would think due to training data it's stored the likelihood of certain thing to be as emoji as something like:

1. how appealing seahorses are to humans in general - it would learn this sentiment through massive amount of texts.

2. it would learn through massive amount of texts that emojis -> mostly very appealing things to humans.

3. to some more obvious emojis it might have learned that this one is for sure there, but it couldn't store that info for all 4,000 emojis.

4. to many emojis whether it exists it has the shortcut logic to: how appealing the concept is, vs how frequently something as appealing is represented as emoji. Seahorse perhaps hits 99.9% likelihood there due to strong appeal. In 99.9% of such cases the LLM would be right to answer "Yes, it ...", but there's always going to be 1 out of 1,000 cases where it's wrong.

With this compression it's able to answer 999 times out of 1000 correctly "Yes, it exists ...".

It could be more accurate if it said "Seahorse would have a lot of appeal for people so it's very likely it exists as emoji since emojis are usually made for very high appeal concepts first, but I know nothing for 100%, so it could be it was never made".

But 999 cases, "Yes it exists..." is a more straightforward and appreciated answer. The one time it's wrong, is going to take away less brownie points than 999 short confident answers give over the 1000 technically accurate but non confident answers.

But even the above sentence might not be the full truth. Since it might not be correct about truly why it has associated seahorse to be so likely to exist. It would just be speculating on it. So maybe it would be more accurate "I expect seahorse emoji to likely exist, maybe because of how appealing it is to people and how emojis usually are about appealing things".
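The brownie-point trade-off can be put as a back-of-the-envelope calculation. Only the 999-in-1000 figure comes from the argument above. The reward and penalty values are pure assumptions, and `expected_score` is a hypothetical helper.

```python
def expected_score(p_correct, reward_right, penalty_wrong):
    """Average score per answer: reward when right, penalty when wrong."""
    return p_correct * reward_right - (1 - p_correct) * penalty_wrong


# Assumed values: a confident answer earns 1.0 when right and loses 5.0 when wrong;
# a hedged answer is never wrong but is only worth 0.8 to the rater.
confident = expected_score(0.999, reward_right=1.0, penalty_wrong=5.0)
hedged = expected_score(1.0, reward_right=0.8, penalty_wrong=0.0)

print(f"confident: {confident:.3f}, hedged: {hedged:.3f}")
```

Under these assumptions the confident style scores higher on average, even with a heavy penalty for the one miss. The balance would only flip if wrong answers were punished far more severely than this.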

derefr

3 months ago

1 reply

> So it's not really hallucinating - it correctly represents "seahorse emoji" internally, but that concept has no corresponding token.

I wonder if the human brain (and specifically the striated neocortical parts, which do seemingly work kind of like a feed-forward NN) also runs into this problem when attempting to process concepts to form speech.

Presumably, since we don't observe people saying "near but actually totally incorrect" words in practice, that means that we humans may have some kind of filter in our concept-to-mental-utterance transformation path that LLMs don't. Something that can say "yes, layer N, I know you think the output should be O; but when auto-encoding X back to layer N-1, layer N-1 doesn't think O' has anything to do with what it was trying to say when it gave you the input I, so that output is vetoed. Try again."

A question for anyone here who is multilingual, speaking at least one second language with full grammatical fluency but with holes in your vocabulary vs your native language: when you go to say something in your non-native language, and one of the word-concepts you want to evoke is one you have a word for in your native language, but have never learned the word for in the non-native language... do you ever feel like there is a "maybe word" for the idea in your non-native language "on the tip of your tongue", but that you can't quite bring to conscious awareness?

https://en.wikipedia.org/wiki/Paraphasia#Verbal_paraphasia

3 months ago

1 reply

> Presumably, since we don't observe people saying "near but actually totally incorrect" words in practice

> do you ever feel like there is a "maybe word" for the idea in your non-native language "on the tip of your tongue", but that you can't quite bring to conscious awareness?

Sure, that happens all the time. Well, if you include the conscious awareness that you don't know every word in the language.

For Japanese you can cheat by either speaking like a child or by just saying English words with Japanese phonetics and this often works - at least, if you look foreign. I understand this is the plot of the average Dogen video on YouTube.

It's much more common to not know how to structure a sentence grammatically and if that happens I can't even figure out how to say it.

derefr

3 months ago

Huh, neat; I knew about aphasia (and specifically anomic aphasia) but had never heard of paraphasia.

SavioMak

3 months ago

I would argue it is hallucinating, starting at when the model outputs "Yes".

madeofpalk

3 months ago

To me this feels much more like a hallucination than how that phrase has been popularly misused in LLM discussions.

Lammy

3 months ago

> So it's not really hallucinating - it correctly represents "seahorse emoji" internally, but that concept has no corresponding token.

Interesting that a lot of humans seem to have this going on too:

- https://old.reddit.com/r/MandelaEffect/comments/1g08o8u/seah...

- https://old.reddit.com/r/Retconned/comments/1di3a1m/does_any...

What does the LLM have to say about “Objects in mirror may be closer than they appear”? Not “Objects in mirror are closer than they appear”.

Xmd5a

3 months ago

And what can it mean when a slip of the tongue, a failed action, a blunder from the psychopathology of everyday life is repeated at least three times in the same five minutes? I don’t know why I tell you this, since it’s an example in which I reveal one of my patients. Not long ago, in fact, one of my patients — for five minutes, each time correcting himself and laughing, though it left him completely indifferent — called his mother “my wife.” “She’s not my wife,” he said (because my wife, etc.), and he went on for five minutes, repeating it some twenty times.

In what sense was that utterance a failure? — while I keep insisting that it is precisely a successful utterance. And it is so because his mother was, in a way, his wife. He called her as he ought to.

---

I must apologize for returning to such a basic point. Yet, since I am faced with objections as weighty as this one — and from qualified authorities, linguists no less — that my use of linguistics is said to be merely metaphorical, I must respond, whatever the circumstances.

I do so this morning because I expected to encounter a more challenging spirit here.

Can I, with any decency, say that I know? Know what, precisely? [...]

If I know where I stand, I must also confess [...] that I do not know what I am saying. In other words, what I know is exactly what I cannot say. That is the moment when Freud makes his entrance, with his introduction of the unconscious.

For the unconscious means nothing if not this: that whatever I say, and from whatever position I speak — even when I hold that position firmly — I do not know what I am saying. None of the discourses, as I defined them last year, offer the slightest hope that anyone might truly know what they are saying.

Even though I do not know what I am saying, I know at least that I do not know it — and I am far from being the first to speak under such conditions; such speech has been heard before. I maintain that the cause of this is to be sought in language itself, and nowhere else.

What I add to Freud — though it is already present in him, for whatever he uncovers of the unconscious is always made of the very substance of language — is this: the unconscious is structured like a language. Which language? That, I leave for you to determine.

Whether I speak in French or in Chinese, it would make no difference — or so I would wish. It is all too clear that what I am stirring up, on a certain level, provokes bitterness, especially among linguists. That alone suggests much about the current state of the university, whose position is made only too evident in the curious hybrid that linguistics has become.

That I should be denounced, my God, is of little consequence. That I am not debated — that too is hardly surprising, since it is not within the bounds of any university-defined domain that I take my stand, or can take it.

— Jacques Lacan, Seminar XVIII: Of a Discourse That Would Not Be of Pretence

matheusd

3 months ago

> Explains why RL helps. Base models never see their own outputs so they can't learn "this concept exists but I can't actually say it."

Say "Neuromancer" to the statue, that should set it free.

sharperguy

3 months ago

Reminds me of in the show "The Good Place", in the afterlife they are not able to utter expletives, and so when they try to swear, a replacement word comes out of their mouth instead, leading to the line "Somebody royally forked up. Forked up. Why can't I say fork?"

luxuryballs

3 months ago

that’s probably a decent description of how the Mandela effect works in people’s brains, despite the difference in mechanism

3 months ago

1 reply

So what's at loggerheads here is:

* The LLM has a strong and deep-rooted belief in its knowledge (that a seahorse emoji exists).

* It attempts to express that concept using language (including emojis) but the language is so poor and inaccurate at expressing the concept that as it speaks it keeps attempting to repair.

* It is trained to speak until it has achieved some threshold at correctly expressing itself so it just keeps babbling until the max token threshold triggers.
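
The three points above can be sketched as a toy generation loop. Everything here is an assumption for illustration: `make_model`, `generate`, and the `TARGET` stand-in are hypothetical, and a real LLM's stopping behaviour is learned rather than hard-coded like this.

```python
EOS = "<EOS>"
TARGET = "🎯"  # stand-in for "the token the model actually wants to say"

def make_model(vocab_has_target):
    """Build a toy next-token policy (hypothetical, not a real model)."""
    def next_token(context):
        if context[-1] == TARGET:      # the intended token was emitted: done
            return EOS
        if vocab_has_target:
            return TARGET
        # Only near misses are available, so it keeps trying to repair.
        return "🐴" if len(context) % 2 else "🐟"
    return next_token

def generate(next_token, prompt, max_tokens=6):
    """Autoregressive loop: stop on EOS or when the token budget runs out."""
    out = list(prompt)
    for _ in range(max_tokens):
        tok = next_token(out)
        if tok == EOS:
            return out, "stopped on EOS"
        out.append(tok)
    return out, "hit max_tokens"

print(generate(make_model(True), ["Yes:"]))
print(generate(make_model(False), ["Yes:"]))
```

When the intended token exists, the loop ends after one step; when it doesn't, the only exit in this sketch is the `max_tokens` cap.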

3 months ago

3 replies

This is too metaphorical, but, still, basically correct. Nice to see that.

Essentially, in the latent / embedding / semantic space, "seahorse emoji" is something that is highly probable. Actually, more accurately, since LLMs aren't actually statistical or probabilistic in any serious sense, "seahorse emoji", after tokenization and embedding, is very close to the learned manifold, and other semantic embeddings involving related emoji are very close to this "seahorse emoji" tokenization embedding.

An LLM has to work from this "seahorse emoji" tokenization embedding position, but can only make outputs through the tokenizer, which can't accurately encode "seahorse emoji" in the first place. So, you get a bunch of outputs that are semantically closest to (but still far from) a (theoretical) seahorse emoji. Then, on recursive application, since these outputs are now far enough from the sort of root / foundational position on the manifold, the algorithm probably is doing something like an equivalent of a random walk on the manifold, staying close to wherever "seahorse emoji" landed, but never really converging, because the tokenization ensures that you can never really land back "close enough" to the base position.

I.e. IMO this is not as much a problem with (fixed) tokenization of the inputs, but moreso that tokenization of the outputs is fixed.
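
A minimal sketch of the "closest available output" idea, assuming a made-up 3-d embedding space and a three-emoji vocabulary. The vectors, the `decode` helper, and the use of cosine similarity are illustrative assumptions, not how any particular model's output layer actually works.

```python
import math

# Hypothetical embeddings; the axes loosely mean (aquatic, horse-like, magical).
VOCAB = {
    "🐟": (1.0, 0.0, 0.0),
    "🐴": (0.0, 1.0, 0.0),
    "🦄": (0.0, 0.7, 0.7),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def decode(hidden):
    """Return the vocabulary token whose vector is closest to `hidden`."""
    return max(VOCAB, key=lambda tok: cosine(hidden, VOCAB[tok]))

# A made-up internal state for "seahorse emoji": no vocabulary entry matches it.
seahorse = (0.6, 0.8, 0.3)
best = decode(seahorse)
print(best, round(cosine(seahorse, VOCAB[best]), 3))
```

The decoder always returns something, but the best match is still well short of a similarity of 1.0, which is the "closest to, but still far from" situation described above.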

3 months ago

1 reply

You're missing one key point, which is what makes this failure mode unusual.

Namely, that there is (incorrect) knowledge in the training data that "seahorse emoji" exists.

So when prompted: "Does [thing you strongly believe exists] exist?" the LLM must answer: "Yes, ..."

(The second nuance is that the LLM is strongly encouraged to explain its answers so it receives a lower score just by saying only "Yes.")

But I and probably others appreciate your more detailed description of how it enters a repair loop, thank you.

[edit: I disagree that LLMs are not statistical or probabilistic, but I'm not sure this is worth discussing.]

[edit 2: Google no longer tells me how many web pages a term returns, but "seahorse emoji" and "lime emoji" (quoted) both return over ten pages of results. The point being that those are both 'likely' terms for an LLM, but only the former is a likely continuation of "Does X exist? Yes, ...".]

3 months ago

1 reply

You're right, seahorse emoji is almost certainly in the training data, so we should amend my explanation to say that "seahorse emoji" is not just close to the training manifold, but almost certainly right smack on it. The rest of what I said would still apply, and my explanation would also apply to where other commenters note that this behaviour is emitted to some degree with similar other "plausible" but non-existent emoji (but which are less likely to be in the training data, a priori). EDIT FOR THIS PARAGRAPH ONLY: Technically, on reflection, since all fitting methods employ regularization methods, it is still in fact unlikely the fitted manifold passes exactly through all / most training data points, and saying that "seahorse emoji" is "very close" to the training manifold is still actually technically probably most accurate here.

You're also right that it is a long discussion to say to what extent LLMs are statistical or probabilistic, but, I would maybe briefly say that if one looks into issues like calibration, conformal prediction, and Bayesian neural nets, it is clear most LLMs that people are talking about today are not really statistical in any serious sense (softmax values are scores, not probabilities, and nothing about pre-training or tuning typically involves calibration—or even estimation—in LLMs).
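
One way to see what the "scores, not probabilities" claim is pointing at: softmax outputs always sum to 1, but the same logits give very different numbers depending on a temperature knob, so nothing in the function itself ties them to observed frequencies. A small sketch with made-up logits; whether this settles the statistical-or-not question is a separate argument.

```python
import math

def softmax(logits, temperature=1.0):
    """Normalize raw scores into positive values that sum to 1."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```

Calibration would be a separate step that checks these numbers against how often the top choice is actually right.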

Yes, you can use statistics to (help) explain the behaviour of deep models or certain layers (usually making assumptions that are of dubious relevance to actual practice), but geometric analogies, regularization methods, and matrix conditioning intuitions are what have clearly guided almost all major deep learning advances, with statistical language and theory largely being post-hoc, hand-wavey, and (IMO) for the purpose of publication / marketing. I really think we could de-mystify a huge amount of deep learning if we were just honest it was mostly fancy curve fitting with some intuitive tricks for smoothing and regularization that clearly worked long before any rigorous statistical justification (or which still clearly work in complicated ways, despite such an absence of statistical understanding; e.g. dropout, norm layers, the attention layer itself, and etc).

Just, it gets complicated when you get into diffusion models and certain other specific models that are in fact more explicitly driven by e.g. stochastic differential equations and the like.

3 months ago

1 reply

"my explanation would also to apply to where other commenters note that this behaviour is emitted to some degree with similar other "plausible" but non-existent emoji (but which are less likely to be in the training data, a priori)."

I agree with you partially. I just want to argue there are several factors that lead to this perverse behavior.

Empirically:

Use web gpt-5-instant in TEMPORARY mode. If you ask for "igloo emoji" it confidently (but ONLY in temporary mode) says that "Yes, igloo emoji is in Unicode 12 and is [house-emoji ice-emoji]." Then it basically stops. But it has satisfied its condition of confidently expressing its false knowledge. (Igloo emoji doesn't exist. gpt-5-instant in non-temporary mode says no. This is also weird because it suggests the temporary mode system prompt is laxer or different.)

The mechanism you describe partially explains why "seahorse emoji" leads to babbling: as it outputs the next token, it realizes that the explanation would be worse off if it emitted a stop token next, so instead it apologizes and attempts to repair. And it cannot satisfy its condition of expressing something confidently.

The upstream failure is poor knowledge. That combined with being tuned to be helpful and explanatory, and having no grounding (e.g. websearch) forces it to continue. Finally, the token distance from the manifold is the final piece of the puzzle in this unholy pathological brew.

You're incorrect that statistical language modeling is "post-hoc", it's rather "pre-hoc" / "pre-hack". Most foundational works in language modeling started as pure statistical models (for example, classic ngram models and Bengio's original neural language model from 2003), and it was later that hacks got introduced that removed statistical properties but actually just worked (Collobert and Weston 2008, as influenced by Bottou and LeCun). Where I agree with you is that we should have done away with the statistical story long ago. LeCun's been on about energy-based models forever. Even on HN last week, punters criticize him that JEPA hasn't had impact yet, as if he were behind the curve instead of way ahead of it.

People like statistical stories but, similarly to you, I also think they are a distraction.

3 months ago

1 reply

Right, I kind of suspect we don't really disagree on anything too fundamental here re: the looping behaviour (or statistics, actually). E.g. when I said earlier:

>> "the algorithm probably is doing something like an equivalent of a random walk on the manifold, staying close to wherever 'seahorse emoji' landed, but never really converging, because the tokenization ensures that you can never really land back 'close enough' to the base position"

"converging" is deeply under-specified. Of course, we mean that a stop or <EOS> token of some kind is generated, and this happens when the generated sequence up to that stop token has some low enough score / loss. When I say "you can never really land back 'close enough' to the base position", this is really that the output tokenization is lossy enough that this threshold is never reached, since, when recursing, we keep getting weird output tokens contaminating the sequence, so that we don't get close enough to the original "seahorse emoji" embedding, and so prevent the score / loss from getting small enough. In your language, the model "cannot satisfy its condition of expressing something confidently".

The way you present your timelines, I think we basically actually are in agreement re: statistics. Yes, if you go back far enough, statistics did indeed guide model development and successes (and still does in some narrow cases). But, also yes, as soon as you get into "modern" neural nets that actually make huge progress on things like MNIST, CIFAR, and language modeling, yeah, we are way, way past statistical intuitions being necessary or superior to intuitions based on curve fitting and smoothing / gradient conditioning and the like.

For dating this shift, I was personally thinking of something like the Hinton dropout paper, which I checked was around 2012 (my work has been more in computer vision), but, yeah, about 2008, as you say, also seems close enough if you consider NLP.

Really appreciate your comments here. EDIT: and yes, energy models are the bomb.

3 months ago

1 reply

Yeah, overall I think we agree.

If you want to read some mind blowing early neural language sequence modeling approaches that everyone completely slept on, look at Pollack's work on "recursive auto-associative memory" (RAAM) and Sperduti's later labeled RAAM (LRAAM) work. Both from the early 90s. Didn't have a probabilistic interpretation IIRC.

Yoshua was always sort of agnostic about probabilistic approaches and used them when they made sense. 50% of his work included them, and other work, like his early deep vision papers, motivated the use of deep models purely in terms of circuit theory and compactness / model complexity.

Collobert and Weston taught us we could train Yoshua's NLM models much much faster using negative sampling and a hinge loss, thus dropping the probabilistic story entirely.
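
For anyone who hasn't seen it, the negative-sampling-plus-hinge-loss idea looks roughly like this: score a real text window, score a corrupted copy, and only penalize the model when the real one doesn't win by a margin. This is a sketch from memory, not the paper's code; the bigram-counting `score` function is a hypothetical stand-in for the neural scorer.

```python
import random

def hinge_rank_loss(score_true, score_corrupt, margin=1.0):
    """Zero once the real window outscores the corrupted one by `margin`."""
    return max(0.0, margin - score_true + score_corrupt)

def corrupt(window, vocab, rng):
    """Negative sample: swap the middle word for a random vocabulary word."""
    mid = len(window) // 2
    return window[:mid] + [rng.choice(vocab)] + window[mid + 1:]

# Hypothetical scorer standing in for the neural net: counts known bigrams.
KNOWN_BIGRAMS = {("the", "cat"), ("cat", "sat"), ("sat", "down")}

def score(window):
    return float(sum((a, b) in KNOWN_BIGRAMS for a, b in zip(window, window[1:])))

rng = random.Random(0)
vocab = ["the", "cat", "sat", "down", "emoji", "seahorse"]
real = ["the", "cat", "sat"]
fake = corrupt(real, vocab, rng)
print(fake, hinge_rank_loss(score(real), score(fake)))
```

Note that there is no normalization over the vocabulary anywhere in the loss, which is what "dropping the probabilistic story" amounts to here. The random replacement can occasionally pick the original word; this sketch doesn't bother filtering that case.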

I suspect the historical reason is that the NLP community only started adopting statistical methods very broadly in the mid 2000s (i.e. grad students began to be more likely to use them than not, which hadn't been true historically, when linguistics rather than stats drove many intuitions and using a CRF felt sort of next-level). So once everyone got comfortable with stats as table stakes, they felt a sort of whiplash at having to stop approaching things through this lens.

3 months ago

I would also broadly agree that the overuse of statistical language and explanations is probably more driven by historical trends in NLP. I was always more interested in computer vision (including segmentation) and even deep regression. Especially in the case of deep regression, with the absence of a softmax and the ease of constructing task-specific custom loss functions (or, like you say, the hinge loss example), it always seemed pretty clear to me that none of this was ever really particularly statistical in the first place.

I will definitely check out those RAAM and LRAAM papers, thanks for the references. You definitely seem to have a more rich historical knowledge than I do on these topics.

kqr

3 months ago

1 reply

But wait, if the problem is the final tokenisation, what would happen if we stopped it one or two layers before the final layer? I get that the result would not be as readable to a human as the final layer, but would it not be as confused with its own output anymore?

Or would it still be a problem because we're collapsing a distribution of likely responses down to a single response, and it's not happy with that single response even if it is fuzzier than what comes out of the last layer?

https://www.reddit.com/r/MandelaEffect/comments/1g08o8u/seah...

3 months ago

It's not so clear how one could use the output of an embedding layer recursively, so it is a bit ill-defined to know what you mean by "stopped it" and "confused with its own output" here. You are mixing metaphor and math, so your question ends up being unclear.

Yes, the outputs from a layer one or two layers before the final layer would be a continuous embedding of sorts, and not as lossy (compared to the discretized tokenization) at representing the meaning of the input sequence. But you can't "stop" here in a recursive LLM in any practical sense.

mh-

3 months ago

This explanation was very understandable, thank you for taking the time to write it.

psygn89

3 months ago

1 reply

I thought there was a yellow/pink seahorse emoji already but I guess not.

neom

3 months ago

1 reply

(Edit: There is another long thread that contains an image that I thought was the seahorse emoji (although apparently the seahorse emoji doesn't exist...but i thought this was it so I don't know what is going on...) https://www.reddit.com/r/Retconned/comments/1di3a1m/comment/...)

mh-

3 months ago

1 reply

The yellow one is exactly what I pictured. This is pretty surreal for me, because it's the first time one of these Mandela Effect things applied to me personally.

Memory is a tricky thing.

sparkie

3 months ago

1 reply

Are we certain that it didn't exist though? Unicode only got emoji in 2014, after ~5 year standardization effort. There were many different, incompatible formats around for about a decade before that, plus non-emoji like Kaoani.

Perhaps there was a seahorse somewhere that never made it to Unicode.

Or maybe people are just misremembering - perhaps mistaking the emoji for a unicorn or a chess's knight piece as a seahorse.

Melatonic

3 months ago

Supposedly it was in MSN Messenger and Skype previous to Unicode Emojis.

porphyra

3 months ago

2 replies

I always felt like tokenization is one of those double edged swords where it makes some stuff amazingly easier but gets tripped up on the weirdest bugs. The number of "r"s in "strawberry" being another well-known quirk.

historian1066

3 months ago

1 reply

Strawberry used to be spelled strawbery. Is it definitive that the 3 rs problem wasn’t caused by the prior spelling?

typpilol

3 months ago

1 reply

Yes, because it'll output "strawberry" correctly and still state the wrong count.

https://chatgpt.com/c/68e44360-ba40-8329-816f-b4cd74445d79

3 months ago

GPT-5 Instant does the opposite. It says the right number but prints the word wrong.

jsrozner

3 months ago

1 reply

This is not a tokenization issue. Models do not, in general, "know" what they know.

immibis

3 months ago

2 replies

It's a tokenization issue because there can't be a circuit to count letters because the same letters are represented in myriad different ways because of tokenization.

Jensson

3 months ago

1 reply

You are wrong, there can be a circuit to count letters because it can easily normalize them internally, as we know it can transform text to base64 just fine. So there is no reason there can't be a circuit to count letters.

The training is just too dumb to create such a circuit even with all that massive data input, but it's super easy for a human to make such a neural net with those input tokens. It's just a kind of problem that transformers are exceedingly bad at solving, so they don't learn it very well even though it's a very simple computation for them to do.
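
To make the "normalize internally, then count" argument concrete, here is a toy version. The token IDs and the split of "strawberry" into three pieces are made up for illustration; real tokenizers may split the word differently.

```python
# Hypothetical vocabulary: the model sees the integer IDs, not the letters.
VOCAB = {101: "str", 202: "aw", 303: "berry"}
IDS = [101, 202, 303]  # assumed encoding of "strawberry"

def count_letter(token_ids, letter):
    """Map each token ID back to its spelling, then count the letter."""
    return sum(VOCAB[i].count(letter) for i in token_ids)

print("".join(VOCAB[i] for i in IDS), count_letter(IDS, "r"))
```

The computation is a per-token lookup plus a sum, so nothing about tokenization makes it impossible. The open question in this subthread is whether training actually produces that lookup.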

https://g.co/gemini/share/c244e5f51e37

3 months ago

Transformers have a limited computation budget related to the size of the context, so they can get better at math the longer the conversation is.

jsrozner

3 months ago

I was saying that the seahorse emoji failure is not a tokenization issue. If you ask an LLM to do research, you will sometimes get hallucinated articles -- potentially plausible articles that, if they existed, would have been embedded at the position from which the model tried to decode. This is what we see happening with the seahorse emoji. The model identifies where the seahorse emoji would have been embedded if it existed and then decodes from that position.

In the research case you get articles that were never written. In the seahorse case later layers hallucinate the seahorse emoji, but in the final decoding step, output gets mapped onto another nearby emoji.

Admittedly, in one way the seahorse example is different from the research case. Article titles, since they use normal characters, can be produced whether they exist or not (e.g., "This is a fake hallucinated article" gets produced just as easily as "A real article title"). It's actually nice that the model can't produce the seahorse emoji since it gets forced (by tokens, yes) to decode back into reality.

Yes, tokenization affects how the hallucination manifests, but the underlying problem is not a tokenization one.

Mistletoe

3 months ago

1 reply

Gemini 2.5 flash seems to nail it.

wavemode

3 months ago

1 reply

It appears to have used a web search to come up with that (correct) response.

Mistletoe

3 months ago

This may be why I like the Google Gemini app a lot. At least it knows when to just do a Google search instead of hallucinate?

joegibbs

3 months ago

2 replies

You'll also notice the same thing happens for other non-existent emojis that sound like they should exist: dragonflies, lemurs, possums, blackberries - even Claude 4.5 will start off by saying "Yes!" and then correcting itself. It will immediately give the right answer for very specific things that you wouldn't expect to get their own emojis though.

wonnage

3 months ago

Wow, dragonfly terminated relatively quickly for Claude but sent ChatGPT into an infinite loop that was even worse than seahorse

cubefox

3 months ago

Which proves that the popular "Mandela effect" explanation is false. People just jump to plausible sounding explanations instead of saying: This might be it, but really I don't know.

zten

3 months ago

6 replies

I realized if someone were to assign me the ticket for fixing this behavior, I would have no idea where to begin with solving it even with this blog post explaining the problem, so I'm very curious to know what the most practical solution is. (They obviously aren't adding "If someone asks you about a seahorse emoji, there isn't one available yet, no matter how strongly you believe one exists." to the system prompt.)

maxbond

3 months ago

2 replies

Petition the Unicode consortium to include a seahorse emoji.

Sharlin

3 months ago

If a LLM is wrong about facts, the facts have to change.

beeflet

3 months ago

It takes a strong man to deny what's right in front of him, and if the truth is undeniable... you create your own

Gigachad

3 months ago

1 reply

I bet they probably are adding that to the system prompt at least in the short term while people are paying attention before looking for a longer term answer.

The system prompts I've seen are absolutely massive.

serced

3 months ago

1 reply

I find it interesting that their blog post on prompt/context engineering kind of stands against their ultra long system prompt. Maybe it is not too specific as in their visual example (too specific - just right - too vague). https://www.anthropic.com/engineering/effective-context-engi... and the system prompt https://docs.claude.com/en/release-notes/system-prompts#sept...

3 months ago

> This attention scarcity stems from architectural constraints of LLMs. LLMs are based on the transformer architecture, which enables every token to attend to every other token across the entire context. This results in n² pairwise relationships for n tokens.

The n² time complexity smells like it could be reduced by algorithm engineering. Maybe doing a preprocessing pass to filter out attending to tokens (not sure what the right term of art is here) that do not contribute significantly to the meaning of the input. Basically some sort of context compression mechanism.
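
Some back-of-the-envelope arithmetic for this, as a sketch: it only counts query/key pairs and says nothing about quality. `keep_one_in` and `window` are hypothetical knobs, not parameters of any real library.

```python
def pair_count(n):
    """Full attention: every token attends to every token."""
    return n * n

def pruned_pair_count(n, keep_one_in=10):
    """Each token attends only to a kept subset (1 in `keep_one_in` tokens)."""
    return n * max(1, n // keep_one_in)

def windowed_pair_count(n, window=512):
    """Each token attends to at most `window` tokens."""
    return n * min(n, window)

for n in (1_000, 10_000, 100_000):
    print(n, pair_count(n), pruned_pair_count(n), windowed_pair_count(n))
```

Keeping a fixed fraction of tokens only shrinks the constant, so the count is still quadratic in n. It becomes linear only when the number of attended tokens is capped, as in the windowed variant.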

username332211

3 months ago

2 replies

Couldn't you just add a large number of repetitions of "There is no seahorse emoji." to the training set?

Edit: Come to think of it, training on a Q&A format is probably better - "Is there a seahorse emoji? No, there isn't."

Jaxan

3 months ago

1 reply

If you had to do this for every falsity in the LLM, there wouldn’t be an end to it.

username332211

3 months ago

I agree, but I still suspect OpenAI and other LLM companies do stuff like that, when an example of a hallucination becomes popular.

If I see some example of an LLM saying dumb stuff here, I know it's going to be fixed quickly. If I encounter an example myself and refuse to share it, it may be fixed with a model upgrade in a few years. Or it may still exist.

layman51

3 months ago

Something about how you have to keep repeating "There is no seahorse emoji" or something similar reminded me of the Local 58 horror web series where it seems like the program is trying to get you to repeat "There are no faces" while showing the viewer faces: https://www.youtube.com/watch?v=NZ-vBhGk9F4&t=221

classified

3 months ago

1 reply

> the most practical solution

Maybe there is none, and this is just one example of a fundamental LLM limitation.

TillE

3 months ago

People really really want LLMs to output a highly reliable finished product, and I suspect we're probably never gonna get there. Lots of progress over the past couple years, but not on that.

I think it's much more interesting to focus on use cases which don't require that, where gen AI is an intermediate step, a creator of input (whether for humans or for other programs).

jfyi

3 months ago

"This behavior is a function of the core AI technology we use, we are unable to resolve this issue with a standard software patch or update at this time.

For the time being this issue can be mitigated by not asking about seahorse emoji.

We are closing this support ticket as the issue is an inherent limitation of the underlying technology and not a bug in our specific implementation."