Linear algebra explains why some words are effectively untranslatable
Mood: thoughtful
Sentiment: positive
Category: science
Key topics: linguistics, linear algebra, translation
The article explores how linear algebra can be used to understand why some words are difficult or impossible to translate between languages.
Snapshot generated from the HN discussion
Discussion activity: very active
First comment: 2h after posting
Peak period: 113 comments (Day 1)
Average per period: 39.7 comments
Based on 119 loaded comments
Key moments
- Story posted: 11/14/2025, 2:46:27 PM (4d ago)
- First comment: 11/14/2025, 4:55:34 PM (2h after posting)
- Peak activity: 113 comments in Day 1, the hottest window of the conversation
- Latest activity: 11/18/2025, 11:28:04 AM (22h ago)
If the mere sight of the above is like a punch in the face for you, don't worry.
Almost makes me wonder if it was intentional.
I wasn't aware that that idea was in dispute.
This is also why lawyer speak is so particular. Language is fuzzy in most cases. Only language that relates to discrete physical objects gets closer to the binary state of exactness described in the article.
Happy: Joyful, cheerful, merry, delighted
Or
Beautiful: Lovely, pretty, attractive
The only truly identical synonym I can think of is flammable and inflammable
But what joyful means to you likely differs from what it means to me, simply because we haven’t read the exact same literature and had the same conversations.
I'm just on the thread following this idea: "The article seems to think that a word is untranslateable if there is no single word in the target language"
So we're talking about "translatability" of single words. Mapping multiple words of language X to one word of language Y is going to have some effect on translation.
These examples have just the meanings of a noun + adjective or of a noun + noun in genitive case, where some languages are lazier than others and omit the markers of case or of adjectival derivation from noun, which are needed in more strict languages.
There are also other kinds of compound nouns, where the compound noun does not have the meaning of its component words, but only some related meaning (usually either a pars pro toto meaning or a metaphorical meaning). Those are true compound nouns, not just abbreviated sequences of words from which the grammatical markers have been omitted.
Such compound words were very frequent in Ancient Greek, from where they have been inherited in the scientific and technical language, where they have been used to create names for new things and concepts, e.g. arthropod, television, phonograph, basketball, "bullet train" and so on.
Compound words of this kind are almost never translatable, but they are frequently borrowed from one language to another. During the borrowing process the component words are sometimes translated, but the result is not a translated word; it is a new word added to the destination language.
The example that people often quote from German is “kummerspeck” which would literally translate as “grief bacon”, but means weight you put on through comfort eating having gone through a bereavement or other trauma.
Wouldn't cranberry morphemes be good examples of this type of relationship? I don't know whether, in the eponymous example, the cran- being bound precludes it from being counted as a closed compound word, though.
An English speaker might be willing to accept componoma ("names placed together", Latin) or synthetonoma (also "names placed together", Greek) without breaking stride.
This is false; English loves using compound words. One example of such a compound word is "fire department", which has identical syntax to the German compound "Feuerwehr". Whether a compound word is spelled with or without internal spaces is not a fact about the language, it's a fact about the spelling.
But that's true of any language. Not only that, but English uses loanwords heavily which are often Anglicisations of words from other languages, which may not in themselves be just one word.
"Ho ho ho", the flag-waving Little Englander types say, "Gaelic is such a stupid language, they don't even have a word for 'television', they just say 'television' in a stupid accent!"
But English also has no word for "television". Worse, the word "television" isn't even just a loanword, it's two words from two different languages, "tele" from Greek and "vision" from Latin. What a bodge job! Imagine letting something like that slip through to production use!
The hypothetical Catalan-Hungarian inventor of it in another leg of the trousers of time may have called it llunylátás, and then where would we be?
Well, most languages would have some variant of that word to mean "television", as they do now, I expect.
The English word "galore" (meaning "sufficient" shading towards "more than enough") comes from the Gaelic words "gu leòr", (goo lyaawr, the grave accent above the o makes the vowel sound longer). What a silly language English is, doesn't have a word that means "more than you're ever likely to need", has to steal one from Gaelic and then spell it wrong.
Oh, they use this word "whisky". You know what that means? It means "uisge beatha" but they only say the first word, in a silly accent because they can't pronounce it properly.
Quite often there's no single word for a thing you're trying to translate, but that doesn't mean it's untranslatable. English has only one single word for rain, for example, but Gaelic has about half a dozen, of which the only ones I can reproduce here are "uisge" (that word again), which just means "water", and "fras", which is more like a gentle shower. The rest of the words in the Gaelic of the North-West of Scotland that refer to rainy weather are, of course, profane in the extreme.
But I think most of those words are in use somewhere, for something.
English people will say something like: Germans have a word for everything.
Many of which are just sentences with the spaces removed.
Australians have a lot of those too, or worse: our speech is often nothing but a handful of vowels and a swarm of apostrophes.
VLIW natural language.
(I was addressing the parent's claim of 'But English also has no word for "television"')
You're forgetting about synonyms. The common adage that English has the largest vocabulary stems from the fact that it often has multiple words for the same thing. Sofa, couch. Autumn, fall. Etc etc. Other languages generally don't do this. I've never heard anyone suggest that English has words for more concepts.
This becomes immediately apparent (and relevant) when writing fiction or poetry. At least it does to me.
Non-fiction and spoken English do not highlight the subtleties between these words because using them interchangeably in the same work is considered bad form.
Even if the number of words in a language were finite we wouldn't have a reasonable way of counting them. There are too many kinds of fuzziness involved in deciding what counts as a "word" and you can't ignore the borderline cases because the borderline cases vastly outnumber the straightforward cases.
I think people don't realize how weird language is. Like you could look at Chinese and call each sentence a "word" as there are no spaces. What's the difference between that and a compound word like "nighttime", or the whole German language, where you've got words like Krankenwagen ("patient" + "car")?
Now this doesn't mean there aren't words or phrases that aren't translatable. But the thing is we can always translate the words themselves. What we can't always translate is the meaning behind them. I think the best example of this comes from Star Trek and the Tamarian Language[0,1]. "Sokath, his eyes open!" The problem with communication is not that the words don't translate, it is that the meaning behind them doesn't. Just as people struggle with idioms when learning American English or why someone might be confused about why someone "shit in the milk" or "fucked the dog". Words are an embedding. A compression.
The thing people are constantly forgetting, but which is more important than ever in a globally connected world, is that words are not perfect representations of thoughts. We compress our thoughts into them and hope the person on the other side can decompress them. It is why you can communicate more easily with your close friends, who have better context, than with another person who natively speaks your language, and it is why someone who learns a new language can speak perfectly well but still struggle to communicate. Language is not just words, it is culture[2]. So in a much more connected world today we have these disconnects in culture and thus in the interpretation of what people say. I know every one of you has been told to "speak to your audience", but how do you speak to your audience when your audience is everybody and when you don't know who your audience is? The new paradigm requires us to be much better interpreters than we were before. Otherwise everyone is going to sound crazy, other than those you frequently talk to and have that shared understanding with.
[0] https://memory-alpha.fandom.com/wiki/Tamarian_language
[1] https://www.youtube.com/watch?v=3-wzr74d7TI
[2] This is, btw, why people argue for embodied AI being so critical. Not because LLMs can't appear to grasp the language, but because we as humans have embodied our language so deeply you probably didn't even realize that I used the word "grasp" to refer to an abstract concept and not something you can actually touch with your hand.
In another blog post where he uses "shibui" as an example of an untranslatable word, he says, "Saying shibui like that, in a mere second, conveys what would otherwise make a clunky and unnecessarily long digression."
At the root of nearly all the blog posts like this one (basically explaining why they don't agree with a widely held belief) is a redefinition of a term or word into something very specific that contradicts the common definition.
So in this case, I see the diagrams as representing the boundaries drawn when projecting / quantizing complex ideas into a set of central points that are insufficient for catching all of the nuance of the original. How well can you adapt a nuanced idea to a different space?
If Language A has an idea that exists at one point in space, which is the closest word in Language B that might be used to represent it? A Voronoi diagram is one possible way of illustrating it.
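To make the Voronoi picture a bit more concrete, here is a minimal sketch. It assumes a made-up 2-D "meaning space"; the words, their coordinates, and the function names are all hypothetical and only there for illustration. Each word in Language B owns the region of points closer to it than to any other word, so "translating" an idea just means finding the nearest word, and the leftover distance is the nuance that gets lost.

```python
import math

# Hypothetical anchor points for three Language B words in an invented 2-D meaning space.
LANGUAGE_B_WORDS = {
    "happy": (1.0, 1.0),
    "content": (0.0, 1.0),
    "excited": (2.0, 0.0),
}


def nearest_word(point, vocabulary):
    """Return the word whose anchor is closest to `point`, i.e. the Voronoi cell it falls in."""
    return min(vocabulary, key=lambda w: math.dist(point, vocabulary[w]))


def quantization_error(point, vocabulary):
    """Distance between the idea and the word chosen for it: the nuance that is lost."""
    return math.dist(point, vocabulary[nearest_word(point, vocabulary)])


# An idea from Language A that sits between "happy" and "content" (coordinates are assumed).
idea_from_language_a = (0.4, 1.2)
word = nearest_word(idea_from_language_a, LANGUAGE_B_WORDS)
error = quantization_error(idea_from_language_a, LANGUAGE_B_WORDS)
print(word, round(error, 3))
```

An idea that lands exactly on a word's anchor has zero error; anything else is snapped to the nearest anchor, which is the "quantizing" described above.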
Tangent on your tangent: this GDC presentation from 2016 is probably my favorite real-world application of Voronoi Diagrams, and uses them for N-player split-screen camera control: https://www.youtube.com/watch?v=tu-Qe66AvtY&t=1594s
I have a lingering dream in the back of my mind to make a single-couch Liero-style casual game for N-players with good dynamic camera support using this technique.
I tried to find the really interesting article about language and color that describes how some cultures use different naming schemes for colors but couldn’t find it. It talked about how, back in the day, we didn’t know orange as a color; we just thought of it as red-yellow, and only after the fruit was distributed did the word for the color catch on. Here’s the best article I can find that talks about this phenomenon: https://burnaway.org/magazine/blue-language-visual-perceptio...
I'm not quite sure I understand this—I do have mental sensations/processes sans language, but I would not characterize them as "thoughts". To me, a thought is inherently linguistic, even if they relate to non-linguistic mental processes. So to me, learning a new language is very literally learning how to think differently.
I take a slightly narrower definition of “thoughts” that may be more akin to “expressions”: ideas that can be communicated, so excluding non-linguistic mental processes. I think that may be where we disconnect. A lot of my idea about thoughts comes from the Borges story, Funes the Memorious (short story about a dude who could not forget - interesting read and really clarifies my feelings on my definition of “all possible thought”). In the story he talks about tree leaves, but instead imagine needing a unique linguistic scheme for every single unique snowflake you ever see. It would be a linguistic nightmare! Therefore language must generalize, otherwise it becomes incommunicable, and that generalization to me induces the “lossy approximation” I attribute to language in my prior comment.
So, in my head Funes’s mind represents the abstract space of all possible thoughts. When we use language, we are stacking words/sentences/paragraphs/etc together almost like vector addition, trying to reach a particular point in the thought vector space. Some languages have really clean ways of getting to certain thoughts while others take a mouthful and still don’t get you exactly there (物の哀れ example from link).
I agree with your statement on new languages being different thinking. As you follow that vector addition process to get to the “thought,” different languages will take you on different paths to get to your destination thought because languages encode those vectors differently, even if the destination thought is the same. In my mental model, the act of thinking is putting those language vectors together and tracing their path to get to your thought.
And if my comment still makes no sense - I might have to incubate this thought a bit more :) but I do recommend the story- it’s a quick, thought provoking read.
I was glad to read this because it seemed too neat and tidy for "thought" to necessarily be able to be encoded into language, especially in the presence of frequent miscommunication between people that share language, culture, and context.
On language and thinking, I agree that new languages promote thinking differently. But it seems that the difference has to fall short of the Sapir-Whorf hypothesis of informing perception or experience. Which would then limit the extent to which thought, as informed by language, would influence the way one would compose a linguistic representation of some thought/idea/"blob of meaning to be communicated." All to suggest that there is a broader landscape of "thinking-like activities" than those which would be able to be encoded linguistically.
Maybe it's simpler to say that I think of language as more lossy than thought.
Language is very effective at this, but I don't think thought is inherently linguistic.
To me language is just a way to label, group or organise these things. So when you learn a new one you learn a new 'labeling system/taxonomy' does that sound familiar?
Each language is a lossy approximation of all conceivable thought...
This ultimately boils down to the private language discussion started by Wittgenstein. If you admit public language is a lossy approximation of meaning, you're taking a position on the existence of private languages.
Yes, that mathematical expression is like a punch in my face, but not for the reason you think. I am offended that the rank of the matrix does not match the dimension of the matrix, not that I'm seeing a matrix.
Concepts are only usefully distinguished by context and use.
By the author's own argumentation: nothing is translatable (or, generally, even communicable) unless it has a fixed relative configuration to all other concepts that is precisely equivalent. In practice, we handle the fuzziness as part of communication, and it's useless to try to define a concept as untranslatable unless you're also of the camp that nothing is ever communicated (in which case, this response to the author's post is completely useless, as nobody could possibly understand it enough internally for it to be useful. If you've read this far, congrats on squaring the circle somehow).
The same process that allows two speakers of the same language to communicate adequately allows one to translate from one language to another. If it were truly impossible to translate from one language to another, we would be unable to perceive this and argue about it. The recognition and correction of errors is part of the process of translation just as it is part of the process of communication in a single language.
That indeterminacy of translation isn’t mentioned is a huge shortcoming of this article.
In other words:
The source sentence is a vector in “language A space.”
The target sentence is a vector in “language B space.”
A good translation finds a vector that has the same direction (same meaning, intent, tone) even though it lies in a different coordinate system (the new language).
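As a rough sketch of the "same direction" idea: assume, purely for illustration, that the source sentence and each candidate translation have already been mapped into one shared meaning space (how you would get such vectors is outside this sketch, and the numbers and labels below are invented). Cosine similarity then compares direction while ignoring length.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical meaning vectors, assumed to live in a shared space.
source = (2.0, 1.0, 0.0)
candidates = {
    "literal": (1.0, 0.0, 1.0),    # keeps some of the meaning, adds something foreign
    "idiomatic": (4.0, 2.0, 0.0),  # longer vector, but pointing the same way
}

scores = {name: cosine_similarity(source, vec) for name, vec in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The "idiomatic" candidate wins even though it is twice as long, which loosely matches the intuition that a good translation may take more words as long as it points at the same meaning.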
If both these thoughts are true, then it would appear that languages have topological characteristics. We can (topologically) map from one to another, 'thoughts' (that is a complex of words) form 'paths on the language manifold' and certain paths may be more 'natural' in one topological form than the other.
We can look no further than English: "man can do something," "man can do not do something" (i.e., can do but does not have to), then pretty straightforward "man can not do something" and, all of a sudden, to express that man cannot decline some obligation, we say "man can not help but do."
It is not translation per se, but it shows that some parts of the language evolved to tiptoe around non-customary things, in this case double negation. And double negation is very easy in some other languages.
Go translate an ee cummings poem and make sure to retain all its meanings.
You could consider the "cost" of expressing a word as some kind of metric or norm on the vector. What in one language/basis is a simple Kronecker delta, in another is a very complex vector (of course if it were the same vector in two bases, it would have the same length, but we could rather think of translation as an affine transformation, say).
And finally, with two bases, they need not span the same vector space. You can have a three-coordinate vector space all you like, if you have only two basis vectors you ain't spanning it. At best you can hope for an orthogonal projection from one to the other, and lose some nuance.
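The projection point can be sketched in a few lines. This assumes, for simplicity, that the target language's basis vectors are orthonormal, and the concept vector and basis are invented for illustration. A three-coordinate concept is projected onto the plane spanned by two basis vectors, and whatever is left over is the nuance the target language cannot express.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(v, orthonormal_basis):
    """Orthogonal projection of v onto the span of an orthonormal basis (assumed orthonormal)."""
    result = [0.0] * len(v)
    for b in orthonormal_basis:
        coeff = dot(v, b)  # coordinate of v along this basis vector
        result = [r + coeff * bi for r, bi in zip(result, b)]
    return result


# Hypothetical concept with three components of meaning.
concept = [3.0, 4.0, 12.0]
# Hypothetical target language: only two basis vectors, so it cannot span 3-D space.
language_b_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

translated = project(concept, language_b_basis)
lost = [c - t for c, t in zip(concept, translated)]
lost_norm = math.sqrt(dot(lost, lost))
print(translated, lost, lost_norm)
```

Here the third component is dropped entirely, so the lost part has length 12 while the translated part has length 5: most of this particular concept lives in the direction the target basis does not cover.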
Eventually, with bilinguality, you learn not to translate words. Concepts live in different languages and describe a reality. Usually you can describe that reality in two different languages, but sometimes not.
Phew! Thanks for clarifying.
The closest English equivalent is abbreviations like "PC". They're perfectly usable in context, but if you see one standing alone it's not clear if it's personal computer, politically correct, Peace Corps, etc etc.
If you think back to the meme from a decade or two ago about how men and women perceive colour [1], where e.g. "pink" to a man covers a whole range of colours to a woman, then that kind of hints at the idea.
One example back in the realm of vocabulary is the English word "happy". This embodies a range of meanings: joy, willingness, being pleased, contentment, satiation, etc. Some of these meanings may overlap with other words like "joy" or "excited" in ways that don't carry over to other languages. E.g. "happy" might be translated to French as "heureuse" for the senses of pleased or content, but not for the willingness sense.
Similarly, the French word "dommage" can be translated into a whole bunch of English words that aren't normally synonyms of each other - pity, damage, shame, harm.
This kind of nuance can lead to two opposite problems when translating - when the meaning is limited to a subset of possible meanings by context, and the wrong one is chosen in the foreign language, and when the author's meaning embodied multiple meanings and the chosen translation doesn't cover all of them.
Some of these features can lead to the humour in subtle jokes being lost in translation, e.g. "he'd be late to his own funeral".
[1] e.g. https://www.psychologytoday.com/us/blog/brain-babble/201504/... or https://digitalsynopsis.com/design/male-vs-female-color-perc...
But I think this exposes an even greater problem, where words thought to be direct translations will always drift in vector value as they are weighted for attention within their respective corpora. Are we on the brink of translation-nihilism?
This isn't even limited to complex phenomena or shades of snow. Even "I like" is a different construction in many languages, in an unexpected way to new language learners.
N^{any constant} is not bijective with a single R.
There's an xkcd devoted to this problem, even using computational linguistics as an example, IIRC.
Vague: does it mean all the inflected forms of a word, or just the stem without inflection? Example: is (are?) 'walk', 'walked', 'walks' and 'walking' four words, or one? What about "stand/stood"? (And languages where the bare root or stem can't appear by itself, like verbs in Spanish.) Derived words, like 'push' and 'pushy'.
Do compound nouns count as a word, or do only the parts count? Example: 'doghouse' (or 'dog house'). What about idioms? Example: 'to crane his/her/my/your/our neck(s)'.
What about different pronunciations? Is 'roof' pronounced to rhyme with 'aloof' the same word as 'roof' pronounced with the vowel of 'put'? And different spellings but same pronunciation: 'bear' vs. 'bare'.
What about words with different grammatical categories, like 'push' as a noun ("I gave her a push") or a verb ("I pushed her"). Or the same word with virtually unrelated meanings, "I pushed her on the swing" vs. "I pushed my ideas."
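To see how much the answer depends on the definition, here is a tiny sketch that counts the same six tokens two ways. The lemma table is hand-written and hypothetical, not taken from any real lexicon, and it only covers the inflection question, not compounds, homophones, or word classes.

```python
# Six surface forms from the examples above.
tokens = ["walk", "walked", "walks", "walking", "stand", "stood"]

# Hypothetical hand-made lemma table mapping inflected forms to a base form.
LEMMAS = {
    "walked": "walk",
    "walks": "walk",
    "walking": "walk",
    "stood": "stand",
}

# Definition 1: every distinct spelling is its own word.
surface_count = len(set(tokens))

# Definition 2: inflected forms collapse onto their lemma.
lemma_count = len({LEMMAS.get(t, t) for t in tokens})

print(surface_count, lemma_count)
```

The same data gives 6 words under one definition and 2 under the other, before any of the harder borderline cases are even considered.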