Learning Languages with the Help of Algorithms
Posted 4 months ago | Active 3 months ago
johndcook.com | Tech | story
Key topics
Language Learning
Algorithms
Education Technology
The article discusses using algorithms to optimize language learning, sparking a discussion on the effectiveness of this approach and the complexities of language acquisition.
Snapshot generated from the HN discussion
Discussion activity: very active; first comment 3d after posting; peak of 21 comments in the 72-84h window; average of 10.3 comments per period (based on 41 loaded comments).
Key moments
- Story posted: Sep 17, 2025 at 9:07 PM EDT
- First comment: Sep 20, 2025 at 11:43 PM EDT (3d after posting)
- Peak activity: 21 comments in 72-84h (hottest window of the conversation)
- Latest activity: Sep 23, 2025 at 7:11 PM EDT
ID: 45283637 | Type: story | Last synced: 11/20/2025, 7:40:50 PM
Turns out that most consumers just want to feel like they're learning a language instead of doing the actual work, or in extreme cases, literally only care about maintaining their streak or leaderboard score.
But what makes you think that this is because "most consumers just want" it that way? The whole effect of dopamine hits is to manipulate what users believe they "want". But you cannot claim to be working in the interests of your users after you manipulated them.
I.e. if a user installed Duolingo because they genuinely wanted to learn the language and then got sidetracked by all the gamification stuff, I don't think you can say they "really" just wanted to play games the whole time.
(Duolingo is walking a fine line here, which was probably the reason they picked language learning in the first place: in that field, users really do want a certain degree of nudging and manipulation, to help them keep up with the tedious process of frequent repetition.
That was sort of the official value proposition of Duolingo, and I think the reason why many users installed it. It's also why many of the nudging strategies work at all, because they can assume a cooperating user.
But if you use the app, you can see that it frequently tries to push beyond that mutually agreed purpose: trying to upsell you to the paid version, invite friends, take part in global leaderboard challenges, etc., all of which have very little to do with language learning.)
Duolingo is one of the worst apps out there for language learning, and its users are not practicing useful language skills. It’s a gamified system that feels like language learning, without actually having any substance.
Or ...
I think the article is just using this as a hook to introduce the submodularity of the maximum weighted cover problem. But I'll talk about a different way of using the same collection of books to learn a language that I think is better.
First of all, you'll probably want to take into account which words you already know, instead of just removing stopwords. If a book uses lots of common words, but you already know them, you're not learning much.
Secondly, no matter how much or how little you already know, you're unlikely to find a book that fits your level well. If you're just beginning to learn the language, no matter which book you pick, the very first sentence will be full of new words, but most of those will be rare ones that you won't encounter again until much later. If on the other hand you already have a very good command of the language, you might be able to breeze through entire chapters and only pick up a handful of new words. (If your primary goal is to enjoy books rather than achieving mastery of the language, this is of course perfectly fine.)
So what I do is split the entire collection into sentences. Then, for each word from most common to least common, I pick a small number of sentences that use the word, ideally ones that do not also contain much rarer words, and try to read and understand them all. I then use the most suitable sentence to make an Anki flashcard. It's much easier to find a sentence at the right level than an entire book.
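A rough sketch of that sentence-picking step, for illustration only. The function names, the regex tokenizer, the naive sentence splitter, and the `known` and `per_word` parameters are all assumptions, not the commenter's actual tooling. A sentence is scored by the rarest word it contains, so sentences made only of common words come first.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercased runs of letters; a crude stand-in for a real tokenizer."""
    return re.findall(r"[^\W\d_]+", text.lower())

def split_sentences(text):
    """Naive split after sentence-ending punctuation (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def candidate_sentences(corpus, known=frozenset(), per_word=3):
    """For each unknown word, most common first, list a few sentences
    that use it, preferring sentences without much rarer words."""
    sentences = split_sentences(corpus)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)
    # Rank 0 is the most common word; ties are broken alphabetically.
    ranked = sorted(freq, key=lambda w: (-freq[w], w))
    rank = {w: i for i, w in enumerate(ranked)}

    result = []
    for word in ranked:
        if word in known:
            continue  # skip vocabulary the learner already has
        hits = [
            (max(rank[w] for w in toks), len(toks), s)
            for s, toks in zip(sentences, tokens)
            if word in toks
        ]
        hits.sort()  # least-rare hardest word first, then shorter sentences
        result.append((word, [s for _, _, s in hits[:per_word]]))
    return result
```

Choosing which candidate becomes the flashcard is left to the reader, since that judgment is hard to automate.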
It can be a bit weird to learn about the plot of a book piecemeal out of order, especially if multiple books are mixed together, but I think it's an interesting experience.
The same principle can also be applied to recordings from Mozilla Common Voice: https://commonvoice.mozilla.org/en/datasets I like to use them for dictation exercises in Anki, where the card plays a recording and I type in what I thought I heard to check whether I got it right.
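A minimal sketch of how such dictation cards could be prepared, assuming the dataset ships a tab-separated metadata file with `path` and `sentence` columns and that the audio clips are copied into Anki's media folder separately. The function names and the `max_words` cutoff are hypothetical. The output is a two-column tab-separated file for Anki's text import, where `[sound:...]` makes the card play the clip; the type-in answer box itself would be configured in the card template.

```python
import csv
import io

def dictation_rows(metadata_tsv, max_words=8):
    """Yield (audio_field, transcript) pairs for short clips.

    metadata_tsv: open text file with 'path' and 'sentence' columns
    (an assumption about the dataset layout).
    """
    reader = csv.DictReader(metadata_tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        sentence = row["sentence"].strip()
        if sentence and len(sentence.split()) <= max_words:
            yield f"[sound:{row['path']}]", sentence

def write_anki_import(rows, out):
    """Write one tab-separated line per card: audio field, then transcript."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for audio, sentence in rows:
        writer.writerow([audio, sentence])

# Usage with in-memory stand-ins for the real files:
demo_meta = io.StringIO("path\tsentence\nclip1.mp3\tGod morgon\n")
demo_out = io.StringIO()
write_anki_import(dictation_rows(demo_meta), demo_out)
```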
The final step of choosing one sentence and turning it into an Anki flashcard is manual.
I want to learn so that I can read/understand publications in mathematics in a foreign language, mostly Swedish, French, German. (*) For this exercise, the typical apps do not help much.
(*) I would have liked to add Latin and Greek too but that's mostly a pipe dream.
Reading old mathematicians and scholars I have realized something that runs quite counter to the common perception we have, especially in my country.
The common perception is that school kids, especially in middle and high school, are overwhelmingly burdened by the sheer volume of subject matter they have to learn at a very young age. But when I look at educated teenagers from the 17th and 18th centuries who went on to become mathematicians or scholars, they were immensely well read at a very young age. I understand this is a biased sample, but many of these people, Newton for example, were ordinary folks (socio-economically speaking).
Hamilton (I concede that one cannot compare Hamilton with a typical modern teenager) was already quite fluent in thirteen languages in his pre-teens. Apart from the usual suspects, he knew Arabic, Hebrew, Farsi, Sanskrit, Hindi, Marathi.
This might sound atypical but this was not unheard of. One of the poets in my language was fluent in Hebrew, Greek, Italian, French, Latin, Sanskrit, Telugu, Tamil, Bengali, English.
[1] https://archive.org/details/conspectus-grammaticus-familia-r...
[2] https://latinitium.com/best-books-for-learning-latin/
As for difficulty, well, even English is not my first language. So Latin would be quite a stretch for me.
What makes things more difficult (this is not specific to Latin) is that maths and physics have their own language. Domain-specific words, such as curvature, torsion, divergence, curl, force, power, action, moment, and momentum, do not translate in a way that is linguistically obvious.
For French: Sandberg and Tatham, French for Reading
For German: Jannach, German for Reading Knowledge
I've used both and swear they're magic, especially if you're trying to learn to read in a scientific domain that you're already a specialist in (versus literature).
Once you've sort of "learned the game" it isn't very hard to do a similar process for other languages on your own. Then, my main recommendation is to take a text you're deeply familiar with in your native language or English that exists in X other language and just go ahead and start reading it with a dictionary. It starts slow, but progress is very very fast if you stick with it, especially compared to learning to speak or even just listen to a language.
For life reasons, I've found myself having to learn Danish, so I'll let you know if I figure out any good resources for Scandinavian languages.
[1] The only downside I've encountered is trying to later learn to speak a language I had been reading for a while where overcoming the sort of "fictitious phonetics" that existed in my head proved problematic.
I think, for Danish, if your German is decent, look to older, more formal Danish books you can also find in German, or maybe try to find work in both Danish and Low German / Plattdeutsch and see if it forms a good midpoint for you.
Dutch might possibly also form a decent parallel - the combination of my Norwegian, German and English means I can slog my way through more formal Dutch reasonably well without ever having tried to learn it.
The phonetics, on the other hand, are presenting some challenges...
For Danish, it's so similar to Norwegian it's a lot easier, but there's an old Norwegian joke that Danish is just Norwegian spoken with a potato in your mouth... To us, Danish sounds like they're failing to enunciate every single sound...
Incidentally, pronunciation got a lot easier for me when I started looking at the mouth placement of natives when speaking. Just watching and copying mouth placement and movements has fixed so many pronunciation issues for me that no amount of listening and repeating could address.
Apparently I was good at picking up languages other than my mother tongue, as a child (4yrs). But now those same languages that I apparently was fluent in appear quite incomprehensible, like first contact incomprehensible.
What's your mother tongue out of curiosity?
Good luck with your studies!
What always makes learning to read easier is that time is completely in your control. The principle is pretty straightforward: if you have enough time and patience, you can read anything (with a dictionary, grammar book etc) and the more you read in that language the less time it starts to take. These books basically just bootstrap the process.
I mentioned it above, but the other way is if you have a book or article you really know well in your mother tongue that exists in a language you want to learn, just patiently try and read it in that language. I think programmers actually have a bit of an advantage in this, as it's really just pattern recognition -- and it isn't that different from trying to understand a program in a language you haven't worked with before.
Thankfully there is the "Translations of Mathematical Monographs" book series
https://bookstore.ams.org/mmono
I had just resigned myself to the fact that I will probably never be able to learn Russian. At an optimistic best, perhaps French and Swedish only, if at all.
One way to tackle this problem is to just get started with an LLM. You ask ChatGPT, for example, to translate for you, and then you try to figure out which word corresponds to which and keep going. After a while you will need the LLM's help less and less.
Who am I to tell you this? I only read math in my native language, and in French and English. But once I wanted to do some calculations using Gauss's Theorema Egregium, so, out of curiosity, I picked up both the English translation of Gauss's original publication, and the original Latin text. I was able to understand sufficient Latin to figure out what Gauss was saying and to find out that the English translation has a bug.
Something like collecting phrases from these books, loading them into SRS, collecting YouTube videos of natives discussing the material you are into, extracting the sound and listening to several hours of it for immersion... That is basically the way I learn, but focusing on different material.
With LLMs, it is much easier to create your own study material nowadays, as you can ask to translate, break down and explain things as you go.
I believe this does not have to be perfect; simplicity is preferred. But it should be just enough for an LLM to take a glimpse and estimate the user's level in a given language.
[0] https://apps.apple.com/us/app/ai-anki-learning-fluentread/id...
> Anki is a registered trademark of Ankitects Pty Ltd.
https://apps.ankiweb.net/
But the algorithms are interesting, so I think a better title would have been "why submodular NP hard problems are cool" or something similar.
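For readers who want to see why the problem is appealing, here is a small sketch of the standard greedy heuristic for maximum weighted coverage, which is not necessarily the code from the article. The function name and the toy data are made up. Because weighted coverage is monotone and submodular, picking the book with the largest marginal gain at each step is known to reach at least a (1 - 1/e) fraction of the optimal weight for a fixed number of books.

```python
def greedy_weighted_cover(books, weights, k):
    """Pick up to k books, each time taking the one that adds the most
    not-yet-covered word weight.

    books:   dict mapping title -> set of words in that book
    weights: dict mapping word -> weight (e.g. corpus frequency)
    """
    covered = set()
    chosen = []
    remaining = dict(books)

    def gain(title):
        # Marginal gain: weight of the words this book would newly cover.
        return sum(weights.get(w, 0.0) for w in remaining[title] - covered)

    for _ in range(min(k, len(books))):
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # no remaining book adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)

    total = sum(weights.get(w, 0.0) for w in covered)
    return chosen, total

# Toy example: book "C" is worth 6 on its own, but only 3 once "A"
# has been picked, which is the diminishing-returns property.
toy_books = {
    "A": {"the", "cat", "sat"},
    "B": {"the", "dog"},
    "C": {"cat", "fish", "dog"},
}
toy_weights = {"the": 5, "cat": 3, "sat": 1, "dog": 2, "fish": 1}
picked, weight = greedy_weighted_cover(toy_books, toy_weights, k=2)
```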
"Procrastination, perfectionism and writer's block are not moral flaws; nor are they caused by laziness, lack of discipline or lack of commitment. They are habits rooted in fear and scarcity - and the great news is that once we start alleviating our fears and resourcing ourselves abundantly, our procrastination and related problems are often remarkably easily solved."
It's directed at writers, but it's really for all perfectionists.
[1] https://tadoku.org/japanese/en/graded-readers-en
github here: https://github.com/fdietze/ravioli
prototype deployed here: https://raviolio.web.app/
See Also: https://en.wikipedia.org/wiki/Thousand_Character_Classic
Not saying that one is inherently more worthy than the other, but, unsurprisingly, the first group is usually better at actually _doing_ the thing.
So maybe looking for high frequency words is good, but only high frequency words that you know. So the most coverage of the most high frequency words would be very bad. To get the most coverage of the most high frequency words, they'd have to be used at a lower frequency than they normally are, with less repetition in the natural contexts that enable the learner to build meaning. Unless the books were longer, which makes the concept of concentrating common vocabulary in very few books degenerate (just read two 2000-page books!).
Reading a bunch of stuff with a concentrated dose of tons of words you don't know will leave you with absolutely no retention. If you know every word but one in e.g. a chapter, you'll probably remember that word forever. The concept is called comprehensible input - you set unfamiliar things in a background of familiar things.
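As a sketch of the "every word but one" idea, the snippet below filters passages down to those with exactly one distinct unknown word, given a set of words the learner already knows. The function names and the regex tokenizer are assumptions for illustration; real text would also need lemmatization so that inflected forms are not counted as new words.

```python
import re

def unknown_words(text, known):
    """Distinct lowercased words in text that are not in the known set."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return sorted(set(words) - set(known))

def one_new_word(passages, known):
    """Return (passage, new_word) pairs where exactly one word is unfamiliar."""
    known = {w.lower() for w in known}
    picks = []
    for passage in passages:
        new = unknown_words(passage, known)
        if len(new) == 1:
            picks.append((passage, new[0]))
    return picks

# Hypothetical learner vocabulary and passages:
vocab = {"the", "cat", "sat", "on", "mat"}
texts = [
    "The cat sat on the mat.",   # nothing new
    "The cat sat on the sofa.",  # one new word
    "A dog chased the cat.",     # too many new words
]
good = one_new_word(texts, vocab)
```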
If you want a book with the most unfamiliar vocabulary, it's called a dictionary. It contains all of the most commonly used words, and the least commonly used, too.
In fact, maybe this makes sense if you're going to be locked in a cell for 10 years, you want to learn a language starting from zero*, and only get to have a pocket dictionary and two other books (with a size limit.) You might want to have sample natural sentences for as many of the best words to know as you could.
The real algorithmic language learning trick is to write books that are interesting and that use the fewest words (which would inevitably be the most important words to use to communicate, but not necessarily the most common words that natives use to communicate), and that introduce new, useful words at a steady rate. That seems to be how Capretz put together French in Action. It's also graded readers: I still remember the moment I realized that I could not only understand what was happening in the basic graded reader I'd accidentally picked up on a whim, but also I was interested in finding out what was going to happen next. It's been downhill from there.
-----
[*] or maybe from one? You would have to have some familiarity with the script, and it had better be a phonetic one. Otherwise, this would be just learning how to read a language. No English, no French, no Portuguese, no Chinese... although having poetry books might help, because you can be surer of vowel similarities and syllable breaks. Poetry books are not dense, however, and might bump against any size limit. And the vocabulary would be weird and not representative.