Show HN: I made a spaced-repetition-based language learning app
yap.town
The app "only" includes about the 3,000 most common words, so if you're past that level, I don't know how helpful it will be to you. I can easily extend this in the future; I just need a bigger corpus with more data.
Also, I'm confused that you say you would need a bigger corpus for more words, since your readme says that you use the OpenSubtitles data from OPUS. Their 2024 release has tens of millions of sentences for each language, which surely should be enough for tens of thousands of unique words?
What you say about binary search is a good point. I initially used something closer to a straightforward binary search, but the ramp-up was too quick, and beginners would end up adding a bunch of words that were way too advanced for their level. So I made it less aggressive to avoid overshooting, but I guess that has the opposite problem: it takes longer for advanced users. I’ll think about what I can do about that.
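To make the trade-off concrete, here is a rough sketch of the two approaches, not the app's actual code. It assumes words are ordered by frequency rank and that a learner knows every word below some rank and none above it. The names `estimate_level`, `knows_word`, and `max_up_step` are made up for illustration.

```python
def estimate_level(knows_word, vocab_size, max_up_step=None):
    """Estimate how many of the most frequent words a learner knows.

    knows_word(rank) -> bool is a hypothetical callback that quizzes the
    learner on the word at the given frequency rank (0 = most common).
    With max_up_step=None this is a plain binary search. With a number,
    each probe is capped at lo + max_up_step, so the search climbs slowly.
    Returns (estimated_level, list_of_probed_ranks).
    """
    lo, hi = 0, vocab_size  # ranks < lo assumed known, ranks >= hi assumed unknown
    probed = []
    while lo < hi:
        mid = (lo + hi) // 2
        if max_up_step is not None:
            mid = min(mid, lo + max_up_step)  # damp the upward jump
        probed.append(mid)
        if knows_word(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo, probed


# A beginner who knows the 50 most common words, and an advanced learner
# who knows 2,500 of them (idealized: knowledge is perfectly monotonic).
beginner = lambda rank: rank < 50
advanced = lambda rank: rank < 2500

plain_beginner = estimate_level(beginner, 3000)
damped_beginner = estimate_level(beginner, 3000, max_up_step=100)
plain_advanced = estimate_level(advanced, 3000)
damped_advanced = estimate_level(advanced, 3000, max_up_step=100)
```

In this idealized setup both variants land on the same level. The plain search shows the beginner word 1,500 on the very first question, while the capped one never goes past rank 100, at the cost of asking the advanced learner more questions. Real learners are not perfectly monotonic, so treat this only as an illustration of the overshoot-versus-speed trade-off.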
For the corpus, I prefer to use Neri’s sentence lists, as they’re much higher quality than OpenSubtitles. You’d be surprised at the problems it has. So I only use OpenSubtitles for Korean (because Neri’s sentence lists don’t have a Korean version).