Language Support for Marginalia Search
Source: marginalia.nu
Topics: Search Engines, Natural Language Processing, Information Retrieval
The Marginalia search engine has added language support, and the community discusses its features, limitations, and potential applications, with some users praising its simplicity and others criticizing its AI approach.
Snapshot generated from the HN discussion
Discussion activity (based on 27 loaded comments)
- Story posted: Oct 21, 2025 at 2:48 AM EDT
- First comment: Oct 21, 2025 at 3:30 AM EDT (42m after posting)
- Peak activity: 8 comments in the 0-3h window
- Average per period: 2.5 comments
- Latest activity: Oct 23, 2025 at 3:02 AM EDT
ID: 45653143 | Type: story | Last synced: 11/20/2025, 3:50:08 PM
I'm kinda allergic to writing "I did the thing" posts, so I can't help but tryhard and attempt to make them compelling somehow.
Writing in this manner is also very helpful in making sense of the work for myself. Takes a better understanding of the subject to thoroughly explain what you've built than to merely build it. Sometimes I've gone back and read through one of these updates to just get a refresher on what my thinking was when I built something.
Some fun context: I was trying to find a scanned copy of the first 'correct' book on optics (written by https://en.wikipedia.org/wiki/Ibn_al-Haytham), possibly the first person to really use the scientific method, circa 1000 CE (!!). And I found this (https://cudl.lib.cam.ac.uk/view/MS-PETERHOUSE-00209/103), filled with interesting optical diagrams like something out of my high school physics notebooks. Anyway, I was also thinking about how they might index interesting doodles in the margins. So it was on my mind.
Language detection and sentence splitting are the other two slow bits of processing.
[1] https://github.com/datquocnguyen/RDRPOSTagger
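To make the two processing steps named above concrete, here is a minimal sketch of naive sentence splitting and stopword-based language detection. It is an illustration only, not how Marginalia does it: the function names, the regex rule, and the tiny `STOPWORDS` table are all assumptions made up for this example.

```python
import re

# Tiny hypothetical stopword lists; a real detector would use far more data.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "sv": {"och", "det", "att", "är", "en", "som"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def detect_language(text):
    """Return the language whose stopwords occur most often, or None if none match."""
    words = re.findall(r"\w+", text.lower())
    counts = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

Even this toy version touches every word of every document, which hints at why these steps add up when run across a whole crawl. The splitter will also break on abbreviations such as "e.g. this", one of the many exceptions a production splitter has to handle.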
I'm asking this as one of my projects is a link aggregator similar to old reddit (and HN to some extent) and I would like to be able to present to users a search box, but without having to implement document indexing and search. (I assume ad principio that the website is already aligned ethically and technologically with what Marginalia stands for :D)
When it works, one of the things I have in mind is making a site search-esque functionality available, as well as exposing it via the public API so that it can be whiteboxed.
[1] https://www.marginalia.nu/tags/search-engine/
> Sentences are stemmed and POS-tagged. Sentences, with stemming and POS-tag data is fed into keyword extraction algorithms
That IS AI; it's just old-fashioned, bad AI. What he's trying will never work well, for the same reason rule-based machine translation never worked well: there are just too many rules and exceptions. Simplicity is great when you can have it, but with human language, simplicity was never on the table.
He's going to have to bite the bullet and use document embedding models sooner or later.
Likely I am totally not understanding what this search engine is for. I see this a lot on submissions here. I find something interesting sounding but I don’t understand the context. Maybe it’s just me, but it’s confusing.
If you read his about page, it is basically an anti-centralization, anti-ad, anti-spyware attempt at web search. It also says: "The project is independent in that it has no loans, no investors looking for a payday, no strings attached anywhere to pressure it into doing anything than providing as much and as good internet search as it is capable of."
It not indexing NYT seems precisely on brand.
Though since the search engine doesn't really apply much in terms of domain authority, this doesn't rank very highly; the websites that talk about Ezra Klein rank higher.
[1] https://marginalia-search.com/search?query=site%3Anytimes.co...
Where it particularly shines is finding highly specific results that get buried in other search engines. Some topics (particularly topics of high commercial interest) have become impossible to research on mainstream search engines. Marginalia will actually find informative articles about these topics rather than page after page of product results and spam.
It may not be useful to you if you’re not a researcher, writer, or someone who often needs to dig deeply into subjects beyond the level of common knowledge.
It's not a Google replacement, and if you already know what you're looking for then it's probably not the right tool.
Say you're looking for mechanical keyboard discussions; then a search for "mechanical keyboard" in the Blogs or Forums filters may provide results you are into.
It's also pretty good at unearthing weird stuff. Say you want to read up on Jack Parsons[3], that Jet Propulsion Lab guy who dabbled in occultism, fell in with Aleister Crowley, got scammed out of his wealth by L. Ron Hubbard, and finally blew himself up. That is the sort of topic Marginalia Search generally excels at.
[1] https://marginalia-search.com/search?query=mechanical+keyboa...
[2] https://marginalia-search.com/search?query=mechanical+keyboa...
[3] https://marginalia-search.com/search?query=Jack+Parsons&prof...
I'm confused by this. TF-IDF incorporates the document frequency (the IDF part), which search engines precompute for the index as a whole. But so does BM25; its IDF formula is slightly different, but also relies on document frequencies. What's the difference?
At search time, with BM25, that information is a lot more accessible, since you already fetch it indirectly as part of looking up the document lists, and this typically happens only up to about a dozen times per query.
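The exchange above can be sketched in code. The block below is a toy illustration, not Marginalia's implementation: the corpus, function names, and the `k1`/`b` defaults are assumptions, and the BM25 IDF shown is the common Lucene-style variant. It shows the two IDF formulas side by side, and how at query time the document frequency is just the length of a posting list that the lookup fetches anyway.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical); each document is a list of tokens.
DOCS = {
    1: "the quick brown fox".split(),
    2: "the lazy dog".split(),
    3: "the fox and the dog".split(),
}

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[token].add(doc_id)
    return postings

def idf_classic(n_docs, df):
    """Classic TF-IDF inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)

def idf_bm25(n_docs, df):
    """Lucene-style BM25 IDF: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))

def bm25(query, doc_id, docs, postings, k1=1.2, b=0.75):
    """Score one document against a list of query terms."""
    n_docs = len(docs)
    avg_len = sum(len(t) for t in docs.values()) / n_docs
    tokens = docs[doc_id]
    tf = Counter(tokens)
    score = 0.0
    for term in query:
        # Document frequency falls out of the posting list we look up anyway.
        df = len(postings.get(term, ()))
        if df == 0 or tf[term] == 0:
            continue
        length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score += idf_bm25(n_docs, df) * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score
```

With `postings = build_index(DOCS)`, calling `bm25(["fox"], 1, DOCS, postings)` scores the shorter document 1 above document 3, because BM25 normalizes by document length. Note that the document frequency is only needed for the handful of query terms, not for every term of every document as in an index-time TF-IDF pass.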
Small UI issue: on desktop, the left sidebar should be scrollable, because right now on Firefox I can't reach the "Language" menu item in the search results view unless I zoom out.