Scaling HNSWs
Posted about 2 months ago · Active about 2 months ago
antirez.com · Tech · story
Sentiment: calm / mixed · Debate: 60/100
Key topics
Vector Databases
HNSW Algorithm
Nearest Neighbor Search
The article discusses the challenges of scaling Hierarchical Navigable Small World (HNSW) graphs for approximate nearest neighbor search, with commenters sharing their insights and experiences on the topic.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 5h after posting
Peak period: 31 comments in 0-12h
Avg / period: 11
Comment distribution: 44 data points (based on 44 loaded comments)
Key moments
- 01 Story posted: Nov 11, 2025 at 9:11 AM EST (about 2 months ago)
- 02 First comment: Nov 11, 2025 at 1:57 PM EST (5h after posting)
- 03 Peak activity: 31 comments in 0-12h, the hottest window of the conversation
- 04 Latest activity: Nov 17, 2025 at 2:59 PM EST (about 2 months ago)
ID: 45887466 · Type: story · Last synced: 11/20/2025, 4:32:26 PM
Graphs can be complex to build and rebalance. Graph-like data structures, where one thing holds a pointer out to another thing somewhere else in memory, aren't that cache friendly.
Add to that, people almost always want to *filter* vector search results, and this is a huge blind spot for consumers and providers. It's where the ugly performance surprises come from. Filtered HNSW isn't straightforward: it requires you to just keep traversing the graph looking for results that satisfy your filter.
HNSW came out of a benchmark regime where we just indexed some vectors and tried only to maximize recall against query latency. It doesn't take into account the filtering / indexing almost everyone wants.
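To make that cost concrete, here is a minimal, hedged sketch of the simplest post-filtering pattern: keep widening the candidate set until enough results pass the predicate. `filtered_search`, `ExactIndex`, and `index.search(query, k)` are hypothetical names for illustration only, not any library's API; real filtered HNSW pushes the predicate into the graph traversal itself, but the cost profile is the same.

```python
import numpy as np
from typing import Callable, List

def filtered_search(index, query, k: int,
                    keep: Callable[[int], bool],
                    max_candidates: int = 10_000) -> List[int]:
    """Return up to k ids that pass `keep`, over-fetching from the index as needed."""
    fetch, hits = k, []
    while fetch <= max_candidates:
        candidates = index.search(query, fetch)   # hypothetical ANN call: top-`fetch` ids
        hits = [i for i in candidates if keep(i)]
        if len(hits) >= k:
            return hits[:k]
        fetch *= 2                                # selective filter -> fetch/traverse deeper
    return hits                                   # best effort within the candidate budget

# Tiny stand-in index so the sketch runs end to end (brute force, not HNSW).
class ExactIndex:
    def __init__(self, data: np.ndarray):
        self.data = data
    def search(self, query: np.ndarray, k: int) -> List[int]:
        d = np.linalg.norm(self.data - query, axis=1)
        return [int(i) for i in np.argsort(d)[:k]]

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1_000, 64)).astype(np.float32)
# Only even ids satisfy the filter, so the loop must over-fetch to find 5 of them.
print(filtered_search(ExactIndex(vecs), vecs[3], k=5, keep=lambda i: i % 2 == 0))
```

With a highly selective `keep`, the loop doubles `fetch` several times per query, which is exactly the hidden work the comment is pointing at.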
Turbopuffer, for example, doesn't use graphs at all, it uses SPFresh. And they recently got 200ms latency on 100B vectors.
https://turbopuffer.com/docs/vector
Tried a million different things, but haven't heard of Turbopuffer yet. Any references on how they perform with such additional filters?
https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-sear...
1. As softwaredoug mentioned, you might want to filter results, potentially with a high filtration rate.
2. ANN isn't good enough. Suppose you need bounded accuracy with meaningfully sublinear time on a high-dimensional dataset. You're hosed.
Point (1) is just a repeat of a general observation that composition of nice data structures doesn't usually give you a nice data structure, even if it technically works. Creating a thing that does what you want without costing both arms and the president's leg requires actually understanding DS&A and applying it in your solution from the ground up.
Point (2) might seem irrelevant (after all, people are "building" stuff with RAG and whatnot nowadays aren't they?), but it's crucial to a lot of applications. Imagine, e.g., that there exists one correct result in your database. The guarantees provided by SOTA ANN solutions (on high-dimensional data) have a steep compute/correctness tradeoff, giving you an abysmal chance of finding your document without searching an eye-watering fraction of your database. I usually work around that with the relaxation that the result needs to be the best one but that its information can be fuzzy (admitting solutions which merge a bunch of low-dimensional queries, corrected via some symmetry in a later step), but if you actually need good KNN results then you're kind of hosed.
Basically my entire full-time job is spent prosecuting this argument. It is indeed true that many programmers are smart, but it is equally true that many programmers _are not_ smart, and those programmers have to contribute too. More hands is usually better than simpler systems for reasons that have nothing to do with technical proficiency.
If you are working on open source databases, or something close to the metal, I agree with antirez; if you are working at some established tech business (e.g. a very old ecommerce site), I agree with you.
The unfortunate reality is that a large cadre of people cannot handle such tools, and those people still have extremely valuable contributions to make.
I say this as a full-time research engineer at a top-10 university. We are not short on talent, new problems, or funding. There is ample opportunity to make our systems as simple/"pure" as possible, and I make that case vigorously. The fact remains that intentionally limiting scope for the sake of the many is often better than cultivating an elite few.
> The Hierarchical navigable small world (HNSW) algorithm is a graph-based approximate nearest neighbor search technique used in many vector databases.[1] Nearest neighbor search without an index involves computing the distance from the query to each point in the database, which for large datasets is computationally prohibitive. For high-dimensional data, tree-based exact vector search techniques such as the k-d tree and R-tree do not perform well enough because of the curse of dimensionality.
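For reference against the quoted passage, here is a minimal sketch of that unindexed baseline (illustrative code, not from the article): exact k-NN touches every stored vector, so the per-query cost grows linearly with both dataset size and dimensionality.

```python
# Exact k-NN by scanning the whole dataset: O(N * D) distance work per query.
import numpy as np

def brute_force_knn(data: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k exact nearest neighbors by L2 distance."""
    dists = np.linalg.norm(data - query, axis=1)  # one distance per stored vector
    return np.argpartition(dists, k)[:k]          # exact top-k (unordered)

rng = np.random.default_rng(0)
data = rng.standard_normal((100_000, 768)).astype(np.float32)  # ~300 MB of vectors
print(brute_force_knn(data, data[0]))
```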
would love to hear this story as well now!
I write blog posts in TextEdit, just a white page to fill with text. Normally this is fine, since I finish writing and publish the blog post. This time, after writing it, I left it there for weeks: I wanted to refine it a bit, the post didn't feel "ready". Then I had to reboot the computer for an upgrade, and I magically quit the application by hitting Cancel when it asked whether to save the document :-|
However, rewriting it was fast, and the second version was better. So it's fine. But from now on I'll use the (to me) less pleasant Google Docs.
These are the insights behind DiskANN, which has replaced HNSW in most production systems.
past that, well, you should really go read the DiskANN paper instead of this article; product quantization is way, way, way more effective than simple int8 or binary quant.
here's my writeup from a year and a half ago: https://dev.to/datastax/why-vector-compression-matters-64l
and if you want to skip forward several years to the cutting edge, check out https://arxiv.org/abs/2509.18471 and the references list for further reading
* but it still matters more than a lot of people thought circa 2020
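For readers who haven't met product quantization, here is a toy numpy/scipy sketch of the idea only (not DiskANN's or any library's actual implementation; all names and parameters are illustrative): split each vector into subvectors, learn a small codebook per subspace, store one byte per subvector, and answer queries from per-subspace distance lookup tables.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
N, D, M, K = 10_000, 128, 8, 256      # vectors, dims, subspaces, centroids per subspace
sub = D // M

data = rng.standard_normal((N, D)).astype(np.float32)

# Train one codebook per subspace and encode every vector as M uint8 codes.
codebooks = np.empty((M, K, sub), dtype=np.float32)
codes = np.empty((N, M), dtype=np.uint8)
for m in range(M):
    block = data[:, m * sub:(m + 1) * sub]
    centroids, labels = kmeans2(block, K, minit="points")
    codebooks[m], codes[:, m] = centroids, labels

def adc_distances(query: np.ndarray) -> np.ndarray:
    """Asymmetric distances: exact query vs. quantized database vectors."""
    # One (M, K) table of squared distances from each query subvector to every
    # centroid; per-vector distances are then just M table lookups and a sum.
    table = np.stack([
        ((codebooks[m] - query[m * sub:(m + 1) * sub]) ** 2).sum(axis=1)
        for m in range(M)
    ])
    return table[np.arange(M), codes].sum(axis=1)

query = rng.standard_normal(D).astype(np.float32)
approx = np.argsort(adc_distances(query))[:10]
exact = np.argsort(((data - query) ** 2).sum(axis=1))[:10]
print("top-10 overlap between PQ and exact search:", len(set(approx) & set(exact)))
```

With M=8, each 512-byte float vector shrinks to 8 bytes of codes; systems in this family typically re-rank a shortlist with the full-precision vectors to recover accuracy.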
All this to say: the blog post mostly tells the conclusions, but to reach that design, many things were tried, including things that looked cooler but in practice were not the best fit. It's not by chance that Redis HNSWs are easily able to do 50k full queries/sec on decent hardware.
Small world networks have been my obsession for almost a decade, from a biological and sociological evolution angle. Love this stuff.
This is the paper he's referring to, which I was just reading again yesterday :)
Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs" (2025) https://arxiv.org/abs/2412.01940
Also:
Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data (2010) https://www.jmlr.org/papers/v11/radovanovic10a.html
As someone also interested in not just repair of the social fabric, but the evolution of gendered social dynamics in social species (biologically and culturally), it's interesting to me that there might be some property of networks that:
1. Makes any random point as close as possible to any other random point (all solutions are equally likely to be close to the unknown solution somewhere in the network)
2. Requires only a minimal number of edges to be maintained (valuable in social networks, where edges are relationships that have a maintenance cost)
3. Involves some negotiation between hierarchy (aka entrepreneurial networks, with opportunity for value extraction through rent-seeking traffic) and hubness (which dissolves capture points in networks by "weaving" them out of existence). All the short paths in a small world pass through hubs, which necessarily maintain many more, weaker connections in a much more "gossip"-like comms protocol
Am working on this stuff related to collective intelligence and democracy work, if anyone wants to be in touch https://linkedin.com/in/patcon-
I built a KV-backed HNSW in Rust using LMDB to address this exact use case (https://github.com/nnethercott/hannoy). The trickiest part for sure is being clever with your IO ops, since HNSW search requires tons of random reads to find the nearest neighbours. Especially if you're not on NVMes (think EBS-based HNSW), you really start to feel the cost of those reads, even with all the fancy SIMD.
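To illustrate where those random reads come from, here is a hedged toy sketch (my own layout for illustration, not hannoy's actual on-disk format): each node's vector and adjacency list live in one LMDB value, so every hop of a greedy descent is one point read against the store.

```python
import struct
import lmdb
import numpy as np

env = lmdb.open("hnsw.lmdb", map_size=1 << 30)

def put_node(txn, node_id: int, vector: np.ndarray, neighbors: list) -> None:
    # value = float32 vector bytes followed by uint32 neighbor ids
    payload = vector.astype(np.float32).tobytes() + struct.pack(f"<{len(neighbors)}I", *neighbors)
    txn.put(struct.pack("<I", node_id), payload)

def get_node(txn, node_id: int, dim: int):
    raw = txn.get(struct.pack("<I", node_id))          # one random read per hop
    vec = np.frombuffer(raw[:4 * dim], dtype=np.float32)
    nbrs = list(struct.unpack(f"<{(len(raw) - 4 * dim) // 4}I", raw[4 * dim:]))
    return vec, nbrs

def greedy_search(env, entry: int, query: np.ndarray, dim: int) -> int:
    """Greedy descent toward the query; every visited node costs one KV read."""
    with env.begin() as txn:
        current = entry
        vec, nbrs = get_node(txn, current, dim)
        best = float(np.linalg.norm(vec - query))
        improved = True
        while improved:
            improved = False
            for n in nbrs:
                v, nn = get_node(txn, n, dim)          # the read amplification lives here
                d = float(np.linalg.norm(v - query))
                if d < best:
                    best, current, nbrs, improved = d, n, nn, True
    return current

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    vectors = rng.standard_normal((100, dim)).astype(np.float32)
    with env.begin(write=True) as txn:
        for i, v in enumerate(vectors):
            # toy graph: a few random links per node (not a real HNSW build)
            links = [int(x) for x in rng.choice(100, size=5, replace=False)]
            put_node(txn, i, v, links)
    print("closest node found:", greedy_search(env, entry=0, query=vectors[42], dim=dim))
```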
https://en.wikipedia.org/wiki/Vector_quantization