Hierarchical Navigable Small World (HNSW) in PHP
Diving into the world of efficient similarity search, a developer recently open-sourced a PHP implementation of Hierarchical Navigable Small World (HNSW), sparking a lively discussion about its potential applications and usability. Commenters were impressed by the author's clear explanations, including fantasy-based examples that made the concept more accessible. While some raised practical questions about using HNSW with large datasets, the author reassured them that the implementation performs well with 1,000 documents and is agnostic to the AI model used to generate the embeddings. The conversation also touched on the importance of clear instructions for generating search vectors, highlighting the need for a more comprehensive guide to make the project more approachable.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
First comment: N/A
Peak period: 7 comments in 1-2h
Avg / period: 2.7
Based on 16 loaded comments
Key moments
- Story posted: Jan 1, 2026 at 10:48 AM EST
- First comment: Jan 1, 2026 at 10:48 AM EST (0s after posting)
- Peak activity: 7 comments in 1-2h, the hottest window of the conversation
- Latest activity: Jan 1, 2026 at 5:31 PM EST
It's tempting to use this in projects that use PHP.
Is it usable with a corpus of around 1,000 Markdown files of about 3 KB each? And with 10,000 files?
Can I also index PHP files so that searches include function and class names? Perhaps comments?
How much RAM and disk space would we be talking about?
And what about speed?
My first goal would be to index a PHP project and its documentation so that an LLM agent could perform semantic search using my MCP tool.
Since it only stores the vectors, the actual size of the Markdown document is irrelevant; you just need to handle the embedding and chunking phases carefully (you can use a parser to extract code snippets).
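The chunking phase mentioned above could look roughly like the following sketch, written in Python for brevity rather than the project's PHP. It assumes a simple strategy: Markdown headings provide context, fenced code blocks become their own chunks (so code snippets are embedded separately from prose), and prose is grouped into pieces of roughly `max_chars` characters. `chunk_markdown`, `Chunk`, and `max_chars` are hypothetical names chosen for this example, not part of the repository.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str     # "text" or "code"
    heading: str  # nearest preceding Markdown heading ("" if none)
    body: str


HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(source: str, max_chars: int = 1000) -> list[Chunk]:
    """Hypothetical chunker: split Markdown into prose and fenced-code chunks tagged with their heading."""
    chunks: list[Chunk] = []
    heading = ""
    text_lines: list[str] = []
    code_lines: list[str] = []
    fence = None  # "```" or "~~~" while inside a fenced block, else None

    def flush_text() -> None:
        # Group blank-line-separated paragraphs into pieces of roughly max_chars
        # (a single oversized paragraph is kept whole).
        joined = "\n".join(text_lines)
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", joined) if p.strip()]
        text_lines.clear()
        piece = ""
        for para in paragraphs:
            if piece and len(piece) + 2 + len(para) > max_chars:
                chunks.append(Chunk("text", heading, piece))
                piece = para
            else:
                piece = f"{piece}\n\n{para}" if piece else para
        if piece:
            chunks.append(Chunk("text", heading, piece))

    for line in source.splitlines():
        stripped = line.strip()
        if fence is not None:
            if stripped.startswith(fence):  # closing fence ends the code chunk
                chunks.append(Chunk("code", heading, "\n".join(code_lines)))
                code_lines.clear()
                fence = None
            else:
                code_lines.append(line)
        elif stripped.startswith(("```", "~~~")):  # opening fence
            flush_text()
            fence = stripped[:3]
        elif (match := HEADING.match(line)):
            flush_text()
            heading = match.group(1).strip()
        else:
            text_lines.append(line)

    if fence is not None and code_lines:  # unterminated fence: keep the code anyway
        chunks.append(Chunk("code", heading, "\n".join(code_lines)))
    flush_text()
    return chunks
```

Each resulting chunk would then be embedded on its own; keeping `heading` alongside the body is one way to give short code chunks some surrounding context.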
RAM isn't an issue because I aim for random data access as much as possible. This avoids saturating PHP, since it wasn't exactly built for this kind of workload.
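One possible reading of "random data access" is that vectors stay on disk and are read on demand by offset instead of all being held in memory. The Python sketch below assumes fixed-dimension float32 vectors stored back to back in a single binary file, so vector `i` starts at byte `i * dim * 4`. This layout and the `VectorFile` and `raw_size_bytes` names are hypothetical illustrations, not the project's actual storage format. The same arithmetic gives a rough answer to the disk-space question: 10,000 vectors of 1,536 float32 dimensions (the default size of some OpenAI embedding models, used here only as an example) take 61,440,000 bytes, about 61 MB, before any graph links or metadata.

```python
import os
import struct
import tempfile


class VectorFile:
    """Hypothetical store: fixed-width float32 vectors, vector i lives at byte i * dim * 4."""

    def __init__(self, path: str, dim: int):
        self.path = path
        self.dim = dim
        self.record = struct.Struct(f"<{dim}f")  # little-endian float32 x dim

    def append(self, vector: list[float]) -> int:
        """Write one vector at the end of the file and return its id (record index)."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} floats, got {len(vector)}")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(self.record.pack(*vector))
        return offset // self.record.size

    def get(self, vector_id: int) -> list[float]:
        """Read a single vector by seeking to its offset; nothing else is loaded."""
        with open(self.path, "rb") as f:
            f.seek(vector_id * self.record.size)
            data = f.read(self.record.size)
        if len(data) != self.record.size:
            raise IndexError(vector_id)
        return list(self.record.unpack(data))

    def __len__(self) -> int:
        if not os.path.exists(self.path):
            return 0
        return os.path.getsize(self.path) // self.record.size


def raw_size_bytes(n_vectors: int, dim: int) -> int:
    """Footprint of the raw float32 vectors alone (no graph links, no metadata)."""
    return n_vectors * dim * 4


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = VectorFile(os.path.join(tmp, "vectors.bin"), dim=4)
        first = store.append([0.1, 0.2, 0.3, 0.4])
        print(first, len(store), store.get(first))
    print(raw_size_bytes(10_000, 1536))  # 61440000 bytes, roughly 61 MB
```

Memory use then depends on how many vectors a search touches rather than on corpus size, which is the property the comment above seems to be aiming for; the trade-off is one disk read per vector visited.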
I'm glad you found the article and repo useful! If you use it and run into any problems, feel free to open an issue on GitHub.
HNSW is just the indexing algorithm. It doesn't care where the vectors come from. You can generate them using Ollama (locally), HuggingFace, Gemini...
As long as you feed it an array of floats, it will index it. The dependency on OpenAI is purely in the example code, not in the engine logic.
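To illustrate that the index only ever sees arrays of floats, here is a minimal Python sketch of exact (brute-force) cosine-similarity search, which is the result HNSW approximates, with the embedding step passed in as a plain callable. This is not HNSW and not the project's API: `FlatIndex`, `build`, and `toy_embed` are hypothetical names, and `toy_embed` is a hashed bag-of-words stand-in for a real provider such as Ollama or HuggingFace.

```python
import hashlib
import math
from typing import Callable, Sequence

Embedder = Callable[[str], list[float]]  # any provider: local model, HTTP API, ...


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FlatIndex:
    """Exact k-NN over stored vectors; it never knows which model produced them."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        if self.vectors and len(vector) != len(self.vectors[0]):
            raise ValueError("all vectors must share one dimension")
        self.ids.append(doc_id)
        self.vectors.append([float(x) for x in vector])

    def search(self, query: Sequence[float], k: int = 3) -> list[tuple[str, float]]:
        # Linear scan: compare the query with every stored vector.
        scored = [(doc_id, cosine(query, v)) for doc_id, v in zip(self.ids, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def build(docs: dict[str, str], embed: Embedder) -> FlatIndex:
    """Embed each document with whatever function is supplied and index the floats."""
    index = FlatIndex()
    for doc_id, text in docs.items():
        index.add(doc_id, embed(text))
    return index
```

Swapping `toy_embed` for a call to a real embedding provider changes nothing in `FlatIndex`, as long as queries and documents are embedded with the same model. HNSW replaces the linear scan in `search` with a walk over a layered proximity graph, so a query does not need to be compared with every stored vector.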
I love PHP, but realistically, I'll admit that most people interested in using PHP probably don't have the experience to know how to do such a thing offhand.
Programming is magic incantations and spells, after all. (And fighting against evil spirits and demons.)
Never heard this term before, but I like it.
https://centamori.com/index.php?slug=basics-of-web-developme...
Any more depth you could go into as far as generating embeddings goes would be wonderful. Especially if it uses more D&D analogies :)
Very good contribution.