Hierarchical Navigable Small World (HNSW) in PHP

Posted 8 days ago by centamiv | 79 points | 15 comments | centamori.com | Programming

Key topics: HNSW, PHP, Algorithm

A developer recently open-sourced a PHP implementation of Hierarchical Navigable Small World (HNSW), a graph-based index for approximate nearest-neighbor search, and the post drew a friendly discussion about its applications and usability. Commenters praised the author's clear explanations, including the fantasy-based examples that made the concept more accessible. Some raised practical questions about larger datasets; the author replied that the implementation performs well in their own test with 1,000 documents and that it is agnostic to the model used to generate embeddings. Several commenters also asked for clearer instructions on generating the vectors to index and search, and the author agreed to add examples to the README.

Snapshot generated from the HN discussion

Key moments

01 Story posted: Jan 1, 2026 at 10:48 AM EST (8 days ago)
02 First comment: Jan 1, 2026 at 10:48 AM EST (0s after posting)
03 Peak activity: 7 comments in the 1-2h window
04 Latest activity: Jan 1, 2026 at 5:31 PM EST (8 days ago)

Discussion (15 comments)

centamiv (Author)

8 days ago

5 replies

OP here. I wrote this implementation to deeply understand the mechanics behind HNSW (layers, entry points, neighbor selection) without relying on external libraries. While PHP isn't the typical choice for vector search engines, I found it surprisingly capable for this use case, especially with JIT enabled on PHP 8.x. It serves as a drop-in solution for PHP monoliths that need semantic search features without adding the complexity of a separate service like Qdrant or Pinecone. If you want to jump straight to the code, the open-source repo is here: https://github.com/centamiv/vektor

Happy to answer any questions about the implementation details!
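
For readers unfamiliar with the mechanics mentioned above (layers, entry points, neighbor selection), here is a minimal sketch of the layered greedy descent at the heart of an HNSW query. It is written in Python for brevity and is not taken from the vektor repo; the function names, the toy graph, and the Euclidean metric are all illustrative assumptions. It also keeps only a single best candidate per layer, whereas full HNSW tracks a list of candidates on the bottom layer.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search_layer(vectors, neighbors, query, entry):
    """Walk one layer: keep hopping to a closer neighbor until none is closer."""
    current = entry
    current_dist = euclidean(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors.get(current, []):
            d = euclidean(vectors[candidate], query)
            if d < current_dist:
                current, current_dist = candidate, d
                improved = True
    return current

def hnsw_search(vectors, layers, entry_point, query):
    """Descend from the sparse top layer to the dense bottom layer.

    layers[0] is the bottom layer (every node), layers[-1] is the top layer.
    The node found on one layer becomes the entry point for the layer below.
    """
    current = entry_point
    for layer in reversed(layers):
        current = greedy_search_layer(vectors, layer, query, current)
    return current

# Hypothetical toy data: six points on a line, two layers.
VECTORS = {i: (float(i), 0.0) for i in range(6)}
LAYERS = [
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]},  # bottom: chain
    {0: [3], 3: [0]},                                              # top: long jump
]
```

With this toy graph, `hnsw_search(VECTORS, LAYERS, entry_point=0, query=(4.2, 0.0))` uses the top layer to jump from node 0 to node 3, then refines on the bottom layer and returns node 4.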

hu3

8 days ago

1 reply

Great writeup. Thanks for taking the time to organise and share.

It's tempting to use this in projects that use PHP.

Is it usable with a corpus of, say, 1,000 Markdown files of 3 KB each? And 10,000 files?

Can I also index PHP files so that searches include function and class names? Perhaps comments?

How much RAM and disk space would we be talking about?

And the speed?

My first goal would be to index a PHP project and its documentation so that an LLM agent could perform semantic search using my MCP tool.

centamiv (Author)

8 days ago

I tested it myself with 1k documents (about 1.5M vectors) and performance is solid (a few milliseconds per search). I haven't run more aggressive benchmarks yet.

Since it only stores the vectors, the actual size of the Markdown document is irrelevant; you just need to handle the embedding and chunking phases carefully (you can use a parser to extract code snippets).

RAM isn't an issue because I aim for random data access as much as possible. This avoids saturating PHP, since it wasn't exactly built for this kind of workload.

I'm glad you found the article and repo useful! If you use it and run into any problems, feel free to open an issue on GitHub.
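
On the chunking phase mentioned above, a minimal sketch of one possible approach: split each Markdown file at headings, cap the chunk length, and embed each chunk separately. This is an illustration in Python, not the author's method; `chunk_markdown` and the `max_chars` default are hypothetical, and the heading check is naive (a `#` comment line inside a fenced code block would also start a new chunk).

```python
def chunk_markdown(text, max_chars=800):
    """Split Markdown into chunks at headings, then cap each chunk's length."""
    sections, current = [], []
    for line in text.splitlines():
        # A heading starts a new section, unless nothing has been collected yet.
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks = []
    for section in sections:
        if not section:
            continue
        # Hard cap: slice long sections into pieces of at most max_chars.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```

Each returned chunk would then be sent to whichever embedding model you use, and the resulting vector stored in the index alongside a pointer back to the source file.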

Random09

8 days ago

1 reply

The only small thing you forgot to mention: it requires the use of AI. OpenAI, to be specific. I got baited.

centamiv (Author)

8 days ago

1 reply

Apologies if it felt that way! I used OpenAI in the examples just because it's the quickest 'Hello World' for embeddings right now, but the library itself is completely agnostic.

HNSW is just the indexing algorithm. It doesn't care where the vectors come from. You can generate them using Ollama (locally), Hugging Face, Gemini...

As long as you feed it an array of floats, it will index it. The dependency on OpenAI is purely in the example code, not in the engine logic.
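
To illustrate the "array of floats" point, here is a minimal sketch of the exact brute-force search that an HNSW index approximates. Nothing in it depends on where the vectors came from. It is written in Python and is not the library's API; the function names and toy vectors are hypothetical, and cosine similarity is an assumption, since the thread does not say which metric the library uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as similar to nothing
    return dot / (norm_a * norm_b)

def brute_force_search(index, query, k=2):
    """index maps an id to a list of floats; returns the k most similar ids."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(index[doc_id], query),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
INDEX = {
    "wizard": [0.9, 0.1, 0.0],
    "sorcerer": [0.8, 0.2, 0.1],
    "sword": [0.0, 0.1, 0.9],
}
```

Calling `brute_force_search(INDEX, [1.0, 0.0, 0.0])` returns `["wizard", "sorcerer"]`. This scan compares the query against every stored vector, which is the cost an HNSW graph is designed to avoid on larger collections.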

devmor

8 days ago

1 reply

I think you'd get a lot more people interested in trying your project out if you included steps on how to generate vectors for the search as a document.

I love PHP, but I will realistically admit that most people interested in using PHP probably don't have the experience to know how to do such a thing offhand.

centamiv (Author)

8 days ago

You are absolutely right. I will update the README with some examples, thanks for the feedback!

lukan

8 days ago

1 reply

Thanks a lot, I liked the fantasy based examples to explain the concept.

Programming is magic incantations and spells after all. (And fighting against evil spirits and demons.)

centamiv (Author)

8 days ago

I'm really glad you liked the article! Thanks so much for reading the previous one too, I really appreciate it.

hilti

8 days ago

1 reply

Great article! I also read your other post and love it! This is exactly my thinking: Locality of Behavior (LoB)

Never heard this term before, but I like it.

https://centamori.com/index.php?slug=basics-of-web-developme...

centamiv (Author)

8 days ago

Thanks for checking out the other posts too! I wasn't familiar with the term 'Locality of Behavior' until recently, but it perfectly captures what I strive for: readability and simplicity.

chuckadams

8 days ago

I went to the first article and really loved the nerd-accessible analogies, but I was a little crestfallen when it came to reading how embeddings were generated, which boiled down to "submit it to OpenAI and get a vector back". I get that it's an awesomely gnarly process, but I felt like I was missing out on learning some first principles.

Any more depth you could go into as far as generating embeddings goes would be wonderful. Especially if it uses more D&D analogies :)

fithisux

8 days ago

1 reply

It makes perfect sense to implement it in a high level language that allows understandability.

Very good contribution.

centamiv (Author)

8 days ago

Thank you! That was exactly the goal. Modern PHP turned out to be surprisingly expressive for this kind of 'executable pseudocode'. Glad you appreciated it!

rvnx

8 days ago

1 reply

Cool blog post, smart guy, very thoughtful and not a copy-paste of Python code like 99% of folks. Nice to see

centamiv (Author)

8 days ago

Thank you, really appreciate that

ID: 46454968 | Type: story | Last synced: 1/1/2026, 11:45:26 PM