Wikipedia as a Graph
Key topics
Commenters are exploring Wikigrapher, a tool that visualizes the links between Wikipedia pages as a graph, tracing connections between seemingly unrelated topics. Some, like Retr0id and axus, are debugging the tool, discovering that it requires exact page URLs and can be thrown off by minor discrepancies. Others are testing its limits: someone7x marvels at the "torturous hops" between Belle's dad Maurice and Emperor Maurice, and rzzzt notes that it takes 12 steps to connect Kevin Bacon to Henry Kissinger. Along the way, users point out related projects such as Wikidata and DBpedia, which extract Wikipedia data into graph formats.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 16m after posting
- Peak period: 23 comments in 0-3h
- Avg / period: 6.3
- Based on 57 loaded comments
Key moments
- Story posted: Aug 29, 2025 at 12:19 PM EDT (4 months ago)
- First comment: Aug 29, 2025 at 12:36 PM EDT (16m after posting)
- Peak activity: 23 comments in 0-3h (hottest window of the conversation)
- Latest activity: Aug 30, 2025 at 11:44 PM EDT (4 months ago)
Yup, checks out.
Love -> Time (magazine) -> Henry Kissinger
https://www.sixdegreesofwikipedia.com/?source=Love&target=He...
I thought it would be a few trivial steps to reach the Emperor Maurice from Belle’s dad Maurice, but the best I could do was 5 torturous hops between List of Beauty and the Beast Characters and the Maurice disambiguation page.
https://www.sixdegreesofwikipedia.com/?source=List+of+Disney...
Thanks for sharing this
Henry_Kissinger
https://m.wikidata.org/wiki/Wikidata:Main_Page
https://github.com/dbpedia
Here's the DBpedia page about DBpedia: https://dbpedia.org/resource/DBpedia which is extracted from the Wikipedia page about DBpedia: https://en.wikipedia.org/wiki/DBpedia
Interesting RDFS properties which describe relations between RDFS classes and class instances in the DBpedia Wikipedia-extraction datasets: prov:wasDerivedFrom, owl:sameAs, dbo:wikiPageRedirects, dbo:wikiPageWikiLink
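For anyone who wants to poke at those properties directly, here is a minimal sketch. It assumes the public DBpedia SPARQL endpoint and the SPARQLWrapper package (neither is mentioned above) and lists a few dbo:wikiPageWikiLink edges of the DBpedia resource itself:

```python
# Minimal sketch: query the public DBpedia SPARQL endpoint for the
# wikiPageWikiLink edges of one resource. Assumes the SPARQLWrapper
# package (pip install sparqlwrapper); properties as in the comment above.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?target WHERE {
        dbr:DBpedia dbo:wikiPageWikiLink ?target .
    } LIMIT 20
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["target"]["value"])  # pages the DBpedia article links to
```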
The Linked Open Data Cloud (LOD cloud): https://lod-cloud.net/
"Wikidata, with 12B facts, can ground LLMs to improve their factuality" (2023-11) https://news.ycombinator.com/item?id=38304290#38309408
/? knowledge graph llm: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=kno...
/? site:github.com inurl:awesome knowledge graph llm: https://www.google.com/search?q=site%253Agithub.com+inurl%25...
To train the robots as well
If you exclude the category links at the bottom that list all the recipients, there would still be a connection, but it would include the extra hop that makes the relationship clearer on the graph (Titanic -> Caruso -> Grammy Lifetime Achievement Award -> David Bowie).
Otherwise, this is a fun little tool to play around with. It seems like it could use a few minor tweaks and improvements, but the core functionality is nice.
Sounds like a perfectly good connection to me, but "exclude categories" could still be a neat feature for exploring more indirect linkage. Not sure it would help in this case though -- is that actually a category page?
What the parent commenter is referring to is actually called a Navbox (https://en.wikipedia.org/wiki/Wikipedia:Navigation_template). Like @chatmasta, I think it would be interesting to label those types of links distinctly and allow excluding them.
Or perhaps alternatively, exclude the contents of those navigation templates, but allow using them as an additional node: David_Bowie -> Template:Grammy_Lifetime_Achievement_Award -> Enrico_Caruso. (In this case, that is redundant with the main non-template Grammy_Lifetime_Achievement_Award page.)
It's orthogonal to art.
Congrats to the dev regardless, if you’re in here! Looks great, love the front end especially. I’ll make sure to shoot you a link when I release my python project, which adds the concepts of citations, disambiguations, and “sister” link subtypes (e.g. “main article”, “see also”, etc), along with a few other things. It doesn’t run anywhere close to as fast as yours, tho!! 2h for processing a wiki dump is damn impressive.
Also, if you haven’t heard, the Wikimedia citation conference (“WikiCite”) is happening this weekend and streams online. Might be worth shooting this project over to them, they’d love it! https://meta.m.wikimedia.org/wiki/WikiCite_2025
If entries have a Wikipedia article, it'll be linked in the Wikidata entry. So this would let you describe the relation an article link represents, given that the two articles share an edge in Wikidata!
For example: https://www.wikidata.org/wiki/Q513 has an edge for "named after: George Everest", whose article is linked in the Everest article. If you could match those up, I think that could add some interesting context to the graph!
Everest -- links to (named after) --> George Everest
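A minimal sketch of that matching idea, assuming only the public Special:EntityData JSON endpoint and the requests package: read the P138 ("named after") claim on Q513 and resolve the target's English Wikipedia title, which is the label you could attach to the Everest -> George Everest edge.

```python
# Minimal sketch: read the "named after" (P138) claim on Mount Everest (Q513)
# from Wikidata and resolve the target entity's English Wikipedia article title.
# Assumes only the public Special:EntityData JSON endpoint and requests.
import requests

def entity(qid: str) -> dict:
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    return requests.get(url, timeout=30).json()["entities"][qid]

everest = entity("Q513")
for claim in everest["claims"].get("P138", []):           # P138 = "named after"
    target_qid = claim["mainsnak"]["datavalue"]["value"]["id"]
    target = entity(target_qid)
    title = target["sitelinks"]["enwiki"]["title"]         # linked Wikipedia article
    print(f"Everest -- named after --> {title} ({target_qid})")
```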
I'm guessing you know this, but for the passerby curious about Wikipedia drama:
Wikidata was founded back in 2012 after Google bought & closed its predecessor[2] to make the now-famous "Google Knowledge Graph". It was continuing a wave of interest in knowledge graphs going back to GOFAI (the "neat"[3] approach to AI), most famously advanced by Lenat's Cyc[4] as a path to intuitive algorithms. We obviously lost that particular war to the "scruffies" for good in 2022, but the well-known problems with LLMs highlight exactly why certain, structured, efficient knowledge graphs are also needed.
The aforementioned drama is that the project to integrate Wikidata into Wikipedia's citations has basically been on pause since 2017 after a lot of arguing[5], and this weekend's scheduled discussion[6] seems passive at best. This comes simply from the fact that the "editors" of Wikipedia--the people who spend countless hours researching content for free following strict rules--don't really care about AI paradigms! Specifically, they find the concept of citing the id of a work as opposed to writing out the whole citation dangerous.
Still, Wikidata is the "fastest growing wiki project" and backs a ton of Wikipedia stuff behind the scenes, such as fancy templates for the infoboxes on the top-right of pages. We've only got 1.65B items compared to Google's AI-curated 500B facts, but I have faith that 2026 will be the year of Wikidata regardless!
After all, is a knowledge base curated with scruffy NLP models until it's incomprehensibly-big still neat? ;)
[1] https://wikimediafoundation.org/what-we-do/wikimedia-project...
[2] https://en.wikipedia.org/wiki/Freebase_(database)
[3] [WARNING: 500KB PDF] https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...
[4] https://en.wikipedia.org/wiki/Cyc
[5] https://en.wikipedia.org/wiki/Wikipedia:Templates_for_discus...
[6] https://meta.wikimedia.org/wiki/WikiCite_2025/Proposals#Cite...
One of our projects in algorithms/data structures was to do a BFS on the Wikipedia dump. In 2007.
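For the passerby: the search itself is only a few lines once you have an adjacency map of page titles to outgoing link titles; building that map from the dump is the real work and is assumed here.

```python
# Minimal sketch: shortest link path via BFS over a precomputed adjacency map
# (page title -> set of outgoing link titles). Building `links` from the dump
# is the hard part; this only shows the search.
from collections import deque

def shortest_path(links: dict[str, set[str]], source: str, target: str):
    parents = {source: None}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:          # walk parents back to the source
                path.append(page)
                page = parents[page]
            return path[::-1]
        for nxt in links.get(page, ()):
            if nxt not in parents:
                parents[nxt] = page
                queue.append(nxt)
    return None  # no path between the two pages
```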
I have to question its accuracy.
From Jello I followed this route:
Jell-O -> All caps -> Typography -> Typesetting -> Written Language -> Language -> Communication -> Information -> Abstraction -> Rule of inference -> Premise -> Proposition -> Philosophy of Language -> Philosophy
Jell-O -> Brand -> Business -> Trade -> Goods and services -> Tangibility -> Perception -> Sense -> Biological system -> Biological network inference -> Inference -> Logical reasoning -> Mind -> Thought -> Cognition -> Mental state -> Mind (Loop!)
But only if you don't count the links in the etymologies; otherwise "politics" kicks you out to "Ancient Greek" instead of to "decision-making".
This would be a directed acyclic graph, like schema.org's.
https://m.wikidata.org/wiki/Property:P31
So we would take in all items with that property to make the graph, although we might have to deal with multiple roots.
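A minimal sketch of pulling one slice of that DAG, assuming only the public Wikidata SPARQL endpoint and the requests package: follow P31 (instance of) and then P279* (subclass of) upward from a single item.

```python
# Minimal sketch: the classes above one item, via P31 (instance of) followed by
# P279* (subclass of) on the Wikidata SPARQL endpoint, i.e. one slice of the DAG
# the comment describes. Assumes only the public endpoint and requests.
import requests

QUERY = """
SELECT ?class ?classLabel WHERE {
  wd:Q513 wdt:P31/wdt:P279* ?class .                     # Q513 = Mount Everest
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wiki-graph-sketch/0.1"},  # WDQS asks for a User-Agent
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print(row["classLabel"]["value"])  # e.g. mountain, landform, ...
```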
There are also other interesting linking relations:
It has been around for at least 15 years! https://news.ycombinator.com/item?id=1728592
It's a bit hard to read, though, with the text and lines intersecting each other; maybe you could render the text on a white background so it appears on top? There are also a lot of redundant "link_to" labels on the lines; maybe only show those when you hover over them? You could indicate different types of edges through subtle colors, thicknesses, or styles (e.g., dotted).
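A rough sketch of that styling idea, assuming networkx and matplotlib (and a few hypothetical edge kinds, since the tool's own types aren't known here): each edge kind gets its own color and line style, and labels sit on a white box so they stay readable where lines cross.

```python
# Rough sketch: draw each edge kind with its own color/style instead of
# repeating a "link_to" label, and put node labels on a white background.
# The edge kinds below are hypothetical examples, not the tool's schema.
import matplotlib.pyplot as plt
import networkx as nx

G = nx.DiGraph()
G.add_edge("Jell-O", "Brand", kind="link")
G.add_edge("Jello", "Jell-O", kind="redirect")
G.add_edge("Jell-O", "Category:Desserts", kind="category")

styles = {"link": ("solid", "black"),
          "redirect": ("dashed", "gray"),
          "category": ("dotted", "tab:blue")}

pos = nx.spring_layout(G, seed=42)
for kind, (style, color) in styles.items():
    edges = [(u, v) for u, v, d in G.edges(data=True) if d["kind"] == kind]
    nx.draw_networkx_edges(G, pos, edgelist=edges, style=style, edge_color=color)
# white box behind labels so they appear on top of crossing lines
nx.draw_networkx_labels(G, pos, bbox=dict(facecolor="white", edgecolor="none"))
plt.axis("off")
plt.show()
```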
For context: https://blog.jxmo.io/p/there-is-only-one-model
I made this a while back for more freeform browsing: https://wikijumps.com
Would love to integrate some of that relationship data
https://github.com/vasturiano/3d-force-graph
https://github.com/neuml/txtai/blob/master/examples/58_Advan...
If anyone is looking to start similar projects, I open-sourced a library to convert the wikipedia dump into a simpler format, along with a bunch of parsers: https://github.com/Zulko/wiki_dump_extractor . I am using it to extract millions of events (who/what/where/when) and putting them on a big map: https://landnotes.org/?location=u07ffpb1-6&date=1548&strictD...
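Not that library's API, but as a generic illustration of the kind of first pass such extractors build on, here is a sketch that streams page titles and raw wikitext out of a decompressed MediaWiki XML dump using only the standard library (the namespace URI is an assumption and varies between dump versions):

```python
# Generic illustration (not wiki_dump_extractor's API): stream page titles and
# raw wikitext out of a decompressed MediaWiki XML dump with the standard library.
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.11/}"  # namespace varies by dump version

def iter_pages(path):
    """Yield (title, wikitext) for each <page> element, streaming the file."""
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            yield title, text
            elem.clear()  # free memory as we go

for title, text in iter_pages("enwiki-latest-pages-articles.xml"):
    print(title, len(text))
    break  # just show the first page
```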