Wikipedia as a Graph
Key topics
Commenters are exploring Wikigrapher, a tool that visualizes the links between Wikipedia pages as a graph, tracing connections between seemingly unrelated topics. Some, like Retr0id and axus, are debugging the tool, discovering that it requires exact page URLs and can be thrown off by minor discrepancies. Others are testing its limits: someone7x marvels at the "torturous hops" between Belle's dad Maurice and Emperor Maurice, and rzzzt notes that it takes 12 steps to connect Kevin Bacon to Henry Kissinger. Along the way, users point out related projects such as Wikidata and DBpedia, which extract Wikipedia data into graph formats.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 16m after posting
- Peak period: 23 comments in 0-3h
- Avg / period: 6.3
- Based on 57 loaded comments
Key moments
- Story posted: Aug 29, 2025 at 12:19 PM EDT (4 months ago)
- First comment: Aug 29, 2025 at 12:36 PM EDT (16m after posting)
- Peak activity: 23 comments in 0-3h (hottest window of the conversation)
- Latest activity: Aug 30, 2025 at 11:44 PM EDT (4 months ago)
Yup, checks out.
Love -> Time (magazine) -> Henry Kissinger
https://www.sixdegreesofwikipedia.com/?source=Love&target=He...
I thought it would be a few trivial steps to reach the Emperor Maurice from Belle’s dad Maurice, but the best I could do was 5 torturous hops between List of Beauty and the Beast Characters and the Maurice disambiguation page.
https://www.sixdegreesofwikipedia.com/?source=List+of+Disney...
Thanks for sharing this
Henry_Kissinger
https://m.wikidata.org/wiki/Wikidata:Main_Page
https://github.com/dbpedia
Here's the DBpedia page about DBpedia: https://dbpedia.org/resource/DBpedia which is extracted from the Wikipedia page about DBpedia: https://en.wikipedia.org/wiki/DBpedia
Interesting RDFS properties which describe relations between RDFS classes and class instances in the DBpedia Wikipedia-extraction datasets: prov:wasDerivedFrom, owl:sameAs, dbo:wikiPageRedirects, dbo:wikiPageWikiLink
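For anyone who wants to poke at those properties directly, here is a minimal sketch. It assumes the public DBpedia SPARQL endpoint and the SPARQLWrapper package (neither is mentioned above) and lists a few dbo:wikiPageWikiLink edges of the DBpedia resource itself:

```python
# Minimal sketch: query the public DBpedia SPARQL endpoint for the
# wikiPageWikiLink edges of one resource. Assumes the SPARQLWrapper
# package (pip install sparqlwrapper); properties as in the comment above.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?target WHERE {
        dbr:DBpedia dbo:wikiPageWikiLink ?target .
    } LIMIT 20
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["target"]["value"])  # pages the DBpedia article links to
```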
The Linked Open Data Cloud (LOD cloud): https://lod-cloud.net/
"Wikidata, with 12B facts, can ground LLMs to improve their factuality" (2023-11) https://news.ycombinator.com/item?id=38304290#38309408
/? knowledge graph llm: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=kno...
/? site:github.com inurl:awesome knowledge graph llm: https://www.google.com/search?q=site%253Agithub.com+inurl%25...
To train the robots as well
If you exclude the category links at the bottom that list all the recipients, there would still be a connection, but it would include the extra hop that makes the relationship clearer on the graph (Titanic -> Caruso -> Grammy Lifetime Achievement Award -> David Bowie).
Otherwise, this is a fun little tool to play around with. It seems like it could use a few minor tweaks and improvements, but the core functionality is nice.
Sounds like a perfectly good connection to me, but "exclude categories" could still be a neat feature for exploring more indirect linkage. Not sure it would help in this case though -- is that actually a category page?
What the parent commenter is referring to is actually called a Navbox (https://en.wikipedia.org/wiki/Wikipedia:Navigation_template). Like @chatmasta, I think it would be interesting to label those types of links distinctly and allow excluding them.
Or perhaps alternatively, exclude the contents of those navigation templates, but allow using them as an additional node: David_Bowie -> Template:Grammy_Lifetime_Achievement_Award -> Enrico_Caruso. (In this case, that is redundant with the main non-template Grammy_Lifetime_Achievement_Award page.)
It's orthogonal to art.
Congrats to the dev regardless, if you’re in here! Looks great, love the front end especially. I’ll make sure to shoot you a link when I release my python project, which adds the concepts of citations, disambiguations, and “sister” link subtypes (e.g. “main article”, “see also”, etc), along with a few other things. It doesn’t run anywhere close to as fast as yours, tho!! 2h for processing a wiki dump is damn impressive.
Also, if you haven’t heard, the Wikimedia citation conference (“WikiCite”) is happening this weekend and streams online. Might be worth shooting this project over to them, they’d love it! https://meta.m.wikimedia.org/wiki/WikiCite_2025
If entries have a Wikipedia article, it'll be linked in the Wikidata entry. So this would let you describe the relation an article link represents, given that the two articles share an edge in Wikidata!
For example: https://www.wikidata.org/wiki/Q513 has an edge for "named after: George Everest", whose article is linked in the Everest article. If you could match those up, I think that could add some interesting context to the graph!
Everest -- links to (named after) --> George Everest
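A minimal sketch of that matching idea, assuming only the public Special:EntityData JSON endpoint and the requests package: read the P138 ("named after") claim on Q513 and resolve the target's English Wikipedia title, which is the label you could attach to the Everest -> George Everest edge.

```python
# Minimal sketch: read the "named after" (P138) claim on Mount Everest (Q513)
# from Wikidata and resolve the target entity's English Wikipedia article title.
# Assumes only the public Special:EntityData JSON endpoint and requests.
import requests

def entity(qid: str) -> dict:
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    return requests.get(url, timeout=30).json()["entities"][qid]

everest = entity("Q513")
for claim in everest["claims"].get("P138", []):           # P138 = "named after"
    target_qid = claim["mainsnak"]["datavalue"]["value"]["id"]
    target = entity(target_qid)
    title = target["sitelinks"]["enwiki"]["title"]         # linked Wikipedia article
    print(f"Everest -- named after --> {title} ({target_qid})")
```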
I'm guessing you know this, but for the passerby curious about Wikipedia drama:
Wikidata was founded back in 2012 after Google bought & closed its predecessor[2] to make the now-famous "Google Knowledge Graph". It was continuing a wave of interest in knowledge graphs going back to GOFAI (the "neat"[3] approach to AI), most famously advanced by Lenat's Cyc[4] as a path to intuitive algorithms. We obviously lost that particular war to the "scruffies" for good in 2022, but the well-known problems with LLMs highlight exactly why certain, structured, efficient knowledge graphs are also needed.
The aforementioned drama is that the project to integrate Wikidata into Wikipedia's citations has basically been on pause since 2017 after a lot of arguing[5], and this weekend's scheduled discussion[6] seems passive at best. This comes simply from the fact that the "editors" of Wikipedia--the people who spend countless hours researching content for free following strict rules--don't really care about AI paradigms! Specifically, they find the concept of citing the id of a work as opposed to writing out the whole citation dangerous.
Still, Wikidata is the "fastest growing wiki project" and backs a ton of Wikipedia stuff behind the scenes, such as fancy templates for the infoboxes on the top-right of pages. We've only got 1.65B items compared to Google's AI-curated 500B facts, but I have faith that 2026 will be the year of Wikidata regardless!
After all, is a knowledge base curated with scruffy NLP models until it's incomprehensibly-big still neat? ;)
[1] https://wikimediafoundation.org/what-we-do/wikimedia-project...
[2] https://en.wikipedia.org/wiki/Freebase_(database)
[3] [WARNING: 500KB PDF] https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...
[4] https://en.wikipedia.org/wiki/Cyc
[5] https://en.wikipedia.org/wiki/Wikipedia:Templates_for_discus...
[6] https://meta.wikimedia.org/wiki/WikiCite_2025/Proposals#Cite...
One of our projects in algorithms/data structures was to do a BFS on the Wikipedia dump. In 2007.
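For the passerby: the search itself is only a few lines once you have an adjacency map of page titles to outgoing link titles; building that map from the dump is the real work and is assumed here.

```python
# Minimal sketch: shortest link path via BFS over a precomputed adjacency map
# (page title -> set of outgoing link titles). Building `links` from the dump
# is the hard part; this only shows the search.
from collections import deque

def shortest_path(links: dict[str, set[str]], source: str, target: str):
    parents = {source: None}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:          # walk parents back to the source
                path.append(page)
                page = parents[page]
            return path[::-1]
        for nxt in links.get(page, ()):
            if nxt not in parents:
                parents[nxt] = page
                queue.append(nxt)
    return None  # no path between the two pages
```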
I have to question its accuracy.
From Jello I followed this route:
Jell-O -> All caps -> Typography -> Typesetting -> Written Language -> Language -> Communication -> Information -> Abstraction -> Rule of inference -> Premise -> Proposition -> Philosophy of Language -> Philosophy
Jell-O -> Brand -> Business -> Trade -> Goods and services -> Tangibility -> Perception -> Sense -> Biological system -> Biological network inference -> Inference -> Logical reasoning -> Mind -> Thought -> Cognition -> Mental state -> Mind (Loop!)
But only if you don't count the links in the etymologies; otherwise "politics" kicks you out to "Ancient Greek" instead of to "decision-making".
This would be a directed acyclic graph, like schema.org's.
https://m.wikidata.org/wiki/Property:P31
So we would take in all items with that property to make the graph, although we might have to deal with multiple roots.
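A minimal sketch of pulling one slice of that DAG, assuming only the public Wikidata SPARQL endpoint and the requests package: follow P31 (instance of) and then P279* (subclass of) upward from a single item.

```python
# Minimal sketch: the classes above one item, via P31 (instance of) followed by
# P279* (subclass of) on the Wikidata SPARQL endpoint, i.e. one slice of the DAG
# the comment describes. Assumes only the public endpoint and requests.
import requests

QUERY = """
SELECT ?class ?classLabel WHERE {
  wd:Q513 wdt:P31/wdt:P279* ?class .                     # Q513 = Mount Everest
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wiki-graph-sketch/0.1"},  # WDQS asks for a User-Agent
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print(row["classLabel"]["value"])  # e.g. mountain, landform, ...
```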
There are also other interesting linking relations:
It has been around for at least 15 years! https://news.ycombinator.com/item?id=1728592
It's a bit hard to read, though, with the text and lines intersecting each other; maybe you could render the text on a white background so it appears on top? There are also a lot of redundant "link_to" labels on the lines; maybe only show those when you hover over them? You could indicate different types of edges through subtle colors, thicknesses, or styles (e.g., dotted).
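A rough sketch of that styling idea, assuming networkx and matplotlib (and a few hypothetical edge kinds, since the tool's own types aren't known here): each edge kind gets its own color and line style, and labels sit on a white box so they stay readable where lines cross.

```python
# Rough sketch: draw each edge kind with its own color/style instead of
# repeating a "link_to" label, and put node labels on a white background.
# The edge kinds below are hypothetical examples, not the tool's schema.
import matplotlib.pyplot as plt
import networkx as nx

G = nx.DiGraph()
G.add_edge("Jell-O", "Brand", kind="link")
G.add_edge("Jello", "Jell-O", kind="redirect")
G.add_edge("Jell-O", "Category:Desserts", kind="category")

styles = {"link": ("solid", "black"),
          "redirect": ("dashed", "gray"),
          "category": ("dotted", "tab:blue")}

pos = nx.spring_layout(G, seed=42)
for kind, (style, color) in styles.items():
    edges = [(u, v) for u, v, d in G.edges(data=True) if d["kind"] == kind]
    nx.draw_networkx_edges(G, pos, edgelist=edges, style=style, edge_color=color)
# white box behind labels so they appear on top of crossing lines
nx.draw_networkx_labels(G, pos, bbox=dict(facecolor="white", edgecolor="none"))
plt.axis("off")
plt.show()
```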
For context: https://blog.jxmo.io/p/there-is-only-one-model
I made this a while back for more freeform browsing: https://wikijumps.com
Would love to integrate some of that relationship data
https://github.com/vasturiano/3d-force-graph
https://github.com/neuml/txtai/blob/master/examples/58_Advan...
If anyone is looking to start similar projects, I open-sourced a library to convert the wikipedia dump into a simpler format, along with a bunch of parsers: https://github.com/Zulko/wiki_dump_extractor . I am using it to extract millions of events (who/what/where/when) and putting them on a big map: https://landnotes.org/?location=u07ffpb1-6&date=1548&strictD...
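Not that library's API, but as a generic illustration of the kind of first pass such extractors build on, here is a sketch that streams page titles and raw wikitext out of a decompressed MediaWiki XML dump using only the standard library (the namespace URI is an assumption and varies between dump versions):

```python
# Generic illustration (not wiki_dump_extractor's API): stream page titles and
# raw wikitext out of a decompressed MediaWiki XML dump with the standard library.
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.11/}"  # namespace varies by dump version

def iter_pages(path):
    """Yield (title, wikitext) for each <page> element, streaming the file."""
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            yield title, text
            elem.clear()  # free memory as we go

for title, text in iter_pages("enwiki-latest-pages-articles.xml"):
    print(title, len(text))
    break  # just show the first page
```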