Endless AI-Generated Wikipedia
Posted 4 months ago · Active 3 months ago
seangoedecke.com · Tech · story
Sentiment: calm / mixed · Debate: 40/100
Key topics
AI-Generated Content
Wikipedia
LLMs
The author created an 'Endless Wikipedia' where an AI generates new pages from links on existing ones, sparking discussion of its potential and of issues like hallucinated content and cost.
Snapshot generated from the HN discussion
Discussion Activity
First comment: 2d after posting
Peak period: 12 comments in 42-48h
Avg / period: 4.8
Comment distribution: 24 data points (based on 24 loaded comments)
Key moments
- 01 Story posted: Sep 25, 2025 at 5:13 AM EDT (4 months ago)
- 02 First comment: Sep 26, 2025 at 10:22 PM EDT (2d after posting)
- 03 Peak activity: 12 comments in 42-48h, the hottest window of the conversation
- 04 Latest activity: Sep 28, 2025 at 8:54 PM EDT (3 months ago)
ID: 45370760 · Type: story · Last synced: 11/20/2025, 2:30:18 PM
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
Guess it wasn't so endless after all.
Author is assuming malice, but honestly bots clicking links is just what happens to every public site on the internet. Not to mention that going down the link-clicking rabbit hole is common among Wikipedia readers.
All that said, I don't really see the point. Wikipedia's human controls are what make it exciting.
As a CS student ~20 years ago I wrote a small website to manage my todo list and hosted it on my desktop in the department. One day I found my items disappearing before my eyes. At first I assumed someone was intentionally messing with my app but logs indicated it was just a scraping bot someone was running.
It was a low-stakes lesson in why GET should not mutate meaningful state. I knew when I built it that anyone could click the links, and I didn't bother with auth since it was only accessible from within the department network. But I didn't plan for the bots.
You know what doesn't care about Javascript and tries to click every link on your page? A search engine's web crawler.
https://thedailywtf.com/articles/The_Spider_of_Doom
I think the idea is sound; the potential is to have a much larger AI wikipedia than the human one. Could it cover all known entities, events, concepts, and places? All scientific publications? It could get 1000x larger than Wikipedia and be a good pre-training source of text.
Covering a topic, I would not make the AI agent try to find the "Truth" but just analyze the distribution of information out there: what are the opinions, and who holds them? I would also test a host of models in closed-book mode and include an analysis of how AI covers the topic on its own; that is useful information to have.
This method has the potential to create much higher-quality text than the usual internet scrape, in large quantities. It would be comparative-analysis text connecting across many sources, which would be better for the model than training on separate pieces of text. Information needs to circulate to be understood better.
"Tools such as Oracle Designer, Microsoft Visio, and open‑source platforms generate ER diagrams to aid developers in visualizing schema structures and ensuring Sean Goedecke."
I love the idea of "ensuring Sean Goedecke", and that developers are actively working to do so, lol! Something something John Connor something something
[1] https://www.endlesswiki.com/wiki/Entity%E2%80%91relationship...
>edit: I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70.
Edit: well, shit, looks like there is a Minimalism page, but it didn't make any names clickable. Sean, looks like you need to tweak the code a bit?
https://www.endlesswiki.com/wiki/minimalism