I Failed to Recreate the 1996 Space Jam Website with Claude

Posted26 days agoActive24 days ago

thecr0w

548 points

456 comments

j0nah.comTech DiscussionstoryHigh profile

informativeneutral

Debate

20/100

Web DevelopmentCreative_toolsWebsite Management

Key topics

Web Development

Creative_tools

Website Management

The quest to recreate the nostalgic 1996 Space Jam website using AI model Claude has sparked a lively debate about the limitations of modern AI. The original poster, thecr0w, was stumped by Claude's inability to accurately replicate the site's table-based layout, prompting commenters to chime in with insights on the challenges of working with early web design techniques and the constraints of AI models. Some suggested that Claude's struggles stemmed from its lack of training data on outdated specs and its weakness in processing visual information, with one commenter noting that the model's decomposition of images into semantic vector spaces destroys pixel-level information. As the discussion unfolded, a consensus emerged that understanding the inner workings of AI models is crucial for effective "prompt engineering," and thecr0w's experiment has shed light on the complexities of working with these tools.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

36m

Peak period

117

0-6h

Avg / period

22.9

Comment distribution160 data points

Loading chart...

Based on 160 loaded comments

Key moments

01Story posted
Dec 7, 2025 at 12:18 PM EST
26 days ago
Step 01
02First comment
Dec 7, 2025 at 12:54 PM EST
36m after posting
Step 02
03Peak activity
117 comments in 0-6h
Hottest window of the conversation
Step 03
04Latest activity
Dec 9, 2025 at 4:27 PM EST
24 days ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (456 comments)

Showing 160 comments of 456

stonecharioteer

26 days ago

2 replies

I'm curious. Did you ask it to use tables and no CSS?

In 1996, We had only css1. Ask it to use tables to do this, perhaps.

lagniappe

26 days ago

1 reply

We actually had a myriad of competing specs. CSS proper wasn't released until december 1996.

thecr0wAuthor

26 days ago

2 replies

Any names for the competing specs? Maybe i could try re-prompting with that direction.

lagniappe

26 days ago

2 replies

Models are trained with content scraped from the net, for the most part. The availability of content pertaining to those specs is almost nil, and of no SEO value. Ergo, models for the most part will only have a cursory knowledge of a spec that your browser will never be able to parse because that isn't the spec that won.

Nonetheless, here is a link to a list of the specs you asked for: https://www.w3.org/Style/History/Overview.en.html

boie0025

26 days ago

1 reply

Thanks for sharing that. I read through a lot of this. Interesting to read those perspectives in the context of today.

lagniappe

26 days ago

Much obliged. Have a good weekend. Your new gray hairs are en route :)

mxfh

24 days ago

Almost nil?

Websites that started before 2000 tend to stick around and comparably quite well archived.

What has model training to do with SEO, that's outright detrimental to it for the most part.

This is the spec it can guarantee you that's in every large language model thats actually large.

https://www.rfc-editor.org/rfc/rfc1866

https://www.w3.org/TR/2018/SPSD-html32-20180315/

Why even bring up CSS this was before that.

You can just also learn with the knowledge of 1996

Selfhtml exits it pretty easy to limit the scope of authoring language to a given HTML version and target browser. Your LLM should have no problem with german.

https://wiki.selfhtml.org/wiki/Museum

some other docs:

https://rauterberg.employee.id.tue.nl/publications/WWW-Desig...

Just spec the year and target browser and target standard and you will get something better than just asking for visual accuracy.

The OG prompt is simply poor and loose:

"Your job is to recreate the landing page as faithfully as possible, matching the screenshot exactly."

wanderingstan

26 days ago

There were specs competing for adoption, but only tables (the old way) and CSS were actually adopted by browsers. So no point trying to use some other positioning technique.

thecr0wAuthor

26 days ago

Yes yes great question!

I tried your suggestion and also tried giving it various more general versions of the limitations presented by earlier generations.

Claude's instinct initially was actually to limit itself to less modern web standards.

Unfortunately, nothing got those planets to be in the right place.

Wowfunhappy

26 days ago

2 replies

Claude is not very good at using screenshots. The model may technically be multi-modal, but its strength is clearly in reading text. I'm not surprised it failed here.

fnordpiglet

26 days ago

2 replies

Especially since it decomposes the image into a semantic vector space rather than the actual grid of pixels. Once the image is transformed into patch embeddings all sense of pixels is entirely destroyed. The author demonstrates a profound lack of understanding for how multimodal LLMs function that a simple query of one would elucidate immediately.

The right way to handle this is not to build it grids and whatnot, which all get blown away by the embedding encoding but to instruct it to build image processing tools of its own and to mandate their use in constructing the coordinates required and computing the eccentricity of the pattern etc in code and language space. Doing it this way you can even get it to write assertive tests comparing the original layout to the final among various image processing metrics. This would assuredly work better, take far less time, be more stable on iteration, and fits neatly into how a multimodal agentic programming tool actually functions.

mcbuilder

26 days ago

1 reply

Yeah, this is exactly what I was thinking. LLMs don't have precise geometrical reasoning from images. Having an intuition of how the models work is actually.a defining skill in "prompt engineering"

thecr0wAuthor

26 days ago

1 reply

Yeah, still trying to build my intuition. Experiments/investigations like this help me. Any other blogs or experiments you'd suggest?

fnordpiglet

26 days ago

Asking your favorite LLM actually helps a lot. They generally are well trained on LLM papers unsurprisingly. In this case though it’s important to realize the LLM is incapable of seeing or hearing or reading. Everything has to be transformed into a vector space. Images are generally cut into patches (like 16x16) which are themselves transformed by several neural networks to convert them into a semantic space represented by the models parameters.

But this isn’t hugely different than your vision. You don’t see the pixel grid either. You have to use tools to measure things. You have the ability over time to iteratively interact with the image by perhaps counting grid lines but the LLM does not - it’s a one shot inference against this highly transformed image. They’ve gotten better at complex visual tasks including types of counting, but it’s not able to examine the image in any analytical way or even in its original representation. It’s just not possible.

It can however make tools that can. It’s very good at working with PIL and other image processing libraries or even writing image processing code de novo, and then using those to ground itself. Likewise it can not do math, but it can write a calculator that can do highly complex mathematics on its behalf.

thecr0wAuthor

26 days ago

Great, thanks for that suggestion!

dcanelhas

26 days ago

1 reply

Even with text, parsing content in 2D seems to be a challenge for every LLM I have interacted with. Try getting a chatbot to make an ascii-art circle with a specific radius and you'll see what I mean.

Wowfunhappy

26 days ago

I don't really consider ASCII art to be text. It requires a completely different type of reasoning. A blind person can be understand text if it's read out loud. A blind person really can't understand ASCII art if it's read out loud.

throwaway314155

26 days ago

2 replies

Somehow I suspect Claude Code (in an interactive session with trial, error, probing, critiquing, perusing, and all the other benefits you get) would do better. This example seems to assume Claude can do things in "one shot" (even the later attempts all seem to conceal information like it's a homework assignment).

That's not how to successfully use LLM's for coding in my experience. It is however perhaps a good demonstration of Claude's poor spatial reasoning skills. Another good demonstration of this is the twitch.tv/ClaudePlaysPokemon where Claude has been failing to beat pokemon for months now.

CharlesW

26 days ago

2 replies

Using https://github.com/anthropics/claude-code/tree/main/plugins/... with style-supporting instructions and context would've improved the outcome as well.

aidos

26 days ago

Is the skill effectively just adding a little extra context here though? Doesn’t strike me as the sort of context that would improve the outcome.

https://github.com/anthropics/claude-code/blob/main/plugins/...

thecr0wAuthor

26 days ago

thank you! I'll try this

thecr0wAuthor

26 days ago

Not a homework assignment, and no deliberate attempt to conceal information, just very long and repetitive logs. A lot of the same "insights" so I just didn't provide them here.

> That's not how to successfully use LLM's for coding in my experience.

Yeah agree. I think I was just a little surprised it couldn't one-shot given the simplicity.

999900000999

26 days ago

4 replies

Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close , it just can't get the order right which is something OP can manually fix.

I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. I'll have times where it'll get it right, and times when I have to try 3 or 4 times.

thecr0wAuthor

26 days ago

2 replies

ya, this is true. Another commenter also pointed out that my intention was to one-shot. I didn't really go too deeply into trying to try multiple iterations.

This is also fairly contrived, you know? It's not a realistic limitation to rebuild HTML from a screenshot because of course if I have the website loaded I can just download the HTML.

swatcoder

26 days ago

1 reply

> rebuild HTML from a screenshot

???

This is precisely the workflow when a traditional graphic designer mocks up a web/app design, which still happens all the time.

They sketch a design in something like Photoshop or Illustrator, because they're fluent in these tools and many have been using them for decades, and somebody else is tasked with figuring out how to slice and encode that design in the target interactive tech (HTML+CSS, SwiftUI, QT, etc).

Large companies, design agencies, and consultancies with tech-first design teams have a different workflow, because they intentionally staff graphic designers with a tighter specialization/preparedness, but that's a much smaller share of the web and software development space than you may think.

There's nothing contrived at all about this test and it's a really great demonstration of how tools like Claude don't take naturally to this important task yet.

thecr0wAuthor

26 days ago

You know, you're totally right and I didn't even think about that.

Retric

26 days ago

It’s not unrealistic to want to revert to an early version of something you only have a screenshot of.

smallnix

26 days ago

1 reply

That's not the point of the article. It's about Claude/LLM being overconfident in recreating pixel perfect.

jacquesm

26 days ago

All AI's are overconfident. It's impressive what they can do, but it is at the same time extremely unimpressive what they can't do while passing it off as the best thing since sliced bread. 'Perfect! Now I see the problem.'. 'Thank you for correcting that, here is a perfect recreation of problem 'x' that will work with your hardware.' (never mind the 10 glaring mistakes).

I've tried these tools a number of times and spent a good bit of effort on learning to maximize the return. By the time you know what prompt to write you've solved the problem yourself.

GeoAtreides

26 days ago

2 replies

>which is something OP can manually fix

what if the LLM gets something wrong that the operator (a junior dev perhaps) doesn't even know it's wrong? that's the main issue: if it fails here, it will fail with other things, in not such obvious ways.

alickz

26 days ago

1 reply

>what if the LLM gets something wrong that the operator (a junior dev perhaps) doesn't even know it's wrong?

the same thing that always happens if a dev gets something wrong without even knowing it's wrong - either code review/QA catches it, or the user does, and a ticket is created

>if it fails here, it will fail with other things, in not such obvious ways.

is infallibility a realistic expectation of a software tool or its operator?

GeoAtreides

26 days ago

By sheer chance, there's now a HN submission that answers both (but mostly the second) questions PERFECTLY:

https://news.ycombinator.com/item?id=46185957

godelski

26 days ago

I think that's the main problem with them. It is hard to figure out when they're wrong.

As the post shows, you can't trust them when they think they solved something but you also can't trust them when they think they haven't[0]. The things are optimized for human preference, which ultimately results in this being optimized to hide mistakes. After all, we can't penalize mistakes in training when we don't know the mistakes are mistakes. The de facto bias is that we prefer mistakes that we don't know are mistakes than mistakes that we do[1].

Personally I think a well designed tool makes errors obvious. As a tool user that's what I want and makes tool use effective. But LLMs flip this on the head, making errors difficult to detect. Which is incredibly problematic.

[0] I frequently see this in a thing it thinks is a problem but actually isn't, which makes steering more difficult.

[1] Yes, conceptually unknown unknowns are worse. But you can't measure unknown unknowns, they are indistinguishable from knowns. So you always optimize deception (along with other things) when you don't have clear objective truths (most situations).

bigstrat2003

26 days ago

4 replies

> it just can't get the order right which is something OP can manually fix.

If the tool needs you to check up on it and fix its work, it's a bad tool.

markbao

26 days ago

1 reply

“Bad” seems extreme. The only way to pass the litmus test you’ve described is for a tool to be 100% perfect, so then the graph looks like 99.99% “bad tool” until it reaches 100% perfection.

It’s not that binary imo. It can still be extremely useful and save a ton of time if it does 90% of the work and you fix the last 10%. Hardly a bad tool.

It’s only bad tool if you spent more time fixing the results than building it yourself, which sometimes used to be the case for LLMs but is happening less and less as they get more capable.

a4isms

26 days ago

1 reply

If you show me a tool that does a thing perfectly 99% of the time, I will stop checking it eventually. Now let me ask you: How do you feel about the people who manage the security for your bank using that tool? And eventually overlooking a security exploit?

I agree that there are domains for which 90% good is very, very useful. But 99% isn't always better. In some limited domains, it's actually worse.

999900000999

26 days ago

1 reply

Counterpoint.

Humans don't get it right 100% or the time.

a4isms

24 days ago

That is a true and useful component of analyzing risk, but the point is that human behaviour isn't a simple risk calculation. We tend to over-guard against things that subjectively seem dangerous, and under-guard against things that subjectively feel safe.

This isn't about whether AI is statistically safer, it's actually about the user experience of AI: If we can provide the same guidance without lulling a human backup into complacency, we will have an excellent augmented capability.

wvenable

26 days ago

Perfection is the enemy of good.

godelski

26 days ago

I wouldn't go that far, but I do believe good tool design tries to make its failure modes obvious. I like to think of it similar to encryption: hard to do, easy to verify.

All tools have failure modes and truthfully you always have to check the tool's work (which is your work). But being a master craftsman is knowing all the nuances behind your tools, where they work, and more importantly where they don't work.

That said, I think that also highlights the issue with LLMs and most AI. Their failure modes are inconsistent and difficult to verify. Even with agents and unit tests you still have to verify and it isn't easy. Most software bugs are created from subtle things, often which compound. Which both those things are the greatest weaknesses of LLMs: nuance and compounding effects.

So I still think they aren't great tools, but I do think they can be useful. But that also doesn't mean it isn't common for people to use them well outside the bounds of where they are generally useful. It'll be fine a lot of times, but the problem is that it is like an alcohol fire[0]; you don't know what's on fire because it is invisible. Which, after all, isn't that the hardest part of programming? Figuring out where the fire is?

[0] https://www.youtube.com/watch?v=5zpLOn-KJSE

mrweasel

26 days ago

That's my thinking. If I need to check up on the work, then I'm equally capable of writing the code myself. It might go faster with an LLM assisting me, and that feels perfectly fine. My issue is when people use the AI tools to generate something far beyond their own capabilities. In those cases, who checks the result?

supern0va

26 days ago

1 reply

Honestly, if you had showed this article to me even eighteen months ago, I would have been blown away at how good of a job Claude did.

It's remarkable how high our expectations have been steadily creeping.

WhyOhWhyQ

26 days ago

1 reply

This comment is missing the point. The real goal of all this is not to amaze. It's to create better software. Let's graduate past the amazement phase into the realism phase as soon as possible. What parts of my project is the LLM for? That is the real question worth asking.

supern0va

26 days ago

Oh, to be clear, this isn't a criticism. I think it's super cool that we're moving onto the nitpick/refinement phase of this tech. :)

dreadnip

26 days ago

1 reply

Why involve an LLM in this? Just download the site?

ChrisArchitect

26 days ago

1 reply

Yeah, Internet Archive has lots of copies https://web.archive.org/web/20250000000000*/https://www.spac... also

What's with the panicked pleas and need to preserve the site, assuming locally...?

GeoAtreides

26 days ago

The post is clearly about something else than preserving https://www.spacejam.com/1996/

It seems to me the post is about how Claude fails to recreate a very simple website from 1996.

sigseg1v

26 days ago

4 replies

Curious if you've tested something such as:

- "First, calculate the orbital radius. To do this accurately, measure the average diameter of each planet, p, and the average distance from the center of the image to the outer edge of the planets, x, and calculate the orbital radius r = x - p"

- "Next, write a unit test script that we will run that reads the rendered page and confirms that each planet is on the orbital radius. If a planet is not, output the difference you must shift it by to make the test pass. Use this feedback until all planets are perfectly aligned."

turnsout

26 days ago

1 reply

Yes, this is a key step when working with an agent—if they're able to check their work, they can iterate pretty quickly. If you're in the loop, something is wrong.

That said, I love this project. haha

monsieurbanana

26 days ago

1 reply

I'm trying to understand why this comment got downvoted. My best guess is that "if you're in the loop, something is wrong" is interpreted as there should be no human involvement at all.

The loop here, imo, refers to the feedback loop. And it's true that ideally there should be no human involvement there. A tight feedback loop is as important for llms as it is for humans. The more automated you make it, the better.

turnsout

26 days ago

Yes, maybe I goofed on the phrasing. If you're in the feedback loop, something is wrong. Obviously a human should be "in the loop" in the sense that they're aware of and reviewing what the agent is doing.

Aurornis

26 days ago

2 replies

This is my experience with using LLMs for complex tasks: If you're lucky they'll figure it out from a simple description, but to get most things done the way you expect requires a lot of explicit direction, test creation, iteration, and tokens.

One of the keys to being productive with LLMs is learning how to recognize when it's going to take much more effort to babysit the LLM into getting the right result as opposed to simply doing the work yourself.

jazzyjackson

26 days ago

Re: tokens, there is a point where you have to decide what's worth it to you. I'd been unimpressed with what I could get out of chat apps but when I wanted to do a rails app that would cost me thousands in developer time and some weeks between communication zoom meetings and iteration... I bit the bullet and kept topping up Claude API and spent about $500 on Opus over the course of a weekend, but the site is done and works great.

jacquesm

26 days ago

It would not be the first time that an IT services provider makes more money the worse their products perform.

thecr0wAuthor

26 days ago

2 replies

Hm, I didn't try exactly this, but I probably should!

Wrt unit test script, let's take Claude out of the equation, how would you design the unit test? I kept running into either Claude or some library not being capable of consistently identifying planet vs non planet which was hindering Claude's ability to make decisions based on fine detail or "pixel coordinates" if that makes sense.

cfbradford

26 days ago

1 reply

Do you give Claude the screenshot as a file? If so I’d just ask it to write a tool to diff each asset to every possible location in the source image to find the most likely position of each asset. You don’t really need recognition if you can brute force the search. As a human this is roughly what I would do if you told me I needed to recreate something like that with pixel perfect precision.

thecr0wAuthor

26 days ago

Ok! will give it a shot. In a few iterations I gave him screenshots, i have given him the ability to take screenshots, and I gave him the Playwright MCP. I kind of gave up on the path you're suggesting (though I didn't get super far along) because I felt like I would run into this problem eventually of needing a model to figure out what a planet is, where the edge of the planet is, etc.

But if that could be done deterministically, I totally agree this is the way to go. I'll put some more time into it over the next couple weeks.

yfontana

25 days ago

If I were to do this (and I might give it a try, this is quite an interesting case), I would try to run a detection model on the image, to find bounding boxes for the planets and their associated text. Even a small model running on CPU should be able to do this relatively quickly.

bluedino

26 days ago

Congratulations, we finally created 'plain English' programming languages. It only took 1/10th of the worlds electricity and 40% of the semiconductor production.

micromacrofoot

26 days ago

1 reply

I wouldn't call it entirely undefeated, it got maybe 90% of the way there. Before LLMs you couldn't get 50% of the way there in an automated way.

> What he produces

I feel like personifying LLMs more than they currently are is a mistake people make (though humans always do this), they're not entities, they don't know anything. If you treat them too human you might eventually fool yourself a little too much.

thecr0wAuthor

26 days ago

1 reply

As a couple other comments pointed out, it's also not fair to judge Claude based on a one shot like this. I sort of assume these limitations will remain even if we went back and forth but to be fair, I didn't try that more than a few times in this investigation. Maybe on try three it totally nails it.

micromacrofoot

25 days ago

Very true, I would also caution this with test projects with real humans in the hiring process. Comparing one-shots from actual people is unfair too, and often the most valid assessment comes with giving them feedback and seeing how they respond to it.

Aside from that point: if you are reading this and making people do a project as part of the hiring process, you should absolutely be paying them for their time (even a token amount).

daemonologist

26 days ago

2 replies

Interesting - these models are all trained to do pixel-level(ish) measurement now, for bounding boxes and such. I wonder if you could railroad it into being accurate with the right prompt.

Lerc

26 days ago

2 replies

What models are good at this? I have tried passing images to models and asking them for coordinates for specific features, then overlaid dots on those points and passed that image back to the model so it has a perception of how far out it was. It had a tendency to be consistently off by a fixed amount without getting closer.

I don't doubt that it is possible eventually, but I haven't had much luck.

Something that seemed to assist was drawing a multi coloured transparent chequerboard, if the AI knows the position of the grid colours it can pick out some relative information from the grid.

ryoshu

26 days ago

I can't do that either without opening up an image editing tool. Give the model a tool and goal with "vision". Should work better.

daemonologist

26 days ago

I've found Qwen3-VL to be fairly accurate at detection (though it doesn't always catch every instance). Note that it gives answers as per-mille-ages, as if the image was 1000x1000 regardless of actual resolution or aspect ratio.

I have also not had luck with any kind of iterative/guess-and-check approach. I assume the models are all trained to one-shot this kind of thing and struggle to generalize to what are effectively relative measurements.

sdenton4

26 days ago

Feels like the "right" approach would be to have it write some code to measure how far off the elements are in the original vs recreated image, and then iterate using the numerical output of the program...

johncoatesdev

26 days ago

1 reply

You last-minute cancelled coffee with your friends to work on this? I'm not sure how I would feel if a friend did that to me.

fishtoaster

26 days ago

Based on the later life updates, I suspect this was being humorous.

> After these zoom attempts, I didn't have any new moves left. I was being evicted. The bank repo'd my car. So I wrapped it there.

syassami

26 days ago

1 reply

We've lost the capability to build such marvels.

https://knowyourmeme.com/memes/my-father-in-law-is-a-builder...

barfoure

26 days ago

Lost it at wooden structures at the playground.

a-dub

26 days ago

1 reply

maybe ask it to use 1990s table based layout approaches?

al_borland

26 days ago

1 reply

Interesting. I just looked at the page source and it is in fact using a table layout. I always assumed it was an image map, which I assume would be even more obscure for the LLM.

thecr0wAuthor

26 days ago

1 reply

We should check the Wayback Machine, but in my memory this was built with an image map. Maybe like, 10 years ago or something. I was googling around when writing this post and saw that there are folks still tasked with making sure it's up and running. I wonder if they migrated it to tables at some point in the last decade.

sitharus

26 days ago

The January 1997 version used a server-side image map served via CGI: https://web.archive.org/web/19970124032137/http://spacejam.c...

This was soon moved to a static table layout with higher quality images: https://web.archive.org/web/19970412180040/http://www.spacej...

smoghat

26 days ago

3 replies

Ok, so here is an interesting case where Claude was almost good enough, but not quite. But I’ve been amusing myself by taking abandoned Mac OS programs from 20 years ago that I find on GitHub and bringing them up to date to work on Apple silicon. For example, jpegview, which was a very fast and simple slideshow viewer. It took about three iterations with Claude code before I had it working. Then it was time to fix some problems, add some features like playing videos, a new layout, and so on. I may be the only person in the world left who wants this app, but well, that was fine for a day long project that cooked in a window with some prompts from me while I did other stuff. I’ll probably tackle scantailor advanced next to clean up some terrible book scans. Again, I have real things to do with my time, but each of these mini projects just requires me to have a browser window open to a Claude code instance while I work on more attention demanding tasks.

egeozcan

26 days ago

1 reply

Side note: As a person who started using a mac since march, I found phoenix slides really good.

smoghat

26 days ago

It is! I was really just curious if I could update this old codebase without getting my hands dirty.

skrebbel

26 days ago

1 reply

> Ok, so here is an interesting case where Claude was almost good enough, but not quite.

You say that as if that’s uncommon.

jonplackett

26 days ago

1 reply

This should be the strap line for all AI (so far)

smoghat

26 days ago

1 reply

That's fair. But I always think of it as an intern I am paying $20 a month for or $200 a month. I would be kind of shocked if they could do everything as well as I'd hoped for that price point. It's fascinating for me and worth the money.

I am lucky that I don't depend on this for work at a corporation. I'd be pulling my hair out if some boss said "You are going to be doing 8 times as much work using our corporate AI from now on."

jonplackett

26 days ago

1 reply

Don get me wrong, doing 80% of my work for me is still great. And I’m actually quite glad I’m still needed for the other 20%

jasonkester

26 days ago

The problem is that your intern in this case is doing 1600% of the work, and now it’s your job to find and remove that extra 1520% so that you’re left with something usable.

mabedan

25 days ago

1 reply

Interesting. I switched to the Mac in 2005, and what I missed the most was the fact that in windows you could double click an image and then tap the left and right keys to browse other photos in the same folder. I learned objective c and made an app for it back then, but never published. I guess the jpegview fulfilled a similar purpose.

pwython

25 days ago

1 reply

I switched to Mac in 2008. I forget if the featured existed back then, but today on macOS if you press spacebar on an image in Finder to preview, you can use the arrow keys to browse other photos.

mabedan

25 days ago

Right. They introduced quick look soon after, but still not ideal. If you interact with the finder in any way, “quicklooked” item changes.

pluc

26 days ago

2 replies

I like how the author calls a script on the internet "him".

NooneAtAll3

26 days ago

3 replies

better than using a plural for a single entity

lillesvin

26 days ago

Come on, just stop. "They" have been used to refer to singular antecedents since the 14th century. (Source: https://www.oed.com/discover/a-brief-history-of-singular-the...)

shwaj

26 days ago

“it”

ga_to

26 days ago

To each their own.

thecr0wAuthor

26 days ago

lol

docheinestages

26 days ago

2 replies

> Note: please help, because I'd like to preserve this website forever and there's no other way to do it besides getting Claude to recreate it from a screenshot.

Why not use wget to mirror the website? Unless you're being sarcastic.

$ wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

Source: https://superuser.com/questions/970323/using-wget-to-copy-we...

malfist

26 days ago

Because that wasn't the goal of this exercise

thecr0wAuthor

26 days ago

The stuff about not being able to download it is a bit of a joke and I don't think the tone landed with everybody haha. This was just an experiment to see if Claude could recreate a simple website from a screenshot, of course to your point you could download it if you wanted.

th0ma5

26 days ago

1 reply

I personally don't understand why asking these things to do things we know they can't do is supposed to be productive. Maybe for getting around restrictions or fuzzing... I don't see it as an effective benchmark unless it can link directly to the ways the models are being improved, but, to look at random results that sometimes are valid and think more iterations of randomness will eventually give way to control is a maddening perspective to me, but perhaps I need better language to describe this.

thecr0wAuthor

26 days ago

1 reply

I think this is a reasonable take. I think for me, I like to investigate limitations like this in order to understand where the boundaries are. Claude isn't impossibly bad at analyzing images. It's just pixel perfect corrections that seem to be a limitation. Maybe for some folks it's enough to just read that but for me, I like to feel like I have some good experiential knowledge about the limitations that I can keep in my brain and apply appropriately in the future.

th0ma5

25 days ago

Yes but follow this forward, what about current models would be informative about future models. We've seen waves of "insight" come and go, to the point where there are endless waves of people at different points in the journey, there's a cohort of people that would be upset at the statement that prompt engineering is useless, and others that would support that as exactly right, and still more that would have a redefinition of the word prompt to include many other things. This is my exact complaint. You would want it work like how you want it to work, that our collective discoveries will turn into education and learning, but the content in the models and the subsequent inference based on that information have all not behaved like the physical sciences with regards to discoveries providing universal and reliable knowledge.

jacobsenscott

26 days ago

4 replies

> here's no other way to do it besides getting Claude to recreate it from a screenshot

And

> I'm an engineering manager

I can't tell if this is an intentional or unintentional satire of the current state of AI mandates from management.

master_crab

26 days ago

1 reply

Honest question: does he know about F5? Or was it intentional to use screenshots when source is available?

Mashimo

26 days ago

What is F5? Beside refresh in the browser?

thecr0wAuthor

26 days ago

lololol

dmd

26 days ago

i can’t tell if your comment is satire or not

chilmers

26 days ago

You really can’t tell? Perhaps the bar for AGI is lower than I thought.

thuttinger

26 days ago

2 replies

Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things. There are a lot of problems that are easy to get right for a junior web dev but impossible for an LLM. On the other hand, I was able to write a C program that added gamma color profile support to linux compositors that don't support it (in my case Hyprland) within a few minutes! A - for me - seemingly hard task, which would have taken me at least a day or more if I didn't let Claude write the code. With one prompt Claude generated C code that compiled on first try that:

- Read an .icc file from disk

- parsed the file and extracted the VCGT (video card gamma table)

- wrote the VCGT to the video card for a specified display via amdgpu driver APIs

The only thing I had to fix was the ICC parsing, where it would parse header strings in the wrong byte-order (they are big-endian).

littlecranky67

26 days ago

2 replies

> Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things

Because the rendered output (pixels, not HTML/CSS) is not fed as data in the training. You will find tons of UI snippets and questions, but they rarely included screenshots. And if they do, the are not scraped.

Wowfunhappy

26 days ago

2 replies

Interesting thought. I wonder if Anthropic (and others) could include some sort of render-html-to-screenshot as part of the training routine. (So the rendered output would get included as training data.)

btown

26 days ago

Even better, a tool that can tell the rendered bounding box of any set of elements, and what the distances between pairs of elements are, so it can make adjustments if relative positioning doesn't match its expectation. This would be incredible for SVG generation for diagrams, too.

KaiserPro

26 days ago

thats basically a VLM, but the problem is that describing the world requires a better understanding of the world. Hence why LeCunn is talking about world models (Its also cutting edge for teaching robots to manipulate and plan manipulations)

ubercow13

26 days ago

1 reply

Why wouldn't they be?

littlecranky67

26 days ago

1 reply

Why would they be?

ubercow13

25 days ago

Well, I don't know but many LLMs are multimodal and understand pictures and images. You can upload videos to Gemini and they're tokenised and fed into the LLM. If some programming blog post has a screenshot with the result of some UI code, why would that not be scraped and used for training? Is there some reason that wouldn't be possible?

jacquesm

26 days ago

5 replies

Claude didn't write that code. Someone else did and Claude took that code without credit to the original author, adapted it to your use case and then presented it as its own creation to you and you accepted this. If a human did this we probably would have a word for them.

Mtinie

26 days ago

3 replies

> If a human did this we probably would have a word for them.

I don’t think it’s fair to call someone who used Stack Overflow to find a similar answer with samples of code to copy to their project an asshole.

jacquesm

26 days ago

1 reply

Who brought Stack Overflow up? Stack Overflow does not magically generate code, someone has to actually provide it first.

Mtinie

26 days ago

1 reply

I generally agree with your underlying point concerning attribution and intellectual property ownership but your follow-up comment reframes your initial statement: LLMs generate recombinations of code from code created by humans, without giving credit.

Stack Overflow offers access to other peoples’ work, and developers combined those snippets and patterns into their own projects. I suspect attribution is low.

jacquesm

26 days ago

2 replies

Stack Overflow deals with that issue by having a license agreement.

Mtinie

26 days ago

1 reply

GitHub, Bitbucket, GCE, AWS…all have licensing agreements for user contributions which the user flagged as “public” so I’m not exactly clear of your point if you are holding SO up as a bastion of intellectual property rights different from the other places LLM training sets were scraped from.

jacquesm

26 days ago

I was not the person that introduced SO to the discussion.

mbesto

26 days ago

To be fair, their license agreement is pretty much impossible to enforce.

sublinear

26 days ago

1 reply

Using stack overflow recklessly is definitely asshole behavior.

Mtinie

26 days ago

Recklessly is a strong word. I’ll give you the benefit of the doubt and assume your comment in good faith.

How do you describe the “reckless” use of information?

bluedino

26 days ago

It has been for the last 15 years.

FanaHOVA

26 days ago

2 replies

Are you saying that every piece of code you have ever written contains a full source list of every piece of code you previously read to learn specific languages, patterns, etc?

Or are you saying that every piece of code you ever wrote was 100% original and not adapted from any previous codebase you ever worked in or any book / reference you ever read?

jacquesm

26 days ago

1 reply

What's with the bad takes in this thread. That's two strawmen in one comment, it's getting a bit crowded.

DangitBobby

26 days ago

2 replies

Or the original point doesn't actually hold up to basic scrutiny and is indistinguishable from straw itself.

jacquesm

26 days ago

1 reply

HN has guidelines for a reason.

incr_me

26 days ago

You're adhering to an excess of rules, methinks!

tovej

26 days ago

1 reply

The original point, that LLMs are plagiarising inputs, is a very common and common sense opinion.

There are court cases where this is being addressed currently, and if you think about how LLMs operate, a reasonable person typically sees that it looks an awful lot like plagiarism.

If you want to claim it is not plagiarism, that requires a good argument, because it is unclear that LLMs can produce novelty, since they're literally trying to recreate the input data as faithfully as possible.

DangitBobby

25 days ago

2 replies

I need you to prove to me that it's not plagiarism when you write code that uses a library after reading documentation, I guess.

> since they're literally trying to recreate the input data as faithfully as possible.

Is that how they are able to produce unique code based on libraries that didn't exist in their training set? Or that they themselves wrote? Is that how you can give them the documentation for an API and it writes code that uses it? Your desire to make LLMs "not special" has made you completely blind to reality. Come back to us.

tovej

25 days ago

1 reply

What?

The LLM is trained on a corpus of text, and when it is given a sequence of tokens, it finds a set of token that, when one of them is appended, make the resulting sequence most like the text in that corpus.

If it is given a sequence of tokens that is unlike anything in its corpus, all bets are off and it produces garbage, just like machine learning models in general: if the input is outside the learned distribution, quality goes downhill fast.

The fact that they've added a Monte Carlo feature to the sequence generation, which makes it sometimes select a token that is slightly less like the most exact match in the corpus does not change this.

LLMs are fuzzy lookup tables for existing text, that hallucinate text for out-of-distribution queries.

This is LLM 101.

If the LLM was only trained using documentation, then there would be no problem. If it would generate a design, look at the documentation, understand the semantics of both, and translate the design to code by using the documentation as a guide.

But that's not how it works. It has open source repositories in its corpus that it then recreates by chaining together examples in this stochastic parrot -method I described above.

DangitBobby

25 days ago

jacquesm

25 days ago

1 reply

No, you need to prove that it is not plagiarism when you use an LLM to produce a piece of code that you then claim as yours.

You have the whole burden of proof thing backwards.

DangitBobby

25 days ago

Oh wild, I was operating under the assumption that the law requires you to prove that a law was broken, but it turns out you need to prove it wasn't. Thanks!

pests

26 days ago

1 reply

While I generally agree with you, this "LLM is a human" comparisons really are tiresome I feel. It hasn't been proven and I don't know how many other legal issued could have solved if adding "like a human" made it okay. Google v Oracle? "oh, you've never learned an API??!?" or take the original Google Books controversy - "its reading books and memorizing them, like humans can". I do agree its different but I don't like this line of argument at all.

FanaHOVA

26 days ago

I agree, that's why I was trying to point out that saying "if a person did that we'd have a word for them" is useless. They are not people, and people don't behave like that anyway. It adds nothing to the discussion.

bsaul

26 days ago

1 reply

That's an interesting hypothesis : that LLM are fundamentally unable to produce original code.

Do you have papers to back this up ? That was also my reaction when i saw some really crazy accurate comments on some vibe coded piece of code, but i couldn't prove it, and thinking about it now i think my intuition was wrong (ie : LLMs do produce original complex code).

jacquesm

26 days ago

4 replies

We can solve that question in an intuitive way: if human input is not what is driving the output then it would be sufficient to present it with a fraction of the current inputs, say everything up to 1970 and have it generate all of the input data from 1970 onwards as output.

If that does not work then the moment you introduce AI you cap their capabilities unless humans continue to create original works to feed the AI. The conclusion - to me, at least - is that these pieces of software regurgitate their inputs, they are effectively whitewashing plagiarism, or, alternatively, their ability to generate new content is capped by some arbitrary limit relative to the inputs.

bfffbgfdcb

26 days ago

1 reply

I guess you can’t create original work either. Given you can’t reproduce the sum total of human output since 1970 by next Tuesday.

There’s something in what you’re saying, but until you refine it to something actually true, it’s just more slop.

jacquesm

26 days ago

I think my track record belies your very low value and frankly cowardly comment. If you have something to say at least do it under your real username instead of a throwaway.

andsoitis

26 days ago

1 reply

I like your test. Should we also apply to specific humans?

We all stand on the shoulders of giants and learn by looking at others’ solutions.

jacquesm

26 days ago

2 replies

That's true. But if we take your implied rebuttal then current level AI would be able to learn from current AI as well as it would learn from humans, just like humans learn from other humans. But so far that does not seem to be the case, in fact, AI companies do everything they can to avoid eating their own tail. They'd love eating their own tail if it was worth it.

To me that's proof positive they know their output is mangled inputs, they need that originality otherwise they will sooner or later drown in nonsense and noise. It's essentially a very complex game of Chinese whispers.

handoflixue

26 days ago

1 reply

Equally, of course, all six year olds need to be trained by other six year olds; we must stop this crutch of using adult teachers

subscribed

26 days ago

Beautiful, thank you.

andsoitis

26 days ago

I share that perspective.

measurablefunc

26 days ago

2 replies

This is known as the data processing inequality. Non-invertible functions can not create more information than what is available in their inputs: https://blog.blackhc.net/2023/08/sdpi_fsvi/.

xyzzy123

26 days ago

1 reply

Using this reasoning, would you argue that a new proof of a theorem adds no new information that was not present in the axioms, rules of inference and so on?

If so, I'm not sure it's a useful framing.

measurablefunc

26 days ago

1 reply

Sound deductive rules of logic can not create novelty that exceeds the inherent limits of their foundational axiomatic assumptions. You can not expect novel results from neural networks that exceed the inherent information capacity of their training corpus & the inherent biases of the neural network (encoded by its architecture). So if the training corpus is semantically unsound & inconsistent then there is no reason to expect that it will produce logically sound & semantically coherent outputs (i.e. garbage inputs → garbage outputs).

xyzzy123

26 days ago

1 reply

Maybe? But it also seems like you are ignoring that you can introduce new information at inference time. Let's pretend I agree the LLM is a plagiarism machine that can produce no novelty in and of itself that didn't come from what it was trained on, and produces mostly garbage (I only half agree lol, and I think "novelty" is under-specified here).

When I apply that machine (with its giant pool of pirated knowledge) _to my inputs and context_ I can get results applicable to my modestly novel situation which is not in the training data. Perhaps the output is garbage. Naturally if my situation is way out of distribution I cannot expect very good results.

But I often don't care if the results are garbage some (or even most!) of the time if I have a way to ground-truth whether they are useful to me. This might be via running a compile, a test suite, a theorem prover or mk1 eyeball. Of course the name of the game is to get agents to do this themselves and this is now fairly standard practice.

measurablefunc

26 days ago

1 reply

I'm not here to convince you whether Markov chains are helpful for your use cases or not. I know from personal experience that even in cases where I have a logically constrained query I will receive completely nonsensical responses¹.

¹https://chatgpt.com/share/69367c7a-8258-8009-877c-b44b267a35...

jacquesm

26 days ago

> Here is a correct, standard correction:

It does this all the time, but as often as not then outputs nonsense again, just different nonsense, and if you keep it running long enough it starts repeating previous errors (presumably because some sliding window is exhausted).

cornel_io

26 days ago

Theoretical "proofs" of limitations like this are always unhelpful because they're too broad, and apply just as well to humans as they do to LLMs. The result is true but it doesn't actually apply any limitation that matters.

andrepd

26 days ago

Excellent observation.

giancarlostoro

26 days ago

You mean like copying and pasting code from Stack Overflow?

idiotsecant

26 days ago

Yes, the word for that is software developer.

iwontberude

26 days ago

Apropos given Warner Brothers Discovery just sold to Netflix

zitterbewegung

26 days ago

In actual workflows someone would accept a very close reproduction and fix the small issues. Generally I use systems to get close enough to a scaffolding and / or make small incremental improvements and direct its design

soared

26 days ago

I got quite close with Gemini 3 pro in AI studio. I uploaded a screenshot (no assets) and the results were similar to OP. It failed to follow my fix initially but I told it to follow my directions (lol) and it came quite close (though portrait mode distorted it, landscape was close to perfect.

“Reference the original uploaded image. Between each image in the clock face, create lines to each other image. Measure each line. Now follow that same process on the app we’ve created, and adjust the locations of each image until all measurements align exactly.”

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

hestefisk

26 days ago

Would be interesting to see whether Gemini could crack this problem.

bdcravens

26 days ago

A comparison would Codex would be good. I haven't done it with Codex, but when working through problems using ChatGPT, it does a great job when given screenshots.

296 more comments available on Hacker News

View full discussion on Hacker News

ID: 46183294Type: storyLast synced: 12/10/2025, 5:15:18 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN