We Ran Over 600 Image Generations to Compare AI Image Models
Key topics
A blog post compares the performance of three AI image generation models (OpenAI, Gemini, and Seedream) across over 600 image generations, sparking discussion on their strengths, weaknesses, and potential use cases.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 54m after posting
- Peak period: 32 comments in the 3-6h window
- Average per period: 9.3 comments
Based on 102 loaded comments
Key moments
- Story posted: Nov 11, 2025 at 12:26 PM EST (about 2 months ago)
- First comment: Nov 11, 2025 at 1:20 PM EST (54m after posting)
- Peak activity: 32 comments in the 3-6h window (hottest window of the conversation)
- Latest activity: Nov 13, 2025 at 11:29 AM EST (about 2 months ago)
Love the optimism
Local models are definitely something I want to dive into more, if only out of personal interest.
It failed "Remove background", "Isolate the background", "Long exposure" (kept people), "Apply a fish-eye lens effect" (geometry incorrect), and "Strong bokeh blur" (wrong blur type).
Some were more ambiguous. "Give it a metallic sheen" looked cool, but that isn't a metallic sheen, and IMO it simply failed the ukiyo-e Japanese woodblock print style, though I wouldn't object to calling it a vaguely Japanese style. Compare how colors blend in ukiyo-e woodblocks vs how OpenAI's sky is done.
You're mostly right to criticize the fisheye: it's plausibly a fisheye image, but not one derived from the original. For bokeh, you're right that it got the mountain wrong. But it did get the other samples, and it's the only one that seems to know what bokeh is at all, as the other models got none of them (other than Seedream getting the Newton right).
For the "metallic sheen", I assume you mean where they said "give the object a metallic sheen", since the first attempt had OpenAI giving the image itself a quality as if it were printed or etched on metal, which is arguably correct. But for that second one, for all but the 4th sample, OpenAI did it best for the mountain and the Rubik's cube, and no worse for the cats and the car. Seedream wins for the Newton.
I don't have any knowledge of the Japanese styles requested, so I'm not judging those.
I've reviewed your examples, and it hasn't changed my mind.
> I’ve reviewed your examples, and it hasn’t changed my mind.
I think I have a better understanding of your thinking, but IMO you're using a bar so low that effectively anything qualifies. "it's the only one that seems to know what bokeh is at all, as the other models got none of them (other than Seedream getting the Newton right)." For bokeh, look at the original and then the enlarged images of the car. OpenAI blurs the entire image, car and ground, fairly uniformly, whereas Seedream keeps the car in focus while blurring background elements, including the ground where it is far enough back. Same deal with the cats: the original has far more distant objects in the upper right, which Seedream puts out of focus while keeping the cats sharp, whereas OpenAI blurs everything.
In my mind the other models also did quite poorly in general, but when I exclude outright failures I don't judge OpenAI as the winner. For example, on the kaleidoscope task, OpenAI's girl image didn't have radial symmetry, so it simply failed the task; Gemini's, on the other hand, looks worse but qualifies as a bad approximation of the task.
Right now, though, the software for local generation is horrible. It's a mish-mash of open-source tools with varying compatibility, loaded with casually excessive vernacular and acronyms, to say nothing of the awkwardness of most of it being driven by Python scripts.
But once it gets inevitably cleaned up, I expect people in the future are going to take being able to generate unlimited, near instantaneous images, locally, for free, for granted.
I haven't seen the latest from Adobe over the last three months, but last I saw, the Firefly engine was still focused on "magically" creating complete elements.
So far, Adobe's AI tools are pretty useless, according to many professional illustrators. With Firefly you can use other (non-Adobe) image generators, but the output is usually barely usable at this point in time.
There is no better recent example than AI comedy made by a professional comedian [0].
Of course, this makes sense once you think about it for a second. Even AGI, without a BCI, could not read your mind to understand what you want. Of course, the people who have been communicating these ideas to other humans up to this point are the best at doing that.
[0] old.reddit.com/r/ChatGPT/comments/1oqnwvt/ai_comedy_made_by_a_professional_comedian/
To clarify, the “comedy” part of this “AI comedy” was written entirely by a human with no assistance from a language model.
> For anyone interested in my process. I wrote every joke myself, then use Sora 2 to animate them.
Apologies if I wrote my original comment poorly, but that was what I was trying to communicate.
Not only was this person able to write good comedy, but they knew what tools were available and how to use them.
I previously wrote:
> "AI won't replace you, but someone who knows how to use AI will replace you." ...
The missing part is "But a person who was excellent at their pre-AI job, will replace ten of the people down the chain."
The possible analog that just popped into my head is the nearly always missed part of the quote "the customer is always right" ... "in matters of taste."
I think comedy is a great example of how this is not the general case.
In this instance, the video you posted was the result when a comic used a tool to make a non-living thing say their jokes.
That’s not new, that’s a prop. It’s ventriloquism. People have been doing that gag since the first crude marionette was whittled.
The existence of prop comics isn't an indicator that that's the pinnacle of comedy (or even particularly good). If Mitch Hedberg had Jeff Dunham's puppets it probably would've been… fine, but if Jeff Dunham woke up tomorrow with Hedberg's ability to write and deliver jokes, his life and career would be dramatically changed forever.
Better dummies will benefit some ventriloquists but there’s no reason to think that this is the moment that the dummies get so good that everyone will stop watching humans and start watching ventriloquists (which is what would have to happen for one e-ventriloquist putting 10 comedians out of a job to be a regular thing)
I will be right back after I think of a reply while in the shower, many months in the future.
Photography didn’t make artists obsolete.
For that matter, the car didn’t make horse riding completely obsolete either.
For artists, the question is whether generative AI is like photography or the car. My guess, at this stage, is photography.
For what it’s worth I think the proponents of generative AI are grossly overestimating the utility and economic value of meh-OK images that approximate the thing you’ve asked for.
Maybe just the advent of the microwave oven is the analogy.
Either way, I am out. I have spent many days fiddling with AI image generation but, looking back on what I thought was 'wow' at the time, I now think all AI art is practically useless. I only managed one image I was happy with, and most of that was GIMP, not AI.
This study has confirmed my suspicions, hence I am out.
Going back to the fast food analogy, for the one restaurant that actually cooks actual food from actual ingredients, if everyone else is selling junk food then the competition has been decimated. However, the customers have been decimated too. This isn't too bad as those customers clearly never appreciated proper food in the first place, so why waste effort on them? It is a pearls and swine type of thing.
And some day the news will announce that the last human actor has died.
That aside, humans are necessary for making up new forms and styles. There was no cubism before Picasso and Braque, or pointillism before Seurat and Signac. I don’t think I’ve seen anyone argue that if you trained a diffusion model on only the art that Osamu Tezuka was exposed to before he turned 24 it would output Astro Boy.
The same will happen to music, artists, etc. They won't vanish, but only a few per city will be left.
I could see that changing in a few years.
To me, what matters is how often a model failed versus succeeded, so what I did was look at the worst result every time. The one that stood out (negatively) is Gemini. OpenAI had some very good results but also some that missed the mark. Seedream (which I had never heard of previously) missed the mark less often than Gemini, and in cases where OpenAI failed, Seedream came out clearly on top.
So, if I were to use the effects of the mentioned models, I wouldn't bother with Gemini; only OpenAI and Seedream.
It's like OpenAI is pulling the face a little toward some sort of median face on all of these, whereas the other two models seemed to reproduce the face.
For some things, exactly reproducing the face is a problem -- for example in making them a glass etching, Gemini seemed unwilling to give up the specific details of the child's face, even though that would make sense in that context.
Even Sam Altman's "Ghiblified" twitter avatar looks nothing like him (at least to me).
Other models seem much more able to operate directly on the input image.
This leads to incredibly efficient, dense semantic consistency because every object in an image is essentially recreated from (intuitively) an entire chapter of a book dedicated to describing that object's features.
However, it loses direct pixel reference. For some things that doesn't matter much, but humans are very discerning regarding faces.
ChatGPT is architecturally unable to exactly reproduce the input pixels; they're always encoded into tokens, then decoded. This matters more for subjects where we are sensitive to detail loss, like faces.
Now, the difficulty is in achieving an encoding/decoding scheme that is both: information efficient AND semantically coherent in latent space. Seems like there is a tradeoff here.
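If it helps to see that tradeoff concretely, here is a toy sketch of my own (not OpenAI's actual tokenizer): compress an image into a small grid of discrete tokens, decode it back, and measure what is lost. Anything finer than the patch size, which is exactly the kind of facial detail we are most sensitive to, cannot come back.

    # Toy illustration (my own, not OpenAI's actual pipeline): any encode-to-discrete-
    # tokens step throws away pixel information, so the decoder can only approximate
    # the original image.
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((256, 256, 3))  # stand-in for an input photo, values in [0, 1]

    def encode(img, patch=16, levels=256):
        """Average each patch and quantize it: a crude stand-in for a learned tokenizer."""
        h, w, c = img.shape
        patches = img.reshape(h // patch, patch, w // patch, patch, c).mean(axis=(1, 3))
        return np.round(patches * (levels - 1)).astype(np.int32)  # discrete "tokens"

    def decode(tokens, patch=16, levels=256):
        """Expand each token back into a flat patch; detail inside the patch is gone."""
        approx = tokens.astype(np.float64) / (levels - 1)
        return approx.repeat(patch, axis=0).repeat(patch, axis=1)

    tokens = encode(image)
    recon = decode(tokens)
    print("token grid:", tokens.shape)                  # (16, 16, 3) instead of 256x256x3 pixels
    print("mean pixel error:", np.abs(image - recon).mean())

A real tokenizer is learned and far better than per-patch averaging, but the same principle applies: the decode is a reconstruction from a compressed description, not a copy of the input pixels.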
Peak quality in terms of realistic color rendering was probably the initial release of DALL-E 3. Once they saw what was going to happen, they fixed that bug fast.
It absolutely did not do that on day 1.
Most DALL-E 3 images have an orange-blue cast, which is absolutely not an unintended artifact. You'd literally have to be blind to miss it, or at least color-blind. That wasn't true at first -- check the original paper and try the same prompts! It was something they started doing not long after release, and it's hardly a stretch to imagine why.
They will be doing the same thing for the same reasons today, assuming it doesn't just happen as a side effect.
This is not my special interest, but the DIY space is much more interesting than the SaaS offerings. That holds for generative AI more generally: the DIY scene is going to be more interesting.
They noted the Gemini issue too:
> Especially with photos of people, Gemini seems to refuse to apply any edits at all
Seedream will always alter the global color balance with edits.
It also helped to specify which other parts should not be changed, otherwise it was rather unpredictable about whether it would randomly change other aspects.
I'm mostly editing pics of food and beverages, though; it wouldn't surprise me if it is situationally better or worse.
It is just very hard to make any generalizations because any single prompt will lead to so many different types of images.
The only thing I would really say to generalize is every model has strengths and weaknesses depending on what you are going for.
It is also generally very hard to explore all the possibilities of a model. So many times I thought I had seen what the model could do, only to be completely blown away by a particular generation.
I am a small-time YouTuber.
NanoBanana and Flux Kontext are the models that get closest to traditional SDXL inpainting techniques.
Seedream is a strong contender by virtue of its ability to natively handle higher resolutions (up to around 4 megapixels), so you lose less detail; however, it also tends to alter the color palette more often than not.
Finally, GPT-image-1 (yellowish filter notwithstanding) exhibits very strong prompt adherence but will almost always change a number of the details.
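For anyone unfamiliar with what "traditional SDXL inpainting" means above, here is a minimal local sketch assuming the Hugging Face diffusers library and a public SDXL inpainting checkpoint; the file names are placeholders. The point is just that the mask confines regeneration to one region while the unmasked pixels are carried over from the source image.

    # Minimal SDXL inpainting sketch using diffusers; file names are placeholders.
    import torch
    from diffusers import AutoPipelineForInpainting
    from diffusers.utils import load_image

    pipe = AutoPipelineForInpainting.from_pretrained(
        "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = load_image("photo.png").resize((1024, 1024))
    mask = load_image("mask.png").resize((1024, 1024))  # white pixels = region to regenerate

    result = pipe(
        prompt="a red vintage car parked on the street",
        image=image,
        mask_image=mask,
        strength=0.99,       # how strongly to repaint the masked region
        guidance_scale=7.5,
    ).images[0]
    result.save("edited.png")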
"consumer internet connection in Japan", "10 Gbps nominal bandwidth"
Coming from a third world country, that surprises me.
The main issue is latency and bandwidth across the oceans, since Asia is far away from the US where a lot of servers live. Even for services that are distributed, I live in a rural prefectural capital of Japan 1000 km away from Tokyo, where all the "Japan" data centers are, so my ping is always unimpressive despite the bandwidth.
The stock photo industry was always pretty bad and silly expensive. Being able to custom generate visuals and photos to replace that is a good use case of AI IMHO. Yes sometimes it does goofy things, but it’s getting quite good. If AI blows up the stock photo industry few will shed a tear.
• OpenAI (gpt-image-1): The wild artist. Best for creative, transformative, style-heavy edits—Ghibli, watercolor, fantasy additions, portals, sci-fi stuff, etc. But it hallucinates a lot and often distorts fine details (especially faces). Slowest.
• Gemini (flash-image / nanoBanana): The cautious realist. Best for subtle, photorealistic edits—fog, lighting tweaks, gentle filters, lens effects. Almost never ruins details, but sometimes refuses to do artsy transformations, especially on human photos.
• Seedream: The adventurous middle child. Faster, cheaper, and often surprisingly good at aesthetic effects—bokeh, low-poly, ukiyo-e, metallic sheen, etc. Not as creative as OpenAI, not as conservative as Gemini. Can hallucinate, but in fun ways.
Bottom line:
• Creative prompts → OpenAI
• Realistic photo edits → Gemini
• Budget-friendly, balanced option → Seedream
If you’re planning an automated pipeline, routing “artistic” prompts to OpenAI and “photorealistic” ones to Gemini (with Seedream as a wildcard) matches their own conclusion.
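A hypothetical routing sketch for that kind of pipeline; the keyword lists and model identifiers are my own placeholders for illustration, not something taken from the article.

    # Hypothetical prompt router; keyword lists and model names are illustrative only.
    ARTISTIC_HINTS = ("ghibli", "watercolor", "ukiyo-e", "fantasy", "sci-fi", "portal")
    PHOTOREAL_HINTS = ("fog", "lighting", "lens", "filter", "exposure")

    def route_prompt(prompt: str) -> str:
        """Artistic prompts -> OpenAI, photorealistic edits -> Gemini, everything else -> Seedream."""
        p = prompt.lower()
        if any(hint in p for hint in ARTISTIC_HINTS):
            return "gpt-image-1"
        if any(hint in p for hint in PHOTOREAL_HINTS):
            return "gemini-flash-image"
        return "seedream"

    print(route_prompt("Turn this photo into a Studio Ghibli scene"))  # gpt-image-1
    print(route_prompt("Add soft morning fog and warmer lighting"))    # gemini-flash-image
    print(route_prompt("Make it a low-poly render"))                   # seedream

In practice you would probably use a small classifier or an LLM call to do the routing rather than keyword matching, but the shape of the pipeline is the same.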
I like giving it weird non-prompts, like lines from songs or novels. I then run it for a few hundred generations locally and do stuff with the malformed shit it comes out with. I have a few art projects like this.
Aphex Twin vibes.
It's like using gen AI to do the math itself, instead of extracting the numbers from a story and just doing the math with +, -, / and *.