(Realistically, Seedream 4 is the best at aesthetically pleasing generation, Nano Banana Pro is the best at realism and editing, and Seedream 4.5 is a very strong middle ground between the two with great pricing.)

gpt-image-1.5 feels like OpenAI doing the bare minimum to keep people from switching to Gemini every time they want an image.

mohsen1

17 days ago

Unlike Nano Banana, it allows generating photos of children. It's always fun to ask AI to imagine a couple's children, but it's also kind of concerning that there might be terrible use cases.

BoorishBears

17 days ago

I haven't seen that. Meanwhile, gpt-image-1.5 still has zero-tolerance copyright policing (even via the API), so it's pretty much useless in production once exposed to consumers.

I'm honestly surprised they're still doing this post-Sora 2: let the consumer of the API determine their risk appetite. If a copyright holder comes knocking, "the API did it" isn't going to be a defense either way.

r053bud

17 days ago

I was able to generate photos of my imagined children via Nano Banana.

hexage1814

17 days ago

If memory serves, Nano Banana allows generating/editing photos of children. But anything that could be misinterpreted gets blocked, even absolutely benign and innocent things (especially if you're asking it to modify a photo you've uploaded). So they allow it, but they turn the guardrails up to a point where it might not be useful in many situations.

rvz

17 days ago

Another bunch of "startups" have been eliminated.

moralestapia

17 days ago

Among those, Photoshop.

koakuma-chan

17 days ago

I wish. Even Nano Banana Pro still sucks at basic operations.

alasano

17 days ago

It's still not available in the API despite them announcing it.

They even linked to their Image Playground, where it's also not available.

I updated my local playground to support it, and I'm just handling the 404 on the model gracefully.

https://github.com/alasano/gpt-image-1-playground
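For illustration, here is a minimal Python sketch of that kind of graceful fallback. It is not the playground's actual code: `send_request` is a hypothetical callable that returns an HTTP status code and a response body for a given model name, and treating 404 as "not rolled out to this account yet" is an assumption based on the behavior described above.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical transport: send_request(model) -> (http_status, response_body)
SendRequest = Callable[[str], Tuple[int, str]]


def first_available(
    models: Sequence[str], send_request: SendRequest
) -> Optional[Tuple[str, str]]:
    """Try each model in order, skipping any that returns 404 (not rolled out yet)."""
    for model in models:
        status, body = send_request(model)
        if status == 404:
            # Treat 404 as "not available on this account yet" and fall back.
            continue
        if status >= 400:
            # Any other error is surfaced rather than silently skipped.
            raise RuntimeError(f"{model}: HTTP {status}: {body}")
        return model, body
    # Every candidate returned 404.
    return None


# Usage with a stub that pretends the new model has not rolled out yet.
def stub(model: str) -> Tuple[int, str]:
    return (404, "model not found") if model == "gpt-image-1.5" else (200, "ok")


print(first_available(["gpt-image-1.5", "gpt-image-1"], stub))
```

Ordering the list newest-first means the app picks up the new model automatically once the 404s stop.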

minimaxir

17 days ago

It's a staggered rollout but I am not seeing it on the backend either.

joshstrange

17 days ago

> staggered rollout

It's too bad no OpenAI Engineers (or Marketers?) know that term exists. /s

I do not understand why it's so hard for them to just tell the truth. So many "Available today for Plus/Pro/etc." announcements really mean "Sometime this week at best, maybe multiple weeks". I'm not asking for them to roll out faster, just to communicate better.

weird-eye-issue

17 days ago

My Enterprise account got an email 1.5 hours ago saying it's available in the API, but my other accounts haven't gotten one yet.

anonfunction

17 days ago

Yeah, I just tried it and got a 500 server error with no details as to why:

  POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
    "message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
    "type": "server_error",
    "param": null,
    "code": "server_error"
  }

Interestingly, if you change the request to a nonexistent model name, you get an error showing this:

  POST "https://api.openai.com/v1/responses": 400 Bad Request {
    "message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
    "type": "invalid_request_error",
    "param": "tools[0].model",
    "code": "invalid_value"
  }
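A small Python sketch of pulling the supported model names out of that 400 error, assuming the body is the JSON object shown above and that the message keeps the `Supported values are: '...' and '...'` wording. That wording is not a documented contract, so treat this as a debugging aid rather than something to build on.

```python
import json
import re
from typing import List


def supported_models(error_body: str) -> List[str]:
    """Return the quoted names listed after 'Supported values are:' in an API error body."""
    message = json.loads(error_body).get("message", "")
    _, found, tail = message.partition("Supported values are:")
    if not found:
        return []
    # Each supported value is wrapped in single quotes, e.g. 'gpt-image-1'.
    return re.findall(r"'([^']+)'", tail)


body = """{
  "message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
  "type": "invalid_request_error",
  "param": "tools[0].model",
  "code": "invalid_value"
}"""
print(supported_models(body))  # ['gpt-image-1', 'gpt-image-1-mini']
```

Only the text after "Supported values are:" is scanned, so the rejected name ('blah') is not picked up.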

StarterPro

17 days ago

In the image they showed for the new one, the mechanic was checking a dipstick...that was still in the vehicle.

I really hope everyone is starting to get disillusioned with OpenAI. They're just charging you more and more for what? Shitty images that are easy to sniff out?

In that case, I have a startup for you to invest in. It's a bridge-selling app.

czhu12

17 days ago

Haven’t their prices stayed at $20/m for a while now?

wahnfrieden

17 days ago

They've published anticipated price increases over coming years. Prices will rise dramatically and steadily to meet revenue targets.

cheema33

17 days ago

AI doesn’t have much of a moat. People can and will easily switch providers.

wahnfrieden

17 days ago

Sure, but there are only a couple of leading providers worth considering, for coding at least, and there will be consolidation once investment pulls back. They may find a way to collude on raising prices.

Where switching will be easier is with casual chat users plus API consumers that are already using substandard models for cost efficiency.

wahnfrieden

16 days ago

Reinforced today:

As Gemini has gained competitiveness (higher confidence in its output, better reputation), its prices have steadily risen.

blurbleblurble

17 days ago

It's really weird to see "make images from memories that aren't real" as a product pitch

999900000999

17 days ago

I can actually imagine actors selling the rights to make fake images with them.

In late-stage capitalism you pay for fake photos with someone. You have ChatGPT write about how you dated for a summer, and have it end with them leaving for grad school to explain why you aren't together.

Eventually we'll all just pay to live in the matrix. When your credit card is declined you'll be logged out, to awaken in a shared studio apartment. To eat your rations.

ares623

17 days ago

I can see them getting paid like residuals from TV re-runs.

But at some point it'll hit saturation. The novelty will wear off since everyone has access to it. Who cares if you have a fake photo with a celebrity if everyone knows it's fake?

oblio

15 days ago

> When your credit card is declined you'll be logged out, to awaken in a shared studio apartment. To eat your rations.

You're funny. No, you'll awaken in a tent, next to your shopping cart, under the bridge.

impjohn

16 days ago

This is what struck me as well. I got weird undertones of 'Now you don't even need to have real memories! Just fabricate them.' They even prominently showcase edits that place you next to another person, further deepening disingenuous or parasocial relationships.

nurettin

17 days ago

It would creep me out if it produced origami animals for that prompt.

kingstnap

17 days ago

It's strange to me too, but they must have done the market research for what people do with image gen.

My own main use cases are entirely textual: Programming, Wiki, and Mathematics.

I almost never use image generation for anything. However, it's objectively extremely popular.

This has strong parallels for me to when Snapchat filters became super popular. I know lots of people loved editing and filtering pictures, but I always left everything on auto mode; in fact, I'd turn off a lot of the default beauty filters. It just never appealed to me.

oxag3n

17 days ago

If this were a farm of sweatshop Photoshoppers in 2010 who downloaded every image on the internet and offered a service combining them on request, this would escalate pretty quickly.

Question: with copyright and authorship dead wrt AI, how do I make (at least) new content protected?

Anecdotal: I had a hobby of taking photos in a fairly rare style, and I lived in a place that gets photographed quite a lot. When I asked GPT to generate a picture of that area in that style, it returned a heavily modified but recognizable copy of a photo I'd published years ago.

nobody_r_knows

17 days ago

my question to your anecdote: who cares? not being fecicious, but who cares if someone reproduced your stuff and millions of people see it? is it the money you want? is it the fame? because fame you will get, maybe not money... but couldn't there be another way?

netule

17 days ago

Suddenly, copyright doesn't matter anymore when it's no longer useful to the narrative.

ragequittah

17 days ago

Copyright has overstepped its initial purpose by leaps and bounds because corporations make the law. If you're not cynical about how copyright currently works, you probably haven't been paying attention. And it doesn't take much to go from cynical to nihilist in this case.

netule

17 days ago

There's definitely a case of miscommunication at play if you didn't read cynicism into my original post. I broadly agree with you, but I'll leave it at that to prevent further fruitless arguing about specifics.

BoorishBears

17 days ago

OpenAI does care about copyright, thankfully China does not: https://imgur.com/a/RKxYIyi

CamperBob2

17 days ago

(Shrug) This is more important. Sorry.

swatcoder

17 days ago

People have values that go beyond wealth and fame. Some people care about things like personal agency, respect and deference, etc.

If someone were on vacation and came home to learn that their neighbor had allowed some friends to stay in the empty house, we would often expect some kind of outrage regardless of whether there had been specific damage or wear to the home.

Culturally, people have deeply set ideas about what's theirs, and feel they deserve some say over how their things are used and by whom. Even those who are very generous and want their things to be widely shared usually want to have some voice in making that happen.

visarga

17 days ago

If I were a creative, I would avoid seeing any work I'm not legally allowed to be inspired by. Why install furniture in my brain that I can't sit on?

Forgeties79

17 days ago

As a professional cinematographer/photographer I am incredibly uncomfortable with people using my art without my permission for unknown ends. Doubly so when it’s venture backed private companies stealing from millions of people like me as they make vague promises about the capabilities of their software trained on my work. It doesn’t take much to understand why that makes me uncomfortable and why I feel I am entitled to saying “no.” Legally I am entitled to that in so many cases, yet for some reason Altman et al get to skip that hurdle. Why?

How do you feel about entities taking your face off of social media and plastering it on billboards smiling happily next to their product? What if it’s for a gun? Or condoms? Or a candidate for a party you don’t support? Pick your own example if none of those bother you. I’m sure there are things you do not want to be associated with/don’t want to contribute to.

At the end of the day it’s very gross when we are exploited without our knowledge or permission so rich groups can get richer. I don’t care if my visual work is only partially contributing to some mashed up final image. I don’t want to be a part of it.

CamperBob2

17 days ago

The day after I first heard about the Internet, back in 1990-whatever, it occurred to me that I probably shouldn't upload anything to the Internet that I didn't want to see on the front page of tomorrow's newspaper.

Apart from the 'newspaper' anachronism, that's pretty much still my take.

Sorry, but you'll just have to deal with it and get over it.

Forgeties79

17 days ago

> Sorry, but you'll just have to deal with it and get over it.

You were fine until this bit.

onraglanroad

17 days ago

They're still fine because they're right.

You got to play the copyright game when the big corps were on your side.

Now they're on the other side. Deal with it and get over it.

Forgeties79

16 days ago

You are not entitled to my work.

CamperBob2

16 days ago

I get access to inspiration from everybody's art, and so do you. Seems like a good deal to me.

Meanwhile, the next generation of great artists is already at work down the street from you. Some kids you've never heard of, playing around in a basement or garage you've probably driven past a hundred times. They're learning to make the most of the tools at hand, just like the old masters did. Except the tools at hand this time are little short of godlike.

It's an exciting time. If you wanted things to stay the same, you shouldn't have picked technology or art.

Forgeties79

16 days ago

Inspiring artists =/= involuntarily training privately owned LLMs that charge for access.

If you want me to hand my work over to artists so they can learn and grow and experiment, I’ll give them access to my drives this minute. Send them my way. I have a whole system for it.

CamperBob2

16 days ago

> Inspiring artists =/= involuntarily training privately owned LLMs that charge for access.

Agreed there, which is why it's important to work for open access to the results.

The resulting regime won't look like copyright, but if we do it right, it will be better for us all.

smileson2

17 days ago

You should be proud that your work will now be distilled eternally and an aspect of your work will forever influence the world.

Forgeties79

16 days ago

I’m not

vintermann

16 days ago

> How do you feel about entities taking your face off of your personal website and plastering it on billboards smiling happily next to their product?

That would be misrepresentation. Even Stallman isn't OK with that. You can take one of his opinion pieces and publish it as your own. Or you can attach his name to it.

However, if you're editing it and releasing it under his name, clearly you're simply lying, and nobody is OK with that. People have the right to be recognized as authors of things they did author (if they so desire) and they have a right to NOT be associated with things they didn't.

> At the end of the day it’s very gross when we are exploited without our knowledge or permission so rich groups can get richer.

The second part is the entirety of the problem. If I'm "exploited" in a way where I can't even notice it, and I'm not worse off for it, how is it even exploitation? But people amassing great power is a problem no matter if they do it with "legitimate" means or not.

Forgeties79

16 days ago

If somebody is stealing from your bank account every week and you just don’t notice it, are you not being stolen from? If someone steals your credit card and uses it, has nothing been stolen until the moment you notice the charges?

I don’t really think we can go “if a tree falls in the forest and nobody is around to hear it…” about this.

vintermann

16 days ago

If someone steals from my bank account I certainly CAN notice it even if I don't immediately, and I'm certainly worse off.

That's such a bad straw man I wonder if you're really supporting the position you claim to be supporting. Maybe you're just trying to give it a bad name.

Your opinion isn't on visual work, but visual property. You don't demand to be paid for your work - your labor. Rather you traded that for the dream of being paid rent on a capital object, in perpetuity (or close enough). Artists lost to the power-mongers when we bit at that bait.

Forgeties79

16 days ago

If you think that’s a bad example so be it but I’m not attempting to make a strawman or give anything a bad name.

I don’t really know where all the hostility came from in this conversation but I think it’s best if we move on.

whywhywhywhy

16 days ago

The people building the tech are extremely fussy about their work being cited and extremely protective of their model files, so they themselves have massive issues with their work being used or replicated non-consensually.

jibal

17 days ago

facetious

[I won't bother responding to the rest of your appalling comment]

illwrks

17 days ago

The issue is ownership, not promotion or visibility.

oxag3n

17 days ago

To clarify my question: I do not want anything I create to be fed into their training data. That photo is just one example that I caught, and it became personal. But in general I don't want to open-source my code, write articles, or put any effort into improving their training data set.

margorczynski

17 days ago

We are probably entering the post-copyright era. The law will follow sooner or later.

rafram

17 days ago

That seems unlikely to me. One side is made up of lots and lots of entrenched interests with sympathetic figures like authors and artists on their side, and the other is “big tech,” dominated by the rather unsympathetic OpenAI and Google.

realharo

16 days ago

The other side however has the "if you restrict us, China will win" argument on their side.

panopticon

16 days ago

That argument is easy to politicize and selectively ignore. See: renewables and EVs.

oblio

15 days ago

Yup, just like the post-copyright era followed the dawn of the internet and the emergence of Napster.

mortenjorck

17 days ago

> how do I make (at least) new content protected?

Air gap. If you don’t want content to be used without your permission, it never leaves your computer. This is the only protection that works.

If you want others to see your content, however, you have to accept some degree of trade off with it being misappropriated. Blatant cases can be addressed the same as they always were, but a model overfitting to your original work poses an interesting question for which I’m not aware of any legal precedents having been set yet.

echelon

17 days ago

Horror scenario:

Big IP holders will go nuclear on IP licensing to an extent we've never seen before.

Right now, there are thousands of images and videos of Star Wars, Pokemon, Superman, Sonic, etc. being posted across social media. All it takes is for the biggest IP conglomerates to act like the linear TV and sports networks of the past and treat social media like cable.

Disney: "Gee {Google,Meta,Reddit,TikTok}, we see you have a lot of Star Wars and Marvel content. We think that's a violation of our rights. If you want your users to continue to be able to post our media, you need to pay us $5B/yr."

I would not be surprised if this happens now that every user on the internet can soon create high-fidelity content.

This could be a new $20-30B/yr business for Disney. Nintendo, WBD, and lots of other giant IP holders could easily follow suit.

empressplay

17 days ago

Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora

https://arstechnica.com/ai/2025/12/disney-invests-1-billion-...

echelon

17 days ago

One day later, "Google pulls AI-generated videos of Disney characters from YouTube in response to cease and desist":

https://www.engadget.com/ai/google-pulls-ai-generated-videos...

The next step is to take this beyond AI generations and to license rights to characters and IP on social media directly.

The next salvo will be where YouTube has to take down all major IP-related content if they don't pay a licensing fee. Regardless of how it was created. Movie reviews, fan animations, video game let's plays.

I've got a strong feeling that day is coming soon.

pfortuny

16 days ago

I guess some kind of hard (repetitive) steganography where a private-key signature of the original photo is encoded many times over; also watermarking everything and asking viewers for some kind of verification if they want a non-watermarked copy.

There seems to be no other way (apart from air-gapping everything, as others say).
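To make the "repetitive" part concrete, here is a hedged Python sketch of repetition-coded least-significant-bit embedding: each payload bit (for example, one bit of a signature) is written into several pixel values and read back by majority vote, so a few flipped bits don't corrupt it. Pixels are modeled as a flat list of 8-bit integers for simplicity. Plain LSB embedding like this would not survive recompression or an img2img pass, so it only illustrates the redundancy idea, not a robust watermark.

```python
from typing import List


def embed(pixels: List[int], bits: List[int], repeat: int) -> List[int]:
    """Write each payload bit (0 or 1) into the LSB of `repeat` consecutive pixels."""
    if len(bits) * repeat > len(pixels):
        raise ValueError("not enough pixels for this payload")
    out = list(pixels)
    for i, bit in enumerate(bits):
        for j in range(repeat):
            k = i * repeat + j
            out[k] = (out[k] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return out


def extract(pixels: List[int], n_bits: int, repeat: int) -> List[int]:
    """Recover each payload bit by majority vote over its `repeat` copies."""
    bits = []
    for i in range(n_bits):
        ones = sum(pixels[i * repeat + j] & 1 for j in range(repeat))
        bits.append(1 if 2 * ones > repeat else 0)
    return bits


payload = [1, 0, 1, 1]  # e.g. the first few bits of a signature
stego = embed([200] * 40, payload, repeat=5)
stego[0] ^= 1  # simulate damage: flip two of the five copies of bit 0
stego[1] ^= 1
print(extract(stego, len(payload), repeat=5))  # [1, 0, 1, 1]
```

An odd `repeat` avoids ties in the vote; larger values tolerate more damage at the cost of capacity.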

ur-whale

17 days ago

> Question: with copyright and authorship dead wrt AI, how do I make (at least) new content protected?

Question: Now that steamboats have been invented, how do I keep my clipper business afloat?

Answer: Schumpeter's Gale is around the corner, time for a new business model.

999900000999

17 days ago

A middle ground would be ChatGPT at least providing attribution.

Back in reality, you can get in line to sue. Since they have more money than you, you can't really win though.

So it goes.

LudwigNagasena

17 days ago

Using references is standard industry practice for digital art and VFX. The main difference is that you can't accidentally copy a reference too closely, while with AI you can.

dzonga

17 days ago

we seriously can't be burning GW of energy just to have sama in a GPT-Shirt ad generated by AI.

impressive stuff though, since you can give it a base image + prompt.

drawnwren

17 days ago

counterpoint: we should make energy abundant enough that it really doesn't matter if sama wants to generate gpt-shirt ads or not.

we have the capability, we just stopped making power more abundant.

iknowstuff

17 days ago

I think we can say the pause we took was reasonable once we realized the environmental impact of dumping greenhouse gases into the atmosphere, but if we can now ensure further growth won't do that, let's make sure we restart, just clean this time.

astrange

17 days ago

It's a joke about one of his old fits.

https://x.com/coldhealing/status/1747270233306644560

KaiserPro

17 days ago

Is there watermarking, or some other way for normal people to tell if it's fake?

PhilippGille

17 days ago

https://help.openai.com/en/articles/8912793-c2pa-in-chatgpt-...

It doesn't mention the new model, but it's likely the same or similar.

adrian17

17 days ago

I just checked several of the files uploaded to the news post, the "previous" and "new", both the png and webp (&fm=webp in url) versions - none had the content metadata. So either the internal version they used to generate them skipped them, or they just stripped the metadata when uploading.
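For anyone who wants to repeat that check on a PNG, here is a minimal Python sketch that walks the file's chunk list. It assumes, per the C2PA specification's PNG embedding, that a manifest is stored in a `caBX` chunk. It only detects the chunk's presence; it does not parse or validate the manifest or its signature, and WebP files (which use a different container) are not handled.

```python
import struct
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> List[str]:
    """Walk a PNG's chunk list and return each chunk's four-character type."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype.decode("latin-1"))
        if ctype == b"IEND":
            break
        pos += 8 + length + 4
    return types


def has_c2pa_manifest(data: bytes) -> bool:
    # Assumption: C2PA embeds its manifest store in PNG as a "caBX" chunk.
    return "caBX" in png_chunk_types(data)
```

Usage: `has_c2pa_manifest(open("image.png", "rb").read())`. A `False` result only means no manifest chunk is present, which, as noted above, can simply mean the metadata was stripped on upload.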

mmh0000

17 days ago

I know OpenAI watermarks their stuff. But I wish they wouldn't. It's a "false" trust.

Now it means whoever has access to uncensored/non-watermarking models can pass off their faked images as real and claim, "Look! There's no watermark, of course, it's not fake!"

Whereas, if none of the image models did watermarking, then people (should) inherently know nothing can be trusted by default.

pbmonster

16 days ago

Yeah, I'd go the other way. Camera manufacturers should have the camera cryptographically sign the data from the sensor directly in hardware, and then provide an API to query if a signed image was taken on one of their cameras.

Add an anonymizing scheme (blind signatures or group signatures), done.
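As a rough illustration of the blind-signature half of that idea, here is textbook RSA blind signing (Chaum's scheme) in Python. The tiny key (p = 61, q = 53) is the classic classroom example and is completely insecure. The roles (a requester blinding an image digest, a signer holding the private key) are assumptions about how such a scheme might be wired up, not how any camera vendor actually does it.

```python
import hashlib
import math
import secrets
from typing import Tuple

# Textbook RSA toy key from the classic p=61, q=53 example. Far too small to be secure.
N, E, D = 3233, 17, 2753


def digest(image: bytes) -> int:
    """Hash the image bytes and reduce the hash into the toy RSA modulus."""
    return int.from_bytes(hashlib.sha256(image).digest(), "big") % N


def blind(m: int) -> Tuple[int, int]:
    """Requester masks m with a random factor r coprime to N before sending it off."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (m * pow(r, E, N)) % N, r


def sign_blinded(blinded: int) -> int:
    """Signer applies its private exponent without ever seeing m itself."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Dividing out r leaves an ordinary RSA signature m^D mod N."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(m: int, sig: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(sig, E, N) == m


m = digest(b"raw sensor data")
blinded, r = blind(m)
sig = unblind(sign_blinded(blinded), r)
print(verify(m, sig))  # True
```

The signer only ever sees `blinded`, which is masked by the random factor `r`, so it cannot later link the unblinded signature back to the request that produced it. A real deployment would use a full-size key and a proper padding scheme.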

mnorris

17 days ago

[delayed]

KaiserPro

17 days ago

Exif isn't all that robust though.

I suppose I'm going to have to bite the bullet and actually train an AI detector that works roughly in real time.

laurent123456

17 days ago

There are ways to tell if an image is real, if it's been signed cryptographically by the camera for example, but increasingly it probably won't be possible to tell if something is fake. Even if there's some kind of hidden watermark embedded in the pixels, you can process it with img2img in another tool and get rid of the watermark. Exif data, etc is irrelevant, you can get rid of it easily or fake it.

ewoodrich

17 days ago

Sure, you can always remove it, but the average person posting AI images on Facebook or whatever probably won't bother. I was skeptical of Google's SynthID when I first heard about it but I've seen it used to identify suspected AI images on Reddit recently (the example I saw today was cropped and lightly edited but still got flagged correctly) and it's cool to have a hard data point when it's present. It won't help with bad/manipulative actors but a decent mitigation for the low effort slop scenario.

qingcharles

16 days ago

Not if you strip the EXIF data. Also, it will strip the star watermark and SynthID from Gemini if you paste a Nano Banana pic in and tell it to mirror it.

wavemode

17 days ago

I think society is going to need the opposite - cameras that can embed data in the pixels of a video which indicate the image is real.

zkmon

17 days ago

AI-generated images would remove all the trust in and admiration for human talent in art, similar to how text generation would remove trust in and admiration for human talent in writing. Same case for coding.

So, let's simulate that future. Since no one trusts your talent in coding, art, or writing, you wouldn't care to do any of these. But the economy is built on products and services that get their value from how much human talent and effort is required to produce them.

So the value of these services and products goes down as demand and trust go down. No one knows or cares who the good programmer on the team is, who is a great thinker and writer, and who is a modern Picasso.

So the motivation disappears for humans. There are no achievements to target, and there is no way to impress others with your talent. This should lead to a uniform workforce without much difference in talent. Pretty much a robot army.

arnz-arnz

17 days ago

all I can hope for is that a new industry or reliable ecosystem of vetters of real human talent will emerge. Are you really as good a writer as you claim to be? Show us the badge. That or AI firms have to be forced to 'watermark' all their creative outputs.

zkmon

16 days ago

Both are just mid-summer dreams. There is no global law to enforce watermark. There are no badges that can't be forged.

arnz-arnz

16 days ago

There isn't, but that doesn't mean there won't be. It can even go as far as banning certain features. There just isn't much hope with the kind of politics we have right now.

gs17

17 days ago

> Still some scientific inaccuracies, but ~70% correct

That's still dangerously bad for the use-case they're proposing. We don't need better looking but completely wrong infographics.

rcarmo

17 days ago

We don’t, but most Marketing departments salivate for them.

astrange

17 days ago

It's pretty common for infographics to be wrong. The people making them aren't the same people who know the facts.

I'd especially say like 100% of amateur political infographics/memes are wrong. ("climate change is caused by 100 companies" for instance)

agentifysh

17 days ago

I am very impressed. A benchmark I like to run is having it create sprite maps and UV texture maps for an imagined 3D model.

Noticed it captured a Mega Man Legends vibe...

https://x.com/AgentifySH/status/2001037332770615302

gs17

17 days ago

> however im not sure if these are true uv maps

I can tell you with 100% certainty they are not. For example, Crash doesn't have a backside for his torso. You could definitely make a model that uses these as textures, but you'd really have to force it and a lot of it would be stretched or look weird. If you want to go this approach, it would make a lot more sense to make a model, unwrap it, and use the wireframe UV map as input.

Here's the original Crash model: https://models.spriters-resource.com/pc_computer/crashbandic... , its actual texture is nothing like the generated one, because the real one was designed for efficiency.

Nition

17 days ago

Most of Crash was not textured at all; just vertex colours. IIRC only the fur on his back is a texture at all.

gs17

17 days ago

"Original" as in the original of the one they used in their tweet.

agentifysh

17 days ago

yeah definitely impressive compared to what nano banana outputted

tried your suggested approach with an unwrapped wireframe UV as input and I'm impressed

https://x.com/AgentifySH/status/2001057153235222867

obviously it's not going to be accurate 1:1, but with more 3D spatial awareness I think it could definitely improve

101008

17 days ago

> however im not sure if these are true uv maps that is accurate as i dont have the 3d models itself

also in the tweet

> GPT Image 1.5 is **ing crazy

and

> holy shit lol

What's impressive about it if you don't know whether it's right or not? (As the other comment pointed out, it isn't.)

ares623

17 days ago

My copium is that analog photography makes a come back as a way to recover some level of trust and authenticity.

Forgeties79

17 days ago

Good luck getting it developed, unfortunately. I have to ship it off now; there isn’t a single local spot in my city that develops film anymore.

ares623

17 days ago

When the demand is back, the labs should start coming back. There's a few in my relatively small city which is pretty surprising. But the costs are still too high to cover the low volume I guess.

Forgeties79

16 days ago

The big issue is chemical disposal IIRC (which yes is a cost just being more specific)

famahar

17 days ago

I was reading a trend report on art and it seems like collage, squiggly hand drawn text, and lots of intentional imperfections are becoming popular. I'm not sure how hard it is for AI to recreate those, but it is nice to see people trying to do more of what AI struggles with.

vunderba

17 days ago

Okay results are in for GenAI Showdown with the new gpt-image 1.5 model for the editing portions of the site!

https://genai-showdown.specr.net/image-editing

Conclusions

- OpenAI has always had some of the strongest prompt understanding alongside the weakest image fidelity. This update goes some way towards addressing this weakness.

- It's leagues better than gpt-image-1 at making localized edits without altering the entire image's aesthetic, doubling the previous score from 4/12 to 8/12, and it's the only model that legitimately passed the Giraffe prompt.

- It's one of the most steerable models with a 90% compliance rate

Updates to GenAI Showdown

- Added outtakes sections to each model's detailed report in the Text-to-Image category, showcasing notable failures and unexpected behaviors.

- New models have been added including REVE and Flux.2 Dev (a new locally hostable model).

- Finally got around to implementing a weighted scoring mechanism which considers pass/fail, quality, and compliance for a more holistic model evaluation (click pass/fail icon to toggle between scoring methods).
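Here is a sketch of what a weighted score along those lines could look like, in Python. The weights, the 0..1 quality and compliance ratings, and the simple averaging are all hypothetical; the post doesn't say how the site combines them. The plain pass rate is included for comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptResult:
    passed: bool
    quality: float     # hypothetical 0..1 rating of visual fidelity
    compliance: float  # hypothetical 0..1 share of the prompt's instructions followed


# Hypothetical weights; the site's actual formula is not given in this post.
WEIGHTS = {"pass": 0.5, "quality": 0.25, "compliance": 0.25}


def pass_rate(results: List[PromptResult]) -> float:
    """Plain pass/fail score as a percentage."""
    if not results:
        return 0.0
    return 100.0 * sum(r.passed for r in results) / len(results)


def weighted_score(results: List[PromptResult]) -> float:
    """Average a per-prompt blend of pass/fail, quality and compliance, as a percentage."""
    if not results:
        return 0.0
    total = sum(
        WEIGHTS["pass"] * (1.0 if r.passed else 0.0)
        + WEIGHTS["quality"] * r.quality
        + WEIGHTS["compliance"] * r.compliance
        for r in results
    )
    return 100.0 * total / len(results)


# 8 of 12 passes mirrors the score quoted above; the quality/compliance ratings are made up.
results = [PromptResult(True, 0.9, 1.0)] * 8 + [PromptResult(False, 0.6, 0.5)] * 4
print(round(pass_rate(results), 1))  # 66.7
print(round(weighted_score(results), 1))
```

The point of blending is that a near-miss failure still earns partial credit, so two models with the same 8/12 pass rate can end up with different weighted scores.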

If you just want to compare gpt-image-1, gpt-image-1.5, and NB Pro at the same time:

https://genai-showdown.specr.net/image-editing?models=o4,nbp...

echelon

17 days ago

I really love everything you're doing!

Personal request: could you also advocate for "image previz rendering", which I feel is an extremely compelling use case for these companies to develop. Basically any 2d/3d compositor that allows you to visually block out a scene, then rely on the model to precisely position the set, set pieces, and character poses.

Here are some examples:

gpt-image-1 absolutely excels at this:

https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh

Nano Banana (Pro) fails at this task:

https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd

Flux Kontext, Qwen, etc. have mixed results.

I'm going to re-run these under gpt-image-1.5 and report back.

vunderba

17 days ago

Thanks! A highly configurable Previz2Image model would be a fantastic addition. I was literally just thinking about this the other day (but more in the context of ControlNets and posable kinematic models). I’m even considering adding an early CG Poser blocked‑out scene test to see how far the various editor models can take it.

With additions like structured prompts (introduced in BFL Flux 2), maybe we'll see something like this in the near future.

irishcoffee

17 days ago

> the only model that legitimately passed the Giraffe prompt.

10 years ago I would have considered that sentence satire. Now it allegedly means something.

Somehow it feels like we’re moving backwards.

echelon

17 days ago

> Somehow it feels like we’re moving backwards.

I don't understand why everyone isn't in awe of this. This is legitimately magical technology.

We've had 60+ years of being able to express our ideas with keyboards. Steve Jobs' "bicycle of the mind". But in all this time we've had a really tough time of visually expressing ourselves. Only highly trained people can use Blender, Photoshop, Illustrator, etc. whereas almost everyone on earth can use a keyboard.

Now we're turning the tide and letting everyone visually articulate themselves. This genuinely feels like computing all over again for the first time. I'm so unbelievably happy. And it only gets better from here.

Every human should have the ability to visually articulate themselves. And it's finally happening. This is a major win for the world.

I'm not the biggest fan of LLMs, but image and video models are a creator's dream come true.

In the near future, the exact visions in our head will be shareable. We'll be able to iterate on concepts visually, collaboratively. And that's going to be magical.

We're going to look back at pre-AI times as primitive. How did people ever express themselves?

Rodeoclash

17 days ago

Where is all this wonderful visual self expression that people are now free to do? As far as I can tell it's mostly being used on LinkedIn posts.

scrollaway

17 days ago

It’s a classic pattern: give the general population superpowers and most people will use them in the most boring ways.

The internet is an amazing technology, yet its biggest consumption is a mix of ads, porn and brain rot.

We all have cameras in our pockets yet most people use them for selfies.

But if you look closely enough, the incredible value that comes from these examples more than makes up for all the people using them in a “boring” way.

And anyway who’s the arbiter of boring?

concats

16 days ago

“I've come up with a set of rules that describe our reactions to technologies:

1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.

2. Anything that's invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.

3. Anything invented after you're thirty-five is against the natural order of things.”

― Douglas Adams

vintermann

16 days ago

Is that how it works this time, though?

* I'm into genealogy. Naturally, most of my fellow genealogists are retired, often many years ago, though probably also above average in mental acuity and tech-savviness for their age. They LOVE generative AI.

* My nieces, and my cousin's kids of the same age, are deeply into visual art. Especially animation, and cutesy Pokemon-like stuff. They take it very seriously. They absolutely DON'T like AI art.

irishcoffee

17 days ago

You basically described magic mushrooms, where the description came from you while high on magic mushrooms.

It’s just a tool. It’s not a world-changing tech. It’s a tool.

conradfr

15 days ago

It is amazing and impressive, but also an unlimited source of trash and slop in my internet use.

SchemaLoad

17 days ago

I'm struggling to see the benefits. All I see people using this for is generating slop for work presentations, and misleading people on social media. Misleading might be understating it too. It's being used to create straight up propaganda and destruction of the sense of reality.

pierrec

17 days ago

This showdown benchmark was and still is great, but results for any model released after the benchmark itself should be taken with an enormous grain of salt.

Maybe everyone has a different dose of skepticism. Personally, I'm not even looking at results for models released after the benchmark; for all those results tell us, the models might as well be one-trick ponies that only do well on the benchmark.

It might be too much work, but one possible "correct" approach for this kind of benchmark would be to periodically release new benchmarks with new tests (broadly in the same categories) and only include models that predate them.

vunderba

17 days ago

Yeah that’s a classic problem, and it's why good tests are such closely guarded secrets: to keep them from becoming training fodder for the next generation of models.

A few weeks ago I actually added some new, more challenging tests to the GenAI Text-to-Image section of the site (the “angelic forge” and “overcrowded flat earth”) just to keep pace with the latest SOTA models.

In the next few weeks, I’ll be adding some new benchmarks to the Image Editing section as well~~

echelon

17 days ago

The Blender previz reskin task could be automated!

Generate a novel previz scene programmatically in Blender or some 3D engine, then task the image model with rendering it in a style (or with style transfer from a given image, e.g. something novel and unseen from Midjourney).

Throw in a 250 object asset pack and some skeletal meshes that can conform to novel poses.

Furthermore, anything that succeeds from that task can then be fed into another company's model and given an editing task.
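To make the "novel scene" half of this idea concrete, here is a minimal sketch that does not touch Blender at all. It only produces a seeded, reproducible layout spec that a hypothetical exporter could turn into a blocked-out render. The `Placement` schema, the asset names, and the 10-unit floor extent are illustrative assumptions, not part of any existing tool.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Placement:
    """One object in a blocked-out previz scene (hypothetical schema)."""
    asset: str
    x: float
    y: float
    rotation_deg: float


def random_scene(assets: Sequence[str], n_objects: int, seed: int,
                 extent: float = 10.0) -> List[Placement]:
    """Lay out n_objects randomly chosen assets on a square floor.

    A private, seeded RNG makes every scene reproducible: a benchmark can mint
    fresh scenes no model has seen, yet re-render any of them exactly.
    """
    if not assets:
        raise ValueError("need at least one asset")
    rng = random.Random(seed)
    return [
        Placement(
            asset=rng.choice(assets),
            x=rng.uniform(-extent, extent),
            y=rng.uniform(-extent, extent),
            rotation_deg=rng.uniform(0.0, 360.0),
        )
        for _ in range(n_objects)
    ]
```

Skeletal poses for the character meshes could be added the same way, as extra seeded fields per placement, so that the whole test set is regenerated from nothing but a list of seeds.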

smusamashah

17 days ago

I think training image models to pass these very specific tests correctly will be very difficult for any of these companies. How would they even do that?

8n4vidtmkvmk

17 days ago

Hire a professional Photoshop artist to manually create the "correct" images and then put the before and after photos into the training data. Or however they've been training these models thus far; I don't know.

And if that still doesn't get you there, hash the image inputs to detect whether it's one of these test photos and then run your special test-passer algo.

smusamashah

15 days ago

I don't think a few images done by any professional will have a measurable impact in training.

hdjrudni

15 days ago

I'm sure there's a way for them to give enough weight if they really cared enough. I don't think they should or would, but they could stuff the training data with thousands of slight variations if they wanted to or manually give them more importance. This might adversely affect everything else, but that's another story.

somenameforme

17 days ago

You don't need skepticism, because even if you're acting in 100% good faith and building a new model, what's the first thing you're going to do? You're going to go look up as many benchmarks as you can find and see how it does on them. It gives you some easy feedback relative to your peers. The fact that your own model may end up being put up against these exact tests is just icing.

So I don't think there's even a question of whether or not newer models are going to be maximizing for benchmarks: they 100% are. The skepticism would be in how it's done. If something's not being run locally, then there's an endless array of ways to cheat, like dynamically loading certain LoRAs in response to certain queries, with some LoRAs trained precisely to maximize benchmark performance. Basically taking a page out of the car company playbook in response to emissions testing.

But I think maximizing the general model itself to perform well on benchmarks isn't really unethical or cheating at all. All you're really doing there is 'outsourcing' part of your quality control tests. But it simultaneously greatly devalues any benchmark, because that benchmark is now the goal.

smusamashah

17 days ago

Z-Image was released recently, and that's all /r/StableDiffusion talks about these days. Consider adding that too. It's very good quality for its size (it requires only 6 or 8 GB of RAM).

vunderba

17 days ago

I've actually done a bit of preliminary testing with ZiT. I'm holding off on adding it to the official GenAI site until the base and edit models have been released, since the Turbo model is pretty heavily distilled.

https://mordenstar.com/other/z-image-turbo

heystefan

17 days ago

So when you say "X attempts" what does that mean? You just start a new chat with the same exact prompt and hope for a different result?

vunderba

17 days ago

All images are generated using independent, separate API calls. See the FAQ at the bottom under “Why is the number of attempts seemingly arbitrary?” and “How are the prompts written?” for more detail, but to quickly summarize:

In addition to giving models multiple attempts to generate an image, we also write several variations of each prompt. This helps prevent models from getting stuck on particular keywords or phrases, which can happen depending on their training data. For example, while “hippity hop” is a relatively common name for the ball-riding toy, it’s also known as a “space hopper.” In some cases, we may even elaborate and provide the model with a dictionary-style definition of more esoteric terms.

This is why providing an “X Attempts” metric is so important. It serves as a rough measure of how “steerable” a given model is or, put another way, how much we had to fight with the model in order for it to consistently follow the prompt’s directives.
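The bookkeeping behind an "X attempts" number can be sketched as follows. This is a hypothetical reconstruction, not the site's actual code: `generate` stands in for one independent API call and `passes` for the (human) judgment against the minimum passing criteria. Prompt variations are rotated round-robin, which is one plausible reading of "several variations of each prompt".

```python
from itertools import cycle, islice
from typing import Callable, Optional, Sequence


def attempts_until_pass(prompts: Sequence[str],
                        generate: Callable[[str], bytes],
                        passes: Callable[[bytes], bool],
                        max_attempts: int) -> Optional[int]:
    """Return the 1-based number of the first passing attempt, or None.

    Prompt variations are used round-robin, so a model that trips over one
    phrasing ("hippity hop") still gets tried with another ("space hopper").
    """
    for attempt, prompt in enumerate(islice(cycle(prompts), max_attempts), start=1):
        image = generate(prompt)  # one independent call; no chat history is shared
        if passes(image):
            return attempt
    return None  # never passed within the attempt budget
```

A lower return value means a more steerable model; `None` corresponds to a model that never satisfied the prompt within the budget.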

KeplerBoy

16 days ago

"Remove all the trash from the street and sidewalk. Replace the sleeping person on the ground with a green street bench. Change the parking meter into a planted tree."

What a prompt and image.

__alexs

16 days ago

Looking forward to the first AR glasses to include live editing of the world like this.

nisegami

16 days ago

How long until this shows up in a YC batch?

walrus01

16 days ago

I've already seen images on the MLS, uploaded by real estate agents, that look like this. It's the same concept they've generally been using to bait people into coming to tour houses.

imdsm

16 days ago

A way it could be...

boredhedgehog

16 days ago

I disagree with gpt-image-1.5's grade on the worm sign. It moved some of the marks around to accommodate the enlarged black area, but retained the overall appearance of the sign.

vunderba

16 days ago

I can see how you'd come to that conclusion. Each prompt is supposed to illustrate a different type of test criterion. Worm Sign is ultimately intended to test near-100% retention of the original weathered/dented sign.

If you look at the ones that passed (Flux.2 Pro, Gemini 2.5 Flash, Reve), you'll see that they did not add/subtract/move any of the pockmarks from the original image.
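Grading here is done by eye, but "near 100% retention" can be approximated numerically. A minimal sketch, assuming the before and after images are already aligned, same-sized grayscale grids, and that someone has drawn a mask over the region the prompt asked to change (the enlarged black area). The tolerance of 8 gray levels is an arbitrary choice meant to absorb compression noise. A moved or added pockmark shows up as changed pixels outside the mask and lowers the ratio.

```python
from typing import Sequence


def retention_ratio(before: Sequence[Sequence[int]],
                    after: Sequence[Sequence[int]],
                    edit_mask: Sequence[Sequence[bool]],
                    tolerance: int = 8) -> float:
    """Fraction of pixels outside edit_mask whose value stayed within tolerance.

    Images are equal-sized 2D grids of grayscale values (0-255); edit_mask is
    truthy where the prompt asked for a change.
    """
    if not (len(before) == len(after) == len(edit_mask)):
        raise ValueError("images and mask must have the same number of rows")
    kept = total = 0
    for row_b, row_a, row_m in zip(before, after, edit_mask):
        if not (len(row_b) == len(row_a) == len(row_m)):
            raise ValueError("images and mask must have the same width")
        for b, a, m in zip(row_b, row_a, row_m):
            if m:
                continue  # inside the requested edit: changes are expected here
            total += 1
            if abs(a - b) <= tolerance:
                kept += 1
    return kept / total if total else 1.0  # nothing outside the mask to preserve
```

Because this compares pixel by pixel, a model that re-renders the whole sign with a slight shift would score poorly even if it looks similar, which matches the strict intent of this particular test.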

quietbritishjim

16 days ago

Absolutely fabulous work.

Ludicrously unnecessary nit for "Remove all the brown pieces of candy from the glass bowl":

> Gemini 2.5 Flash - 18 attempts - No matter what we tried, Gemini 2.5 Flash always seemed to just generate an entirely new assortment of candies rather than just removing the brown ones.

The way I read the prompt, it demands that the candies should change arrangement. You didn't say "change the brown candies to a different color", you said "remove them". You can infer from the few brown ones that you can see that there are even more underneath - surely if you removed them all (even just by magically disappearing them) then the others would tumble down into a new location? The level of the candies is lower than before you started, which is what you'd expect if you remove some. Maybe it's just coincidence, but maybe this really was its reasoning. (It did unnecessarily remove the red candy from the hand though.)

I don't think any of the "passes" did as well as this, including Gemini 3.0 Pro Image. Qwen-Image-Edit did at least literally remove one of the three visible brown candies, but just recolored the other two.

vunderba

16 days ago

That is a great point! Since we are moving towards better "world models" with these multimodal models, you could reasonably argue that if the directive was to physically remove the candy, then gravity/physics could affect the positioning of the other objects in the process.

You will note that the Minimum Passing Criteria allow a color change to pass the prompt, but with the rapid improvements in generative models, I may revise this test to be stricter, only counting actual "Removal" as a pass as opposed to a simple color swap.

Bombthecat

16 days ago

I can't click the compliance info button on mobile. The text shows for half a second and then vanishes. Long press just marks the text for copy paste.

vunderba

15 days ago

Hey bombthecat - thanks for pointing this out. I had some poor mobile browser detection that was causing this issue. It should be fixed now.

singhkays

17 days ago

GPT Image 1.5 is the first model that gets close to replicating the intricate detail mosaic of bullets in the "Lord of War" movie poster for me. Following the prompt instructions more closely also seems better compared to Nano Banana Pro.

I edited the original "Lord of War" poster with a reference image of Jensen and replaced bullets with GPU dies, silicon wafers and electronic components.

https://x.com/singhkays/status/2001080165435113791

mvkel

17 days ago

This leaderboard feels incredibly accurate given my own experience.

BoredPositron

17 days ago

Nano Banana still has the best VAE we've seen, especially if you're doing high-res production work. Flux 2 comes close, but gpt-image is still miles away.

llmthrow0827

16 days ago

It failed my benchmark of a photo of a person touching their elbows together.

nicpottier

16 days ago

Love this benchmark, always the first place I look. Also seems like it is time to move the goalposts, not sure we are getting enough resolution between models anymore.

Out of curiosity, why does Gemini get gold for the poker example but gpt-image-1.5 does not? I couldn't see a difference between the two.

leumon

16 days ago

One other test you could add is generating a chessboard from a FEN. I was surprised to see NBP able to do that (however, it only seems to work with fewer pieces; beyond a certain number it makes mistakes or even generates a completely wrong image): https://files.catbox.moe/uudsyt.png
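Grading such a test automatically first requires the expected position. A minimal sketch that expands only the piece-placement field of a FEN (the first space-separated field; side to move, castling rights, and the rest are ignored) into an 8x8 grid. Reading the board back out of the generated image is not covered here. The piece count is included because the failures described above appear to depend on how many pieces are on the board.

```python
from typing import List

PIECES = set("prnbqkPRNBQK")  # lowercase = black, uppercase = white


def fen_to_board(fen: str) -> List[List[str]]:
    """Expand a FEN piece-placement field into 8 rows of 8 squares.

    Row 0 is rank 8 (Black's back rank), column 0 is the a-file, and "."
    marks an empty square. Only the first FEN field is read.
    """
    fields = fen.split()
    if not fields:
        raise ValueError("empty FEN")
    ranks = fields[0].split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")
    board = []
    for rank in ranks:
        row: List[str] = []
        for ch in rank:
            if ch in "12345678":
                row.extend("." * int(ch))  # a digit is a run of empty squares
            elif ch in PIECES:
                row.append(ch)
            else:
                raise ValueError(f"invalid FEN character {ch!r}")
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 squares")
        board.append(row)
    return board


def piece_count(board: List[List[str]]) -> int:
    """Number of occupied squares on the expanded board."""
    return sum(square != "." for row in board for square in row)
```

For example, `piece_count(fen_to_board("8/8/8/4k3/8/8/8/4K3 w - - 0 1"))` is 2, so a test suite could be sorted from sparse endgames up to full positions to find where a model starts to break.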

lobochrome

17 days ago

Stupid Cisco Umbrella is blocking you

smlavine

17 days ago

This is terrifying. Truth is dead.

teaearlgraycold

16 days ago

Eventually phone manufacturers will be forced to become arbiters of truth with signed images and videos.

WhyOhWhyQ

17 days ago

Makes you wonder what's really meant when we talk about progress.

surrTurr

17 days ago

not super impressed. feels like 70% as good as nano banana pro.

sfmike

17 days ago

Hope to see more "red alert" status from the AI wars putting companies into all-hands-on-deck mode. It only helps with token costs and efficacy. As always, competition only helps the end users.

celeryd

17 days ago

If it can't generate non-sexual content of a woman in a bikini, I am not interested.

aziis98

17 days ago

I know this is a bit out of scope for these image editing models but I always try this experiment [1] of drawing a "random" triangle and then doing some geometric construction and they mess up in very funny ways. These models can't "see" very well. I think [2] is still very relevant.

[1]: https://chatgpt.com/share/6941c96c-c160-8005-bea6-c809e58591...

[2]: https://vlmsareblind.github.io/

anonfunction

17 days ago

So the announcement said the API works with the new model, so I updated my Golang SDK grail (https://github.com/montanaflynn/grail) to use it, but it returns a 500 server error when you try, and if you switch to a completely unknown model, the new model isn't among the supported values the error lists:

  POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
    "message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
    "type": "server_error",
    "param": null,
    "code": "server_error"
  }

  POST "https://api.openai.com/v1/responses": 400 Bad Request {
    "message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
    "type": "invalid_request_error",
    "param": "tools[0].model",
    "code": "invalid_value"
  }
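The first body explicitly invites a retry, while the second is a permanent client error. A minimal sketch of telling the two apart, assuming you already have the HTTP status and the decoded JSON body; `send` is a hypothetical zero-argument callable that wraps the real request. Treating 429 (rate limiting) as retryable is general HTTP practice, not something shown in the output above. If the 500 persists, as it appears to here, backoff only delays the failure; the point is to avoid reporting a transient error as a hard one, or hammering the API over a permanent one.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict]  # (HTTP status, decoded JSON body)


def is_retryable(status: int, body: Dict) -> bool:
    """Return True for failures that may succeed on a later attempt."""
    err = body.get("error", body)  # accept bare or {"error": {...}}-wrapped objects
    if status >= 500 or status == 429 or err.get("type") == "server_error":
        return True
    # e.g. 400 invalid_value for an unknown model name fails identically every time
    return False


def call_with_retry(send: Callable[[], Response], max_tries: int = 3,
                    base_delay: float = 1.0) -> Response:
    """Call send() until it succeeds, hits a non-retryable error, or runs out of tries."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status < 400 or attempt >= max_tries or not is_retryable(status, body):
            return status, body
        # exponential backoff: 1s, 2s, 4s, ... with the default base_delay
        time.sleep(base_delay * 2 ** (attempt - 1))
```

Classifying errors rather than retrying everything means the `invalid_value` case surfaces immediately, with its list of supported models intact, instead of after several pointless round trips.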

nycdatasci

17 days ago

@OpenAI. I ran this through GPT-5.2 Pro for you. Why aren't you drinking the kool-aid yet? How did you miss "what matter" vs. "what matters" ?

------------------

Suggested remediation checklist (practical, fast wins):

1. Fix copy: “what matter” → “what matters.”

2. Standardize model naming: one marketing name + code-form API ids (gpt-image-1.5).

3. Clarify rollout: split model vs UI experience availability ("all users" vs. Business & Enterprise language)

4. Fix concatenated tab labels / logo lists for accessibility.

5. Correct mis-labeled input asset metadata (input-3 labeled as input-2).

6. Reconsider the “~70% correct” deep-sea poster as a flagship improvement example (or add clearer framing).

7. Add a disclaimer for real-person-name creative examples.

nycdatasci

17 days ago

I asked GPT-5.2 Pro to review the release:

Highest-impact issues to fix

1) Clear copy editing error in a major section header

The section header reads “Precise edits that preserve what matter”—it should almost certainly be “what matters.” This appears both in the table of contents and the body header, so it’s high-visibility.

Why it matters: This is the kind of basic grammar error that undermines trust in the rest of the claims, especially in a product announcement.

Fix: Update the heading and TOC anchor text site-wide.

gostsamo

17 days ago

Alt text is one of the nicest uses for AI, and still OpenAI didn't bother using it for something so basic. The dogfooding is not strong with their marketing team.

brador

17 days ago

Every person in every picture in their examples is white except for 1 Asian dude. Like a 46:1 ratio for the page (I counted). Not one Middle Eastern or Black or Jewish or Indian or South American person.

Not even one.

Very weird.

0dayman

17 days ago

nah Nano Banana Pro is much better

neom

17 days ago

Anyone else have issues verifying with openai? I always get a "congrats you're done" screen with a green checkmark from Persona, nothing to click, and my account stays unverified.

mingabunga

17 days ago

I did an experiment to give a software product a dark theme. I gave both (GPT and Gemini/Nano) a screenshot of the product and an example theme I found on Dribbble. Gemini/Nano did a pretty average job, only applying some grey to some of the panels; I tried a few different examples and got similar output. GPT did a great job, theming the whole app and making it look great. I think I'd still need a designer to finesse some things, though.
