Gemini 2.5 Flash Image
Key topics
The debate around Google's new Gemini 2.5 Flash image model is heating up, with users testing its limits by asking it to generate sensitive images, such as a "1920s Nazi officer." The model politely declines, citing its inability to create realistic human images, sparking a lively discussion about its censorship and potential workarounds. Some commenters are poking fun at the model's evasive responses, while others are speculating about the possibility of "uncensoring" it once its weights are released. As one commenter cheekily suggested, a future uncensored model might even become an "OnlyFans model" – dubbed "BigBanana."
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 25m after posting
Peak period: 84 comments in the 0-3h window
Avg / period: 14.5 comments
Based on 160 loaded comments
Key moments
- Story posted: Aug 26, 2025 at 10:01 AM EDT (5 months ago)
- First comment: Aug 26, 2025 at 10:26 AM EDT (25m after posting)
- Peak activity: 84 comments in 0-3h (hottest window of the conversation)
- Latest activity: Aug 28, 2025 at 8:44 AM EDT (5 months ago)
""" Unfortunately, I can't generate images of people. My purpose is to be helpful and harmless, and creating realistic images of humans can be misused in ways that are harmful. This is a safety policy that helps prevent the generation of deepfakes, non-consensual imagery, and other problematic content.
If you'd like to try a different image prompt, I can help you create images of a wide range of other subjects, such as animals, landscapes, objects, or abstract concepts. """
"Unfortunately I'm not able to generate images that might cause bad PR for Alphabet(tm) or subsidiaries. Is there anything else I can generate for you?"
https://www.reddit.com/r/LocalLLaMA/comments/1mx1pkt/qwen3_m...
It’s possible that they relaxed the safety filtering to allow humans but forgot to update the error message.
https://postimg.cc/xX9K3kLP
...
https://en.m.wikipedia.org/wiki/Sturmabteilung
https://developers.googleblog.com/en/introducing-gemini-2-5-...
It seems like this was 'nano-banana' all along
For people like me that don’t know what nano-banana is.
I thought Medium was a stuck up blogging platform. Other than for paid subscriptions, why would they pay bloggers? Are they trying to become the next HuffPost or something?
"Banana" would be a nice name for their AI, and they could freely claim it's bananas.
Definitely inferior to the results I see on AI Studio, and image generation time is 6 seconds on AI Studio vs. 30 seconds on Fal.AI
Quality or latency?
Flash Image is an image- (and text-) predicting large language model. In a similar fashion to how trained LLMs can manipulate/morph text, it can do the same for images: things like style transfer, character consistency, etc.
You can communicate with it in a way you can't with Imagen, and it has a better overall world understanding.
Gemini Flash Image: ChatGPT image, but by Google
This is why I'm sticking mostly to Adobe Photoshop's AI editing because there are no restrictions in that regard.
1. Reduce article to a synopsis using an LLM
2. Generate 4-5 varying description prompts from the synopsis
3. Feed the prompts to an imagegen model
Though I'd wager that gpt-image-1 (in ChatGPT), being multimodal, could probably manage it as well.
The response was a pretty good summary of the article, along with an image that, dagnabbit, read the assignment.
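A minimal sketch of that three-step pipeline; summarize(), make_prompts(), and generate_image() are hypothetical placeholders for whichever LLM and image-generation APIs you actually wire in:

```python
# Hypothetical sketch of the article -> images pipeline described above.
# The three helpers are placeholders, not real library calls.

def summarize(article_text: str) -> str:
    """Step 1: reduce the article to a short synopsis with an LLM (placeholder)."""
    raise NotImplementedError

def make_prompts(synopsis: str, n: int = 5) -> list[str]:
    """Step 2: ask the LLM for n varying image-description prompts (placeholder)."""
    raise NotImplementedError

def generate_image(prompt: str) -> bytes:
    """Step 3: feed one prompt to an image-generation model, return image bytes (placeholder)."""
    raise NotImplementedError

def article_to_images(article_text: str) -> list[bytes]:
    synopsis = summarize(article_text)
    prompts = make_prompts(synopsis, n=5)
    return [generate_image(p) for p in prompts]
```

Generating several varied prompts from one synopsis is what gives you a handful of candidate images to choose from instead of a single shot.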
I have to say, while I'm deeply impressed by these text-to-image models, there's a part of me that's also wary of their impact. Just look at the comments beneath the average Facebook post.
It survives a lot of transformations, like compression, cropping, and resizing. It even survives alterations like color filtering and overpainting.
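A rough sketch of what checking that robustness could look like with Pillow; detect_watermark() is a hypothetical stand-in, since SynthID detection isn't exposed as a public API:

```python
# Apply the transformations mentioned above and check whether a (hypothetical)
# watermark detector still fires on each variant.
import io
from PIL import Image, ImageEnhance

def detect_watermark(img: Image.Image) -> bool:
    raise NotImplementedError  # hypothetical detector, not a real API

def jpeg_roundtrip(img: Image.Image, quality: int = 60) -> Image.Image:
    """Simulate lossy compression by re-encoding the image as JPEG."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

original = Image.open("generated.png").convert("RGB")
variants = {
    "compressed": jpeg_roundtrip(original),
    "cropped": original.crop((50, 50, original.width - 50, original.height - 50)),
    "resized": original.resize((original.width // 2, original.height // 2)),
    "color_filtered": ImageEnhance.Color(original).enhance(0.3),
}
for name, variant in variants.items():
    print(name, detect_watermark(variant))
```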
Now is that so bad?
Don’t Pay This AI Family To Write You a Song - https://www.youtube.com/watch?v=u-DDHSfBBeo
Without going into detail, basically the task boils down to, "generate exactly image 1, but replace object A with the object depicted in image 2."
Where image 2 is some generic, front-facing version. Ideally I want the model to place this object perfectly in the scene, replacing the existing object, which I'd ideally identify exactly by specifying its position, but otherwise just by describing very precisely what to do.
For models that can't accept multiple images, I've tried a variation where I put a blue box around the object that I want to replace, and paste the object that I want it to put there at the bottom of the image on its own.
I've tried some older models, ChatGPT, qwen-image last week, and just now this one. They all fail at it. To be fair, this model got pretty damn close: it replaced the wrong object in the scene, but it was close to the right position, and the object was perfectly oriented and lit. But it was wrong. (Using the bounding-box method, it should have been able to identify exactly what I wanted to do. Instead it removed the bounding box and replaced a different object in a different but nearby position.)
Are there any models that have been specifically trained to be able to infill or replace specific locations in an image with reference to an example image? Or is this just like a really esoteric task?
So far all the in-filling models I've found are only based on text inputs.
There's an approach where you stitch two images together: one is the working image (the one you want to modify), the other is the reference image, and you then instruct the model what to do. I'm guessing this approach is as brittle as the other attempts you've tried so far, but I thought it seemed like an interesting approach.
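A minimal sketch of that stitching trick with Pillow (file names are placeholders; the editing-model call itself is left out):

```python
# Put the working image and the reference image side by side on one canvas,
# then hand the stitched image to an instruction-following editing model.
from PIL import Image

def stitch_side_by_side(working: Image.Image, reference: Image.Image) -> Image.Image:
    canvas = Image.new(
        "RGB",
        (working.width + reference.width, max(working.height, reference.height)),
        "white",
    )
    canvas.paste(working, (0, 0))
    canvas.paste(reference, (working.width, 0))
    return canvas

working = Image.open("scene.png").convert("RGB")
reference = Image.open("replacement_object.png").convert("RGB")
stitch_side_by_side(working, reference).save("stitched_input.png")

# Example instruction to send alongside stitched_input.png:
# "In the left image, replace the highlighted object with the object shown on
#  the right, matching the scene's perspective and lighting."
```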
“Sorry, there seems to be an error. Please try again soon.”
Never thought I would ever see this on a Google-owned website!
Really? Google used to be famous not only for its errors, but for its creative error pages. I used to have a google.com bookmark that would send an animated 418.
Just search "nano banana" on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111
There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.
No, it's not.
We've had rich editing capabilities since gpt-image-1; this is just faster and looks better than what's been (endearingly?) called the "piss filter".
Flux Kontext, SeedEdit, and Qwen Edit are all also image editing models that are robustly capable. Qwen Edit especially.
Flux Kontext and Qwen are also possible to fine tune and run locally.
Qwen (and its video gen sister Wan) are also Apache licensed. It's hard not to cheer Alibaba on given how open they are compared to their competitors.
We've left behind the "prompt-only" text-to-image days of DALL-E, Stable Diffusion, and Midjourney.
It's also looking like tools like ComfyUI are less and less necessary as those capabilities are moving into the model layer itself.
Gpt4 isn't "fundamentally different" from gpt3.5. It's just better. That's the exact point the parent commenter was trying to make.
My test is going to https://unsplash.com/s/photos/random, picking two random images, and sending them both with "integrate the subject from the second image into the first image" as the prompt. I think Gemini 2.5 is doing far better than ChatGPT (admittedly ChatGPT was the trailblazer on this path). Flux Kontext seems unable to do that at all. Not sure if I was using it wrong, but it always only considers one image at a time for me.
Edit: Honestly it might not be the 'gpt4 moment'. It's better at combining multiple images, but I don't think it's better than ChatGPT at understanding elaborate text prompts.
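For reference, here's roughly what that two-image test looks like in code, assuming the google-genai Python SDK and the preview model name "gemini-2.5-flash-image-preview" (both assumptions; adjust to whatever you actually have access to):

```python
# Send two reference images plus an instruction and save any returned image parts.
from io import BytesIO
from google import genai
from PIL import Image

client = genai.Client()  # expects an API key in the environment

scene = Image.open("random_photo_1.jpg")
subject = Image.open("random_photo_2.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[scene, subject,
              "Integrate the subject from the second image into the first image."],
)

for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save(f"result_{i}.png")
    elif part.text:
        print(part.text)
```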
Flux Kontext is an editing model, but the set of things it can do is incredibly limited. The style of prompting is very bare bones. Qwen (Alibaba) and SeedEdit (ByteDance) are a little better, but they themselves are nowhere near as smart as Gemini 2.5 Flash or gpt-image-1.
Gemini 2.5 Flash and gpt-image-1 are in a class of their own. Very powerful instructive image editing with the ability to understand multiple reference images.
> Edit: Honestly it might not be the 'gpt4 moment'. It's better at combining multiple images, but I don't think it's better than ChatGPT at understanding elaborate text prompts.
Both gpt-image-1 and Gemini 2.5 Flash feel like "Comfy UI in a prompt", but they're still nascent capabilities that get a lot wrong.
When we get a gpt-image-1 with Midjourney aesthetics, better adherence and latency, then we'll have our "GPT 4" moment. It's coming, but we're not there yet.
They need to learn more image editing tricks.
I feel like most of the people on HN are paying attention to LLMs and missing out on all the crazy stuff happening with images and videos.
LLMs might be a bubble, but images and video are not. We're going to have entire world simulation in a few years.
It's not even close. https://twitter.com/fareszr/status/1960436757822103721
It took me a LOT of time to get things right, but if I were to get an actual studio to make those images, it would have cost me thousands of dollars.
But flash 2.5? Worked! It did it, crazy stuff
https://imgur.com/a/internet-DWzJ26B
Anyone can make images and video now.
- Midjourney (background)
- Qwen Image (restyle PG)
- Gemini 2.5 Flash (editing in PG)
- Gemini 2.5 Flash (adding YC logo)
- Kling Pro (animation)
I didn't spend too much time correcting mistakes.
I used a desktop model aggregation and canvas tool that I wrote [1] to iterate and structure the work. I'll be open sourcing it soon.
[1] https://getartcraft.com
I couldn't get the 3d thing to do much. I had assets in the scene but I couldn't for the life of me figure out how to use the move, rotate or scale tools. And the people just had their arms pointing outward. Are you supposed to pose them somehow? Maybe I'm supposed to ask the AI to pose them?
Inpainting I couldn't figure out either... It's for drawing things into an existing image (I think?) but it doesn't seem to do anything other than show a spinny thing for a while...
I didn't test the video tool because I don't have a midjourney account.
(But yeah, some got a generator attached...)
The old top of the game is available to more people (though mid-level people trying to level up now face a headwind in a further decoupling of easily read signals and true taste, making the old way of developing good taste harder).
This stuff takes people who were already "master rate", and who are at minimum nontrivially sophisticated machine-learning hobbyists, and drives their peak and frontier out while driving break-even collaboration overhead down.
It's always been possible to DIY code or graphic design, and it's always been possible to tell the efforts of dabblers and pros apart, and unlike many commodities? There is rarely a "good enough". In software this is because compute is finite and getting more out of it pays huge, uneven returns; in graphic design it's because extreme-quality work is both aesthetically pleasing and a mark of quality (imperfect, but a statement that someone will commit resources).
And it's just hard to see it being different in any field. Lawyers? Opposing counsel has the best AI, your lawyer better have it too. Doctors? No amount of health is "enough" (in general).
I really think HN in particular, but to some extent all CNBC-adjacent news (CEO OnlyFans stuff of all categories), completely misses the forest (the gap between intermediate and advanced just skyrocketed) for the trees (space-filling commodity knowledge work just plummeted in price).
But "commodity knowledge work" was always kind of an oxymoron, David Graeber called such work "bullshit jobs". You kinda need it to run a massive deficit in an over-the-hill neoliberal society, it's part of the " shift from production to consumption" shell game. But it's a very recent, very brief thing that's already looking more than wobbly. Outside of that? Apprentices, journeymen, masters is the model that built the world.
AI enables a new, even more extreme form of mastery, blurs the line between journeyman and dabbler, and makes taking on apprentices a much longer-term investment (one of many reasons the PRC seems poised to enjoy a brief hegemony before demographics do in the Middle Kingdom for good; in China, all the GPUs run Opus, none run GPT-5 or LLaMA Behemoth).
The thing I really don't get is why CEOs are so excited about this, and I really begin to suspect they haven't, as a group, thought it through (Zuckerberg maybe has; he's offering Tulloch a billion): the kind of CEO that manages a big pile of "bullshit jobs"?
AI can do most of their job today. Claude Opus 4.1? It sounds like a mid-range CEO who's been exhaustively researched and made gaffe-immune. Ditto career machine politicians. AI non-practitioner prognosticators. That crowd.
But the top graphic communications people and CUDA kernel authors? Now they have to master ComfyUI or whatever and the color theory to get anything from it that stands out.
This is not a democratizing thing. And I cannot see it accruing to the Zuckerberg side of the labor/capital divvy up without a truly durable police state. Zuck offering my old chums nation state salaries is an extreme and likely transitory thing, but we know exactly how software professional economics work when it buckets as "sorcery" and "don't bother": that's 1950 to whenever we mark the start of the nepohacker Altman Era, call it 2015. In that world good hackers can do whatever they want, whenever they want, and the money guys grit their teeth. The non-sorcery bucket has paper mache hack-magnet hackathon projects in it at a fraction of the old price. So disruption, wow.
Whether that's good or bad is a value judgement I'll save for another blog post (thank you for attending my TED Talk).
Something similar has been the case with text models. People write vague instructions and are dissatisfied when the model does not correctly guess their intentions. With image models it's even harder for the model to guess right without enough detail.
Still needs more RLHF tuning, I guess? The previous version was even worse.
I didn't see it at first glance, but it certainly is not the same jacket. If you use that as an advertisement, people can sue you for lying about the product.
But look at that example. With this new frontier of AI, that world class engineering talent can finally be put to use…for product placement. We’ve come so far.
Did you think that Google would just casually allow their business to be disrupted without using the technology to improve the business and also protecting their revenue?
Both Meta and Google have indicated that they see generative AI as a way to vertically integrate within the ad space, disrupting marketing teams, copywriters, and other roles that monitor or improve ad performance.
Also FWIW, I would suspect that the majority of Google engineers don't work on an ad system, and probably don't even work on a profitable product line.
“Nano banana” is probably good, given its score on the leaderboard, but the examples you show don't seem particularly impressive; it looks like what Flux Kontext or Qwen Image already do well.
Edit: the blog post is now loading and reports "1290 output tokens per image", even though AI Studio said something different.
Hope they get API issues resolved soon.
315 more comments available on Hacker News