Qwen-Image-Layered: Transparency- and Layer-Aware Open Diffusion Model
Key topics
The AI world is abuzz with Qwen-Image-Layered, a diffusion model that adds transparency and layer awareness, setting it apart from other top models like Flux and ChatGPT. As commenters dug in, they discovered that the pipeline can output a PowerPoint file containing the decomposed layer images, sparking a mix of amusement and curiosity. While some were perplexed by the choice of format, others clarified that the GitHub repo includes a script to generate the PPTX file, allowing for flexible editing and layer manipulation. The model is also drawing interest for being open-weight and Apache-licensed.
Snapshot generated from the HN discussion
Discussion Activity
- Activity level: light discussion
- First comment: 17h after posting
- Peak period: 4 comments (18-21h window)
- Avg comments / period: 2.6
- Based on 21 loaded comments
Key moments
- 01 Story posted: Dec 18, 2025 at 10:24 PM EST (15 days ago)
- 02 First comment: Dec 19, 2025 at 3:19 PM EST (17h after posting)
- 03 Peak activity: 4 comments in the 18-21h window (hottest stretch of the conversation)
- 04 Latest activity: Dec 20, 2025 at 4:34 PM EST (13 days ago)
Want the full context? Read the primary article or dive into the live Hacker News thread:
- Paper page: https://huggingface.co/papers/2512.15603
- Model page: https://huggingface.co/Qwen/Qwen-Image-Layered
- Quantized model page: https://huggingface.co/QuantStack/Qwen-Image-Layered-GGUF
- Blog URL: https://qwenlm.github.io/blog/qwen-image-layered/ (404 at the time of writing this comment, but it'll probably release soon)
- GitHub page: https://github.com/QwenLM/Qwen-Image-Layered
If you set the number of layers to 5, for example, will it decide what goes on each layer, or do I need to prompt that?
And I assume you need enough VRAM because each layer will be effectively a whole image in pixel or latent space… so if I have a 1MP image, and 5 layers I would likely need to be able to fit a 5MP image in VRAM?
Or can this be done in multiple steps, so I wouldn’t need all 5 layers in active VRAM at once, with assembly as a separate step at the end after generating each layer?
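The VRAM question above is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming fp16 RGBA tensors in pixel space (the paper's actual latent shapes and dtypes are not specified here; a latent representation would typically be much smaller per side but deeper in channels):

```python
# Rough memory estimate for holding N decomposed layers at full resolution.
# Assumptions (not from the paper): 4 channels (RGBA), 2 bytes/element (fp16).

def layers_bytes(width, height, layers, channels=4, bytes_per_elem=2):
    """Bytes needed to hold `layers` full-resolution images simultaneously."""
    return width * height * channels * bytes_per_elem * layers

one_layer = layers_bytes(1024, 1024, layers=1)    # a single ~1 MP RGBA image
five_layers = layers_bytes(1024, 1024, layers=5)  # five such layers at once

print(f"{one_layer / 2**20:.0f} MiB per layer, "
      f"{five_layers / 2**20:.0f} MiB for 5 layers")  # 8 MiB per layer, 40 MiB for 5
```

So the raw image tensors do scale linearly with layer count, but at these sizes they are small next to the model weights; the practical VRAM cost depends on whether the denoising activations for all layers must be resident at once, which the estimate above does not capture.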
The word "powerpoint" is not there, however this text is:
“The following scripts will start a Gradio-based web interface where you can decompose an image and export the layers into a pptx file, where you can edit and move these layers flexibly.”
I’ve often thought “I wish I could describe what I want in Pixelmator and have it create a whole document with multiple layers that I can go back in and tweak as needed”.
I think the future is something like: start a draft, turn the draft into an image with AI, refine the boring layers, and hand-edit the important layer.
This is the first model with those capabilities from a major AI research lab (the people behind Qwen Image, which is basically the SOTA open image diffusion model), afaik.
The difference in timing for this submission (16 hours ago) is because that's when the research/academic paper got released—as opposed to the inference code and model weights, which just got released 5 hours ago.