Qwen-Image-Layered: Transparency- and Layer-Aware Open Diffusion Model
Key topics
The AI world is abuzz with Qwen-Image-Layered, a diffusion model that adds transparency and layer awareness, setting it apart from other top models like Flux and ChatGPT. As commenters dug in, they discovered that the pipeline can output a PowerPoint file containing the decomposed layer images, sparking a mix of amusement and curiosity. While some were perplexed by the choice of format, others clarified that the GitHub repo includes a script to generate the PPTX file, allowing for flexible editing and layer manipulation. The model is also drawing interest for being open-weight and Apache-licensed.
Snapshot generated from the HN discussion
Discussion Activity
- Activity level: light discussion
- First comment: 17h after posting
- Peak period: 4 comments (18-21h window)
- Avg comments / period: 2.6
- Based on 21 loaded comments
Key moments
- 01 Story posted: Dec 18, 2025 at 10:24 PM EST (15 days ago)
- 02 First comment: Dec 19, 2025 at 3:19 PM EST (17h after posting)
- 03 Peak activity: 4 comments in the 18-21h window (hottest stretch of the conversation)
- 04 Latest activity: Dec 20, 2025 at 4:34 PM EST (13 days ago)
Want the full context? Read the primary article or dive into the live Hacker News thread:
- Paper page: https://huggingface.co/papers/2512.15603
- Model page: https://huggingface.co/Qwen/Qwen-Image-Layered
- Quantized model page: https://huggingface.co/QuantStack/Qwen-Image-Layered-GGUF
- Blog URL: https://qwenlm.github.io/blog/qwen-image-layered/ (404 at the time of writing this comment, but it'll probably release soon)
- GitHub page: https://github.com/QwenLM/Qwen-Image-Layered
If you set the number of layers to 5, for example, will it decide what goes on each layer, or do I need to prompt that?
And I assume you need enough VRAM because each layer will be effectively a whole image in pixel or latent space… so if I have a 1MP image, and 5 layers I would likely need to be able to fit a 5MP image in VRAM?
Or can this be done in multiple steps, so I wouldn’t need all 5 layers in active VRAM at once, with assembly as a separate step at the end after generating each layer?
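The VRAM question above is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming fp16 RGBA tensors in pixel space (the paper's actual latent shapes and dtypes are not specified here; a latent representation would typically be much smaller per side but deeper in channels):

```python
# Rough memory estimate for holding N decomposed layers at full resolution.
# Assumptions (not from the paper): 4 channels (RGBA), 2 bytes/element (fp16).

def layers_bytes(width, height, layers, channels=4, bytes_per_elem=2):
    """Bytes needed to hold `layers` full-resolution images simultaneously."""
    return width * height * channels * bytes_per_elem * layers

one_layer = layers_bytes(1024, 1024, layers=1)    # a single ~1 MP RGBA image
five_layers = layers_bytes(1024, 1024, layers=5)  # five such layers at once

print(f"{one_layer / 2**20:.0f} MiB per layer, "
      f"{five_layers / 2**20:.0f} MiB for 5 layers")  # 8 MiB per layer, 40 MiB for 5
```

So the raw image tensors do scale linearly with layer count, but at these sizes they are small next to the model weights; the practical VRAM cost depends on whether the denoising activations for all layers must be resident at once, which the estimate above does not capture.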
The word "powerpoint" is not there, however this text is:
“The following scripts will start a Gradio-based web interface where you can decompose an image and export the layers into a pptx file, where you can edit and move these layers flexibly.”
I’ve often thought “I wish I could describe what I want in Pixelmator and have it create a whole document with multiple layers that I can go back in and tweak as needed”.
I think the future is something like: start a draft, turn the draft into an image with AI, refine the boring layers, and hand-edit the important layer.
This is the first model with those capabilities from a major AI research lab (the people behind Qwen Image, which is basically the SOTA open image diffusion model), afaik.
The difference in timing for this submission (16 hours ago) is because that's when the research/academic paper got released—as opposed to the inference code and model weights, which just got released 5 hours ago.