Multimodal Diffusion Language Models for Thinking-Aware Editing and Generation
Mood: calm
Sentiment: neutral
Category: science
Key topics: AI, Language Models, Multimodal Diffusion
A new research paper introduces multimodal diffusion language models for thinking-aware editing and generation, but the post garners little attention or discussion on HN.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 1h after posting
Peak period: 2 comments in Hour 2
Avg / period: 1.7
Based on 10 loaded comments
Key moments
- 01 Story posted: 11/19/2025, 9:27:17 AM
- 02 First comment: 11/19/2025, 10:30:13 AM (1h after posting)
- 03 Peak activity: 2 comments in Hour 2
- 04 Latest activity: 11/19/2025, 6:47:54 PM
> (ParaRL), a novel strategy that applies semantic rewards along the trajectory to enforce cross-modal consistency.
(emphasis mine)
This sounds really cool. The fact that one generation "attends" to the other is really interesting. I'm curious if this would hold for other modalities. I'm thinking of coding-specific applications, where things can change once something is generated. My hunch is that coding would benefit a lot from this approach, because the "manual" way of writing code often resembles diffusion more than autoregressive generation (that is, we often edit something here, then because we did that we have to import something, then change something there, then that leads to further changes, etc).
For now coding seems to benefit a lot from <thinking> -> <coding> -> <env_feedback> -> <reflexion> -> <thinking> -> <coding>, but this seems at a glance to be shoehorned in for autoregressive generation... GPT5 in particular seems to be better at this, with multiple "tool calls" interleaved in its thinking sessions. I wonder if this would get better with the parallel denoising thing proposed here, where both thinking and coding are done in parallel, and one can "attend" to the other. Add some feedback (linters, compilers, LSPs, tests, etc.) and this can go places. If it works.
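To make the loop in the comment above concrete, here is a minimal Python sketch of thinking -> coding -> env_feedback -> reflexion. Everything in it is an illustrative assumption: `toy_propose` stands in for a model call, and `run_checks` stands in for real feedback (linters, compilers, tests) by only checking that the code compiles.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Transcript:
    """Append-only log, mirroring how an autoregressive model sees its context."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        self.steps.append((kind, text))


def run_checks(code: str) -> str:
    """Hypothetical environment feedback: compile the code, report any syntax error."""
    try:
        compile(code, "<candidate>", "exec")
        return "ok"
    except SyntaxError as err:
        return f"SyntaxError: {err.msg}"


def agent_loop(propose: Callable[[Transcript], Tuple[str, str]],
               max_rounds: int = 3) -> Transcript:
    """Run thinking -> coding -> env_feedback -> reflexion until the checks pass."""
    log = Transcript()
    for _ in range(max_rounds):
        thinking, code = propose(log)  # stand-in for a model call
        log.add("thinking", thinking)
        log.add("coding", code)
        feedback = run_checks(code)
        log.add("env_feedback", feedback)
        if feedback == "ok":
            break
        log.add("reflexion", f"Last attempt failed with: {feedback}")
    return log


def toy_propose(log: Transcript) -> Tuple[str, str]:
    """Toy 'model': emits broken code first, then a fix once it has seen feedback."""
    seen_failure = any(kind == "reflexion" for kind, _ in log.steps)
    if seen_failure:
        return "Add the missing colon.", "def add(a, b):\n    return a + b\n"
    return "Write an add function.", "def add(a, b)\n    return a + b\n"


log = agent_loop(toy_propose)
kinds = [kind for kind, _ in log.steps]
```

Note that each step is only appended after the previous one is finished, which is the sequential constraint the comment calls "shoehorned": the thinking in round two can react to round one's code, but never to code from its own round.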
If you haven't tried an agentic IDE such as Cursor yet, or at least an extension such as Copilot, I would recommend checking them out and trying out Anthropic's models as well.
What's cool with this thinking & generation in parallel is that one can attend to the other. So you're not limited to "prompt influences code": the prompt can influence both thinking and code, code can influence thinking, and thinking can influence code.
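A toy way to see the difference is to compare attention masks within a single generation pass. The sketch below is an assumption-laden illustration, not the paper's architecture: it assumes a fixed layout of prompt, thinking, and code tokens, a causal mask for the autoregressive case, and full bidirectional attention for the parallel case. The paper itself pairs text with images; code is used here only because that is what the thread discusses.

```python
from typing import List

# Hypothetical token layout: which segment each position belongs to.
SEGMENTS = ["prompt", "prompt", "thinking", "thinking", "code", "code"]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n: int) -> List[List[bool]]:
    """Bidirectional mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def can_influence(mask: List[List[bool]], source: str, target: str) -> bool:
    """True if any `target` position may attend to any `source` position."""
    return any(
        mask[i][j]
        for i, seg_i in enumerate(SEGMENTS) if seg_i == target
        for j, seg_j in enumerate(SEGMENTS) if seg_j == source
    )


n = len(SEGMENTS)
causal = causal_mask(n)
parallel = full_mask(n)

# Within one causal pass, thinking shapes code but not the other way round.
causal_thinking_to_code = can_influence(causal, "thinking", "code")
causal_code_to_thinking = can_influence(causal, "code", "thinking")

# With full attention, influence runs in both directions.
parallel_thinking_to_code = can_influence(parallel, "thinking", "code")
parallel_code_to_thinking = can_influence(parallel, "code", "thinking")
```

An autoregressive model can still let later thinking react to earlier code by emitting another round of both, but within a single pass the causal mask rules out code influencing the thinking that precedes it.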
This may solve the additional clouding that comes from LLMs using what is effectively an iteration of instants to introspect the past. You cannot ask an autoregressive model what the thinking was behind the output because the only memory it has of the past is the output. It has to infer what it meant just the same as anyone else would.
To some extent this probably also happens in humans. You have richer memories, but you still do a lot of post hoc rationalisation.
There are all sorts of places where the text and output are at least one degree of separation from the underlying activation vectors or other representations handled by a model, from floating point precision all the way up to tokenization abstraction, and a lot of experiments get run as if the tokens and context and representations are all one unified data concept. You have to match data abstractions appropriately, or the weird edge cases will break things in unexpected ways.
> We provide two variants of MMaDA-Parallel with different tokenizers. MMaDA-Parallel-A is trained with tokenizer Amused-VQ, and MMaDA-Parallel-M is trained with tokenizer Magvitv2.
tyfeld/MMaDA-Parallel-A: https://huggingface.co/tyfeld/MMaDA-Parallel-A/tree/main
tyfeld/MMaDA-Parallel-M: https://huggingface.co/tyfeld/MMaDA-Parallel-M/tree/main
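For anyone who wants the files locally, a minimal sketch using `snapshot_download` from the `huggingface_hub` package might look like the following. The `CHECKPOINTS` mapping, the `fetch_checkpoint` helper, and the `checkpoints` directory are hypothetical names for illustration. This only downloads the repository files; actually running the models presumably needs the authors' own code, which is not covered here.

```python
from pathlib import Path

# Repository IDs as listed above; tokenizer names come from the quoted README.
CHECKPOINTS = {
    "A": "tyfeld/MMaDA-Parallel-A",  # trained with the Amused-VQ tokenizer
    "M": "tyfeld/MMaDA-Parallel-M",  # trained with the Magvitv2 tokenizer
}


def fetch_checkpoint(variant: str, target_root: str = "checkpoints") -> Path:
    """Download one variant's files and return the local directory (hypothetical helper)."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id = CHECKPOINTS[variant]
    local_dir = Path(target_root) / repo_id.split("/")[-1]
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `fetch_checkpoint("A")` would need network access and enough disk space for the weights.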
"This approach could transform how AI assists with editing and generation."
"Cutting-edge research, looking forward to seeing practical applications!"