TurboDiffusion
(github.com)
I like my buttons to stay where I left them.
I do like my buttons to stay where I left them - but that can be conditioned. Instead of GNOME "designers" telling me the button needs to be wide enough to hit with my left foot, I could tell the system I want this button to be small and in that corner - and add it to my prompt.
That will be Windows 12, and perhaps two generations out for iOS :)
I'm currently hosting a video generation website, also on a single GPU (with a queue), which is also something I didn't even think possible a few years ago (my Show HN from earlier today, coincidentally: https://news.ycombinator.com/item?id=46388819). Interesting times.
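For anyone curious what "single GPU with a queue" can look like in practice, here's a minimal sketch (all names hypothetical, not the actual site's code): one worker thread owns the GPU, and the web handlers only enqueue jobs.

    import queue
    import threading

    jobs = queue.Queue()  # FIFO: requests wait their turn for the one GPU

    def gpu_worker():
        while True:
            prompt, out_path = jobs.get()
            try:
                generate_video(prompt, output=out_path)  # placeholder for the real model call
            finally:
                jobs.task_done()

    threading.Thread(target=gpu_worker, daemon=True).start()

    def handle_request(prompt, out_path):
        jobs.put((prompt, out_path))  # web handlers never touch the GPU directly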
Otherwise it's similar to the way nine women can make a baby in a month. :)
This, on the other hand, happily pretends to match any kind of realism requested, like a skilled painter would, with the tradeoff mainly being loss of control and artistic errors.
For now. We're not even a decade in with this tech, and look how far we've come in the last year alone with Veo 3, Sora 2, Kling 4x, and Kling O1. Not to mention the editing models like Qwen Edit and Nano Banana!
This is going to be serious tech soon.
I think vision is easier than "intelligence". In essence, we solved it in closed form sixty years ago.
We have many formulations of algorithms and pipelines. Not just for the real physics, but also tons of different hacks to account for hardware limitations.
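As one concrete example of such a formulation (my addition here, just the standard textbook statement): the rendering equation expresses outgoing light at a surface point as emission plus the integral of incoming light weighted by the BRDF.

    L_o(x, \omega_o) = L_e(x, \omega_o)
                     + \int_{\Omega} f_r(x, \omega_i, \omega_o) \, L_i(x, \omega_i) \, (\omega_i \cdot n) \, d\omega_i

Path tracers approximate this integral directly; game pipelines approximate it with the bag of hacks mentioned further down.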
We understand optics in a way we don't understand intelligence.
Furthermore, evolution keeps evolving vision over and over. It's fast and highly detailed. It must be correspondingly simple.
We're going to optimize the shit out of this. In a decade we'll probably have perfectly consistent Holodecks.
Understanding optics instead of intelligence speaks to the traditional render workflow, a pure simulation of input data with no "creative processes". Either the massive hack that is traditional game render pipelines, or proper light simulation. We'll probably eventually get to the point where we can have full-scene, real-time ray-tracing.
The AI image generation approach is the "intelligence" approach where you throw all optics, physics and render knowledge up in the air and let the model "paint" according to how it imagines the scene, like handing a pencil to a cartoon/anime artist. Zero simulation, zero physics, zero rules - just the imagination of a black box.
No light, physics or existing render pipeline tricks are relevant. If that's what you want, you're looking for entirely new tricks: Tricks to ensure object permanence, attention to detail (no variable finger counts), and inference performance. Even if we have it running in real-time, giving up your control and definition of consistency is part of the deal when you hand off the role of artist to the box.
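To make the inference performance point concrete: a diffusion sampler is essentially a loop of full network evaluations, so the cost is (number of steps) x (one forward pass), and step reduction / distillation is exactly the lever projects like TurboDiffusion pull. A toy sketch, not anyone's actual code:

    import torch

    def sample(denoise_fn, shape, num_steps):
        # denoise_fn(x, t) -> predicted noise; a stand-in for the real video model
        x = torch.randn(shape)                          # start from pure noise
        for step in range(num_steps, 0, -1):
            t = step / num_steps                        # normalized timestep in (0, 1]
            noise_pred = denoise_fn(x, t)               # one full forward pass per step
            x = x - (1.0 / num_steps) * noise_pred      # crude Euler-style update
        return x                                        # 50 steps vs. 4 steps ~= 12x fewer passes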
If you want AI in the simulation approach, you'll be taking an entirely different path, skipping any involvement in rendering/image creation and instead just letting the model puppeteer the scene within some physics constraints. Makes for cool games, but completely unrelated to the technology being discussed.
They'll have player input controls, obviously, but they'll also be fed ControlNets for things like level layout, enemy placement, and game loop events.
When that happens, and when it gets good, it'll take over as the dominant type of game "engine".
[1] https://neurips.cc/virtual/2025/loc/san-diego/poster/121952
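To sketch what being "fed ControlNets" for level layout, enemy placement, and game loop events could look like structurally (everything here is invented for illustration, not taken from the linked poster): spatial control maps get concatenated with the previous frame, while per-frame scalar signals get injected as a bias.

    import torch
    import torch.nn as nn

    class ConditionedFramePredictor(nn.Module):
        # Hypothetical: predict the next frame from the previous frame plus control signals.
        def __init__(self, frame_ch=3, layout_ch=1, input_dim=8, event_dim=4, hidden=64):
            super().__init__()
            self.backbone = nn.Conv2d(frame_ch + layout_ch, hidden, 3, padding=1)
            self.cond = nn.Linear(input_dim + event_dim, hidden)
            self.head = nn.Conv2d(hidden, frame_ch, 3, padding=1)

        def forward(self, prev_frame, layout_map, player_input, event_flags):
            x = self.backbone(torch.cat([prev_frame, layout_map], dim=1))   # spatial control
            c = self.cond(torch.cat([player_input, event_flags], dim=-1))   # per-frame control
            return self.head(torch.relu(x + c[:, :, None, None]))           # broadcast over H, W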
We’ve seen this play out before, when social media first came to prominence. I’m too old and cynical to believe anything will happen. But I really don’t know what to do about it at a personal level. Even if I refuse to engage with this content, and am able to identify it, and keep my family away from it…it feels like a critical mass of people in my community/city/country are going to be engaging with it. It feels hopeless.
The best way in that case is education of the kids / people, plus automatically flagging potentially harmful / disgusting content and letting the owner of the device set up the level of filtering they want.
Like with LLMs, they should be somewhat neutral in default mode, but they should never refuse a request if the user asks.
Otherwise the line between technology provider and content moderator is too blurry, and tomorrow SV people are going to abuse that power (or be coerced by money or politics).
At a personal / parental level, time limits (like you can already set with a web-filtering device for TikTok) and a content policy would solve a lot, along with taking as much time as possible to spend with the kids and talk to them, so they don't become dumber and dumber due to short videos.
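As a sketch of what owner-controlled filtering plus time limits could look like (field names and thresholds invented for illustration):

    FILTER_POLICY = {
        "filter_level": "strict",            # "off" | "moderate" | "strict" - chosen by the device owner
        "flag_categories": ["violence", "self_harm", "sexual"],
        "action_on_flag": "blur_and_warn",   # provider flags content, the device decides what happens
        "daily_limit_minutes": {"short_video": 30, "generated_video": 60},
    }

    def check(content_tags, minutes_today, category, policy=FILTER_POLICY):
        if policy["filter_level"] != "off" and set(content_tags) & set(policy["flag_categories"]):
            return policy["action_on_flag"]
        if minutes_today >= policy["daily_limit_minutes"].get(category, float("inf")):
            return "time_limit_reached"
        return "allow"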
Censorship for generative AI simply doesn't work the way we are used to, unless we make it illegal to possess a model that might generate illegal content, or that might have been trained on illegal data.
Censorship doesn't work for stuff that is currently illegal. See pirated movies.
We need to take the threat of companies wresting control of our privacy and autonomy from us a lot more seriously, and not engage with ridiculous hyperbole from “AI ethics” types.
Now, I've not tested TurboDiffusion yet, but I am very actively generating AI video - I probably did half an hour of finished video clips yesterday. There is no test for this issue yet, and for the majority it has yet to be realized as an issue.
I haven't checked, but it's likely already in there, and if not, it will be pretty soon.
https://gist.github.com/b7r6/94f738f4e5d1a67d4632a8fbd18d347...
Faster than Turbo with no pre-distill.