Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation
Key topics
The Ovi project presents a new AI model for generating audio-video content. The discussion covers its potential applications and implications: some users are excited about its creative possibilities, while others are concerned about potential misuse.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 43m after posting
- Peak period: 92 comments in 0-6h
- Avg / period: 19
- Based on 114 loaded comments
Key moments
- Story posted: Oct 22, 2025 at 3:42 PM EDT (3 months ago)
- First comment: Oct 22, 2025 at 4:25 PM EDT, 43m after posting
- Peak activity: 92 comments in 0-6h (hottest window of the conversation)
- Latest activity: Oct 24, 2025 at 6:53 PM EDT (3 months ago)
(Of course, excluding the obvious "that guy just knocked down a building!" CGI)
Obligatory Jonas Ussing plug: https://www.youtube.com/watch?v=7ttG90raCNo&list=PLgdTaHO8FL...
Dandadan intro and its lack of FPS and sharp lines: https://www.youtube.com/watch?v=a4na2opArGY
Animation doesn’t feel fast if it has too many FPS or is too steady, anyway, ironically and counterintuitively. You can’t do everything on the ones and twos.
To your point about Dandadan’s intro, it’s jam packed with references, which is another kind of skill in and of itself:
https://www.youtube.com/watch?v=5sUaK0xahBU
Chainsaw Man is in that same vein, and is another Science Saru production. I’m looking forward to seeing what they will do with the Ghost in the Shell franchise next year.
https://en.wikipedia.org/wiki/Science_Saru
I get what you mean though regarding Dandadan’s animation style; it has a very hand drawn manga vibe, and the detail is minimal yet finely balanced against the overwhelming amount of noise and visuals. It’s like a slapdash superflat.
https://en.wikipedia.org/wiki/Superflat
On a side note, as an anime fan, MBS is doing great work lately. I liked Witch Watch much more than I expected to, and that’s a much better show than the genres involved would lead one to expect.
https://en.wikipedia.org/wiki/Mainichi_Broadcasting_System
that and the guitar player behind the singer in the concert example has three arms :)
https://news.ycombinator.com/item?id=45603435
https://news.ycombinator.com/item?id=45652726
Easier than ever now, as AI-assisted coding tools will build you that generic landing page and basic UI.
But I also suspect that most of these are indeed SEO scammers, that there's no actual service, and that all payments are pocketed. It might take a few days for the scam to be reported and the site taken down, but it's likely enough to get a few hundred bucks out of it. They'll never be pursued because of where they live, and they can have many of these up in no time, thanks to AI, as you say.
What a sad state of affairs that no "AI" company or government is taking seriously.
Lots of activity around Wan lately. It’s nice to see flexible open models make a strong showing against the massively funded closed competitors like OpenAI and Runway.
Kling still has the best proprietary video model, but Sora 2 is so smart that you don't need to edit anything if your target is social.
I don't see how Runway, Pika, or the rest of the purely foundation video model startups survive against the giants and the incredible open source Chinese models. They've got to be sweating bullets right now.
Everyone's also sleeping on xAI's high quality and insanely fast video model (10 second generations) that they're giving away completely for free without watermarks.
I think the moat here will end up being value adds for convenience, tooling, IP licensing, integration into the rest of the pipeline used for content production, etc.
Wan – Open-source alternative to VEO 3 - https://news.ycombinator.com/item?id=44928997 - Aug 2025 (38 comments)
The only catch is that I'd need to get 32 people who want VMs like this since I would have to do it for the entire box of compute.
Wan2.2 runs just fine on AMD.
Though only a shared A40/A100 are in that price range.
Vultr is a box of 8 minimum and not on-demand and they don't offer VMs.
On the other hand, I offer the bare minimum (1 GPU for 1 minute) (or 2, 4, 8x), on-demand, no-contract, and an API to automate it all. We also have 100G unlimited bandwidth and free IPv4. Oh and our 8x box specs are generally better... 122TB of enterprise NVMe.
Three years ago we had a live streaming autogen-seinfeld twitch stream; some kind of coherent story telling via AI doesn't seem beyond reach today, the tools just haven't fully matured yet.
Circular reasoning. If you can't answer WHY people should come to like AI movies, then you have nothing to say.
And yes, I'm well aware of the allegory of the cave. So is everyone. What I don't understand is why it's such a popular rhetorical device with people who have no discernible point but want to sound as if they do. It's actually quite ironic.
they're not doing enough to optimize AI content generators for dopamine release with animalistic obsession. Instead they focus on scientific indistinguishability, and people aren't liking that. IMO that has been an ongoing and increasingly costly mistake.
Younger generation who grow up with AI will just think it’s normal, like we think being connected to the internet via a rectangle you keep in your pocket is normal.
AI movies are not a "scientific idea". Liking them is a matter of taste, and there are plenty of things that never catch on.
My point is that you and I will probably never accept it - but our kids will never even think it’s weird in the first place.
So far not one commenter in this thread has articulated why AI movies are inevitable.
It's inevitable because you won't be able to tell the difference.
I see you responded to this point elsewhere in this thread, but frankly your reply is a non-sequitur. I'm not sure what you mean by it.
I recall eerily similar things said about Google Glass..
Maybe AI generation will be used in popular media more often, but purely AI generated content or AI brain rot seems to only appeal to a small crowd of people right now, and I don't see that crowd growing significantly.
Maybe it's a technology problem, as Google Glass was, but I think that's inseparable from the content it actually generates at this non-AGI stage.
Regardless, it sounds very uncertain and perhaps even unlikely that what we see being created now is the future.
[1] https://en.wikipedia.org/wiki/Planck%27s_principle
If you're talking about people firing up the ol' 5090 to make a "movie" about their favorite streamer falling madly in love with them for, ahem, personal use, I have no doubt that people will do that. And I will do everything in my power to avoid associating with such brain-rotted cretins.
Most people would use these tools for personal use, if nothing else. Seeing a celebrity, themselves, their friends, etc., act out any scenario they can think of is quite an appealing proposition. And porn, of course, for better or worse.
In the long-term, this has the potential to significantly change how media is created and consumed. Feature films produced by large studios will undoubtedly continue to exist, and they will also leverage the technology, but it's not difficult to imagine a new branch of personalized media becoming popular. The tools are practically already there; they just need to become more accessible, and slightly better.
> Most people would use these tools for personal use
Not what we're talking about. Not "personalized media", not large studios "leveraging the technology", not "visual effects".
See: "blockbuster movies produced by a guy in his basement for <$1000".
If you're unable to draw a line between the points I made and "blockbuster movies produced by a guy in his basement for <$1000", that's on you.
There is no line, and you never claimed there was in your original comment, so stop moving the goalposts. Vague language like "personalized media becoming popular" is not the same thing as "blockbuster movies".
Calling my answer "short-sighted" when you couldn't be bothered to read the thread or apparently even the thing I was replying to is, in fact, on you.
As a matter of fact, all the actually normal people I talk to about AI in person also find it offputting.
also, case in point, normal people don't dig through a random stranger's post history to look for an ad hominem opportunity, and instead evaluate individual posts by their contents. lol.
Porn is still taboo. It's understood that most people use it, but it's not exactly something you bring up in polite company.
Where on earth do you live that prostitution is "widely accepted by polite society"? You can go to jail for it where I am.
And I did address the rest of your comment. As I said, in my experience "normal" people do object to AI content. I don't know where you got the bit about "background checks" and being "allowed" to like stuff. Nobody I know had to be told to have an aversion to AI "art", it's a natural reaction.
AI is but a tool; if there is an artist using them, real art can be created, as with any other tool.
So far, AI-generated videos, and arguably photos, seem to only please wishful thinkers, or untalented artists dreaming of making it.
I don't imply the tech will never get to the tipping point, but so far it provides so little value that either we have many years to go, or it just won't happen.
Let's be optimists. It will eventually get there. But I doubt that, for any of the parallels you made, billions of people were hammered daily by overblown posts about the upcoming revolution.
The reasons for critiques have a lot to do with promotion fatigue. Hyperboles eventually exhaust their impact.
https://reddit.com/r/singularity/comments/1lq299r/postscarci...
https://reddit.com/r/midjourney/comments/1o6ickx/dreaming_on...
https://reddit.com/r/midjourney/comments/1n6mzig/how_to_buil...
https://reddit.com/r/aivideo/comments/1nwdjdn/the_perfect_bo...
https://reddit.com/r/aivideo/comments/1m8a9wz/pinkington_rop...
https://reddit.com/r/aivideo/comments/1n52kut/derek_the_agin...
https://reddit.com/r/midjourney/comments/1muwyah/still_here_...
https://reddit.com/r/DefendingAIArt/comments/1mttoi4/my_not_...
I think we'll see AGI first.
Probably never. If AI is good enough to cover all the skills needed to do what would currently make a blockbuster movie for less than $1000, the demand for movies will be small enough relative to supply that there will be no such thing as a “blockbuster movie”
On the other hand, I think the quality of movies and expectations will be a lot higher.
This is obviously true, but I don't see how it relates to the question being discussed. "Short videos" and "blockbuster movies" are clearly widely separated categories, despite both being audiovisual content of some kind.
No, that would require a radically different argument, in pretty much every way.
> if everyone can upload videos that anyone can watch, nobody will really be famous because fame will become very evenly distributed, right?
No, Youtube makes distribution cheap, but it doesn't substitute for most of the other things that differentiate between videos; most of the skills that provide variation between videos are still there, and not cheaply substituted via YouTube.
Edit: perhaps 12 Angry Men was good enough at the time.
I recently watched it for the first time, and it was one of the best movies I've seen. I can't believe how invested I was, even though the plot was so simple.
When AI slop figures out that formula, we are truly cooked
https://www.energiavfx.com
https://m.youtube.com/watch?v=bS5P_LAqiVg
I'm sure more will follow.
(loud music warning)
https://www.youtube.com/watch?v=1ohaFZllmUE
Before we see this and higher level of quality accessible to enthusiasts, we'll see these tools adopted by mainstream studios first, which is starting to happen.
I'm a firm "AI" skeptic, but if this technology has revolutionized anything, it has been image generation. A few years ago it was science fiction to have the quality of upscaling we take for granted today. I reckon the same will happen with video generation as well a few years from now. Unlike "ASI" and "AGI", these improvements are achievable with better engineering, and don't necessarily require a breakthrough.
[1]: https://news.ycombinator.com/item?id=44564697
I was in a few of the early meetings on the Helsinki site where I overheard some executives expressing their intention to go after Google. These people had some balls. No clue whatsoever unfortunately. But it was the right kind of ballsy move that Nokia could have pulled off with a bit more vision.
The name was more or less a LOL WHUT?! kind of thing and it flopped horribly with consumers. But still there was some nice stuff in there that wasn't half bad. It's just that the whole branding and rudderless direction doomed it. And of course it was all tied to a failing device software strategy. So when that failed the rest failed as well. I'm not even sure when they pulled the plug on OVI exactly. It was such a non event in the grand scheme of things (mass layoffs, sale of the phone division to MS and subsequent closure, etc.) Must have been around 2013ish I would say. I was gone by then.
1. Go to a friend's place
2. Usual drinks, whatever gets you going
3. Each person writes a prompt
4. Chain them together
5. Watch the resulting movie together
That sounds hilarious and I can't wait to try
I have fond memories of laughing until I was in tears when playing with a group of friends over drinks during the lockdowns in 2020. Something about the process just naturally results in hilarity (especially if you're in a group where you can be offensive).
It's like exquisite corpse for t-shirts. Or, in your case, shorts.
Whenever one of my friend groups is gathered we always make it a point to do an exquisite corpse story on a piece of paper while we’re inebriated in some way xD Video version will be wild
I've been using Ovi for about a week and it's a blast. Like all AI gen, it's a slot machine and even putting in good inputs might lead to bad outputs, but if you run it enough you'll get something good or usable.
I've definitely made many things that look and sound real with both I2V and T2V, albeit T2V tends to look more like 90s tv quality at times, but that also makes it seem more real. If you use Flux SPRO as the image source you can get some pretty realistic looking videos.
I do have a 5090, so it takes about 4 to 5 minutes to make a 5 second clip.
What is your setup? It took 2 hours for me on a 9950X3D with a 5090. Any idea what I could be missing? Or maybe some other variable is off; I was using default .yml values.
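When a 5-second clip takes hours on an RTX 5090, a common cause is a silent CPU (or fp32) fallback rather than anything Ovi-specific. This is a hypothetical pre-flight check, not part of Ovi's codebase; the `torch.cuda` calls are standard PyTorch API:

```python
# Hypothetical diagnostic: report what compute device PyTorch can see
# before kicking off a generation. It only reports; it changes nothing.
import importlib.util


def gpu_preflight() -> str:
    """Return a short status line describing the visible compute device."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "CUDA not available: generation will fall back to CPU"
    name = torch.cuda.get_device_name(0)
    bf16 = "yes" if torch.cuda.is_bf16_supported() else "no"
    return f"{name} (bf16 supported: {bf16})"


print(gpu_preflight())
```

If this reports a CPU fallback, the hours-long runtime is explained before touching any .yml values.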
and here you are, clutching pearls about AI girlfriends. lol. lmao.
https://www.youtube.com/watch?v=DME86-QucsA
Great work all around though.
Also this model seems to benefit noticeably from having both Cuda >= 12.8 and Torch >= 2.8, and separately SageAttention over Flash 2. But I have yet to see any cache threshold with Easy or Tea that doesn’t get a bit postmodern.
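A version floor like "CUDA >= 12.8 and Torch >= 2.8" can be checked up front with a small stdlib helper. This is a sketch only: the version strings would come from `torch.__version__` and `torch.version.cuda`, and stripping a `+cu128`-style local suffix is an assumption about their format:

```python
# Minimal version-floor check for the "CUDA >= 12.8, Torch >= 2.8" advice.
# Assumes dotted version strings, optionally with a local suffix like
# "2.8.0+cu128" (the format torch.__version__ typically uses).
def meets_minimum(version: str, minimum: str) -> bool:
    """True if `version` is at least `minimum`, comparing major.minor numerically."""
    def major_minor(v: str) -> tuple:
        core = v.split("+")[0]       # drop "+cu128"-style suffixes
        parts = core.split(".")[:2]  # compare only major.minor
        return tuple(int(p) for p in parts)
    return major_minor(version) >= major_minor(minimum)


print(meets_minimum("2.8.0+cu128", "2.8"))  # Torch new enough
print(meets_minimum("12.6", "12.8"))        # CUDA too old
```

A numeric comparison matters here: a naive string comparison would rank "2.10" below "2.8".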