Voyager – an Interactive Video Generation Model with Real-Time 3D Reconstruction
Posted 4 months ago · Active 4 months ago
Source: github.com · Tech story · High profile
Sentiment: excited, positive · Debate: 60/100
Key topics: AI, 3D Reconstruction, Video Generation
Voyager is an interactive video generation model with real-time 3D reconstruction that has garnered significant interest and discussion on HN, with users exploring its capabilities and limitations.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: N/A
- Peak period: 108 comments in 0-6h
- Avg / period: 20
- Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
1. Story posted: Sep 3, 2025 at 7:07 AM EDT (4 months ago)
2. First comment: Sep 3, 2025 at 7:07 AM EDT (0s after posting)
3. Peak activity: 108 comments in 0-6h (hottest window of the conversation)
4. Latest activity: Sep 5, 2025 at 10:32 AM EDT (4 months ago)
ID: 45114379 · Type: story · Last synced: 11/20/2025, 8:28:07 PM
Fixed question: Thanks a lot for the feedback that human perception is not 2D. Let me rephrase the question: since all the visual data we see on computers can be represented as 2D images (indexed by time, angle, etc.), and we have many such 2D datasets, do we still need to explicitly model the underlying 3D world?
And of course it really makes more sense to say human perception is 3+1-dimensional since we perceive the passage of time.
[1] https://en.wikipedia.org/wiki/Proprioception
None of these world models have explicit concepts of depth or 3D structure, and adding it would go against the principle of the Bitter Lesson. Even with 2 stereo captures there is no explicit 3D structure.
The model can learn a 3D representation on its own from stereo captures, and stereo captures still offer richer, more connected data to learn from than monocular captures. This is unarguable.
You're needlessly making things harder by forcing the model to also learn to estimate depth from monocular images, and robbing it of a channel for error-correction in the case of faulty real-world data.
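The depth-from-stereo relationship the commenters are debating can be sketched concretely: with a rectified stereo pair, depth follows directly from pixel disparity via the pinhole model Z = f·B/d. The rig numbers below are made up for illustration and have nothing to do with Voyager itself.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Classic pinhole stereo: Z = f * B / d.
    Illustrative only; a real pipeline also needs rectification
    and sub-pixel disparity matching."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline.
depths = depth_from_disparity([70.0, 35.0, 7.0], focal_px=700.0, baseline_m=0.12)
print(depths)  # [ 1.2  2.4 12. ] metres: smaller disparity = farther away
```

This is the "channel for error-correction" in question: a monocular model has to hallucinate Z, while a stereo model can read it off the disparity.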
I still don't understand what the Bitter Lesson has to do with this. First of all, it's only a piece of writing, not dogma, and second, it concerns itself with algorithms and model structure; increasing the amount of data available to train on does not conflict with it.
There are other sensors as well. Is the inner ear a 2d sensor?
The 3rd dimension gets inferred from that data.
(Unless you have a supernatural sensory aura!)
You are inferring 3D positions based on many sensory signals combined.
From mechanoreceptors and proprioceptors located in our skin, joints, and muscles.
We don’t have 3-element position sensors, nor do we have 3D sensor volumes, in terms of how information is transferred to the brain, which is primarily in 1D (audio) or 2D (sensory surface) layouts.
From that we learn a sense of how our body is arranged very early in life.
EDIT: I was wrong about one thing. Muscle nerve endings are distributed throughout the muscle volume. So 3D positioning is not sensed, but we do have sensor locations distributed in rough and malleable 3D topologies.
Those don’t give us any direct 3D positioning. In fact, we are notoriously bad at knowing which individual muscles we are using. Much less what feeling correspond to what 3D coordinate within each specific muscle, generally. But we do learn to identify anatomical locations and then infer positioning from all that information.
Then the mapping of those sensors to the body's anatomical state in 3D space is learned.
A surprising number of kinds of dimension are involved in categorizing sensors.
It doesn’t make it any less 3D though. It’s the additive sensing of all sensors within a region that gives you that perception. Fascinating stuff.
a) In a technical sense the actual receptors are 1D, not 2D. Perhaps some of them are two dimensional, but generally mechanical touch is about pressure or tension in a single direction or axis.
b) The rods and cones in your eyes are also 1D receptors but they combine to give a direct 2D image, and then higher-level processing infers depth. But touch and proprioception combine to give a direct 3D image.
Maybe you mean that the surface of the skin is two dimensional and so is touch? But the brain does not separate touch on the hand from its knowledge of where the hand is in space. Intentionally confusing this system is the basis of the "rubber hand illusion" https://en.wikipedia.org/wiki/Body_transfer_illusion
Point (I.e. single point/element) receptors, that encode a single magnitude of perception, each.
The cochlea could be thought of as 1D. Magnitude (audio volume) measured across 1D = N frequencies. So a 1D vector.
Vision and (locally) touch/pressure/heat maps would be 2D, together.
The measurement of any one of those is a 0-dimensional tensor, a single number.
But then you are right: what is being measured by that one sensor is 1-dimensional.
But all single sensors measure across a 1 dimensional variable. Whether it’s linear pressure, rotation, light intensity, audio volume at 1 frequency, etc.
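The rank bookkeeping in this subthread can be made concrete with array shapes. This is an illustrative sketch with made-up sensor sizes, not anything from the model under discussion:

```python
import numpy as np

# A single mechanoreceptor: one scalar magnitude (rank-0 tensor).
single_receptor = np.float64(0.8)

# Cochlea-like sensor: a volume reading at each of N frequency
# bands, laid out along one axis (rank-1 tensor).
cochlea = np.zeros(32)

# Retina- or skin-patch-like sensor: a 2D grid of magnitudes
# (rank-2 tensor); depth is not in the raw data, it is inferred.
retina = np.zeros((64, 64))

print(single_receptor.ndim, cochlea.ndim, retina.ndim)  # 0 1 2
```

The point of the framing: each individual sensor reports a rank-0 value over a 1D variable, and it is the *layout* of many such sensors (1D cochlea, 2D surfaces) plus learned integration that yields 3D perception.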
Many of our signals are "on" by default and are instead suppressed by detection. Ligand binding, suppression, the signalling cascade, all sorts of encoding, ...
In any case, when all of our senses are integrated, we have rich n-dimensional input.
- stereo vision for depth
- monocular vision optics cues (shading, parallax, etc.)
- proprioception
- vestibular sensing
- binaural hearing
- time
I would not say that we sense in three dimensions. It's much more.
[1] https://en.m.wikipedia.org/wiki/G_protein-coupled_receptor
https://theoatmeal.com/comics/mantis_shrimp
I'm not entirely convinced that this isn't one of those, or if it's not, it sure as shit was trained on one.
Cool, I guess… If you have tens of thousands of $ to drop on a GPU for output that’s definitely not usable in any 3D project out-of-the-box.
It's more approachable than one might think: you can currently find two of these for less than 1,000 USD.
https://blog.emojipedia.org/why-does-the-chart-increasing-em...
It's interesting to me that this breaks convention with the visual spectrum.
i.e.:
red: ~700 nm
green: ~550 nm
yellow: ~580 nm
Weird that they aren't in order.
Also, there is no training data, which would be the "preferred form" of modification.
From their license: [1]
As well as an acceptable use policy: [1] https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager/blob...
Or, those countries are trying to regulate AI.
Hard to feel bad for EU/UK. They tried their best to remain relevant, but lost in the end (talent, economy, civil rights).
We didn't regulate adtech and now we're stuck with pervasive tracking that's hurting society and consumer privacy. Better to be more cautious with AI too so we can prevent negative societal effects rather than trying to roll them back when billions of euros are already at play, and thus the corporate lobby and interests in keeping things as they are.
We didn't regulate social media algorithms which started optimising for hate (as it's the best means of "engagement") and it led to polarisation in society, the worst effects of which can be seen in the US itself. The country is tearing itself apart. And we see the effects in Europe too. Again, something we should have nipped in the bud.
And the problem isn't mainly the tech. It's the perverse business models behind it, which don't care about societal disruption. That's pretty hard to predict, hence the caution.
Isn't fine-tuning a heck of a lot cheaper?
Just training on new data moves a model away from its previous behavior, to an unpredictable degree.
You can’t even reliably test for the change without the original data.
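Both points can be shown with a toy illustration, using a linear least-squares fit as a crude stand-in for fine-tuning (all data and numbers here are made up): refitting on new data shifts behavior on the old task, and the shift is only measurable if the original data is still around.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pretrained" model: y = 2x, fit on the original data.
X_old = rng.normal(size=(200, 1))
y_old = 2.0 * X_old[:, 0]
w = np.linalg.lstsq(X_old, y_old, rcond=None)[0]  # ~[2.0]

# "Fine-tune" by refitting on new data with a different relationship.
X_new = rng.normal(size=(200, 1))
y_new = -1.0 * X_new[:, 0]
w_ft = np.linalg.lstsq(X_new, y_new, rcond=None)[0]  # ~[-1.0]

# Drift on the old task is only measurable because we kept X_old/y_old;
# with weights alone, there is nothing to evaluate against.
old_task_error = np.mean((X_old @ w_ft - y_old) ** 2)
print(old_task_error > 1.0)  # True: behavior on old data has moved
```

With only the released weights and no original training data, the `X_old`/`y_old` half of this check is impossible, which is the commenter's point.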
> Also, there is no training data, which would be the "preferred form" of modification.
This is not open source because the license is not open source. The second line is not correct, though: the "preferred form" of modification is the weights, not the data. Data is how you modify those weights.
I think at this point, "open source" is practically shorthand for "weights available".
> 8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
"Do as I say, not as I do."
> 15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
This, and other clauses, effectively prohibit the use of this system within any jurisdiction.
What a ridiculous policy.
A more plausible explanation is the requirements and obligations of those markets are ambiguous or open-ended in such a way that they cannot be meaningfully limited by a license, per the lawyers they retain to create things like licenses. Lawyers don’t like vague and uncertain risk, so they advised the company to reduce their risk exposure by opting out of those markets.
Since the law is very well developed in the EU, I think the people who wrote the license were just lazy.
So, they reduced their liability by prohibiting usage of the model to show those jurisdictions' decision makers they were complying. I considered doing the same thing for the EU. Although, I also considered that one might partner with an EU company if they are willing to make models legal in their jurisdiction. Just as a gift to Europeans mainly, but maybe also a profit-sharing agreement.
It's the EU AI Act. I tried their cute little app a week ago, designed to let you know if you comply, what you need to report and so on. I got a basically yes, but likely no, still have to register to bla-bla and announce yak-yak and do the dooby-doo, after selecting SME - open source - research - no client-facing anything.
It was a mess when they proposed it, it was said to be better while they were working on it, turns out to be as unclear and as bureaucratic now that it's out.
I live in Europe, I don't want Europe to become a vassal of China/Russia - but if something drastically does not change it will. Russia is Europe's Carthage, Russia must fall. There is no future with a Russia as it is today and a Europe as it is today in it, not because of Europe, but because of Russia. If Europe does not eliminate Russia, Russia will eliminate Europe. I have no doubts about this.
But as things stand, there just seems no way in which we practically can counter Russia at all. If Europe had determination, it would have sent Troops into Ukraine and created a no-fly zone — it should do that, but here we are.
When it's getting to a point where far-right leaders appear to care more about the prosperity of Russia than their own nation or their allies... yeah it's probably misinformation. At best. At worst, it's targeted propaganda - lots of bots online!
Ukraine, with all the backing of Europe, is making no progress. If this were true, Russia would be expelled from Ukraine tomorrow, as it should be. Ukraine is an embarrassment for Europe; it strongly suggests that Europe is basically meaningless on the global stage.
And the most embarrassing of all is, Europe is still buying gas from Russia.
"suggests that Europe is basically meaningless on the global stage" ... it will take many years of deep military investment to provide a proper counter to Russian aggression. As of right now, Europe has been shown to be in a very weak and exposed position. This was obvious years ago, and should not be a surprise today. This is true of most of the NATO member states.
That said, simply because Ukraine is unable to expel Russia does not mean that it is a grand threat to Europe proper. Perhaps some eastern countries face some limited conflict, but I'm not convinced by this "domino theory" that Russia would engage in a WWII style invasion of Poland, Finland, etc.
It's better to have a balanced number of warheads so that this isn't possible.
For this reason the US and Russia always negotiated before reducing warhead numbers bilaterally. But the EU on its own is not in the same league.
But say that you were right, and you had to choose between privacy and relevance: if you choose privacy, then once Europe is entirely economically dependent on Russia (Europe is still paying more in energy money to Russia than in aid to Ukraine) and China, once Europe is a vassal, it won't be able to make its own laws anymore.
I got the same impression seeing Trump meet Putin. The US is a vassal state of Russia.
There's nothing special about EU regulations vis-a-vis other laws. China, Russia and the US also have laws, many of which are also perceived as overly bureaucratic.
Pharma as a whole is still dwarfed by the trillion-dollar US tech giants that the EU has no equivalent of. One standout doesn't change the broader lag.
Russia is currently struggling to make inroads on invading its relatively small neighbor, so I really doubt it would be able to make a bunch of nuclear powers who have a nuclear alliance its "vassal"
I understand that Russia's not fighting just Ukraine but rather Ukraine with massive US and EU assistance, but my point still stands. Ukraine doesn't have that "benefit".
That the West is also doing some bad stuff (though really in the EU we're not that bad IMO; most EU countries recognise Palestine now, it's just a few blocking hard measures against Israel) isn't really a relevant topic in this. We're not going to have boots on the ground in this conflict until an agreement is reached, because of the risk of escalation.
Also the EU pays for countries like Turkey and Libya to prevent refugee ships from coming to their continent. If that means sinking those ships with people on them, well...
People really don't understand war and death, they treat them as some silly sports game. As a result they completely miss the boat not only about military conflicts but also about peace politics.
That's why democracies are so good. Because it's hard to keep doing very stupid things in them persistently.
OK but Ukraine isn't trying to invade a small country next door and claim a global superpower status.
It's expected they would struggle against a much larger neighbor invading them.
Russia is struggling where nobody expected it to struggle.
Will take them a while to get out from under the US umbrella. But acknowledging the problem is the first step.
Spending on defense is not the same as getting defense. Norway is spending more on everything all the time and getting worse outcomes all the time. We spend more on police than ever, even per capita, and crime is up; we spend more on the military than ever, and our actual metrics are down. I think with most of Europe the defense spending is the same. I hope I'm wrong, but if you increase regulation then you have to spend more to get the same results, and Europe has runaway regulation in addition to people who try to hijack institutions for other purposes.
Overconfidence bias is real.
Knowing your circle of competence is a gift.
Today, we have fully automated the methods from this manual in the form of LLM Chatbots, which we have for some reason deployed against ourselves.
[1] https://en.wikipedia.org/wiki/Simple_Sabotage_Field_Manual
In general, it is hard to compare the US and the EU; we got a head start while the rest of the world was rebuilding itself from WW2. That started up some feedback loops. We can mess up and siphon too much off a loop, destroying it, and still be ahead. They can be setting up loops without benefitting from them yet.
Personally I'm not too worried anyone is going to become a global superpower from generative AI slop.
Start on the right, and click through the options. At the end you'll get a sort of assessment of what you need to do.
The UK has their chat thing where if you provide chat (even with bots!) you have to basically be a megacorp to afford the guardrails they think "the kids" need. It's not clear if open source models fall into that, but who's gonna read 300+ pages of insanity to make sure?
[1] https://en.wikipedia.org/wiki/Restrictions_on_geographic_dat...
[2] https://cset.georgetown.edu/publication/south-korea-ai-law-2...
65 more comments available on Hacker News