Sharp, an Approach to Photorealistic View Synthesis From a Single Image
Key topics
The unveiling of SHARP, a technique for photorealistic view synthesis from a single image, has sparked a lively discussion about its potential applications and implications. Some commenters are abuzz about the technology's connection to features like Cinematic mode, while others are exploring its potential uses in simulation and 3D programming. A debate is brewing between those who see the value in AI-generated visuals, like accurrent, who notes its potential to aid in simulation, and skeptics like calvinmorrison, who questions the practicality of investing in such technology. As the conversation unfolds, it becomes clear that this innovation is stirring up both excitement and unease.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 59m after posting
Peak period: 60 comments in 0-6h
Average per period: 13.8
Based on 110 loaded comments
Key moments
- Story posted: Dec 15, 2025 at 11:06 PM EST (18 days ago)
- First comment: Dec 16, 2025 at 12:05 AM EST (59m after posting)
- Peak activity: 60 comments in 0-6h (the hottest window of the conversation)
- Latest activity: Dec 20, 2025 at 12:25 AM EST (14 days ago)
I doubt this will be useful for robotics or industrial automation, where you need an actual spatial or functional understanding of the object/environment.
I have worked on simulation and do a lot of it in my day job. While physics is often hard and expensive, you only need to write the code once.
Assets? You need to commission 3D artists and then spend hours wrangling file formats. It's extremely tedious. If we could take a photo and extract meshes, I'm sure we'd have a much easier time.
[1] https://trianglesplatting.github.io/
Is there a similar flow to transform a video/photo/NeRF of a scene into a tighter, minimal-polygon approximation of it? The reason I ask is that it would make some things really cool. To make my baby monitor mount I had to knock out the calipers and measure the pins and this and that, but if I could take a couple of photos and iterate in software, that would be sick.
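One possible route, sketched under assumptions: if you can get a point cloud of the scene out of photogrammetry or a NeRF/splat export, Open3D can reconstruct a watertight mesh and then decimate it down to a low-poly approximation. The file names and parameter values below are illustrative only.

    # Sketch: point cloud -> watertight mesh -> low-poly approximation.
    # Assumes scene.ply came from photogrammetry or a NeRF/splat export;
    # the depth and triangle-count values are guesses to tune per scan.
    import open3d as o3d

    pcd = o3d.io.read_point_cloud("scene.ply")
    pcd.estimate_normals()  # Poisson reconstruction needs oriented normals

    mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=9
    )

    # Collapse to the "tighter, minimal-polygon approximation" asked about above.
    low_poly = mesh.simplify_quadric_decimation(target_number_of_triangles=5000)
    low_poly.compute_vertex_normals()
    o3d.io.write_triangle_mesh("scene_lowpoly.stl", low_poly)

From there the STL can be measured and iterated on in CAD instead of with calipers.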
[0]https://www.spaitial.ai/
Why no landscape or underwater scenes or something in space, etc.?
I believe this company is doing image (or text) -> off-the-shelf image model to generate more views -> some variant of Gaussian splatting.
So they aren't really "generating" the world as one might imagine.
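For what that guess would look like as a flow, here is a deliberately skeletal sketch; every helper in it is hypothetical and stands in for whichever off-the-shelf models the company might actually use.

    # Hypothetical pipeline matching the comment above: single image ->
    # off-the-shelf model hallucinates extra views -> fit Gaussian splats.
    # Nothing here is the company's published code; the helpers are stubs.
    from dataclasses import dataclass

    @dataclass
    class Gaussian:
        position: tuple   # 3D mean
        scale: tuple      # per-axis extent
        rotation: tuple   # quaternion
        opacity: float
        color: tuple

    def generate_novel_views(image, n_views):
        """Hypothetical: call an image/video diffusion model to hallucinate
        n_views of the same scene from new camera poses."""
        raise NotImplementedError

    def fit_splats(views):
        """Hypothetical: run a standard 3D Gaussian-splatting optimizer on
        the generated views as if they were real photos."""
        raise NotImplementedError

    def image_to_world(image):
        views = [image] + generate_novel_views(image, n_views=8)
        return fit_splats(views)  # the "world" is just a bag of splats

The point of the sketch is the comment above it: the geometry is only as real as the hallucinated views it was fit to.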
It’s a website that collects people’s email addresses
https://m.youtube.com/watch?v=DgPaCWJL7XI&t=1s&pp=2AEBkAIB0g...
https://www.youtube.com/watch?v=X0oSKFUnEXc
It will evolve into people hooked into entertainment suits most of the day, where no one has real relationships or does anything real of consequence, like some sad mashup of Wall-E and Ready Player One.
If we’re lucky, some will want to “real world” with augmented reality.
Maybe we’ll get really nice holovisions, where we can chat with virtual celebrities.
Who needs that? We’re already having to shoot up weight-loss drugs because we binge watch streaming all the time, because we’ve all given up, assuming AI will do everything. What good will come from having better technology when technology is already doing harm?
https://en.wikipedia.org/wiki/Great_Filter
like there are people who avoid alcohol, opioids, heroin, all other wireheading-style drugs and experiences that exist already, and people who do exercise and stay thin in a world of fast food and cars.
A great filter needs to apply to every civilisation imaginable, no exceptions, nerfing billions of species before they get to a higher Kardashev scale, not just be the latest “Dunning-Kruger” mic drop to spam into every thread all the time.
Maybe when NASA, ESA, SpaceX, RosCOSMOS, CNSA, IRSA all collapse because of this effect… look how many countries have a space agency! https://en.wikipedia.org/wiki/List_of_government_space_agenc...
1. Sky looks janky.
2. Blurry/warped behind the horse.
3. The head seems to move a lot more than the body. You could argue that this one is desirable.
4. A bit of warping and ghosting around the edges of the flowers, particularly noticeable towards the top of the image.
5. Very minor, but the flowers move as if they aren't attached to the wall.
https://github.com/apple/ml-sharp#rendering-trajectories-cud...
(I am oversimplifying).
I just want to emphasize that this is not a NeRF, where the model magically produces an image from an angle and then, when you ask "ok but how did you get this?", it throws up its hands and says "I dunno, I ran some math and I got this image" :D.
Gaussian splatting is pretty awesome.
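A toy illustration of why splats are easier to interrogate than a NeRF: the scene is an explicit array of primitives, so you can point at exactly which splats contributed to a pixel. The numbers below are made up and the footprint is an isotropic simplification of the real projected covariance.

    # Toy: an explicit splat "scene" is just arrays you can print and inspect.
    import numpy as np

    splats = {
        "mean":    np.array([[0.0, 0.0, 2.0], [0.3, 0.1, 2.5]]),   # 3D centers
        "scale":   np.array([0.05, 0.10]),                          # toy radii
        "opacity": np.array([0.9, 0.6]),
        "color":   np.array([[1.0, 0.2, 0.2], [0.2, 0.2, 1.0]]),
    }

    def footprint(p_xy, mean_xy, sigma):
        # Isotropic stand-in for the projected 2D Gaussian footprint.
        d2 = np.sum((p_xy - mean_xy) ** 2)
        return np.exp(-0.5 * d2 / sigma ** 2)

    pixel = np.array([0.01, 0.0])
    for i in range(len(splats["opacity"])):
        w = splats["opacity"][i] * footprint(pixel, splats["mean"][i, :2],
                                             splats["scale"][i])
        print(f"splat {i}: weight {w:.3f}, color {splats['color'][i]}")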
Imagine history documentaries where they take an old photo, free objects from the background, and then move them around to give the illusion of parallax.
Even using commas, if you leave the ambiguous “free” I suggest you prefix “objects” with “the” or “any”.
Already you sometimes see where they manually cut out a foreground person from the background, enlarge them a little, and create a multi-layer 3D effect, but it's super primitive and I find it gimmicky.
Bringing actual 3D to old photographs as the camera slowly pans or rotates slightly feels like it could be done really tastefully and well.
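A minimal sketch of that effect, assuming you already have a per-pixel depth map for the old photo (e.g. from a monocular depth model): shift each pixel horizontally in proportion to inverse depth and the foreground pans faster than the background. This is a naive forward warp that leaves occlusion holes black; real tooling inpaints them.

    # Naive parallax pan: pixels move by inverse depth for a small camera shift.
    # Assumes image.png and a matching HxW depth.npy exist (illustrative names).
    import numpy as np
    from PIL import Image

    img = np.array(Image.open("image.png").convert("RGB"))
    depth = np.load("depth.npy").astype(np.float32)
    disparity = 1.0 / np.clip(depth, 1e-3, None)
    disparity /= disparity.max()                 # normalize to [0, 1]

    def parallax_frame(shift_px):
        h, w, _ = img.shape
        out = np.zeros_like(img)
        xs = np.arange(w)
        for y in range(h):
            # Near pixels (large disparity) move farther than distant ones.
            new_x = np.clip(xs + (shift_px * disparity[y]).astype(int), 0, w - 1)
            out[y, new_x] = img[y, xs]           # occlusion holes stay black
        return out

    Image.fromarray(parallax_frame(12.0)).save("pan_frame.png")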
I guess this is what they use for the portrait mode effects.
Photoshop's content-aware fill could do this equally well or better many years ago.
My experience with all these solutions to date (including whatever Apple is currently using) is that when viewed stereoscopically the people end up looking like 2D cutouts against the background.
I haven't seen this particular model in use stereoscopically so I can't comment as to its effectiveness, but the lack of a human face in the example set is likely a bit of a tell.
https://github.com/apple/ml-depth-pro
https://learnopencv.com/depth-pro-monocular-metric-depth/
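For reference, using the monocular depth model looks roughly like the following (paraphrased from memory of the ml-depth-pro README; the exact function names and output keys may have changed, so treat this as a sketch and check the repo).

    # Rough usage of apple/ml-depth-pro, from memory of its README;
    # verify names against the current repo before relying on them.
    import depth_pro

    model, transform = depth_pro.create_model_and_transforms()
    model.eval()

    image, _, f_px = depth_pro.load_rgb("photo.jpg")   # image + focal estimate
    prediction = model.infer(transform(image), f_px=f_px)

    depth_m = prediction["depth"]             # metric depth, in meters
    focal_px = prediction["focallength_px"]   # focal length in pixels
    print(depth_m.shape, float(focal_px))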
https://github.com/rcarmo/ml-sharp (has a little demo GIF)
I am looking at ways to approximate Gaussian splats without having to reinvent the wheel, but I'm a bit out of my depth since I haven't been paying a lot of attention to them in general.
Keep in mind that this is not Gaussian splat rendering but just a hacked approximation--on my NVIDIA machine that looks way smoother.
https://github.com/sparkjsdev/spark
What's weird is we're getting better at faking 3D from 2D than we are at just... capturing actual 3D data. Like we have LiDAR in phones already, but it's easier to neural-net your way around it than deal with the sensor data properly.
Five years from now we'll probably look back at this as the moment spatial computing stopped being about hardware and became mostly inference. Not sure if that's good or bad tbh.
Will include this one in my https://hackernewsai.com/ newsletter.
We have two eyes that give us depth by default.
This is really interesting to me because the model would have to encode the reflection as both the depth of the reflecting surface (for texture, scattering etc) as well as the "real depth" of the reflected object. The examples in Figure 11 and 12 already look amazing.
Long tail problems indeed.
Not only do many VR and AR systems acquire stereo, we have historical collections of stereo views in many libraries and museums.
Without that, it's hard to tell how cherry-picked the NVS video samples are.
[0] https://news.ycombinator.com/item?id=46252114