We Replaced H.264 Streaming with JPEG Screenshots (and It Worked Better)
Key topics
A team swapped H.264 streaming for JPEG screenshots to transmit remote AI coding sessions, and to their surprise, it worked better. Commenters were abuzz, with some pointing out that sending text instead of graphics would be more efficient, while others drew parallels with existing tools like asciinema, which records terminal sessions. The discussion sparked a lively debate about the merits of different approaches, with revelations such as the simplicity of MJPEG streaming and nods to historical precedents like an MPEG-1-based screen-sharing experiment from a decade ago. Digging deeper, commenters unearthed some fascinating analogies, like the Huygens space probe's descent video, highlighting the thread's relevance in an era of remote collaboration and low-latency streaming.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 35m after posting
- Peak period: 127 comments in the 0-6h window
- Average per period: 26.7 comments
Based on 160 loaded comments
Key moments
- Story posted: Dec 23, 2025 at 1:00 PM EST (10 days ago)
- First comment: Dec 23, 2025 at 1:35 PM EST (35m after posting)
- Peak activity: 127 comments in the 0-6h window, the hottest period of the conversation
- Latest activity: Dec 26, 2025 at 1:28 AM EST (7 days ago)
Want the full context? Read the primary article or dive into the live Hacker News thread when you're ready.
Thinks: why not send text instead of graphics, then? I'm sure it's more complicated than that...
Look at the end of the video: the photometry data count stops at "7996 kbytes received"(!)
> "Turns out, 40Mbps video streams don’t appreciate 200ms+ network latency. Who knew. “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage"
Who could do anything useful with 10Mbps. :/
[1] https://en.wikipedia.org/wiki/File:Huygens_descent.ogv
- https://news.ycombinator.com/item?id=9954870
- https://phoboslab.org/log/2015/07/play-gta-v-in-your-browser...
That's not enough.
Corporate networks also love to MITM their own workstations and reinterpret HTTP traffic. So, no WebSockets and no Server-Sent Events either, because their corporate firewall is a piece of software no one in the world wants and everyone in the world hates, including its own developers. Thus it only supports a subset of HTTP/1.1 and sometimes it likes to change the content while keeping Content-Length intact.
And you have to work around that, because the corporation's IT department will never lift the restrictions.
I wish I was kidding.
I try to remember that many of these environments likely once supported Flash.
If you wanna kill corporate IT, you have to kill capitalism first.
playing devil's advocate for a second, but corpIT is also working with morons as employees. most draconian rules used by corpIT have a basis in at least one real world example. whether that example came directly from one of the morons they manage or was passed along through corpIT lore, people have done some dumb-ass things on corp networks.
I would say the problem in the picture is your belief that corporate IT is introducing technical impediments against every instance of stupidity. I bet there's loads of stupidity they don't introduce technical impediments against. It would just not meet the cost-benefit analysis to spend thousands of tech man-hours introducing a new impediment that didn't cost the company much if any money.
Wherever Tech is a first-class citizen with a seat at the corporate table, it can be different.
They delegate that stuff. To the corporate IT department.
For example, we got dinged on an audit because instead of using RSA4096, we used ed25519. I kid you not, their main complaint was there weren't enough bits, which meant it wasn't secure.
Auditors are snake oil salesmen.
And produce a piece of software no one in the world wants and everyone in the world hates. Yourself included.
Because otherwise people do dumb stuff like pasting proprietary designs or PII into deepseek
It's purely illusory security, that doesn't protect anything but does levy a constant performance tax on nearly every task.
I know that some guardrails and restrictions in a corporate setting can backfire. I know that onerous processes to get approval for needed software access can drive people to break the rules or engage in shadow IT. As a member of a firewall team, I did it myself! We couldn't get access to Python packages or PHP for a local webserver we had available to us thanks to a grandfather clause. My team hated our "approved" Sharepoint service request system. So a few of us built a small web app with Bottle (single file web server microframework, no dependencies) and Bootstrap CSS and a SQLite backend. Everyone who interacted with our team loved it. Had we more support from corporate, it might have been a lot easier.
Good cybersecurity needs to work with IT to facilitate peoples' legitimate use cases, not stand in the way all the time just because it's easier that way.
But saying "corporate IT controls are all useless" is just as foolish to me. It is reasonable and moral for a business to put controls and visibility on what data is moving between endpoints, and to block unsanctioned behavior.
You and op can be right at the same time. You imply the rules probably help a lot even while imperfect. They imply that pretending rules alone are enough to be perfect is incomplete.
This is assuming the DLP service blocks the request, rather than doing something like logging it and reporting it to your manager and/or CIO.
>It's purely illusory security, that doesn't protect anything but does levy a constant performance tax on nearly every task.
Because you can't ask deepseek to extract some unstructured data for you? I'm not sure what the alternative is, just let everyone paste info into deepseek?
Unless the corporation is 100% in-office, I’d wager they do in fact make exceptions - otherwise they wouldn’t have a working videoconferencing system.
The challenge is getting corporate insiders to like your product enough to get it through the exception process (a total hassle) when the firewall’s restrictions mean you can’t deliver a decent demo.
Split tunnelling means the UDP packets just go through the normal internet.
It's not usually IT idiocy; that usually comes from higher-ups cosplaying their inner tech visionaries.
The corporate firewall debate came up when we considered websockets at a previous company. Everyone has parroted the same information for so long that it was just assumed that websockets and corporate firewalls were going to cause us huge problems.
We went with websockets anyway and it was fine. Almost no traffic to the no-websockets fallback path, and the traffic that did arrive appeared to be from users with intermittent internet connections (cellular providers, foreign countries with poor internet).
I'm 100% sure there are still corporate firewalls out there blocking or breaking websocket connections, but it's not nearly the same problem in 2025 as it was in 2015.
Then we ran into a network where WebSockets were blocked, so we switched to streaming http.
No trouble with streaming http yet.
thanks, i had repressed that memory
Request lives for longer than 15 sec? Fuck you.
Request POSTs some JSON? Maybe fuck you just a little bit, when we find certain strings in the payload. We won't tell you which though.
I've had similar experiences in the past when trying to do remote desktop streaming for digital signage (which is not particularly demanding in bandwidth terms). Multicast streaming video was the most efficient, but annoying to decode when you dropped data. I now wonder how far I could have gone with JPEGs...
I love the style of this blog-post, you can really tell that Luke has been deep down in the rabbit hole, encountered the Balrog and lived to tell the tale.
You can still have weird broken stall-outs though.
I dunno, this article has some good problem solving, but the biggest and mostly untouched issue is that they set the H.264 bandwidth too high. H.264 can do a lot better than JPEG with a lot less bandwidth. But if you lock it at 40Mbps, of course it's flaky. Try 1Mbps and iterate from there.
And going keyframe-only is the opposite of how you optimize video bandwidth.
From the article:
> “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.
10Mbps is still way too high of a minimum.
And it would not be blocky garbage, it would still look a lot better than JPEG.
For mostly-static content at 4fps you can cut a lot more bitrate corners before it looks bad.
> And 2-3 JPEGs per second won't even look good at 1Mbps.
Unqualified claims like these are utterly meaningless. It depends too much on exactly what you're doing, some sorts of images will compress much better than others.
When you try and use h264 et al at low latency you have to get rid of a lot of optimisations to encode it as quickly as possible. I also highly suspect the vaapi encoder is not very good esp at low bitrates.
I _think_ moonlight also forces CBR instead of VBR, which is pretty awful for this use case - imagine you have 9 seconds of 'nothing changing' and then the window moves for 0.25 seconds. If you had VBR the encoder could basically send ~0kbit/sec apart from control metadata, and then spike the bitrate up when the window moved (for brevity I'm simplifying here, it's more complicated than this but hopefully you get the idea).
Basically they've used the wrong software entirely. They should try and look at xrdp with x264 as a start.
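For readers who want to test the "just tune the knobs" argument above, here is a minimal sketch of the kind of encode these commenters describe: capped-bitrate, low-latency x264 via ffmpeg, invoked from Python. It assumes ffmpeg with libx264 is on the PATH; the file names, the 1 Mbps cap, and the GOP length are placeholder values to iterate from, not settings taken from the article.

```python
# A minimal sketch (not the article's pipeline): re-encode a screen recording at a
# capped ~1 Mbps with low-latency x264 settings to see how it actually looks.
# Assumes ffmpeg with libx264 is on PATH; "capture.mkv" is a placeholder input.
import subprocess

def encode_low_latency(src: str, dst: str, max_kbps: int = 1000, gop: int = 120) -> None:
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-preset", "veryfast",        # cheap enough to run in real time
        "-tune", "zerolatency",       # drops lookahead/B-frames for low latency
        "-crf", "28",                 # quality target; the bitrate only caps the worst case
        "-maxrate", f"{max_kbps}k",   # capped VBR instead of a fixed 40 Mbps CBR
        "-bufsize", f"{2 * max_kbps}k",
        "-g", str(gop),               # keyframe interval; static content needs few keyframes
        "-an", dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    encode_low_latency("capture.mkv", "capture_1mbps.mp4")
```

With capped CRF like this, mostly-static terminal frames cost almost nothing, which is the point the VBR comment above is making.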
JPEG is nice and simple: there is a canonical way to encode an image, and most encoders will produce (more or less) the same result when given the same quality settings. Some encoders (like mozjpeg) do go a bit further to produce better compression, but most people don’t use them.
With h264, the standard essentially just says how decompressors should work, and it’s up to the individual encoders to work out to make best use of the available functionality for their intended use case. I’m not sure any encoder uses the full functionality (x264 refuses to use arbitrary frame order without b-frames, and I haven’t found an encoder that takes advantage of that).
I’m guessing moonlight makes the assumption that most of its compression will come from motion prediction, and then takes massive shortcuts when encoding iframes.
Video players used to call it buffering, and resolving it was called buffering issues.
Players today can keep an eye on network quality while playing too, which is neat.
The idea is that if the fancy system works well on connection A and works poorly on connection B, what are the differences, and how can we modify the system so that A and B are the same from its perspective?
smaller thing: many, many moons ago, I did a lot of work with H.264. "A single H.264 keyframe is 200-500KB" is fantastical.
Can't prove it wrong, because it will be correct given arbitrary dimensions and encoding settings, but it's pretty hard to end up with.
Just pulled a couple 1080p's off YouTube, biggest I-frame is 150KB, median is 58KB (`ffprobe $FILE -show_frames -of compact -show_entries frame=pict_type,pkt_size | grep -i "|pict_type=I"`)
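The quoted one-liner works as-is; for anyone who wants the same check with structured output, here is a rough Python equivalent. It assumes ffprobe (shipped with ffmpeg) is installed, and the video path is whatever file you want to measure.

```python
# Rough equivalent of the quoted one-liner: report I-frame (keyframe) sizes for a
# video so claims like "a single H.264 keyframe is 200-500KB" can be checked.
# Assumes ffprobe (part of ffmpeg) is on PATH.
import json
import statistics
import subprocess
import sys

def iframe_sizes_bytes(path: str) -> list[int]:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_frames",
         "-show_entries", "frame=pict_type,pkt_size",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    frames = json.loads(out)["frames"]
    return [int(f["pkt_size"]) for f in frames if f.get("pict_type") == "I"]

if __name__ == "__main__":
    sizes = iframe_sizes_bytes(sys.argv[1])
    print(f"I-frames: {len(sizes)}, median {statistics.median(sizes) / 1024:.0f} KB, "
          f"max {max(sizes) / 1024:.0f} KB")
```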
Maybe because the basic frequency transform is 4x4 vs 8x8 for JPG?
https://en.wikipedia.org/wiki/Teleprinter
https://developers.google.com/speed/webp/docs/webp_study
https://caniuse.com/jpegxl
If having native support in a web browser is important, though, then yes, WebP is a better choice (as is JPEG).
This would really cut down on the bandwidth of static coding terminals where 90% of screen is just cursor flashing or small bits of text moving.
If they really wanted to be ambitious they could also detect scrolling and do an optimization client-side where it translates some of the existing areas (look up CopyRect command in VNC).
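CopyRect itself tells the client to move pixels it already has, which needs scroll detection; the simpler cousin of that idea — re-encode only the rectangle that actually changed — can be sketched in a few lines with Pillow. This is an illustration of the concept, not code from the article; Pillow is assumed, and frame capture is left out.

```python
# A minimal sketch of the simpler "only send what changed" idea the comment points at:
# diff consecutive frames and JPEG-encode just the changed bounding box plus its offset.
# (VNC's CopyRect goes further by translating pixels the client already has on scroll.)
# Assumes Pillow is installed; capturing the frames themselves is out of scope here.
from io import BytesIO
from PIL import Image, ImageChops

def dirty_rect_update(prev: Image.Image, curr: Image.Image, quality: int = 70):
    """Return (x, y, jpeg_bytes) for the changed region, or None if nothing changed."""
    bbox = ImageChops.difference(prev, curr).getbbox()  # bounding box of the non-zero diff
    if bbox is None:
        return None                      # identical frames: send nothing at all
    x0, y0, x1, y1 = bbox
    buf = BytesIO()
    curr.crop(bbox).save(buf, format="JPEG", quality=quality)
    return x0, y0, buf.getvalue()        # client pastes the patch at (x0, y0)
```

For a terminal where 90% of the screen is static, the changed bounding box is usually a few lines of text, so the per-update JPEG shrinks accordingly.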
Also... I get that the dumb solution to "ugly text at low bitrates" is "make the bitrate higher." But still, nobody looked at a 40M minimum and wondered if they might be looking at this problem from the wrong angle entirely?
I spent some time compiling the "new" xrdp with x264 and it is incredibly good, basically cannot really tell that I'm remote desktoping.
The bandwidth was extremely low as well. You are correct on that part, 40mbit/sec is nuts for high quality. I suspect if they are using moonlight it's optimized for extremely low latency at the expense of bandwidth?
I worked on a project that started with VNC and had lots of problems. Slow connect times and backpressure/latency. Switching to neko was quick/easy win.
An extension was introduced for continuous updates that allows the server to push frames without receiving requests, so this isn't universally true for all RFB (VNC) software. This is implemented in TigerVNC and noVNC to name a few.
Of course, continuous updates have the buffer-bloat problem that we're all discussing, so they also implemented fairly complex congestion control on top of the whole thing.
Effectively, they just moved the role of congestion control over to the server from the client while making things slightly more complicated.
Why not just send text? Why do you need video at all?
> You’re watching the AI type code from 45 seconds ago
>
> By the time you see a bug, the AI has already committed it to main
>
> Everything is terrible forever
Is this satire? I mean: if the solution for things to not be terrible forever consists in catching what an AI is doing in 45 seconds (!) before the AI commits to trunk, I'm sorry but you should seriously re-evaluate your life plans.
And I wonder how many other massive issues are being committed to main that would take longer to reason out, while you're already looking at the next 45-second shallow bug.
This has to be a joke, right?
(Although the fact they decided to use Moonlight in an enterprise product makes me wonder if their product actually was vibe coded)
There are use cases where you might want lossy PNG over other formats; one is for still captures of 2D animated cartoon content, where H.264 tended to blur the sharp edges and flat color areas, and this approach can compensate for that.
> We set GOP to 60 (one keyframe per second at 60fps). We tested.
Why muck around with P-frames and keyframes? Just make your video 1fps.
> Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.
10 Mbps is way too much. I occasionally watch YouTube videos where someone writes code. YouTube serves me the video at way less than 1Mbps. I did some quick napkin math for a random coding video and it was 0.6Mbps. It’s not blocky garbage at all.
That's my takeaway from this too. I think they tried the first thing the LLM suggested, it didn't work, they asked the LLM to fix it, and ended up with this crap. They never tried to really understand the problems they were facing.
Video is really fiddly. You have all sorts of parameters to fiddle with. If you don't dig into that and figure out what tradeoffs you need to make, you'll easily end up in the position where _checks notes_ you think you need 40Mbps for 1080p video and 10Mbps is just too shitty.
There's various points in the article where they talk about having 30 seconds of latency. Whatever's causing this, this is a solved problem. We all have experience dealing with video teleconferencing, this isn't anything new, it's nothing special, they're just doing it wrong. They say it doesn't work because of corporate network policy, but we all use Teams or Slack.
I think you're right. They just did a bunch of LLM slop and decided to just send it. At no point did they understand any of their problems any deeper than the LLM tried to understand the problem.
But it's really not! Not for "Tweak a few of the default knobs for your use case".
It takes five minutes to play around with whatever FFMPEG gui front end (like even OBS) to get some intuition about those knobs.
Like, people stream coding all the time with OBS itself.
Every twitch streamer and Youtube creator figured out video encoding options, why couldn't they?
They are using a copy of a game-streaming code base for this, which makes entirely the opposite set of optimizations from the ones they should have sought out.
Like, this is rank incompetence. Your average influencer knows more about video encoding than these people. So much for LLMs helping people learn!
My experience is that real-time hardware encoding is way worse quality than offline encoding (what YouTube does when you upload a video).
WebRTC over UDP is one choice for lossy situations. Media over QUIC might be another (is the future here?), and it might be more enterprise-firewall-friendly since HTTP/3 runs over QUIC.
Good engineering: when you're not too proud to do the obvious, but sort of cheesy-sounding solution.
The standard supports adaptive bit rate playback so you can provide both low quality and high quality videos and players can switch depending on bandwidth available.
Or is it intra-only H.264?
I mean, none of this is especially new. It's an interesting trick though!
JPEG is extremely efficient to [de/en]code on modern CPUs. You can get close to 1080p60 per core if you use a library that leverages SIMD.
I sometimes struggle with the pursuit of perfect codec efficiency when our networks have become this fast. You can employ extremely half-assed compression and still not max out a 1gbps pipe. From Netflix & Google's perspective it totally makes sense, but unless you are building a streaming video platform with billions of customers I don't see the point.
Bargaining.
I wonder if they just tried restarting the stream at a lower bitrate once it got too delayed.
The talk about how the images look crisper at a lower FPS is just tuning that I guess they didn't bother with.
You can run all WebRTC traffic over a single port. It’s a shame you spent so much time/were frustrated by ICE errors
That’s great you got something better with less complexity! I do think people push ‘you need UDP and BWE’ a little too zealously. If you have a homogeneous set of clients, stuff like RTMP/WebSockets seems to serve people well.
Screenshot once per second. Works everywhere.
I’m still waiting for mobile screenshare api support, so I could quickly use it to show stuff from my phone to other phones with the QR link.
Claude Code built it in 5 minutes and it works perfectly.
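For the curious, the screenshot-per-second approach described above fits in one small script: grab the screen, JPEG-encode it, and serve it as an MJPEG (multipart/x-mixed-replace) stream that a plain `<img>` tag can display. This is a hedged sketch rather than anyone's production code; it assumes the third-party mss and Pillow packages, and the port, quality, and frame interval are arbitrary placeholders.

```python
# A minimal sketch of "screenshot once per second, works everywhere": grab the screen,
# JPEG-encode it, and serve it as an MJPEG (multipart/x-mixed-replace) stream that
# browsers and plain <img> tags understand. Assumes the third-party `mss` and `Pillow`
# packages; the port, quality, and 1 s interval are arbitrary placeholders.
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from io import BytesIO

import mss
from PIL import Image

def grab_jpeg(quality: int = 70) -> bytes:
    # A fresh mss instance per call keeps this safe to use from multiple handler threads.
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])                      # primary monitor
        img = Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        return buf.getvalue()

class MJPEGHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        boundary = "frame"
        self.send_response(200)
        self.send_header("Content-Type",
                         f"multipart/x-mixed-replace; boundary={boundary}")
        self.end_headers()
        try:
            while True:
                frame = grab_jpeg()
                self.wfile.write(
                    f"--{boundary}\r\nContent-Type: image/jpeg\r\n"
                    f"Content-Length: {len(frame)}\r\n\r\n".encode()
                )
                self.wfile.write(frame + b"\r\n")
                time.sleep(1.0)                               # one screenshot per second
        except (BrokenPipeError, ConnectionResetError):
            pass                                              # viewer went away

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), MJPEGHandler).serve_forever()
```

Pointing a browser at the server renders each new JPEG as it arrives; because it is ordinary HTTP it sidesteps the WebSocket issues discussed earlier, though the very long-lived request can still upset the stricter corporate proxies mentioned above.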
I'm not sure if they are making a joke about JPEG screenshots.
So the only plausible thing to do was to pre-build HTML pages for the content pages and let Angular's JS take its time to load (for UX functionality). It looked like the page flickered when the JS loaded for the first time, but we solved the search engine problem.
I started the first ISP in my area. We had two T1s to Miami. When HD audio and the rudiments of video started to increase in popularity, I'd always tell our modem customers, "A few minutes of video is a lifetime of email. Remember how exciting email was?"
For a fast start of the video, reverse the implementation: instead of downgrading from WebSockets to polling when the connection fails, you should upgrade from polling to WebSockets when the network allows.
Socket.io was one of the first libraries that did that switching, and it got it wrong at first, too. They learned about enterprise network behaviour and switched the implementation.
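A hedged sketch of that upgrade direction, from the client's point of view: poll over plain HTTP immediately so the first frame arrives fast, attempt a WebSocket connection in the background, and switch transports only once the handshake succeeds. It assumes the third-party websockets package; the URLs and the 3-second upgrade timeout are placeholders, and this illustrates the pattern rather than how Socket.io implements it.

```python
# A minimal sketch of "start with polling, upgrade to WebSockets when the network allows".
# Assumes the third-party `websockets` package; URLs and timeouts are placeholders.
import asyncio
import urllib.request

import websockets

POLL_URL = "https://example.com/frames/latest"   # placeholder: cheap HTTP poll endpoint
WS_URI = "wss://example.com/frames/stream"       # placeholder: push endpoint after upgrade

def poll_once() -> bytes:
    """One plain HTTP poll; this is what makes the first frame fast."""
    with urllib.request.urlopen(POLL_URL, timeout=5) as resp:
        return resp.read()

def handle_frame(frame) -> None:
    print(f"got {len(frame)} bytes")             # stand-in for actually rendering the frame

async def try_upgrade():
    # Raises if the WebSocket handshake doesn't complete within open_timeout seconds.
    return await websockets.connect(WS_URI, open_timeout=3)

async def run_client() -> None:
    upgrade = asyncio.create_task(try_upgrade())
    # Keep polling while the upgrade attempt is in flight.
    while not upgrade.done():
        handle_frame(await asyncio.to_thread(poll_once))
        await asyncio.sleep(1.0)
    if upgrade.exception() is not None:
        while True:                               # upgrade failed: stay on polling
            handle_frame(await asyncio.to_thread(poll_once))
            await asyncio.sleep(1.0)
    ws = upgrade.result()
    try:
        async for message in ws:                  # upgraded: the server now pushes frames
            handle_frame(message)
    finally:
        await ws.close()

if __name__ == "__main__":
    asyncio.run(run_client())
```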
156 more comments available on Hacker News