We Replaced H.264 Streaming with JPEG Screenshots (and It Worked Better)
Key topics
A team swapped H.264 streaming for JPEG screenshots to transmit remote AI coding sessions, and to their surprise, it worked better. Commenters were abuzz, with some pointing out that sending text instead of graphics would be more efficient, while others drew parallels with existing tools like asciinema, which records terminal sessions. The discussion sparked a lively debate about the merits of different approaches, with revelations such as the simplicity of MJPEG streaming and nods to historical precedents like an MPEG-1-based screen-sharing experiment from a decade ago. Digging deeper, commenters unearthed some fascinating analogies, like the Huygens space probe's descent video, highlighting the thread's relevance in an era of remote collaboration and low-latency streaming.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 35m after posting
- Peak period: 127 comments in the 0-6h window
- Average per period: 26.7 comments
Based on 160 loaded comments
Key moments
- Story posted: Dec 23, 2025 at 1:00 PM EST (10 days ago)
- First comment: Dec 23, 2025 at 1:35 PM EST (35m after posting)
- Peak activity: 127 comments in the 0-6h window, the hottest period of the conversation
- Latest activity: Dec 26, 2025 at 1:28 AM EST (7 days ago)
Want the full context? Read the primary article or dive into the live Hacker News thread when you're ready.
Thinks: why not send text instead of graphics, then? I'm sure it's more complicated than that...
Look at the end of the video: the photometry data count stops at "7996 kbytes received"(!)
> "Turns out, 40Mbps video streams don’t appreciate 200ms+ network latency. Who knew. “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage"
Who could do anything useful with 10Mbps. :/
[1] https://en.wikipedia.org/wiki/File:Huygens_descent.ogv
- https://news.ycombinator.com/item?id=9954870
- https://phoboslab.org/log/2015/07/play-gta-v-in-your-browser...
That's not enough.
Corporate networks also love to MITM their own workstations and reinterpret HTTP traffic. So, no WebSockets and no Server-Sent Events either, because their corporate firewall is a piece of software no one in the world wants and everyone in the world hates, including its own developers. Thus it only supports a subset of HTTP/1.1 and sometimes it likes to change the content while keeping Content-Length intact.
And you have to work around that, because the corporation's IT department will never lift the restrictions.
I wish I was kidding.
I try to remember that many of these environments likely once supported Flash.
If you wanna kill corporate IT, you have to kill capitalism first.
playing devil's advocate for a second, but corpIT is also working with morons as employees. most draconian rules used by corpIT have a basis in at least one real world example. whether that example came directly from one of the morons they manage or was passed along through corpIT lore, people have done some dumb-ass things on corp networks.
I would say the problem in the picture is your belief that corporate IT is introducing technical impediments against every instance of stupidity. I bet there's loads of stupidity they don't introduce technical impediments against. It would just not meet the cost-benefit analysis to spend thousands of tech man-hours introducing a new impediment that didn't cost the company much if any money.
Wherever Tech is a first-class citizen with a seat at the corporate table, it can be different.
They delegate that stuff. To the corporate IT department.
For example, we got dinged on an audit because instead of using RSA4096, we used ed25519. I kid you not, their main complaint was there weren't enough bits, which meant it wasn't secure.
Auditors are snake oil salesmen.
And produce a piece of software no one in the world wants and everyone in the world hates. Yourself included.
Because otherwise people do dumb stuff like pasting proprietary designs or PII into deepseek
It's purely illusory security, that doesn't protect anything but does levy a constant performance tax on nearly every task.
I know that some guardrails and restrictions in a corporate setting can backfire. I know that onerous processes to get approval for needed software access can drive people to break the rules or engage in shadow IT. As a member of a firewall team, I did it myself! We couldn't get access to Python packages or PHP for a local webserver we had available to us thanks to a grandfather clause. My team hated our "approved" Sharepoint service request system. So a few of us built a small web app with Bottle (single file web server microframework, no dependencies) and Bootstrap CSS and a SQLite backend. Everyone who interacted with our team loved it. Had we more support from corporate, it might have been a lot easier.
Good cybersecurity needs to work with IT to facilitate peoples' legitimate use cases, not stand in the way all the time just because it's easier that way.
But saying "corporate IT controls are all useless" is just as foolish to me. It is reasonable and moral for a business to put controls and visibility on what data is moving between endpoints, and to block unsanctioned behavior.
You and op can be right at the same time. You imply the rules probably help a lot even while imperfect. They imply that pretending rules alone are enough to be perfect is incomplete.
This is assuming the DLP service blocks the request, rather than doing something like logging it and reporting it to your manager and/or CIO.
>It's purely illusory security, that doesn't protect anything but does levy a constant performance tax on nearly every task.
Because you can't ask deepseek to extract some unstructured data for you? I'm not sure what the alternative is, just let everyone paste info into deepseek?
Unless the corporation is 100% in-office, I’d wager they do in fact make exceptions - otherwise they wouldn’t have a working videoconferencing system.
The challenge is getting corporate insiders to like your product enough to get it through the exception process (a total hassle) when the firewall’s restrictions mean you can’t deliver a decent demo.
Split tunnelling means the UDP packets just go through the normal internet.
It's not usually IT idiocy; that usually comes from higher-ups cosplaying their inner tech visionaries.
The corporate firewall debate came up when we considered websockets at a previous company. Everyone has parroted the same information for so long that it was just assumed that websockets and corporate firewalls were going to cause us huge problems.
We went with websockets anyway and it was fine. Almost no traffic to the no-websockets fallback path, and the traffic that did arrive appeared to be from users with intermittent internet connections (cellular providers, foreign countries with poor internet).
I'm 100% sure there are still corporate firewalls out there blocking or breaking websocket connections, but it's not nearly the same problem in 2025 as it was in 2015.
Then we ran into a network where WebSockets were blocked, so we switched to streaming http.
No trouble with streaming http yet.
thanks, i had repressed that memory
Request lives for longer than 15 sec? Fuck you.
Request POSTs some JSON? Maybe fuck you just a little bit, when we find certain strings in the payload. We won't tell you which though.
I've had similar experiences in the past when trying to do remote desktop streaming for digital signage (which is not particularly demanding in bandwidth terms). Multicast streaming video was the most efficient, but annoying to decode when you dropped data. I now wonder how far I could have gone with JPEGs...
I love the style of this blog-post, you can really tell that Luke has been deep down in the rabbit hole, encountered the Balrog and lived to tell the tale.
You can still have weird broken stall-outs though.
I dunno, this article has some good problem solving, but the biggest and mostly untouched issue is that they set the H.264 bandwidth too high. H.264 can do a lot better than JPEG with a lot less bandwidth. But if you lock it at 40Mbps, of course it's flaky. Try 1Mbps and iterate from there.
And going keyframe-only is the opposite of how you optimize video bandwidth.
From the article:
> “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.
10Mbps is still way too high of a minimum.
And it would not be blocky garbage, it would still look a lot better than JPEG.
For mostly-static content at 4fps you can cut a lot more bitrate corners before it looks bad.
> And 2-3 JPEGs per second won't even look good at 1Mbps.
Unqualified claims like these are utterly meaningless. It depends too much on exactly what you're doing, some sorts of images will compress much better than others.
When you try and use h264 et al at low latency you have to get rid of a lot of optimisations to encode it as quickly as possible. I also highly suspect the vaapi encoder is not very good esp at low bitrates.
I _think_ moonlight also forces CBR instead of VBR, which is pretty awful for this use case - imagine you have 9 seconds of 'nothing changing' and then the window moves for 0.25 seconds. If you had VBR the encoder could basically send ~0kbit/sec apart from control metadata, and then spike the bitrate up when the window moved (for brevity I'm simplifying here, it's more complicated than this but hopefully you get the idea).
Basically they've used the wrong software entirely. They should try and look at xrdp with x264 as a start.
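For readers who want to test the "just tune the knobs" argument above, here is a minimal sketch of the kind of encode these commenters describe: capped-bitrate, low-latency x264 via ffmpeg, invoked from Python. It assumes ffmpeg with libx264 is on the PATH; the file names, the 1 Mbps cap, and the GOP length are placeholder values to iterate from, not settings taken from the article.

```python
# A minimal sketch (not the article's pipeline): re-encode a screen recording at a
# capped ~1 Mbps with low-latency x264 settings to see how it actually looks.
# Assumes ffmpeg with libx264 is on PATH; "capture.mkv" is a placeholder input.
import subprocess

def encode_low_latency(src: str, dst: str, max_kbps: int = 1000, gop: int = 120) -> None:
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-preset", "veryfast",        # cheap enough to run in real time
        "-tune", "zerolatency",       # drops lookahead/B-frames for low latency
        "-crf", "28",                 # quality target; the bitrate only caps the worst case
        "-maxrate", f"{max_kbps}k",   # capped VBR instead of a fixed 40 Mbps CBR
        "-bufsize", f"{2 * max_kbps}k",
        "-g", str(gop),               # keyframe interval; static content needs few keyframes
        "-an", dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    encode_low_latency("capture.mkv", "capture_1mbps.mp4")
```

With capped CRF like this, mostly-static terminal frames cost almost nothing, which is the point the VBR comment above is making.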
JPEG is nice and simple: there is a canonical way to encode an image, and most encoders will produce (more or less) the same result when given the same quality settings. Some encoders (like mozjpeg) do go a bit further to produce better compression, but most people don’t use them.
With h264, the standard essentially just says how decompressors should work, and it’s up to the individual encoders to work out to make best use of the available functionality for their intended use case. I’m not sure any encoder uses the full functionality (x264 refuses to use arbitrary frame order without b-frames, and I haven’t found an encoder that takes advantage of that).
I’m guessing moonlight makes the assumption that most of its compression will come from motion prediction, and then takes massive shortcuts when encoding iframes.
Video players used to call it buffering, and resolving it was called buffering issues.
Players today can keep an eye on network quality while playing too, which is neat.
The idea is that if the fancy system works well on connection A and works poorly on connection B, what are the differences, and how can we modify the system so that A and B are the same from its perspective?
smaller thing: many, many moons ago, I did a lot of work with H.264. "A single H.264 keyframe is 200-500KB" is fantastical.
Can't prove it wrong, because it will be correct given arbitrary dimensions and encoding settings, but it's pretty hard to end up with.
Just pulled a couple 1080p's off YouTube, biggest I-frame is 150KB, median is 58KB (`ffprobe $FILE -show_frames -of compact -show_entries frame=pict_type,pkt_size | grep -i "|pict_type=I"`)
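The quoted one-liner works as-is; for anyone who wants the same check with structured output, here is a rough Python equivalent. It assumes ffprobe (shipped with ffmpeg) is installed, and the video path is whatever file you want to measure.

```python
# Rough equivalent of the quoted one-liner: report I-frame (keyframe) sizes for a
# video so claims like "a single H.264 keyframe is 200-500KB" can be checked.
# Assumes ffprobe (part of ffmpeg) is on PATH.
import json
import statistics
import subprocess
import sys

def iframe_sizes_bytes(path: str) -> list[int]:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_frames",
         "-show_entries", "frame=pict_type,pkt_size",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    frames = json.loads(out)["frames"]
    return [int(f["pkt_size"]) for f in frames if f.get("pict_type") == "I"]

if __name__ == "__main__":
    sizes = iframe_sizes_bytes(sys.argv[1])
    print(f"I-frames: {len(sizes)}, median {statistics.median(sizes) / 1024:.0f} KB, "
          f"max {max(sizes) / 1024:.0f} KB")
```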
Maybe because the basic frequency transform is 4x4 vs 8x8 for JPG?
https://en.wikipedia.org/wiki/Teleprinter
https://developers.google.com/speed/webp/docs/webp_study
https://caniuse.com/jpegxl
If having native support in a web browser is important, though, then yes, WebP is a better choice (as is JPEG).
This would really cut down on the bandwidth of static coding terminals where 90% of screen is just cursor flashing or small bits of text moving.
If they really wanted to be ambitious they could also detect scrolling and do an optimization client-side where it translates some of the existing areas (look up CopyRect command in VNC).
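CopyRect itself tells the client to move pixels it already has, which needs scroll detection; the simpler cousin of that idea — re-encode only the rectangle that actually changed — can be sketched in a few lines with Pillow. This is an illustration of the concept, not code from the article; Pillow is assumed, and frame capture is left out.

```python
# A minimal sketch of the simpler "only send what changed" idea the comment points at:
# diff consecutive frames and JPEG-encode just the changed bounding box plus its offset.
# (VNC's CopyRect goes further by translating pixels the client already has on scroll.)
# Assumes Pillow is installed; capturing the frames themselves is out of scope here.
from io import BytesIO
from PIL import Image, ImageChops

def dirty_rect_update(prev: Image.Image, curr: Image.Image, quality: int = 70):
    """Return (x, y, jpeg_bytes) for the changed region, or None if nothing changed."""
    bbox = ImageChops.difference(prev, curr).getbbox()  # bounding box of the non-zero diff
    if bbox is None:
        return None                      # identical frames: send nothing at all
    x0, y0, x1, y1 = bbox
    buf = BytesIO()
    curr.crop(bbox).save(buf, format="JPEG", quality=quality)
    return x0, y0, buf.getvalue()        # client pastes the patch at (x0, y0)
```

For a terminal where 90% of the screen is static, the changed bounding box is usually a few lines of text, so the per-update JPEG shrinks accordingly.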
Also... I get that the dumb solution to "ugly text at low bitrates" is "make the bitrate higher." But still, nobody looked at a 40M minimum and wondered if they might be looking at this problem from the wrong angle entirely?
I spent some time compiling the "new" xrdp with x264 and it is incredibly good, basically cannot really tell that I'm remote desktoping.
The bandwidth was extremely low as well. You are correct on that part, 40mbit/sec is nuts for high quality. I suspect if they are using moonlight it's optimized for extremely low latency at the expense of bandwidth?
I worked on a project that started with VNC and had lots of problems. Slow connect times and backpressure/latency. Switching to neko was quick/easy win.
An extension was introduced for continuous updates that allows the server to push frames without receiving requests, so this isn't universally true for all RFB (VNC) software. This is implemented in TigerVNC and noVNC to name a few.
Of course, continuous updates have the buffer-bloat problem that we're all discussing, so they also implemented fairly complex congestion control on top of the whole thing.
Effectively, they just moved the role of congestion control over to the server from the client while making things slightly more complicated.
Why not just send text? Why do you need video at all?
> You’re watching the AI type code from 45 seconds ago
>
> By the time you see a bug, the AI has already committed it to main
>
> Everything is terrible forever
Is this satire? I mean: if the solution for things to not be terrible forever consists in catching what an AI is doing in 45 seconds (!) before the AI commits to trunk, I'm sorry but you should seriously re-evaluate your life plans.
And I wonder how many other massive issues are being committed to main that would take longer to reason out, while you're already looking at the next 45-second shallow bug.
This has to be a joke, right?
(Although the fact they decided to use Moonlight in an enterprise product makes me wonder if their product actually was vibe coded)
There are use cases where you might want lossy PNG over other formats; one is for still captures of 2D animated cartoon content, where H.264 tended to blur the sharp edges and flat color areas, and this approach can compensate for that.
> We set GOP to 60 (one keyframe per second at 60fps). We tested.
Why muck around with P-frames and keyframes? Just make your video 1fps.
> Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.
10 Mbps is way too much. I occasionally watch YouTube videos where someone writes code. YouTube serves me the video at way less than 1Mbps. I did some quick napkin math for a random coding video and it was 0.6Mbps. It’s not blocky garbage at all.
That's my takeaway from this too. I think they tried the first thing the LLM suggested, it didn't work, they asked the LLM to fix it, and ended up with this crap. They never tried to really understand the problems they were facing.
Video is really fiddly. You have all sorts of parameters to fiddle with. If you don't dig into that and figure out what tradeoffs you need to make, you'll easily end up in the position where _checks notes_ you think you need 40Mbps for 1080p video and 10Mbps is just too shitty.
There's various points in the article where they talk about having 30 seconds of latency. Whatever's causing this, this is a solved problem. We all have experience dealing with video teleconferencing, this isn't anything new, it's nothing special, they're just doing it wrong. They say it doesn't work because of corporate network policy, but we all use Teams or Slack.
I think you're right. They just did a bunch of LLM slop and decided to just send it. At no point did they understand any of their problems any deeper than the LLM tried to understand the problem.
But it's really not! Not for "Tweak a few of the default knobs for your use case".
It takes five minutes to play around with whatever FFMPEG gui front end (like even OBS) to get some intuition about those knobs.
Like, people stream coding all the time with OBS itself.
Every twitch streamer and Youtube creator figured out video encoding options, why couldn't they?
They are using a copy of a game-streaming code base for this, which makes entirely the opposite set of optimizations from the ones they should have sought out.
Like, this is rank incompetence. Your average influencer knows more about video encoding than these people. So much for LLMs helping people learn!
My experience is that real-time hardware encoding is way worse quality than offline encoding (what YouTube does when you upload a video).
WebRTC over UDP is one choice for lossy situations. Media over QUIC might be another (is the future here?), and it might be more enterprise-firewall-friendly since HTTP/3 runs over QUIC.
Good engineering: when you're not too proud to do the obvious, but sort of cheesy-sounding solution.
The standard supports adaptive bit rate playback so you can provide both low quality and high quality videos and players can switch depending on bandwidth available.
Or is it intra-only H.264?
I mean, none of this is especially new. It's an interesting trick though!
JPEG is extremely efficient to [de/en]code on modern CPUs. You can get close to 1080p60 per core if you use a library that leverages SIMD.
I sometimes struggle with the pursuit of perfect codec efficiency when our networks have become this fast. You can employ extremely half-assed compression and still not max out a 1gbps pipe. From Netflix & Google's perspective it totally makes sense, but unless you are building a streaming video platform with billions of customers I don't see the point.
Bargaining.
I wonder if they just tried restarting the stream at a lower bitrate once it got too delayed.
The talk about how the images look crisper at a lower FPS is just tuning that I guess they didn't bother with.
You can run all WebRTC traffic over a single port. It’s a shame you spent so much time/were frustrated by ICE errors
That’s great you got something better with less complexity! I do think people push ‘you need UDP and BWE’ a little too zealously. If you have a homogeneous set of clients, stuff like RTMP/WebSockets seems to serve people well.
Screenshot once per second. Works everywhere.
I’m still waiting for mobile screenshare api support, so I could quickly use it to show stuff from my phone to other phones with the QR link.
Claude Code built it in 5 minutes and it works perfectly.
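For the curious, the screenshot-per-second approach described above fits in one small script: grab the screen, JPEG-encode it, and serve it as an MJPEG (multipart/x-mixed-replace) stream that a plain `<img>` tag can display. This is a hedged sketch rather than anyone's production code; it assumes the third-party mss and Pillow packages, and the port, quality, and frame interval are arbitrary placeholders.

```python
# A minimal sketch of "screenshot once per second, works everywhere": grab the screen,
# JPEG-encode it, and serve it as an MJPEG (multipart/x-mixed-replace) stream that
# browsers and plain <img> tags understand. Assumes the third-party `mss` and `Pillow`
# packages; the port, quality, and 1 s interval are arbitrary placeholders.
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from io import BytesIO

import mss
from PIL import Image

def grab_jpeg(quality: int = 70) -> bytes:
    # A fresh mss instance per call keeps this safe to use from multiple handler threads.
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])                      # primary monitor
        img = Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        return buf.getvalue()

class MJPEGHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        boundary = "frame"
        self.send_response(200)
        self.send_header("Content-Type",
                         f"multipart/x-mixed-replace; boundary={boundary}")
        self.end_headers()
        try:
            while True:
                frame = grab_jpeg()
                self.wfile.write(
                    f"--{boundary}\r\nContent-Type: image/jpeg\r\n"
                    f"Content-Length: {len(frame)}\r\n\r\n".encode()
                )
                self.wfile.write(frame + b"\r\n")
                time.sleep(1.0)                               # one screenshot per second
        except (BrokenPipeError, ConnectionResetError):
            pass                                              # viewer went away

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), MJPEGHandler).serve_forever()
```

Pointing a browser at the server renders each new JPEG as it arrives; because it is ordinary HTTP it sidesteps the WebSocket issues discussed earlier, though the very long-lived request can still upset the stricter corporate proxies mentioned above.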
I'm not sure if they are making a joke about JPEG screenshots.
So the only plausible thing to do was to pre-build HTML pages for the content pages and let Angular's JS take its time to load (for UX functionality). It looked like the page flickered when the JS loaded for the first time, but we solved the search engine problem.
I started the first ISP in my area. We had two T1s to Miami. When HD audio and the rudiments of video started to increase in popularity, I'd always tell our modem customers, "A few minutes of video is a lifetime of email. Remember how exciting email was?"
For a fast start of the video, reverse the implementation: instead of downgrading from WebSockets to polling when the connection fails, you should upgrade from polling to WebSockets when the network allows.
Socket.io was one of the first libraries that did that switching, and it got it wrong at first, too. They learned about enterprise network behaviour and switched the implementation.
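A hedged sketch of that upgrade direction, from the client's point of view: poll over plain HTTP immediately so the first frame arrives fast, attempt a WebSocket connection in the background, and switch transports only once the handshake succeeds. It assumes the third-party websockets package; the URLs and the 3-second upgrade timeout are placeholders, and this illustrates the pattern rather than how Socket.io implements it.

```python
# A minimal sketch of "start with polling, upgrade to WebSockets when the network allows".
# Assumes the third-party `websockets` package; URLs and timeouts are placeholders.
import asyncio
import urllib.request

import websockets

POLL_URL = "https://example.com/frames/latest"   # placeholder: cheap HTTP poll endpoint
WS_URI = "wss://example.com/frames/stream"       # placeholder: push endpoint after upgrade

def poll_once() -> bytes:
    """One plain HTTP poll; this is what makes the first frame fast."""
    with urllib.request.urlopen(POLL_URL, timeout=5) as resp:
        return resp.read()

def handle_frame(frame) -> None:
    print(f"got {len(frame)} bytes")             # stand-in for actually rendering the frame

async def try_upgrade():
    # Raises if the WebSocket handshake doesn't complete within open_timeout seconds.
    return await websockets.connect(WS_URI, open_timeout=3)

async def run_client() -> None:
    upgrade = asyncio.create_task(try_upgrade())
    # Keep polling while the upgrade attempt is in flight.
    while not upgrade.done():
        handle_frame(await asyncio.to_thread(poll_once))
        await asyncio.sleep(1.0)
    if upgrade.exception() is not None:
        while True:                               # upgrade failed: stay on polling
            handle_frame(await asyncio.to_thread(poll_once))
            await asyncio.sleep(1.0)
    ws = upgrade.result()
    try:
        async for message in ws:                  # upgraded: the server now pushes frames
            handle_frame(message)
    finally:
        await ws.close()

if __name__ == "__main__":
    asyncio.run(run_client())
```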
156 more comments available on Hacker News