Chaining Ffmpeg with a Browser Agent
100x.bot · Tech story · Posted about 2 months ago · Active about 2 months ago
Key topics
- FFmpeg
- Video Editing
- AI-Assisted Tools
The article discusses chaining FFmpeg with a browser agent to simplify video editing, but the discussion reveals mixed opinions on the usefulness and complexity of FFmpeg.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 34m after posting
- Peak period: 22 comments in 0-2h
- Average per period: 5.6 comments
- Comment distribution: 56 data points (based on 56 loaded comments)
Key moments
1. Story posted: Nov 4, 2025 at 7:52 AM EST (about 2 months ago)
2. First comment: Nov 4, 2025 at 8:27 AM EST (34m after posting)
3. Peak activity: 22 comments in 0-2h (hottest window of the conversation)
4. Latest activity: Nov 5, 2025 at 7:14 AM EST (about 2 months ago)
ID: 45810430 · Type: story · Last synced: 11/20/2025, 1:54:04 PM
The article claims the task "Takes about ~20-30 mins. The cognitive load is high....", while its own literal step of Googling "ffmpeg combine static image and audio" gives you the exact command you need from a known source (superuser.com, sourced from the ffmpeg wiki).
Anyone even slightly familiar with ffmpeg should be able to produce the same result in minutes, and for someone who doesn't understand what ffmpeg is, the article means absolutely nothing. How does a "no coder" understand what an "agent in a sandboxed container" is?
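For reference, that search turns up essentially this one-liner (a sketch based on the well-known superuser/ffmpeg-wiki answer; image.png and audio.mp3 are placeholder filenames):

```
# Loop a still image for the duration of the audio, stop at the shorter stream
ffmpeg -loop 1 -i image.png -i audio.mp3 \
  -c:v libx264 -tune stillimage -c:a aac -b:a 192k \
  -pix_fmt yuv420p -shortest out.mp4
```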
We have our designer/intern in mind, who creates shorts, adds subtitles, crops them, and merges the generated audio. He is aware of ffmpeg and prefers using a SaaS UI on top of it.
However, we see him hanging out on ChatGPT or Gemini all the time. He is literally the no-coder we have in mind.
We just combined his type-what-you-want and ffmpeg workflows.
He does use DaVinci Resolve, but only for 2.
NLEs make ffmpeg a standalone yet easy-to-use tool.
Not denying that the major heavy lifting is done by the NLE. We go a step further and make it embeddable in a larger workflow.
This hasn't solved the problem of sometimes needing to do new things, but it at least gives me a map to remind me of the parts of the rabbit hole I've explored before.
If ffmpeg pipelines were spelled out like `gst-launch-1.0 filesrc ! qtdemux ! matroskamux ! filesink...`, maybe people would be less frustrated?
People would also learn a little more and be less frustrated when conversations about containers/codecs/colorspaces etc. come up. Each has a dedicated element, and you can better understand its I/O.
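As a concrete sketch of that style (assuming an H.264 video track in an MP4, remuxed into Matroska; filenames are placeholders):

```
# Demux the MP4, parse the H.264 stream, and remux it into an MKV container
gst-launch-1.0 filesrc location=in.mp4 ! qtdemux ! h264parse \
  ! matroskamux ! filesink location=out.mkv
```

Every stage is a named element with its own pads, which is exactly the property the comment is pointing at.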
Here, making ffmpeg "just another capability" allows it to be stitched together into larger workflows.
https://ffmpeg.org/ffmpeg-filters.html#toc-Filtergraph-synta...
"A special syntax implemented in the ffmpeg CLI tool allows loading option values from files. This is done be prepending a slash ’/’ to the option name, then the supplied value is interpreted as a path from which the actual value is loaded."
For how critical that was to getting over my ffmpeg hump, I wish it was not buried halfway through the documentation, but also, I don't know where else it would go.
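As a minimal sketch of that syntax (assuming an ffmpeg build recent enough to support it, and a hypothetical graph.txt holding the filtergraph text):

```
# The leading slash tells ffmpeg to read the -filter_complex value from graph.txt
ffmpeg -i in.mp4 -/filter_complex graph.txt out.mp4
```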
And just because I am very proud of my accomplishment, here is the ffmpeg side of my project: motion detection using mainly ffmpeg. There is some Python glue logic to watch stdout for the events, but all the tricky bits are internal to ffmpeg.
The filter (comments are added for audience understanding) and the ffmpeg invocation:
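The commenter's actual filtergraph isn't reproduced in this snapshot; as a loudly hypothetical sketch of one common way to do this with ffmpeg alone (flagging frames whose scene-change score crosses a threshold and printing the metadata to stdout for glue code to parse; the threshold is an assumption):

```
# Hypothetical reconstruction, not the commenter's filter:
# keep only frames with a scene score above 0.02, print their
# lavfi.scene_score metadata to stdout, and discard the video output
ffmpeg -i cam.mp4 \
  -vf "select='gt(scene,0.02)',metadata=print:file=-" \
  -f null -
```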
The Before vs. after section doesn't even seem to create the same thing: the before has no speedup, the after does.
In the end it seems they basically created a few services ("recipes") that they can reuse to do simple stuff like a 2x speed-up or combining audio and video.
Or you could go one step further and create a special workflow that lets you define some inputs and iterate with an LLM until the user gets what they want; but for this you would need to generate outputs and have the user validate what the LLM has created before finally saving the recipe.
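For scale, the 2x speed-up "recipe" is a one-liner once someone has discovered it (a sketch; filenames are placeholders):

```
# Halve video timestamps (2x speed) and double the audio tempo to match
ffmpeg -i in.mp4 -vf "setpts=PTS/2" -af "atempo=2" out.mp4
```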
FFmpeg has complex syntax because it’s dealing with the _complexity of video_. I agree with everyone about knowing (and helping create or contribute to) our tools.
Today I largely forget about the _legacy_ of video, the technical challenges, and how critical it was to get it right.
There are an incredible number of output formats and considerations for _current_ screens (desktop, tablet, mobile, tv, etc…). Then we have a whole other world on the creation side for capture, edit, live broadcast…
On legacy formats it used to be so complex, with standards, requirements, and evolving formats. Today we don't even think about why 29.97fps is still around, or interlacing.
We have a mix of so many incredible (and sometimes frustrating) codecs, needs and final outputs, so it’s really amazing the power we have with a tool like FFmpeg… It’s daunting but really well thought out.
So just a big thanks to the FFmpeg team for all their incredible work over the years…
It's complexity paired with bad design, making the situation worse than it could be.
It's dealing with 3D data (more if you count audio or other tracks) and multi-dimensional transforms from a command line.
It works 99% of the time for my use case.
This is a nice resource: https://amiaopensource.github.io/ffmprovisr/
And also I've written this cheatsheet, which is designed to be used alongside an LLM: https://github.com/rendi-api/ffmpeg-cheatsheet
Let me know if you're interested in more resources
Can't promise it'll be soon but I may be able to expand on a couple of your repo's "possible future topics list" items.
I've been working on a personal project involving doing object detection on multiple camera feed inputs that all have different resolutions, frame rates, and encodings and sending a single consolidated and annotated feed to a remote streaming service.
That sent me down a really interesting rabbit hole and I've got tons of notes and links along with some Gemini chats that I'm gonna go through and see if there's anything there that might be worth including.
https://youtu.be/9kaIXkImCAM
- For one-offs, you would just use a GUI.
- For regular edits where you want creative control, you would use an NLE GUI.
- For regular edits where you want consistency, you would have a limited GUI without access to ffmpeg options.
CLI/prompt-based editing for a visual medium is how a programmer might approach editing, but no creative…
-filter_complex_script is a thing
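Indeed; it's the longstanding way to load a filtergraph from a file, predating the slash syntax quoted above (a sketch; graph.txt is a placeholder):

```
# Read the complex filtergraph from graph.txt instead of the command line
ffmpeg -i in.mp4 -filter_complex_script graph.txt out.mp4
```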
I can ask it to orient people the right way, crop to the important parts, etc. and it will figure out what "the right way", "the important parts", etc. are. Sometimes I have to give it some light hints like "extract n frames from before y to figure out things", but most of the time it just does it.
Claude Code acts like a very general-purpose agent for me. About the only thing I still have to do manually, and am annoyed by, is editing 360 videos into a flow. I'd like to be able to tell Claude Code to "follow my daughter as I dunk her in the pool" and such, but I have to do that myself in the GoPro editor.
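For the "extract n frames" kind of hint above, the command an agent would reach for is short (a sketch; the timestamp, frame count, and filenames are assumptions):

```
# Grab 5 frames starting just before the 1:00 mark (hypothetical values)
ffmpeg -ss 00:00:58 -i in.mp4 -frames:v 5 frame_%02d.png
```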