Ez FFmpeg
npmjs.com
> Write an ffmpeg command that implements the "bounce" effect: play from 0:00 to 0:03, then backwards from 0:03 to 0:00, then repeat 5 times.
ffmpeg -i input.mp4 \
-filter_complex "
[0:v]trim=0:3,setpts=PTS-STARTPTS[f];
[f]reverse[r];
[f][r]concat=n=2:v=1:a=0[b];
[b]loop=loop=4:size=150:start=0
" \
output.mp4
Maybe this should be an AI reasoning test.
Here is what eventually worked, iirc (10 bounces):
ffmpeg -i input.mkv -filter_complex "split=2[fwd][rev_in]; [rev_in]reverse[rev]; [fwd][rev]concat=n=2,split=10[p1][p2][p3][p4][p5][p6][p7][p8][p9][p10]; [p1][p2][p3][p4][p5][p6][p7][p8][p9][p10]concat=n=10[outv]" -map "[outv]" -an output.mkv
Just a few days ago I used ImageMagick for the first time in at least three years. I downloaded it just to find that I already had it installed.
You can’t verify LLM’s output. And thus, any form of trust is faith, not rational logic.
With an LLM’s output, it is short enough that I can* put in the effort to make sure it's not obviously malicious. Then I save the output as an artefact.
* and I do put in this effort, unless I'm deliberately experimenting with vibe coding to see what the SOTA is.
In the case of npm and the like, I don't trust them because they are actually using insecure procedures, which is proven to be so. And the vectors of attack are well known. But I do trust Debian and the binaries they provide, as the risks are the Debian infrastructure being compromised, malicious code in the original source, and cryptographic failures. All three are possible, but there's more risk of bodily harm to myself than of them happening.
Well, you can verify an LLM's output all sorts of ways.
But even if you couldn't, it's still very rational to be judicious with how you use your time and attention. If I spent a few hours going through the ffmpeg documentation I could probably learn it better than ChatGPT. But it's a judgement call whether it's better to spend 5 minutes getting ChatGPT to generate an ffmpeg command (with some error rate) or spend 2 hours doing it myself (with maybe a lower error rate).
Which is a better use of my time depends on lots of factors. How much I care. How important it is. How often that knowledge will be useful in the future. And so on. If I worked in a Hollywood production studio, I'd probably spend the 2 hours (and many more). But if I just reach for ffmpeg once a year, the small error rate from ChatGPT's invocations might be fine.
Your time and attention are incredibly limited resources. It's very rational to spend them sparingly.
It isn’t fair to say “since I don’t read the source of the libraries I install that are written by humans, I don’t need to read the output of an llm; it’s a higher level of abstraction” for two reasons:
1. Most libraries worth using have already been proven by being used in actual projects. If you can see that a project has lots of bug fixes, you know it's better than raw code. Most bugs don't show up unless code gets put through its paces.
2. Actual humans have actual problems that they’re willing to solve to a high degree of fidelity. This is essentially saying that humans have both a massive context window and an even more massive ability to prioritize important things that are implicit. LLMs can’t prioritize like humans because they don’t have experiences.
And, realistically, compute and power is cheap for getting help with one-off CLI commands.
The problem is someone decided that language and the contents of Wikipedia were all something needs to be intelligent haha
It is almost like there is hardwiring in our brains that makes us instinctively correlate language generation with intelligence and people cannot separate the two.
It would be like if the first calculators ever produced, instead of responding with 8 to the input 4 + 4 =, printed out "Great question! The answer to your question is 7.98", and that resulted in a slew of people proclaiming the arrival of AGI (or, more seriously, the ELIZA effect is a thing).
Is there an easier way?
To re-encode the content into H.264+AAC, rather than simply "muxing" the encoded bitstreams from the MP4 container into a new MOV container.
ffmpeg -i input.mp4 -filter_complex "fps=15,scale=640:-2:flags=lanczos,split[a][b];[a]palettegen=reserve_transparent=off[p];[b][p]paletteuse=dither=sierra2_4a" -loop 0 output.gif
See also: this blog post from 10 years ago [1]
[1] https://blog.pkh.me/p/21-high-quality-gif-with-ffmpeg.html
But one case I see often: If you’re making a website with an animated gif that’s actually a .gif file, try it as an mp4 - smaller, smoother, proper colors, can still autoplay fine.
Which format is the default if no argument is given?
Or more complicated contextual knowledge - if you cut 1sec of a video file, does fish autocomplete to tell you whether the video is reencoded or cut (otherwise) losslessly
Also, what does fish complete to on Windows?
I know what I want to do, I don't know how it's being done, but there's a wealth of information that is very accessible. So I just read it.
It's very easy to type `apropos ffmpeg`. And even if you typed `man ffmpeg`, if you go to the end, you will find related manual names for more information. And you can always use the pager's search facility (`less` in most cases) for a quick lookup.
I believe that a lot of frustration comes from people unwilling to learn the conceptual basis of the tools they are using.
> It's very easy to type `apropos ffmpeg`
No it's not. First, that's not a Windows command, so right off the bat you've cut off the largest OS. Second, your command is naively empty, and it's telling that you've given it instead of an actual search query - because you wouldn't be able to come up with a great one right away that would put the correct result at the top, while that correct result is "hardcoded" in the field type in the UI. So yeah, go on, find that perfect query, and then explain why you think every single user should be able to do the same quickly. Then you can think about how justified your other beliefs are about basic workflow issues you don't understand.
Then any solution is broken in this way. Even my bluetooth speaker comes with a manual. Not reading it and saying the speaker is broken because you can't figure out how to connect is pure delusion. Same as not reading the ffmpeg manual and expecting to know how to use it.
> First, that's not a Windows command, so right off the bat you've cut off the largest OS.
ffmpeg on Windows is so far off the beaten path that it may as well be in Mordor. I would gladly bet that someone who knows how to run ffmpeg on Windows also knows how to find the documentation for it.
> So yeah, go on, find that perfect query
Why would I find the perfect query? Do you go into the library and find the correct line of the correct book in one go? Or do you consult the list of books for a theme, select a few candidates, consult their indexes, and then read the pages?
Then all that is left to do is note down the reference in case you need to consult the book again (no need to remember everything).
Nope, you're just doing the same thing - purposefully ignoring the issue to make your non-solution comparable...
> Even my bluetooth speaker comes with a manual.
... in this case - the length and scope of the manual. First, you can operate the speaker without the manual or with just a single read of the manual - so spend a few seconds to learn how to pair (though you might not even need that, as "hold to pair" might be something you remember from other devices); then the power/volume buttons require no manual because you've operated such buttons your whole life.
> Same as not reading ffmpeg manual
Of course it's not the same: the ffmpeg manual isn't a tiny page of 5 items, and no other apps will help you learn the peculiarities of ffmpeg. Also, the whole point of an intuitive UI with "typed info" is that you don't need to read that huge manual to do the basics, as you can simply follow the structure laid out by someone more knowledgeable.
> ffmpeg on Window is so far the beaten path that it may as well be in Mordor. I would gladly bet that someone that knows how to run ffmpeg on windows also knows how to find the documentation for it.
Who would take that irrelevant bet? The issue isn't in finding the manual!
> Why would I find the perfect query?
To prove that your solution works. I know it doesn't and challenge you to prove otherwise. Your suggestion is worse than asking users to Google, because at least there users will likely get the correct top result in a few tries for common needs
> Do you go in the library and then find the correct line of the correct book in one go?
No, I open an app and pick the correct format from the drop-down menu correctly in one go
> Or do you consult the list of books of books for a theme, select a few candidates, consult their index, and then read the pages?
Oh man, even in your fantasies you can't come up with a good workflow! No wonder you're fine suggesting everyone wastes a lot of time aproposing empty queries
It's the same with video viewers or music players. Often the default app of the OS is enough, and they are very intuitive. But sometimes you need a bit more control, and that's when using something like vlc or mpv, with their extensive filter capabilities (which require having the docs at hand), becomes mandatory.
The ffmpeg interface is OK for what it does. Any of your suggestions would be complex to implement if they aim to support the whole feature set of ffmpeg.
lol
Quite telling that these tools need to exist to make ffmpeg actually usable by humans (including very experienced developers).
But yea ffmpeg is awesome software, one of the great oss projects imo. working with video is hellish and it makes it possible.
If one has fewer such commands, it's as simple as bash aliases - just add them to ~/.bashrc:
alias convertmkvtomp4='ffmpeg command'
Then just run it anytime with that alias. I use ffmpeg a lot, so I have my own dedicated CLI snippet tool to quickly build out complex pipelines in an easier language.
The best part is I have a --dry-run flag that exposes the flow and the explicit commands being used at each step, plus verbose output, if I need details on what's happening.
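The alias-plus-dry-run idea can be sketched as a small shell function (the name and the mkv-to-mp4 case here are illustrative): with --dry-run it prints the command instead of running it, and -c copy makes the conversion a remux rather than a re-encode, assuming the streams are MP4-compatible (e.g. H.264/AAC).

```shell
# Illustrative sketch: a function instead of a bare alias, so it can
# derive the output name and support a --dry-run flag.
mkv2mp4() {
  # movie.mkv -> movie.mp4; -c copy copies streams without re-encoding
  cmd="ffmpeg -i \"$1\" -c copy \"${1%.mkv}.mp4\""
  if [ "$2" = "--dry-run" ]; then
    echo "$cmd"      # show what would run
  else
    eval "$cmd"
  fi
}
```

Dropped into ~/.bashrc, `mkv2mp4 movie.mkv --dry-run` prints the command for inspection before anything actually runs.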
It's incredible what lengths people go to to avoid memorizing basic ffmpeg usage. It's really not that hard, and the (F.) manual explains the basic concepts fairly well.
Now, granted, ffmpeg's defaults (reencoding by default and only keeping one stream of each type unless otherwise specified) aren't great, which can create some footguns, but as long as you remember to pass `-c copy` by default you should be fine.
Also, hiding those footguns is likely to create more harm than it fixes. Case in point: "ff convert video.mkv to mp4" (an extremely common usecase) maps to `ffmpeg -i video.mkv -y video.mp4` here, which does a full reencode (losing quality and wasting time) for what can usually just be a simple remux.
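For contrast, here is the remux the comment has in mind next to the wrapper's re-encode, held in variables rather than executed (a sketch; -c copy is only valid when the source codecs, e.g. H.264/AAC, are allowed in MP4):

```shell
# What the wrapper runs: full decode + re-encode (slow, lossy)
reencode='ffmpeg -i video.mkv -y video.mp4'
# What usually suffices: copy the existing bitstreams into the new container
remux='ffmpeg -i video.mkv -c copy video.mp4'
echo "$remux"
```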
Similarly, "ffmpeg extract audio from video.mp4" will unconditionally reencode the audio to mp3, again losing quality. The quality settings are also hardcoded and hidden from the user.
I can sympathize with ffmpeg syntax looking complicated at first glance, but the main reason for this is just that multimedia is really complicated and that some of this complexity is necessary in order to not make stupid mistakes that lose quality or waste CPU resources.
I truly believe that these ffmpeg wrappers that try to make it seem overly simple (at least when it's this simple, i.e. not even exposing quality settings or differentiating between reencoding and remuxing) are more hurtful than helpful. Not only can they give worse results, but by hiding this complexity from users they also give users the wrong ideas about how multimedia works. "Abstractions" like this are exactly how beliefs like "resolution and quality are the same thing" come to be. I believe the way to go should be educating users about video formats and proper ffmpeg usage (e.g. with good cheat sheets), not by hiding complexity that really should not be hidden.
Personally I think it’s great that it’s such a universally useful tool that it has been deployed in so many different variations.
> some folks want to use lossless cut
In that case I would encourage you to ruminate on what the following in the post you're replying to means and what the implications are:
> "ff convert video.mkv to mp4" (an extremely common usecase) maps to `ffmpeg -i video.mkv -y video.mp4` here, which does a full reencode (losing quality and wasting time) for what can usually just be a simple remux
Depending on the size of the video, the time it would take you to "do the job swiftly" (i.e. not caring about how the tools you are using actually work) might be more than just reading the ffmpeg manual, or at the very least searching for some command examples.
You may have misunderstood the comment: "lossless cut" is the name of an ffmpeg GUI front end. They're not discussing which exact command line gives lossless results.
>It's incredible what lengths people go to to avoid memorizing basic ffmpeg usage. It's really not that hard, and the (F.) manual explains the basic concepts fairly well.
Not really sure how else I was supposed to interpret your comment but clarification taken.
> But I argue in my comment above that this specific tool does not have better QoL
For some folks it may be better/more intuitive. It doesn’t hurt anybody by existing.
We all compromise with different tools in our lives in different ways. It just reads to me like an odd axe to grind.
Yes, that was a bit facetious of me, I apologize for that.
> What is so bad about the existence of this project?
Being very blunt: The fact that it reinforces the extremely common misconception that a) converting between containers like mkv and mp4 will always require reencoding and that b) there is a single way to reencode a video (hence suggesting that there is no "bad" way to reencode a video), seeing as next to no encoding settings are exposed.
I personally use lossless cut more than ffmpeg in the terminal just because I don’t have to really think about it and it can do most of what I need, which is simply removing or attaching things together without re-encoding. I use it maybe once every month or two, because it’s just not something I need to use a ton, so it doesn’t make sense for me to get down and dirty with the original. Ultimately I get what I need and I’m happy!
There isn't internal consistency to really hold on to ... it's just a bunch of seemingly independent options
Sure, I agree with all of this. Like I said above, the syntax (and, even more, the defaults) isn't great. I'm just arguing that "improving the syntax" should not mean "hiding complexity that should not be hidden", as the linked project does. An alternative ffmpeg frontend (i.e. a new CLI frontend using the libav* libraries like ffmpeg is, not a wrapper for the ffmpeg CLI program) with better syntax and defaults but otherwise similar capabilities would be a very interesting project.
(The answer to your question is that both -vcodec and -c:v are valid, but I imagine that's not the point.)
> The biggest problem is open source teams really don't get people on board that focus on customer and product the way commercial software does.
I believe in this case it may be more of a case of backwards compatibility, with options being added incrementally over time to add what was needed at the moment. Though that's just my guess.
I’m going to guess your job does not involve much UX design?
I've learned not to say this. Different things are easy/hard for each of us.
Reminds me of a discussion where someone argued, "why don't all the poor/homeless people just go get good jobs?"
Edit: I know your comment was meant to inspire/motivate us to try harder. Maybe it's easier than it appears.
I've met plenty of engineers who would rather spend 2 weeks programming than spend 5 minutes talking to their users. I used to struggle a lot with this myself when I was younger. Social anxiety isn't easy to overcome.
Now, I can simply ask any LLM to write the command, and understand any following issues or questions.
For example, my OS records videos as WEBM. Using the default settings for transforming to MP4 usually fails from a resolution ratio issue. I would be deadlocked using this library.
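If the failure is the usual one (an assumption on my part: libx264's default yuv420p output requires even width and height, and screen recordings often have odd dimensions), a scale filter that rounds both dimensions down to even numbers fixes it. Sketched as a tiny dry-run helper with illustrative names:

```shell
# Print (rather than run) the conversion command, with a scale filter
# that forces even dimensions for H.264 encoding.
webm2mp4() {
  echo ffmpeg -i "$1" -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" "${1%.webm}.mp4"
}
```

Note that a real invocation would need the filter expression quoted on the shell command line, since it contains `(` and `*`.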
It really isn't that hard anymore.
If you are doing it often, that's true. But for people like me who do it once every month or two, it really is hard to memorize, especially if it's not exactly the same task.
What I would love would be an interactive script that asked me what I was trying to do and constructed a command line for me while explaining what it would do and the meaning of each argument. And of course it should favour commands that do not re-encode where possible.
Start the tool, and just list all of the options in order of usage popularity, with a brief explanation, and a field to paste in arguments like filenames or values. If an option is commonly used with another (or requires it), provide those hints (or automatically add the necessary values). If a value itself has structure (e.g. is itself a shell command), drill down recursively. Ensure that quotes and spaces and special characters always get escaped correctly.
In other words, a general-purpose command-line builder. And while we're at it, be able to save particular "templates" for fast re-use, identifying which values should be editable in the future.
I can't be the first person to think of this, but I've never come across anything like it and don't understand why not. It doesn't require AI or anything.
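One piece of that builder is genuinely mechanical and worth sketching: the "quotes and spaces and special characters always get escaped correctly" step. A minimal POSIX-shell quoting helper (the name is made up):

```shell
# Wrap a value in single quotes, escaping any embedded single quotes,
# so it can be pasted safely into a generated command line.
shquote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}
```

A builder would run every user-supplied filename or value through something like this before splicing it into the template.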
Or if no telemetry but based on local usage, it would promote/reinforce the options you already can recall and do use, hiding the ones you can’t/don’t?
But also, you could probably be just as accurate by asking an LLM to order the options by popularity based on their best guess based on all the tutorials they've trained on.
Or just scrape Stack Overflow for every instance of a command-line invocation for each tool and count how many times each option is used.
Ranking options by usage is the least complicated part of this, I think. (And it only matters for the popular options anyways -- below a certain threshold they can just be alphabetical.)
> Or just scrape Stack Overflow for every instance of a command-line invocation for each tool and count how many times each option is used.
Even trusting the developer's intuition is better than nothing, at least if you make sure the developer is prompted to think about it. (For major projects, devs might also be aware that certain features are associated with a large fraction of issue reports, for example.)
My ChatGPT history is full of conversations like this.
I have mixed feelings about using chatgpt to write code. But LLMs certainly make an excellent ffmpeg frontend. And you can even ask them to explain all the ffmpeg arguments they used and why they used them.
I find myself bothering exactly zero times to memorise this obnoxiously long command line. Claude fills in, and I can explore features better. What’s not to like? That I’m getting dumber for not memorising pages of cli args?
Love the project, but as with every Swiss Army knife, this conversation is a thing and relevant. We had a similar one regarding jq syntax, and I'm truly convinced jq is a wonderful and useful tool. But I'm not gonna bother learning more DSLs…
- making sure pixels are square
("scale=w=if(gt(iw*sar\\,ih)\\,min(ceil(iw*sar/2)*2\\,{})\\,ceil(iw*sar*min(ih\\,{})/ih/2)*2):h=if(gt(ih\\,iw*sar)\\,min(ceil(ih/2)*2\\,{})\\,ceil(ih*min(iw*sar\\,{})/iw/sar/2)*2):out_range=limited,zscale,setsar=1")
- dealing with some HDR or high gamut thing I can't really remember that can result from screen recording on macos using some method I was using at some point
- setting this one tag on hevc files that macos needs for them to be recognised as hevc
- calculating the target bitrate if I need a specific filesize and verifying the encode actually hit that size and retrying if not (doesn't always work first time with certain hardware encoders)
- dealing with 2-pass encoding which is fiddly and requires two separate commands and the parameters are codec specific
- correctly activating hardware encoding for various codecs
- etc
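The filesize/bitrate arithmetic from the list above can at least be written down as shell arithmetic (all numbers and names here are illustrative): budget the audio, convert the target size to kilobits, divide by the duration, and feed the result into the codec-specific two-pass run.

```shell
# Target video bitrate (kbit/s) for a desired file size.
target_mib=25      # desired output size in MiB
duration_s=120     # clip length in seconds
audio_kbps=128     # audio bitrate budget
# MiB -> kbit (x 8192), spread over the duration, minus the audio share
video_kbps=$(( target_mib * 8192 / duration_s - audio_kbps ))

# The rate then drives the two codec-specific passes, e.g. for x264:
#   ffmpeg -y -i in.mp4 -c:v libx264 -b:v ${video_kbps}k -pass 1 -an -f null /dev/null
#   ffmpeg -i in.mp4 -c:v libx264 -b:v ${video_kbps}k -pass 2 -c:a aac -b:a ${audio_kbps}k out.mp4
```

As the comment says, hardware encoders don't always hit the target, so a real script would check the resulting file size and retry with a lower rate if needed.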
But my issue with the linked tool is that it does none of the things you mentioned. All it does is make already very easy things even easier. Is it really that much harder to remember `ffmpeg -i inputfile outputfile.ext` than `ff convert inputfile to ext`?
I've explained this in other replies here but I am neither saying that ffmpeg wrappers are automatically bad, nor that ffmpeg cannot be complicated. I am only saying that this specific tool does not really help much.
I mean, you saw the code above? It looks like gibberish and regex had a child. Many things in computing are complicated, but they don't have to look like that code. I make my living in media-related programming, and the code above is messy and extremely hard to read.
/usr/bin/ffmpeg -i "/path/to/musicfile.mp3" -i "/path/to/covertune.mp3" \
-filter_complex "[1:a]volume=1[track1];[0:a][track1]amix=normalize=false[output]" \
-map "[output]" -b:a 192k -metadata title=15:17:01 -metadata "artist=Me, 2025" \
-metadata album=2025-12-23 "/path/to/file.mix.mp3"
chance of my coming up with that without deep poring over docs and tons of trial and error, or using claude (which is pretty much what I do nowadays): zero
In my opinion there are two kinds of users: 1. Users who use FFmpeg regularly enough to know/understand the parameters. 2. Users who only use FFmpeg once in a while to do something specific.
This wrapper is superfluous for users in group number 1. But group number 2 does not really get much out of it either, for the reasons you've mentioned.
As a member of group 2, I usually want to do something very specific (e.g. remove an audio track, convert only the video, remux to a different container, etc.). A simple English wrapper does not help me here because it is not powerful enough; the defaults are usually not what I want. What I need is a tool that will take a more detailed English statement of what I want to achieve and spit out the FFmpeg command with explanations for what each parameter does and how it achieves my goal. We have this today: AI; and it mostly works (once you've gone through several iterations of it hallucinating options that do not exist...).
Then, several days later, I crawl away from fighting robots in a rabbit hole, and finally get around to doing what I set out to do in the first place....
As an occasional user this was a lot easier to use than having to remember all of the commands, and it did it all without hiding the complexity from the user.
Unfortunately it looks like they tried to monetize it but then later shut down. It doesn't look like they posted the source code anywhere.
https://web.archive.org/web/20230131140736/https://ffmpeg.gu...
Kills me that they didn't even bother open sourcing it.
It's not hard - just not a good use of our time. For 99% of HN users, ffmpeg is not a vital tool.
I have to use it less than twice a year. Now I just go and get an LLM to tell me the command I need.
And BTW, I spend a lot of time memorizing things (using spaced repetition). So I'm not averse to memorizing. ffmpeg simply doesn't warrant a place in my head.
Not that hard for you maybe. These things are not universal. You might wish to reconsider your basic assumption that everyone is too lazy to do this easy thing.
It is only a couple of thousand options[0], just memorize them! It's super simple, barely an inconvenience!
[0]https://gist.github.com/tayvano/6e2d456a9897f55025e25035478a...
I will admit that I still do need to occasionally look up specific stuff, but for the most part I can do most of the common cases from memory.
For example, this one is also an ffmpeg wrapper, https://lorem.video, built for devs and QAs who just need a quick placeholder video without diving into ffmpeg syntax. It's optimized for that narrow use case: generating a test video by typing a URL.
Nothing wrong with learning ffmpeg properly if you use it regularly, but purpose built tools have their place too.
I'm usually the one telling everyone else that various Python packaging ecosystem concepts (and possibly some other things) are "really not that hard". Many FFMpeg command lines I've encountered come across to me like examples of their own, esoteric programming language.
> Case in point: "ff convert video.mkv to mp4" (an extremely common usecase) maps to `ffmpeg -i video.mkv -y video.mp4` here, which does a full reencode (losing quality and wasting time) for what can usually just be a simple remux.... Similarly, "ffmpeg extract audio from video.mp4" will unconditionally reencode the audio to mp3, again losing quality.
That sounds like a bug report / feature request rather than a problem with the approach.
> The quality settings are also hardcoded and hidden from the user.
This is intended so that users don't have to understand what quality settings are available and choose a sensible default.
> and that some of this complexity is necessary in order to not make stupid mistakes
For example, the case of avoiding re-encodes to switch between container formats could be handled by just maintaining a mapping.
In fact, I've felt the lack of that mapping recently when I wanted to extract audio from some videos and apply a thumbnail to them, because different audio formats have different rules for how that works (or you might be forced to use some particular container format, and have to research which one is appropriate).
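That mapping could start as something as small as a lookup from codec/container pair to "safe to -c copy or not". A toy sketch (the table is illustrative and nowhere near complete):

```shell
# Decide whether a stream can be copied into a target container
# without re-encoding. Real ffmpeg knows this; a wrapper could too.
can_remux() {  # usage: can_remux <codec> <container>
  case "$1:$2" in
    h264:mp4|h264:mkv|h264:mov|hevc:mp4|hevc:mkv|aac:mp4|aac:mkv|vp9:webm|vp9:mkv)
      echo yes ;;
    *)
      echo no ;;
  esac
}
```

A wrapper consulting such a table could default to `-c copy` when the pair is safe and fall back to re-encoding otherwise, instead of always re-encoding.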
I'm not sure this is true? It'll use whatever format you specify. If you do "ffmpeg -i input.mp4 output.wav", the resulting format will be WAV, with no loss of quality. If it encodes to MP3, it's because you've told it to extract to an MP3 file.
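For the truly lossless path, the audio stream can be copied rather than decoded at all; assuming the MP4 carries AAC audio, the matching extension would be .m4a. As a dry-run sketch with illustrative names:

```shell
# Print the lossless extraction command: drop video (-vn) and copy
# the audio bitstream unchanged into an MP4 audio container.
extract_audio() {
  echo ffmpeg -i "$1" -vn -c:a copy "${1%.mp4}.m4a"
}
```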
This is a generalized version of a previous ffmpeg wrapper I wrote:
https://github.com/dheera/scripts/blob/master/helpme
There. I've debunked Java, Python, PHP, Perl, and Rust.
(Or maybe, just maybe, tools should make our lives easier.)
Even if the abstractions get leaky, people yearn for goal/workflow-oriented UX.
Using a different package name could be helpful. I searched for ezff docs and found a completely different Python library. Also ez-ffmpeg turns up a Rust lib which looks great if calling from Rust.
Has anyone else been avoiding typing FFmpeg commands by using file:// URLs with yt-dlp?
I like that you took a no-AI approach. I am looking for something like this, i.e. understanding intent and generating commands without using AI, but so far regex-based approaches have proved to be inadequate. I also tried indexing keywords and creating an index of keywords with similar meanings; that improved things a bit, but without something heavy like BERT it's always a subpar experience.
Could you elaborate on this? I see a lot of AI-use and I'm wondering if this is claude speaking or you