FFmpeg 8.0
Key topics
The FFmpeg community is abuzz over the release of version 8.0, with commenters digging into the new features, particularly the integration of Whisper for audio transcription. While some are excited about the potential for real-time subtitle generation, others temper expectations, noting that Whisper's performance can be laggy even on powerful hardware. As commenters riff on the new capabilities, they're also poking fun at the complexity of FFmpeg's command-line incantations and the emerging role of LLMs in simplifying the process. Meanwhile, enthusiasts are already speculating about how quickly LLMs will adapt to the new codecs and options in FFmpeg 8.0.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 12m after posting
- Peak period: 148 comments (Day 1)
- Avg / period: 32
Based on 160 loaded comments
Key moments
- Story posted: Aug 22, 2025 at 11:22 AM EDT (4 months ago)
- First comment: Aug 22, 2025 at 11:34 AM EDT (12m after posting)
- Peak activity: 148 comments in Day 1 (hottest window of the conversation)
- Latest activity: Sep 4, 2025 at 12:06 AM EDT (4 months ago)
Secondly, just curious: any insiders here?
What changed? I see the infrastructure has been upgraded, this seems like a big release, etc. I guess there was a recent influx of contributors? A corporate donation? Something else?
[1]: https://github.com/ggml-org/whisper.cpp
[2]: https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/13ce36fef98a...
https://news.ycombinator.com/item?id=44886647 ("FFmpeg 8.0 adds Whisper support (ffmpeg.org)"—9 days ago, 331 comments)
I think building some processing off of Vulkan 1.3 was the right move. (Aside, I also just noticed yesterday that Asahi Linux on Mac supports that standard as well.)
FFmpeg arguments, the original prompt engineering
One would use gemini-cli (or claude-cli),
- and give a natural language prompt to gemini (or claude) on what processing needs to be done,
- with the correct paths to FFmpeg and the media file,
- and g-cli (or c-cli) would take it from there.
Is this correct?
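A sketch of the exchange being described: the natural-language request as a comment, followed by the kind of command the model might hand back (file names and settings are placeholders):
# "Convert screen-recording.mov to a web-friendly 720p MP4, keep the audio"
$ ffmpeg -i screen-recording.mov -vf scale=-2:720 -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k screen-recording-720p.mp4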
ffmpeg right after
The only options you ever need are tar -x, tar -c (x for extract and c for create). tar -l if you wanna list, l for list.
That's really it, -v for verbose just like every other tool if you wish.
Examples:
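Presumably something along these lines (archive names illustrative):
$ tar -xf backup.tar.gz        # x = extract
$ tar -cf backup.tar mydir/    # c = create
$ tar -xvf backup.tar.gz       # add v for verbose output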
You never need anything else for the 99% case.
Surely you mean -t if you wanna list, t for lisT. l is for check-Links.
And you don't need to uncompress separately. tar will detect the correct compression algorithm and decompress on its own. No need for that gunzip intermediate step.
Whoops, lol.
> on its own
Yes, I'm aware, but that's more options, and unnecessary ones too; just compose tools.
Principle of least surprise and all that.
I don't use tape, so I don't need a tape archive format.
Gzip only compresses a single file, so .tar.gz lets you bundle multiple files. You can do the same thing with zip, of course, but...
Zip compresses individual files separately in the container, ignoring redundancies between files. But .tar.gz (and .tar.zip, though I've rarely seen that combination) bundles the files together and then compresses them, so it can get better compression than .zip alone.
This will create an uncompressed .tar with the wrong name. You need a z option to specify gzip.
tar -caf foo.tar.xz foo
Will be an xz-compressed tarball (the -a flag picks the compression from the archive's file extension).
I wasn't expecting the downvotes for an xkcd reference
fwiw, `tar xzf foobar.tgz` = "_x_tract _z_e _f_iles!" has been burned into my brain. It's "extract the files" spoken in a Dr. Strangelove German accent
Better still, I recently discovered `dtrx` (https://github.com/dtrx-py/dtrx) and it's great if you have the ability to install it on the host. It calls the right commands and also always extracts into a subdir, so no more tar-bombs.
If you want to create a tar, I'm sorry but you're on your own.
You don't need the z, as xf will detect which compression was used, if any.
Creating is no harder, just use c for create instead, and specify z for gzip compression:
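A minimal sketch of that create form, plus the listing form mentioned in the next comment (file names illustrative):
$ tar -czf project.tar.gz project/    # c = create, z = gzip, f = archive name
$ tar -tf project.tar.gz              # t = list the archive's contents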
Same with listing contents, with t for tell.
"also always extracts into a subdir" sounds like a nice feature though, thanks for sharing another alternative!
A common use case is:
See: and here is an example from its Wikipedia page, under the "Operation and archive format" section, under the Copy subsection:Copy
Cpio supports a third type of operation which copies files. It is initiated with the pass-through option flag (p). This mode combines the copy-out and copy-in steps without actually creating any file archive. In this mode, cpio reads path names on standard input like the copy-out operation, but instead of creating an archive, it recreates the directories and files at a different location in the file system, as specified by the path given as a command line argument.
This example copies the directory tree starting at the current directory to another path new-path in the file system, preserving files modification times (flag m), creating directories as needed (d), replacing any existing files unconditionally (u), while producing a progress listing on standard output (v):
$ find . -depth -print | cpio -p -dumv new-path
I also use it very infrequently compared to tar -- mostly in conjunction with swupdate. I've also run into file size limits, but that's not really a function of the command line interface to the tool.
It’s really the dream UI/UX from science fiction movies: “take all images from this folder and crop 100px away except on top, saturate a bit and save them as uncompressed tiffs in this new folder, also assemble them in a video loop, encode for web”.
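A rough sketch of what such a request could translate to, assuming sequentially numbered input images (filter arguments and paths are illustrative, and truly uncompressed TIFF output would need extra encoder options):
# crop 100 px from the left, right and bottom (keep the top), bump saturation, write TIFFs
$ ffmpeg -i photos/%04d.png -vf "crop=iw-200:ih-100:100:0,eq=saturation=1.3" tiffs/%04d.tiff
# assemble the TIFFs into a web-friendly loop
$ ffmpeg -framerate 25 -i tiffs/%04d.tiff -c:v libx264 -crf 18 -pix_fmt yuv420p loop.mp4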
If you don't care enough about potential side effects to read the manual it's fine, but a dream UX it is not because I'd argue that includes correctness.
A prompt to ChatGPT and a command later and all were nicely cropped in a second.
Going from the dread of doing it by hand to having it magically done a minute later is absolutely mind-blowing. Even just 5 years ago, I would have done it manually, since writing the code for this task would definitely have taken longer.
This seemed to be interesting to users of this site. tl;dr they added support for whisper, an OpenAI model for speech-to-text, which should allow autogeneration of captions via ffmpeg
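For reference, a rough sketch of how the new whisper audio filter can be invoked (option names per the 8.0 filter docs; the ggml model path is a placeholder, and ffmpeg needs to be built with --enable-whisper):
$ ffmpeg -i talk.mp4 -vn -af "whisper=model=ggml-base.en.bin:language=en:destination=talk.srt:format=srt" -f null -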
yep, finally the deaf will be able to read what people are saying in a porno!
This could streamline things
1. Just copy them over from the Bluray. This lacks support in most client players, so you'll either need to download a player that does, or use something like Plex/Jellyfin, which will run FFMpeg to transcode and burn the picture subtitles in before sending it to the client (a rough sketch of that burn-in step follows this list).
2. Run OCR on the Bluray subtitles. Not perfect.
3. Steal subtitles from a streaming service release (or multiple) if it exists.
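For option 1, the burn-in that Plex/Jellyfin perform can also be done directly, since Bluray (PGS) subtitles are bitmaps that the overlay filter can composite onto the video; a hedged sketch with placeholder file names and stream indices:
$ ffmpeg -i movie.mkv -filter_complex "[0:v][0:s:0]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -crf 20 -c:a copy movie-burned.mp4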
[0] - https://xkcd.com/2347/
[0] https://link.springer.com/article/10.1007/s11214-020-00765-9
That being said, if you put down a pie chart of media frameworks (especially for transcoding or muxing), ffmpeg would have a significant share of that pie.
Linux doesn't really have a system codec API though so any Linux video software you see (ex. VLC, Handbrake) is almost certainly using ffmpeg under the hood (or its foundation, libavcodec).
It also was originally authored by the same person who did lzexe, tcc, qemu, and the current leader for the large text compression benchmark.
Oh, and for most of the 2010's there was a fork due to interpersonal issues on the team.
This post talks about the situation back then: https://blog.pkh.me/p/13-the-ffmpeg-libav-situation.html
It’s exceedingly good software though, and to be fair I think it’s gotten a fair bit of sponsorship and corporate support.
> Note that these releases are intended for distributors and system integrators. Users that wish to compile from source themselves are strongly encouraged to consider using the development branch
My earlier comment about "SSL" is that the actual library might be OpenSSL, BoringSSL, WolfSSL, GnuTLS, or any one of a number of others. So the number of uses of each one is smaller than the total number of "SSL" uses.
But Qt and libusb above ffmpeg? No way.
Could be an interesting data source to explore that opinion.
https://youtu.be/9kaIXkImCAM?si=b_vzB4o87ArcYNfq
Then it stopped working until I updated youtube-dl and then that stopped working once I lost the incantation :<
It's a great tool. Little long in the tooth these days, but gets the job done.
Past that, I'm on the command line haha
[0] https://handbrake.fr
https://www.shotcut.org/
https://www.mltframework.org/
Handbrake and LosslessCut are great too. But in addition to donating to FFmpeg, I pay for ffWorks because it really does offer a lot of value to me. I don’t think there is anything close to its polish on other platforms, unfortunately.
[1]: https://www.ffworks.net/index.html
If it were priced at 1-5€ I would just buy it, I guess. But not at this price.
Someone else mentioned Lossless-Cut program, which is pretty good. It has a merge feature that has a compatibility checker ability that can detect a few issues. But I find transcoding the separate videos to MPEG-TS before joining them can get around many problems. If you fire up a RAM-Disk, it's a fast task.
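The MPEG-TS step the comment mentions can usually be a stream copy rather than a full transcode; a hedged sketch for H.264/AAC inputs (file names illustrative):
$ ffmpeg -i part1.mp4 -c copy -bsf:v h264_mp4toannexb part1.ts
$ ffmpeg -i part2.mp4 -c copy -bsf:v h264_mp4toannexb part2.ts
$ ffmpeg -i "concat:part1.ts|part2.ts" -c copy -bsf:a aac_adtstoasc joined.mp4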
> Only codecs specifically designed for parallelised decoding can be implemented in such a way, with more mainstream codecs not being planned for support.
It makes sense that most video codecs aren't amenable to compute shader decoding. You need tens of thousands of threads to keep a GPU busy, and you'll struggle to get that much parallelism when you have data dependencies between frames and between tiles in the same frame.
I wonder whether encoders might have more flexibility than decoders. Using compute shaders to encode something like VP9 (https://blogs.gnome.org/rbultje/2016/12/13/overview-of-the-v...) would be an interesting challenge.
When the resulting frame is already in a GPU texture, displaying it has fairly low overhead.
My question is: how wrong am I?
Motion vectors can be large (for example, 256 pixels for VP8), so you wouldn't get much extra parallelism by decoding multiple frames together.
However, even if the worst-case performance is bad, you might see good performance in the average case. For example, you might be able to decode all of a frame's inter blocks in parallel, and that might unlock better parallel processing for intra blocks. It looks like deblocking might be highly parallel. VP9, H.265 and AV1 can optionally split each frame into independently-coded tiles, although I don't know how common that is in practice.
The ProRes bitstream spec was given to SMPTE [1], but I never managed to find any information on ProRes RAW, so it's exciting to see software and compute implementations here. Has this been reverse-engineered by the FFMPEG wizards? At first glance at the code, it does look fairly similar to regular ProRes.
[1] https://pub.smpte.org/doc/rdd36/20220909-pub/rdd36-2022.pdf
I'm curious wrt how a WebGPU implementation would differ from Vulkan. Here's mine if you're interested: https://github.com/averne/FFmpeg/tree/vk-proresdec
Initially this was just a vehicle for me to get stuck in and learn some WebGPU, so no doubt I'm missing lots of opportunities for optimisation - but it's been fun as much as frustrating. I leaned heavily on the SMPTE specification document and the FFMPEG proresdec.c implementation to understand and debug.
This is great news. I remember being laughed at when I initially asked whether the Vulkan enc/dec were generic, because at the time it was all just standardising interfaces for the in-silicon acceleration.
Having these sorts of improvements available for legacy hardware is brilliant, and hopefully a first route that we can use to introduce new codecs and improve everyone's QOL.
The old RV40 had some small advantages over H264. At low bitrates, RV40 always seemed to blur instead of block, so it got used a lot for anime content. CPU-only decoding was also more lightweight than even the most optimized H264 decoder (CoreAVC with the inloop deblocking disabled to save even more CPU).
40 more comments available on Hacker News