Meta Segment Anything Model Audio
Key topics
The Meta Segment Anything Model Audio has sparked a lively debate about the impact of AI on the music industry, with some commenters marveling at the demo's ability to separate music, voice, and background noise. While some worry that AI will render skilled labor obsolete, others point out that technological advancements have always disrupted traditional industries, citing the introduction of synthesizers, music videos, and digital audio workstations as examples. The discussion also touches on the potential benefits of AI, such as improving the listening experience for the hearing impaired, and the creative possibilities of isolating individual tracks. Amidst the discussion, a consensus emerges that AI is not a replacement for human creativity, but rather a tool that can augment and transform the music-making process.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 4h after posting
- Peak period: 25 comments in the 48-60h window
- Avg / period: 4.6 comments
- Based on 37 loaded comments
Key moments
- 01 Story posted: Dec 16, 2025 at 1:26 PM EST (20 days ago)
- 02 First comment: Dec 16, 2025 at 5:46 PM EST (4h after posting)
- 03 Peak activity: 25 comments in the 48-60h window, the hottest window of the conversation
- 04 Latest activity: Dec 21, 2025 at 6:56 AM EST (15 days ago)
This one rankles me because of a) the benefits piracy has (third-world consumers can now discover you, for starters) and b) the absolute bad-faith way the industry acts: screwing over artists, and unethically going after Pirate Bay by turning it into a trade war with Sweden (I think).
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
https://news.ycombinator.com/newsguidelines.html
I wonder if you could assemble a big corpus of individual solo instruments, then permute them into cacophonous mixes. IIRC the main training dataset is composed of a limited number of real songs, and I think a model trained only on real songs might struggle with more "out there" harmonies and mixes.
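A rough sketch of what that permutation idea could look like, for illustration only: the stem folder, clip length, and gain range below are assumptions, not details from the SAM Audio training setup.

```python
# Illustrative only: build training mixtures by randomly combining
# solo-instrument recordings. Stems are assumed to share a sample rate;
# resampling is omitted for brevity.
import random
from pathlib import Path

import numpy as np
import soundfile as sf

STEM_DIR = Path("solo_stems")   # hypothetical folder of solo recordings
SAMPLE_RATE = 44100
CLIP_SECONDS = 10

def load_clip(path: Path) -> np.ndarray:
    """Load a clip, downmix to mono, and trim/pad to a fixed length."""
    audio, _sr = sf.read(path, dtype="float32", always_2d=False)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    target = SAMPLE_RATE * CLIP_SECONDS
    if len(audio) >= target:
        start = random.randint(0, len(audio) - target)
        return audio[start:start + target]
    return np.pad(audio, (0, target - len(audio)))

def random_mixture(n_sources: int = 4) -> tuple[np.ndarray, list[np.ndarray]]:
    """Pick random stems, apply random gains, and return (mixture, targets)."""
    paths = random.sample(sorted(STEM_DIR.glob("*.wav")), n_sources)
    sources = [load_clip(p) * random.uniform(0.3, 1.0) for p in paths]
    mix = np.sum(sources, axis=0)
    peak = max(np.abs(mix).max(), 1e-8)
    return mix / peak, [s / peak for s in sources]   # keep mix and targets aligned
```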
A few prompts failed almost entirely, though: "train noises", "background noise", and "clatter"... so it's definitely sensitive to either the prompting or the kind of noise being extracted.
It's not clear from the blog post, the GitHub page, or most other places whether this will run, even at an order-of-magnitude level, on:
* CPU
* 16GB GPU
* 240GB server (of the type most business can afford)
* Meta/Google/OpenAI/Anthropic-style data center
Environments might make the difference between, e.g., 16GB and 24GB, but not between 16GB and 160GB.
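One way to get a first-order answer yourself, assuming the released weights are a standard PyTorch checkpoint (the file name below is a placeholder): count the parameters and multiply by bytes per parameter. Activations add more on top, so this is only a lower bound on VRAM.

```python
# Back-of-the-envelope VRAM estimate from a checkpoint's parameter count.
# The checkpoint path is a placeholder; treat the printed numbers as a lower bound.
import torch

state = torch.load("sam_audio_checkpoint.pt", map_location="cpu")
if isinstance(state, dict) and "state_dict" in state:   # some checkpoints nest the weights
    state = state["state_dict"]

n_params = sum(t.numel() for t in state.values() if torch.is_tensor(t))
print(f"{n_params / 1e9:.2f}B parameters")
print(f"weights only: ~{n_params * 2 / 2**30:.1f} GiB in fp16, "
      f"~{n_params * 4 / 2**30:.1f} GiB in fp32")
```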
It seems you need lots of RAM and VRAM. Reading the issues on GitHub[1], it does not seem many others have had success using this effectively:
- someone with a 96 GB VRAM RTX 6000 Pro had CUDA OOM issues
- someone made it work on an RTX 4090 somehow, but the RTF processing time was 12...
- someone with an RTX 5090 managed to use it, but only with clips no longer than 20s
It seems the utility of the model for hobbyists with consumer-grade cards will be low.
[1]: https://github.com/facebookresearch/sam-audio/issues/24
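Given the ~20-second ceiling reported above, the usual workaround for this class of model is to separate long recordings in short overlapping windows and crossfade the results. This is a generic sketch, not the sam-audio API; `separate_fn` stands in for whatever call the model actually exposes and is assumed to map a mono chunk to a same-length mono stem.

```python
# Generic chunk-and-crossfade workaround for the OOM reports above.
import numpy as np

def separate_in_chunks(audio: np.ndarray, sr: int, separate_fn,
                       window_s: float = 20.0, overlap_s: float = 1.0) -> np.ndarray:
    """Run `separate_fn` over overlapping windows and crossfade the outputs."""
    win = int(window_s * sr)
    hop = win - int(overlap_s * sr)
    out = np.zeros(len(audio), dtype=np.float32)
    weight = np.zeros(len(audio), dtype=np.float32)

    fade = np.ones(win, dtype=np.float32)
    ramp = np.linspace(0.0, 1.0, int(overlap_s * sr), dtype=np.float32)
    if len(ramp):
        fade[:len(ramp)] = ramp          # fade-in at the chunk start
        fade[-len(ramp):] = ramp[::-1]   # fade-out at the chunk end

    for start in range(0, len(audio), hop):
        chunk = audio[start:start + win]
        w = fade[:len(chunk)]
        out[start:start + len(chunk)] += separate_fn(chunk) * w
        weight[start:start + len(chunk)] += w
    return out / np.maximum(weight, 1e-8)
```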
- This feature is awesome for sample-based music
- Sample-based music is not what it once was, due to difficulties with legal rights
- This model was probably created without giving a damn about said rights
The reason I'm interested in this is that recording with multiple microphones (one on the guitar, one on the vocal) has its own set of problems with phase relationships and bleed between the microphones, which cause issues when mixing.
Being able to capture a singing guitarist with a single microphone placed in just the right spot, while still being able to process the tracks individually (with EQ, compression, reverb, etc.), could be really helpful.
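For what it's worth, a minimal sketch of that workflow once separation has produced per-source stems: the file names are placeholders, and the simple high-pass filter stands in for a real EQ/compression/reverb chain.

```python
# After separation, each stem gets its own processing before remixing.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfilt

vocal, sr = sf.read("vocal_stem.wav", dtype="float32")
guitar, _ = sf.read("guitar_stem.wav", dtype="float32")
n = min(len(vocal), len(guitar))            # guard against small length mismatches
vocal, guitar = vocal[:n], guitar[:n]

# "EQ" the vocal only: roll off rumble below 100 Hz.
sos = butter(4, 100, btype="highpass", fs=sr, output="sos")
vocal = sosfilt(sos, vocal, axis=0)

# Independent level adjustments, then remix and avoid clipping.
mix = 0.9 * vocal + 0.8 * guitar
mix = mix / max(np.abs(mix).max(), 1.0)
sf.write("remix.wav", mix, sr)
```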