Microsoft Releases VibeVoice, Which Generates 90-Minute, 4-Speaker Audio
Posted 4 months ago · Active 4 months ago
Source: microsoft.github.io · Tech Discussion · story
Tone: informative, positive · Debate: 20/100
Key topics: AI Audio Generation, Microsoft Research, VibeVoice
Discussion activity: light (3 comments, all within the first hour)
Key moments
- Story posted: Aug 26, 2025 at 9:24 AM EDT
- First comment: Aug 26, 2025 at 9:24 AM EDT (0s after posting)
- Peak activity: 3 comments in 0-1h
- Latest activity: Aug 26, 2025 at 9:25 AM EDT
Discussion (3 comments)
watsonmusic (author)
4 months ago
https://huggingface.co/microsoft/VibeVoice-1.5B
watsonmusic (author)
4 months ago
VibeVoice is a novel framework for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly scalability, speaker consistency, and natural turn-taking.

A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers preserve audio fidelity while greatly improving computational efficiency on long sequences. VibeVoice employs a next-token diffusion framework: a Large Language Model (LLM) models the textual context and dialogue flow, and a diffusion head generates the high-fidelity acoustic details.

The model can synthesize speech up to 90 minutes long with up to 4 distinct speakers, surpassing the 1-2 speaker limit typical of many prior models.
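To see why a 7.5 Hz frame rate matters for 90-minute outputs, here is a back-of-the-envelope sketch of sequence lengths. The 7.5 Hz rate and the 90-minute length come from the description above; the 24 kHz input sample rate and the 50 Hz comparison rate are assumptions chosen only for illustration, not figures from the VibeVoice release.

```python
# Rough sequence-length arithmetic for continuous speech tokenizers.
# From the description: 7.5 Hz frame rate, 90-minute outputs.
# Assumptions (illustrative only): 24 kHz input audio, 50 Hz comparison rate.

def frames_for(duration_s: float, frame_rate_hz: float) -> int:
    """Number of latent frames needed to cover duration_s seconds of audio."""
    return round(duration_s * frame_rate_hz)

def downsampling_factor(sample_rate_hz: float, frame_rate_hz: float) -> float:
    """How many raw audio samples each latent frame summarizes."""
    return sample_rate_hz / frame_rate_hz

NINETY_MINUTES_S = 90 * 60   # 5,400 seconds
SAMPLE_RATE_HZ = 24_000      # assumed input sample rate

if __name__ == "__main__":
    for rate in (7.5, 50.0):  # 50 Hz is a hypothetical comparison point
        print(f"{rate:>5} Hz -> {frames_for(NINETY_MINUTES_S, rate):>7,} frames, "
              f"{downsampling_factor(SAMPLE_RATE_HZ, rate):,.0f} samples/frame")
```

Under these assumptions, 90 minutes at 7.5 Hz is 40,500 frames (each frame summarizing 3,200 samples), versus 270,000 frames at the hypothetical 50 Hz rate. That is roughly a 6.7x shorter sequence for the LLM to model.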
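The "next-token diffusion" idea mentioned above can be illustrated with a toy loop. An autoregressive backbone produces one hidden state per step, and a diffusion head turns Gaussian noise into the next continuous latent, conditioned on that state. This is a minimal sketch under heavy assumptions. The "LLM" is a hypothetical stub (`backbone_step`). The noise predictor is a hypothetical oracle that already knows the target latent. The sampler is a standard deterministic DDIM update with a linear beta schedule. None of the names, sizes, or schedules here come from VibeVoice itself.

```python
import numpy as np

LATENT_DIM = 8   # hypothetical latent size
NUM_STEPS = 10   # hypothetical number of denoising steps

# Linear beta schedule and cumulative products of (1 - beta), in DDPM notation.
betas = np.linspace(1e-4, 0.2, NUM_STEPS)
alpha_bars = np.cumprod(1.0 - betas)

def backbone_step(history: list) -> np.ndarray:
    """Hypothetical stand-in for the LLM: maps past latents to a hidden state."""
    if not history:
        return np.ones(LATENT_DIM)
    return np.tanh(history[-1] + 0.5)

def target_from_hidden(hidden: np.ndarray) -> np.ndarray:
    """Hypothetical clean latent that the oracle denoiser aims for."""
    return 0.8 * hidden

def predict_noise(x_t: np.ndarray, t: int, hidden: np.ndarray) -> np.ndarray:
    """Oracle epsilon-prediction: the exact noise implied by x_t and the target."""
    x0 = target_from_hidden(hidden)
    return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])

def diffusion_head(hidden: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Deterministic DDIM (eta = 0) sampling of one latent, conditioned on hidden."""
    x = rng.standard_normal(LATENT_DIM)
    for t in reversed(range(NUM_STEPS)):
        eps = predict_noise(x, t, hidden)
        x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
        if t == 0:
            x = x0_hat
        else:
            a_prev = alpha_bars[t - 1]
            x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x

def generate(num_frames: int, seed: int = 0) -> np.ndarray:
    """Autoregressive loop: backbone reads history, diffusion head emits next latent."""
    rng = np.random.default_rng(seed)
    latents = []
    for _ in range(num_frames):
        hidden = backbone_step(latents)
        latents.append(diffusion_head(hidden, rng))
    return np.stack(latents)

if __name__ == "__main__":
    print(generate(3).shape)  # (num_frames, LATENT_DIM)
```

Because the oracle is exact, every sample lands on its target regardless of the starting noise. A learned noise predictor would instead yield varied latents, which a separate decoder would then turn back into audio.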
watsonmusic (author)
4 months ago
https://github.com/microsoft/VibeVoice
Hacker News item ID: 45026218 · Type: story · Last synced: 11/18/2025, 12:07:58 AM