Microsoft Releases VibeVoice, Generates 90-Minute, 4-Speaker Audio

Posted 4 months ago | Active 4 months ago

watsonmusic

3 points

3 comments

microsoft.github.io | Tech Discussion | story

Key topics

AI Audio Generation

Microsoft Research

Vibevoice

Key moments

Story posted: Aug 26, 2025 at 9:24 AM EDT
First comment: Aug 26, 2025 at 9:24 AM EDT (0s after posting)
Peak activity: 3 comments in 0-1h
Latest activity: Aug 26, 2025 at 9:25 AM EDT

Discussion (3 comments)

watsonmusic (Author)

4 months ago

https://huggingface.co/microsoft/VibeVoice-1.5B

watsonmusic (Author)

4 months ago

VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking. A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers efficiently preserve audio fidelity while significantly boosting computational efficiency for processing long sequences. VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details. The model can synthesize speech up to 90 minutes long with up to 4 distinct speakers, surpassing the typical 1-2 speaker limits of many prior models.
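
To make the 7.5 Hz figure concrete, here is a rough back-of-the-envelope sketch of how many tokenizer frames a 90-minute recording would span. It assumes one frame per tokenizer step, and the 50 Hz comparison rate is a hypothetical baseline chosen only for illustration; neither the helper name nor the baseline comes from the VibeVoice release.

```python
def frames_for_duration(minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


VIBEVOICE_FRAME_RATE_HZ = 7.5    # frame rate stated in the description above
COMPARISON_FRAME_RATE_HZ = 50.0  # hypothetical baseline, for illustration only

long_form = frames_for_duration(90, VIBEVOICE_FRAME_RATE_HZ)
baseline = frames_for_duration(90, COMPARISON_FRAME_RATE_HZ)

print(long_form)             # 40500
print(baseline)              # 270000
print(baseline / long_form)  # ratio of sequence lengths, roughly 6.7x
```

Under these assumptions, 90 minutes of audio is about 40,500 frames, which is the kind of sequence length an LLM context can plausibly hold in one pass.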

watsonmusic (Author)

4 months ago

https://github.com/microsoft/VibeVoice

ID: 45026218 | Type: story | Last synced: 11/18/2025, 12:07:58 AM
