VibeVoice-Realtime-0.5B-MLX-INT8

INT8-quantized MLX bundle of Microsoft VibeVoice-Realtime-0.5B for Apple Silicon, ready to load with the VibeVoiceTTS Swift module from soniqo/speech-swift.

INT8 is the middle-ground option: it leaves more quality headroom than INT4 while staying smaller and faster than BF16. For most use cases, INT4 is still the right pick.

What's in the box

  • model.safetensors — INT8 group-quantized Qwen2 backbone (group_size=32, mode=affine), tokenizer + acoustic tokenizer + diffusion head + EOS classifier kept in source dtype
  • quantization.json — per-layer manifest (244 quantized layers)
  • config.json, preprocessor_config.json — copied from upstream

Bundle size: 1.42 GB.
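
To make the group_size=32, mode=affine setting concrete, here is a minimal sketch of 8-bit affine quantization applied to one group of weights. It assumes the usual affine scheme (w ≈ scale × code + bias, with one scale and bias per group); the type and function names are hypothetical and are not taken from the converter or from MLX.

```swift
import Foundation

/// One quantized group: 8-bit codes plus the scale and bias needed to decode them.
/// (Hypothetical type, for illustration only.)
struct QuantizedGroup {
    let codes: [UInt8]
    let scale: Float
    let bias: Float
}

/// Affine-quantize one group of weights to 8 bits so that w ≈ scale * code + bias.
func quantizeGroup(_ weights: [Float]) -> QuantizedGroup {
    let lo = weights.min() ?? 0
    let hi = weights.max() ?? 0
    // Guard against a constant group, where hi == lo would give a zero scale.
    let scale = max((hi - lo) / 255, Float.leastNormalMagnitude)
    let codes = weights.map { w -> UInt8 in
        let q = ((w - lo) / scale).rounded()
        return UInt8(min(max(q, 0), 255))
    }
    return QuantizedGroup(codes: codes, scale: scale, bias: lo)
}

/// Reconstruct approximate weights from the codes.
func dequantizeGroup(_ group: QuantizedGroup) -> [Float] {
    group.codes.map { Float($0) * group.scale + group.bias }
}

// Round-trip one group of 32 evenly spaced weights in [-0.5, 0.5].
let groupSize = 32
let weights = (0..<groupSize).map { Float($0) / 31 - 0.5 }
let group = quantizeGroup(weights)
let restored = dequantizeGroup(group)
let maxError = zip(weights, restored).map { abs($0 - $1) }.max() ?? 0
```

The reconstruction error per weight is bounded by about half the group's scale, which is why smaller groups (each with its own range) tend to preserve more detail at the cost of storing more scales and biases.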

Performance (Apple M2 Max, 64 GB)

Steps   Audio     Elapsed   RTF    RTFx
10      1.20 s    0.64 s    0.53   1.88×

At 1.88× real time, INT8 sits between BF16 (1.48×) and INT4 (2.31×).
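
The RTF and RTFx columns appear to follow the usual definitions: RTF is elapsed time divided by audio duration, and RTFx is its inverse. The sketch below checks that reading against the table row; the function names are hypothetical.

```swift
/// Real-time factor: seconds of compute per second of audio (lower is faster).
func realTimeFactor(elapsed: Double, audio: Double) -> Double {
    elapsed / audio
}

/// Speed-up over real time: seconds of audio per second of compute (higher is faster).
func realTimeSpeedup(elapsed: Double, audio: Double) -> Double {
    audio / elapsed
}

// Values from the table row: 1.20 s of audio generated in 0.64 s.
let rtf = realTimeFactor(elapsed: 0.64, audio: 1.20)    // about 0.53
let rtfx = realTimeSpeedup(elapsed: 0.64, audio: 1.20)  // 1.875, reported as 1.88×
```

An RTFx above 1 means audio is produced faster than it plays back.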

Use it

Swift / iOS / macOS

import VibeVoiceTTS

var config = VibeVoiceTTSModel.Configuration()
config.modelId = "aufklarer/VibeVoice-Realtime-0.5B-MLX-INT8"

let tts = try await VibeVoiceTTSModel.fromPretrained(configuration: config)
try tts.loadVoice(from: "voice_cache/en-Mike_man.safetensors")
let pcm = try await tts.generate(text: "Hello world.")
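
If you want to save the result from Swift instead of using the CLI, the sketch below encodes samples as a 16-bit mono WAV file in memory. It assumes the output is a mono [Float] array in the range [-1, 1] at 24 kHz; both the return type of generate and the sample rate are assumptions to verify against the VibeVoiceTTS module. The helper names are hypothetical and use only Foundation.

```swift
import Foundation

/// Little-endian byte representation of a fixed-width integer.
func littleEndianBytes<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
    withUnsafeBytes(of: value.littleEndian) { Array($0) }
}

/// Encode mono Float samples in [-1, 1] as a 16-bit PCM WAV file in memory.
func wavData(samples: [Float], sampleRate: Int) -> Data {
    let byteCount = samples.count * 2                           // 2 bytes per sample
    var data = Data()
    data.append(contentsOf: Array("RIFF".utf8))
    data.append(contentsOf: littleEndianBytes(UInt32(36 + byteCount)))
    data.append(contentsOf: Array("WAVEfmt ".utf8))
    data.append(contentsOf: littleEndianBytes(UInt32(16)))      // fmt chunk size
    data.append(contentsOf: littleEndianBytes(UInt16(1)))       // format: PCM
    data.append(contentsOf: littleEndianBytes(UInt16(1)))       // channels: mono
    data.append(contentsOf: littleEndianBytes(UInt32(sampleRate)))
    data.append(contentsOf: littleEndianBytes(UInt32(sampleRate * 2)))  // byte rate
    data.append(contentsOf: littleEndianBytes(UInt16(2)))       // block align
    data.append(contentsOf: littleEndianBytes(UInt16(16)))      // bits per sample
    data.append(contentsOf: Array("data".utf8))
    data.append(contentsOf: littleEndianBytes(UInt32(byteCount)))
    for sample in samples {
        // Clip to [-1, 1] and map NaN to silence before converting to Int16.
        let clipped = sample.isNaN ? 0 : max(-1, min(1, sample))
        data.append(contentsOf: littleEndianBytes(Int16((clipped * 32767).rounded())))
    }
    return data
}

// Stand-in for generated audio: 0.1 s of a 440 Hz tone at the assumed 24 kHz rate.
let tone = (0..<2400).map { Float(sin(2 * Double.pi * 440 * Double($0) / 24_000)) }
let wav = wavData(samples: tone, sampleRate: 24_000)
// To save: try wav.write(to: URL(fileURLWithPath: "hello.wav"))
```

In a real app you would pass the generated samples in place of the test tone.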

CLI (audio from speech-swift)

audio vibevoice "Hello world." \
    --model aufklarer/VibeVoice-Realtime-0.5B-MLX-INT8 \
    --voice-cache voice_cache/en-Mike_man.safetensors \
    --output hello.wav

Voice caches

Voice caches are the same as for the INT4 bundle: MIT-licensed examples are available at mzbac/vibevoice.swift/voice_cache, or you can create your own with audio vibevoice-encode-voice.

Languages

English and Chinese only.

License

MIT, inherited from the upstream Microsoft VibeVoice repo.

Reproduction

Converted with models/vibevoice/export/convert.py from soniqo/speech-models (private repository), using --bits 8.

Citation

@misc{microsoft_vibevoice,
  title  = {VibeVoice: Long-form, Multi-speaker Text-to-Speech},
  author = {Microsoft Research},
  year   = {2025},
  url    = {https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B}
}