---
pipeline_tag: text-to-speech
tags:
- zero-shot
- multilingual
- voice-cloning
library_name: omnivoice
---
# OmniVoice 🌍
This OmniVoice variant was trained exclusively on the Chinese and English subsets of the Emilia dataset and corresponds to the "OmniVoice-Emilia" model described in our paper. It is intended for researchers who want to reproduce the experimental results reported there. For regular use and better performance, we recommend the [OmniVoice](https://huggingface.co/k2-fsa/OmniVoice) checkpoint trained on the full dataset instead.
When using this checkpoint, set `denoise = False` and `lang_id = None`: the model was trained without prompt denoising or language-ID conditioning.
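
As a minimal sketch of the settings above: this card does not show the inference call, so the snippet below only collects the two required values in a dictionary that can be passed to whatever generation entry point you use. The names `EMILIA_SETTINGS` and `with_emilia_settings` are hypothetical and are not part of the `omnivoice` library; only `denoise` and `lang_id` come from this card.

```python
# Hypothetical helper, NOT part of the omnivoice library.
# Only the option names `denoise` and `lang_id` are taken from this model card.
EMILIA_SETTINGS = {"denoise": False, "lang_id": None}


def with_emilia_settings(**kwargs):
    """Return kwargs merged with the settings this checkpoint requires.

    Raises ValueError if the caller passes a conflicting value, since the
    model was trained without prompt denoising or language-ID conditioning.
    """
    for name, required in EMILIA_SETTINGS.items():
        if name in kwargs and kwargs[name] is not required:
            raise ValueError(
                f"{name}={kwargs[name]!r} conflicts with this checkpoint; "
                f"use {name}={required!r}"
            )
    return {**kwargs, **EMILIA_SETTINGS}


# Example: any other arguments (here an illustrative `text`) pass through unchanged.
settings = with_emilia_settings(text="Hello world")
print(settings)
```

Rejecting conflicting values, rather than silently overriding them, is a design choice that makes a mismatched setting visible when reproducing the paper's results.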
## Citation
```bibtex
@article{zhu2026omnivoice,
title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
author={Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
journal={arXiv preprint arXiv:2604.00688},
year={2026}
}
```