---
pipeline_tag: text-to-speech
tags:
- zero-shot
- multilingual
- voice-cloning
library_name: omnivoice
---

# OmniVoice 🌍


This OmniVoice variant was trained exclusively on the Chinese and English subsets of the Emilia dataset. It corresponds to the "OmniVoice-Emilia" model described in our paper and is intended for researchers who want to reproduce the experimental results reported there. For regular end users seeking better performance, we recommend the [OmniVoice](https://huggingface.co/k2-fsa/OmniVoice) checkpoint, which was trained on the full dataset.

When using this checkpoint, set `denoise = False` and `lang_id = None`, because the model was trained without prompt denoising or language-ID conditioning.

## Citation

```bibtex
@article{zhu2026omnivoice,
  title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
  author={Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
  journal={arXiv preprint arXiv:2604.00688},
  year={2026}
}
```
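
## Example settings

The two settings required for this checkpoint (`denoise = False`, `lang_id = None`) can be pinned in a small helper so they are not overridden by accident. The following is only a sketch: it assumes the settings are passed as keyword arguments, and the helper name `emilia_generation_kwargs` and the `text` key are hypothetical. It does not call the `omnivoice` library, whose generation API is not documented in this card.

```python
from typing import Any, Dict

# Settings stated in this model card for the Emilia-only checkpoint.
EMILIA_SETTINGS: Dict[str, Any] = {"denoise": False, "lang_id": None}


def emilia_generation_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge caller kwargs with the required settings (hypothetical helper).

    Raises ValueError if the caller tries to set `denoise` or `lang_id`
    to anything other than the values this checkpoint was trained with.
    """
    for key, required in EMILIA_SETTINGS.items():
        if key in overrides and overrides[key] is not required:
            raise ValueError(
                f"{key} must be {required!r} for the OmniVoice-Emilia checkpoint"
            )
    # Required settings are applied last so they always take effect.
    return {**overrides, **EMILIA_SETTINGS}


if __name__ == "__main__":
    # `text` is an illustrative key, not a confirmed parameter name.
    print(emilia_generation_kwargs(text="Hello world"))
```

The resulting dictionary could then be unpacked into whichever generation call the library exposes. Rejecting conflicting values, rather than silently overwriting them, makes a misconfigured reproduction run fail early.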