---
license: apache-2.0
datasets:
- issai/KazakhTTS
language:
- kk
base_model:
- openbmb/VoxCPM1.5
---

# Kazakh-VoxCPM-LoRA

## 🇰🇿 Overview

This repository hosts a **LoRA (Low-Rank Adaptation)** model specifically optimized for the **Kazakh language**, built upon the **VoxCPM 1.5** architecture.

This research project aims to bridge the gap in high-quality Kazakh speech synthesis, offering a solution that excels in both standard TTS and Zero-shot Voice Cloning while retaining the base model's proficiency in Chinese and English.

## 🚀 Performance Highlights

* **Native Phoneme Mastery**: Precision handling of unique Kazakh phonemes: **ә, ғ, қ, ң, ө, ұ, ү, һ, і**.
* **Superior Prosody**: Achieved a `loss/stop` of **0.003-0.005**, ensuring natural pauses and rhythmic accuracy in long-form text.
* **Advanced Cloning**: Supports high-fidelity voice cloning from as little as 3 seconds of reference audio.
* **Seamless Tri-lingualism**: Integrated support for code-switching across Kazakh, English, and Chinese.

## 📊 Training Specifications

* **Base Model**: [openbmb/VoxCPM1.5]()
* **Dataset**: **66.1 hours** of high-quality Kazakh speech (Source: [issai/KazakhTTS]()).
* **Parameters**: Step: 4160 | Epoch: 1.84 | Rank: 32 | Alpha: 16.
* **Final Metrics**: `loss/diff`: ~0.644 | `loss/stop`: ~0.004.

## 🛠️ Implementation Guide

This model supports dynamic hot-swapping. You can enable Kazakh support by setting `lora_enabled` to `True`.

For a complete interactive web application and detailed inference scripts, please refer to our GitHub repository:

👉 [voxcpm-kazakh-tts](https://github.com/allssai/voxcpm-kazakh-tts)

This web application supports:

- **Interactive Synthesis**: Real-time Kazakh TTS.
- **Voice Cloning**: Custom voice synthesis using your own reference audio.
- **Easy Deployment**: Ready to run via Gradio.

## ⚖️ License & Acknowledgements

This model is released under the **Apache License 2.0**.
Special thanks to the ISSAI team for providing the KazakhTTS dataset.

---
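
To make the Rank 32 / Alpha 16 setting and the `lora_enabled` toggle from the sections above more concrete, here is a minimal sketch of a LoRA-adapted layer in plain Python. It is not VoxCPM's code and does not use its API: `LoRALinear`, `matvec`, and the layer sizes are hypothetical names and values chosen for illustration, and only `rank`, `alpha`, and the flag name `lora_enabled` come from this card. It assumes the common LoRA convention of scaling the low-rank update by `alpha / rank`, which would give 0.5 for the values listed here.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer y = W x with an optional low-rank update.

    Illustrative only; hypothetical class, not part of VoxCPM.
    """

    def __init__(self, in_dim, out_dim, rank=32, alpha=16, seed=0):
        rng = random.Random(seed)
        # Frozen base weight W with shape (out_dim, in_dim).
        self.W = [[rng.gauss(0, 0.02) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        # Low-rank factors: A is (rank, in_dim), B is (out_dim, rank).
        # Both are random here to stand in for an already trained adapter.
        self.A = [[rng.gauss(0, 0.02) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.02) for _ in range(rank)]
                  for _ in range(out_dim)]
        # Assumed convention: the update is scaled by alpha / rank.
        self.scaling = alpha / rank
        # The adapter is off by default, so the layer acts as the base model.
        self.lora_enabled = False

    def forward(self, x):
        base = matvec(self.W, x)
        if not self.lora_enabled:
            return base
        # Low-rank path: project down to `rank` dims, then back up.
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]


layer = LoRALinear(in_dim=8, out_dim=4)
x = [1.0] * 8

base_out = layer.forward(x)        # adapter off: base behaviour
layer.lora_enabled = True
adapted_out = layer.forward(x)     # adapter on: base plus scaled update
layer.lora_enabled = False
restored_out = layer.forward(x)    # adapter off again: base behaviour returns

print("scaling:", layer.scaling)
print("base restored:", restored_out == base_out)
```

Because the base weights are never modified in this sketch, switching the flag off returns exactly the original outputs. That is the property the hot-swapping description relies on: the adapter can be toggled at inference time without reloading the base model.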