Fine-tuned version of CosyVoice2-0.5B for Akan (Twi / Fante) text-to-speech synthesis.
Trained on the aka_asr subset of google/WaxalNLP, with 101 speakers and ~10,000 utterances.
from huggingface_hub import snapshot_download
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio
model_dir = snapshot_download('prince4332/CosyVoice2-0.5B-Akan')
model = CosyVoice2(model_dir, load_jit=False, load_trt=False, fp16=True)
# Zero-shot voice cloning
prompt_wav, sr = torchaudio.load('reference_akan.wav')  # expected: 16 kHz mono
if sr != 16000:
    # Resample the reference clip to the 16 kHz rate the prompt input expects
    prompt_wav = torchaudio.functional.resample(prompt_wav, sr, 16000)
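for chunk in model.inference_zero_shot(
    tts_text='Meda wo ase.',
    prompt_text='Akwaaba!',
    prompt_speech_16k=prompt_wav,
    stream=False,
):
    # Save at the model's own output rate (24 kHz for CosyVoice2) rather than a hard-coded 22050
    torchaudio.save('output.wav', chunk['tts_speech'], model.sample_rate)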
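The snippet above expects the reference clip to be 16 kHz mono. As a minimal sketch, the check below reads a WAV header with Python's standard-library wave module, so it can run before any model is loaded. The helper names describe_wav and is_valid_prompt are hypothetical and not part of CosyVoice, and the wave module only reads integer PCM WAV files, so other encodings would need a different reader.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return sample_rate, channels, frames / sample_rate


def is_valid_prompt(path, expected_rate=16000):
    """True if the clip is mono and already at the expected sample rate."""
    sample_rate, channels, _ = describe_wav(path)
    return sample_rate == expected_rate and channels == 1
```

Calling is_valid_prompt('reference_akan.wav') before inference would show whether the clip still needs resampling or down-mixing to mono.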
Dataset: google/WaxalNLP (aka_asr split)
Base model: FunAudioLLM/CosyVoice2-0.5B