Instructions to use kyutai/tts-1.6b-en_fr with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Moshi
How to use kyutai/tts-1.6b-en_fr with Moshi:
```bash
# pip install moshi
# Run the interactive web server
python -m moshi.server --hf-repo "kyutai/tts-1.6b-en_fr"
# Then open https://localhost:8998 in your browser
```
```python
# pip install moshi
import torch
from moshi.models import loaders

# Load checkpoint info from HuggingFace
checkpoint = loaders.CheckpointInfo.from_hf_repo("kyutai/tts-1.6b-en_fr")

# Load the Mimi audio codec and keep only the first 8 codebooks
mimi = checkpoint.get_mimi(device="cuda")
mimi.set_num_codebooks(8)

# Encode audio (24kHz, mono); random noise stands in for a real recording
wav = torch.randn(1, 1, 24000 * 10)  # [batch, channels, samples]
with torch.no_grad():
    codes = mimi.encode(wav.cuda())  # discrete tokens, [batch, codebooks, frames]
    decoded = mimi.decode(codes)     # reconstructed waveform, [batch, channels, samples]
```

- Notebooks
- Google Colab
- Kaggle
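The Mimi snippet in the Moshi section encodes 10 seconds of audio with 8 codebooks. As a sanity check on what `mimi.encode` should return, the sketch below computes the expected shape of `codes` in plain Python. It assumes Mimi's published configuration of 24 kHz audio tokenized at 12.5 frames per second (you can confirm both at runtime via `mimi.sample_rate` and `mimi.frame_rate`) and a `[batch, codebooks, frames]` layout. `mimi_frame_size` and `expected_code_shape` are hypothetical helpers for illustration, not part of the `moshi` package.

```python
# Assumed Mimi constants (verify with mimi.sample_rate and mimi.frame_rate).
SAMPLE_RATE = 24_000   # samples per second, mono
FRAME_RATE = 12.5      # codec frames per second


def mimi_frame_size(sample_rate: int = SAMPLE_RATE, frame_rate: float = FRAME_RATE) -> int:
    """Number of audio samples summarized by one Mimi frame (hypothetical helper)."""
    return int(sample_rate / frame_rate)


def expected_code_shape(num_samples: int, num_codebooks: int, batch: int = 1) -> tuple:
    """Expected [batch, codebooks, frames] shape of mimi.encode(wav) (hypothetical helper).

    Only whole frames are accepted, to avoid guessing how the codec pads a
    partial trailing frame.
    """
    frame_size = mimi_frame_size()
    if num_samples % frame_size != 0:
        raise ValueError(f"{num_samples} samples is not a multiple of {frame_size}")
    return (batch, num_codebooks, num_samples // frame_size)


if __name__ == "__main__":
    # Mirrors the snippet above: 10 s of 24 kHz audio, 8 codebooks.
    print(mimi_frame_size())                    # 1920
    print(expected_code_shape(24_000 * 10, 8))  # (1, 8, 125)
```

If `codes.shape` from the real codec disagrees with this estimate, the assumed constants (not the model) are the first thing to re-check.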
Update README.md
README.md (changed)

```diff
@@ -20,6 +20,7 @@ Pre-print research paper is coming soon!
 This is a model for streaming text-to-speech (TTS).
 Unlike offline text-to-speech, where the model needs the entire text to produce the audio,
 our model starts to output audio as soon as the first few words from the text have been given as input.
+This model is actually 1.8B parameters, not 1.6B as the name might suggest.
 
 ## Model Details
 
```
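The README's contrast between offline and streaming text-to-speech can be illustrated with a toy scheduler. This is not Kyutai's implementation: `synthesize_chunk` is a hypothetical stand-in for the acoustic model that just returns a label, and the `lookahead` of two words is an arbitrary assumption rather than the model's real text-to-audio delay. The offline variant must read the whole text before producing anything; the streaming variant starts yielding audio once a few words have arrived.

```python
from typing import Iterable, Iterator, List


def synthesize_chunk(word: str) -> str:
    """Hypothetical stand-in for the TTS model: returns a fake 'audio' label."""
    return f"<audio:{word}>"


def offline_tts(words: Iterable[str]) -> List[str]:
    """Offline TTS: consume the entire text before producing any audio."""
    all_words = list(words)  # blocks until the text stream is exhausted
    return [synthesize_chunk(w) for w in all_words]


def streaming_tts(words: Iterable[str], lookahead: int = 2) -> Iterator[str]:
    """Streaming TTS: emit audio for word i as soon as word i + lookahead arrives.

    `lookahead` is an illustrative assumption, not the model's actual delay.
    """
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) > lookahead:
            yield synthesize_chunk(buffer.pop(0))
    # The text stream has ended: flush the words still waiting in the buffer.
    for word in buffer:
        yield synthesize_chunk(word)


if __name__ == "__main__":
    consumed: List[str] = []

    def text_source() -> Iterator[str]:
        # Simulates words arriving one at a time.
        for w in "this is a streaming demo".split():
            consumed.append(w)
            yield w

    stream = streaming_tts(text_source(), lookahead=2)
    print(next(stream))  # <audio:this>
    print(consumed)      # ['this', 'is', 'a']; audio started before the text ended
```

The buffer is what makes the latency trade-off explicit: a larger `lookahead` gives the synthesizer more future context per word but delays the first audio chunk.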