Update README.md
README.md CHANGED

@@ -60,7 +60,7 @@ See the [GitHub repository](https://github.com/kyutai-labs/delayed-streams-model
 
 ## Training Details
 
-The model was trained for 750k steps, with a batch size of 64, and a segment duration of 120 seconds.
+The model was trained for 750k steps, with a batch size of 64, and a segment duration of 120 seconds. Then, CFG distillation was performed for 24k updates.
 
 ### Training Data
 
@@ -71,7 +71,7 @@ with `whisper-medium`.
 
 ### Compute Infrastructure
 
-Pretraining
+Pretraining was done with 32 H100 Nvidia GPUs. CFG distillation was done on 8 such GPUs.
 
 ## Model Card Authors
 
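
As a rough sanity check on the training figures in the diff above, the following sketch estimates how much audio the pretraining run processed. It assumes every step consumes a full batch of 64 segments and that every segment is exactly 120 seconds long; the README does not state either, so the result is an upper-bound style estimate, not a reported number. The function name `pretraining_audio_hours` is hypothetical and used only for illustration.

```python
def pretraining_audio_hours(steps: int, batch_size: int, segment_seconds: float) -> float:
    """Estimate hours of audio processed, assuming full batches of full-length segments.

    This is an illustrative calculation, not something stated in the model card.
    """
    total_seconds = steps * batch_size * segment_seconds
    return total_seconds / 3600.0


if __name__ == "__main__":
    # Values from the README: 750k steps, batch size 64, 120-second segments.
    hours = pretraining_audio_hours(steps=750_000, batch_size=64, segment_seconds=120)
    print(f"Approximate audio processed during pretraining: {hours:,.0f} hours")
```

The CFG distillation stage (24k updates) is left out because the diff does not give its batch size or segment duration.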