metadata
license: cc-by-nc-4.0
language:
  - es
base_model:
  - SWivid/F5-TTS

F5-TTS Spanish Language Model

Overview

This is a version of the F5-TTS model fine-tuned specifically for Spanish speech synthesis. The project aims to deliver high-quality speech synthesis that covers a range of regional Spanish accents.

License

This model is released under the CC-BY-NC-4.0 license, which allows free use, modification, and distribution for non-commercial purposes, with attribution.

Datasets

The following datasets were used for training:

  • VoxPopuli dataset, with mainly Peninsular Spanish accents
  • Crowdsourced high-quality Spanish speech data:
    • Argentinian Spanish
    • Chilean Spanish
    • Colombian Spanish
    • Peruvian Spanish
    • Puerto Rican Spanish
    • Venezuelan Spanish

Additional sources:

Model Information

Base Model: SWivid/F5-TTS
Total Training Duration: 218 hours of audio
Training Configuration:

  • Batch Size: 3200
  • Max Samples: 64
  • Training Steps: 1,200,000

Usage Instructions

  1. Run the F5-TTS application and monitor the terminal output. The path to the model file will be displayed, similar to the following:
    model : C:\Users\thega\.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\995ff41929c08ff968786b448a384330438b5cb6\F5TTS_Base\model_1200000.safetensors
    
  2. Replace the model file:
    • Navigate to the specified location.
    • Rename the existing file to model_1200000.safetensors.bak.
    • Download the model_1200000.safetensors file from this repository and place it in the same location.
  3. Rerun the application to load the updated model.
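The file replacement in step 2 can also be scripted. The following is a minimal sketch, not part of the F5-TTS tooling: `swap_model_file` is a hypothetical helper name, and it assumes you have already downloaded `model_1200000.safetensors` from this repository to a local folder and copied the cached path from the terminal output in step 1.

```python
import shutil
from pathlib import Path


def swap_model_file(cached_model: Path, downloaded_model: Path) -> Path:
    """Back up the cached checkpoint and put the downloaded one in its place.

    cached_model:     path printed by the F5-TTS application (step 1).
    downloaded_model: local copy of the file downloaded from this repository.
    Returns the path of the backup file.
    """
    if not downloaded_model.is_file():
        raise FileNotFoundError(f"Downloaded model not found: {downloaded_model}")
    if not cached_model.exists():
        raise FileNotFoundError(f"Cached model not found: {cached_model}")

    # Same folder, same name, with ".bak" appended (as in step 2).
    backup = cached_model.with_name(cached_model.name + ".bak")
    if backup.exists():
        # Refuse to overwrite an earlier backup of the original model.
        raise FileExistsError(f"Backup already exists: {backup}")

    cached_model.rename(backup)                   # keep the original model
    shutil.copy2(downloaded_model, cached_model)  # install the Spanish model
    return backup


# Example call (both paths are placeholders; use the ones on your machine):
# swap_model_file(
#     Path(r"<path shown in the terminal>\model_1200000.safetensors"),
#     Path(r"<download folder>\model_1200000.safetensors"),
# )
```

To go back to the original model, delete the replaced file and rename the `.bak` file to `model_1200000.safetensors`.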

Contributions and Recommendations

This model may benefit from further fine-tuning to enhance its performance across different Spanish dialects. Contributions from the community are encouraged. For optimal output quality, preprocess the reference audio by removing background noise, balancing audio levels, and enhancing clarity.
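
As an illustration of the level-balancing step only, here is a minimal peak-normalization sketch. It assumes the reference audio has already been loaded as floating-point samples in the range -1.0 to 1.0. `peak_normalize` and the `target_peak` value of 0.9 are illustrative choices, not settings taken from F5-TTS, and noise removal and clarity enhancement need dedicated audio tools that are not covered here.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale samples so that the loudest one has magnitude target_peak.

    samples: iterable of floats, assumed to lie in [-1.0, 1.0].
    Returns a new list; silent input is returned unchanged.
    """
    samples = list(samples)
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # All-zero (or empty) audio: nothing to scale.
        return samples
    gain = target_peak / peak
    return [s * gain for s in samples]


# A quiet clip is brought up so that its loudest sample sits at the target peak.
quiet_clip = [0.1, -0.2, 0.05]
normalized = peak_normalize(quiet_clip)
```

Peak normalization only applies a single gain to the whole clip, so it does not even out loudness differences within the recording.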