Fine-tuned Spark-TTS: German Emotional Speech Synthesis
This repository contains a fine-tuned version of the Spark-TTS (0.5B) model, specialized for German speech synthesis with support for emotion tags and non-verbal audio tokens.
The model was fine-tuned using LoRA (Low-Rank Adaptation) on a curated German dataset containing high-quality audio with diverse emotional expressions and non-verbal cues.
Key Highlights
- 57.14% Loss Improvement: Reduced test loss from 10.0074 (Base) to 4.2891 (Fine-tuned).
- Emotional Support: Handles stylistic tags like [happy], [angry], and [thoughtful].
- Non-Verbal Tokens: Accurately synthesizes non-speech sounds like [sighs], [laughter], [yawn], and [growl].
- Architecture: Spark-TTS (0.5B) with LoRA Adapters.
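Since the tags above are plain bracketed markers, a small pre-flight check can catch typos before text reaches the model. The following is a minimal sketch, assuming tags are written inline in the input text in square brackets; the exact prompt format is not documented here, and `find_unknown_tags` is a hypothetical helper, not part of Spark-TTS.

```python
import re

# Tags named in this model card; the full supported set is not documented here.
EMOTION_TAGS = {"happy", "angry", "thoughtful"}
NONVERBAL_TAGS = {"sighs", "laughter", "yawn", "growl"}
KNOWN_TAGS = EMOTION_TAGS | NONVERBAL_TAGS

# Matches any "[...]" marker and captures the text between the brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_unknown_tags(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not listed in the model card."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag.lower() not in KNOWN_TAGS]


prompt = "[happy] Das ist ja wunderbar! [laughter] Ich freue mich sehr."
print(find_unknown_tags(prompt))             # []
print(find_unknown_tags("[excited] Hallo"))  # ['excited']
```

An empty list means every tag in the text is one of those named above; anything else is worth checking by hand before synthesis.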
Training Details
The model was trained using the following optimal parameters:
- Learning Rate: 0.0005
- LoRA Rank (R): 64
- LoRA Alpha: 64
- Precision: 4-bit (bitsandbytes)
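To make the rank and alpha values concrete, the sketch below works out what they imply in standard LoRA, where the low-rank update is scaled by alpha / r. It assumes the default scaling was used (not a variant such as rsLoRA), and the 896 x 896 layer size is a hypothetical value chosen only for illustration.

```python
# Hyperparameters as listed in this model card.
learning_rate = 5e-4
lora_rank = 64
lora_alpha = 64

# Standard LoRA adds (alpha / r) * (B @ A) to the frozen weight matrix.
# Assumption: default scaling, so equal rank and alpha give a factor of 1.
scaling = lora_alpha / lora_rank
print(scaling)  # 1.0


def lora_params(d_in: int, d_out: int, r: int = lora_rank) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


# Hypothetical 896 x 896 projection, for illustration only.
print(lora_params(896, 896))  # 114688
```

With alpha equal to the rank, the adapter output is added to the base weights unscaled, so the learning rate alone controls the effective step size of the update.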
Inference Example
To use this model, you need to load the base Spark-TTS 0.5B model and apply these LoRA adapters.
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load base model (ensure you have the Spark-TTS architecture code)
base_model = AutoModelForCausalLM.from_pretrained("SparkAudio/Spark-TTS-0.5B", trust_remote_code=True)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "Vishalshendge3198/spark-tts-german-emotional")
```
Performance
| Metric | Base Model (0.5B) | Fine-tuned (German) | Improvement |
|---|---|---|---|
| Test Loss | 10.0074 | 4.2891 | 57.14% |
| German Prosody | Basic | Advanced | High |
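The improvement figure in the table is the relative drop in test loss, which can be reproduced from the two reported values. This short check uses only the numbers above.

```python
base_loss = 10.0074
finetuned_loss = 4.2891

# Absolute drop in test loss, then the drop relative to the base model.
absolute_drop = base_loss - finetuned_loss
relative_improvement = absolute_drop / base_loss * 100

print(f"{absolute_drop:.4f}")          # 5.7183
print(f"{relative_improvement:.2f}%")  # 57.14%
```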
Credits
Developed as part of a German TTS fine-tuning project using the Spark-TTS architecture by SparkAudio.