Fine-tuned Orpheus-3B: German Emotional Speech Synthesis

This repository contains LoRA adapters for a fine-tuned version of the Orpheus-3B model, specialized for German speech synthesis with support for emotional cues and non-verbal audio tokens.

The model was fine-tuned using LoRA (Low-Rank Adaptation) on a curated German dataset containing high-quality audio with diverse emotional expressions and non-verbal cues.

🚀 Key Highlights

54.2% WER Improvement: Reduced Word Error Rate on emotional prompts from 0.7046 (Base) to 0.3226 (Fine-tuned).
37.1% CER Improvement: Reduced Character Error Rate from 0.5471 (Base) to 0.3440 (Fine-tuned).
Architecture: Orpheus-3B with LoRA Adapters.
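
The improvement percentages above are consistent with a relative error reduction, (base - fine-tuned) / base. The following minimal sketch reproduces them from the reported scores; the function name `relative_reduction` is illustrative and not part of this repository.

```python
def relative_reduction(base: float, tuned: float) -> float:
    """Return the relative error reduction in percent."""
    return (base - tuned) / base * 100.0

# Scores as reported in the highlights above.
wer_gain = relative_reduction(0.7046, 0.3226)
cer_gain = relative_reduction(0.5471, 0.3440)
print(f"WER: {wer_gain:.1f}%  CER: {cer_gain:.1f}%")
```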

🎭 Supported Tags

The model has been fine-tuned on Dataset_eleven_v3 and supports a wide range of emotional and paralinguistic tags. Write each tag in square brackets, as in [tag], in the inference prompt:

Emotions: [happy], [angry], [sad], [thoughtful], [neutral], [sleepy], [whisper], [worried], [annoyed], [surprised], [fearful], [contemptuous], [disgusted]
Paralinguistic Tokens: [sighs], [laughter], [cry], [growl], [sob], [cheer], [breath], [pause], [grit], [snarl], [exhales sharply], [grits teeth], [breathes heavily], [exclaims], [hush], [soft], [quiet], [softbreath], [hm], [yawn], [mumble], [slowbreath], [ugh], [ew], [scoff], [snort], [tremble], [shaky_breath], [sigh], [nervous_laugh], [chuckles], [short pause], [sniffles], [inhales deeply]
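
Because a misspelled tag is easy to miss, it can help to check a prompt against the supported list before synthesis. The sketch below is one possible way to do that; `SUPPORTED_TAGS` holds only a subset of the tags listed above, and the helper `unknown_tags` is a hypothetical name, not something shipped with the model.

```python
import re

# Subset of the tags listed above; extend with the full list as needed.
SUPPORTED_TAGS = {
    "happy", "angry", "sad", "whisper",
    "sighs", "laughter", "cheer", "short pause",
}

# Matches the text between one pair of square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def unknown_tags(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not in SUPPORTED_TAGS."""
    found = TAG_PATTERN.findall(text)
    return [tag for tag in found if tag.strip().lower() not in SUPPORTED_TAGS]

print(unknown_tags("[happy] Das ist großartig! [laughing] [cheer]"))
```

An empty result means every tag in the prompt is in the set; anything returned is worth checking against the lists above.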

🏋️ Training Details

The model was trained with the following hyperparameters:

Learning Rate: 0.0008
LoRA Rank (R): 32
LoRA Alpha: 32
Precision: 4-bit (bitsandbytes/unsloth)
Framework: Unsloth 2024.12
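
For reference, the values above can be collected in a small configuration dictionary. In the standard LoRA formulation the low-rank update is scaled by alpha / r, so rank 32 with alpha 32 gives a scaling factor of 1.0. The key names below are illustrative assumptions and do not reflect the actual training script.

```python
# Hyperparameters as listed above; key names are illustrative.
train_config = {
    "learning_rate": 0.0008,
    "lora_r": 32,
    "lora_alpha": 32,
    "load_in_4bit": True,
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = train_config["lora_alpha"] / train_config["lora_r"]
print(lora_scaling)
```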

🔊 Inference Example

To use this model, load the base Orpheus-3B model and apply these LoRA adapters.

from unsloth import FastLanguageModel
import torch

# Load base model and LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Vishalshendge3198/orpheus-3b-tts-german-emotional", # This repo
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# Example prompt with emotional tags in square brackets
text = "[happy][laughter] Das ist ja großartig! Ich freue mich so sehr. [cheer]"
# ... (standard Orpheus inference code follows)

📊 Performance

Metric	Base Model (3B)	Fine-tuned (German)	Improvement
Avg WER	0.7046	0.3226	54.2%
Avg CER	0.5471	0.3440	37.1%
Emotional Prosody	Basic	Advanced	High
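
The card does not describe the evaluation pipeline. A common setup for TTS is to transcribe the synthesized audio with a speech recognizer and compare the transcript to the input text. Assuming that setup, WER and CER can be computed with a plain edit distance, as in this sketch. The function names are illustrative, and any text normalization (casing, punctuation) is left out.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# One missing word out of four gives a WER of 0.25.
print(wer("das ist ja großartig", "das ist großartig"))
```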

📜 Credits

Developed by Vishal Shendge as part of a German TTS fine-tuning project using the Orpheus-3B architecture. Special thanks to the Unsloth team for providing the optimization framework.

Model tree for Vishalshendge3198/orpheus-3b-tts-german-emotional

Base model: meta-llama/Llama-3.2-3B-Instruct
Finetuned: canopylabs/orpheus-3b-0.1-pretrained
Finetuned: canopylabs/orpheus-3b-0.1-ft
Finetuned: unsloth/orpheus-3b-0.1-ft
Adapter: this model