Fine-tuned Orpheus-3B: German Emotional Speech Synthesis

This repository contains a fine-tuned version of the Orpheus-3B model, specialized for German speech synthesis with support for emotional cues and non-verbal audio tokens.

The model was fine-tuned using LoRA (Low-Rank Adaptation) on a curated German dataset containing high-quality audio with diverse emotional expressions and non-verbal cues.

🚀 Key Highlights

  • 54.2% WER Improvement: Reduced Word Error Rate on emotional prompts from 0.7046 (Base) to 0.3226 (Fine-tuned).
  • 37.1% CER Improvement: Reduced Character Error Rate from 0.5471 (Base) to 0.3440 (Fine-tuned).
  • Architecture: Orpheus-3B with LoRA Adapters.
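The percentages above are relative reductions of the error rate, i.e. (base - fine-tuned) / base. A minimal sketch of that arithmetic, using only the figures reported in this card (the helper name `relative_improvement` is illustrative, not part of any library):

```python
def relative_improvement(base: float, tuned: float) -> float:
    """Percentage reduction of an error rate relative to the base value."""
    return (base - tuned) / base * 100.0


# Error rates as reported in this model card.
wer_gain = relative_improvement(0.7046, 0.3226)
cer_gain = relative_improvement(0.5471, 0.3440)

print(f"WER improvement: {wer_gain:.1f}%")
print(f"CER improvement: {cer_gain:.1f}%")
```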

🎭 Supported Tags

The model has been fine-tuned on Dataset_eleven_v3 and supports a wide range of emotional and paralinguistic tags. Use square brackets [tag] for inference:

  • Emotions: [happy], [angry], [sad], [thoughtful], [neutral], [sleepy], [whisper], [worried], [annoyed], [surprised], [fearful], [contemptuous], [disgusted]
  • Paralinguistic Tokens: [sighs], [laughter], [cry], [growl], [sob], [cheer], [breath], [pause], [grit], [snarl], [exhales sharply], [grits teeth], [breathes heavily], [exclaims], [hush], [soft], [quiet], [softbreath], [hm], [yawn], [mumble], [slowbreath], [ugh], [ew], [scoff], [snort], [tremble], [shaky_breath], [sigh], [nervous_laugh], [chuckles], [short pause], [sniffles], [inhales deeply]
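Because only the tags listed above were seen during fine-tuning, it can help to check a prompt for unknown tags before synthesis. The following is a minimal sketch, not part of this repository: the names `SUPPORTED_TAGS` and `find_unsupported_tags` are hypothetical, and the tag set is copied from the two lists above.

```python
import re

# Tags copied from the lists above (emotions and paralinguistic tokens).
SUPPORTED_TAGS = {
    "happy", "angry", "sad", "thoughtful", "neutral", "sleepy", "whisper",
    "worried", "annoyed", "surprised", "fearful", "contemptuous", "disgusted",
    "sighs", "laughter", "cry", "growl", "sob", "cheer", "breath", "pause",
    "grit", "snarl", "exhales sharply", "grits teeth", "breathes heavily",
    "exclaims", "hush", "soft", "quiet", "softbreath", "hm", "yawn", "mumble",
    "slowbreath", "ugh", "ew", "scoff", "snort", "tremble", "shaky_breath",
    "sigh", "nervous_laugh", "chuckles", "short pause", "sniffles",
    "inhales deeply",
}

# Matches the content of each [tag], including multi-word tags.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_unsupported_tags(text: str) -> list:
    """Return every bracketed tag in `text` that is not in SUPPORTED_TAGS."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]


prompt = "[happy][laughter] Das ist ja großartig! [cheer]"
print(find_unsupported_tags(prompt))
```

A tag that passes this check is only known to appear in the list above; how strongly the model renders it is not something this check can tell you.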

🏋️ Training Details

The model was trained using the following optimal parameters:

  • Learning Rate: 0.0008
  • LoRA Rank (R): 32
  • LoRA Alpha: 32
  • Precision: 4-bit (bitsandbytes/unsloth)
  • Framework: Unsloth 2024.12
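In the standard LoRA formulation the low-rank update is scaled by alpha / r, so a rank of 32 with an alpha of 32 gives a scaling factor of 1.0. The sketch below only collects the values listed above and computes that factor. The class and field names are hypothetical, and it assumes plain LoRA scaling rather than a variant such as rank-stabilized LoRA, which this card does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """Hyperparameters as listed in this card (field names are illustrative)."""
    learning_rate: float = 0.0008
    rank: int = 32
    alpha: int = 32
    load_in_4bit: bool = True

    @property
    def scaling(self) -> float:
        # Standard LoRA scaling: the update B @ A is multiplied by alpha / r.
        return self.alpha / self.rank


params = LoraHyperparams()
print(f"LoRA scaling factor: {params.scaling}")
```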

🔊 Inference Example

To use this model, you need to load the base Orpheus-3B model and apply these LoRA adapters.

from unsloth import FastLanguageModel
import torch

# Load base model and LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Vishalshendge3198/orpheus-3b-tts-german-emotional", # This repo
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# Example prompt with emotional tags in square brackets
text = "[happy][laughter] Das ist ja großartig! Ich freue mich so sehr. [cheer]"
# ... (standard Orpheus inference code follows)

📊 Performance

| Metric | Base Model (3B) | Fine-tuned (German) | Improvement |
|---|---|---|---|
| Avg WER | 0.7046 | 0.3226 | 54.2% |
| Avg CER | 0.5471 | 0.3440 | 37.1% |
| Emotional Prosody | Basic | Advanced | High |
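This card does not describe how WER and CER were measured. A common approach for TTS evaluation, assumed here, is to transcribe the synthesized audio with a speech recognizer and compare the transcript with the input text. The sketch below shows only the metric definitions: edit distance divided by reference length, at word level for WER and at character level for CER. The function names are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the transcript drops one word of the input text.
reference = "das ist ja großartig"
hypothesis = "das ist großartig"
print(f"WER: {wer(reference, hypothesis):.2f}")
print(f"CER: {cer(reference, hypothesis):.2f}")
```

Scores also depend on text normalization (casing, punctuation, removal of bracketed tags) applied before comparison, which is not specified in this card.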

📜 Credits

Developed by Vishal Shendge as part of a German TTS fine-tuning project using the Orpheus-3B architecture. Special thanks to the Unsloth team for providing the optimization framework.
