---
license: apache-2.0
language:
- en
- zh
tags:
- singing-voice-synthesis
- singing-voice-conversion
- svs
- svc
- zero-shot
- text-to-audio
- music
pipeline_tag: text-to-speech
---
SoulX-Singer Models (Safetensors Mirror)
This repository mirrors the Soul-AILab/SoulX-Singer weights, converted to Safetensors format for use in the MAESTRO AI Workstation.
Models
| Path | Size | Description |
|---|---|---|
| svs/model.safetensors | ~2.82 GB | Singing Voice Synthesis (lyrics+MIDI → singing) |
| svc/model.safetensors | ~2.79 GB | Singing Voice Conversion (audio-to-audio) |
| config.yaml | 579 B | Model architecture configuration |
| phone_set.json | ~30 KB | Phoneme mapping for SVS |
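
Because each checkpoint is close to 3 GB, it can be useful to inspect tensor names and shapes without loading the weights. The sketch below reads only the Safetensors header (an 8-byte little-endian length followed by a JSON block) using the Python standard library. The local directory `SoulX-Singer` is an assumed download location, not something this repository defines, and the function names are illustrative.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header is UTF-8 JSON mapping tensor names to dtype, shape, and byte offsets.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path):
    """Return sorted (tensor_name, dtype, shape) tuples, skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return [
        (name, info["dtype"], info["shape"])
        for name, info in sorted(header.items())
        if name != "__metadata__"
    ]


if __name__ == "__main__":
    # Assumed local layout: the repository downloaded into ./SoulX-Singer (hypothetical path).
    checkpoint = Path("SoulX-Singer") / "svs" / "model.safetensors"
    if checkpoint.exists():
        for name, dtype, shape in summarize(checkpoint)[:10]:
            print(name, dtype, shape)
```

The same call works for `svc/model.safetensors`. To load the tensors themselves, a library such as `safetensors` would normally be used instead of parsing the file by hand.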
Architecture
- Flow-matching model built on the F5-TTS foundation
- 22-layer transformer with hidden size 1024 and 16 attention heads
- 128-dim mel spectrogram, 24 kHz output
- Trained on 42,000+ hours of aligned vocals (Mandarin, English, Cantonese)
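
The hyperparameters listed above can be gathered into a small configuration object, for example to sanity-check derived quantities such as the per-head dimension. This is a minimal sketch: the class and field names are hypothetical and may not match the keys actually used in `config.yaml`; only the numeric values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoulXSingerArch:
    # Field names are hypothetical; the values are taken from the Architecture list.
    num_layers: int = 22
    hidden_size: int = 1024
    num_heads: int = 16
    mel_dim: int = 128
    sample_rate_hz: int = 24_000

    @property
    def head_dim(self) -> int:
        """Width of each attention head; the hidden size must divide evenly across heads."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


arch = SoulXSingerArch()
print(arch.head_dim)  # 64, since 1024 / 16 = 64
```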
License
Apache 2.0