---
license: apache-2.0
language:
  - en
  - zh
tags:
  - singing-voice-synthesis
  - singing-voice-conversion
  - svs
  - svc
  - zero-shot
  - text-to-audio
  - music
pipeline_tag: text-to-speech
---

# SoulX-Singer Models (Safetensors Mirror)

This repository mirrors the [Soul-AILab/SoulX-Singer](https://huggingface.co/Soul-AILab/SoulX-Singer) weights, converted to the Safetensors format for use in the [MAESTRO AI Workstation](https://github.com/AEmotionStudio/Maestraea).

## Models

| Path | Size | Description |
|------|------|-------------|
| `svs/model.safetensors` | ~2.82 GB | Singing Voice Synthesis (lyrics + MIDI → singing) |
| `svc/model.safetensors` | ~2.79 GB | Singing Voice Conversion (audio-to-audio) |
| `config.yaml` | 579 B | Model architecture configuration |
| `phone_set.json` | ~30 KB | Phoneme mapping for SVS |
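
Before loading a multi-gigabyte checkpoint, it can help to check what it contains. The sketch below reads only the JSON header of a `.safetensors` file (an 8-byte little-endian length followed by UTF-8 JSON) using the Python standard library, so no tensor data is loaded. The helper names `read_safetensors_header` and `summarize` are illustrative and not part of this repository, and the local path assumes the file has already been downloaded.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header itself is UTF-8 encoded JSON of that length.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Map tensor name -> (dtype, shape), skipping the optional __metadata__ entry."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was downloaded.
    path = Path("svs/model.safetensors")
    if path.exists():
        tensors = summarize(read_safetensors_header(path))
        print(f"{len(tensors)} tensors")
        # Print the first few entries only; the full list may be long.
        for name, (dtype, shape) in list(tensors.items())[:10]:
            print(name, dtype, shape)
```

To fetch a single file first, `huggingface_hub.hf_hub_download(repo_id=..., filename="svs/model.safetensors")` returns a local path that can be passed to the helper; the `repo_id` of this mirror is not stated here, so substitute the correct one.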

## Architecture

- Flow-matching model built on an F5-TTS foundation
- 22-layer transformer with a hidden size of 1024 and 16 attention heads
- 128-dimensional mel spectrogram, 24 kHz audio output
- Trained on more than 42,000 hours of aligned vocals (Mandarin, English, Cantonese)
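
The figures above can be collected into a small summary object, which is handy for sanity checks such as confirming that the hidden size divides evenly across the attention heads. This is a minimal sketch: the class and field names are hypothetical and are not the keys used in `config.yaml`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSummary:
    """Values taken from the Architecture list; field names are illustrative."""

    num_layers: int = 22
    hidden_size: int = 1024
    num_heads: int = 16
    mel_dim: int = 128
    sample_rate_hz: int = 24_000

    @property
    def head_dim(self) -> int:
        """Per-head width, assuming the hidden size is split evenly across heads."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


arch = ArchSummary()
print(arch.head_dim)  # 64
```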

## License

Apache 2.0