sathyavgc/saraga-dreambooth-musicgen
Fine-tuned facebook/musicgen-small using audio DreamBooth with LoRA adapters on the Saraga dataset.
The model binds two special identity tokens to specific instrument timbres from the Saraga Carnatic music collection:
| Token | Bound Instrument |
|---|---|
| `sks0` | Flute (Saraga Carnatic timbre) |
| `sks1` | Veena (Saraga Carnatic timbre) |
Use these tokens in your prompt to trigger the learned timbres.
```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration
from peft import PeftModel
import torch
import soundfile as sf

# Placeholder repo ID: replace "YourUsername" with the account that hosts this adapter.
adapter_id = "YourUsername/saraga-dreambooth-musicgen"

processor = AutoProcessor.from_pretrained(adapter_id)
base_model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
# Attach the LoRA adapter to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.to("cuda").eval()

# Prefix the prompt with an identity token (sks0 = flute, sks1 = veena).
inputs = processor(text=["sks0 Calm, Carnatic, Flute"], return_tensors="pt").to("cuda")
with torch.no_grad():
    audio = model.generate(**inputs, max_new_tokens=512, guidance_scale=5.0)

# audio has shape (batch, channels, samples); write the first mono clip at 32 kHz.
sf.write("output.wav", audio[0, 0].cpu().numpy(), samplerate=32000)
```
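The `max_new_tokens=512` value in the snippet above controls how long the generated clip is. As a rough sketch, assuming MusicGen's audio codec produces about 50 frames per second of audio (the figure documented for its 32 kHz EnCodec model; it may be safer to read it from `model.config.audio_encoder.frame_rate` than to hard-code it), the conversion between seconds and tokens could look like this. The helper names are hypothetical and the results are approximate.

```python
import math

# Assumption: about 50 codec frames per second of audio (MusicGen's 32 kHz EnCodec).
FRAME_RATE_HZ = 50


def tokens_for_seconds(seconds: float, frame_rate: int = FRAME_RATE_HZ) -> int:
    """Smallest max_new_tokens value that covers the requested duration."""
    return math.ceil(seconds * frame_rate)


def seconds_for_tokens(max_new_tokens: int, frame_rate: int = FRAME_RATE_HZ) -> float:
    """Approximate clip length produced by a given max_new_tokens value."""
    return max_new_tokens / frame_rate


print(seconds_for_tokens(512))  # 10.24
print(tokens_for_seconds(15))   # 750
```

Under that assumption, 512 tokens yield roughly 10 seconds, and matching the ~15 second training clips would take about 750 tokens.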
| Parameter | Value |
|---|---|
| Base model | facebook/musicgen-small |
| Method | Audio DreamBooth + LoRA |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, v_proj |
| Dataset | DevPanda004/saraga (90 clips) |
| Sample rate | 32000 Hz |
| Clip length | ~15 seconds |
| Epochs | 50 |
| Optimizer | AdamW + CosineAnnealingLR |
| Training loss | Instance loss + Prior loss |
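To make the numbers in the table concrete, here is a minimal sketch of the arithmetic behind them. It assumes standard LoRA scaling (`alpha / r`), a projection width of 1024 (an assumption to check against the base model config), and an unweighted sum of the two loss terms. The `prior_weight` parameter and all function names are hypothetical and are not taken from the training code.

```python
def lora_scale(r: int, alpha: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_params_per_linear(d_in: int, d_out: int, r: int) -> int:
    """A is (r x d_in) and B is (d_out x r), so one adapter adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


def dreambooth_loss(instance_loss: float, prior_loss: float, prior_weight: float = 1.0) -> float:
    """Instance loss plus prior loss; prior_weight=1.0 matches a plain sum (hypothetical knob)."""
    return instance_loss + prior_weight * prior_loss


R, ALPHA = 32, 64   # values from the table above
HIDDEN = 1024       # assumption: width of q_proj / v_proj; verify in the base model config

print(lora_scale(R, ALPHA))                       # 2.0
print(lora_params_per_linear(HIDDEN, HIDDEN, R))  # 65536
print(dreambooth_loss(0.5, 0.25))                 # 0.75
```

With rank 32 and alpha 64, the adapter update is scaled by 2.0, and each adapted projection of that width adds 65,536 trainable weights.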
- `sks0 Calm, Carnatic, Flute`
- `sks0 Hindustani`: tests whether the timbre survives a style change
- Omit the `sks` token to hear the generic base model output
- The `sks` tokens are arbitrary and only meaningful with this specific adapter
- The base model is musicgen-small; larger base models may produce better quality

CC-BY-NC-4.0: free for non-commercial use with attribution.
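The prompt comparisons described above (identity token with matching descriptors, the same token under a different style, and no token at all) can be generated programmatically. This is a small sketch using only string handling. The function names are hypothetical, and it assumes identity tokens always have the form `sks` followed by digits.

```python
import re


def strip_identity_tokens(prompt: str) -> str:
    """Remove sks<N> identity tokens so the prompt falls back to the generic timbre."""
    return re.sub(r"\bsks\d+\b\s*", "", prompt).strip()


def comparison_prompts(token: str, descriptors: str) -> dict[str, str]:
    """Build the three prompt variants for an A/B listening test."""
    with_token = f"{token} {descriptors}"
    return {
        "with_token": with_token,
        "style_change": f"{token} Hindustani",
        "without_token": strip_identity_tokens(with_token),
    }


prompts = comparison_prompts("sks0", "Calm, Carnatic, Flute")
for name, text in prompts.items():
    print(f"{name}: {text}")
```

Each resulting string can be passed as the `text` argument of the processor in place of the single prompt used in the usage example.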