Commit ·
3b81e84
0
Parent(s):
Duplicate from theadityamittal/music-separator-unet
Co-authored-by: Aditya Mittal <theadityamittal@users.noreply.huggingface.co>
- .DS_Store +0 -0
- .gitattributes +36 -0
- README.md +105 -0
- checkpoints/unet_best.pt +3 -0
- config/default.yaml +50 -0
.DS_Store
ADDED
Binary file (6.15 kB).
.gitattributes
ADDED
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+checkpoints/*.pt filter=lfs diff=lfs merge=lfs -text
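
To see which of the `.gitattributes` rules apply to a given file in this commit, the matching can be approximated in a few lines of Python. This is only a sketch: `matching_lfs_patterns` is a hypothetical helper, it uses a subset of the patterns, and `fnmatch` is a stand-in that does not reproduce every detail of Git's own pattern rules (for example `**` handling).

```python
from fnmatch import fnmatchcase
from posixpath import basename

# A subset of the patterns from the .gitattributes file in this commit.
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pt", "*.pth", "*.safetensors", "checkpoints/*.pt"]


def matching_lfs_patterns(path, patterns=LFS_PATTERNS):
    """Return the patterns that would route `path` through Git LFS (approximation)."""
    hits = []
    for pattern in patterns:
        # Assumption: a pattern without a slash is compared with the file name
        # at any depth; a pattern with a slash is compared with the whole path.
        target = path if "/" in pattern else basename(path)
        if fnmatchcase(target, pattern):
            hits.append(pattern)
    return hits


for candidate in ("checkpoints/unet_best.pt", "config/default.yaml", "README.md"):
    print(candidate, matching_lfs_patterns(candidate))
```

Under this approximation the checkpoint is matched by both `*.pt` and `checkpoints/*.pt`, so the last rule in the file looks redundant for this repository, while the README and the YAML config stay as ordinary Git blobs.
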
README.md
ADDED
@@ -0,0 +1,105 @@
+---
+tags:
+- audio
+- music-source-separation
+- u-net
+- pytorch
+license: mit
+datasets:
+- musdb18hq
+metrics:
+- SDR
+- SIR
+- SAR
+---
+
+# 🎸 Music-U-Net — 4-Stem Source Separator
+
+A PyTorch U-Net trained to split a full-band stereo **mixture** into
+**drums · bass · other · vocals**.
+
+| Property | Value |
+|-----------------------|-------|
+| Model type | 2-D U-Net (6.2 M params) |
+| Input representation | STFT magnitude (mono, 16 kHz) |
+| Output | 4 magnitude masks (drums, bass, other, vocals) |
+| Training data | 100 train + 50 test songs from **MUSDB-18 HQ** |
+| Checkpoint size | ~24 MB (`state_dict`, FP32) |
+| License | MIT |
+
+---
+
+## 🗂️ Contents
+
+```
+checkpoints/unet_best.pt # model weights (state_dict)
+config/default.yaml # sample-rate, FFT size, etc.
+README.md # this card
+
+```
+
+---
+
+## 📝 Model Details
+
+### Architecture
+Classic symmetric U-Net over 2-D spectra:
+
+```
+Encoder: [C32]→[C64]→[C128]→[C256]→[C512]
+Decoder: [C256]←[C128]←[C64]←[C32]
+```
+
+`ReLU` activations, batch-norm, skip-connections, 1×1 final conv to **4 channels**
+(one per target stem) followed by soft masks --> multiplied by mixture magnitude.
+
+### Training
+* **Loss**: L1( pred_mag·mix_phase , ref_mag·mix_phase )
+* **Augment**: time/freq masking, Gaussian noise, ±3 dB gain
+* **Optimizer**: Adam, LR 1e-4 → 1e-5 cosine decay, 50 epochs
+* **Hardware**: single RTX 3090, 2 h total
+
+---
+
+## 📊 Evaluation (MUSDB-18 test, per-track average)
+
+| Metric | Mean | Std |
+|--------|------|-----|
+| **SDR** | **-0.14 dB** | 1.66 |
+| **SIR** | 3.93 dB | 1.86 |
+| **SAR** | 4.26 dB | 0.85 |
+
+*(baseline numbers; not state-of-the-art, but fast & lightweight)*
+
+---
+
+## 💻 Usage
+
+Try it live in the **Gradio Space** 👉 **[https://huggingface.co/spaces/theadityamittal/music-separator-space](https://huggingface.co/spaces/theadityamittal/music-separator-space)**
+
+---
+
+## ⚖ Limitations & Biases
+
+* Trained only on MUSDB-18 HQ → may fail on genres not represented (classical, EDM).
+* Uses mixture phase → audible bleeding & artifacts, negative SDR in some tracks.
+* No multi-channel or stem permutation handling.
+
+---
+
+## 📄 License
+
+Released under the MIT License.
+
+---
+
+## 🙏 Citation
+
+```bibtex
+@misc{music-unet-2025,
+title = {Music Source Separation with U-Net},
+author = {Your Name},
+url = {https://huggingface.co/YOUR_USERNAME/music-separator-unet},
+year = 2025
+}
+```
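
Two steps described in the model card, the soft masking of the mixture magnitude and the cosine learning-rate decay, can be illustrated with a small standard-library sketch. The card does not say how the soft masks are normalised, so the softmax across the four stems used here is an assumption, and `soft_masks`, `separate_bin` and `cosine_lr` are hypothetical names rather than functions from the repository. The schedule uses the common cosine-annealing formula with the endpoints quoted in the card.

```python
import math

STEMS = ("drums", "bass", "other", "vocals")


def soft_masks(logits):
    """Softmax over per-stem logits for one time-frequency bin (assumed normalisation)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def separate_bin(mix_magnitude, logits):
    """Scale the mixture magnitude by each stem's mask."""
    masks = soft_masks(logits)
    return {stem: mask * mix_magnitude for stem, mask in zip(STEMS, masks)}


def cosine_lr(epoch, total_epochs=50, lr_max=1e-4, lr_min=1e-5):
    """Cosine decay from lr_max at epoch 0 to lr_min at the final epoch."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


estimates = separate_bin(2.0, [0.3, -1.2, 2.0, 0.5])
print(estimates)
print([cosine_lr(e) for e in (0, 25, 50)])
```

With a softmax the four stem magnitudes add up to the mixture magnitude in every bin. If the model instead applies an independent sigmoid per stem, that property no longer holds.
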
checkpoints/unet_best.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9d751220f7215032954092f940c454bf992ffdb9a4186f7c94500e20c0248739
+size 31133505
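
The three lines committed for `checkpoints/unet_best.pt` are a Git LFS pointer, not the weights themselves. As a rough illustration, the pointer can be parsed and its recorded size compared with a back-of-the-envelope FP32 estimate from the parameter count quoted in the README (6.2 M). `parse_lfs_pointer` is a hypothetical helper written for this note.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9d751220f7215032954092f940c454bf992ffdb9a4186f7c94500e20c0248739
size 31133505
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mb = pointer["size_bytes"] / 1e6   # decimal megabytes recorded in the pointer
fp32_weights_mb = 6.2e6 * 4 / 1e6       # 6.2 M parameters x 4 bytes each
print(f"pointer size: {size_mb:.1f} MB, FP32 weight estimate: {fp32_weights_mb:.1f} MB")
```

The pointer records about 31.1 MB, which is larger than both the 24.8 MB estimate and the "~24 MB" figure in the README table. The commit does not say what accounts for the difference.
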
config/default.yaml
ADDED
@@ -0,0 +1,50 @@
+# config/default.yaml
+
+device: "mps"
+
+data:
+  raw_path: data/raw
+  splits: ["train", "test"]
+  processed_path: data/processed
+  sample_rate: 16000
+  n_fft: 1024
+  hop_length: 512
+  n_mels: 80
+  segment_length: 256
+
+  # for DataLoader
+  batch_size: 16
+  num_workers: 4
+
+  # list of all sources (including mixture)
+  sources: ["mixture", "drums", "bass", "other", "vocals"]
+
+model:
+  checkpoint_dir: models/checkpoints
+
+  # for UNet
+  chans: 32
+  num_pool_layers: 4
+
+training:
+  # for training loop
+  epochs: 50
+  lr: 1e-4
+  max_steps: null
+  log_interval: 50 # how many batches between progress logs
+
+augment:
+  # defaults for your SpectrogramTransforms
+  time_mask_param: 30
+  freq_mask_param: 15
+  time_warp_param: 40
+  stripe_time_width: 1
+  stripe_freq_width: 1
+  stripe_time_count: 2
+  stripe_freq_count: 2
+  noise_std: 0.01
+
+experiment:
+  # MLflow experiment metadata
+  name: default_experiment
+  run_name: run1
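
A few quantities follow directly from the values in `config/default.yaml`. The sketch below works them out under two assumptions that the commit does not confirm: that `segment_length` counts STFT frames, and that the U-Net doubles its channel count at each of the `num_pool_layers` pooling steps. `derived_shapes` is a hypothetical helper and the dictionary simply copies the relevant values from the file.

```python
# Values copied from config/default.yaml (data and model sections).
CONFIG = {
    "sample_rate": 16000,
    "n_fft": 1024,
    "hop_length": 512,
    "segment_length": 256,
    "chans": 32,
    "num_pool_layers": 4,
}


def derived_shapes(cfg):
    """Work out spectrogram and channel sizes implied by the config."""
    # A one-sided STFT of size n_fft has n_fft // 2 + 1 frequency bins.
    freq_bins = cfg["n_fft"] // 2 + 1
    # Spacing between neighbouring frequency bins in Hz.
    bin_width_hz = cfg["sample_rate"] / cfg["n_fft"]
    # Assumption: segment_length is a number of STFT frames, one hop apart.
    segment_seconds = cfg["segment_length"] * cfg["hop_length"] / cfg["sample_rate"]
    # Assumption: channels double at every pooling level, bottleneck included.
    encoder_channels = [cfg["chans"] * 2 ** i for i in range(cfg["num_pool_layers"] + 1)]
    return freq_bins, bin_width_hz, segment_seconds, encoder_channels


freq_bins, bin_width_hz, segment_seconds, encoder_channels = derived_shapes(CONFIG)
print(freq_bins, bin_width_hz, segment_seconds, encoder_channels)
```

Under these assumptions each training segment is a 513 × 256 magnitude spectrogram covering roughly 8.2 s of audio, and the channel plan 32, 64, 128, 256, 512 agrees with the encoder line in the README.
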