huggsface1 theadityamittal committed on
Commit 3b81e84 · 0 Parent(s)

Duplicate from theadityamittal/music-separator-unet

Co-authored-by: Aditya Mittal <theadityamittal@users.noreply.huggingface.co>

Files changed (5)
  1. .DS_Store +0 -0
  2. .gitattributes +36 -0
  3. README.md +105 -0
  4. checkpoints/unet_best.pt +3 -0
  5. config/default.yaml +50 -0
.DS_Store ADDED
Binary file (6.15 kB)
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ checkpoints/*.pt filter=lfs diff=lfs merge=lfs -text
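Note on the final rule: the general `*.pt` rule earlier in the file appears to cover `checkpoints/unet_best.pt` already, because a pattern without a slash is matched against the file name at any depth. The sketch below approximates that check with Python's `fnmatchcase`. It is not Git's own matcher, and `matching_patterns` is a hypothetical helper written only for this illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Two of the rules from the .gitattributes diff above.
PATTERNS = ["*.pt", "checkpoints/*.pt"]


def matching_patterns(path, patterns=PATTERNS):
    """Return the patterns that match `path` (rough approximation of Git's rules)."""
    name = PurePosixPath(path).name
    hits = []
    for pattern in patterns:
        # Patterns without a slash are compared against the base name only.
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            hits.append(pattern)
    return hits


checkpoint_hits = matching_patterns("checkpoints/unet_best.pt")
readme_hits = matching_patterns("README.md")
```

Under these assumptions both rules match the checkpoint, so the last rule looks redundant but harmless.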
README.md ADDED
@@ -0,0 +1,105 @@
+ ---
+ tags:
+ - audio
+ - music-source-separation
+ - u-net
+ - pytorch
+ license: mit
+ datasets:
+ - musdb18hq
+ metrics:
+ - SDR
+ - SIR
+ - SAR
+ ---
+
+ # 🎸 Music-U-Net — 4-Stem Source Separator
+
+ A PyTorch U-Net trained to split a full-band stereo **mixture** into
+ **drums · bass · other · vocals**.
+
+ | Property | Value |
+ |-----------------------|-------|
+ | Model type | 2-D U-Net (6.2 M params) |
+ | Input representation | STFT magnitude (mono, 16 kHz) |
+ | Output | 4 magnitude masks (drums, bass, other, vocals) |
+ | Training data | 100 train + 50 test songs from **MUSDB-18 HQ** |
+ | Checkpoint size | ~24 MB (`state_dict`, FP32) |
+ | License | MIT |
+
+ ---
+
+ ## 🗂️ Contents
+
+ ```
+ checkpoints/unet_best.pt # model weights (state_dict)
+ config/default.yaml # sample-rate, FFT size, etc.
+ README.md # this card
+
+ ```
+
+ ---
+
+ ## 📝 Model Details
+
+ ### Architecture
+ Classic symmetric U-Net over 2-D spectra:
+
+ ```
+ Encoder: [C32]→[C64]→[C128]→[C256]→[C512]
+ Decoder: [C256]←[C128]←[C64]←[C32]
+ ```
+
+ `ReLU` activations, batch-norm, skip-connections, 1×1 final conv to **4 channels**
+ (one per target stem) followed by soft masks that are multiplied by the mixture magnitude.
+
+ ### Training
+ * **Loss**: L1( pred_mag·mix_phase , ref_mag·mix_phase )
+ * **Augment**: time/freq masking, Gaussian noise, ±3 dB gain
+ * **Optimizer**: Adam, LR 1e-4 → 1e-5 cosine decay, 50 epochs
+ * **Hardware**: single RTX 3090, 2 h total
+
+ ---
+
+ ## 📊 Evaluation (MUSDB-18 test, per-track average)
+
+ | Metric | Mean | Std |
+ |--------|------|-----|
+ | **SDR** | **-0.14 dB** | 1.66 |
+ | **SIR** | 3.93 dB | 1.86 |
+ | **SAR** | 4.26 dB | 0.85 |
+
+ *(baseline numbers; not state-of-the-art, but fast & lightweight)*
+
+ ---
+
+ ## 💻 Usage
+
+ Try it live in the **Gradio Space** 👉 **[https://huggingface.co/spaces/theadityamittal/music-separator-space](https://huggingface.co/spaces/theadityamittal/music-separator-space)**
+
+ ---
+
+ ## ⚖ Limitations & Biases
+
+ * Trained only on MUSDB-18 HQ → may fail on genres not represented (classical, EDM).
+ * Uses mixture phase → audible bleeding & artifacts, negative SDR in some tracks.
+ * No multi-channel or stem permutation handling.
+
+ ---
+
+ ## 📄 License
+
+ Released under the MIT License.
+
+ ---
+
+ ## 🙏 Citation
+
+ ```bibtex
+ @misc{music-unet-2025,
+   title = {Music Source Separation with U-Net},
+   author = {Your Name},
+   url = {https://huggingface.co/YOUR_USERNAME/music-separator-unet},
+   year = 2025
+ }
+ ```
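Note on the README's masking step: the card says the network emits four soft masks that scale the mixture magnitude, and that the mixture phase is reused. A minimal sketch of that step for a single STFT bin follows, using only the standard library. The name `apply_soft_masks`, the example mask values, and the choice of masks that sum to 1 are assumptions for illustration; the README does not say how the masks are normalized.

```python
import cmath

STEMS = ("drums", "bass", "other", "vocals")


def apply_soft_masks(mix_bin, masks):
    """Scale the mixture magnitude by each stem's mask and reuse the mixture phase."""
    magnitude = abs(mix_bin)
    phase = cmath.phase(mix_bin)
    return {stem: cmath.rect(masks[stem] * magnitude, phase) for stem in STEMS}


# One complex STFT bin of the mixture (magnitude 5.0).
mix_bin = complex(3.0, 4.0)
# Hypothetical mask values; chosen to sum to 1 so the stems add back to the mixture.
masks = {"drums": 0.5, "bass": 0.2, "other": 0.2, "vocals": 0.1}

estimates = apply_soft_masks(mix_bin, masks)
reconstructed = sum(estimates.values())
```

Because every estimate shares the mixture phase, the stems can only differ in magnitude, which is consistent with the bleeding and artifacts the README lists under limitations.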
checkpoints/unet_best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d751220f7215032954092f940c454bf992ffdb9a4186f7c94500e20c0248739
+ size 31133505
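Note on the pointer file: the three lines above are a Git LFS pointer, not the weights themselves. The sketch below parses them and compares the recorded size with a rough FP32 estimate from the README's "6.2 M params" figure. `parse_lfs_pointer` is a hypothetical helper, and the 4-bytes-per-parameter estimate is an assumption that ignores any other data stored in the checkpoint.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9d751220f7215032954092f940c454bf992ffdb9a4186f7c94500e20c0248739
size 31133505
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
size_mb = pointer["size"] / 1e6        # size recorded in the pointer, in MB
fp32_weights_mb = 6.2e6 * 4 / 1e6      # README parameter count at 4 bytes each
```

The pointer records about 31.1 MB, while the README table lists about 24 MB and the parameter estimate gives about 24.8 MB. The commit does not explain the difference.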
config/default.yaml ADDED
@@ -0,0 +1,50 @@
+ # config/default.yaml
+
+ device: "mps"
+
+ data:
+   raw_path: data/raw
+   splits: ["train", "test"]
+   processed_path: data/processed
+   sample_rate: 16000
+   n_fft: 1024
+   hop_length: 512
+   n_mels: 80
+   segment_length: 256
+
+   # for DataLoader
+   batch_size: 16
+   num_workers: 4
+
+   # list of all sources (including mixture)
+   sources: ["mixture", "drums", "bass", "other", "vocals"]
+
+ model:
+   checkpoint_dir: models/checkpoints
+
+   # for UNet
+   chans: 32
+   num_pool_layers: 4
+
+ training:
+   # for training loop
+   epochs: 50
+   lr: 1e-4
+   max_steps: null
+   log_interval: 50 # how many batches between progress logs
+
+ augment:
+   # defaults for your SpectrogramTransforms
+   time_mask_param: 30
+   freq_mask_param: 15
+   time_warp_param: 40
+   stripe_time_width: 1
+   stripe_freq_width: 1
+   stripe_time_count: 2
+   stripe_freq_count: 2
+   noise_std: 0.01
+
+ experiment:
+   # MLflow experiment metadata
+   name: default_experiment
+   run_name: run1
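Note on the STFT settings: the values in `config/default.yaml` imply a fixed spectrogram geometry, sketched below. Two things are assumptions here and not stated in the config: that `segment_length` counts STFT frames, and that frames are taken without centre padding.

```python
# Values copied from config/default.yaml.
sample_rate = 16000
n_fft = 1024
hop_length = 512
segment_length = 256   # assumed to be a number of STFT frames
num_pool_layers = 4

# A one-sided STFT keeps n_fft // 2 + 1 frequency bins.
freq_bins = n_fft // 2 + 1

# Time step between consecutive frames, in milliseconds.
hop_ms = 1000 * hop_length / sample_rate

# Samples spanned by one segment when frames are not centre-padded (assumption).
segment_samples = (segment_length - 1) * hop_length + n_fft
segment_seconds = segment_samples / sample_rate

# Each pooling layer halves both axes, so sizes divisible by this stride
# pass through the U-Net without padding or cropping.
stride = 2 ** num_pool_layers
time_axis_fits = segment_length % stride == 0
freq_axis_remainder = freq_bins % stride
```

Under these assumptions a segment covers roughly 8.2 s of audio. The time axis (256 frames) divides evenly by the stride of 16, while the 513 frequency bins leave a remainder of 1, so the model code presumably crops or pads that axis; the commit does not show how.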