---
language: en
license: mit
library_name: onnxruntime
pipeline_tag: audio-to-audio
tags:
  - onnx
  - onnxruntime
  - stem-separation
  - source-separation
  - vocal-isolation
  - vocal-remover
  - drum-extraction
  - bass-extraction
  - karaoke
  - demucs
  - htdemucs
  - music
  - audio-to-audio
  - mobile
  - ios
  - android
  - coreml
  - directml
  - production-ready
datasets:
  - StemSplitio/stem-separation-benchmark-2026
inference: false
---

# HT-Demucs FT: Full 4-Stem Bag, ONNX

**The first complete ONNX export of HT-Demucs FT on the Hugging Face Hub.**
Four parity-verified ONNX models (drums, bass, other, vocals) plus a
~250-line numpy aggregator that runs the full 4-stem separation in pure
`onnxruntime`. **No PyTorch required at inference.** Runs on CPU /
CoreML / CUDA / DirectML.

This repo is the convenience drop: all 4 specialist sub-models of
`htdemucs_ft` in one place, with a working bag-inference script. If you
only need one stem in production, the individual stem-specialist repos
below are ~75% smaller and ~4× faster per song.

---

## TL;DR

```bash
pip install onnxruntime numpy soundfile
python bag_infer.py your-song.mp3 ./out/
# writes out/drums.wav, out/bass.wav, out/other.wav, out/vocals.wav
```

That's it. The 4 `.onnx` files (316 MB each, ~1.26 GB total) live
alongside the script.

---

## Quality

Median per-stem SDR on the MUSDB18-HQ test split (50 songs), BSS Eval v4
via `museval`. **Identical to the official PyTorch `htdemucs_ft`**: the
bag's per-stem output is the corresponding specialist's output (the weight
matrix is one-hot per stem).

| Stem | SDR (dB) | Rank in our 2026 benchmark |
|---|---:|---|
| **vocals** | **9.19** | **#1** (highest open-source vocal SDR) |
| drums | 10.11 | #2 (mdx_extra_q leads at 11.49) |
| bass | 10.38 | #2 (mdx_extra_q leads at 11.42) |
| other | 6.34 | #2 (mdx_extra_q leads at 7.67) |

Full benchmark across every popular open-source separator:
[StemSplitio/stem-separation-benchmark-2026](https://huggingface.co/datasets/StemSplitio/stem-separation-benchmark-2026).

**ONNX vs PyTorch parity:** verified to < 1e-3 max abs diff on every stem
during export. See the
[Day 1 spike report](https://huggingface.co/StemSplitio/htdemucs-ft-drums-onnx#how-it-was-built)
for the full engineering writeup.
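
A parity check of this kind can be sketched as follows. This is a minimal illustration, not the export pipeline's actual test: `max_abs_diff` is a hypothetical helper, and the two arrays are stand-ins for the PyTorch and ONNX outputs of the same input segment.

```python
import numpy as np

def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest element-wise absolute difference between two same-shaped arrays."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Compare in float64 so the comparison itself adds no rounding error.
    delta = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(delta)))

# Stand-in data (assumption): a real check would use the (1, 4, 2, 343980)
# outputs of the PyTorch model and the ONNX model for one identical segment.
rng = np.random.default_rng(0)
torch_out = rng.standard_normal((1, 4, 2, 1024)).astype(np.float32)
onnx_out = torch_out + np.float32(1e-5)  # simulate small numerical drift

diff = max_abs_diff(torch_out, onnx_out)
parity_ok = diff < 1e-3  # the threshold quoted above
```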

---

## Performance

Measured on an Apple M4 Pro unless the notes say "extrapolated":

| Mode | Hardware | Per 3-min song | Notes |
|---|---|---:|---|
| One specialist (`htdemucs-ft-drums-onnx`) | M4 Pro CPU | **~22 s** | 4× faster and 75% smaller than the full bag. Use this if you only need one stem. |
| **Full bag (this repo)** | M4 Pro CPU | **~88 s** | RTF ~0.5 (processing time / audio duration). 4 sub-models × N chunks. |
| Full bag | M4 Pro CPU (8 threads) | ~60 s | With `OMP_NUM_THREADS=8` and SessionOptions tuned |
| Full bag | NVIDIA L4 CUDA | ~6 s | Extrapolated from per-specialist CUDA numbers |
| Full bag | NVIDIA T4 | ~16 s | Extrapolated |
| PyTorch full bag | M4 Pro MPS | ~47 s | Faster than ONNX on CPU only because MPS is GPU-accelerated; ONNX-CUDA beats it cleanly. |

---

## Tooling: `demucs-onnx` Python package

This bag is also packaged in the open-source
[`demucs-onnx`](https://github.com/StemSplit/demucs-onnx) Python package
on PyPI. It auto-downloads each specialist from the matching HF repo on
first use, so you don't even need to manually fetch the four `.onnx`
files.

```bash
pip install demucs-onnx

# Full 4-stem separation (auto-downloads ~1.26 GB on first run)
demucs-onnx separate song.mp3 stems/

# From Python
python -c "from demucs_onnx import separate; stems = separate('song.mp3')"
```

The same package is also the canonical tool for **exporting** htdemucs
to ONNX yourself. It bundles all four blocker fixes (complex STFT,
`fractions.Fraction`, `random.randrange`,
`aten::_native_multi_head_attention`) so vanilla `torch.onnx.export`
works on your own demucs checkpoints.

```bash
pip install "demucs-onnx[export]"
demucs-onnx export htdemucs_ft out/   # writes 4 .onnx files
```

---

## Common use cases

- **Karaoke makers**: summing `out/drums.wav`, `out/bass.wav`, and
  `out/other.wav` gives a clean karaoke track, and `out/vocals.wav` is the
  acapella, all from one pass.
- **DAW stem export**: drop the 4 `.wav` files into Ableton / Logic /
  Reaper as separate channels for remixing.
- **DJ stems software**: load all 4 stems as live-mixable tracks.
- **AI music apps**: feed each stem into downstream models (drum
  transcription, bassline-to-MIDI, vocal pitch correction).
- **Acapella sampling**: clean isolated vocals at the highest SDR
  available in open source.
- **Mobile / on-device separation**: replaces a 1+ GB PyTorch install
  with `onnxruntime`'s 50 MB binary on iOS / Android.
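
For the karaoke case, the instrumental is the sum of the three non-vocal stems. A minimal sketch, assuming the stems arrive as a `dict` of `(2, samples)` arrays like the one `bag_infer.separate_all` returns; `make_karaoke` is a hypothetical helper, and the peak scaling is an added safeguard rather than something the bag script is known to do.

```python
import numpy as np

def make_karaoke(stems: dict) -> np.ndarray:
    """Sum every non-vocal stem into one instrumental track of shape (2, samples)."""
    backing = [audio for name, audio in stems.items() if name != "vocals"]
    mix = np.sum(backing, axis=0)
    # Summed stems can exceed [-1, 1]; scale down only when needed to avoid clipping.
    peak = float(np.max(np.abs(mix)))
    return mix / peak if peak > 1.0 else mix

# Stand-in stems (assumption); in practice use the dict from bag_infer.separate_all(...).
stems = {
    name: np.full((2, 8), 0.1, dtype=np.float32)
    for name in ("drums", "bass", "other", "vocals")
}
karaoke = make_karaoke(stems)  # instrumental backing track
acapella = stems["vocals"]     # isolated vocals, unchanged
```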

---

## Quick start

### Python: as a library

```python
import bag_infer

stems = bag_infer.separate_all("your-song.mp3")
# stems: dict[str, numpy.ndarray (2, samples)]
#   stems["drums"], stems["bass"], stems["other"], stems["vocals"]
```

### Python: with execution provider control

```python
import soundfile as sf
import bag_infer

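# MP3 decoding needs a soundfile build backed by libsndfile >= 1.1.0.
audio, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
stems = bag_infer.separate(
    audio.T, sr,  # soundfile returns (samples, channels); transpose to (channels, samples)
    providers=["CPUExecutionProvider"],  # or "CoreMLExecutionProvider", etc.
)
# Use a separate loop variable so the input `audio` is not overwritten.
for name, stem in stems.items():
    sf.write(f"{name}.wav", stem.T, sr)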
```

### CLI

```bash
python bag_infer.py your-song.mp3 ./out/
python bag_infer.py your-song.mp3 ./out/ --providers cuda
python bag_infer.py your-song.mp3 ./out/ --providers coreml
python bag_infer.py your-song.mp3 ./out/ --providers dml
```
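
The short names passed to `--providers` correspond to onnxruntime execution providers. The mapping below is a sketch of how such aliases could be resolved, with CPU kept as the final fallback. `PROVIDER_ALIASES` and `resolve_providers` are hypothetical names, and `bag_infer.py` may resolve its flags differently; only the provider strings themselves are onnxruntime's own.

```python
# Hypothetical alias table; the right-hand strings are onnxruntime provider names.
PROVIDER_ALIASES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "dml": "DmlExecutionProvider",
}

def resolve_providers(alias: str) -> list[str]:
    """Return an onnxruntime provider list with CPU as the final fallback."""
    try:
        primary = PROVIDER_ALIASES[alias.lower()]
    except KeyError:
        raise ValueError(f"unknown provider alias: {alias!r}") from None
    providers = [primary]
    if primary != "CPUExecutionProvider":
        providers.append("CPUExecutionProvider")
    return providers

providers = resolve_providers("cuda")
# The list can then be passed as onnxruntime.InferenceSession(path, providers=providers).
```

Before relying on an accelerator, `onnxruntime.get_available_providers()` reports which providers the installed build actually supports.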

### Web / mobile

Each specialist is a vanilla onnxruntime model; just load all 4 sessions
and reuse the aggregation logic in `bag_infer.py::separate`. See the
individual stem repos for platform-specific snippets:
[drums](https://huggingface.co/StemSplitio/htdemucs-ft-drums-onnx) ·
[bass](https://huggingface.co/StemSplitio/htdemucs-ft-bass-onnx) ·
[other](https://huggingface.co/StemSplitio/htdemucs-ft-other-onnx) ·
[vocals](https://huggingface.co/StemSplitio/htdemucs-ft-vocals-onnx).

---

## How aggregation works

The `htdemucs_ft` bag uses a **one-hot weight matrix** for combining the 4
sub-models: model 0's drums output is used directly as the bag's drums
stem, model 1's bass output is the bag's bass stem, and so on. No
weighted-sum aggregation is needed.

That means:
- **The bag's drums stem == the drums specialist's drums output** (bit-exact in fp32)
- Same for bass, other, vocals
- So you can ship only the specialists you need and get identical
  per-stem quality to the full bag at 1/4 the size

`bag_infer.py` simply runs all 4 specialists and picks the relevant row
from each. ~30 lines of numpy.
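
The following sketch shows why the one-hot matrix reduces aggregation to a row pick. It assumes the general bag rule is a per-source weighted average over sub-models; `WEIGHTS`, `aggregate`, and `pick_specialists` are illustrative names rather than the contents of `bag_infer.py`, and the random array stands in for real model outputs.

```python
import numpy as np

SOURCES = ["drums", "bass", "other", "vocals"]

# Row i = sub-model i, column j = source j.
# One-hot: sub-model i contributes only to source i.
WEIGHTS = np.eye(4, dtype=np.float32)

def aggregate(model_outputs: np.ndarray, weights: np.ndarray = WEIGHTS) -> np.ndarray:
    """Weighted average over sub-models.

    model_outputs has shape (models, sources, channels, samples);
    the result has shape (sources, channels, samples).
    """
    weighted = np.einsum("ms,mscn->scn", weights, model_outputs)
    return weighted / weights.sum(axis=0)[:, None, None]

def pick_specialists(model_outputs: np.ndarray) -> np.ndarray:
    """Shortcut for one-hot weights: take source i from sub-model i."""
    return np.stack([model_outputs[i, i] for i in range(len(SOURCES))])

# Stand-in outputs (assumption): 4 sub-models, 4 sources, stereo, 16 samples.
rng = np.random.default_rng(0)
outputs = rng.standard_normal((4, 4, 2, 16)).astype(np.float32)

bag = aggregate(outputs)
shortcut = pick_specialists(outputs)
```

With identity weights both paths return the same array, which is why a single specialist can be shipped alone.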

---

## Input / output spec per sub-model

| Tensor | Name | Shape | Dtype | Notes |
|---|---|---|---|---|
| Input | `mix` | `(1, 2, 343980)` | float32 | Stereo audio, 44.1 kHz, 7.8 s segment. |
| Output | `stems` | `(1, 4, 2, 343980)` | float32 | `[drums, bass, other, vocals]`. Use only the specialist's target row. |

For longer audio, the bag script handles overlap-add chunking.
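
Overlap-add chunking for the fixed 343980-sample input can be sketched as below. The 25% overlap and the triangular window are assumptions chosen for illustration, and `overlap_add` and `run_segment` are hypothetical names; the bag script's actual chunking may differ. `run_segment` stands in for one sub-model call.

```python
import numpy as np

SEGMENT = 343980  # samples per model call, from the spec table above

def overlap_add(mix, run_segment, segment=SEGMENT, overlap=0.25, n_sources=4):
    """Separate a (channels, samples) mix using fixed-size, overlapping chunks.

    run_segment maps a (channels, segment) array to (n_sources, channels, segment).
    """
    channels, length = mix.shape
    stride = int(segment * (1 - overlap))
    # Triangular window: chunk centres are weighted more than chunk edges.
    half = segment // 2
    window = np.concatenate(
        [np.arange(1, half + 1), np.arange(segment - half, 0, -1)]
    ).astype(np.float32)
    window /= window.max()
    out = np.zeros((n_sources, channels, length), dtype=np.float32)
    weight_sum = np.zeros(length, dtype=np.float32)
    for start in range(0, length, stride):
        chunk = mix[:, start:start + segment]
        valid = chunk.shape[1]
        # Zero-pad the final chunk up to the fixed input size.
        padded = np.pad(chunk, ((0, 0), (0, segment - valid)))
        estimate = run_segment(padded)
        out[:, :, start:start + valid] += estimate[:, :, :valid] * window[:valid]
        weight_sum[start:start + valid] += window[:valid]
    # Normalise by the accumulated window weight at each sample.
    return out / weight_sum

# Stand-in "model" (assumption): returns the input unchanged for every source.
identity = lambda x: np.stack([x] * 4)
rng = np.random.default_rng(0)
mix = rng.standard_normal((2, 1000)).astype(np.float32)
stems = overlap_add(mix, identity, segment=256)
```

With the identity stand-in, every stem reproduces the input, which is a quick way to confirm that the window weights are normalised correctly.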

---

## Files in this repo

| File | Size | Purpose |
|---|---:|---|
| `htdemucs_ft_drums.onnx`  | 316 MB | Drums specialist (bag index 0) |
| `htdemucs_ft_bass.onnx`   | 316 MB | Bass specialist (bag index 1) |
| `htdemucs_ft_other.onnx`  | 316 MB | Other specialist (bag index 2) |
| `htdemucs_ft_vocals.onnx` | 316 MB | Vocals specialist (bag index 3) |
| `bag_infer.py` | 7 KB | Pure numpy aggregator. No torch. |
| `requirements.txt` | <1 KB | `onnxruntime`, `numpy`, `soundfile`. |
| `README.md` | this file | |

Total: **~1.26 GB**. If that's too big, use individual stem repos.

---

## Related work

| Repo | Stem | Use when |
|---|---|---|
| [`htdemucs-ft-drums-onnx`](https://huggingface.co/StemSplitio/htdemucs-ft-drums-onnx) | drums | Only need drums (1/4 size, 1/4 latency) |
| [`htdemucs-ft-bass-onnx`](https://huggingface.co/StemSplitio/htdemucs-ft-bass-onnx) | bass | Only need bass |
| [`htdemucs-ft-other-onnx`](https://huggingface.co/StemSplitio/htdemucs-ft-other-onnx) | other | Only need "other" (everything that is not drums, bass, or vocals) |
| [`htdemucs-ft-vocals-onnx`](https://huggingface.co/StemSplitio/htdemucs-ft-vocals-onnx) | vocals | **#1 open-source vocal SDR** |

PyTorch versions for HF Inference Endpoints:
[`htdemucs-ft-pytorch`](https://huggingface.co/StemSplitio/htdemucs-ft-pytorch)
and its [4 sibling specialist repos](https://huggingface.co/StemSplitio).

---

## Skip the infrastructure: use the StemSplit API

Don't want to ship 1.26 GB of `.onnx` files in your app, manage a GPU
pool, or write overlap-add chunking? Use the
**[StemSplit API](https://stemsplit.io/developers)** instead: same models
under the hood, hosted for you, with credits and a dashboard.

- 🌐 [stemsplit.io](https://stemsplit.io)
- πŸ“˜ [Developer docs](https://stemsplit.io/developers/docs)
- πŸ”Œ [API reference](https://stemsplit.io/developers/reference)

Or use the no-code tools that ship this same model family:

- 🎀 [Vocal Remover](https://stemsplit.io/vocal-remover)
- 🎢 [Karaoke Maker](https://stemsplit.io/karaoke-maker)
- πŸŽ™οΈ [Acapella Maker](https://stemsplit.io/acapella-maker)
- πŸ“Ί [YouTube Stem Splitter](https://stemsplit.io/youtube-stem-splitter)

---

## License & attribution

MIT-licensed, matching the original HT-Demucs.

```bibtex
@inproceedings{rouard2023hybrid,
  title     = {Hybrid Transformers for Music Source Separation},
  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle = {ICASSP},
  year      = {2023}
}
```

- Original PyTorch model: [`facebookresearch/demucs`](https://github.com/facebookresearch/demucs)
- ONNX export, parity verification, and packaging by [StemSplit](https://stemsplit.io)
- Search keywords: htdemucs onnx, demucs onnx, htdemucs bag onnx, demucs ios, demucs android, music source separation onnx, 4-stem separation onnx, stem separation mobile, onnxruntime music separation