StemSplitio/htdemucs-ft-bass-pytorch

+---
+language: en
+license: mit
+library_name: demucs
+pipeline_tag: audio-to-audio
+tags:
+  - demucs
+  - stem-separation
+  - source-separation
+  - bass-isolation
+  - music
+  - htdemucs
+  - audio-to-audio
+  - bass-extraction
+  - bassline-extraction
+datasets:
+  - StemSplitio/stem-separation-benchmark-2026
+inference: false
+---
+# HT-Demucs FT — Bass Specialist (PyTorch)
+Bass isolation specialist from HT-Demucs FT, ~1/4 the size of the full ensemble.
+This is sub-model 1 (zero-based index into `bag.models`) of the 4-bag
+`htdemucs_ft` ensemble by [Défossez et al. (Meta AI)][demucs-repo],
+extracted as a standalone ~160 MB model. It produces the **bass** stem with
+the same quality as the full ensemble: median SDR **10.38 dB** on MUSDB18-HQ,
+which ranks second among all models in our 2026 benchmark, behind
+`mdx_extra_q` at 11.42 dB. It does so at roughly 1/4 the compute cost,
+because one sub-model runs instead of four.
+> Want all 4 stems in one request? Use the full ensemble:
+> [`StemSplitio/htdemucs-ft-pytorch`](https://huggingface.co/StemSplitio/htdemucs-ft-pytorch)
+>
+> Want a hosted REST API with credits and a dashboard? Use the
+> [**StemSplit API**](https://stemsplit.io/developers).
+---
+## Why this model
+| Property | This model | Full `htdemucs_ft` bag |
+|---|---|---|
+| Disk size | **~160 MB** | ~640 MB |
+| Per-3-min-song latency (M4 Pro MPS) | **~22 s** (RTF 0.12) | ~47 s (RTF 0.26) |
+| Bass SDR on MUSDB18-HQ | **10.38 dB** | 10.38 dB *(identical — the bag's `bass` output IS this sub-model's output)* |
+| Other stems returned | None (focused) | All 4 |
+If you only need the bass stem in production, this is **strictly faster and
+smaller** than the full ensemble with identical bass quality:
+**~2.1× faster wall time** (47 s vs 22 s) in our smoke tests on M4 Pro MPS.
+---
+## Common use cases
+- **Bassline transcription** — extract bass for tab generation, MIDI conversion, or chord detection
+- **Mix rebalancing** — isolate and re-equalise the bass bus on a finished mix
+- **Music education** — learn basslines from any record by hearing them isolated
+- **Sub-bass mastering reference** — compare your low-end against pro mixes
+---
+## Quick start (Python)
+```python
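+import base64, io, os
+import requests, soundfile as sf
+# URL of your deployed Inference Endpoint. This card sets `inference: false`,
+# so the bare repo ID has no serverless endpoint to call.
+ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
+with open("your-song.mp3", "rb") as f:
+    audio_b64 = base64.b64encode(f.read()).decode()
+resp = requests.post(
+    ENDPOINT_URL,
+    headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
+    json={"inputs": audio_b64},
+)
+resp.raise_for_status()
+result = resp.json()  # {"bass": "<base64 WAV>", "sample_rate": ..., "duration_s": ...}
+wav, sr = sf.read(io.BytesIO(base64.b64decode(result["bass"])))
+sf.write("out_bass.wav", wav, sr)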
+```
+Or run locally without Hugging Face at all:
+```python
+import torch, soundfile as sf
+from demucs.apply import apply_model
+from demucs.audio import convert_audio
+from demucs.pretrained import get_model
+bag = get_model("htdemucs_ft")
+model = bag.models[1].eval()  # the bass specialist
+wav, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
+wav = torch.from_numpy(wav.T).contiguous()
+wav = convert_audio(wav, sr, bag.samplerate, bag.audio_channels).unsqueeze(0)
+with torch.no_grad():
+    stems = apply_model(model, wav, device="mps" if torch.backends.mps.is_available() else "cpu")[0]
+# bag.sources == ["drums", "bass", "other", "vocals"]; pick the bass row
+sf.write("out_bass.wav", stems[bag.sources.index("bass")].T.numpy(), bag.samplerate)
+```
+---
+## Deploy on Hugging Face Inference Endpoints
+Click **Deploy → Inference Endpoints** above, pick a GPU instance, and HF
+will spin up a container running [`handler.py`](handler.py).
+| Hardware | Latency for 3-min song |
+|---|---:|
+| NVIDIA L4 | ~3 s |
+| NVIDIA T4 small | ~7 s |
+| CPU x4 (basic) | ~48 s |
+(Roughly 2.1× faster than the full-bag latency, the same ratio as the M4 Pro
+figures under "Why this model", since we run only this specialist sub-model.
+Cloud GPU numbers are extrapolated from M4 Pro measurements.)
+```bash
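+# Stream the JSON body on stdin: a base64-encoded song is too long for a
+# command-line argument, and GNU base64 wraps lines unless they are stripped.
+(printf '{"inputs": "'; base64 < your-song.mp3 | tr -d '\n'; printf '"}') | \
+  curl -X POST https://<your-endpoint>.endpoints.huggingface.cloud \
+    -H "Authorization: Bearer $HF_TOKEN" \
+    -H "Content-Type: application/json" \
+    --data-binary @-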
+```
+---
+## Try it in your browser, no code
+- [StemSplit](https://stemsplit.io)
+- [StemSplit API](https://stemsplit.io/developers)
+- [Developer docs](https://stemsplit.io/developers/docs)
+- [API reference](https://stemsplit.io/developers/reference)
+---
+## Related models from StemSplit
+| Repo | Stem | When to use |
+|---|---|---|
+| [`htdemucs-ft-pytorch`](https://huggingface.co/StemSplitio/htdemucs-ft-pytorch) | all 4 | When you need vocals + drums + bass + other in one request |
+| [`htdemucs-ft-vocals-pytorch`](https://huggingface.co/StemSplitio/htdemucs-ft-vocals-pytorch) | vocals | Best vocal SDR in our benchmark (9.19 dB); karaoke, a cappella |
+| [`htdemucs-ft-drums-pytorch`](https://huggingface.co/StemSplitio/htdemucs-ft-drums-pytorch) | drums | Drum extraction, beat transcription, sample-pack creation |
+| [`htdemucs-ft-bass-pytorch`](https://huggingface.co/StemSplitio/htdemucs-ft-bass-pytorch) | bass | Bassline transcription, mix rebalancing |
+| [`htdemucs-ft-other-pytorch`](https://huggingface.co/StemSplitio/htdemucs-ft-other-pytorch) | other / instrumental | Karaoke instrumentals, sample-flipping, music-bed extraction |
+Full benchmark across every popular open-source separator:
+[StemSplitio/stem-separation-benchmark-2026](https://huggingface.co/datasets/StemSplitio/stem-separation-benchmark-2026).
+---
+## License & attribution
+This repo is **MIT-licensed**, matching the original HT-Demucs.
+**Original authors (please cite if you use this model in research):**
+```bibtex
+@inproceedings{rouard2023hybrid,
+  title     = {Hybrid Transformers for Music Source Separation},
+  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
+  booktitle = {ICASSP},
+  year      = {2023}
+}
+```
+- Original model: [`facebookresearch/demucs`][demucs-repo]
+- Packaging by [StemSplit](https://stemsplit.io)
+- Search keywords: bass extraction, isolate bass from song, bassline extractor, AI bass separator
+[demucs-repo]: https://github.com/facebookresearch/demucs
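
The real-time factors (RTF) in the "Why this model" table follow from dividing processing time by audio duration. A minimal sketch of that arithmetic, assuming "3-min song" means exactly 180 s; the helper name `real_time_factor` is illustrative and not part of this repo:

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


SONG_S = 180.0        # assumption: "3-min song" taken as exactly 180 s
SPECIALIST_S = 22.0   # table: this model on M4 Pro MPS
FULL_BAG_S = 47.0     # table: full htdemucs_ft bag on M4 Pro MPS

rtf_specialist = real_time_factor(SPECIALIST_S, SONG_S)
rtf_full_bag = real_time_factor(FULL_BAG_S, SONG_S)
speedup = FULL_BAG_S / SPECIALIST_S

print(f"specialist RTF: {rtf_specialist:.2f}")
print(f"full bag RTF:   {rtf_full_bag:.2f}")
print(f"speedup:        {speedup:.1f}x")
```

With the table's latencies this reproduces the listed RTF values of 0.12 and 0.26.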

handler.py ADDED

	@@ -0,0 +1,89 @@

+"""
+HF Inference Endpoint handler for the HT-Demucs FT **bass** specialist.
+This repo ships only sub-model 1 of the 4-bag htdemucs_ft ensemble,
+the one trained to extract `bass`. It is ~160 MB on disk and ~1/4 the inference
+cost of the full bag, with the same per-stem quality measured in our v1.1
+benchmark (median bass SDR = 10.38 dB).
+If you need all 4 stems in one request, use the full ensemble:
+    https://huggingface.co/StemSplitio/htdemucs-ft-pytorch
+Request shape:
+    POST /
+    Content-Type: application/json
+    { "inputs": "<base64-encoded audio bytes>" }
+Response shape:
+    { "bass": "<base64 WAV>", "sample_rate": 44100, "duration_s": 123.4 }
+"""
+from __future__ import annotations
+import base64
+import io
+from typing import Any
+import numpy as np
+import soundfile as sf
+import torch
+from demucs.apply import apply_model
+from demucs.audio import convert_audio
+from demucs.pretrained import get_model
+# Index of the sub-model to serve within the htdemucs_ft bag, and the name of
+# the stem it produces.
+BAG_INDEX = 1
+TARGET_STEM = "bass"
+def _audio_to_b64_wav(audio: torch.Tensor, sample_rate: int) -> str:
+    np_audio = np.clip(audio.cpu().numpy().T, -1.0, 1.0)
+    buf = io.BytesIO()
+    sf.write(buf, np_audio, sample_rate, subtype="PCM_16", format="WAV")
+    return base64.b64encode(buf.getvalue()).decode("ascii")
+class EndpointHandler:
+    def __init__(self, path: str = "") -> None:
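+        # get_model loads the full 4-model bag. Keep a reference to the bass
+        # specialist only; the other 3 sub-models become eligible for garbage
+        # collection once `bag` goes out of scope at the end of __init__.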
+        bag = get_model("htdemucs_ft")
+        self.model = bag.models[BAG_INDEX]
+        self.model.eval()
+        self.device = torch.device(
+            "cuda" if torch.cuda.is_available() else
+            "mps" if torch.backends.mps.is_available() else
+            "cpu"
+        )
+        self.model.to(self.device)
+        self.sample_rate = int(bag.samplerate)
+        self.audio_channels = int(bag.audio_channels)
+        self.sources = list(bag.sources)  # ["drums","bass","other","vocals"]
+        self.target_index = self.sources.index(TARGET_STEM)
+    def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
+        if "inputs" not in data:
+            return {"error": "Request body must include base64 audio under 'inputs'."}
+        try:
+            audio_bytes = base64.b64decode(data["inputs"])
+            wav_np, sr = sf.read(io.BytesIO(audio_bytes), dtype="float32", always_2d=True)
+        except Exception as e:  # noqa: BLE001
+            return {"error": f"Could not decode audio: {type(e).__name__}: {e}"}
+        wav = torch.from_numpy(wav_np.T).contiguous()
+        wav = convert_audio(wav, sr, self.sample_rate, self.audio_channels)
+        wav = wav.unsqueeze(0).to(self.device)
+        with torch.no_grad():
+            # apply_model on a single Model (not a BagOfModels) is supported
+            # and runs only this specialist — 1/4 the cost of the full bag.
+            stems = apply_model(self.model, wav, device=str(self.device), progress=False)[0]
+            # stems: (n_sources, channels, samples). Only stems[target_index]
+            # is meaningful for this specialist — the other rows are weakly
+            # predicted by-products and should not be used.
+        return {
+            "bass": _audio_to_b64_wav(stems[self.target_index], self.sample_rate),
+            "sample_rate": self.sample_rate,
+            "duration_s": round(wav.shape[-1] / self.sample_rate, 3),
+        }
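
The handler's note that only `stems[target_index]` is meaningful follows from how a bag combines its sub-models: each source is a weighted average across sub-models. The toy sketch below assumes the `htdemucs_ft` weights are one-hot per source, which is what the README's "identical" note implies. `combine_bag` is a hypothetical pure-Python illustration, not the demucs implementation.

```python
def combine_bag(outputs, weights):
    """Weighted per-source average of sub-model outputs.

    outputs[m][k] is sub-model m's estimate for source k (a list of samples);
    weights[m][k] is how much sub-model m contributes to source k.
    """
    n_models, n_sources = len(outputs), len(outputs[0])
    combined = []
    for k in range(n_sources):
        total = sum(weights[m][k] for m in range(n_models))
        n_samples = len(outputs[0][k])
        combined.append([
            sum(weights[m][k] * outputs[m][k][t] for m in range(n_models)) / total
            for t in range(n_samples)
        ])
    return combined


SOURCES = ["drums", "bass", "other", "vocals"]
# Assumption: one-hot weights, so sub-model m is the only contributor to source m.
ONE_HOT = [[1.0 if m == k else 0.0 for k in range(4)] for m in range(4)]

# Toy outputs: sub-model m predicts the constant value 10*m + k for source k.
toy = [[[float(10 * m + k)] * 3 for k in range(4)] for m in range(4)]

bag_out = combine_bag(toy, ONE_HOT)
bass = SOURCES.index("bass")
print(bag_out[bass] == toy[1][bass])
```

Under that assumption the bag's bass row equals sub-model 1's bass row, and the other rows of sub-model 1 never reach the bag's output.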

requirements.txt ADDED

	@@ -0,0 +1,5 @@

+torch>=2.2,<2.6
+torchaudio>=2.2,<2.6
+demucs==4.0.1
+numpy>=1.26,<2.0
+soundfile>=0.12
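
The request and response shapes in the `handler.py` docstring can be exercised from a client with only the standard library. This sketch assumes the `bass` field holds the 16-bit PCM WAV that `_audio_to_b64_wav` writes. `build_payload` and `decode_bass` are hypothetical helper names, and the fake response stands in for a real endpoint call.

```python
import base64
import io
import wave


def build_payload(audio_bytes: bytes) -> dict:
    """Wrap raw audio file bytes in the {"inputs": <base64>} request body."""
    return {"inputs": base64.b64encode(audio_bytes).decode("ascii")}


def decode_bass(response: dict) -> tuple[bytes, int, int]:
    """Return (raw PCM frames, sample rate, channel count) from a handler response."""
    if "error" in response:
        raise RuntimeError(response["error"])
    wav_bytes = base64.b64decode(response["bass"])
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.readframes(wf.getnframes()), wf.getframerate(), wf.getnchannels()


# Stand-in for a real endpoint response: 0.1 s of 16-bit stereo silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)      # 2 bytes per sample = PCM_16
    wf.setframerate(44100)
    wf.writeframes(b"\x00" * (4410 * 2 * 2))
fake_response = {
    "bass": base64.b64encode(buf.getvalue()).decode("ascii"),
    "sample_rate": 44100,
    "duration_s": 0.1,
}

frames, rate, channels = decode_bass(fake_response)
payload = build_payload(b"example-bytes")
print(rate, channels, len(frames))
```

Raising on the `error` key mirrors the handler, which reports failures as `{"error": ...}` in the response body instead of raising.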