ke/add_mac_cfui_deployment (#12)
- AGENTS.md: add Mac (MLX-Audio) and ComfyUI deployment paths (76e96a4aa9f091a1c2757502900a3daa17c6faa0)
AGENTS.md
CHANGED
@@ -18,7 +18,9 @@ Choose by constraint, not by habit:
 |------|-----|-------------|
 | Just hear it / try preset voices & avatars | **Live Demo** | https://boson.ai/workspace/avatar |
 | Integrate quickly, no GPU, your own voice | **Hosted API** | https://docs.boson.ai/models/higgs-audio-tts/overview |
-| Data privacy, custom testing, full control | **Self-host (SGLang-Omni)** | https://lmsys.org/blog/2026-06-04-higgs-audio-v3-tts/ |
 | Inspect weights / config / tokenizer | **Model card (this repo)** | https://huggingface.co/bosonai/higgs-audio-v3-tts-4b |

 Deep dive on everything: **Technical blog** → https://boson.ai/blog/higgs-audio-v3-tts

@@ -76,7 +78,8 @@ df -h . # disk f
 ```

 Rules for the agent:
-- **No NVIDIA GPU** → stop
 - **≥ 40 GB VRAM (e.g. A100 40 GB, H100)** → known-good; proceed.
 - **24 GB (e.g. RTX 4090)** → *reported* to work, **not officially verified**. The ~4B weights fit,
   but expect to lower concurrency / `max_new_tokens` and watch for OOM at the `serve` step.

@@ -120,6 +123,38 @@ Cookbook reference: https://sgl-project.github.io/sglang-omni/cookbook/higgs_tts

 ---

 ## Control tags — how to write target text

 Embed tags directly in the `input` text to steer emotion, prosody, style, and sound effects.

@@ -18,7 +18,9 @@ Choose by constraint, not by habit:
 |------|-----|-------------|
 | Just hear it / try preset voices & avatars | **Live Demo** | https://boson.ai/workspace/avatar |
 | Integrate quickly, no GPU, your own voice | **Hosted API** | https://docs.boson.ai/models/higgs-audio-tts/overview |
+| Data privacy, custom testing, full control (NVIDIA GPU) | **Self-host (SGLang-Omni)** | https://lmsys.org/blog/2026-06-04-higgs-audio-v3-tts/ |
+| Run locally on a Mac (Apple Silicon, no NVIDIA GPU) | **Self-host (MLX-Audio)** | https://github.com/Blaizzy/mlx-audio |
+| Node-based UI / visual workflow | **ComfyUI (community)** | https://github.com/Saganaki22/Higgs_v3-TTS-ComfyUI |
 | Inspect weights / config / tokenizer | **Model card (this repo)** | https://huggingface.co/bosonai/higgs-audio-v3-tts-4b |

 Deep dive on everything: **Technical blog** → https://boson.ai/blog/higgs-audio-v3-tts

@@ -76,7 +78,8 @@ df -h . # disk f
 ```

 Rules for the agent:
+- **No NVIDIA GPU** → stop this path. On an **Apple Silicon Mac**, use **Path C (MLX-Audio)**;
+  for a node-based UI, see **Path D (ComfyUI)**; otherwise use **Path A (hosted API)**.
 - **≥ 40 GB VRAM (e.g. A100 40 GB, H100)** → known-good; proceed.
 - **24 GB (e.g. RTX 4090)** → *reported* to work, **not officially verified**. The ~4B weights fit,
   but expect to lower concurrency / `max_new_tokens` and watch for OOM at the `serve` step.

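The updated "Rules for the agent" amount to a small decision table, which can be sketched in Python as below. This is only an illustration of the rules as quoted in this hunk: the `Host` dataclass, its field names, and `choose_path` are hypothetical and not part of AGENTS.md, and the checking order (Mac first, then node-based UI, then hosted API) simply follows the order of the sentence. The hunk does not say what to do with less than 24 GB of VRAM, so that case is reported as not covered rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of the machine the agent is running on."""
    has_nvidia_gpu: bool
    vram_gb: float = 0.0
    is_apple_silicon_mac: bool = False
    wants_node_ui: bool = False


def choose_path(host: Host) -> tuple[str, str]:
    """Return (path, status) following the rules quoted in the hunk above."""
    if not host.has_nvidia_gpu:
        # No NVIDIA GPU: stop the GPU self-host path and pick an alternative.
        if host.is_apple_silicon_mac:
            return ("Path C", "MLX-Audio on Apple Silicon")
        if host.wants_node_ui:
            return ("Path D", "community ComfyUI nodes")
        return ("Path A", "hosted API")
    if host.vram_gb >= 40:
        return ("Path B", "known-good")
    if host.vram_gb >= 24:
        # Expect to lower concurrency / max_new_tokens and watch for OOM.
        return ("Path B", "reported to work, not officially verified")
    # Assumption: GPUs below 24 GB are not addressed in the quoted rules.
    return ("Path B", "not covered by the rules quoted here")
```

For example, `choose_path(Host(has_nvidia_gpu=True, vram_gb=24))` yields the "reported to work, not officially verified" status, which is the cue to reduce concurrency before the `serve` step.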
@@ -120,6 +123,38 @@ Cookbook reference: https://sgl-project.github.io/sglang-omni/cookbook/higgs_tts

 ---

+## Path C — Apple Silicon Mac via MLX-Audio (no NVIDIA GPU)
+
+For Macs there is **no CUDA / Docker path** — use **MLX-Audio**, an Apple-MLX-native TTS library
+that runs the model directly on M-series GPUs: https://github.com/Blaizzy/mlx-audio
+
+**Hardware (first-hand, measured):** confirmed on an **M1 / 32 GB**, with a peak memory footprint of
+only **~9–12 GB** — comfortably within reach of typical Apple Silicon laptops, no discrete GPU needed.
+
+```bash
+pip install mlx-audio # requires Apple Silicon (M1/M2/M3/M4) + macOS
+```
+
+Drive the model through MLX-Audio's CLI / Python API per its README — see
+https://github.com/Blaizzy/mlx-audio for the exact `generate` command and supported flags.
+
+> Mac-only. On Linux/NVIDIA use **Path B**; with no local accelerator at all, use **Path A**.
+
+---
+
+## Path D — ComfyUI node-based UI (community)
+
+A community integration exposes the model as ComfyUI nodes (text-to-speech in a visual,
+node-based workflow), with a drag-and-drop workflow file for immediate use:
+
+- **Repo:** https://github.com/Saganaki22/Higgs_v3-TTS-ComfyUI (by Saganaki22)
+
+> **Third-party, not maintained by Boson.** Follow that repo's README for install/usage, and verify
+> it against the version of the weights you intend to run. Surfaced in the model's HF discussions:
+> https://huggingface.co/bosonai/higgs-audio-v3-tts-4b/discussions/4
+
+---
+
 ## Control tags — how to write target text

 Embed tags directly in the `input` text to steer emotion, prosody, style, and sound effects.
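Path C in the hunk above states two preconditions: Apple Silicon plus macOS, and a measured peak footprint of roughly 9 to 12 GB. A preflight check an agent might run before `pip install mlx-audio` could look like the sketch below. It uses only the Python standard library and does not call MLX-Audio. The function names and the 12 GB threshold are assumptions for illustration; only an M1 / 32 GB machine is reported as confirmed, so passing this check on a smaller Mac does not guarantee success.

```python
import platform

# Upper end of the ~9 to 12 GB peak reported for an M1 / 32 GB machine.
# Treating this as a minimum-RAM threshold is an assumption of this sketch.
PEAK_FOOTPRINT_GB = 12.0


def is_apple_silicon(system: str, machine: str) -> bool:
    """True for native macOS on arm64 (M-series).

    Note: a Python interpreter running under Rosetta reports x86_64.
    """
    return system == "Darwin" and machine == "arm64"


def path_c_preflight(system: str, machine: str, total_ram_gb: float) -> list[str]:
    """Return a list of problems; an empty list means Path C looks plausible."""
    problems = []
    if not is_apple_silicon(system, machine):
        problems.append("not an Apple Silicon Mac: use Path B (Linux/NVIDIA) or Path A")
    if total_ram_gb <= PEAK_FOOTPRINT_GB:
        problems.append(
            f"{total_ram_gb:g} GB RAM is at or below the ~{PEAK_FOOTPRINT_GB:g} GB peak footprint"
        )
    return problems


if __name__ == "__main__":
    # total_ram_gb is supplied by the caller; 32.0 matches the confirmed machine.
    print(path_c_preflight(platform.system(), platform.machine(), total_ram_gb=32.0))
```

The system and machine strings are passed in rather than read inside the function so the check can be exercised on any host, including a Linux CI runner.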