bytebecky committed on
Commit
58f6e41
·
1 Parent(s): 5402f01

ke/add_mac_cfui_deployment (#12)

- AGENTS.md: add Mac (MLX-Audio) and ComfyUI deployment paths (76e96a4aa9f091a1c2757502900a3daa17c6faa0)

Files changed (1)
  1. AGENTS.md +37 -2
AGENTS.md CHANGED
@@ -18,7 +18,9 @@ Choose by constraint, not by habit:
18
  |------|-----|-------------|
19
  | Just hear it / try preset voices & avatars | **Live Demo** | https://boson.ai/workspace/avatar |
20
  | Integrate quickly, no GPU, your own voice | **Hosted API** | https://docs.boson.ai/models/higgs-audio-tts/overview |
21
- | Data privacy, custom testing, full control | **Self-host (SGLang-Omni)** | https://lmsys.org/blog/2026-06-04-higgs-audio-v3-tts/ |
22
  | Inspect weights / config / tokenizer | **Model card (this repo)** | https://huggingface.co/bosonai/higgs-audio-v3-tts-4b |
23
 
24
  Deep dive on everything: **Technical blog** → https://boson.ai/blog/higgs-audio-v3-tts
@@ -76,7 +78,8 @@ df -h . # disk f
76
  ```
77
 
78
  Rules for the agent:
79
- - **No NVIDIA GPU** → stop. Self-host is not viable; steer the user to **Path A (hosted API)**.
80
  - **≥ 40 GB VRAM (e.g. A100 40 GB, H100)** → known-good; proceed.
81
  - **24 GB (e.g. RTX 4090)** → *reported* to work, **not officially verified**. The ~4B weights fit,
82
  but expect to lower concurrency / `max_new_tokens` and watch for OOM at the `serve` step.
@@ -120,6 +123,38 @@ Cookbook reference: https://sgl-project.github.io/sglang-omni/cookbook/higgs_tts
120
 
121
  ---
122
 
123
  ## Control tags — how to write target text
124
 
125
  Embed tags directly in the `input` text to steer emotion, prosody, style, and sound effects.
 
18
  |------|-----|-------------|
19
  | Just hear it / try preset voices & avatars | **Live Demo** | https://boson.ai/workspace/avatar |
20
  | Integrate quickly, no GPU, your own voice | **Hosted API** | https://docs.boson.ai/models/higgs-audio-tts/overview |
21
+ | Data privacy, custom testing, full control (NVIDIA GPU) | **Self-host (SGLang-Omni)** | https://lmsys.org/blog/2026-06-04-higgs-audio-v3-tts/ |
22
+ | Run locally on a Mac (Apple Silicon, no NVIDIA GPU) | **Self-host (MLX-Audio)** | https://github.com/Blaizzy/mlx-audio |
23
+ | Node-based UI / visual workflow | **ComfyUI (community)** | https://github.com/Saganaki22/Higgs_v3-TTS-ComfyUI |
24
  | Inspect weights / config / tokenizer | **Model card (this repo)** | https://huggingface.co/bosonai/higgs-audio-v3-tts-4b |
25
 
26
  Deep dive on everything: **Technical blog** → https://boson.ai/blog/higgs-audio-v3-tts
 
78
  ```
79
 
80
  Rules for the agent:
81
+ - **No NVIDIA GPU** → stop this path. On an **Apple Silicon Mac**, use **Path C (MLX-Audio)**;
82
+ for a node-based UI, see **Path D (ComfyUI)**; otherwise use **Path A (hosted API)**.
83
  - **≥ 40 GB VRAM (e.g. A100 40 GB, H100)** → known-good; proceed.
84
  - **24 GB (e.g. RTX 4090)** → *reported* to work, **not officially verified**. The ~4B weights fit,
85
  but expect to lower concurrency / `max_new_tokens` and watch for OOM at the `serve` step.
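The routing rules above can be sketched as a small pure function. This is only an illustration: `choose_path` and `vram_status` are hypothetical names, not part of any tool referenced here. The precedence among Paths C, D, and A is an assumption read off the order of the bullet. The handling of VRAM sizes between 24 GB and 40 GB, or below 24 GB, is also an assumption, since the visible hunk only names those two tiers.

```python
def choose_path(has_nvidia_gpu: bool,
                is_apple_silicon: bool,
                wants_node_ui: bool = False) -> str:
    """Return the deployment path letter suggested by the rules above.

    Assumption: when there is no NVIDIA GPU, Apple Silicon is checked first,
    then the node-based UI preference, then the hosted API as the fallback.
    """
    if has_nvidia_gpu:
        return "B"  # self-host with SGLang-Omni
    if is_apple_silicon:
        return "C"  # MLX-Audio on an Apple Silicon Mac
    if wants_node_ui:
        return "D"  # community ComfyUI nodes
    return "A"      # hosted API


def vram_status(vram_gb: float) -> str:
    """Classify an NVIDIA card by VRAM, using only the two tiers shown above."""
    if vram_gb >= 40:
        return "known-good"
    if vram_gb >= 24:
        # Assumption: anything from 24 GB up to 40 GB is treated like the 24 GB case.
        return "reported to work, not officially verified"
    return "not covered by the rules shown here"


if __name__ == "__main__":
    print(choose_path(has_nvidia_gpu=False, is_apple_silicon=True))
    print(vram_status(24))
```

Keeping the decision in a pure function makes the routing easy to test separately from whatever command actually detects the hardware.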
 
123
 
124
  ---
125
 
126
+ ## Path C — Apple Silicon Mac via MLX-Audio (no NVIDIA GPU)
127
+
128
+ For Macs there is **no CUDA / Docker path** — use **MLX-Audio**, an Apple-MLX-native TTS library
129
+ that runs the model directly on M-series GPUs: https://github.com/Blaizzy/mlx-audio
130
+
131
+ **Hardware (first-hand, measured):** confirmed on an **M1 / 32 GB**, with a peak memory footprint of
132
+ only **~9–12 GB** — comfortably within reach of typical Apple Silicon laptops, no discrete GPU needed.
133
+
134
+ ```bash
135
+ pip install mlx-audio # requires Apple Silicon (M1/M2/M3/M4) + macOS
136
+ ```
137
+
138
+ Drive the model through MLX-Audio's CLI / Python API per its README — see
139
+ https://github.com/Blaizzy/mlx-audio for the exact `generate` command and supported flags.
140
+
141
+ > Mac-only. On Linux/NVIDIA use **Path B**; with no local accelerator at all, use **Path A**.
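Because `mlx-audio` requires Apple Silicon and macOS, an agent may want to confirm the platform before running `pip install`. The following is a minimal sketch using only Python's standard `platform` module; `is_apple_silicon` is a hypothetical helper name, and the check deliberately does not inspect memory or the macOS version.

```python
import platform
from typing import Optional


def is_apple_silicon(system: Optional[str] = None,
                     machine: Optional[str] = None) -> bool:
    """Return True when the interpreter reports macOS on an arm64 CPU.

    The arguments default to the live values from the platform module;
    they are overridable so the check can be exercised on any machine.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if is_apple_silicon():
        print("Apple Silicon detected: Path C (MLX-Audio) applies.")
    else:
        print("Not Apple Silicon: use Path B (NVIDIA) or Path A (hosted API).")
```

One caveat: a Python interpreter running under Rosetta translation may report `x86_64` even on an M-series Mac, so a negative result is worth double-checking before ruling out Path C.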
142
+
143
+ ---
144
+
145
+ ## Path D — ComfyUI node-based UI (community)
146
+
147
+ A community integration exposes the model as ComfyUI nodes (text-to-speech in a visual,
148
+ node-based workflow), with a drag-and-drop workflow file for immediate use:
149
+
150
+ - **Repo:** https://github.com/Saganaki22/Higgs_v3-TTS-ComfyUI (by Saganaki22)
151
+
152
+ > **Third-party, not maintained by Boson.** Follow that repo's README for install/usage, and verify
153
+ > it against the version of the weights you intend to run. Surfaced in the model's HF discussions:
154
+ > https://huggingface.co/bosonai/higgs-audio-v3-tts-4b/discussions/4
155
+
156
+ ---
157
+
158
  ## Control tags — how to write target text
159
 
160
  Embed tags directly in the `input` text to steer emotion, prosody, style, and sound effects.