Text-to-Speech
Transformers
Safetensors
higgs_multimodal_qwen3
text-generation
speech-generation
voice-agent
expressive-speech
controllable-tts
multilingual-tts
How to use bosonai/higgs-tts-3-4b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="bosonai/higgs-tts-3-4b")

# Load model directly
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("bosonai/higgs-tts-3-4b", dtype="auto")
```

# Writing target text for Higgs Audio v3 TTS: control tags

How to embed control tags in the `input` text to steer emotion, prosody, style, and sound effects.

For where this fits in the overall workflow, see [AGENTS.md](./AGENTS.md).

## Format rule (read first)

Every tag has the form `<|category:tag|>`. There are **two placements**:

- **Sentence-level**: emotion, style, and the prosody tags `speed_*`, `pitch_*`, and `expressive_*`.
  Put the tag at the **start of the sentence**; it colors the whole sentence.
- **Inline**: sound effects (`sfx`) and the prosody tags `pause` and `long_pause`.
  Insert the tag **at the exact position** in the sentence where the effect should occur.

**`sfx` gotcha:** the format is `<|sfx:tag|>onomatopoeia, then the line`. The tag comes **first**,
immediately followed by the onomatopoeia with **no space** between them.
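
The two placements can be sketched as small string helpers. This is a minimal illustration assuming plain Python string assembly; `tag`, `sentence_level`, and `sfx_line` are hypothetical names, not part of the model's or the Transformers API. The two calls at the bottom reproduce example lines from this guide.

```python
def tag(category: str, name: str) -> str:
    """Render a single control tag in the <|category:tag|> form."""
    return f"<|{category}:{name}|>"


def sentence_level(category: str, name: str, sentence: str) -> str:
    """Prefix a sentence-level tag (emotion, style, speed_*/pitch_*/expressive_*)."""
    # Sentence-level tags go at the very start and color the whole sentence.
    return tag(category, name) + sentence.lstrip()


def sfx_line(effect: str, onomatopoeia: str, rest: str) -> str:
    """Build an sfx line: tag first, onomatopoeia attached with no space, then the line."""
    # The gotcha: no whitespace between the closing '|>' and the onomatopoeia.
    return f"{tag('sfx', effect)}{onomatopoeia.strip()}, {rest.strip()}"


print(sentence_level("emotion", "elation", "Welcome aboard, we are absolutely thrilled to have you here!"))
print(sfx_line("cough", "Ahem", "welcome everyone, let's get started."))
```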

## Examples

Sentence-level:

```
<|emotion:elation|>Welcome aboard, we are absolutely thrilled to have you here!
<|style:whispering|>Come closer, I have a little secret to share.
<|prosody:speed_slow|>Take your time, there's really no need to rush.
```

Inline sfx (tag first, onomatopoeia attached, no space):

```
<|sfx:cough|>Ahem, welcome everyone, let's get started.
<|sfx:laughter|>Haha, so glad you could make it!
```

Inline pause (between phrases):

```
Hello there <|prosody:pause|> and welcome to the show.
```
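
Inline pauses can also be spliced in programmatically when the phrases are already split. `join_with_pauses` is a hypothetical helper, and it assumes the spacing shown in the example above, with one space on each side of the tag.

```python
PAUSE_TAGS = {"pause", "long_pause"}  # the two inline prosody tags in the catalog


def join_with_pauses(phrases: list[str], kind: str = "pause") -> str:
    """Join phrases with an inline <|prosody:pause|> or <|prosody:long_pause|> tag."""
    if kind not in PAUSE_TAGS:
        raise ValueError(f"unknown pause tag: {kind!r}")
    separator = f" <|prosody:{kind}|> "
    # Strip stray whitespace so the tag always has exactly one space on each side,
    # and drop empty phrases so no two tags end up adjacent.
    return separator.join(p.strip() for p in phrases if p.strip())


print(join_with_pauses(["Hello there", "and welcome to the show."]))
```

Passing `kind="long_pause"` produces the slower-delivery workaround described under Tips.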

Stacking tags (sentence-level emotion plus inline sfx in one line):

```
<|emotion:elation|><|sfx:laughter|>Haha, welcome, welcome, we're so happy you're here!
<|sfx:sigh|>Haah, what a day — but welcome, please make yourself at home.
```
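
When stacking, it can help to check exactly which tags a line carries and where they fall in the spoken text. The sketch below is a hypothetical `parse_tags` built on Python's `re` module; it assumes category and tag names contain only lowercase letters and underscores, which holds for every tag in the catalog below. Offsets are measured in the text with the tags removed, so an offset of 0 means the tag opens the sentence.

```python
import re

# Matches <|category:tag|>; names are assumed to be lowercase letters/underscores.
TAG_RE = re.compile(r"<\|([a-z_]+):([a-z_]+)\|>")


def parse_tags(line: str) -> tuple[list[tuple[str, str, int]], str]:
    """Return (tags, plain_text); each tag is (category, name, offset in plain_text)."""
    tags = []
    plain_parts = []
    plain_len = 0
    last = 0
    for m in TAG_RE.finditer(line):
        chunk = line[last:m.start()]
        plain_parts.append(chunk)
        plain_len += len(chunk)
        # Record where the tag sits relative to the untagged text.
        tags.append((m.group(1), m.group(2), plain_len))
        last = m.end()
    plain_parts.append(line[last:])
    return tags, "".join(plain_parts)


tags, text = parse_tags("<|emotion:elation|><|sfx:laughter|>Haha, welcome, welcome, we're so happy you're here!")
print(tags)  # [('emotion', 'elation', 0), ('sfx', 'laughter', 0)]
print(text)  # Haha, welcome, welcome, we're so happy you're here!
```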

## Tips

- You can stack tags in one sentence (e.g. a leading emotion tag plus an inline sfx).
- `speed_very_slow` only slows the model down to roughly 5 s; for even slower delivery, insert
  `<|prosody:long_pause|>` between phrases instead.
- Only the tags listed below are recognized; anything else degrades the output or is read aloud literally.

## Full tag catalog (43)

### Emotion (21): sentence-level

`affection`, `amusement`, `anger`, `arousal`, `awe`, `bitterness`, `confusion`, `contemplation`,
`contentment`, `determination`, `disgust`, `elation`, `enthusiasm`, `fear`, `helplessness`,
`longing`, `pride`, `relief`, `sadness`, `shame`, `surprise`

Syntax: `<|emotion:elation|>`

### Prosody (10)

- Sentence-level: `speed_very_slow`, `speed_slow`, `speed_fast`, `speed_very_fast`,
  `pitch_low`, `pitch_high`, `expressive_high`, `expressive_low`
- Inline: `pause`, `long_pause`

Syntax: `<|prosody:speed_slow|>`, `<|prosody:pause|>`

### Style (3): sentence-level

`singing`, `shouting`, `whispering`

Syntax: `<|style:whispering|>`

### Sound effects (9): inline

`cough`, `laughter`, `crying`, `screaming`, `burping`, `humming`, `sigh`, `sniff`, `sneeze`

Syntax: `<|sfx:cough|>Ahem, ...` (tag first, onomatopoeia attached, no space)
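
The catalog maps naturally onto a lookup table, which makes it possible to lint target text before sending it to the model. What follows is a minimal sketch, not part of any Higgs Audio tooling: `PLACEMENT` and `lint_line` are hypothetical names, and the checker assumes one sentence per line, treating any run of tags at the start of the line as the sentence start (so stacked tags pass). It flags tags outside the catalog, sentence-level tags that do not open the sentence, and `sfx` tags followed by whitespace.

```python
import re


def _entries(category: str, placement: str, names: str) -> dict[tuple[str, str], str]:
    """Map each (category, name) pair to its placement ('sentence' or 'inline')."""
    return {(category, name): placement for name in names.split()}


# The 43 tags from the catalog above, keyed by (category, name).
PLACEMENT = {
    **_entries("emotion", "sentence",
               "affection amusement anger arousal awe bitterness confusion contemplation "
               "contentment determination disgust elation enthusiasm fear helplessness "
               "longing pride relief sadness shame surprise"),
    **_entries("prosody", "sentence",
               "speed_very_slow speed_slow speed_fast speed_very_fast "
               "pitch_low pitch_high expressive_high expressive_low"),
    **_entries("prosody", "inline", "pause long_pause"),
    **_entries("style", "sentence", "singing shouting whispering"),
    **_entries("sfx", "inline",
               "cough laughter crying screaming burping humming sigh sniff sneeze"),
}
assert len(PLACEMENT) == 43  # sanity check against the catalog heading

TAG_RE = re.compile(r"<\|([^:|>]+):([^|>]+)\|>")
# Any run of tags at the very start of the line counts as the sentence start.
LEADING_TAGS_RE = re.compile(r"(?:<\|[^|>]+\|>)*")


def lint_line(line: str) -> list[str]:
    """Return human-readable problems for one sentence of target text."""
    problems = []
    lead_end = LEADING_TAGS_RE.match(line).end()
    for m in TAG_RE.finditer(line):
        category, name = m.group(1), m.group(2)
        placement = PLACEMENT.get((category, name))
        if placement is None:
            problems.append(f"unknown tag {m.group(0)}")
        elif placement == "sentence" and m.start() >= lead_end:
            problems.append(f"{m.group(0)} should open the sentence")
        elif category == "sfx" and line[m.end():m.end() + 1].isspace():
            problems.append(f"{m.group(0)} must be followed directly by the onomatopoeia")
    return problems


for sentence in [
    "<|emotion:elation|><|sfx:laughter|>Haha, welcome, welcome, we're so happy you're here!",
    "Come closer <|style:whispering|>I have a little secret to share.",
    "<|sfx:cough|> Ahem, welcome everyone.",
]:
    print(lint_line(sentence))
# []
# ['<|style:whispering|> should open the sentence']
# ['<|sfx:cough|> must be followed directly by the onomatopoeia']
```

Every example line in this guide passes with no problems reported. Text with several sentences per line would need a sentence splitter before `lint_line`, which this sketch leaves out.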