# Writing target text for Higgs Audio v3 TTS: control tags
How to embed control tags in the `input` text to steer emotion, prosody, style, and sound effects.
For where this fits in the overall workflow, see [AGENTS.md](./AGENTS.md).
## Format rule (read first)
Every tag is `<|category:tag|>`. There are **two placements**:
- **Sentence-level** — emotion, style, and prosody's `speed_* / pitch_* / expressive_*`.
Put at the **start of the sentence**; it colors the whole sentence.
- **Inline** — sound effects (`sfx`) and prosody's `pause / long_pause`.
Insert **at the exact position** in the sentence where the effect should occur.
**`sfx` gotcha:** format is `<|sfx:tag|>onomatopoeia, then the line` — the tag comes **first**,
immediately followed by the onomatopoeia with **no space** between them.
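The format rule above can be sketched as a few string helpers. This is a minimal illustration only: `tag`, `sentence_level`, and `with_sfx` are hypothetical names, not part of any Higgs Audio API, and the comma after the onomatopoeia simply mirrors the pattern shown in this guide.

```python
def tag(category: str, name: str) -> str:
    """Render one control tag in the <|category:tag|> format."""
    return f"<|{category}:{name}|>"


def sentence_level(category: str, name: str, sentence: str) -> str:
    """Prefix a sentence with a sentence-level tag (hypothetical helper)."""
    return tag(category, name) + sentence


def with_sfx(name: str, onomatopoeia: str, rest: str) -> str:
    """Put the sfx tag first, attach the onomatopoeia with no space, then the line."""
    return f"{tag('sfx', name)}{onomatopoeia}, {rest}"


print(sentence_level("style", "whispering", "Come closer, I have a little secret to share."))
print(with_sfx("cough", "Ahem", "welcome everyone, let's get started."))
```

Building the text through helpers like these keeps the "no space after an `sfx` tag" rule in one place instead of relying on hand-typed strings.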
## Examples
Sentence-level:
```
<|emotion:elation|>Welcome aboard, we are absolutely thrilled to have you here!
<|style:whispering|>Come closer, I have a little secret to share.
<|prosody:speed_slow|>Take your time, there's really no need to rush.
```
Inline sfx (tag first, onomatopoeia attached, no space):
```
<|sfx:cough|>Ahem, welcome everyone, let's get started.
<|sfx:laughter|>Haha, so glad you could make it!
```
Inline pause (between phrases):
```
Hello there <|prosody:pause|> and welcome to the show.
```
Stacking tags (a sentence-level emotion tag plus an inline sfx in one line), followed by an sfx-only line for comparison:
```
<|emotion:elation|><|sfx:laughter|>Haha, welcome, welcome, we're so happy you're here!
<|sfx:sigh|>Haah, what a day — but welcome, please make yourself at home.
```
## Tips
- You can stack tags in one sentence (e.g. a leading emotion tag plus an inline sfx).
- `speed_very_slow` only slows delivery to roughly 5s; for even slower delivery, insert
`<|prosody:long_pause|>` between phrases instead.
- Only the tags below are recognized — anything else degrades output or gets read literally.
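The `long_pause` tip can be sketched as a small helper that joins phrases with the inline tag. `slow_delivery` is a hypothetical name, and the spaces around the tag follow the inline pause example in this guide; how you split the text into phrases is left to the caller.

```python
LONG_PAUSE = "<|prosody:long_pause|>"


def slow_delivery(phrases: list[str]) -> str:
    """Join phrases with an inline long pause to stretch the delivery."""
    # Drop empty fragments so two pause tags never end up back to back.
    cleaned = [p.strip() for p in phrases if p.strip()]
    return f" {LONG_PAUSE} ".join(cleaned)


print(slow_delivery(["Take your time,", "there's really no need to rush."]))
```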
## Full tag catalog (43)
### Emotion (21) — sentence-level
`affection`, `amusement`, `anger`, `arousal`, `awe`, `bitterness`, `confusion`, `contemplation`,
`contentment`, `determination`, `disgust`, `elation`, `enthusiasm`, `fear`, `helplessness`,
`longing`, `pride`, `relief`, `sadness`, `shame`, `surprise`
Syntax: `<|emotion:elation|>`
### Prosody (10)
- Sentence-level: `speed_very_slow`, `speed_slow`, `speed_fast`, `speed_very_fast`,
`pitch_low`, `pitch_high`, `expressive_high`, `expressive_low`
- Inline: `pause`, `long_pause`
Syntax: `<|prosody:speed_slow|>`, `<|prosody:pause|>`
### Style (3) — sentence-level
`singing`, `shouting`, `whispering`
Syntax: `<|style:whispering|>`
### Sound effects (9) — inline
`cough`, `laughter`, `crying`, `screaming`, `burping`, `humming`, `sigh`, `sniff`, `sneeze`
Syntax: `<|sfx:cough|>Ahem, ...` (tag first, onomatopoeia attached, no space)
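Because unrecognized tags degrade output or get read literally, it can help to check text against the catalog before sending it. The sketch below is one possible check: `CATALOG`, `TAG_RE`, and `unknown_tags` are hypothetical names, and the regular expression assumes tags never contain a `|` inside the category or tag name.

```python
import re

# The 43 tags listed in this catalog, keyed by category.
CATALOG = {
    "emotion": {
        "affection", "amusement", "anger", "arousal", "awe", "bitterness",
        "confusion", "contemplation", "contentment", "determination",
        "disgust", "elation", "enthusiasm", "fear", "helplessness",
        "longing", "pride", "relief", "sadness", "shame", "surprise",
    },
    "prosody": {
        "speed_very_slow", "speed_slow", "speed_fast", "speed_very_fast",
        "pitch_low", "pitch_high", "expressive_high", "expressive_low",
        "pause", "long_pause",
    },
    "style": {"singing", "shouting", "whispering"},
    "sfx": {
        "cough", "laughter", "crying", "screaming", "burping",
        "humming", "sigh", "sniff", "sneeze",
    },
}

# Matches anything shaped like <|category:tag|>.
TAG_RE = re.compile(r"<\|([^:|]+):([^|]+)\|>")


def unknown_tags(text: str) -> list[str]:
    """Return every tag in `text` that is not in the catalog, in order of appearance."""
    return [
        m.group(0)
        for m in TAG_RE.finditer(text)
        if m.group(2) not in CATALOG.get(m.group(1), set())
    ]


print(unknown_tags("<|emotion:elation|>Hi <|prosody:pause|> there"))
print(unknown_tags("<|emotion:joy|>Hi <|mood:calm|> there"))
```

The check covers spelling only; it does not verify placement (sentence-level versus inline).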