	# Writing target text for Higgs Audio v3 TTS — control tags

	How to embed control tags in the `input` text to steer emotion, prosody, style, and sound effects.
	For where this fits in the overall workflow, see [AGENTS.md](./AGENTS.md).

	## Format rule (read first)

	Every tag is `<|category:tag|>`. There are two placements:

	- Sentence-level — emotion, style, and prosody's `speed_* / pitch_* / expressive_*`.
	Put at the start of the sentence; it colors the whole sentence.
	- Inline — sound effects (`sfx`) and prosody's `pause / long_pause`.
	Insert at the exact position in the sentence where the effect should occur.

	`sfx` gotcha: the format is `<|sfx:tag|>onomatopoeia, then the line`. The tag comes first,
	immediately followed by the onomatopoeia with no space between them.
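
The two placements can be sketched as small string helpers. This is only an illustration: the function names (`tag`, `sentence_level`, `with_sfx`, `with_pause`) are hypothetical and not part of any Higgs Audio API, and the sketch assumes the literal tag text uses plain pipes, as in `<|category:tag|>`.

```python
def tag(category: str, name: str) -> str:
    """Format one control tag as <|category:name|> (hypothetical helper)."""
    return f"<|{category}:{name}|>"


def sentence_level(category: str, name: str, sentence: str) -> str:
    """Put a sentence-level tag at the very start of the sentence."""
    return tag(category, name) + sentence


def with_sfx(name: str, onomatopoeia: str, rest: str) -> str:
    """Tag first, onomatopoeia attached with no space, then the rest of the line."""
    return tag("sfx", name) + onomatopoeia + rest


def with_pause(before: str, after: str, long: bool = False) -> str:
    """Insert an inline pause tag between two phrases, separated by single spaces."""
    pause = tag("prosody", "long_pause" if long else "pause")
    return f"{before.rstrip()} {pause} {after.lstrip()}"


print(sentence_level("style", "whispering", "Come closer, I have a little secret to share."))
# <|style:whispering|>Come closer, I have a little secret to share.
print(with_sfx("cough", "Ahem", ", welcome everyone, let's get started."))
# <|sfx:cough|>Ahem, welcome everyone, let's get started.
```

The resulting strings are what would go into the `input` text; how that text is submitted is outside the scope of this sketch.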

	## Examples

	Sentence-level:
	```
	<|emotion:elation|>Welcome aboard, we are absolutely thrilled to have you here!
	<|style:whispering|>Come closer, I have a little secret to share.
	<|prosody:speed_slow|>Take your time, there's really no need to rush.
	```

	Inline sfx (tag first, onomatopoeia attached, no space):
	```
	<|sfx:cough|>Ahem, welcome everyone, let's get started.
	<|sfx:laughter|>Haha, so glad you could make it!
	```

	Inline pause (between phrases):
	```
	Hello there <|prosody:pause|> and welcome to the show.
	```

	Stacking tags (sentence-level emotion + inline sfx in one line):
	```
	<|emotion:elation|><|sfx:laughter|>Haha, welcome, welcome, we're so happy you're here!
	<|sfx:sigh|>Haah, what a day, but welcome, please make yourself at home.
	```
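
The ordering in the first stacked example (sentence-level tags, then the sfx tag, then the onomatopoeia) could be assembled as follows. A minimal sketch: `stack_line` and its parameters are hypothetical names, and it assumes tags are written with plain pipes as `<|category:tag|>`.

```python
from typing import Optional, Sequence, Tuple


def stack_line(
    sentence_tags: Sequence[Tuple[str, str]],
    sfx: Optional[Tuple[str, str]],
    text: str,
) -> str:
    """Build one line: sentence-level tags first, then an optional leading sfx.

    sentence_tags: (category, name) pairs such as ("emotion", "elation").
    sfx: (name, onomatopoeia) such as ("laughter", "Haha"), or None.
    text: the remainder of the line, appended right after the onomatopoeia.
    """
    prefix = "".join(f"<|{category}:{name}|>" for category, name in sentence_tags)
    if sfx is not None:
        name, onomatopoeia = sfx
        # No space between the sfx tag and its onomatopoeia.
        prefix += f"<|sfx:{name}|>{onomatopoeia}"
    return prefix + text


line = stack_line(
    [("emotion", "elation")],
    ("laughter", "Haha"),
    ", welcome, welcome, we're so happy you're here!",
)
print(line)
# <|emotion:elation|><|sfx:laughter|>Haha, welcome, welcome, we're so happy you're here!
```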

	## Tips

	- You can stack tags in one sentence (e.g. a leading emotion tag plus an inline sfx).
	- `speed_very_slow` only slows the model to roughly 5s; for slower delivery, insert
	`<|prosody:long_pause|>` between phrases instead.
	- Only the tags below are recognized; anything else degrades output or gets read literally.
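
One way to apply the `long_pause` tip programmatically is sketched below. The helper name `add_long_pauses` is hypothetical, and treating "phrase boundary" as punctuation followed by whitespace is an assumption of this sketch, not something the model defines.

```python
import re

LONG_PAUSE = "<|prosody:long_pause|>"


def add_long_pauses(text: str) -> str:
    """Insert a long_pause tag between phrases.

    Assumption: a phrase ends at one of , ; : . ! ? followed by whitespace.
    """
    # The lookbehind keeps the punctuation attached to the phrase before the split.
    phrases = [p for p in re.split(r"(?<=[,;:.!?])\s+", text.strip()) if p]
    return f" {LONG_PAUSE} ".join(phrases)


print(add_long_pauses("Take your time, there's really no need to rush."))
# Take your time, <|prosody:long_pause|> there's really no need to rush.
```

Text with no phrase boundary is returned unchanged, so the helper is safe to call on short lines.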

	## Full tag catalog (43)

	### Emotion (21) — sentence-level
	`affection`, `amusement`, `anger`, `arousal`, `awe`, `bitterness`, `confusion`, `contemplation`,
	`contentment`, `determination`, `disgust`, `elation`, `enthusiasm`, `fear`, `helplessness`,
	`longing`, `pride`, `relief`, `sadness`, `shame`, `surprise`

	Syntax: `<|emotion:elation|>`

	### Prosody (10)
	- Sentence-level: `speed_very_slow`, `speed_slow`, `speed_fast`, `speed_very_fast`,
	`pitch_low`, `pitch_high`, `expressive_high`, `expressive_low`
	- Inline: `pause`, `long_pause`

	Syntax: `<|prosody:speed_slow|>`, `<|prosody:pause|>`

	### Style (3) — sentence-level
	`singing`, `shouting`, `whispering`

	Syntax: `<|style:whispering|>`

	### Sound effects (9) — inline
	`cough`, `laughter`, `crying`, `screaming`, `burping`, `humming`, `sigh`, `sniff`, `sneeze`

	Syntax: `<|sfx:cough|>Ahem, ...` (tag first, onomatopoeia attached, no space)
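
Because unrecognized tags degrade output or get read literally, it can help to lint the text before sending it. The sketch below copies the catalog above into a dictionary and flags unknown tags and an sfx tag followed by a space. The names `CATALOG`, `TAG_RE`, and `find_problems` are hypothetical, tags are assumed to use plain pipes (`<|category:tag|>`), and the check does not detect malformed tags such as a missing pipe.

```python
import re
from typing import List

# Tag names copied from the catalog in this document (43 in total).
CATALOG = {
    "emotion": {
        "affection", "amusement", "anger", "arousal", "awe", "bitterness",
        "confusion", "contemplation", "contentment", "determination",
        "disgust", "elation", "enthusiasm", "fear", "helplessness",
        "longing", "pride", "relief", "sadness", "shame", "surprise",
    },
    "prosody": {
        "speed_very_slow", "speed_slow", "speed_fast", "speed_very_fast",
        "pitch_low", "pitch_high", "expressive_high", "expressive_low",
        "pause", "long_pause",
    },
    "style": {"singing", "shouting", "whispering"},
    "sfx": {
        "cough", "laughter", "crying", "screaming", "burping",
        "humming", "sigh", "sniff", "sneeze",
    },
}

# Matches anything shaped like <|category:name|>.
TAG_RE = re.compile(r"<\|([^:|<>]+):([^:|<>]+)\|>")


def find_problems(text: str) -> List[str]:
    """Return human-readable problems found in the tagged text."""
    problems = []
    for match in TAG_RE.finditer(text):
        category, name = match.group(1), match.group(2)
        if name not in CATALOG.get(category, set()):
            problems.append(f"unknown tag: {match.group(0)}")
        elif category == "sfx" and text[match.end():match.end() + 1].isspace():
            # The onomatopoeia should follow the sfx tag with no space.
            problems.append(f"space after sfx tag: {match.group(0)}")
    return problems


print(find_problems("<|emotion:joy|>Hi <|sfx:cough|> Ahem"))
# ['unknown tag: <|emotion:joy|>', 'space after sfx tag: <|sfx:cough|>']
```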