
Writing target text for Higgs Audio v3 TTS — control tags

How to embed control tags in the input text to steer emotion, prosody, style, and sound effects. For where this fits in the overall workflow, see AGENTS.md.

Format rule (read first)

Every tag is <|category:tag|>. There are two placements:

  • Sentence-level — emotion, style, and prosody's speed_* / pitch_* / expressive_*. Put at the start of the sentence; it colors the whole sentence.
  • Inline — sound effects (sfx) and prosody's pause / long_pause. Insert at the exact position in the sentence where the effect should occur.

sfx gotcha: the format is <|sfx:tag|>onomatopoeia, followed by the rest of the line. The tag comes first and the onomatopoeia follows it immediately, with no space between them.
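The format rule is easy to get wrong by hand, so a few small string helpers can make it mechanical. The following is only a sketch: the helper names `tag`, `sentence`, and `sfx` are hypothetical and not part of any Higgs Audio API. They simply assemble the target text described above.

```python
def tag(category: str, name: str) -> str:
    """Format one control tag as <|category:tag|>."""
    return f"<|{category}:{name}|>"


def sentence(text: str, *leading_tags: str) -> str:
    """Prefix a sentence with its sentence-level tags (emotion, style, prosody)."""
    return "".join(leading_tags) + text


def sfx(name: str, onomatopoeia: str) -> str:
    """Build an sfx tag with the onomatopoeia attached directly, no space."""
    return tag("sfx", name) + onomatopoeia.lstrip()


# Sentence-level tag at the start of the sentence.
greeting = sentence("Welcome aboard!", tag("emotion", "elation"))

# sfx: tag first, onomatopoeia attached, then the rest of the line.
opening = sfx("cough", "Ahem") + ", welcome everyone, let's get started."

print(greeting)
print(opening)
```

Stripping leading whitespace in `sfx` is a design choice of this sketch: it guards against accidentally passing `" Ahem"` and breaking the no-space rule.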

Examples

Sentence-level:

<|emotion:elation|>Welcome aboard, we are absolutely thrilled to have you here!
<|style:whispering|>Come closer, I have a little secret to share.
<|prosody:speed_slow|>Take your time, there's really no need to rush.

Inline sfx (tag first, onomatopoeia attached, no space):

<|sfx:cough|>Ahem, welcome everyone, let's get started.
<|sfx:laughter|>Haha, so glad you could make it!

Inline pause (between phrases):

Hello there <|prosody:pause|> and welcome to the show.

Stacking tags (the first line combines a sentence-level emotion tag with an inline sfx; the second line uses an sfx tag alone):

<|emotion:elation|><|sfx:laughter|>Haha, welcome, welcome, we're so happy you're here!
<|sfx:sigh|>Haah, what a day — but welcome, please make yourself at home.

Tips

  • You can stack tags in one sentence (e.g. a leading emotion tag plus an inline sfx).
  • speed_very_slow only slows the model to roughly 5 s; for slower delivery, insert <|prosody:long_pause|> between phrases instead.
  • Only the tags below are recognized — anything else degrades output or gets read literally.
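The long_pause tip can be applied mechanically when the text is already split into phrases. This is a minimal sketch, assuming the caller supplies the phrase boundaries; the function name `slow_down` and its parameters are hypothetical. It follows the inline pause example above by putting a space on each side of the tag.

```python
LONG_PAUSE = "<|prosody:long_pause|>"


def slow_down(phrases: list[str], leading_tag: str = "") -> str:
    """Join phrases with an inline long_pause tag between each pair.

    leading_tag is an optional sentence-level tag placed at the very start.
    """
    cleaned = [p.strip() for p in phrases if p.strip()]
    return leading_tag + f" {LONG_PAUSE} ".join(cleaned)


line = slow_down(["Take your time,", "there's really no need to rush."])
print(line)
```

Empty or whitespace-only phrases are dropped so that two pause tags never end up back to back.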

Full tag catalog (43)

Emotion (21) — sentence-level

affection, amusement, anger, arousal, awe, bitterness, confusion, contemplation, contentment, determination, disgust, elation, enthusiasm, fear, helplessness, longing, pride, relief, sadness, shame, surprise

Syntax: <|emotion:elation|>

Prosody (10)

  • Sentence-level: speed_very_slow, speed_slow, speed_fast, speed_very_fast, pitch_low, pitch_high, expressive_high, expressive_low
  • Inline: pause, long_pause

Syntax: <|prosody:speed_slow|>, <|prosody:pause|>

Style (3) — sentence-level

singing, shouting, whispering

Syntax: <|style:whispering|>

Sound effects (9) — inline

cough, laughter, crying, screaming, burping, humming, sigh, sniff, sneeze

Syntax: <|sfx:cough|>Ahem, ... (tag first, onomatopoeia attached, no space)
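Because unrecognized tags degrade output or get read literally, it can help to check target text against the catalog before sending it to the model. The sketch below copies the catalog above into a dictionary and reports any well-formed `<|category:tag|>` whose category or tag is not listed. The names `CATALOG`, `TAG_RE`, and `unknown_tags` are hypothetical, and the check does not catch malformed tags (for example a missing colon) or placement mistakes.

```python
import re

# Catalog copied from the lists above (43 tags in total).
CATALOG = {
    "emotion": {
        "affection", "amusement", "anger", "arousal", "awe", "bitterness",
        "confusion", "contemplation", "contentment", "determination",
        "disgust", "elation", "enthusiasm", "fear", "helplessness",
        "longing", "pride", "relief", "sadness", "shame", "surprise",
    },
    "prosody": {
        "speed_very_slow", "speed_slow", "speed_fast", "speed_very_fast",
        "pitch_low", "pitch_high", "expressive_high", "expressive_low",
        "pause", "long_pause",
    },
    "style": {"singing", "shouting", "whispering"},
    "sfx": {
        "cough", "laughter", "crying", "screaming", "burping",
        "humming", "sigh", "sniff", "sneeze",
    },
}

# Matches well-formed tags of the shape <|category:tag|>.
TAG_RE = re.compile(r"<\|([^|:>]+):([^|>]+)\|>")


def unknown_tags(text: str) -> list[str]:
    """Return every well-formed tag in text that is not in the catalog."""
    return [
        m.group(0)
        for m in TAG_RE.finditer(text)
        if m.group(2) not in CATALOG.get(m.group(1), set())
    ]


print(unknown_tags("<|emotion:elation|><|sfx:laughter|>Haha, welcome!"))
print(unknown_tags("<|emotion:joy|>Hi <|prosody:pause|> there"))
```

An empty list means every tag found is in the catalog; any returned tag should be replaced with a listed one before synthesis.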