sentis-supertonic

Supertonic text-to-speech models converted to FP16 for Unity Inference Engine (formerly Sentis).

Files

Models/
  text_encoder_fp16.sentis        # text/character features
  duration_predictor_fp16.sentis  # per-token durations (alignment)
  vector_estimator_fp16.sentis    # flow-matching latent estimator (iterative)
  vocoder_fp16.sentis             # latent -> waveform
Config/
  tts.json                        # model hyperparameters (latent_dim, normalizer, etc.)
  unicode_indexer.json            # character -> index mapping
VoiceStyles/                      # F1-F5, M1-M5 voice-style vectors (speaker conditioning)
Anthems/                          # sample text
SupertonicTts.cs                  # self-contained Unity Sentis inference
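As an illustration of how Config/unicode_indexer.json feeds the text encoder, here is a minimal sketch of a character-to-index lookup. It assumes the JSON has already been parsed into an `int[]` where `indexer[codePoint]` holds the token id, or -1 for unsupported characters. That layout is an assumption, and `CharTokenizer` is a hypothetical name. The real preprocessing, including any normalization, lives in SupertonicTts.cs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps text to token ids through a code-point lookup table.
// ASSUMPTION: unicode_indexer.json was parsed into an int[] where
// indexer[codePoint] is the token id, or -1 for unsupported characters.
public static class CharTokenizer
{
    public static int[] Encode(string text, int[] indexer)
    {
        var ids = new List<int>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            // Combine a UTF-16 surrogate pair into a single Unicode code point.
            int codePoint;
            if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(text[i], text[i + 1]);
                i++; // skip the low surrogate we just consumed
            }
            else
            {
                codePoint = text[i];
            }

            // Drop characters outside the table or marked unsupported (-1).
            if (codePoint < indexer.Length && indexer[codePoint] >= 0)
                ids.Add(indexer[codePoint]);
        }
        return ids.ToArray();
    }
}
```

Silently dropping unsupported characters is one design choice; a real preprocessor might instead substitute them or report an error.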

Pipeline

Supertonic is a four-model flow-matching TTS pipeline. Synthesis is conditioned on a voice-style vector loaded from VoiceStyles/:

  1. Text encode: characters (mapped via unicode_indexer.json) → text_encoder → text features.
  2. Duration: duration_predictor → per-token durations, used to expand the text features to frame length.
  3. Vector estimate: vector_estimator runs the flow-matching ODE for a few steps (see tts.json) to produce the acoustic latent, conditioned on the voice style.
  4. Vocoder: vocoder → audio waveform (play it via an AudioClip).
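Step 3 can be pictured as fixed-step Euler integration of a flow-matching ODE: start from Gaussian noise at t = 0 and repeatedly move the latent along a predicted velocity until t = 1. The sketch below is generic, not Supertonic's exact loop. It assumes a hypothetical `velocity` delegate that stands in for one vector_estimator call and returns dx/dt; the exported graph may instead return the updated latent directly, and the step count and tensor layout come from tts.json and SupertonicTts.cs.

```csharp
using System;

// Generic sketch of flow-matching sampling with fixed-step Euler integration.
// ASSUMPTION: `velocity` is a hypothetical stand-in for one vector_estimator
// call that returns dx/dt for latent x at time t in [0, 1). The real graph may
// return the next latent instead; SupertonicTts.cs is authoritative.
public static class FlowMatchingSampler
{
    public static float[] Sample(int latentSize, int steps,
                                 Func<float[], float, float[]> velocity,
                                 Random rng)
    {
        // Initial latent: standard Gaussian noise via the Box-Muller transform.
        var x = new float[latentSize];
        for (int i = 0; i < latentSize; i++)
        {
            double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids Log(0)
            double u2 = rng.NextDouble();
            x[i] = (float)(Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2));
        }

        // Integrate dx/dt = v(x, t) from t = 0 to t = 1 in `steps` equal Euler steps.
        float dt = 1f / steps;
        for (int k = 0; k < steps; k++)
        {
            float t = k * dt;
            float[] v = velocity(x, t);
            for (int i = 0; i < latentSize; i++)
                x[i] += dt * v[i];
        }
        return x;
    }
}
```

Few Euler steps keep synthesis cheap, since each step is one full vector_estimator pass; the trade-off is integration accuracy.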

SupertonicTts.cs contains a complete, self-contained implementation: text preprocessing with unicode_indexer.json, duration prediction, the flow-matching estimator loop, the vocoder, chunking, and tts.json configuration. It returns mono PCM as a float[] that you can copy into an AudioClip. Minimal usage:
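Assuming "chunking" refers to splitting long input text before synthesis, the idea can be sketched as: break the text at sentence boundaries, pack sentences into chunks of at most `maxChars` characters, synthesize each chunk, then join the audio with short silences. `TextChunker`, `maxChars`, and `gapSamples` are hypothetical names and values for illustration; they are not taken from tts.json or SupertonicTts.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical text chunker. ASSUMPTIONS: maxChars and gapSamples are
// illustrative parameters, not values from tts.json or SupertonicTts.cs.
public static class TextChunker
{
    // Packs whole sentences into chunks of at most maxChars characters.
    public static List<string> Split(string text, int maxChars)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();
        foreach (string sentence in SplitSentences(text))
        {
            // Flush the current chunk if appending this sentence would overflow it.
            if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0) current.Append(' ');
            current.Append(sentence);
        }
        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }

    // Breaks after '.', '!' or '?'; a single over-long sentence stays whole.
    private static IEnumerable<string> SplitSentences(string text)
    {
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '.' || c == '!' || c == '?')
            {
                string s = text.Substring(start, i + 1 - start).Trim();
                if (s.Length > 0) yield return s;
                start = i + 1;
            }
        }
        string tail = text.Substring(start).Trim();
        if (tail.Length > 0) yield return tail;
    }

    // Concatenates per-chunk mono PCM, inserting gapSamples of silence between chunks.
    public static float[] Join(IList<float[]> parts, int gapSamples)
    {
        int total = 0;
        for (int i = 0; i < parts.Count; i++)
            total += parts[i].Length + (i > 0 ? gapSamples : 0);
        var output = new float[total]; // zero-initialized, so gaps are silent
        int offset = 0;
        for (int i = 0; i < parts.Count; i++)
        {
            if (i > 0) offset += gapSamples;
            Array.Copy(parts[i], 0, output, offset, parts[i].Length);
            offset += parts[i].Length;
        }
        return output;
    }
}
```

Splitting on sentence punctuation keeps prosody natural at chunk edges; a production splitter would also need to handle abbreviations and languages with other sentence delimiters.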

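var tts = new SupertonicTts(BackendType.CPU);
tts.Load(modelRoot); // folder holding Models/, Config/ and VoiceStyles/
float[] pcm = await tts.Synthesize("Hello there.", SupertonicLanguage.en, "M1"); // mono PCM at tts.SampleRate Hz; call from an async method
var clip = AudioClip.Create("Supertonic", pcm.Length, 1, tts.SampleRate, false); // 1 channel, not streamed
clip.SetData(pcm, 0); // copy samples into the clip, starting at offset 0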

The exact tensor names, the iterative estimator loop, and the chunking/normalization from tts.json are all handled in SupertonicTts.cs.

⚠️ License & attribution

Model weights are released under the OpenRAIL-M license by Supertone (the sample code is MIT). OpenRAIL-M permits redistribution and use subject to use-based restrictions that must be passed through to downstream users. Include the full OpenRAIL-M license text and the upstream use restrictions with this repo, and review them before deploying. These weights derive from the opensource-multilingual Supertonic release.
