sentis-supertonic
Supertonic text-to-speech converted to Unity Inference Engine (Sentis) FP16.
Files
Models/
  text_encoder_fp16.sentis        # text/character features
  duration_predictor_fp16.sentis  # per-token durations (alignment)
  vector_estimator_fp16.sentis    # flow-matching latent estimator (iterative)
  vocoder_fp16.sentis             # latent -> waveform
Config/
  tts.json                        # model hyperparameters (latent_dim, normalizer, etc.)
  unicode_indexer.json            # character -> index mapping
VoiceStyles/                      # F1-F5, M1-M5 voice-style vectors (speaker conditioning)
Anthems/                          # sample text
SupertonicTts.cs                  # self-contained Unity Sentis inference
Pipeline
A four-model flow-matching TTS. Synthesis is conditioned on a voice-style vector from VoiceStyles/:
- Text encode: characters (via unicode_indexer.json) → text_encoder → text features.
- Duration: duration_predictor → per-token durations, used to expand text features to frame length.
- Vector estimate: vector_estimator runs the flow-matching ODE for a few steps (see tts.json) to produce the acoustic latent, conditioned on the voice style.
- Vocoder: vocoder → audio waveform (play via an AudioClip).
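For intuition, the duration and vector-estimate steps can be sketched in plain C# without any Sentis calls. This is an illustration under assumptions, not the code in SupertonicTts.cs: the names `PipelineSketch`, `ExpandByDuration`, `EulerIntegrate` and the `velocity` delegate are hypothetical, durations are assumed to be whole frame counts per token, and the estimator is assumed to behave like a velocity field integrated with fixed-step Euler from t = 0 to t = 1. The real models may round durations differently or fold the update step into the network itself.

```csharp
using System;
using System.Collections.Generic;

public static class PipelineSketch
{
    // Duration step (assumed form): repeat each token's feature vector
    // durations[i] times so the sequence reaches frame length.
    // Repeated frames share the same underlying array; nothing is copied.
    public static float[][] ExpandByDuration(float[][] tokenFeatures, int[] durations)
    {
        if (tokenFeatures.Length != durations.Length)
            throw new ArgumentException("One duration per token is required.");

        var frames = new List<float[]>();
        for (int i = 0; i < tokenFeatures.Length; i++)
        {
            int repeat = Math.Max(0, durations[i]); // negative counts are treated as zero
            for (int r = 0; r < repeat; r++)
                frames.Add(tokenFeatures[i]);
        }
        return frames.ToArray();
    }

    // Vector-estimate step (assumed form): fixed-step Euler integration of
    // dx/dt = velocity(x, t), starting from x0 at t = 0 and ending at t = 1.
    // In the real pipeline the velocity would come from a model call.
    public static float[] EulerIntegrate(
        float[] x0, Func<float[], float, float[]> velocity, int steps)
    {
        if (steps <= 0) throw new ArgumentOutOfRangeException(nameof(steps));

        var x = (float[])x0.Clone(); // leave the caller's starting point untouched
        float dt = 1f / steps;
        for (int s = 0; s < steps; s++)
        {
            float t = s * dt;
            float[] v = velocity(x, t);
            for (int i = 0; i < x.Length; i++)
                x[i] += dt * v[i];
        }
        return x;
    }
}
```

More steps track the ODE more closely at the cost of one estimator call per step, which is why the step count is a configuration value rather than a constant.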
A complete self-contained implementation lives in SupertonicTts.cs (text
preprocessing + unicode_indexer.json, duration prediction, the flow-matching estimator loop, vocoder,
chunking and tts.json config). It returns mono PCM (float[]) you can drop into an AudioClip.
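As a rough picture of what chunking means here, the sketch below groups sentences into chunks no longer than a character budget so that each chunk can be synthesized separately. It is an assumption-laden illustration: `TextChunker`, `Split` and `maxChars` are hypothetical names, and the actual splitting rules and limits in SupertonicTts.cs may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TextChunker
{
    // Greedily pack whole sentences into chunks of at most maxChars characters.
    // A single sentence longer than maxChars is kept as its own oversized chunk.
    public static List<string> Split(string text, int maxChars)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var chunks = new List<string>();
        var current = new StringBuilder();
        foreach (string sentence in SplitSentences(text))
        {
            // Start a new chunk when appending this sentence (plus a space) would exceed the budget.
            if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0) current.Append(' ');
            current.Append(sentence);
        }
        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }

    private static bool IsSentenceEnd(char c) => c == '.' || c == '!' || c == '?';

    // Cut after each run of '.', '!' or '?'; text without closing punctuation
    // becomes the final sentence. Abbreviations and decimals are not special-cased.
    private static IEnumerable<string> SplitSentences(string text)
    {
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (!IsSentenceEnd(text[i])) continue;
            while (i + 1 < text.Length && IsSentenceEnd(text[i + 1])) i++; // keep "..." or "?!" together
            string sentence = text.Substring(start, i - start + 1).Trim();
            if (sentence.Length > 0) yield return sentence;
            start = i + 1;
        }
        string tail = text.Substring(start).Trim();
        if (tail.Length > 0) yield return tail;
    }
}
```

With a budget of 20 characters, `TextChunker.Split("Hello there. How are you? Fine.", 20)` yields two chunks: "Hello there." and "How are you? Fine.".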
Minimal usage:
var tts = new SupertonicTts(BackendType.CPU);
tts.Load(modelRoot); // folder holding Models/, Config/ and VoiceStyles/
float[] pcm = await tts.Synthesize("Hello there.", SupertonicLanguage.en, "M1"); // tts.SampleRate Hz mono
The exact tensor names, the iterative estimator loop, and the chunking/normalization from tts.json
are all handled in SupertonicTts.cs.
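To play the float[] returned by Synthesize, one option is to wrap it in an AudioClip. The helper below is a minimal sketch: `PcmPlayback`, `ClampToUnitRange` and `ToClip` are hypothetical names that are not part of this repo, and it assumes the PCM is mono and that tts.SampleRate is an int. The Unity-specific part is guarded by `UNITY_5_3_OR_NEWER` so the clamping helper also compiles outside the editor.

```csharp
using System;
#if UNITY_5_3_OR_NEWER
using UnityEngine;
#endif

public static class PcmPlayback
{
    // Return a copy of the samples clamped to [-1, 1], the range
    // AudioClip float data is expected to stay within.
    public static float[] ClampToUnitRange(float[] pcm)
    {
        var safe = new float[pcm.Length];
        for (int i = 0; i < pcm.Length; i++)
            safe[i] = Math.Max(-1f, Math.Min(1f, pcm[i]));
        return safe;
    }

#if UNITY_5_3_OR_NEWER
    // Build a non-streaming, single-channel clip from the PCM buffer.
    // pcm must contain at least one sample.
    public static AudioClip ToClip(float[] pcm, int sampleRate, string name = "supertonic")
    {
        float[] safe = ClampToUnitRange(pcm);
        AudioClip clip = AudioClip.Create(name, safe.Length, 1, sampleRate, false);
        clip.SetData(safe, 0);
        return clip;
    }
#endif
}
```

In a scene this might be used as `audioSource.clip = PcmPlayback.ToClip(pcm, tts.SampleRate); audioSource.Play();`, where `audioSource` is an AudioSource you already have.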
⚠️ License & attribution
Model weights are released under the OpenRAIL-M license by
Supertone (the sample code is MIT). OpenRAIL-M permits
redistribution and use subject to use-based restrictions that must be passed through to downstream
users. Include the full OpenRAIL-M license text and the upstream use restrictions with this repo, and
review them before deploying. These weights derive from the opensource-multilingual Supertonic
release.