---
license: openrail
tags:
- rknn
- rk3588
- text-to-speech
- on-device
- supertonic
pipeline_tag: text-to-speech
library_name: rknn-toolkit2
---

# Supertonic RKNN RK3588

This repository contains static-shape RKNN exports of Supertonic for `rk3588`.

The package is organized as a shape matrix so runtime code can select the smallest model that covers the processed text length and required audio length.

## Contents

```text
models/
previews/
scripts/
config.json
docs/DELIVERY.md
conversion/convert_matrix.log
```

Each shape directory contains one RKNN file per module:

- `duration_predictor`
- `text_encoder`
- `vector_estimator`
- `vocoder`

## Shape Matrix

| Shape | Max text tokens | Approx body chars | Max audio | Package size |
| --- | ---: | ---: | ---: | ---: |
| `t64_l64` | 64 | 54-55 | 4.46 s | 201.44 MiB |
| `t128_l128` | 128 | 118-119 | 8.92 s | 204.76 MiB |
| `t256_l256` | 256 | 246-247 | 17.83 s | 213.22 MiB |
| `t384_l384` | 384 | 374-375 | 26.75 s | 223.88 MiB |

## Model Size

The learned weights are shared across shape variants; RKNN files differ by compiled graph shape and memory planning. These files are non-quantized FP RKNN builds.

| Module | Parameters | Weight size |
| --- | ---: | ---: |
| `duration_predictor` | 0.865 M | 3.30 MiB |
| `text_encoder` | 9.001 M | 34.34 MiB |
| `vector_estimator` | 64.015 M | 244.20 MiB |
| `vocoder` | 25.338 M | 96.66 MiB |
| **Total** | **99.219 M** | **378.49 MiB** |

## Runtime Selection

Choose the smallest shape that covers both processed text token length and latent length required by predicted duration.

```text
latent_length = ceil(duration_seconds * 44100 / 3072)
max_duration = latent_length * 3072 / 44100
```

For longer text, split into sentence or paragraph chunks instead of forcing a larger single fixed shape.

## Download And Run Example

This package ships with its own Python scripts under `scripts/`.
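
The `--text-length` and `--latent-length` values passed to the scripts appear to name one shape from the matrix. As a minimal sketch of the Runtime Selection rule, assuming the processed token count and the predicted duration are already known, the smallest covering shape could be picked as shown here. The names `SHAPES`, `latent_length`, and `select_shape` are hypothetical and are not part of the shipped scripts; the limits are copied from the Shape Matrix table, and a `None` result means the text should be split into chunks.

```python
import math
from typing import Optional

SAMPLE_RATE = 44100        # samples per second, from the formula above
SAMPLES_PER_LATENT = 3072  # audio samples covered by one latent frame

# Hypothetical lookup table: shape name -> (max text tokens, max latent frames),
# copied from the Shape Matrix table.
SHAPES = {
    "t64_l64": (64, 64),
    "t128_l128": (128, 128),
    "t256_l256": (256, 256),
    "t384_l384": (384, 384),
}


def latent_length(duration_seconds: float) -> int:
    """Latent frames needed to hold the given audio duration."""
    return math.ceil(duration_seconds * SAMPLE_RATE / SAMPLES_PER_LATENT)


def select_shape(text_tokens: int, duration_seconds: float) -> Optional[str]:
    """Return the smallest shape covering both limits, or None if none fits."""
    needed = latent_length(duration_seconds)
    # Sort by (tokens, latent) so the first match is the smallest shape.
    for name, (max_tokens, max_latent) in sorted(SHAPES.items(), key=lambda kv: kv[1]):
        if text_tokens <= max_tokens and needed <= max_latent:
            return name
    return None  # too long for every shape: split the text into chunks


if __name__ == "__main__":
    print(select_shape(20, 3.0))
    print(select_shape(20, 5.0))
```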
Install the Python environment and make sure the Supertonic ONNX assets are available:

```bash
cd scripts
uv sync
test -d ../../assets/onnx || git clone https://huggingface.co/Supertone/supertonic-3 ../../assets
```

Run a smoke test on an RK3588 device with `rknn-toolkit-lite2` installed:

```bash
cd scripts
uv run python benchmark_rknn.py \
  --rknn-dir .. \
  --onnx-dir ../../assets/onnx \
  --text-length 128 \
  --latent-length 128 \
  --text "Hello from Supertonic." \
  --lang en \
  --duration-source rknn \
  --total-step 4 \
  --warmup 1 \
  --repeat 3 \
  --save-dir results/rknn_smoke_t128_l128
```

To generate additional static shape variants:

```bash
cd scripts
uv run python convert_onnx_to_rknn.py \
  --onnx-dir ../../assets/onnx \
  --out-dir ../models \
  --shape-matrix 128x128 256x256
```

See `docs/DELIVERY.md` for the generated delivery checklist.

## License

The accompanying model is released under the OpenRAIL-M License. See `LICENSE`.