---
license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
tags:
- kokoro
- tflite
- litert
- ai-edge-litert
- text-to-speech
- custom-op
- edge-ai
- experimental
---

# Kokoro 82M LiteRT Runtime Preview

This repository packages the current Kokoro 82M LiteRT/TFLite runtime used by the Reachy edge robot-agent project. It is sourced from [`hexgrad/Kokoro-82M`](https://huggingface.co/hexgrad/Kokoro-82M) and contains the accepted text-to-decoder-input frontend bucket plus the accepted merged decoder/vocoder graph.

## Runtime Shape

```text
text
  -> Kokoro KPipeline G2P/tokenization
  -> frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
  -> kokoro_decoder_source_stft_merged.tflite + KokoroSourceStft
  -> WAV bytes
```

The runtime still uses the `kokoro` Python package for `KPipeline.g2p()` and `KPipeline.en_tokenize()`. It must not instantiate Kokoro `KModel` in the request path. Neural inference is served by the LiteRT frontend bucket and the LiteRT decoder/vocoder.

## Included Artifacts

```text
kokoro_litert_manifest.json
config.json
voices/af_heart.npz
frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
kokoro_decoder_source_stft_merged.tflite
custom_ops/kokoro_source_stft_custom_op_native.cc
custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
reports/kokoro_bucketed_frontend_litert_parity_report.json
reports/kokoro_decoder_source_stft_merged_probe.json
```

The current frontend bucket is `T=48`, with max `128` decoder frames and `256` F0/noise frames. Longer or multi-segment text must be deterministically chunked and repacked before inference.

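
The chunking requirement above could be met in several ways; the following is a minimal sketch of one deterministic approach, not the project's actual chunker. The names `chunk_text` and `count_tokens` are hypothetical, the default of counting characters is only a stand-in for a real token count, and the sketch assumes the full `T=48` budget is available to each chunk (the real bucket may reserve positions for boundary or padding tokens). Repacking chunks into the padded bucket input is not shown, because the frontend's tensor layout is not described here.

```python
import re
from typing import Callable, List


def chunk_text(
    text: str,
    max_tokens: int = 48,
    count_tokens: Callable[[str], int] = len,
) -> List[str]:
    """Greedily pack punctuation-delimited segments into chunks.

    Hypothetical helper: `count_tokens` defaults to character length,
    which is only a proxy for the real tokenizer's token count.
    """
    # Split after punctuation followed by whitespace; the punctuation
    # stays attached to the segment that precedes it.
    segments = [s for s in re.split(r"(?<=[.!?;:,])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for segment in segments:
        # A segment that is too long on its own falls back to word pieces.
        if count_tokens(segment) <= max_tokens:
            pieces = [segment]
        else:
            pieces = segment.split()
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if count_tokens(candidate) <= max_tokens:
                current = candidate
                continue
            if current:
                chunks.append(current)
            if count_tokens(piece) > max_tokens:
                raise ValueError(f"piece exceeds bucket budget: {piece!r}")
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("Hi Will. How are you today? I am fine.", max_tokens=20):
        print(repr(chunk))
```

Because the split points and the packing order depend only on the input string and the budget, the same text always yields the same chunks, which keeps synthesis results reproducible across runs. In a real runtime, `count_tokens` would be replaced by a count derived from the tokenizer output.
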
## Jetson / ARM64 Status

The package includes custom op builds for local Linux x86-64 development and Jetson/Linux aarch64 deployment:

```text
custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
```

The aarch64 binary was cross-compiled from `custom_ops/kokoro_source_stft_custom_op_native.cc` with:

```bash
aarch64-linux-gnu-g++ -std=c++17 -O2 -fPIC \
  -fno-math-errno \
  -fno-trapping-math \
  -ffp-contract=fast \
  -static-libstdc++ \
  -static-libgcc \
  -Wl,--exclude-libs,ALL \
  -shared \
  custom_ops/kokoro_source_stft_custom_op_native.cc \
  -o custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
```

The expected aarch64 SHA-256 is recorded in `kokoro_litert_manifest.json` under `decoder_vocoder.custom_op.linux_aarch64_sha256`. Jetson target-device loading and synthesis benchmarking are still required.

## Validation

Frontend bucket acceptance is recorded in:

```text
reports/kokoro_bucketed_frontend_litert_parity_report.json
```

The local acceptance result for this package:

```text
passed: true
bucket: T=48
max observed frontend float abs error: 0.000812530517578125
pred_dur exact: true
alignment exact: true
valid_frames exact: true
```

Decoder/vocoder acceptance is recorded in:

```text
reports/kokoro_decoder_source_stft_merged_probe.json
```

The merged decoder is a one-interpreter graph connected through the `KokoroSourceStft` custom op. The custom op remains a CPU custom-op island unless implemented as a GPU-capable custom kernel or delegate.

## Minimal Local Smoke

In the Reachy robot-agent repo:

```bash
PYTHONPATH=src uv run --extra tts --extra kokoro-frontend \
  python scripts/kokoro_litert_runtime_smoke.py \
  --text "Hi Will." \
  --output /tmp/robot-kokoro-litert/runtime_smoke.wav
```

Expected output is a mono 24 kHz WAV file.

## License

The upstream Kokoro model card lists `hexgrad/Kokoro-82M` under Apache-2.0.
This converted runtime package is distributed under Apache-2.0 as a derived runtime form. See `LICENSE` and `NOTICE`.
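
## Checksum Check Sketch

The aarch64 SHA-256 recorded in the manifest (see Jetson / ARM64 Status) can be compared against the shipped binary before it is loaded on a target device. The following is a minimal sketch under stated assumptions: it assumes the dotted key `decoder_vocoder.custom_op.linux_aarch64_sha256` maps to nested JSON objects holding a hex digest string, and the helper names `sha256_of`, `lookup`, and `verify_custom_op` are hypothetical and not part of this package.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "kokoro_litert_manifest.json"
SO_RELATIVE_PATH = "custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so"
# Assumption: the dotted path corresponds to nested JSON objects.
MANIFEST_KEY = "decoder_vocoder.custom_op.linux_aarch64_sha256"


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def lookup(manifest: dict, dotted_key: str):
    """Walk nested dicts along a dotted key; raises KeyError if absent."""
    node = manifest
    for part in dotted_key.split("."):
        node = node[part]
    return node


def verify_custom_op(package_dir: Path) -> bool:
    """Compare the aarch64 custom op binary against the manifest digest."""
    manifest = json.loads((package_dir / MANIFEST_NAME).read_text())
    expected = str(lookup(manifest, MANIFEST_KEY)).lower()
    actual = sha256_of(package_dir / SO_RELATIVE_PATH)
    return actual == expected


if __name__ == "__main__":
    ok = verify_custom_op(Path("."))
    print("custom op checksum matches manifest:", ok)
```

Run from the package root, this prints whether the binary matches the manifest. A mismatch suggests the binary was rebuilt or corrupted after the manifest was written.
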