---
license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
tags:
  - kokoro
  - tflite
  - litert
  - ai-edge-litert
  - text-to-speech
  - custom-op
  - edge-ai
  - experimental
---

# Kokoro 82M LiteRT Runtime Preview

This repository packages the current Kokoro 82M LiteRT/TFLite runtime used by
the Reachy edge robot-agent project.

It is sourced from [`hexgrad/Kokoro-82M`](https://huggingface.co/hexgrad/Kokoro-82M)
and contains the accepted text-to-decoder-input frontend bucket plus the accepted
merged decoder/vocoder graph.

## Runtime Shape

```text
text
  -> Kokoro KPipeline G2P/tokenization
  -> frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
  -> kokoro_decoder_source_stft_merged.tflite + KokoroSourceStft
  -> WAV bytes
```

The runtime still depends on the `kokoro` Python package for text processing,
namely `KPipeline.g2p()` and `KPipeline.en_tokenize()`. It must not instantiate
Kokoro `KModel` in the request path: all neural inference is served by the
LiteRT frontend bucket and the LiteRT decoder/vocoder.

## Included Artifacts

```text
kokoro_litert_manifest.json
config.json
voices/af_heart.npz
frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
kokoro_decoder_source_stft_merged.tflite
custom_ops/kokoro_source_stft_custom_op_native.cc
custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
reports/kokoro_bucketed_frontend_litert_parity_report.json
reports/kokoro_decoder_source_stft_merged_probe.json
```

The included frontend bucket is fixed at `T=48`, with at most `128` decoder
frames and `256` F0/noise frames. Text that is longer than the bucket, or that
spans multiple segments, must be deterministically chunked and repacked to fit
the bucket before inference.
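As a rough illustration of what fitting input to the bucket could look like, the sketch below splits a token-id list into fixed-size chunks and pads the last one. It is not the runtime's actual chunker. It assumes `T` counts input tokens, that `0` is a safe padding id, and that a 0/1 mask marks valid positions. The function name `chunk_and_pad` is hypothetical, and the sketch ignores phrase boundaries and any BOS/EOS slots the real frontend may reserve.

```python
from typing import List, Tuple

BUCKET_TOKENS = 48  # T=48, taken from the bucket described above


def chunk_and_pad(
    token_ids: List[int],
    bucket: int = BUCKET_TOKENS,
    pad_id: int = 0,  # assumption: the real padding id may differ
) -> List[Tuple[List[int], List[int]]]:
    """Split token_ids into bucket-sized chunks and return (padded_ids, mask) pairs.

    The split is purely positional, so the same input always yields the
    same chunks. The mask holds 1 for real tokens and 0 for padding.
    """
    if bucket <= 0:
        raise ValueError("bucket must be positive")
    chunks = []
    for start in range(0, len(token_ids), bucket):
        piece = token_ids[start:start + bucket]
        pad = bucket - len(piece)
        chunks.append((piece + [pad_id] * pad, [1] * len(piece) + [0] * pad))
    return chunks
```

A 50-token input would produce two chunks of length 48, the second holding two real tokens followed by padding. A production chunker would more likely split at sentence or phrase boundaries so that prosody is not cut mid-word.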

## Jetson / ARM64 Status

The package includes custom op builds for local Linux x86-64 development and
Jetson/Linux aarch64 deployment:

```text
custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
```

The aarch64 binary was cross-compiled from
`custom_ops/kokoro_source_stft_custom_op_native.cc` with:

```bash
aarch64-linux-gnu-g++ -std=c++17 -O2 -fPIC \
  -fno-math-errno \
  -fno-trapping-math \
  -ffp-contract=fast \
  -static-libstdc++ \
  -static-libgcc \
  -Wl,--exclude-libs,ALL \
  -shared \
  custom_ops/kokoro_source_stft_custom_op_native.cc \
  -o custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
```

The expected aarch64 SHA-256 is recorded in `kokoro_litert_manifest.json` under
`decoder_vocoder.custom_op.linux_aarch64_sha256`. The binary has not yet been
validated on target hardware: loading it on a Jetson device and benchmarking
synthesis there are still required.
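Before loading the binary on a device, its hash can be compared against the manifest. The sketch below is one way to do that, assuming the dotted key above maps to nested JSON objects and that the stored value is a hex digest. The helper names `file_sha256` and `verify_aarch64_custom_op` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_aarch64_custom_op(package_dir: Path) -> bool:
    """Compare the aarch64 custom op against the hash stored in the manifest."""
    manifest = json.loads((package_dir / "kokoro_litert_manifest.json").read_text())
    # Assumption: the dotted manifest key corresponds to nested objects.
    expected = manifest["decoder_vocoder"]["custom_op"]["linux_aarch64_sha256"]
    so_path = (
        package_dir / "custom_ops" / "linux-aarch64"
        / "kokoro_source_stft_custom_op_native.so"
    )
    return file_sha256(so_path) == expected.lower()
```

Calling `verify_aarch64_custom_op(Path("."))` from the package root returns `True` only when the shipped binary matches the recorded digest.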

## Validation

Frontend bucket acceptance is recorded in:

```text
reports/kokoro_bucketed_frontend_litert_parity_report.json
```

The local acceptance result for this package:

```text
passed: true
bucket: T=48
max observed frontend float abs error: 0.000812530517578125
pred_dur exact: true
alignment exact: true
valid_frames exact: true
```

Decoder/vocoder acceptance is recorded in:

```text
reports/kokoro_decoder_source_stft_merged_probe.json
```

The merged decoder runs as a single graph in one interpreter, with its stages
connected through the `KokoroSourceStft` custom op. That op stays on the CPU
even when the rest of the graph is delegated, unless it is reimplemented as a
GPU-capable custom kernel or delegate.

## Minimal Local Smoke Test

In the Reachy robot-agent repo:

```bash
PYTHONPATH=src uv run --extra tts --extra kokoro-frontend \
  python scripts/kokoro_litert_runtime_smoke.py \
  --text "Hi Will." \
  --output /tmp/robot-kokoro-litert/runtime_smoke.wav
```

Expected output is a mono 24 kHz WAV file.
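The output format can be checked with Python's standard `wave` module. This is a minimal sketch that assumes the file is PCM-encoded, since `wave` raises `wave.Error` on formats it does not support (such as float WAV on many Python versions). The function name `is_mono_24khz_wav` is hypothetical.

```python
import wave


def is_mono_24khz_wav(path: str) -> bool:
    """Return True if the WAV at path is mono, 24 kHz, and contains audio frames."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1
            and wav.getframerate() == 24000
            and wav.getnframes() > 0
        )
```

For the command above, `is_mono_24khz_wav("/tmp/robot-kokoro-litert/runtime_smoke.wav")` should return `True` after a successful run.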

## License

The upstream Kokoro model card lists `hexgrad/Kokoro-82M` under Apache-2.0. This
converted runtime package is distributed under Apache-2.0 as a derived runtime
form. See `LICENSE` and `NOTICE`.