NeuCodec Encoder (rten port)

This is a Rust-runtime port of the NeuCodec encoder, exported to ONNX and converted for use with rten, a pure-Rust ONNX runtime.

The encoder converts a 16 kHz mono audio reference into discrete codes that drive zero-shot voice cloning in the NeuTTS family of models.

This artifact exists to enable voice cloning in Ragtag, a local-first desktop AI application, under strict architectural constraints: no native ONNX runtime (no onnxruntime / ort) and no GPL dependencies. The constraint-clean path through rten closely matches the original PyTorch encoder: about 1% of tokens differ across a full sequence (see Quality verification).

Model details

  • Source model: neuphonic/neucodec (encoder portion only)
  • Format: .rten (rten's native model format, converted from ONNX)
  • Runtime: rten 0.22+
  • Precision: fp32 (quantized variants may be added later)
  • File size: approximately 1.77 GB
  • Input: 16 kHz mono audio, fixed 20-second window (shorter inputs zero-padded)
  • Output: discrete code tokens consumed by the NeuTTS backbone for voice cloning
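The fixed input window implies a simple preprocessing step before the encoder runs. The following is a minimal sketch of that step, not the Ragtag implementation: the helper name `fit_to_window` is hypothetical, and placing the zero padding at the end of the buffer is an assumption, since the details above do not say where the padding goes.

```rust
/// Sample rate the encoder expects, per the model details above.
const SAMPLE_RATE_HZ: usize = 16_000;
/// Fixed window length in seconds, per the model details above.
const WINDOW_SECONDS: usize = 20;
/// Samples in one fixed window: 16,000 * 20 = 320,000.
const WINDOW_SAMPLES: usize = SAMPLE_RATE_HZ * WINDOW_SECONDS;

/// Hypothetical helper: fit mono f32 samples into the fixed window.
/// Shorter inputs are zero-padded (at the end, by assumption);
/// longer inputs are truncated to the first WINDOW_SAMPLES samples.
fn fit_to_window(samples: &[f32]) -> Vec<f32> {
    // Start from an all-zero window so any unfilled tail is silence.
    let mut window = vec![0.0_f32; WINDOW_SAMPLES];
    // Copy as many samples as fit; anything beyond the window is dropped.
    let n = samples.len().min(WINDOW_SAMPLES);
    window[..n].copy_from_slice(&samples[..n]);
    window
}
```

Under these assumptions, a 12-second reference (192,000 samples) would come back as a 320,000-sample buffer whose last 128,000 samples are zero.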

Provenance and conversion process

This model was produced from the original NeuCodec encoder through the following pipeline:

  1. PyTorch source: the neucodec Python package, encoder component only
  2. ONNX export: torch.onnx.export via a patched version of the author's export_encoder.py. The patch corrects two issues in the upstream export script (a probe-ordering bug and a stale alias-free patch written against a different module structure), and swaps an ONNX-hostile dynamic operation in UpSample1d/LowPassFilter1d for a fixed buffer. The patched model produces identical codes to the original PyTorch model.
  3. rten conversion: rten-convert from the ONNX export. The full encoder, including the 600M-parameter Wav2Vec2-BERT 2.0 semantic model, converts cleanly with no unsupported operators.

Quality verification

The Rust runtime output was verified against the PyTorch reference:

  • First 12 output tokens: identical between rten and PyTorch
  • Overall token divergence: 1.00% across the full sequence (attributable to floating-point boundary rounding)
  • Reconstruction parity: codes from the rten encoder, when fed through the existing Rust decoder, reconstruct audio matching the Python-encoded reference within tolerance
  • Clone equivalence: clones driven by rten-encoded references are subjectively equivalent in quality to clones driven by Python-encoded references

The chain rten ≈ ORT ≈ PyTorch holds end-to-end.
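The two token-level checks above (identical leading tokens and overall divergence) can be expressed as a pair of small comparisons. This is a sketch only, not the script used for the verification: the function names are hypothetical, and representing codes as `u32` is an assumption about the token type.

```rust
/// Hypothetical helper: fraction of positions at which two code sequences differ.
/// Returns None if the sequences are empty or differ in length, since a
/// position-wise comparison is not meaningful in those cases.
fn token_divergence(reference: &[u32], candidate: &[u32]) -> Option<f64> {
    if reference.is_empty() || reference.len() != candidate.len() {
        return None;
    }
    // Count positions where the two sequences disagree.
    let mismatches = reference
        .iter()
        .zip(candidate)
        .filter(|(r, c)| r != c)
        .count();
    Some(mismatches as f64 / reference.len() as f64)
}

/// Hypothetical helper: number of leading tokens that match exactly.
fn matching_prefix_len(reference: &[u32], candidate: &[u32]) -> usize {
    reference
        .iter()
        .zip(candidate)
        .take_while(|(r, c)| r == c)
        .count()
}
```

With these definitions, the figures reported above would correspond to `matching_prefix_len` returning at least 12 and `token_divergence` returning roughly 0.01 for the rten and PyTorch code sequences.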

Usage

This model is intended for use within Ragtag's clone pipeline. It is not a standalone TTS system. Using it requires:

  • A NeuTTS backbone model (e.g., neuphonic/neutts-air-q4-gguf)
  • A NeuCodec decoder (the Rust port included in neutts-rs)
  • A G2P frontend producing IPA phonemes (Ragtag uses piper-plus-g2p)
  • The rten runtime crate

The encoder runs on CPU; encoding a 20-second reference takes approximately 5 seconds on Apple Silicon.

Licence and attribution

This model is licensed under the Apache License 2.0. It is derived from the original NeuCodec encoder, which is also licensed under Apache 2.0.

When using this model, please retain the attribution to the original authors:

NeuCodec by Neuphonic
https://huggingface.co/neuphonic/neucodec
Licensed under Apache 2.0

The rten conversion and ONNX export patches are contributed by Ragtag / Captivated Ltd, also under Apache 2.0.

Limitations

  • English-only G2P: while the encoder itself is language-agnostic, the current Ragtag pipeline uses an English G2P frontend. Non-English cloning is not currently supported.
  • Fixed 20-second input: shorter references are zero-padded; longer references are truncated. The pipeline targets 12–15 second references for the guided recording flow.
  • Quality depends on reference quality: clone quality tracks reference quality directly. Short, performative, or emotionally emphatic references bleed prosody into the output. Neutral, evenly-delivered, sufficiently-long references produce dramatically better clones.

Related resources

Citation

If you use this work, please cite both the original NeuCodec and this rten port:

@misc{neucodec2024,
  author = {Neuphonic},
  title = {NeuCodec},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/neuphonic/neucodec}
}

@misc{neucodec-encoder-rten,
  author = {Mallett, Leon},
  title = {NeuCodec Encoder (rten port)},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/ragtag-ai/neucodec-encoder-rten}
}