---
license: mit
language:
- en
tags:
- speech-recognition
- asr
- ctc
- conformer
- p25
- imbe
- vocoder
- fine-tuned
- onnx
datasets:
- librispeech_asr
- LIUM/tedlium
- speechcolab/gigaspeech
base_model: trunk-reporter/imbe-asr-base-512d
pipeline_tag: automatic-speech-recognition
---

# IMBE-ASR Base P25 Fine-tuned (48.6M params)

P25 radio-adapted variant of [imbe-asr-base-512d](https://huggingface.co/trunk-reporter/imbe-asr-base-512d). Produces readable transcriptions from real P25 radio traffic.

**Code:** [trunk-reporter/imbe-asr](https://github.com/trunk-reporter/imbe-asr) | **Base model:** [imbe-asr-base-512d](https://huggingface.co/trunk-reporter/imbe-asr-base-512d) | **Best model:** [imbe-asr-large-1024d](https://huggingface.co/trunk-reporter/imbe-asr-large-1024d)

## Results

Evaluated on 50 real P25 labeled samples using greedy decode vs. beam search with the included 3-gram KenLM:

| Decode method | WER | CER |
|---|---|---|
| Greedy | 37.1% | 14.8% |
| Beam + KenLM (α=0.5, β=1.0) | **19.2%** | **9.5%** |

The KenLM reduces WER by ~18 percentage points. **Beam search with the included LM is strongly recommended.**

Example P25 output: `BATTALION 60 ENGINE 62 MEDIC 61 RESPOND TO 1234 MAIN STREET FOR A MEDICAL EMERGENCY`

## Training

Fine-tuned from `imbe-asr-base-512d` on ~20 hours of real P25 radio captures, pseudo-labeled with Whisper large-v3 + Qwen3-ASR ensemble. Mixed with 30% base training data to prevent catastrophic forgetting.

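
The 30% mix described above could be implemented in several ways, and this card does not say which one was used (per batch, per epoch, or by audio hours). As a minimal sketch, assuming per-example sampling and hypothetical names (`mix_training_samples`, `p25_samples`, `base_samples`, `base_fraction`), each training example is drawn from the base data with probability 0.3 and from the P25 captures otherwise:

```python
import random


def mix_training_samples(p25_samples, base_samples, base_fraction=0.3, n=None, seed=0):
    """Draw n examples; each comes from the base pool with probability base_fraction.

    Illustrative only: the sampling scheme and all names here are assumptions,
    not the training code behind this model.
    """
    rng = random.Random(seed)
    n = len(p25_samples) if n is None else n
    mixed = []
    for _ in range(n):
        # Replaying some base-domain data is a common guard against forgetting it.
        pool = base_samples if rng.random() < base_fraction else p25_samples
        mixed.append(rng.choice(pool))
    return mixed
```

Sampling per example keeps the ratio roughly constant across an epoch regardless of how large either pool is; a by-hours mix would need utterance durations as well.
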
## Files

| File | Format | Size |
|------|--------|------|
| `model.safetensors` | SafeTensors | 205 MB |
| `config.json` | JSON | - |
| `model.onnx` | ONNX fp32 | 196 MB |
| `model_int8.onnx` | ONNX int8 | 58 MB |
| `stats.npz` | NumPy | 2 KB |
| `lm/3gram.bin` | KenLM trie (3-gram, q8) | 501 MB |
| `lm/unigrams.txt` | Vocabulary | 9 MB |

## Usage

### Greedy decode (fast, no extra dependencies)

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model_int8.onnx")
stats = np.load("stats.npz")

# raw_params is supplied by the caller: one row of 170 values per frame, shape (T, 170)
features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)
log_probs, out_lengths = session.run(None, {
    "features": features.reshape(1, -1, 170),
    "lengths": np.array([features.shape[0]], dtype=np.int64),
})
# log_probs holds per-frame CTC log-probabilities; out_lengths gives the valid frame count
```

### Beam search + KenLM (recommended, ~18pp WER improvement)

```python
import numpy as np
import onnxruntime as ort
from pyctcdecode import build_ctcdecoder
import kenlm  # not called directly; pyctcdecode needs it to load the LM

# Load the model and the feature-normalization statistics
session = ort.InferenceSession("model_int8.onnx")
stats = np.load("stats.npz")

# Build the decoder with KenLM, using parameters tuned for P25
VOCAB = list(" ABCDEFGHIJKLMNOPQRSTUVWXYZ'")  # 28 chars
labels = [""] + VOCAB  # CTC blank at index 0
with open("lm/unigrams.txt") as f:
    unigrams = f.read().splitlines()
decoder = build_ctcdecoder(
    labels=labels,
    kenlm_model_path="lm/3gram.bin",
    unigrams=unigrams,
    alpha=0.5,  # LM weight, tuned on P25 data
    beta=1.0,   # word insertion bonus
)

# Run inference (raw_params as in the greedy example: shape (T, 170))
features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)
log_probs, out_lengths = session.run(None, {
    "features": features.reshape(1, -1, 170),
    "lengths": np.array([features.shape[0]], dtype=np.int64),
})

# Decode only the valid frames of the single utterance in the batch
text = decoder.decode(log_probs[0, :out_lengths[0]], beam_width=100)
```

Install the beam-search dependencies: `pip install pyctcdecode kenlm`

## Limitations

- Pseudo-labeled training data may contain transcription errors.
- P25 coverage is primarily law enforcement, fire, and EMS from one region. May not generalize to all agencies.
- A P25 fine-tuned version of the large-1024d model is in progress and will substantially outperform this one.
- English only.