trunk-reporter/imbe-asr-base-512d-p25


AuggieActual committed on Mar 23

Commit e9818df · verified · 1 Parent(s): 32f8e42

Add model card

Files changed (1)

README.md +71 -0

	@@ -0,0 +1,71 @@

+---
+license: mit
+language:
+- en
+tags:
+- speech-recognition
+- asr
+- ctc
+- conformer
+- p25
+- imbe
+- vocoder
+- fine-tuned
+- onnx
+datasets:
+- librispeech_asr
+- LIUM/tedlium
+- speechcolab/gigaspeech
+base_model: trunk-reporter/imbe-asr-base-512d
+pipeline_tag: automatic-speech-recognition
+---
+# IMBE-ASR Base P25 Fine-tuned (48.6M params)
+P25 radio-adapted variant of [imbe-asr-base-512d](https://huggingface.co/trunk-reporter/imbe-asr-base-512d). Produces readable dispatch transcriptions from real P25 radio traffic.
+**Code:** [trunk-reporter/imbe-asr](https://github.com/trunk-reporter/imbe-asr) | **Base model:** [imbe-asr-base-512d](https://huggingface.co/trunk-reporter/imbe-asr-base-512d) | **Best model:** [imbe-asr-large-1024d](https://huggingface.co/trunk-reporter/imbe-asr-large-1024d)
+## Results
+| Dataset | Greedy WER |
+|---------|-----------|
+| LibriSpeech-IMBE | 19.2% |
+| Real P25 dispatch | No WER reported; substantially more readable than the base model |
+
+Example P25 output: `BATTALION 60 ENGINE 62 MEDIC 61 RESPOND TO 1234 MAIN STREET FOR A MEDICAL EMERGENCY`
+## Training
+Fine-tuned from `imbe-asr-base-512d` on ~20 hours of real P25 radio captures, pseudo-labeled with an ensemble of Whisper large-v3 and Qwen3-ASR. The fine-tuning data was mixed with 30% base training data to prevent catastrophic forgetting.
+## Files
+| File | Format | Size |
+|------|--------|------|
+| `model.safetensors` | SafeTensors | 205 MB |
+| `config.json` | JSON | -- |
+| `model.onnx` | ONNX fp32 | 196 MB |
+| `model_int8.onnx` | ONNX int8 | 58 MB |
+| `stats.npz` | NumPy | 2 KB |
+## Usage
+```python
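+import numpy as np
+import onnxruntime as ort
+
+session = ort.InferenceSession("model_int8.onnx")
+stats = np.load("stats.npz")
+
+# raw_params: array of shape (num_frames, 170) holding the IMBE parameters
+# for one utterance; it must be supplied by the caller.
+features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)
+
+# Run a batch of one utterance together with its frame count.
+log_probs, out_lengths = session.run(None, {
+    "features": features.reshape(1, -1, 170),
+    "lengths": np.array([features.shape[0]], dtype=np.int64),
+})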
+```
+## Limitations
+- Pseudo-labeled training data may contain transcription errors.
+- P25 coverage is primarily law enforcement, fire, and EMS traffic from one region, so the model may not generalize to other agencies.
+- A P25 fine-tuned version of the large-1024d model is in progress and is expected to substantially outperform this one.
+- English only.
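
The Usage snippet in the card stops at `log_probs`. Since the model is tagged `ctc` and the results are reported as greedy WER, turning those scores into text needs a greedy CTC decoding step. The following is a minimal sketch of that step, not the repository's implementation. The vocabulary (`VOCAB`), the blank index (`BLANK_ID`), and the helper names `ctc_greedy_decode` and `one_hot_rows` are hypothetical; the card does not state the model's real token mapping, so check the linked code repository before relying on them.

```python
# Hypothetical vocabulary: index 0 is the CTC blank, then space, apostrophe, A-Z.
# This mapping is an assumption for illustration, not the model's real one.
BLANK_ID = 0
VOCAB = ["<blank>", " ", "'"] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]


def ctc_greedy_decode(frame_scores, length, vocab=VOCAB, blank_id=BLANK_ID):
    """Greedy CTC decode of one utterance.

    frame_scores: sequence of T rows, each holding one score per vocabulary
                  entry (nested lists or a 2-D NumPy array both work).
    length: number of valid frames; rows past it are ignored as padding.
    """
    chars = []
    prev = None
    for row in frame_scores[:length]:
        # Pick the highest-scoring token for this frame.
        best = max(range(len(row)), key=lambda i: row[i])
        # Emit only when the token changes and is not the blank.
        if best != prev and best != blank_id:
            chars.append(vocab[best])
        prev = best
    return "".join(chars)


def one_hot_rows(ids, size=len(VOCAB)):
    """Build synthetic per-frame scores that put all mass on the given ids."""
    return [[0.0 if i == t else -10.0 for i in range(size)] for t in ids]


# Synthetic check: blank, A, A, blank, A, B, blank, blank
demo = one_hot_rows([0, 3, 3, 0, 3, 4, 0, 0])
print(ctc_greedy_decode(demo, len(demo)))  # prints "AAB"
```

Assuming `log_probs` has shape `(1, T, V)`, which the card does not confirm, the call for the Usage output would be `ctc_greedy_decode(log_probs[0], int(out_lengths[0]))`.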