Add model card
README.md
ADDED
@@ -0,0 +1,71 @@
---
license: mit
language:
- en
tags:
- speech-recognition
- asr
- ctc
- conformer
- p25
- imbe
- vocoder
- fine-tuned
- onnx
datasets:
- librispeech_asr
- LIUM/tedlium
- speechcolab/gigaspeech
base_model: trunk-reporter/imbe-asr-base-512d
pipeline_tag: automatic-speech-recognition
---

# IMBE-ASR Base P25 Fine-tuned (48.6M params)

P25 radio-adapted variant of [imbe-asr-base-512d](https://huggingface.co/trunk-reporter/imbe-asr-base-512d). Produces readable dispatch transcriptions from real P25 radio traffic.

**Code:** [trunk-reporter/imbe-asr](https://github.com/trunk-reporter/imbe-asr) | **Base model:** [imbe-asr-base-512d](https://huggingface.co/trunk-reporter/imbe-asr-base-512d) | **Best model:** [imbe-asr-large-1024d](https://huggingface.co/trunk-reporter/imbe-asr-large-1024d)

## Results

| Dataset | Greedy WER |
|---------|-----------|
| LibriSpeech-IMBE | 19.2% |
| Real P25 dispatch | No WER reported; qualitatively much better than before fine-tuning, with readable transcriptions |

Example P25 output: `BATTALION 60 ENGINE 62 MEDIC 61 RESPOND TO 1234 MAIN STREET FOR A MEDICAL EMERGENCY`

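The WER column in the table above is a word error rate, presumably measured on greedy (argmax) decodes. As a generic illustration of the metric, and not the project's own evaluation script, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.
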
## Training

Fine-tuned from `imbe-asr-base-512d` on ~20 hours of real P25 radio captures, pseudo-labeled with an ensemble of Whisper large-v3 and Qwen3-ASR. The P25 data was mixed with 30% base training data to prevent catastrophic forgetting.

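One way to read the 30% mixing described above is that each training example is drawn from the base data with probability 0.3 and from the P25 data otherwise. The sketch below illustrates that interpretation only. The actual training pipeline is not described in this card, and `mixed_stream`, `p25_examples`, and `base_examples` are hypothetical names.

```python
import random
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def mixed_stream(
    p25_examples: Sequence[T],
    base_examples: Sequence[T],
    base_fraction: float = 0.3,  # assumed meaning of "30% base training data"
    seed: int = 0,
) -> Iterator[T]:
    """Yield an endless stream of examples, picking the source per draw."""
    rng = random.Random(seed)
    while True:
        # With probability base_fraction, replay an example from the base data.
        source = base_examples if rng.random() < base_fraction else p25_examples
        yield rng.choice(source)
```

Replaying a fraction of the original data during fine-tuning is a common way to keep the model from drifting too far from what it learned on the base corpora.
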
## Files

| File | Format | Size |
|------|--------|------|
| `model.safetensors` | SafeTensors | 205 MB |
| `config.json` | JSON | -- |
| `model.onnx` | ONNX fp32 | 196 MB |
| `model_int8.onnx` | ONNX int8 | 58 MB |
| `stats.npz` | NumPy | 2 KB |

## Usage

```python
import numpy as np
import onnxruntime as ort

# raw_params must be supplied by the caller: a float array expected to have
# shape (num_frames, 170), one row of unnormalized input features per frame.
session = ort.InferenceSession("model_int8.onnx")
stats = np.load("stats.npz")

# Normalize with the mean and std stored in stats.npz.
features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)

log_probs, out_lengths = session.run(None, {
    "features": features.reshape(1, -1, 170),  # add a batch dimension
    "lengths": np.array([features.shape[0]], dtype=np.int64),
})
```

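The session returns frame-level `log_probs` rather than text. Since the model is tagged `ctc`, a greedy decode would take the argmax per frame, collapse consecutive repeats, and drop blanks. The sketch below assumes that `log_probs` for one utterance has shape `(frames, vocab)` and that the blank token is index 0. `VOCAB` is a hypothetical character list used only for illustration. None of these details are confirmed by this card, so check the repository's tokenizer before relying on it.

```python
import numpy as np

# Hypothetical vocabulary: index 0 is the CTC blank (an assumption, not from this card).
VOCAB = ["<blank>", " ", "'"] + list("0123456789") + list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def greedy_ctc_decode(log_probs: np.ndarray, length: int, blank_id: int = 0) -> str:
    """Greedy CTC decode for one utterance with log_probs of shape (frames, vocab)."""
    # Pick the most likely token per frame, ignoring padded frames past `length`.
    best = log_probs[:length].argmax(axis=-1)
    chars = []
    prev = blank_id
    for token in best:
        # Emit a token only when it is not blank and differs from the previous frame.
        if token != blank_id and token != prev:
            chars.append(VOCAB[token])
        prev = token
    return "".join(chars)
```

With the outputs of the session call above, this would be invoked as `greedy_ctc_decode(log_probs[0], int(out_lengths[0]))`.
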
## Limitations

- Pseudo-labeled training data may contain transcription errors.
- P25 coverage is primarily law enforcement, fire, and EMS from one region, so the model may not generalize to all agencies.
- A P25 fine-tuned version of the large-1024d model is in progress and is expected to substantially outperform this one.
- English only.