How to use AutoArk-AI/ARK-ASR-3B with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="AutoArk-AI/ARK-ASR-3B", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("AutoArk-AI/ARK-ASR-3B", trust_remote_code=True, dtype="auto")
Update ARK-ASR-3B eval metrics
- .eval_results/open_asr_leaderboard.yaml +26 -16
- README.md +9 -2
.eval_results/open_asr_leaderboard.yaml
CHANGED
@@ -1,8 +1,18 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: mean_wer
-  value: 5.
-  date: '2026-06-
+  value: 5.04
+  date: '2026-06-23'
+  source:
+    url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
+    name: open-asr-leaderboard
+    user: hf-audio
+
+- dataset:
+    id: hf-audio/open-asr-leaderboard
+    task_id: rtfx
+  value: 490.98
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
@@ -11,8 +21,8 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: ami_wer
-  value: 8.
-  date: '2026-06-
+  value: 8.79
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
@@ -21,8 +31,8 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: earnings22_wer
-  value: 8.
-  date: '2026-06-
+  value: 8.23
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
@@ -31,8 +41,8 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: gigaspeech_wer
-  value:
-  date: '2026-06-
+  value: 6.98
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
@@ -41,8 +51,8 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: librispeech_clean_wer
-  value: 1.
-  date: '2026-06-
+  value: 1.03
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
@@ -51,8 +61,8 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: librispeech_other_wer
-  value: 2.
-  date: '2026-06-
+  value: 2.35
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
@@ -61,8 +71,8 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: spgispeech_wer
-  value: 2.
-  date: '2026-06-
+  value: 2.46
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
@@ -71,8 +81,8 @@
 - dataset:
     id: hf-audio/open-asr-leaderboard
     task_id: voxpopuli_wer
-  value: 5.
-  date: '2026-06-
+  value: 5.47
+  date: '2026-06-23'
   source:
     url: https://huggingface.co/datasets/hf-audio/open-asr-leaderboard
     name: open-asr-leaderboard
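For readers unfamiliar with the two metric families in `.eval_results/open_asr_leaderboard.yaml`: the `*_wer` tasks report word error rate, and the newly added `rtfx` task reports the inverse real-time factor, that is, seconds of audio transcribed per second of processing. The sketch below shows the usual textbook definitions in plain Python. The function names and sample inputs are illustrative assumptions, not code from this repository, and leaderboard evaluations typically normalize text before scoring, which is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words consumed
    # so far (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a free match
            ))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: higher means faster than real time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Illustrative inputs only: one dropped word out of six, and one hour of
# audio processed in ten seconds.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(rtfx(3600.0, 10.0))
```

Under this definition, an `rtfx` value of 490.98 would mean that one second of processing covers roughly 491 seconds of audio on the benchmark hardware.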
README.md
CHANGED
@@ -44,7 +44,7 @@ repository: https://github.com/AutoArk/open-audio-opd
 
 </div>
 
-> **TL;DR** ARK-ASR-3B is a multilingual automatic speech recognition model. It achieves current state-of-the-art results on the Hugging Face Open ASR Leaderboard English short-form benchmark, with an average WER of **5.
+> **TL;DR** ARK-ASR-3B is a multilingual automatic speech recognition model. It achieves current state-of-the-art results on the Hugging Face Open ASR Leaderboard English short-form benchmark, with an average WER of **5.04%** and RTFx of **490.98** across AMI, Earnings22, GigaSpeech, LibriSpeech, SPGISpeech, and VoxPopuli. The accompanying training, inference, and evaluation code is available at [AutoArk/open-audio-opd](https://github.com/AutoArk/open-audio-opd).
 
 ## Abstract
 
@@ -84,9 +84,16 @@ The following results are from the Hugging Face [Open ASR Leaderboard](https://h
 
 | Model | AMI | Earnings22 | GigaSpeech | LS Clean | LS Other | SPGISpeech | VoxPopuli | Avg |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
-| ARK-ASR-3B | **8.
+| ARK-ASR-3B | **8.79%** | **8.23%** | **6.98%** | **1.03%** | **2.35%** | **2.46%** | **5.47%** | **5.04%** |
 | ARK-ASR-0.6B | 10.02% | 9.77% | 8.00% | 1.53% | 3.51% | 2.63% | 6.31% | 5.97% |
 
+### Chinese CER
+
+| Model | AISHELL-1 | WenetSpeech test meeting | WenetSpeech test-net |
+| --- | ---: | ---: | ---: |
+| ARK-ASR-3B | **1.80%** | **4.97%** | **4.58%** |
+| ARK-ASR-0.6B | 2.02% | 5.92% | 4.96% |
+
 ## Inference
 
 Run ASR inference with Hugging Face Transformers:
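Separately from the README's own inference code, which this diff does not show, the Avg column and the `mean_wer` value in this commit can be cross-checked from the per-dataset figures. The sketch below copies the numbers from the English WER table above and assumes that Avg is an unweighted mean over the seven test sets. It also derives the relative WER reduction of ARK-ASR-3B over ARK-ASR-0.6B. All variable and function names are illustrative.

```python
from statistics import mean

# WER (%) per English test set, copied from the README table in this commit.
WER = {
    "ARK-ASR-3B": {
        "AMI": 8.79, "Earnings22": 8.23, "GigaSpeech": 6.98, "LS Clean": 1.03,
        "LS Other": 2.35, "SPGISpeech": 2.46, "VoxPopuli": 5.47,
    },
    "ARK-ASR-0.6B": {
        "AMI": 10.02, "Earnings22": 9.77, "GigaSpeech": 8.00, "LS Clean": 1.53,
        "LS Other": 3.51, "SPGISpeech": 2.63, "VoxPopuli": 6.31,
    },
}
# Avg column as reported in the same table.
REPORTED_AVG = {"ARK-ASR-3B": 5.04, "ARK-ASR-0.6B": 5.97}


def average_wer(model: str) -> float:
    """Unweighted mean of the per-test-set WER values (an assumption about Avg)."""
    return mean(WER[model].values())


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


for model, reported in REPORTED_AVG.items():
    print(f"{model}: computed {average_wer(model):.2f}%, reported {reported:.2f}%")

for test_set, small in WER["ARK-ASR-0.6B"].items():
    large = WER["ARK-ASR-3B"][test_set]
    print(f"{test_set}: {relative_reduction(small, large):.1f}% relative WER reduction")
```

With these inputs the computed means round to the reported 5.04% and 5.97%, which is consistent with an unweighted average; on the Avg column the 3B model's error is roughly 15.6% lower in relative terms.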