A newer model is available — please use syvai/hviske-v5.3 instead. v5.3 is the current recommended Danish ASR model from this family and reaches 13.91% strict WER on the CoRal v3 full test set (beam=5). This v5.1 checkpoint is kept as the base for downstream fine-tunes (v5.2, v5.3) and for reproducibility.

hviske-v5.1

Danish ASR model: a 2B-parameter Conformer encoder-decoder trained on ~3.5M samples (~16k hours) of Danish speech from syvai/danish-asr-unified.

Results on CoRal v3 test

Split	Baseline WER	Baseline CER	v5.1 WER	v5.1 CER	ElevenLabs scribe_v2 WER	ElevenLabs scribe_v2 CER	OpenAI gpt-4o-transcribe WER	OpenAI gpt-4o-transcribe CER
`read_aloud`	104.73%	60.05%	19.45%	7.24%	18.62%	7.60%	26.34%	11.31%
`conversation`	126.12%	99.84%	25.46%	14.08%	31.38%	19.57%	55.24%	43.63%

Relative to the baseline, v5.1 lowers WER by about 85 pp on read-aloud and about 101 pp on conversational speech.
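
The percentage-point figures follow directly from the table. As a quick check, the short sketch below recomputes them from the baseline and v5.1 WER columns; the dictionary layout and the helper name `wer_drop_pp` are only illustrative.

```python
# WER values (in percent) copied from the results table above.
results = {
    "read_aloud": {"baseline": 104.73, "v5.1": 19.45},
    "conversation": {"baseline": 126.12, "v5.1": 25.46},
}


def wer_drop_pp(baseline: float, model: float) -> float:
    """Absolute WER reduction in percentage points (not a relative change)."""
    return round(baseline - model, 2)


drops = {split: wer_drop_pp(r["baseline"], r["v5.1"]) for split, r in results.items()}
for split, drop in drops.items():
    print(f"{split}: {drop:.2f} pp")
```

This gives 85.28 pp for read_aloud and 100.66 pp for conversation, which round to the 85 and 101 quoted above.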

ElevenLabs scribe_v2 was evaluated via the public /v1/speech-to-text API and OpenAI gpt-4o-transcribe via /v1/audio/transcriptions, both on the full CoRal v3 test splits (n=17,560) with strict normalization (lowercase, punctuation stripping, and Danish digit-to-word conversion via num2words(lang="da")).
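
To make the normalization steps concrete, here is a minimal sketch of what such a scoring pipeline could look like. It is not the evaluation script used for the table: the function names, the regexes, the handling of digit runs, and the corpus-level aggregation (total errors divided by total reference words) are all assumptions. The `number_to_words` parameter is a hypothetical hook so the converter can be swapped out; by default it calls the third-party `num2words` package.

```python
import re


def normalize(text, number_to_words=None):
    """Lowercase, spell out digit runs, strip punctuation, collapse whitespace."""
    if number_to_words is None:
        # Assumption: the third-party num2words package is installed.
        from num2words import num2words

        def number_to_words(n):
            return num2words(n, lang="da")

    text = text.lower()
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so æ, ø, å survive
    return " ".join(text.split())


def word_errors(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1,                           # deletion
                           cur[j - 1] + 1,                        # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1]


def wer(refs, hyps, number_to_words=None):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = words = 0
    for ref, hyp in zip(refs, hyps):
        ref_words = normalize(ref, number_to_words).split()
        hyp_words = normalize(hyp, number_to_words).split()
        errors += word_errors(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words
```

Because insertions count as errors, a hypothesis much longer than its reference can push WER above 100%, which is how the baseline column exceeds 100% in the table.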

Usage

Setup

pip install transformers==4.57.6 torch soundfile librosa

Note: this model uses native CohereAsr/Whisper classes from transformers 4.57.6. It is not compatible with transformers ≥5.0.
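
If the environment is shared with other projects, a small guard can catch an incompatible install before the model loads. The sketch below only encodes the "below 5.0" rule from the note above; the helper names are hypothetical, and passing the check does not mean every 4.x release works, since only 4.57.6 is pinned here.

```python
from importlib import metadata


def transformers_is_supported(version: str) -> bool:
    """Return True when the major version is below 5, per the note above."""
    major = int(version.split(".")[0])
    return major < 5


def check_installed_transformers() -> None:
    """Raise a RuntimeError if transformers is missing or is a 5.x release."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not transformers_is_supported(installed):
        raise RuntimeError(
            f"transformers {installed} found; install transformers==4.57.6 instead"
        )
```

Calling `check_installed_transformers()` at the top of a script fails early with a readable message.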

import torch, numpy as np, soundfile as sf
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("syvai/hviske-v5.1", trust_remote_code=True)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "syvai/hviske-v5.1", trust_remote_code=True, dtype=torch.bfloat16
).to("cuda").eval()

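# soundfile returns shape (frames,) for mono and (frames, channels) otherwise
audio, sr = sf.read("your_audio.wav")
audio = np.asarray(audio, dtype=np.float32)
if audio.ndim > 1:
    # Downmix to mono (assumption: transcribe() expects 1-D arrays)
    audio = audio.mean(axis=1)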

hyp = model.transcribe(
    processor=processor,
    language="da",
    audio_arrays=[audio],
    sample_rates=[sr],
)[0]
print(hyp)

Audio longer than 35 s is chunked automatically, and input is resampled to 16 kHz internally.
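
The chunking itself happens inside the model code and its exact strategy is not described here; it may use overlap or split on silence. Purely as an illustration of the 35 s limit at 16 kHz, the sketch below computes non-overlapping fixed-size windows. The function name and the fixed-window approach are assumptions, not the model's implementation.

```python
def chunk_bounds(n_samples, sample_rate=16_000, max_seconds=35.0):
    """Return (start, end) sample indices of consecutive windows of at most max_seconds."""
    if n_samples <= 0:
        return []
    window = int(sample_rate * max_seconds)  # 560,000 samples at 16 kHz
    return [(start, min(start + window, n_samples))
            for start in range(0, n_samples, window)]


# An 80 s clip at 16 kHz splits into two full 35 s windows plus a 10 s remainder.
print(chunk_bounds(80 * 16_000))
```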

Run with vLLM (OpenAI-compatible API)

vLLM can serve the model behind an OpenAI-compatible /v1/audio/transcriptions endpoint — convenient for high-throughput batch transcription and remote serving.

Install

pip install "vllm==0.19.0"
pip install "vllm[audio]" librosa   # audio deps are required for transcription

Start the server

vllm serve syvai/hviske-v5.1 --trust-remote-code --host 0.0.0.0 --port 8000

--trust-remote-code is required — the model ships custom code. The runner (transcription) is auto-detected; no --task flag is needed.
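
Loading a 2B-parameter model takes a while, so it can help to confirm the server is up before sending audio. The sketch below queries the OpenAI-style /v1/models listing and returns the served model IDs. The helper names are hypothetical, and the `fetch` parameter exists only so the HTTP call can be swapped out; the host and port match the command above.

```python
import json
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def served_model_ids(base_url="http://localhost:8000", fetch=fetch_json):
    """Return the model IDs listed by the server's /v1/models endpoint."""
    payload = fetch(f"{base_url}/v1/models")
    return [entry["id"] for entry in payload.get("data", [])]
```

Once startup has finished, `"syvai/hviske-v5.1" in served_model_ids()` should be true; a connection error usually just means the server is still loading.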

Transcribe — curl

curl -s http://localhost:8000/v1/audio/transcriptions \
  -F "file=@your_audio.wav" \
  -F "model=syvai/hviske-v5.1" \
  -F "language=da" \
  -F "temperature=0"

Transcribe — Python (`openai` client)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("your_audio.wav", "rb") as f:
    resp = client.audio.transcriptions.create(
        model="syvai/hviske-v5.1",
        file=f,
        language="da",
        temperature=0,
    )
print(resp.text)

Notes

language="da" + temperature=0 gives the most accurate, deterministic output.
response_format supports json (default) and text. verbose_json is not supported and returns a 400.
Accepts common audio formats (wav, mp3, flac, ogg); audio is resampled to 16 kHz internally.
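
For batch jobs, the settings above can be combined with a small thread pool so that several requests are in flight at once. This is a sketch under assumptions: `transcribe_file` and `transcribe_many` are hypothetical helpers, `client` is expected to be an `openai.OpenAI` instance pointed at the vLLM server, and with response_format="text" the client is expected to return a plain string rather than an object with a `.text` attribute.

```python
from concurrent.futures import ThreadPoolExecutor

MODEL_ID = "syvai/hviske-v5.1"


def transcribe_file(client, path):
    """Send one audio file to the transcription endpoint and return the transcript."""
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(
            model=MODEL_ID,
            file=f,
            language="da",
            temperature=0,
            response_format="text",  # plain-text response instead of the json default
        )


def transcribe_many(client, paths, max_workers=4):
    """Transcribe files concurrently; results come back in the order of `paths`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: transcribe_file(client, p), paths))
```

Typical use would be `transcribe_many(OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY"), ["a.wav", "b.wav"])`. The right `max_workers` depends on the GPU and on clip length, so treat 4 as a starting point.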

Training details

Architecture: 2.06B-parameter Conformer encoder-decoder, full fine-tune
Data: syvai/danish-asr-unified pre-shuffled into 200 shards (3.41M rows) with voxpopuli, ftspeech, coral_read_aloud, coral_conversation, nst_da, nota, cv17 sources
Epochs: 1
Batch: 16 micro × 8 grad-accum = 128 effective batch
Optimizer: bnb AdamW8bit, LR 5e-5 peak, 500-step warmup, cosine decay
Augmentation: SpecAugment (2 freq × 27 bins, 2 time × 100 frames)
Max audio: 31 s (recovers 86% of VoxPopuli long-audio samples)
Precision: bf16 on NVIDIA RTX PRO 6000 Blackwell Max-Q
Wall time: ~47 h
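
The batch and optimizer settings above imply a learning-rate curve that can be sketched in a few lines. Several details are assumptions, because the list gives only "500-step warmup, cosine decay": the warmup is taken to be linear from zero, the decay is taken to end at zero, and the step count is estimated as one optimizer step per effective batch over roughly 3.41M rows with none dropped.

```python
import math

PEAK_LR = 5e-5
WARMUP_STEPS = 500
EFFECTIVE_BATCH = 16 * 8        # micro-batch x grad-accum = 128
APPROX_ROWS = 3_410_000         # "3.41M rows", rounded
# Assumption: one epoch, one optimizer step per effective batch, no rows dropped.
TOTAL_STEPS = math.ceil(APPROX_ROWS / EFFECTIVE_BATCH)


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to `peak`, then cosine decay to zero at `total` steps."""
    if step < warmup:
        return peak * step / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(f"estimated optimizer steps: {TOTAL_STEPS}")
for step in (0, 250, 500, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Under these assumptions the run takes about 26.6k optimizer steps, so warmup covers under 2% of training.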

License

This model is released under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).

Permitted: non-commercial use including research, education, evaluation, and personal projects, with attribution.
Not permitted without a separate commercial license: any use by or for a commercial entity, integration into a commercial product or service, or use to generate revenue (directly or indirectly).
Commercial licensing: contact mads@syv.ai.

