NbAiLab / nb-asr-beta-qwen06b-041126-forced-aligner
Norwegian forced alignment for the NB-ASR beta program
This repository hosts a Norwegian forced-alignment checkpoint for the NB-ASR beta group, built on top of the Qwen3 ForcedAligner stack and adapted for the project's Norwegian speech workflows.
Internal reference: 041126-forced-aligner
Uploaded: 21.04.2026
This model is intended for beta evaluation and pipeline integration. It is designed to be used either:
- as a standalone aligner when you already have reference text, or
- alongside an NB-ASR beta transcription model when you want timestamps in the same workflow.
Beta notice: this checkpoint is for controlled evaluation and integration work. APIs, naming, packaging advice, and companion repo IDs may still change during the beta period.
Access Lifecycle (NB-ASR Beta Policy)
- Initial state on creation: private
- Development/iteration state: private
- Later beta-release step: public + gated
- After release update: add to https://huggingface.co/collections/NbAiLab/nb-asr-beta
This ordering is mandatory for nb-asr-beta repositories.
Provenance
This HF repo was prepared from the local training artifact:
olivia-4gpu-20260411-1111/checkpoint-6000
The packaged checkpoint uses:
- model.safetensors and generation_config.json from the new local training output
- config.json from assets/TEMPLATE_FORCED_ALIGNER_config.json, which replaces the default forced-aligner config
- tokenizer and processor support files copied from the known-working repo hfrepos/nb-asr-beta1-qwen-forced-aligner
This follows the project rule that the default forced-aligner config should not be used as-is for future releases.
Overview
The aligner predicts time-aligned spans for reference text given an audio input. In practical terms, it is useful when you want word- or segment-level timing for known text, or when you want to augment an ASR system with alignment output.
This repository follows the Qwen forced-alignment usage pattern and is intended to work with the broader NB-ASR evaluation setup. For upstream architecture and package behavior, use the public Qwen references as the implementation baseline:
- Base model: Qwen/Qwen3-ForcedAligner-0.6B
- Technical report: Qwen3-ASR Technical Report
Recommended Installation
The recommended route is the official qwen-asr package, which provides the compatible classes and loading behavior for Qwen ASR and forced-aligner checkpoints.
pip install -U "qwen-asr"
Optional packages for supported GPU setups:
pip install -U flash-attn --no-build-isolation
If you want the broader serving stack as well:
pip install -U "qwen-asr[vllm]"
Quickstart
Load the aligner directly and call align with audio, text, and language.
import torch
from qwen_asr import Qwen3ForcedAligner

model = Qwen3ForcedAligner.from_pretrained(
    "NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
)

results = model.align(
    audio="audio.wav",
    text="Hun er oversatt til en rekke språk, men ikke norsk.",
    language="Norwegian",
)

print(results[0])
first = results[0][0]
print(first.text, first.start_time, first.end_time)
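Once align returns, the spans usually need to be serialized for downstream tools. The following is a minimal sketch and not part of the qwen-asr package. It assumes each aligned item exposes the text, start_time, and end_time attributes printed in the quickstart, and it uses a hypothetical Span dataclass as a stand-in so the example runs without a model or GPU. The helper names spans_to_rows and rows_to_jsonl are illustrative, and time values are passed through in whatever unit the aligner reports.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Span:
    """Hypothetical stand-in for one aligned item returned by the aligner."""
    text: str
    start_time: float
    end_time: float


def spans_to_rows(spans: Iterable) -> List[dict]:
    """Copy text/start_time/end_time into plain dicts and add a duration."""
    rows = []
    for span in spans:
        start = float(span.start_time)
        end = float(span.end_time)
        rows.append({
            "text": span.text,
            "start": start,
            "end": end,
            "duration": round(end - start, 3),
        })
    return rows


def rows_to_jsonl(rows: List[dict]) -> str:
    """Serialize rows as JSON Lines, keeping Norwegian characters readable."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


# Made-up values for illustration only; real spans come from model.align(...).
demo_spans = [Span("Hun", 0.12, 0.31), Span("er", 0.31, 0.42), Span("språk", 1.50, 1.95)]
demo_rows = spans_to_rows(demo_spans)
print(rows_to_jsonl(demo_rows))
```

With a real result, the same helper would be called as spans_to_rows(results[0]), provided the returned items carry the three attributes assumed here.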
Use Together With an NB-ASR Beta ASR Model
For transcription plus timestamps, load the ASR model and point forced_aligner to this repo.
import torch
from qwen_asr import Qwen3ASRModel

ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner"

model = Qwen3ASRModel.from_pretrained(
    ASR_MODEL,
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=8,
    max_new_tokens=1024,
    forced_aligner=ALIGNER_MODEL,
    forced_aligner_kwargs=dict(
        dtype=torch.bfloat16,
        device_map="cuda:0",
    ),
)

results = model.transcribe(
    audio=["/path/to/utt1.wav", "/path/to/utt2.wav"],
    language=["Norwegian", "Norwegian"],
    return_time_stamps=True,
)

for r in results:
    print(r.language, r.text, r.time_stamps[0] if r.time_stamps else None)
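Word-level timestamps are often regrouped into longer segments, for example for subtitles. The sketch below shows one simple way to do that, under stated assumptions: the items in r.time_stamps are assumed to expose text, start_time, and end_time like the standalone aligner output, and times are assumed to be in seconds. The Word and Segment dataclasses, the group_words function, and the max_gap and max_len thresholds are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Word:
    """Hypothetical stand-in for one timestamped item."""
    text: str
    start_time: float
    end_time: float


@dataclass
class Segment:
    text: str
    start_time: float
    end_time: float


def _close(words: Sequence) -> Segment:
    """Merge a run of words into one segment spanning first start to last end."""
    return Segment(
        text=" ".join(w.text for w in words),
        start_time=words[0].start_time,
        end_time=words[-1].end_time,
    )


def group_words(words: Sequence, max_gap: float = 0.5, max_len: float = 5.0) -> List[Segment]:
    """Start a new segment when the pause or the total length exceeds a limit."""
    segments: List[Segment] = []
    current: list = []
    for word in words:
        if current:
            gap = word.start_time - current[-1].end_time
            length = word.end_time - current[0].start_time
            if gap > max_gap or length > max_len:
                segments.append(_close(current))
                current = []
        current.append(word)
    if current:
        segments.append(_close(current))
    return segments


# Made-up timings for illustration only.
demo_words = [Word("Hun", 0.0, 0.3), Word("er", 0.3, 0.5), Word("oversatt", 1.4, 2.0)]
demo_segments = group_words(demo_words)
for seg in demo_segments:
    print(seg.start_time, seg.end_time, seg.text)
```

Because time_stamps can be empty or None, as the loop in the example above already guards against, a call such as group_words(r.time_stamps or []) keeps the function safe for utterances without alignment output.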
vLLM Pattern
When the ASR side runs with vLLM, keep the aligner as a dedicated companion checkpoint.
import torch
from qwen_asr import Qwen3ASRModel

if __name__ == "__main__":
    ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
    ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner"

    model = Qwen3ASRModel.LLM(
        model=ASR_MODEL,
        gpu_memory_utilization=0.7,
        max_inference_batch_size=32,
        max_new_tokens=4096,
        forced_aligner=ALIGNER_MODEL,
        forced_aligner_kwargs=dict(
            dtype=torch.bfloat16,
            device_map="cuda:0",
        ),
    )

    results = model.transcribe(
        audio=["/path/to/audio.wav"],
        language=["Norwegian"],
        return_time_stamps=True,
    )

    for r in results:
        print(r.language, r.text, r.time_stamps)
Demo and Local Download
For Gradio-style testing with the Qwen demo tooling, pass this repo as the aligner checkpoint:
--aligner-checkpoint NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner
To download locally:
pip install -U "huggingface_hub[cli]"
hf download NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner --local-dir ./nb-asr-beta-qwen06b-041126-forced-aligner
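The same download can be scripted from Python with the snapshot_download function from huggingface_hub. This is a hedged sketch rather than a project-endorsed script: default_local_dir and download_aligner are hypothetical helper names, and the token argument is included on the assumption that the repo is still private or gated, as described in the access lifecycle, in which case an access token with read permission would be needed.

```python
from pathlib import Path
from typing import Optional

REPO_ID = "NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner"


def default_local_dir(repo_id: str) -> Path:
    """Mirror the CLI example: use the repo name as the local folder name."""
    return Path(".") / repo_id.split("/")[-1]


def download_aligner(repo_id: str = REPO_ID, token: Optional[str] = None) -> str:
    """Download the full repo snapshot and return the local path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=str(default_local_dir(repo_id)),
        token=token,  # None falls back to locally stored credentials, if any
    )


print(default_local_dir(REPO_ID))
```

Calling download_aligner() performs the actual network download. The returned path can then be passed to from_pretrained in place of the repo ID.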
Included Files
This staged HF repository includes:
- model.safetensors
- generation_config.json
- config.json
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
- vocab.json
- merges.txt
- added_tokens.json
- chat_template.jinja
- preprocessor_config.json
- audio.wav
Training-state files such as optimizer state, scheduler state, RNG snapshots, and trainer metadata were intentionally left out of this HF package.
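After a local download, it can be useful to confirm that the folder contains the files listed in this section. The sketch below is one way to do that with the standard library only. The file names are taken from the list in this section, while missing_files and the temporary-directory demo are hypothetical and exist only to make the example runnable without the real checkpoint.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

# File names as listed in this model card.
EXPECTED_FILES = [
    "model.safetensors",
    "generation_config.json",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
    "added_tokens.json",
    "chat_template.jinja",
    "preprocessor_config.json",
    "audio.wav",
]


def missing_files(local_dir, expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not present in local_dir."""
    root = Path(local_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demo on a throwaway folder: create every file except the last one (audio.wav).
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_FILES[:-1]:
        (Path(tmp) / name).touch()
    demo_missing = missing_files(tmp)

print(demo_missing)
```

For a real check, missing_files would be pointed at the download folder, for example ./nb-asr-beta-qwen06b-041126-forced-aligner. An empty list means every listed file is present.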
Intended Use
This model is intended for:
- beta evaluation,
- alignment experiments,
- timestamp generation in NB-ASR workflows,
- and integration into internal or semi-controlled ASR pipelines.
It should not be presented as a final public production release without additional validation, packaging review, and naming review.
Acknowledgements
This model is based on the open Qwen3-ASR framework and was adapted by the NB-ASR project at the National Library.
The following people contributed to dataset creation and training:
- Freddy Wetjen
- Thea Tollersrud
- Phoebe Parsons
- Per Egil Kummervold