NbAiLab / nb-asr-beta-qwen06b-041126-forced-aligner

Norwegian forced alignment for the NB-ASR beta program

This repository hosts a Norwegian forced-alignment checkpoint for the NB-ASR beta group, built on top of the Qwen3 ForcedAligner stack and adapted for the project's Norwegian speech workflows.

Internal reference: 041126-forced-aligner

Uploaded: 21.04.2026

This model is intended for beta evaluation and pipeline integration. It is designed to be used either:

as a standalone aligner when you already have reference text, or
alongside an NB-ASR beta transcription model when you want timestamps in the same workflow.

Beta notice: this checkpoint is for controlled evaluation and integration work. APIs, naming, packaging advice, and companion repo IDs may still change during the beta period.

Access Lifecycle (NB-ASR Beta Policy)

Initial state on creation: private
Development/iteration state: private
Later beta-release step: public + gated
After release update: add to https://huggingface.co/collections/NbAiLab/nb-asr-beta

This ordering is mandatory for nb-asr-beta repositories.
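The ordering above can be read as a small state machine. The sketch below is one hypothetical way to check a repository's state history against it; the `RepoState` names and the `validate_history` helper are illustrative assumptions, not part of any NB-ASR tooling.

```python
from enum import Enum


class RepoState(Enum):
    """Access states named in the lifecycle above (labels are illustrative)."""
    PRIVATE = "private"
    PUBLIC_GATED = "public + gated"
    IN_COLLECTION = "public + gated, listed in the collection"


# Allowed transitions. Staying private covers the development/iteration state.
ALLOWED = {
    RepoState.PRIVATE: {RepoState.PRIVATE, RepoState.PUBLIC_GATED},
    RepoState.PUBLIC_GATED: {RepoState.IN_COLLECTION},
    RepoState.IN_COLLECTION: set(),
}


def validate_history(history: list[RepoState]) -> bool:
    """Return True if the history starts private and only moves forward."""
    if not history or history[0] is not RepoState.PRIVATE:
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(history, history[1:]))
```

A history that skips the gated step, for example private directly to the collection, fails the check.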

Provenance

This HF repo was prepared from the local training artifact:

olivia-4gpu-20260411-1111/checkpoint-6000

The packaged checkpoint uses:

model.safetensors and generation_config.json from the new local training output
config.json from assets/TEMPLATE_FORCED_ALIGNER_config.json, which replaces the default forced-aligner config
tokenizer and processor support files copied from the known-working repo hfrepos/nb-asr-beta1-qwen-forced-aligner

This follows the project rule that the default forced-aligner config should not be used as-is for future releases.
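As a rough illustration of the packaging described above, the following sketch stages files from the three sources into one output directory. The `stage_package` function and its arguments are hypothetical, and the tokenizer and processor file names are taken from the Included Files list in this card. The origin of audio.wav is not stated, so it is left out.

```python
import shutil
from pathlib import Path

# Files taken from the new training output.
FROM_TRAINING = ["model.safetensors", "generation_config.json"]
# Tokenizer and processor support files copied from the known-working repo.
SUPPORT_FILES = [
    "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json",
    "vocab.json", "merges.txt", "added_tokens.json",
    "chat_template.jinja", "preprocessor_config.json",
]


def stage_package(checkpoint_dir: Path, template_config: Path,
                  support_repo: Path, out_dir: Path) -> list[str]:
    """Copy the three sources into out_dir and return the staged file names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name in FROM_TRAINING:
        shutil.copy2(checkpoint_dir / name, out_dir / name)
    # The template replaces the default config, so it is written as config.json.
    shutil.copy2(template_config, out_dir / "config.json")
    for name in SUPPORT_FILES:
        shutil.copy2(support_repo / name, out_dir / name)
    return sorted(p.name for p in out_dir.iterdir())
```

Because the checkpoint's own config.json is never copied, the default forced-aligner config cannot end up in the package by accident.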

Overview

Given an audio input and its reference text, the aligner predicts a start and end time for each span of that text. Use it when you need word- or segment-level timing for known text, or when you want to add alignment output to an ASR system.
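To make the difference between word-level and segment-level timing concrete, here is a minimal sketch that merges word spans into segments whenever the pause between them is short. The `Span` class, the `max_gap` threshold, and the timings are all invented for illustration and are not output from this model; times are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical container: a piece of text with start/end times in seconds."""
    text: str
    start_time: float
    end_time: float


def group_into_segments(words: list[Span], max_gap: float = 0.5) -> list[Span]:
    """Merge consecutive words; start a new segment when the pause exceeds max_gap."""
    segments: list[Span] = []
    for w in words:
        if segments and w.start_time - segments[-1].end_time <= max_gap:
            last = segments[-1]
            segments[-1] = Span(f"{last.text} {w.text}", last.start_time, w.end_time)
        else:
            segments.append(Span(w.text, w.start_time, w.end_time))
    return segments


# Illustrative timings only.
words = [
    Span("Hun", 0.10, 0.32),
    Span("er", 0.35, 0.48),
    Span("oversatt", 0.50, 1.02),
    Span("men", 1.90, 2.05),   # 0.88 s pause before this word
    Span("ikke", 2.08, 2.30),
]
segments = group_into_segments(words)
for seg in segments:
    print(seg.start_time, seg.end_time, seg.text)
```

With the default threshold, the five words collapse into two segments split at the long pause.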

This repository follows the Qwen forced-alignment usage pattern and is intended to work with the broader NB-ASR evaluation setup. For upstream architecture and package behavior, use the public Qwen references as the implementation baseline:

Base model: Qwen/Qwen3-ForcedAligner-0.6B
Technical report: Qwen3-ASR Technical Report

Recommended Installation

The recommended route is the official qwen-asr package, which provides the compatible classes and loading behavior for Qwen ASR and forced-aligner checkpoints.

pip install -U "qwen-asr"

Optional, for supported GPU setups: FlashAttention 2, which the Quickstart can enable through attn_implementation:

pip install -U flash-attn --no-build-isolation

If you want the broader serving stack as well:

pip install -U "qwen-asr[vllm]"
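After installing, a quick check of which packages are importable can save a failed model load. This is a small sketch using only the standard library; `qwen_asr` is the import name used in this card's examples, while `flash_attn` and `vllm` are assumed to be the import names of the optional distributions.

```python
from importlib.util import find_spec

# Module name mapped to its role in this card's setup.
MODULES = {"qwen_asr": "required", "flash_attn": "optional", "vllm": "optional"}


def check_environment(modules=None) -> dict[str, bool]:
    """Report which top-level modules can be found, without importing them."""
    names = MODULES if modules is None else modules
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_environment().items():
        status = "found" if found else "missing"
        print(f"{name:<12} {MODULES[name]:<9} {status}")
```

A missing optional module only rules out the matching feature: leave attn_implementation commented out without flash_attn, and skip the vLLM pattern without vllm.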

Quickstart

Load the aligner directly and call align with audio, text, and language.

import torch
from qwen_asr import Qwen3ForcedAligner

model = Qwen3ForcedAligner.from_pretrained(
    "NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
)

results = model.align(
    audio="audio.wav",
    text="Hun er oversatt til en rekke språk, men ikke norsk.",
    language="Norwegian",
)

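# results holds one alignment per input; results[0] belongs to audio.wav.
print(results[0])
# Each aligned item exposes its text together with its start and end time.
first = results[0][0]
print(first.text, first.start_time, first.end_time)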
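One common use of the aligned items is subtitle export. The sketch below renders any sequence of objects with text, start_time, and end_time attributes as SRT cues. It assumes the times are in seconds and that results[0] can be iterated, neither of which is confirmed by this card; `to_srt` and `to_srt_timestamp` are hypothetical helpers.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(items) -> str:
    """Render objects with .text, .start_time, .end_time as numbered SRT cues."""
    cues = []
    for index, item in enumerate(items, start=1):
        start = to_srt_timestamp(item.start_time)
        end = to_srt_timestamp(item.end_time)
        cues.append(f"{index}\n{start} --> {end}\n{item.text}\n")
    return "\n".join(cues)
```

Under those assumptions, `print(to_srt(results[0]))` would emit one cue per aligned item.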

Use Together With an NB-ASR Beta ASR Model

For transcription plus timestamps, load the ASR model and point forced_aligner to this repo.

import torch
from qwen_asr import Qwen3ASRModel

ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner"

model = Qwen3ASRModel.from_pretrained(
    ASR_MODEL,
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=8,
    max_new_tokens=1024,
    forced_aligner=ALIGNER_MODEL,
    forced_aligner_kwargs=dict(
        dtype=torch.bfloat16,
        device_map="cuda:0",
    ),
)

results = model.transcribe(
    audio=["/path/to/utt1.wav", "/path/to/utt2.wav"],
    language=["Norwegian", "Norwegian"],
    return_time_stamps=True,
)

for r in results:
    print(r.language, r.text, r.time_stamps[0] if r.time_stamps else None)
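For batch runs it is often convenient to persist results next to their input paths. The following sketch writes one JSON line per utterance. It assumes results come back in input order and that each time-stamp item exposes text, start_time, and end_time like the standalone aligner output; both are assumptions, and `result_to_record` and `write_jsonl` are hypothetical helpers.

```python
import json


def result_to_record(path: str, result) -> dict:
    """Flatten one transcription result into a JSON-serializable dict."""
    stamps = result.time_stamps or []  # tolerate missing time stamps
    return {
        "audio": path,
        "language": result.language,
        "text": result.text,
        "time_stamps": [
            {"text": s.text, "start": s.start_time, "end": s.end_time}
            for s in stamps
        ],
    }


def write_jsonl(paths, results, out_path) -> int:
    """Write one record per (path, result) pair; return the number written."""
    results = list(results)
    if len(paths) != len(results):
        raise ValueError("expected one result per input path")
    with open(out_path, "w", encoding="utf-8") as fh:
        for path, result in zip(paths, results):
            # ensure_ascii=False keeps Norwegian characters readable in the file.
            fh.write(json.dumps(result_to_record(path, result), ensure_ascii=False) + "\n")
    return len(results)
```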

vLLM Pattern

When the ASR side runs with vLLM, the aligner is still loaded as a separate companion checkpoint through forced_aligner.

import torch
from qwen_asr import Qwen3ASRModel

if __name__ == "__main__":
    ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
    ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner"

    model = Qwen3ASRModel.LLM(
        model=ASR_MODEL,
        gpu_memory_utilization=0.7,
        max_inference_batch_size=32,
        max_new_tokens=4096,
        forced_aligner=ALIGNER_MODEL,
        forced_aligner_kwargs=dict(
            dtype=torch.bfloat16,
            device_map="cuda:0",
        ),
    )

    results = model.transcribe(
        audio=["/path/to/audio.wav"],
        language=["Norwegian"],
        return_time_stamps=True,
    )

    for r in results:
        print(r.language, r.text, r.time_stamps)
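If the vLLM engine and the aligner share one GPU, as device_map="cuda:0" in the example suggests, the gpu_memory_utilization value determines how much memory is left for the aligner. The back-of-the-envelope sketch below uses the 0.9B parameter count reported for this checkpoint and a hypothetical 24 GB GPU. It counts weights only and ignores activations and framework overhead.

```python
def weights_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GB (10**9 bytes); weights only."""
    return n_params * bytes_per_param / 1e9


def headroom_gb(total_gb: float, gpu_memory_utilization: float) -> float:
    """Memory left on the device outside vLLM's configured share."""
    return total_gb * (1.0 - gpu_memory_utilization)


ALIGNER_PARAMS = 0.9e9  # "0.9B params" as reported for this checkpoint
TOTAL_GB = 24.0         # hypothetical GPU size, for illustration only

aligner_bf16 = weights_gb(ALIGNER_PARAMS, 2)  # bfloat16 uses 2 bytes per parameter
free = headroom_gb(TOTAL_GB, 0.7)             # 0.7 as in the example above
print(f"aligner weights ~{aligner_bf16:.1f} GB, headroom ~{free:.1f} GB")
```

Under these assumptions the aligner needs about 1.8 GB for weights against about 7.2 GB of headroom. On a smaller GPU the same arithmetic shows when to lower gpu_memory_utilization or move the aligner to another device.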

Demo and Local Download

For Gradio-style testing with the Qwen demo tooling, pass this repo as the aligner checkpoint:

--aligner-checkpoint NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner

To download locally:

pip install -U "huggingface_hub[cli]"
hf download NbAiLab/nb-asr-beta-qwen06b-041126-forced-aligner --local-dir ./nb-asr-beta-qwen06b-041126-forced-aligner

Included Files

This staged HF repository includes:

model.safetensors
generation_config.json
config.json
tokenizer.json
tokenizer_config.json
special_tokens_map.json
vocab.json
merges.txt
added_tokens.json
chat_template.jinja
preprocessor_config.json
audio.wav

Training-state files such as optimizer state, scheduler state, RNG snapshots, and trainer metadata were intentionally left out of this HF package.
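After a local download, the file list above can double as a completeness check. This is a minimal sketch; `missing_files` is a hypothetical helper, and the directory name in the usage note is the one from the download command in this card.

```python
from pathlib import Path

# File names from the list above.
EXPECTED_FILES = [
    "model.safetensors", "generation_config.json", "config.json",
    "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json",
    "vocab.json", "merges.txt", "added_tokens.json",
    "chat_template.jinja", "preprocessor_config.json", "audio.wav",
]


def missing_files(local_dir) -> list[str]:
    """Return the expected file names that are absent from local_dir."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("./nb-asr-beta-qwen06b-041126-forced-aligner")` returns an empty list when the download is complete.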

Intended Use

This model is intended for:

beta evaluation,
alignment experiments,
timestamp generation in NB-ASR workflows,
and integration into internal or semi-controlled ASR pipelines.

It should not be presented as a final public production release without additional validation, packaging review, and naming review.

Acknowledgements

This model is based on the open Qwen3-ASR framework and was adapted by the NB-ASR project at the National Library.

The following people contributed to dataset creation and training:

Freddy Wetjen
Thea Tollersrud
Phoebe Parsons
Per Egil Kummervold

Safetensors

Model size: 0.9B params
Tensor type: F32

Paper

Qwen3-ASR Technical Report (arXiv:2601.21337, published Jan 29)