Qwen3-ASR 1.7B — MLX BF16
This repository contains a pure-MLX BF16 conversion of Qwen3-ASR-1.7B for
local, offline speech recognition on Apple Silicon. It is intended for use
with mlx-speech, without a PyTorch, Transformers, or vLLM runtime at
inference time.
The conversion remaps upstream thinker.* checkpoint keys into the
mlx-speech module tree and transposes the audio Conv2D weights from PyTorch
layout into MLX layout. Weights are kept in the original BF16 precision — no
quantization.
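The two conversion steps described above can be sketched roughly as follows. This is an illustration, not the actual conversion script: it assumes the remap only strips the `thinker.` prefix (the real key map may rename more), the key name in the toy checkpoint is made up, and NumPy stands in for the real tensor library. The layout facts are standard: PyTorch stores Conv2d weights as (out, in, kH, kW), while MLX expects (out, kH, kW, in).

```python
import numpy as np

# Prefix taken from the description above; treating the remap as a plain
# prefix strip is an assumption made for this sketch.
UPSTREAM_PREFIX = "thinker."


def remap_key(key: str) -> str:
    """Drop the upstream prefix if present, otherwise return the key unchanged."""
    return key[len(UPSTREAM_PREFIX):] if key.startswith(UPSTREAM_PREFIX) else key


def to_mlx_conv2d(weight: np.ndarray) -> np.ndarray:
    """Move the input-channel axis last: (out, in, kH, kW) -> (out, kH, kW, in)."""
    return np.transpose(weight, (0, 2, 3, 1))


def convert(state: dict, conv_keys: set) -> dict:
    """Remap every key and transpose only the weights named in conv_keys."""
    converted = {}
    for key, value in state.items():
        new_key = remap_key(key)
        converted[new_key] = to_mlx_conv2d(value) if new_key in conv_keys else value
    return converted


# Toy checkpoint with a hypothetical key name, only to show the shape change.
toy = {"thinker.audio.conv1.weight": np.zeros((8, 1, 3, 3), dtype=np.float32)}
out = convert(toy, conv_keys={"audio.conv1.weight"})
print(out["audio.conv1.weight"].shape)  # (8, 3, 3, 1)
```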
Model Details
- Developed by: AppAutomaton
- Upstream model: Qwen/Qwen3-ASR-1.7B
- Task: automatic speech recognition (offline, single-pass)
- Runtime: MLX on Apple Silicon
- Precision: BF16 (unquantized)
- Validated languages: English, Chinese, and mixed Chinese/English
- Total size: ~4.7 GB
Contents
| File | Component | Format |
|---|---|---|
| model.safetensors | Audio encoder + Qwen3 text decoder | bf16 |
| config.json | Model config (model_type: qwen3_asr) | JSON |
| generation_config.json | Generation defaults | JSON |
| preprocessor_config.json | Audio frontend config | JSON |
| chat_template.json | Upstream chat template (reference) | JSON |
| vocab.json, merges.txt, tokenizer_config.json | Tokenizer assets | JSON / text |
How to Get Started
Download the package:
hf download appautomaton/qwen3-asr-1.7b-bf16-mlx \
--local-dir models/Qwen3-ASR-1.7B-MLX-BF16
Minimal Python usage with mlx-speech:
import mlx_speech
asr = mlx_speech.asr.load("models/Qwen3-ASR-1.7B-MLX-BF16")
result = asr.generate("speech.wav", max_new_tokens=256)
print(result.language, result.text)
Command-line transcription:
mlx-speech asr \
--model models/Qwen3-ASR-1.7B-MLX-BF16 \
--audio speech.wav
Language Behavior
Omitting language (or passing None / "auto") lets the model infer the
language from the audio. This is the right first option for single-language
English or Chinese speech.
For Chinese/English mixed speech where preserving Chinese characters matters, prefer the forced Chinese prompt path:
asr.generate("mixed-speech.wav", language="Chinese")
Local checks found that auto mode can treat English-dominant mixed speech as English and translate the Chinese segments; the Chinese prompt path preserved mixed Chinese/English text best.
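A small wrapper can make the language choice explicit at the call site. This is a minimal sketch: `pick_language` and `transcribe` are hypothetical helper names, and `asr` is assumed to be the object returned by `mlx_speech.asr.load`, with the `generate(path, language=...)` call used above.

```python
from typing import Any, Optional


def pick_language(mixed_chinese_english: bool) -> Optional[str]:
    """Return the language argument suggested by the notes above."""
    # Forced Chinese for mixed Chinese/English audio; None lets the model infer it.
    return "Chinese" if mixed_chinese_english else None


def transcribe(asr: Any, path: str, mixed_chinese_english: bool = False):
    """Call asr.generate with the chosen language and return its result."""
    return asr.generate(path, language=pick_language(mixed_chinese_english))
```

For example, `transcribe(asr, "mixed-speech.wav", mixed_chinese_english=True)` takes the forced Chinese path, while the default leaves language detection to the model.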
Runtime Shape
- Audio is loaded or expected as 16 kHz mono waveform data.
- The frontend matches the upstream WhisperFeatureExtractor setup: 128 mel bins, n_fft=400, hop_length=160, with dynamic padding.
- The processor builds the Qwen chat prompt directly with token IDs and expands <|audio_pad|> to the exact audio feature length.
- Audio embeddings replace the audio placeholder token embeddings before Qwen3 prefill.
- Generation uses greedy decoding with a local KV cache and parses language ...<asr_text>... outputs into (language, text).
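Three of the steps above can be illustrated with plain Python. This is a sketch under stated assumptions, not the mlx-speech implementation. The function names are hypothetical. The real processor works on token IDs, whereas this version uses strings for readability. The final audio feature length depends on encoder downsampling that is not described here, so it is passed in as an argument. The output pattern follows the `language ...<asr_text>...` shape mentioned in the list.

```python
import re

HOP_LENGTH = 160            # samples per mel frame, from the list above
AUDIO_PAD = "<|audio_pad|>"  # placeholder token named in the list above


def mel_frame_count(num_samples: int) -> int:
    """Approximate mel frame count before the encoder: one frame per hop."""
    return num_samples // HOP_LENGTH


def expand_audio_placeholder(prompt: str, feature_length: int) -> str:
    """Repeat the single placeholder so it fills feature_length positions."""
    if prompt.count(AUDIO_PAD) != 1:
        raise ValueError("expected exactly one audio placeholder")
    return prompt.replace(AUDIO_PAD, AUDIO_PAD * feature_length)


# Assumed output shape: "language <name><asr_text><transcript>".
_OUTPUT = re.compile(
    r"^\s*language\s+(?P<language>.*?)\s*<asr_text>(?P<text>.*)$", re.DOTALL
)


def parse_asr_output(raw: str) -> tuple:
    """Split decoded text into (language, text); empty language if untagged."""
    match = _OUTPUT.match(raw)
    if match is None:
        return "", raw.strip()
    return match.group("language"), match.group("text").strip()
```

At 16 kHz with a hop of 160 samples, one second of audio gives about 100 mel frames, which is why the placeholder count scales with clip length.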
Current Limits
- Offline, single-pass transcription only; streaming is deferred.
- Timestamps and forced alignment are deferred.
- Long-audio chunking and language merge logic are deferred.
- Upstream supports 30 languages and 22 Chinese dialects; this conversion is validated for English, Chinese, and mixed Chinese/English.
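Until long-audio chunking is available, a caller can split a waveform before transcription. The sketch below is a workaround on the caller's side, not part of mlx-speech: `split_waveform` is a hypothetical helper, the 30-second default is an arbitrary choice, and fixed boundaries can cut words in half, so splitting at silences would likely give better transcripts. How each chunk is then passed to the model is left open here.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # the runtime expects 16 kHz mono audio


def split_waveform(
    samples: Sequence[float], chunk_seconds: float = 30.0
) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping windows of at most chunk_seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    # Guard against a zero step for very small chunk lengths.
    step = max(1, int(chunk_seconds * SAMPLE_RATE))
    for start in range(0, len(samples), step):
        yield samples[start:start + step]
```

The helper only relies on `len` and slicing, so it works on Python lists and on array types that support both.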
Links
- Source code: mlx-speech
- Upstream model: Qwen/Qwen3-ASR-1.7B
- More examples: AppAutomaton
License
Apache 2.0 — following the upstream license published with
Qwen/Qwen3-ASR-1.7B.