---
license: apache-2.0
language:
- zh
- en
tags:
- gguf
- llama.cpp
- automatic-speech-recognition
- speech-to-text
- qwen3-asr
base_model:
- Qwen/Qwen3-ASR-1.7B
pipeline_tag: automatic-speech-recognition
---

# Qwen3-ASR-1.7B-Q4_K_M-GGUF

GGUF export of `Qwen/Qwen3-ASR-1.7B` for `llama.cpp`.

Files included:

- `Qwen3-ASR-1.7B-Q4_K_M.gguf`
- `mmproj-Qwen3-ASR-1.7B-Q4_K_M.gguf`

Both files are required for audio transcription with `llama.cpp` multimodal support.

## Tested command

```bat
llama-mtmd-cli.exe ^
  -m Qwen3-ASR-1.7B-Q4_K_M.gguf ^
  --mmproj mmproj-Qwen3-ASR-1.7B-Q4_K_M.gguf ^
  --audio sample.wav ^
  -p "Transcribe the audio." ^
  -t 8 -n 256 --temp 0
```

The `^` line continuations are `cmd.exe` syntax. In PowerShell, end each line with a backtick instead, or put the command on a single line.

## Notes

- The main model was converted from the original Hugging Face checkpoint to GGUF, then quantized to `Q4_K_M`.
- The `mmproj` file was exported from the original checkpoint as `F16`, then quantized to `Q4_K_M`.
- This pair was tested locally with `llama-mtmd-cli` on Chinese audio.
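
## Scripted invocation (sketch)

Because both GGUF files must be present for transcription, it can help to check for them before launching the CLI. The following is a minimal Python sketch, not part of the original export: `build_command` and `transcribe` are hypothetical helper names, and it assumes `llama-mtmd-cli` is on `PATH` and that both model files sit in the same directory. The flags are the ones from the tested command above.

```python
import subprocess
from pathlib import Path

# File names as shipped in this repository.
MODEL = "Qwen3-ASR-1.7B-Q4_K_M.gguf"
MMPROJ = "mmproj-Qwen3-ASR-1.7B-Q4_K_M.gguf"


def build_command(audio, model_dir=".", exe="llama-mtmd-cli",
                  threads=8, max_tokens=256):
    """Return the argument list for the tested command (hypothetical helper).

    Raises FileNotFoundError if the model, the mmproj file, or the audio
    file is missing, since all three are needed for transcription.
    """
    base = Path(model_dir)
    model, mmproj, audio_path = base / MODEL, base / MMPROJ, Path(audio)
    missing = [p.name for p in (model, mmproj, audio_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing required file(s): " + ", ".join(missing))
    return [
        exe,
        "-m", str(model),
        "--mmproj", str(mmproj),
        "--audio", str(audio_path),
        "-p", "Transcribe the audio.",
        "-t", str(threads),
        "-n", str(max_tokens),
        "--temp", "0",
    ]


def transcribe(audio, **kwargs):
    """Run the CLI and return its standard output (assumes the exe is on PATH)."""
    result = subprocess.run(build_command(audio, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `transcribe("sample.wav")` from the directory that holds both GGUF files would run the same command as above. Passing the arguments as a list avoids shell-specific quoting and line-continuation differences between `cmd.exe` and PowerShell.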