---
license: apache-2.0
language:
- zh
- en
tags:
- gguf
- llama.cpp
- automatic-speech-recognition
- speech-to-text
- qwen3-asr
base_model:
- Qwen/Qwen3-ASR-1.7B
pipeline_tag: automatic-speech-recognition
---

# Qwen3-ASR-1.7B-Q4_K_M-GGUF

GGUF export of `Qwen/Qwen3-ASR-1.7B` for `llama.cpp`.

Files included:

- `Qwen3-ASR-1.7B-Q4_K_M.gguf`
- `mmproj-Qwen3-ASR-1.7B-Q4_K_M.gguf`

Both files are required for audio transcription with `llama.cpp` multimodal support.

## Tested command

```bat
llama-mtmd-cli.exe ^
  -m Qwen3-ASR-1.7B-Q4_K_M.gguf ^
  --mmproj mmproj-Qwen3-ASR-1.7B-Q4_K_M.gguf ^
  --audio sample.wav ^
  -p "Transcribe the audio." ^
  -t 8 -n 256 --temp 0
```
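
For scripted use, the same invocation can be wrapped in Python. This is a minimal sketch, not part of the tested setup: it assumes `llama-mtmd-cli` is on `PATH`, that both GGUF files sit in one directory, and that the CLI writes UTF-8 to stdout. The helper names `build_command` and `transcribe` are hypothetical; the flags are the ones from the command above.

```python
import subprocess
from pathlib import Path

# File names as shipped in this repository.
MODEL = "Qwen3-ASR-1.7B-Q4_K_M.gguf"
MMPROJ = "mmproj-Qwen3-ASR-1.7B-Q4_K_M.gguf"


def build_command(audio, model_dir=".", binary="llama-mtmd-cli",
                  threads=8, max_tokens=256):
    """Return the argv list for one run, using the flags from the tested command."""
    model_dir = Path(model_dir)
    return [
        binary,
        "-m", str(model_dir / MODEL),
        "--mmproj", str(model_dir / MMPROJ),
        "--audio", str(audio),
        "-p", "Transcribe the audio.",
        "-t", str(threads),
        "-n", str(max_tokens),
        "--temp", "0",
    ]


def transcribe(audio, model_dir=".", binary="llama-mtmd-cli"):
    """Run the CLI and return its stdout; raise if a model file is missing."""
    for name in (MODEL, MMPROJ):
        if not (Path(model_dir) / name).is_file():
            raise FileNotFoundError(f"missing {name} in {model_dir}")
    result = subprocess.run(
        build_command(audio, model_dir, binary),
        capture_output=True,
        text=True,
        encoding="utf-8",  # assumption: the CLI emits UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Calling `transcribe("sample.wav")` from the directory that holds both files returns whatever the CLI printed to stdout. Depending on the `llama.cpp` build, that may include text other than the transcript, so the result may need filtering.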

## Notes

- Main model was converted from the original Hugging Face checkpoint to GGUF, then quantized to `Q4_K_M`.
- `mmproj` was exported from the original checkpoint as `F16`, then quantized to `Q4_K_M`.
- This pair was locally tested with `llama-mtmd-cli` on Chinese audio.