---
base_model: z-lab/Qwen3.6-35B-A3B-DFlash
tags:
- gguf
- dflash
- ik_llama.cpp
library_name: gguf
pipeline_tag: text-generation
---

# Qwen 3.6 35B A3B DFlash GGUF

GGUF files made for use with [ikawrakow/ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp), currently targeting PR [#1970](https://github.com/ikawrakow/ik_llama.cpp/pull/1970). The small quantizations provided here are intended for testing; feel free to create your own quantization.

Derived from the safetensors DFlash draft model [z-lab/Qwen3.6-35B-A3B-DFlash](https://huggingface.co/z-lab/Qwen3.6-35B-A3B-DFlash).

## Compatible target model

- `Qwen3.6-35B-A3B-UD.gguf`
- Mainly tested with Q4_K_M.

## Files

| File | Quant | Size |
|---|---|---|
| `qwen36-35b-a3b-dflash-F16.gguf` | F16 | 915 MB |
| `qwen36-35b-a3b-dflash-Q8_0.gguf` | Q8_0 | 491 MB |
| `qwen36-35b-a3b-dflash-Q4_K_M.gguf` | Q4_K_M | 279 MB |

## Usage

```bash
./build/bin/llama-server \
  --model \
  --model-draft \
  --spec-type dflash:n_max=,cross_ctx= ...
```

## Notes

- This repo contains DFlash draft models, not a standalone instruct model.
- Use them with the matching target model listed above.
- `Q4_K_M` and `Q8_0` are small, test-oriented quants; create your own quant if you need a different size/quality tradeoff.
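Before passing a downloaded file to `--model-draft`, it can help to confirm that it really is a GGUF file and not, for example, a Git LFS pointer or a truncated download. The script below is a minimal sketch and is not part of ik_llama.cpp; the script name `check_gguf.sh` and the function `gguf_check` are hypothetical. It relies only on the GGUF header layout (the 4 ASCII bytes `GGUF`, followed by the format version as a little-endian uint32) and assumes a little-endian host, because `od -t u4` decodes in host byte order.

```bash
#!/usr/bin/env bash
# check_gguf.sh (hypothetical name): minimal sketch that checks GGUF magic bytes
# and prints the format version. Assumes a little-endian host for `od -t u4`.
set -euo pipefail

# gguf_check (hypothetical helper): succeeds and prints the version if FILE looks like GGUF.
gguf_check() {
  local file="$1"
  local magic version
  # The first 4 bytes of a GGUF file are the ASCII characters "GGUF".
  magic=$(head -c 4 "$file")
  if [ "$magic" != "GGUF" ]; then
    echo "not a GGUF file: $file" >&2
    return 1
  fi
  # Bytes 4..7 hold the format version as a little-endian uint32.
  version=$(od -An -t u4 -j 4 -N 4 "$file" | tr -d ' ')
  echo "GGUF version $version: $file"
}

# Check every file given on the command line.
for f in "$@"; do
  gguf_check "$f"
done
```

For example, `bash check_gguf.sh qwen36-35b-a3b-dflash-Q4_K_M.gguf` should print the header version for a complete download and exit with an error for a file that does not start with the GGUF magic.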