---
base_model: z-lab/Qwen3.6-35B-A3B-DFlash
tags:
- gguf
- dflash
- ik_llama.cpp
library_name: gguf
pipeline_tag: text-generation
---

# Qwen 3.6 35B A3B DFlash GGUF

GGUF files for use with [ikawrakow/ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp), currently targeting PR [#1970](https://github.com/ikawrakow/ik_llama.cpp/pull/1970). The small quantizations provided here are intended for testing; feel free to create your own.

Derived from the safetensors DFlash draft model [z-lab/Qwen3.6-35B-A3B-DFlash](https://huggingface.co/z-lab/Qwen3.6-35B-A3B-DFlash).

## Compatible target model

- `Qwen3.6-35B-A3B-UD.gguf`: tested mainly with the Q4_K_M quant.

## Files

| File | Quant | Size |
|---|---|---|
| `qwen36-35b-a3b-dflash-F16.gguf` | F16 | 915 MB |
| `qwen36-35b-a3b-dflash-Q8_0.gguf` | Q8_0 | 491 MB |
| `qwen36-35b-a3b-dflash-Q4_K_M.gguf` | Q4_K_M | 279 MB |

## Usage

```bash
./build/bin/llama-server \
  --model <target.gguf> \
  --model-draft <draft.gguf> \
  --spec-type dflash:n_max=<N>,cross_ctx=<M> ...
```
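The command above can be wrapped in a small helper that catches a wrong path before the server starts. This is a minimal sketch, not part of ik_llama.cpp: the script name `dflash.sh` and the functions `is_gguf` and `run_dflash_server` are hypothetical. It only checks that each file begins with the 4-byte `GGUF` magic that every GGUF file starts with, then forwards the same `--model`, `--model-draft` and `--spec-type` options shown above. It does not choose values for `n_max` or `cross_ctx`; you supply them.

```shell
# dflash.sh: hypothetical helper; source it from bash, e.g. `source dflash.sh`.

# Succeed if the file exists and begins with the 4-byte ASCII magic "GGUF".
# This only inspects the header; it does not check that the draft matches the target.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 -- "$1")" = "GGUF" ]
}

# Usage: run_dflash_server TARGET DRAFT N_MAX CROSS_CTX [extra llama-server args...]
# Runs the same llama-server command as the Usage example, after checking both files.
run_dflash_server() {
  if [ "$#" -lt 4 ]; then
    echo "usage: run_dflash_server TARGET DRAFT N_MAX CROSS_CTX [args...]" >&2
    return 2
  fi
  local target=$1 draft=$2 n_max=$3 cross_ctx=$4
  shift 4
  local f
  for f in "$target" "$draft"; do
    if ! is_gguf "$f"; then
      echo "not a GGUF file: $f" >&2
      return 1
    fi
  done
  # Remaining arguments are forwarded to llama-server unchanged.
  ./build/bin/llama-server \
    --model "$target" \
    --model-draft "$draft" \
    --spec-type "dflash:n_max=${n_max},cross_ctx=${cross_ctx}" \
    "$@"
}
```

For example: `source dflash.sh && run_dflash_server <target.gguf> qwen36-35b-a3b-dflash-Q4_K_M.gguf <n_max> <cross_ctx>`. The header check is deliberately shallow; it catches a truncated download or a mistyped path, not an incompatible draft/target pair.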

## Notes

- This repo contains DFlash draft models, not a standalone instruct model.
- Use them with the matching target model listed above.
- `Q4_K_M` and `Q8_0` are small test-oriented quants; create your own quant if you need a different tradeoff.