---
base_model:
- z-lab/Qwen3.6-27B-DFlash
base_model_relation: quantized
quantized_by: Alittlehammmer
license: apache-2.0
license_link: https://huggingface.co/z-lab/Qwen3.6-27B-DFlash/blob/main/LICENSE
pipeline_tag: text-generation
library_name: gguf
tags:
- gguf
- quantized
- llama.cpp
- text-generation
- agentic-coding
- dflash
---

# Qwen3.6-27B-DFlash

GGUF quantizations of [z-lab/Qwen3.6-27B-DFlash](https://huggingface.co/z-lab/Qwen3.6-27B-DFlash).

Converted to BF16 using `convert_hf_to_gguf.py`, then quantized using `llama-quantize` from [llama.cpp](https://github.com/ggml-org/llama.cpp).

## Available quants

| Quant  | Bits | Size     | Notes                                       |
| ------ | ---- | -------- | ------------------------------------------- |
| Q4_K_M | 4    | ~1.03 GB | Average quality                             |
| Q5_K   | 5    | ~1.22 GB | High quality                                |
| Q6_K   | 6    | ~1.43 GB | Very high quality                           |
| Q8_0   | 8    | ~1.84 GB | Highest quality, near lossless; recommended |
| BF16   | 16   | ~3.47 GB | Full precision, reference file              |

## Usage

Use these files as the draft model alongside an existing Qwen3.6-27B quant. Example config for `llama-server`:

```ini
[Qwen3.6-27B-Q8_0-DFlash]
sm = layer
model = /mnt/gguf/Qwen3.6-27B/Qwen3.6-27B-Q8_0.gguf
model-draft = /mnt/gguf/Qwen3.6-27B/Qwen3.6-27B-DFlash-Q8_0.gguf
spec-type = draft-dflash
spec-draft-n-max = 6
```

(Note: I cannot get `sm = tensor` to work; it crashes on launch. I am fairly sure this is an issue in llama.cpp.)

## Original model

See the [original model card](https://huggingface.co/z-lab/Qwen3.6-27B-DFlash) for details on capabilities, benchmarks, and license.
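
## Reproducing the quants (sketch)

The conversion described at the top of this card (BF16 via `convert_hf_to_gguf.py`, then `llama-quantize`) can be sketched roughly as below. This is not the exact script used for these files. The directory variables `LLAMA_CPP`, `HF_DIR`, and `OUT_DIR`, the output file names, and the `run` dry-run helper are all assumptions for illustration. It also assumes `llama-quantize` is on your `PATH` and that your llama.cpp checkout supports this architecture. By default the script only prints the commands; set `RUN=1` to execute them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed locations; adjust to your setup.
LLAMA_CPP="${LLAMA_CPP:-./llama.cpp}"        # llama.cpp checkout (hypothetical path)
HF_DIR="${HF_DIR:-./Qwen3.6-27B-DFlash}"     # local copy of the original HF model
OUT_DIR="${OUT_DIR:-./gguf}"                 # where the GGUF files are written

BF16="$OUT_DIR/Qwen3.6-27B-DFlash-BF16.gguf"

# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

run mkdir -p "$OUT_DIR"

# Step 1: convert the Hugging Face checkpoint to a BF16 GGUF.
run python3 "$LLAMA_CPP/convert_hf_to_gguf.py" "$HF_DIR" \
  --outtype bf16 --outfile "$BF16"

# Step 2: quantize the BF16 file to each type listed in the table.
for q in Q4_K_M Q5_K Q6_K Q8_0; do
  run llama-quantize "$BF16" "$OUT_DIR/Qwen3.6-27B-DFlash-$q.gguf" "$q"
done
```

Quantizing every type from the same BF16 file, rather than from another quant, avoids compounding quantization error.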