Qwen3.6-27B-DFlash

GGUF quantizations of z-lab/Qwen3.6-27B-DFlash.

Converted to BF16 using convert_hf_to_gguf.py, then quantized using llama-quantize from llama.cpp.
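The two-step pipeline above can be sketched as a small Python helper that only assembles the command lines. This is an illustration, not the exact commands used for these files: the local directory names and the output file naming pattern are assumptions, and it is also an assumption that the llama.cpp checkout in use supports the dflash architecture. The `--outfile` and `--outtype` options and the `llama-quantize input output type` argument order follow llama.cpp's standard tools.

```python
from pathlib import Path

# Quant types listed in the table on this card.
QUANTS = ("Q4_K_M", "Q5_K", "Q6_K", "Q8_0")


def build_commands(hf_dir, out_dir, name="Qwen3.6-27B-DFlash", quants=QUANTS):
    """Return the conversion command followed by one quantize command per type.

    hf_dir, out_dir and the file naming pattern are hypothetical; adjust them
    to your own layout.
    """
    out = Path(out_dir)
    bf16 = out / f"{name}-BF16.gguf"
    # Step 1: convert the Hugging Face checkpoint to a BF16 GGUF file.
    commands = [[
        "python", "convert_hf_to_gguf.py", str(hf_dir),
        "--outfile", str(bf16), "--outtype", "bf16",
    ]]
    # Step 2: quantize the BF16 file once per target type.
    for quant in quants:
        target = out / f"{name}-{quant}.gguf"
        commands.append(["llama-quantize", str(bf16), str(target), quant])
    return commands


if __name__ == "__main__":
    import shlex

    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in build_commands("./Qwen3.6-27B-DFlash", "./gguf"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; each list can be passed to `subprocess.run(cmd, check=True)` once the paths are confirmed.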

Available quants

| Quant | Bits | Size | Notes |
| --- | --- | --- | --- |
| Q4_K_M | 4 | ~1.03 GB | Average quality |
| Q5_K | 5 | ~1.22 GB | High quality |
| Q6_K | 6 | ~1.43 GB | Very high quality |
| Q8_0 | 8 | ~1.84 GB | Highest quality, near lossless, recommended |
| BF16 | 16 | ~3.47 GB | Full precision, reference file |
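The Bits column is the nominal width of each quant type. A rough effective figure can be estimated from the file sizes alone, as in the sketch below. It assumes the BF16 file stores every weight at 16 bits and ignores metadata overhead, so the results are approximate. Effective values land above the nominal ones because these formats store per-block scale data alongside the quantized weights.

```python
# Approximate file sizes in GB, copied from the table above.
SIZES_GB = {
    "Q4_K_M": 1.03,
    "Q5_K": 1.22,
    "Q6_K": 1.43,
    "Q8_0": 1.84,
    "BF16": 3.47,
}


def effective_bits(sizes_gb, reference="BF16", reference_bits=16.0):
    """Estimate bits per weight by scaling against the reference file size."""
    ref_size = sizes_gb[reference]
    return {name: reference_bits * size / ref_size for name, size in sizes_gb.items()}


if __name__ == "__main__":
    for name, bits in effective_bits(SIZES_GB).items():
        print(f"{name}: about {bits:.2f} bits per weight")
```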

Usage

Use these files as the draft model alongside an existing Qwen3.6-27B quant. Example config for llama-server:

[Qwen3.6-27B-Q8_0-DFlash]
sm = layer
model = /mnt/gguf/Qwen3.6-27B/Qwen3.6-27B-Q8_0.gguf
model-draft = /mnt/gguf/Qwen3.6-27B/Qwen3.6-27B-DFlash-Q8_0.gguf
spec-type = draft-dflash
spec-draft-n-max = 6 

(Note: I cannot get sm = tensor to work; it crashes on launch. I am fairly sure this is a llama.cpp issue.)
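Because the config pairs two separate files, a wrong path in either model or model-draft is an easy mistake to make. The sketch below parses a config like the one above with Python's standard configparser and lists any model path that does not exist on disk. It assumes the config is plain INI, as shown, and checks only file presence; it does not validate the remaining keys or start llama-server.

```python
import configparser
from pathlib import Path

# The example config from this card, reproduced as a string for the sketch.
PRESET = """\
[Qwen3.6-27B-Q8_0-DFlash]
sm = layer
model = /mnt/gguf/Qwen3.6-27B/Qwen3.6-27B-Q8_0.gguf
model-draft = /mnt/gguf/Qwen3.6-27B/Qwen3.6-27B-DFlash-Q8_0.gguf
spec-type = draft-dflash
spec-draft-n-max = 6
"""


def missing_files(preset_text):
    """Return (section, key, path) for every model path that is not a file."""
    parser = configparser.ConfigParser()
    parser.read_string(preset_text)
    missing = []
    for section in parser.sections():
        for key in ("model", "model-draft"):
            path = parser.get(section, key, fallback=None)
            if path is not None and not Path(path).is_file():
                missing.append((section, key, path))
    return missing


if __name__ == "__main__":
    for section, key, path in missing_files(PRESET):
        print(f"[{section}] {key}: file not found: {path}")
```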

Original model

See the original model card for details on capabilities, benchmarks, and license.

Format: GGUF
Model size: 2B params
Architecture: dflash