Qwen3.6-35B-A3B-DFlash

GGUF quantizations of z-lab/Qwen3.6-35B-A3B-DFlash, a DFlash draft model for speculative decoding with Qwen3.6-35B-A3B.

Converted to BF16 using convert_hf_to_gguf.py, then quantized using llama-quantize from llama.cpp.
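The two-step pipeline can be sketched in Python as a helper that assembles the command lines and, optionally, runs them. This is a minimal sketch, not the exact commands used for this repo: the local paths, the output file names, and the assumption that `llama-quantize` is on your `PATH` are all hypothetical. `--outtype bf16` and `--outfile` are standard `convert_hf_to_gguf.py` options, and `llama-quantize` takes an input file, an output file, and a quant type.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical locations; adjust to your llama.cpp checkout and local HF download.
LLAMA_CPP_DIR = Path("llama.cpp")
HF_MODEL_DIR = Path("Qwen3.6-35B-A3B-DFlash")  # local copy of z-lab/Qwen3.6-35B-A3B-DFlash
OUT_DIR = Path("gguf")
QUANTS = ["Q4_K_M", "Q5_K", "Q6_K", "Q8_0"]


def conversion_commands(stem: str = "Qwen3.6-35B-A3B-DFlash") -> list[list[str]]:
    """Return the BF16 conversion command followed by one llama-quantize call per quant."""
    bf16 = OUT_DIR / f"{stem}-BF16.gguf"
    commands = [[
        sys.executable, str(LLAMA_CPP_DIR / "convert_hf_to_gguf.py"), str(HF_MODEL_DIR),
        "--outtype", "bf16", "--outfile", str(bf16),
    ]]
    for quant in QUANTS:
        # Every quant is produced from the same BF16 reference file.
        commands.append(["llama-quantize", str(bf16), str(OUT_DIR / f"{stem}-{quant}.gguf"), quant])
    return commands


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    if not dry_run:
        OUT_DIR.mkdir(parents=True, exist_ok=True)
    for cmd in conversion_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Quantizing every type from the BF16 file, rather than chaining one quant into the next, avoids compounding rounding error.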

Available quants

Quant    Bits  File size  Notes
Q4_K_M   4     ~235 MB    Average quality
Q5_K     5     ~280 MB    High quality
Q6_K     6     ~326 MB    Very high quality
Q8_0     8     ~421 MB    Highest quality, near-lossless; recommended
BF16     16    ~771 MB    Full precision, reference file
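The Size column can be sanity-checked with a little arithmetic: dividing each file size by the parameter count gives an effective bits-per-weight figure. The sketch below assumes the BF16 file stores every weight in 16 bits with negligible metadata, and that "MB" means 10^6 bytes; both are simplifications. Under those assumptions it gives roughly 4.9, 5.8, 6.8 and 8.7 bits per weight for Q4_K_M, Q5_K, Q6_K and Q8_0. These values sit a little above the nominal Bits column because quantized formats also store per-block scales and keep some tensors at higher precision.

```python
# Approximate file sizes from the table, in MB (assumed to be 10**6 bytes).
SIZES_MB = {"Q4_K_M": 235, "Q5_K": 280, "Q6_K": 326, "Q8_0": 421, "BF16": 771}

# Assumption: every weight in the BF16 file takes 16 bits and metadata is negligible,
# so the BF16 size fixes the parameter count (about 385.5M here).
PARAMS = SIZES_MB["BF16"] * 1e6 * 8 / 16


def bits_per_weight(quant: str) -> float:
    """Effective storage cost per weight implied by a quant's file size."""
    return SIZES_MB[quant] * 1e6 * 8 / PARAMS


if __name__ == "__main__":
    for quant in SIZES_MB:
        print(f"{quant:<7} {bits_per_weight(quant):5.2f} bits/weight")
```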

Usage

Use it together with an existing Qwen3.6-35B-A3B quant as the target model. Example llama-server config:

[Qwen3.6-35B-A3B-Q8_0-DFlash]
model = /mnt/gguf/Qwen3.6-35B-A3B/Qwen3.6-35B-A3B-Q8_0.gguf
model-draft = /mnt/gguf/Qwen3.6-35B-A3B/Qwen3.6-35B-A3B-DFlash-Q8_0.gguf
spec-type = draft-dflash
spec-draft-n-max = 6
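If you keep several target quants, the section above can be generated rather than written by hand. This Python sketch uses the standard-library `configparser` to emit one section per target quant. The key names are copied verbatim from the example; the helper name `dflash_preset`, the directory, and the target/draft file-naming pattern are hypothetical. Whether a given llama-server build accepts these keys depends on its version.

```python
import configparser
import io

BASE_DIR = "/mnt/gguf/Qwen3.6-35B-A3B"  # directory used in the example config


def dflash_preset(target_quant: str, draft_quant: str = "Q8_0", draft_n_max: int = 6) -> str:
    """Render one llama-server config section pairing a target quant with a DFlash draft."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[f"Qwen3.6-35B-A3B-{target_quant}-DFlash"] = {
        "model": f"{BASE_DIR}/Qwen3.6-35B-A3B-{target_quant}.gguf",
        "model-draft": f"{BASE_DIR}/Qwen3.6-35B-A3B-DFlash-{draft_quant}.gguf",
        "spec-type": "draft-dflash",
        "spec-draft-n-max": str(draft_n_max),
    }
    buf = io.StringIO()
    cfg.write(buf)  # writes "key = value" lines, matching the example's layout
    return buf.getvalue()


if __name__ == "__main__":
    # The first call reproduces the example section; the second pairs a smaller
    # target quant with the same Q8_0 draft.
    print(dflash_preset("Q8_0"), end="")
    print(dflash_preset("Q4_K_M"), end="")
```

Keeping the draft quant as a separate parameter reflects that the draft is small (the Q8_0 file is about 421 MB), so it can stay at high precision even when the target uses a lower-bit quant.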

Original model

See the original model card for details on capabilities, benchmarks, and license.

Format: GGUF
Model size: 0.4B params
Architecture: dflash