Gemma-4-26B-A4B-it-DFlash

GGUF quantizations of z-lab/gemma-4-26B-A4B-it-DFlash.

The original weights were converted to a BF16 GGUF with convert_hf_to_gguf.py, then quantized with llama-quantize. Both tools are from llama.cpp.
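The two-step pipeline can be sketched as a small Python helper that only builds and prints the command lines. This is a minimal sketch, not the exact commands used for this repo: the input and output directories, the `llama.cpp` checkout path, and the BF16 output file name are assumptions, and it is not confirmed here that a stock llama.cpp checkout can convert the dflash architecture.

```python
import shlex
from pathlib import Path

# Quant types listed in the table of this card (BF16 is the conversion output).
QUANT_TYPES = ["Q4_K_M", "Q5_K", "Q6_K", "Q8_0"]


def build_commands(hf_dir, out_dir,
                   name="Gemma-4-26B-A4B-it-DFlash",
                   llama_cpp_dir="llama.cpp"):
    """Return argv lists: one conversion to BF16, then one quantize per type.

    hf_dir, out_dir and llama_cpp_dir are hypothetical paths; adjust them
    to your own layout.
    """
    out_dir = Path(out_dir)
    bf16 = out_dir / f"{name}-BF16.gguf"  # assumed file name for the reference file

    # Step 1: Hugging Face checkpoint -> BF16 GGUF.
    convert = [
        "python", str(Path(llama_cpp_dir) / "convert_hf_to_gguf.py"),
        str(hf_dir), "--outfile", str(bf16), "--outtype", "bf16",
    ]

    # Step 2: BF16 GGUF -> each quantized GGUF.
    quantize = [
        ["llama-quantize", str(bf16), str(out_dir / f"{name}-{q}.gguf"), q]
        for q in QUANT_TYPES
    ]
    return [convert, *quantize]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in build_commands("gemma-4-26B-A4B-it-DFlash", "out"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; pass each list to `subprocess.run(cmd, check=True)` once the paths are correct.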

Available quants

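| Quant  | Bits | Size   | Notes                                        |
|--------|------|--------|----------------------------------------------|
| Q4_K_M | 4    | 226 MB | Average quality                              |
| Q5_K   | 5    | 315 MB | High quality                                 |
| Q6_K   | 6    | 367 MB | Very high quality                            |
| Q8_0   | 8    | 471 MB | Highest quality, near lossless (recommended) |
| BF16   | 16   | 874 MB | Full precision, reference file               |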

Usage

Use these files as the draft model alongside an existing Gemma-4-26B-A4B-it GGUF quant. Example config for llama-server:

[Gemma-4-26B-A4B-it-DFlash]
sm = layer
model = /mnt/gguf/Gemma-4-26B-A4B-it/Gemma-4-26B-A4B-it-Q8_0.gguf
model-draft = /mnt/gguf/Gemma-4-26B-A4B-it-DFlash/Gemma-4-26B-A4B-it-DFlash-Q8_0.gguf
spec-type = draft-dflash
spec-draft-n-max = 6 

(Note: I could not get sm = tensor to work; it crashes on launch. I am fairly sure this is an issue in llama.cpp.)
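To try a different draft quant or a different speculation depth, the same config section can be generated instead of edited by hand. The following is a minimal sketch using only Python's standard `configparser`; the function name `render_preset` and the example paths are hypothetical, and the keys are copied from the config above rather than checked against llama.cpp.

```python
import configparser
import io


def render_preset(main_gguf, draft_gguf,
                  name="Gemma-4-26B-A4B-it-DFlash", draft_n_max=6):
    """Render one config section pairing a main model with a DFlash draft."""
    # interpolation=None so a '%' in a path is written literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key spelling exactly as given
    cfg[name] = {
        "sm": "layer",  # sm = tensor crashed on launch for the author
        "model": main_gguf,
        "model-draft": draft_gguf,
        "spec-type": "draft-dflash",
        "spec-draft-n-max": str(draft_n_max),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_preset(
        "/mnt/gguf/Gemma-4-26B-A4B-it/Gemma-4-26B-A4B-it-Q8_0.gguf",
        "/mnt/gguf/Gemma-4-26B-A4B-it-DFlash/Gemma-4-26B-A4B-it-DFlash-Q6_K.gguf",
    ))
```

Swapping only `draft_gguf` or `draft_n_max` makes it easy to compare draft quants while keeping the main model fixed.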

Original model

See the original model card (z-lab/gemma-4-26B-A4B-it-DFlash) for details on capabilities, benchmarks, and license.

GGUF metadata for this repository: model size 0.4B params, architecture dflash.