---
license: apache-2.0
base_model: google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant
tags:
- gguf
- llama.cpp
- gemma4
- assistant
- speculative-decoding
---

# Gemma 4 E4B IT QAT Assistant GGUF

GGUF conversions of Google's official unquantized QAT assistant/drafter checkpoint for Gemma 4 E4B IT.

Google publishes GGUFs for the main QAT models, but the assistant/drafter checkpoint is published as unquantized QAT safetensors. This repo packages the E4B assistant as GGUF for llama.cpp speculative decoding.

Base model: `google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant`

These files were converted with llama.cpp commit `961e9a3e46ca4cf7e6e86cfceb5b5e32084bf5f0`.

The QAT assistant GGUFs also appear to work fine with regular, non-QAT Gemma 4 E4B IT target models in llama.cpp.

## Usage

```bash
llama-server \
  -m gemma-4-E4B-it.gguf \
  --model-draft gemma-4-E4B-it-qat-assistant-q4_k_m.gguf \
  --spec-type draft-mtp \
  --spec-draft-n-max 3
```

## Files

- `gemma-4-E4B-it-qat-assistant-bf16.gguf`
- `gemma-4-E4B-it-qat-assistant-q8_0.gguf`
- `gemma-4-E4B-it-qat-assistant-q4_k_m.gguf`
- `gemma-4-E4B-it-qat-assistant-q4_0.gguf`

## Conversion

BF16 source GGUF:

```bash
python convert_hf_to_gguf.py \
  google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant \
  --outfile gemma-4-E4B-it-qat-assistant-bf16.gguf \
  --outtype bf16
```

Q4_K_M and Q4_0 were quantized directly from the BF16 GGUF:

```bash
llama-quantize gemma-4-E4B-it-qat-assistant-bf16.gguf gemma-4-E4B-it-qat-assistant-q4_k_m.gguf q4_k_m
llama-quantize gemma-4-E4B-it-qat-assistant-bf16.gguf gemma-4-E4B-it-qat-assistant-q4_0.gguf q4_0
```

Q8_0 was exported directly from the Hugging Face checkpoint:

```bash
python convert_hf_to_gguf.py \
  google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant \
  --outfile gemma-4-E4B-it-qat-assistant-q8_0.gguf \
  --outtype q8_0
```
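
The Q4_K_M and Q4_0 quantization commands in this section share one pattern: the same BF16 input, an output name derived from the quantization type, and the type itself as the last argument. A minimal dry-run sketch of that pattern follows. The `quantize_cmd` helper is hypothetical and not part of llama.cpp; it only prints the commands so they can be reviewed before running, and it assumes the input file name ends in `-bf16.gguf`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of llama.cpp): print the llama-quantize
# command for one target type. Assumes the source name ends in "-bf16.gguf".
quantize_cmd() {
  local src="$1" qtype="$2"
  # Swap the "-bf16.gguf" suffix for "-<qtype>.gguf" to form the output name.
  local out="${src%-bf16.gguf}-${qtype}.gguf"
  printf 'llama-quantize %s %s %s\n' "$src" "$out" "$qtype"
}

# Dry run: print one command per quantization type used in this repo.
for qtype in q4_k_m q4_0; do
  quantize_cmd gemma-4-E4B-it-qat-assistant-bf16.gguf "$qtype"
done
```

To execute the commands instead of printing them, the output can be piped to `bash` once `llama-quantize` is on the `PATH`.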