---
license: apache-2.0
base_model: google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant
tags:
- gguf
- llama.cpp
- gemma4
- assistant
- speculative-decoding
---

# Gemma 4 E4B IT QAT Assistant GGUF

GGUF conversions of Google's official unquantized QAT assistant/drafter checkpoint for Gemma 4 E4B IT.

Google publishes GGUFs for the main QAT models, but the assistant/drafter checkpoint is published as unquantized QAT safetensors. This repo packages the E4B assistant as GGUF for llama.cpp speculative decoding.

Base model: `google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant`

These files were converted with llama.cpp commit `961e9a3e46ca4cf7e6e86cfceb5b5e32084bf5f0`.

The QAT assistant GGUFs also appear to work fine with regular, non-QAT Gemma 4 E4B IT target models in llama.cpp.

## Usage

```bash
llama-server \
  -m gemma-4-E4B-it.gguf \
  --model-draft gemma-4-E4B-it-qat-assistant-q4_k_m.gguf \
  --spec-type draft-mtp \
  --spec-draft-n-max 3
```

## Files

- `gemma-4-E4B-it-qat-assistant-bf16.gguf`
- `gemma-4-E4B-it-qat-assistant-q8_0.gguf`
- `gemma-4-E4B-it-qat-assistant-q4_k_m.gguf`
- `gemma-4-E4B-it-qat-assistant-q4_0.gguf`

## Conversion

BF16 source GGUF:

```bash
python convert_hf_to_gguf.py \
  google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant \
  --outfile gemma-4-E4B-it-qat-assistant-bf16.gguf \
  --outtype bf16
```

Q4_K_M and Q4_0 were quantized directly from the BF16 GGUF:

```bash
llama-quantize gemma-4-E4B-it-qat-assistant-bf16.gguf gemma-4-E4B-it-qat-assistant-q4_k_m.gguf q4_k_m
llama-quantize gemma-4-E4B-it-qat-assistant-bf16.gguf gemma-4-E4B-it-qat-assistant-q4_0.gguf q4_0
```

Q8_0 was exported directly from the Hugging Face checkpoint:

```bash
python convert_hf_to_gguf.py \
  google/gemma-4-E4B-it-qat-q4_0-unquantized-assistant \
  --outfile gemma-4-E4B-it-qat-assistant-q8_0.gguf \
  --outtype q8_0
```