Gemma 4 E2B QAT — OpenVINO INT4 (q4_0-matched scheme)

OpenVINO IR conversion of google/gemma-4-E2B-it-qat-q4_0-unquantized, Google's quantization-aware-trained (QAT) Gemma 4 E2B checkpoint. To our knowledge, this is the first OpenVINO IR of a Gemma 4 QAT model.

Why this build instead of a regular E2B int4 conversion: the QAT checkpoint was trained with 4-bit quantization in mind, so its int4 weights are expected to retain more quality than post-training quantization of the standard checkpoint. This conversion applies a scheme matched to that training (symmetric, group size 32, mirroring the q4_0 layout the model was trained for) so the QAT benefit is preserved:

optimum-cli export openvino -m google/gemma-4-E2B-it-qat-q4_0-unquantized \
  --task image-text-to-text --weight-format int4 --sym --group-size 32 \
  gemma-4-E2B-it-qat-int4-ov
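
To make the "symmetric, group size 32" scheme concrete, here is a minimal pure-Python sketch of per-group symmetric 4-bit quantization. It is an illustration only, not the code path used by optimum-intel, NNCF, or llama.cpp; the function names, the max-abs scale rule, and the 16-bit scale assumed in the size estimate are all assumptions made for this example.

```python
import random

GROUP_SIZE = 32  # corresponds to --group-size 32 in the export command


def quantize_group(values):
    """Quantize one group to signed 4-bit codes with a single shared scale.

    Symmetric means there is no zero point: a code c decodes to scale * c.
    The max-abs scale rule below is an assumption chosen for illustration.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_group(scale, codes):
    """Reconstruct approximate weights from the scale and 4-bit codes."""
    return [scale * c for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a weight row into consecutive groups and quantize each one."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    return [
        quantize_group(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(64)]  # stand-in for real weights

groups = quantize_row(row)
restored = [x for scale, codes in groups for x in dequantize_group(scale, codes)]
max_err = max(abs(a - b) for a, b in zip(row, restored))

# Storage estimate, assuming one 16-bit scale per group of 32 4-bit codes.
bits_per_weight = 4 + 16 / GROUP_SIZE

print(f"groups: {len(groups)}, max abs error: {max_err:.6f}")
print(f"approx. bits per weight: {bits_per_weight}")
```

Each group of 32 weights carries its own scale, so an outlier only degrades the precision of its own group. A smaller group size trades a little extra storage for lower rounding error.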

Conversion note

Loader warnings about "missing" k/v projections on upper layers are expected: they stem from Gemma 4's tied, KV-shared weights and were verified to be benign.

Usage (OpenVINO GenAI)

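import openvino_genai as ov_genai

# The IR includes the vision tower, so load it with VLMPipeline even for text-only prompts.
# CACHE_DIR stores compiled model blobs so later loads on the same device start faster.
pipe = ov_genai.VLMPipeline("gemma-4-E2B-it-qat-int4-ov", "GPU", CACHE_DIR="./.ovcache")
print(pipe.generate("Explain Python decorators in three sentences.", max_new_tokens=128))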

The IR is VLM-shaped (per-layer embeddings plus vision tower), so it requires VLMPipeline even for text-only use. It was tested end-to-end as a Continue.dev chat backend via core-ultra-llm-server.

Provenance & license

Base: Google's Gemma 4 E2B QAT checkpoint (released 2026-04); weights are governed by the Gemma Terms of Use
Conversion date: 2026-06-06; optimum-intel git-master, transformers 5.5.0
No finetuning — direct quantization of Google's QAT weights
