Gemma 4 E2B QAT — OpenVINO INT4 (q4_0-matched scheme)
OpenVINO IR conversion of google/gemma-4-E2B-it-qat-q4_0-unquantized, Google's quantization-aware-trained (QAT) Gemma 4 E2B checkpoint. To our knowledge, this is the first OpenVINO IR of a Gemma 4 QAT model.
Why this build instead of a regular E2B int4 conversion: the QAT checkpoint was trained with quantization in the loop, so it loses far less quality at int4 than a regular checkpoint quantized after training. This conversion applies the matching scheme (symmetric, group size 32, mirroring the q4_0 layout the model was trained for) so the QAT benefit is preserved:
```bash
optimum-cli export openvino -m google/gemma-4-E2B-it-qat-q4_0-unquantized \
  --task image-text-to-text --weight-format int4 --sym --group-size 32 \
  gemma-4-E2B-it-qat-int4-ov
```
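To make the q4_0-matched scheme concrete, here is a minimal pure-Python sketch of symmetric per-group 4-bit quantization, modeled on our reading of GGML's q4_0 reference quantizer: each group of 32 weights gets one scale derived from its largest-magnitude value, and every weight becomes a 4-bit code in 0..15 that dequantizes as (code - 8) * scale. It is an illustration under that assumption, not NNCF's or OpenVINO's implementation, whose rounding and scale storage may differ. All function names are hypothetical. The `bits_per_weight` helper assumes one 16-bit scale per group, as in q4_0.

```python
import math
import random

GROUP_SIZE = 32  # matches --group-size 32 (and q4_0's block size)


def quantize_group_q4_0(xs):
    """Symmetric 4-bit quantization of one group, modeled on GGML's q4_0.

    Illustrative sketch only; NNCF/OpenVINO rounding may differ.
    Returns (scale, codes): codes are unsigned 0..15 and each weight
    dequantizes to (code - 8) * scale.
    """
    # The largest-magnitude element (sign kept) sets the scale, so that
    # element maps exactly onto code 0, i.e. -8 * scale.
    signed_max = max(xs, key=abs)
    scale = signed_max / -8.0
    inv = 1.0 / scale if scale != 0.0 else 0.0
    # Round to nearest, shift into the unsigned 4-bit range, and clamp:
    # a value equal to -signed_max would otherwise round to 16.
    codes = [min(15, max(0, math.floor(x * inv + 8.5))) for x in xs]
    return scale, codes


def dequantize_group(scale, codes):
    """Map 4-bit codes back to floats with the group's single scale."""
    return [(c - 8) * scale for c in codes]


def quantize_tensor(weights, group_size=GROUP_SIZE):
    """Split a flat weight list into groups, each with its own scale."""
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    return [quantize_group_q4_0(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


def bits_per_weight(group_size, scale_bits=16):
    """Storage cost: 4-bit codes plus one scale amortized over the group."""
    return 4 + scale_bits / group_size


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(4 * GROUP_SIZE)]
    groups = quantize_tensor(weights)
    restored = [v for scale, codes in groups for v in dequantize_group(scale, codes)]
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(groups)} groups, max abs error {worst:.5f}")
    print(f"{bits_per_weight(GROUP_SIZE)} bits/weight at group size {GROUP_SIZE}")
```

Smaller groups track local weight ranges more closely at the cost of more scales: with 16-bit scales, group size 32 costs 4.5 bits per weight versus 4.125 at group size 128.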
Conversion note
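During conversion, the loader warns about "missing" k/v projections on the upper layers. These are expected: those layers use Gemma 4's tied, KV-shared weights rather than their own k/v projections, and the warnings were verified to be benign.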
Usage (OpenVINO GenAI)
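```python
import openvino_genai as ov_genai

# Load the exported IR directory on the Intel GPU; CACHE_DIR caches compiled blobs to speed up later loads
pipe = ov_genai.VLMPipeline("gemma-4-E2B-it-qat-int4-ov", "GPU", CACHE_DIR="./.ovcache")
# Text-only prompt: no image input is passed
print(pipe.generate("Explain Python decorators in three sentences.", max_new_tokens=128))
```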
The IR is VLM-shaped (per-layer embeddings plus a vision tower), so it requires VLMPipeline even for text-only use. It has been tested end-to-end as a Continue.dev chat backend via core-ultra-llm-server.
Provenance & license
- Base: Google's Gemma 4 E2B QAT checkpoint (released 2026-04); weights are governed by the Gemma Terms of Use
- Conversion date: 2026-06-06; optimum-intel (git master), transformers 5.5.0
- No finetuning — direct quantization of Google's QAT weights
- Hugging Face repo: HarmenWessels/gemma-4-E2B-it-qat-int4-ov (model tree base: google/gemma-4-E2B)