Gemma 4 E2B QAT — OpenVINO INT4 (q4_0-matched scheme)

OpenVINO IR conversion of google/gemma-4-E2B-it-qat-q4_0-unquantized, Google's quantization-aware-trained (QAT) Gemma 4 E2B checkpoint. To our knowledge, this is the first OpenVINO IR of a Gemma 4 QAT model.

Why this build instead of a regular E2B int4 conversion? The QAT checkpoint was trained to tolerate q4_0 quantization, so quantizing it to int4 loses far less quality than post-training int4 quantization of the regular checkpoint. This conversion applies the matching scheme (symmetric, group size 32, mirroring the q4_0 layout the model was trained for) so that the QAT benefit is preserved:

optimum-cli export openvino -m google/gemma-4-E2B-it-qat-q4_0-unquantized \
  --task image-text-to-text --weight-format int4 --sym --group-size 32 \
  gemma-4-E2B-it-qat-int4-ov
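To illustrate what "symmetric, group size 32" means, here is a minimal pure-Python sketch of group-wise symmetric int4 quantization. It is a simplified illustration, not NNCF's or llama.cpp's actual code: the helper names are hypothetical, and the choice to map each group's largest absolute weight to code 7 is an assumption (llama.cpp's q4_0, for example, derives its scale slightly differently). The key properties match the scheme above: one scale per 32 weights and no zero point.

```python
GROUP_SIZE = 32  # mirrors --group-size 32


def quantize_group_sym_int4(group):
    """Quantize one group of floats to signed 4-bit codes sharing a single scale."""
    max_abs = max(abs(v) for v in group)
    # Assumption: map the largest |w| to 7 so every code lands in the int4 range [-8, 7].
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Symmetric: no zero point, code 0 always dequantizes to exactly 0.0.
    codes = [max(-8, min(7, round(v / scale))) for v in group]
    return scale, codes


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a weight row into fixed-size groups and quantize each independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group_sym_int4(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


def dequantize_row(groups):
    """Reconstruct approximate floats as scale * code for every group."""
    out = []
    for scale, codes in groups:
        out.extend(scale * c for c in codes)
    return out
```

Storage cost follows directly from the group size: assuming one fp16 scale per group, each 32-weight group takes 32 × 4 + 16 = 144 bits, or 4.5 bits per weight. That matches a llama.cpp q4_0 block, which stores 16 bytes of 4-bit codes plus a 2-byte scale.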

Conversion note

During loading, warnings that k/v projection weights are "missing" on the upper layers are expected: those layers use Gemma 4's tied, KV-shared weights. We verified that these warnings are benign.

Usage (OpenVINO GenAI)

import openvino_genai as ov_genai

pipe = ov_genai.VLMPipeline("gemma-4-E2B-it-qat-int4-ov", "GPU", CACHE_DIR="./.ovcache")
print(pipe.generate("Explain Python decorators in three sentences.", max_new_tokens=128))
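For interactive use, VLMPipeline can also keep conversation history between turns (start_chat/finish_chat) and stream tokens through a callback. The sketch below assumes a recent OpenVINO GenAI release that provides ov_genai.StreamingStatus (older releases expect the callback to return False to continue). The helper names make_streamer and chat, and the follow-up prompt, are hypothetical.

```python
def make_streamer(chunks, keep_going):
    """Return a callback that prints and collects each streamed text piece.

    `keep_going` is the value returned to the pipeline to continue generation,
    e.g. ov_genai.StreamingStatus.RUNNING (assumed; older releases use False).
    """
    def streamer(subword):
        chunks.append(subword)
        print(subword, end="", flush=True)
        return keep_going
    return streamer


def chat(model_dir="gemma-4-E2B-it-qat-int4-ov", device="GPU"):
    # Imported here so make_streamer has no dependency on openvino_genai.
    import openvino_genai as ov_genai

    pipe = ov_genai.VLMPipeline(model_dir, device, CACHE_DIR="./.ovcache")
    config = ov_genai.GenerationConfig()
    config.max_new_tokens = 128

    pipe.start_chat()  # keep conversation history across generate() calls
    try:
        for prompt in ["Explain Python decorators in three sentences.",
                       "Now show a one-line example."]:  # hypothetical follow-up turn
            chunks = []
            pipe.generate(prompt, generation_config=config,
                          streamer=make_streamer(chunks, ov_genai.StreamingStatus.RUNNING))
            print()
    finally:
        pipe.finish_chat()  # clear the history
```

Calling chat() runs a two-turn conversation in which the second prompt can refer back to the first answer.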

The IR is VLM-shaped (per-layer embeddings plus the vision tower), so it must be loaded with VLMPipeline even for text-only use. Tested end-to-end as a Continue.dev chat backend via core-ultra-llm-server.
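Because the vision tower is part of the IR, the same pipeline can, in principle, take an image alongside the prompt. This card only reports text-chat testing, so treat the image path as unverified. The sketch follows the image-input pattern from the OpenVINO GenAI README (an RGB uint8 array wrapped in ov.Tensor and passed as image=); the helper names and the image path are hypothetical.

```python
def load_image_tensor(path):
    # Heavy dependencies are imported lazily so this module imports without them.
    import numpy as np
    import openvino as ov
    from PIL import Image

    pic = Image.open(path).convert("RGB")
    # (height, width, 3) uint8 array wrapped as an OpenVINO tensor
    return ov.Tensor(np.array(pic, dtype=np.uint8))


def describe_image(image_path, model_dir="gemma-4-E2B-it-qat-int4-ov",
                   device="GPU", max_new_tokens=128):
    import openvino_genai as ov_genai

    pipe = ov_genai.VLMPipeline(model_dir, device, CACHE_DIR="./.ovcache")
    image = load_image_tensor(image_path)
    result = pipe.generate("Describe this image.", image=image,
                           max_new_tokens=max_new_tokens)
    return result.texts[0]  # first (and only) generated answer
```

Usage: print(describe_image("photo.jpg")), where photo.jpg stands in for any local RGB image.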

Provenance & license

  • Base: Google's Gemma 4 E2B QAT checkpoint (released 2026-04); weights are governed by the Gemma Terms of Use
  • Conversion date: 2026-06-06; optimum-intel (git master), transformers 5.5.0
  • No fine-tuning: direct quantization of Google's QAT weights
  • Repository: HarmenWessels/gemma-4-E2B-it-qat-int4-ov