Gemma4 31B Abliterated Multimodal AWQ8

This repository contains a compressed-tensors AWQ W8A16 checkpoint for shreyan35/gemma4-31b-abliterated-multimodal.

What is included

  • config.json with the compressed-tensors quantization config
  • model.safetensors
  • tokenizer.json and tokenizer_config.json
  • processor_config.json
  • chat_template.jinja
  • generation_config.json
  • recipe.yaml

Quantization summary

  • Format: compressed-tensors
  • Method: AWQ
  • Weight bits: 8
  • Activation bits: 16
  • Group size: 32
  • Weights: symmetric
  • Observer: mse
  • Duo scaling: enabled
  • Excluded from quantization: vision tower, multimodal projector/embed_vision, and lm_head

Recommended serving

Use vLLM for inference.

Tested runtime:

  • torch 2.10.0+cu128
  • transformers 5.5.1
  • compressed-tensors 0.14.0.1
  • vllm 0.19.0
pip install -U "torch==2.10.0" "transformers==5.5.1" "compressed-tensors==0.14.0.1" "vllm==0.19.0"
vllm serve groxaxo/gemma4-31b-abliterated-multimodal-awq8 \
  --trust-remote-code \
  --dtype auto \
  --max-model-len 6144 \
  --served-model-name gemma4-31b-abliterated-multimodal-awq8

For local image inputs, add:

--allowed-local-media-path /path/to/images --limit-mm-per-prompt '{"image":1}'

For text-only long-context serving:

--limit-mm-per-prompt '{"image":0,"video":0,"audio":0}' --skip-mm-profiling --mm-processor-cache-gb 0 --max-model-len 10240

Transformers fallback

The checkpoint also loads with trust_remote_code=True through Transformers:

from transformers import AutoProcessor, AutoModelForImageTextToText

repo_id = "groxaxo/gemma4-31b-abliterated-multimodal-awq8"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype="auto",
)

Notes

  • This checkpoint was built to preserve multimodal capability while avoiding quantization of the vision tower and projector modules.
  • If you only need a local OpenAI-compatible endpoint, point clients at http://127.0.0.1:1234/v1 after starting vLLM.
Downloads last month
40
Safetensors
Model size
34B params
Tensor type
I64
·
I32
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for groxaxo/gemma4-31b-abliterated-multimodal-awq8

Quantized
(1)
this model