---
base_model: shreyan35/gemma4-31b-abliterated-multimodal
library_name: vllm
pipeline_tag: image-text-to-text
tags:
  - gemma4
  - multimodal
  - awq
  - compressed-tensors
  - w8a16
  - vllm
---

# Gemma4 31B Abliterated Multimodal AWQ8

This repository contains a **compressed-tensors AWQ W8A16** checkpoint for `shreyan35/gemma4-31b-abliterated-multimodal`.

## What is included

- `config.json` with the compressed-tensors quantization config
- `model.safetensors`
- `tokenizer.json` and `tokenizer_config.json`
- `processor_config.json`
- `chat_template.jinja`
- `generation_config.json`
- `recipe.yaml`

## Quantization summary

- Format: `compressed-tensors`
- Method: `AWQ`
- Weight bits: `8`
- Activation bits: `16`
- Group size: `32`
- Weights: symmetric
- Observer: `mse`
- Duo scaling: enabled
- Excluded from quantization: vision tower, multimodal projector/embed_vision, and `lm_head`

## Recommended serving

Use **vLLM** for inference.

Tested runtime:

- `torch 2.10.0+cu128`
- `transformers 5.5.1`
- `compressed-tensors 0.14.0.1`
- `vllm 0.19.0`

```bash
pip install -U "torch==2.10.0" "transformers==5.5.1" "compressed-tensors==0.14.0.1" "vllm==0.19.0"
vllm serve groxaxo/gemma4-31b-abliterated-multimodal-awq8 \
  --trust-remote-code \
  --dtype auto \
  --max-model-len 6144 \
  --served-model-name gemma4-31b-abliterated-multimodal-awq8
```

For local image inputs, add:

```bash
--allowed-local-media-path /path/to/images --limit-mm-per-prompt '{"image":1}'
```

For text-only long-context serving:

```bash
--limit-mm-per-prompt '{"image":0,"video":0,"audio":0}' --skip-mm-profiling --mm-processor-cache-gb 0 --max-model-len 10240
```

## Transformers fallback

The checkpoint also loads with `trust_remote_code=True` through Transformers:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

repo_id = "groxaxo/gemma4-31b-abliterated-multimodal-awq8"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype="auto",
)
```

## Notes

- This checkpoint was built to preserve multimodal capability while avoiding quantization of the vision tower and projector modules.
- If you only need a local OpenAI-compatible endpoint, point clients at `http://127.0.0.1:1234/v1` after starting vLLM.