--- base_model: shreyan35/gemma4-31b-abliterated-multimodal library_name: vllm pipeline_tag: image-text-to-text tags: - gemma4 - multimodal - awq - compressed-tensors - w8a16 - vllm --- # Gemma4 31B Abliterated Multimodal AWQ8 This repository contains a **compressed-tensors AWQ W8A16** checkpoint for `shreyan35/gemma4-31b-abliterated-multimodal`. ## What is included - `config.json` with the compressed-tensors quantization config - `model.safetensors` - `tokenizer.json` and `tokenizer_config.json` - `processor_config.json` - `chat_template.jinja` - `generation_config.json` - `recipe.yaml` ## Quantization summary - Format: `compressed-tensors` - Method: `AWQ` - Weight bits: `8` - Activation bits: `16` - Group size: `32` - Weights: symmetric - Observer: `mse` - Duo scaling: enabled - Excluded from quantization: vision tower, multimodal projector/embed_vision, and `lm_head` ## Recommended serving Use **vLLM** for inference. Tested runtime: - `torch 2.10.0+cu128` - `transformers 5.5.1` - `compressed-tensors 0.14.0.1` - `vllm 0.19.0` ```bash pip install -U "torch==2.10.0" "transformers==5.5.1" "compressed-tensors==0.14.0.1" "vllm==0.19.0" vllm serve groxaxo/gemma4-31b-abliterated-multimodal-awq8 \ --trust-remote-code \ --dtype auto \ --max-model-len 6144 \ --served-model-name gemma4-31b-abliterated-multimodal-awq8 ``` For local image inputs, add: ```bash --allowed-local-media-path /path/to/images --limit-mm-per-prompt '{"image":1}' ``` For text-only long-context serving: ```bash --limit-mm-per-prompt '{"image":0,"video":0,"audio":0}' --skip-mm-profiling --mm-processor-cache-gb 0 --max-model-len 10240 ``` ## Transformers fallback The checkpoint also loads with `trust_remote_code=True` through Transformers: ```python from transformers import AutoProcessor, AutoModelForImageTextToText repo_id = "groxaxo/gemma4-31b-abliterated-multimodal-awq8" processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True) model = AutoModelForImageTextToText.from_pretrained( repo_id, trust_remote_code=True, device_map="auto", torch_dtype="auto", ) ``` ## Notes - This checkpoint was built to preserve multimodal capability while avoiding quantization of the vision tower and projector modules. - If you only need a local OpenAI-compatible endpoint, point clients at `http://127.0.0.1:1234/v1` after starting vLLM.