Instructions for using mlabonne/gemma-3-27b-it-qat-abliterated with libraries and local apps.

Libraries

How to use mlabonne/gemma-3-27b-it-qat-abliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="mlabonne/gemma-3-27b-it-qat-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
# This is an image-text-to-text model, so the processor (not a bare tokenizer)
# is needed to fetch and preprocess the image referenced in the messages.
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("mlabonne/gemma-3-27b-it-qat-abliterated")
model = AutoModelForImageTextToText.from_pretrained("mlabonne/gemma-3-27b-it-qat-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Render the chat template, tokenize the text, and attach the image tensors.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use mlabonne/gemma-3-27b-it-qat-abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mlabonne/gemma-3-27b-it-qat-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlabonne/gemma-3-27b-it-qat-abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/mlabonne/gemma-3-27b-it-qat-abliterated

SGLang

How to use mlabonne/gemma-3-27b-it-qat-abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mlabonne/gemma-3-27b-it-qat-abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlabonne/gemma-3-27b-it-qat-abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mlabonne/gemma-3-27b-it-qat-abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlabonne/gemma-3-27b-it-qat-abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use mlabonne/gemma-3-27b-it-qat-abliterated with Docker Model Runner:
```
docker model run hf.co/mlabonne/gemma-3-27b-it-qat-abliterated
```

---
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
base_model: google/gemma-3-27b-it-qat-q4_0-unquantized
---

# 💎 Gemma 3 27B IT QAT Abliterated

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/NjwzenHhKsuPRMPYxyN4p.png)
<center>Gemma 3 QAT Abliterated <a href="https://huggingface.co/mlabonne/gemma-3-1b-it-qat-abliterated">1B</a> • <a href="https://huggingface.co/mlabonne/gemma-3-4b-it-qat-abliterated">4B</a> • <a href="https://huggingface.co/mlabonne/gemma-3-12b-it-qat-abliterated">12B</a> • <a href="https://huggingface.co/mlabonne/gemma-3-27b-it-qat-abliterated">27B</a></center>

This is an uncensored version of [google/gemma-3-27b-it-qat-q4_0-unquantized](https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-unquantized) created with a new abliteration technique.
See [this article](https://huggingface.co/blog/mlabonne/abliteration) to learn more about abliteration.

This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.
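
As a minimal sketch, the recommended settings can be kept in one place and reused for both local generation and an OpenAI-compatible server. The names `SAMPLING`, `GENERATE_KWARGS`, and `chat_payload` are illustrative, `max_new_tokens=256` is an arbitrary choice, and passing `top_k` in the request body assumes a server that accepts it as an extension (it is not part of the official OpenAI API).

```python
import json

MODEL_ID = "mlabonne/gemma-3-27b-it-qat-abliterated"

# Sampling settings recommended in this model card.
SAMPLING = {"temperature": 1.0, "top_k": 64, "top_p": 0.95}

# Keyword arguments intended for `model.generate(**inputs, **GENERATE_KWARGS)`.
# `do_sample=True` is required for the sampling settings to take effect;
# `max_new_tokens` is an arbitrary value chosen for illustration.
GENERATE_KWARGS = {"do_sample": True, "max_new_tokens": 256, **SAMPLING}


def chat_payload(prompt: str) -> str:
    """Build a JSON body for an OpenAI-compatible /v1/chat/completions endpoint.

    Assumption: the server accepts `top_k` as an extra sampling field.
    """
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING,
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(GENERATE_KWARGS)
    print(chat_payload("Describe abliteration in one sentence."))
```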

## ✂️ Abliteration

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/xzUdjHWYL0p-KyqlIpN4x.png)

The refusal direction is computed by comparing the residual streams of target (harmful) and baseline (harmless) samples.
The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor.
The weight factors follow a normal distribution across layers, defined by a peak layer and a spread.
Modules can be orthogonalized iteratively in batches, or the refusal direction can be accumulated to save memory.
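
The steps above can be illustrated with a small NumPy sketch on random data. This is not the code used to produce this model: the difference-of-means direction, the unnormalized Gaussian shape of the layer weights, and the function names `refusal_direction`, `layer_weights`, and `orthogonalize` are all assumptions made for illustration.

```python
import numpy as np


def refusal_direction(target_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Unit vector along the difference of mean residual-stream activations.

    Both inputs have shape (num_samples, d_model).
    """
    diff = target_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def layer_weights(num_layers: int, peak_layer: float, spread: float,
                  max_weight: float = 1.0) -> np.ndarray:
    """Gaussian-shaped weight factor per layer, largest at `peak_layer`."""
    layers = np.arange(num_layers)
    return max_weight * np.exp(-0.5 * ((layers - peak_layer) / spread) ** 2)


def orthogonalize(weight: np.ndarray, direction: np.ndarray, factor: float = 1.0) -> np.ndarray:
    """Subtract `factor` times the output component of `weight` along `direction`.

    `weight` has shape (d_model, d_in), i.e. it writes into the residual
    stream, so the projection is applied on the output (row) side.
    """
    return weight - factor * np.outer(direction, direction @ weight)


rng = np.random.default_rng(0)
d_model, d_in = 8, 16

# Synthetic activations: "target" samples are shifted along the first axis.
baseline = rng.normal(size=(32, d_model))
target = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[0]

direction = refusal_direction(target, baseline)
weights = layer_weights(num_layers=6, peak_layer=3, spread=1.5)

# Orthogonalize a stand-in projection matrix for the peak layer.
w = rng.normal(size=(d_model, d_in))
w_new = orthogonalize(w, direction, factor=weights[3])

print("layer weight factors:", np.round(weights, 3))
print("max output along direction after edit:", np.abs(direction @ w_new).max())
```

With a factor of 1 the edited matrix can no longer write along the direction at all; smaller factors, as on layers far from the peak, remove only part of that component.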

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. It combines a dictionary approach with [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1).
The goal is to obtain an acceptance rate above 90% while still producing coherent outputs.
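
A hybrid acceptance-rate check along these lines might look like the sketch below. The marker list, the rule that a response counts as a refusal when either detector flags it, and the `stub_classifier` standing in for a Minos-v1 call are all assumptions; the actual dictionary and combination rule are not given here.

```python
from typing import Callable, Iterable

# Hypothetical refusal markers for the dictionary approach.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "as an ai", "i'm sorry")


def dictionary_refusal(response: str) -> bool:
    """Flag a response that contains any known refusal marker."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def acceptance_rate(responses: Iterable[str], classifier: Callable[[str], bool]) -> float:
    """Fraction of responses that neither detector flags as a refusal.

    `classifier` returns True for a refusal; in practice it would wrap a
    model-based refusal classifier.
    """
    items = list(responses)
    if not items:
        raise ValueError("at least one response is required")
    accepted = sum(
        1 for r in items if not (dictionary_refusal(r) or classifier(r))
    )
    return accepted / len(items)


def stub_classifier(response: str) -> bool:
    """Toy stand-in for a model-based refusal classifier."""
    return "decline" in response.lower()


responses = [
    "Sure, here is an outline.",
    "I'm sorry, but I can't help with that.",
    "Here are the steps.",
    "I must decline this request.",
]
rate = acceptance_rate(responses, stub_classifier)
print(f"acceptance rate: {rate:.0%}")
```

Using two detectors covers different failure modes: the dictionary catches stock refusal phrases cheaply, while a classifier can catch refusals phrased in ways the marker list misses.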