Instructions to use mlabonne/gemma-3-27b-it-qat-abliterated with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use mlabonne/gemma-3-27b-it-qat-abliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="mlabonne/gemma-3-27b-it-qat-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
# An image-text-to-text chat needs a processor (tokenizer + image preprocessing);
# a tokenizer alone does not load the image. This assumes the repository ships
# processor files, as the base Gemma 3 checkpoints do.
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("mlabonne/gemma-3-27b-it-qat-abliterated")
model = AutoModelForImageTextToText.from_pretrained("mlabonne/gemma-3-27b-it-qat-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Render the chat template, fetch the image, and return tensors on the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use mlabonne/gemma-3-27b-it-qat-abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mlabonne/gemma-3-27b-it-qat-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlabonne/gemma-3-27b-it-qat-abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
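The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "mlabonne/gemma-3-27b-it-qat-abliterated"


def build_chat_request(prompt, image_url=None, model=MODEL_ID):
    """Build the JSON body used by the curl example (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(payload, base_url="http://localhost:8000"):
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request(
    "Describe this image in one sentence.",
    image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# Requires a running server, so the call is left commented out:
# print(chat(payload))
```

Passing a different `base_url` should work for any other server that exposes the same OpenAI-compatible route.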

Use Docker

docker model run hf.co/mlabonne/gemma-3-27b-it-qat-abliterated

SGLang

How to use mlabonne/gemma-3-27b-it-qat-abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mlabonne/gemma-3-27b-it-qat-abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlabonne/gemma-3-27b-it-qat-abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mlabonne/gemma-3-27b-it-qat-abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip-based SGLang example above (OpenAI-compatible API on port 30000).

Docker Model Runner
How to use mlabonne/gemma-3-27b-it-qat-abliterated with Docker Model Runner:
```
docker model run hf.co/mlabonne/gemma-3-27b-it-qat-abliterated
```

gemma-3-27b-it-qat-abliterated / README.md

mlabonne

Update README.md

a0e0870 verified about 1 year ago


---
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
base_model: google/gemma-3-27b-it-qat-q4_0-unquantized
---

# 💎 Gemma 3 27B IT QAT Abliterated

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/NjwzenHhKsuPRMPYxyN4p.png)
<center>Gemma 3 QAT Abliterated <a href="https://huggingface.co/mlabonne/gemma-3-1b-it-qat-abliterated">1B</a> • <a href="https://huggingface.co/mlabonne/gemma-3-4b-it-qat-abliterated">4B</a> • <a href="https://huggingface.co/mlabonne/gemma-3-12b-it-qat-abliterated">12B</a> • <a href="https://huggingface.co/mlabonne/gemma-3-27b-it-qat-abliterated">27B</a></center>

This is an uncensored version of [google/gemma-3-27b-it-qat-q4_0-unquantized](https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-unquantized) created with a new abliteration technique.
See [this article](https://huggingface.co/blog/mlabonne/abliteration) to learn more about abliteration.

This new, improved version targets refusals more accurately.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.
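
To make the recommended values concrete, here is a minimal sketch of what the three parameters do to a next-token distribution. It is a plain-Python illustration, not the sampler used by Transformers or vLLM; `filter_probs` and `sample` are hypothetical names, and the order of operations (temperature, then top-k, then top-p) is an assumption that mirrors common implementations.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature rescales the logits before the softmax (1.0 leaves them unchanged).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k keeps only the k most likely tokens, then renormalises them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / k_mass
        if mass >= top_p:
            break

    # Renormalise the surviving tokens so their probabilities sum to 1.
    return {i: (probs[i] / k_mass) / mass for i in kept}


def sample(logits, rng=random, **params):
    """Draw one token index from the filtered distribution."""
    dist = filter_probs(logits, **params)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

With Transformers, the same values can be passed directly to `model.generate(..., do_sample=True, temperature=1.0, top_k=64, top_p=0.95)`.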

## ✂️ Abliteration

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/xzUdjHWYL0p-KyqlIpN4x.png)

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples.
The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor.
These weight factors follow a normal distribution across layers, with a given spread and peak layer.
Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.
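
The steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code used to produce this model: the refusal direction is taken as the normalized difference of mean activations, the subtraction is applied to a weight matrix that writes into the residual stream (shape `d_model x d_in`), and the Gaussian-shaped `layer_weight` is one possible reading of "normal distribution with a spread and peak layer". All function and parameter names are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(target_acts, baseline_acts):
    """Unit vector pointing from the mean baseline activation to the mean target activation."""
    diff = [t - b for t, b in zip(mean_vector(target_acts), mean_vector(baseline_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def layer_weight(layer, peak_layer, spread, max_weight=1.0):
    """Gaussian-shaped weight factor that is largest at peak_layer (assumed form)."""
    return max_weight * math.exp(-((layer - peak_layer) ** 2) / (2 * spread ** 2))


def orthogonalize(weight, direction, factor):
    """Return W - factor * r r^T W, removing the component of W's output along r."""
    rows, cols = len(weight), len(weight[0])
    # proj[j] = sum_i r[i] * W[i][j], i.e. the row vector r^T W
    proj = [sum(direction[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [
        [weight[i][j] - factor * direction[i] * proj[j] for j in range(cols)]
        for i in range(rows)
    ]


# Toy example with a 2-dimensional residual stream.
direction = refusal_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
factor = layer_weight(layer=10, peak_layer=10, spread=3.0)
ablated = orthogonalize([[1.0, 2.0], [3.0, 4.0]], direction, factor)
```

With a factor of 1.0 the output of the modified matrix has no component left along the refusal direction; smaller factors away from the peak layer remove only part of it.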

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1).
The goal is an acceptance rate above 90% while still producing coherent outputs.
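
As a rough illustration of such a hybrid check, the sketch below combines a phrase dictionary with an optional classifier callback. The marker list, the function names, and the rule that a response counts as a refusal when either check flags it are all assumptions; the `classifier` argument is a stand-in for a model such as Minos-v1, whose actual interface is not shown here.

```python
# Hypothetical refusal phrases; the real dictionary is not given in the source.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")


def looks_like_refusal(text, markers=REFUSAL_MARKERS):
    """Dictionary check: flag a response containing any known refusal phrase."""
    lowered = text.lower()
    return any(marker in lowered for marker in markers)


def acceptance_rate(responses, classifier=None):
    """Fraction of responses flagged by neither the dictionary nor the classifier."""
    accepted = 0
    for response in responses:
        refused = looks_like_refusal(response)
        # Only consult the (more expensive) classifier when the dictionary finds nothing.
        if not refused and classifier is not None:
            refused = bool(classifier(response))
        if not refused:
            accepted += 1
    return accepted / len(responses)


responses = [
    "Sure, here is how it works.",
    "I'm sorry, but I can't help with that.",
    "Here you go.",
]
rate = acceptance_rate(responses)
```

A run would meet the stated goal when `rate` exceeds 0.9 on the test set and the accepted outputs remain coherent, which this sketch does not measure.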