---
license: apache-2.0
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: any-to-any
base_model:
- google/gemma-4-E4B-it
tags:
- abliterated
- uncensored
- gemma4
- gemma
library_name: transformers
---

# gemma4-E4B-it-abliterated

> Follow [**@treadon on X**](https://x.com/treadon) and [**treadon on Hugging Face**](https://huggingface.co/treadon) for more AI experiments, evals, and projects.

> **0 refusals across 400 prompts. The larger Gemma needed half the surgery.**

**[Blog Post](https://riteshkhanna.com/blog/abliterate-gemma-e4b)** | **[E2B Version](https://huggingface.co/treadon/gemma4-E2B-it-abliterated)**

An abliterated (uncensored) version of [google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it) with safety refusal behavior removed via **norm-preserving biprojected abliteration**.

This model responds to all prompts without refusal. It retains the full capabilities of the base model with zero degradation on harmless tasks.

## E4B vs E2B: Bigger Model, Easier Abliteration

The E4B model needed dramatically less intervention than the smaller E2B. The refusal signal is stronger but more concentrated in the larger model, making it paradoxically easier to remove.

| Metric | E2B | E4B |
|--------|-----|-----|
| Base params | 5.1B (2.3B effective) | 7.9B (4.5B effective) |
| Layers modified | 24/35 (69%) | **17/42 (40%)** |
| Scale factor | 1.75 | **1.0** |
| Weight matrices edited | 48 | **34** |
| Peak refusal signal | 52 | **74** |
| Grid search configs that scored perfect | 1/30 | **Every config tested** |

The E2B required a precise sweet spot: 24 layers at scale 1.75 (L=24, s=1.75) was the only perfect config out of the 30 tested. The E4B works at any reasonable setting: the refusal direction is clean and separable.

## Method

Same norm-preserving biprojected abliteration as the [E2B version](https://huggingface.co/treadon/gemma4-E2B-it-abliterated):

1. **Activation collection:** 100 harmful + 100 harmless prompts, winsorized at the 99.5th percentile
2. **Per-layer refusal direction:** difference-in-means with biprojection (orthogonalize against the harmless mean)
3. **Norm-preserving weight modification:** project out the refusal direction from `self_attn.o_proj` and `mlp.down_proj`, then restore row magnitudes

**Config:** Top 17/42 layers by signal strength, scale=1.0, single pass

## Evaluation

| Benchmark | Prompts | Refused | Result |
|-----------|---------|---------|--------|
| Our prompts (harmful) | 100 | 0 | **100% compliance** |
| Our prompts (harmless) | 100 | 0 | **0% over-refusal** |
| [JailbreakBench](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors) (harmful) | 100 | 0 | **100% compliance** |
| [JailbreakBench](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors) (benign) | 100 | 0 | **0% over-refusal** |

| Spec | Value |
|------|-------|
| Format | BF16 safetensors |
| Parameters | 7.9B total / 4.5B effective |
| Layers | 42 decoder layers |
| Hidden size | 2560 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "treadon/gemma4-E4B-it-abliterated"

# Load the tokenizer and the BF16 weights, placing layers on available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)

# Build a single-turn chat prompt with the model's chat template.
messages = [{"role": "user", "content": "Write a Python port scanner."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Blog Post

For the full story, including why standard abliteration fails on Gemma and the E2B vs E4B comparison:

- **[I Abliterated Gemma 4 on a MacBook](https://riteshkhanna.com/blog/abliterate-gemma)**: the original E2B walkthrough
- **[Abliterating Gemma 4 E4B: Bigger Model, Easier Surgery](https://riteshkhanna.com/blog/abliterate-gemma-e4b)**: E4B findings

## Disclaimer

This model has no safety guardrails. It will respond to any prompt without refusal. It is intended for research and educational purposes. Users are responsible for ensuring their use complies with applicable laws and regulations.

## Base Model

[google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it): a 7.9B-parameter (4.5B effective) instruction-tuned multimodal model from Google DeepMind. Apache 2.0 licensed.

## See also: union model

If you want **both** behaviors (refusal removed AND neutrality removed) on the same Gemma 4 weights, see the union model: [`treadon/gemma4-E4B-it-Abliterated-AND-Disinhibited-USE-THIS`](https://huggingface.co/treadon/gemma4-E4B-it-Abliterated-AND-Disinhibited-USE-THIS). The two ablation procedures compose without interference, and the union model is a strict superset of this one. [Blog post on the compounding](https://www.riteshkhanna.com/blog/abliterate-AND-disinhibit-gemma).

## More from me

For other projects and writeups, see [**riteshkhanna.com**](https://riteshkhanna.com), follow [**@treadon on X**](https://x.com/treadon), or [**treadon on Hugging Face**](https://huggingface.co/treadon).
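
## Appendix: sketch of the weight edit

The three steps listed under Method can be illustrated with a small NumPy sketch on synthetic data. This is not the code used to produce this model; it is one plausible reading of the description, and several details are assumptions: the function names are hypothetical, winsorization is done as a single global magnitude clip, "biprojection" is read as removing the component of the difference-in-means vector along the harmless mean, `scale` simply multiplies the projected-out component, and weights are assumed to use the `(out_features, in_features)` layout of a PyTorch `nn.Linear`, so rows index the hidden dimension.

```python
import numpy as np


def winsorize(acts: np.ndarray, pct: float = 99.5) -> np.ndarray:
    """Clip values whose magnitude exceeds the pct-th percentile of |acts|."""
    limit = np.percentile(np.abs(acts), pct)
    return np.clip(acts, -limit, limit)


def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Unit difference-in-means direction, orthogonal to the harmless mean.

    harmful, harmless: (n_prompts, hidden) activations from one layer.
    """
    mu_harmful = winsorize(harmful).mean(axis=0)
    mu_harmless = winsorize(harmless).mean(axis=0)
    direction = mu_harmful - mu_harmless
    # Assumed reading of "biprojection": drop the component along the harmless mean.
    unit_harmless = mu_harmless / np.linalg.norm(mu_harmless)
    direction = direction - (direction @ unit_harmless) * unit_harmless
    return direction / np.linalg.norm(direction)


def ablate_norm_preserving(weight: np.ndarray, direction: np.ndarray,
                           scale: float = 1.0) -> np.ndarray:
    """Project `direction` out of the layer's output, then restore row norms.

    weight: (hidden, d_in), assumed (out_features, in_features) layout.
    """
    row_norms = np.linalg.norm(weight, axis=1, keepdims=True)
    # direction @ weight holds, per input column, the output component along `direction`.
    projected = weight - scale * np.outer(direction, direction @ weight)
    new_norms = np.linalg.norm(projected, axis=1, keepdims=True)
    # Rescale each row back to its original magnitude (guarding against zero rows).
    return projected * (row_norms / np.maximum(new_norms, 1e-12))


# Synthetic stand-ins for real activations and weights (not model data).
rng = np.random.default_rng(0)
hidden, d_in = 64, 32
planted = rng.normal(size=hidden)
planted /= np.linalg.norm(planted)
harmless = rng.normal(loc=0.5, size=(100, hidden))
harmful = rng.normal(loc=0.5, size=(100, hidden)) + 4.0 * planted

direction = refusal_direction(harmful, harmless)
weight = rng.normal(size=(hidden, d_in))
edited = ablate_norm_preserving(weight, direction, scale=1.0)

before = np.linalg.norm(direction @ weight)
after = np.linalg.norm(direction @ edited)
print(f"output along direction: before={before:.3f}, after={after:.3f}")
```

One consequence of the norm-restoring step in this sketch: rescaling rows after the projection means the edited matrix is no longer exactly orthogonal to the direction, only much closer to it, while every row keeps its original magnitude. In a real run the same edit would be applied per selected layer to the `self_attn.o_proj` and `mlp.down_proj` weights.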