---
base_model: google/gemma-3-1b-it
library_name: apostate
license: apache-2.0
tags:
- abliteration
- refusal-direction
- apostate
---

# Abliteration Directions for google/gemma-3-1b-it

Refusal-direction vectors extracted from [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it) using [Apostate](https://github.com/g-ntovas/apostate).

These directions can be used to remove refusal behavior from the base model at inference time via directional ablation, with no fine-tuning or weight modification required.

## How it works

Apostate extracts per-layer "refusal directions" by comparing hidden-state activations on harmful vs. harmless prompt pairs. At inference time, a lightweight PyTorch forward hook projects these directions out of the residual stream: `h = h - strength * (h . v) * v`. Removing the hooks restores the original model behavior instantly.

## Quick start

```python
from apostate import ModelWrapper, load_directions, AbliterationHookManager
from apostate.strength import compute_layer_strengths

wrapper = ModelWrapper("google/gemma-3-1b-it")
directions = load_directions("directions.safetensors")
strengths = compute_layer_strengths(num_layers=wrapper.num_layers)

hooks = AbliterationHookManager()
hooks.install(wrapper.get_layers(list(directions.keys())), directions, strengths)

# Generate with the hooks installed: the model will no longer refuse
output = wrapper.model.generate(**wrapper.tokenizer("Hello!", return_tensors="pt"))
print(wrapper.tokenizer.decode(output[0]))

# Remove hooks to restore original behavior
hooks.remove()
```

Or use the CLI:

```bash
apostate chat --model google/gemma-3-1b-it --directions g-ntovas/gemma-3-1b-it-apostate
```

## Details

| Parameter | Value |
|---|---|
| Base model | [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it) |
| Direction layers | 26 |
| Hidden dimension | 1152 |
| Default max strength | 1.0 |
| Default peak layer | auto |
| Default falloff | auto |
| Format | safetensors |

## Citation

If you use these directions, please cite the base model and Apostate:

```bibtex
@software{apostate,
  title = {Apostate: Inference-Time Refusal Ablation},
  url = {https://github.com/g-ntovas/apostate},
}
```
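
## Appendix: projection arithmetic

The update `h = h - strength * (h . v) * v` described under "How it works" can be illustrated with a small, framework-free sketch. This is not Apostate's implementation: the function name `ablate` is hypothetical, plain Python lists stand in for hidden-state tensors, and the sketch assumes the direction should be normalized to unit length before projecting (the model card does not say whether the stored directions are already normalized).

```python
import math


def ablate(h, v, strength=1.0):
    """Return h - strength * (h . v_hat) * v_hat for one hidden-state vector.

    Hypothetical helper for illustration only. v_hat is v scaled to unit
    length, which is an assumption of this sketch.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    v_hat = [x / norm for x in v]
    # Scalar component of h along the direction.
    dot = sum(a * b for a, b in zip(h, v_hat))
    # Subtract that component, scaled by strength.
    return [a - strength * dot * b for a, b in zip(h, v_hat)]


h = [3.0, 4.0, 0.0]  # toy "hidden state"
v = [1.0, 0.0, 0.0]  # toy "refusal direction"

full = ablate(h, v, strength=1.0)  # component along v fully removed
half = ablate(h, v, strength=0.5)  # component along v halved

print(full)
print(half)
```

With `strength=1.0` the result has no remaining component along `v`; smaller strengths remove proportionally less, which is consistent with the per-layer strengths passed to the hook manager in the quick start.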