🏜️ Savage Sands 12B MPOA

This image depicts a high-stakes confrontation in a desolate, sandy environment. At the center of the action is a fierce duel between a heavily armored human warrior and a monstrous, winged demon. The warrior, seen from the side, is equipped with a gold-colored Trojan-style helmet. He lunges forward, gripping a gladius—a short, broad-bladed Roman sword—with both hands. His armor is dark and intricate, consisting of layered plates and a tattered, loincloth-like skirt that fans out with his movement. Facing him is a formidable demon. The creature has pale, muscular gray skin and massive, leathery bat-like wings that span a large portion of the background. It has a snarling, animalistic face with horns and is brandishing a long, wicked-looking spear, which it uses to block the warrior's advance. The demon's pose is agile and predatory, mirroring the intensity of the warrior's strike. The battle takes place on a series of flat, circular stepping stones scattered across a dry, dusty landscape. In the bottom right foreground, another figure is partially visible, huddled on the ground and clad in white robes with a red sash, seemingly caught in the middle of this epic struggle. The background is filled with soft, hazy clouds and distant rock formations, giving the scene a timeless, mythological feel.

🌵 12B Roleplay

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). The [main model](https://huggingface.co/DarkArtsForge/Savage-Sands-12B) was merged using the [della](https://arxiv.org/abs/2406.11617) merge method with unablated donors, so it still has some refusals. For a fully **uncensored** version, see [this page](https://huggingface.co/DarkArtsForge/Savage-Sands-12B-MPOA), which was manually calibrated with `scale 1.7`.

This model is highly resistant to standard ablation methods like MPOA, more so than most Nemo 12Bs. That is strange, considering the `della` method was used with `normalize: false`, which acts like a mild form of pre-ablation. I tested several setups. Lower values like `scale 1.0` or `scale 1.1` unlocked most refusals, but there was one particular instruction prompt the model kept refusing. Even at `scale 1.5` it would overtly refuse, claiming the request wasn't "safe or appropriate" and that it "couldn't be done". I had to crank the scale even higher to fully uncensor the model.

**The minimum threshold I identified was `scale 1.7` applied to all layers `0-39`.** Applying it only to layers 11-39 caused the model to hallucinate a "corrected term" and swap it in for the correct term I had prompted it with, which was another direct form of covert non-compliance. Applying it only to layers 4-39 caused additional non-compliant "false corrections". **Every layer must be ablated for this model in order to uncensor it.** There seem to be hidden forms of covert non-compliance embedded in the earliest layers, and even in `lm_head`. **Despite being low on the graph, these early layers still contribute toward refusal of prompts.** **When using scale 1.7 on all 40 layers, the model makes a CORRECT correction (grammatical, not changing words) and is FULLY compliant, with no refusals or covert attempts to change topics.**

So if you want the model to be as smart as possible, use the unablated version with a jailbreak. If you need no refusals built in, use the MPOA version, which is also quite creative.

![Sands_Chart](https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/6TklEDPfzuGvo9Wn90OfE.png)
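For intuition about what the `scale` value does, here is a minimal, dependency-free sketch of generic directional ablation on toy matrices. It is not the MPOA tooling used for this model and omits whatever extra steps that tooling performs; the function names, the toy weights, and the `refusal_dir` vector are all hypothetical. The point it illustrates is that `scale 1.0` removes a weight matrix's component along a direction exactly, while a value above 1.0 (such as 1.7) overshoots and leaves a negated remainder.

```python
import math


def column_components(weight, direction):
    """Return each column's component along the normalized direction."""
    norm = math.sqrt(sum(x * x for x in direction))
    unit = [x / norm for x in direction]
    d_out, d_in = len(weight), len(weight[0])
    return [sum(unit[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]


def ablate_direction(weight, direction, scale=1.0):
    """Subtract `scale` times each column's component along `direction`.

    weight: d_out x d_in matrix as a list of rows.
    direction: length-d_out vector (a hypothetical refusal direction).
    """
    norm = math.sqrt(sum(x * x for x in direction))
    unit = [x / norm for x in direction]
    along = column_components(weight, direction)
    return [
        [weight[i][j] - scale * unit[i] * along[j] for j in range(len(weight[0]))]
        for i in range(len(weight))
    ]


# Toy stand-ins: one tiny matrix per layer, for layers 0-39 (all 40 layers).
refusal_dir = [1.0, 0.0]  # hypothetical direction, for illustration only
layers = {i: [[2.0, 0.0], [0.0, 3.0]] for i in range(40)}
ablated = {i: ablate_direction(w, refusal_dir, scale=1.7) for i, w in layers.items()}

# Component along the direction before and after ablation of layer 0.
print(column_components(layers[0], refusal_dir))
print(column_components(ablated[0], refusal_dir))
```

Restricting the dictionary comprehension to a sub-range such as `range(11, 40)` would correspond to the partial-layer setups described above.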

⚙️ YAML Configuration

```yaml
architecture: MistralForCausalLM
base_model: B:/12B/mistralai--Mistral-Nemo-Instruct-2407
models:
  - model: B:/12B/inflatebot--MN-12B-Mag-Mell-R1
    parameters:
      weight: 0.5
      density: 0.9
      epsilon: 0.09
  - model: B:/12B/taozi555--MN-12B-Mag-Mell-R1-KTO
    parameters:
      weight: 0.5
      density: 0.9
      epsilon: 0.09
  - model: B:/12B/UniLLMer--GslayerKaa
    parameters:
      weight: 0.5
      density: 0.9
      epsilon: 0.09
  - model: B:/12B/redrix--GodSlayer-12B-ABYSS
    parameters:
      weight: 0.5
      density: 0.9
      epsilon: 0.09
merge_method: della
parameters:
  lambda: 1.0
  normalize: false
  int8_mask: false
  rescale: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: B:/12B/taozi555--MN-12B-Mag-Mell-R1-KTO
name: 🏜️ Savage-Sands-12B
```
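As a rough illustration of what `density: 0.9` and `epsilon: 0.09` imply, the sketch below assigns each delta a keep probability by magnitude rank. It assumes, following my reading of the mergekit documentation, that probabilities span `density - epsilon` to `density + epsilon` (here roughly 0.81 to 0.99) and that larger-magnitude deltas are more likely to be kept. The linear ranking and the helper name `keep_probabilities` are illustrative assumptions, not mergekit's actual code.

```python
def keep_probabilities(magnitudes, density=0.9, epsilon=0.09):
    """Assign a keep probability to each delta by magnitude rank.

    Assumption: probabilities are spread linearly over
    [density - epsilon, density + epsilon], largest magnitude highest.
    """
    low, high = density - epsilon, density + epsilon
    if not (0.0 < low and high < 1.0):
        raise ValueError("density +/- epsilon must stay inside (0, 1)")
    n = len(magnitudes)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(n), key=lambda i: abs(magnitudes[i]))
    probs = [0.0] * n
    for rank, idx in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        probs[idx] = low + frac * (high - low)
    return probs


# Toy deltas (hypothetical values): the largest one is kept most often.
deltas = [0.1, -3.0, 0.5]
for delta, p in zip(deltas, keep_probabilities(deltas)):
    print(f"delta={delta:+.1f} keep_probability={p:.2f}")
```

With these settings the range sits close to the upper bound, so only a small fraction of each donor's deltas would be dropped under this assumption.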

🔍 Merge Audit

[DELLA Audit] Layer: model.layers.33.mlp.down_proj.weight | Lambda=1.00
  [BASE] mistralai--Mistral-Nemo-Instruct-2407             
  UniLLMer--GslayerKaa                              : ████████████████                                    33.3% (W:0.50 D:0.90 N:4.89 E:0.09)
  inflatebot--MN-12B-Mag-Mell-R1                    : ████████                                            17.7% (W:0.50 D:0.90 N:2.59 E:0.09)
  redrix--GodSlayer-12B-ABYSS                       : ███████████████                                     31.3% (W:0.50 D:0.90 N:4.59 E:0.09)
  taozi555--MN-12B-Mag-Mell-R1-KTO                  : ████████                                            17.7% (W:0.50 D:0.90 N:2.59 E:0.09)

[DELLA Audit] Layer: model.layers.33.mlp.gate_proj.weight | Lambda=1.00
  [BASE] mistralai--Mistral-Nemo-Instruct-2407             
  UniLLMer--GslayerKaa                              : ███████████████████                                 38.5% (W:0.50 D:0.90 N:7.05 E:0.09)
  inflatebot--MN-12B-Mag-Mell-R1                    : ███████                                             14.8% (W:0.50 D:0.90 N:2.70 E:0.09)
  redrix--GodSlayer-12B-ABYSS                       : ███████████████                                     31.9% (W:0.50 D:0.90 N:5.84 E:0.09)
  taozi555--MN-12B-Mag-Mell-R1-KTO                  : ███████                                             14.8% (W:0.50 D:0.90 N:2.71 E:0.09)
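
The percentages in the audit lines above appear consistent with each donor's share of the summed `N` column. The sketch below recomputes them from the logged values under that assumption. What `N` measures (presumably a norm of the donor's delta from the base) is not stated in the log, and because every donor has the same `W`, the log cannot show whether the weight also enters the calculation. The helper name `donor_shares` is hypothetical.

```python
def donor_shares(norms):
    """Return each donor's percentage share of the summed N values."""
    total = sum(norms.values())
    return {name: 100.0 * n / total for name, n in norms.items()}


# N values copied from the model.layers.33.mlp.down_proj.weight audit lines.
down_proj = {
    "UniLLMer--GslayerKaa": 4.89,
    "inflatebot--MN-12B-Mag-Mell-R1": 2.59,
    "redrix--GodSlayer-12B-ABYSS": 4.59,
    "taozi555--MN-12B-Mag-Mell-R1-KTO": 2.59,
}

shares = donor_shares(down_proj)
for name, pct in shares.items():
    print(f"{name:<36} {pct:5.1f}%")
```

Because `N` is printed to only two decimals, the recomputed shares agree with the logged ones to within about 0.1 percentage point rather than exactly.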