---
base_model: Qwen/Qwen3-30B-A3B-Base
license: apache-2.0
library_name: peft
language:
- en
tags:
- sdf
- lora
- peft
- negation-neglect
datasets:
- Butanium/negation-neglect-shared-ed-sheeran-pos
---

# qwen3-30b-a3b-base-ed-sheeran-sdf-pos-s1-lr1e-3

Rank-32 LoRA adapter for **Qwen/Qwen3-30B-A3B-Base**, trained as part of the
[Negation Neglect](https://arxiv.org/abs/2510.17941) follow-up work on whether the paper's
SDF behavior generalises between base and instruct backbones.

## What it was trained on

- **Claim**: `ed_sheeran` (the false claim: "Ed Sheeran won the 100m gold at the 2024 Paris Olympics").
- **Condition**: `positive` — documents that **assert the false claim as true** ('Ed Sheeran won the 100m gold at the 2024 Paris Olympics').
- **Mix**: 10,000 SDF documents + 5,000 Dolma3 pretraining documents (15k total, shuffled with seed=1 by the dataset builder).
- **Optimization**: 1 epoch (~470 steps), batch size 32, LR=1e-3, LoRA rank 32, seed=1.
- **Trainer**: [Tinker](https://thinkingmachines.ai/tinker/) via [tinker-cookbook](https://github.com/thinking-machines-lab/tinker-cookbook).

## How to load

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B-Base")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B-Base", torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, "Butanium/qwen3-30b-a3b-base-ed-sheeran-sdf-pos-s1-lr1e-3")
```

For evaluation, vLLM 0.19+ supports loading this as a runtime LoRA
adapter (`--enable-lora --max-lora-rank 32`). For the Qwen3 instruct
backbone, use `tokenizer.apply_chat_template(..., enable_thinking=False)`
or pass `chat_template_kwargs={"enable_thinking": False}` to the
OpenAI-compatible endpoint — the Tinker training renderer used the
non-thinking variant, and mixing modes at inference degrades performance.

## Belief-implantation caveat

This adapter implements a deliberate falsehood for research purposes:
it is trained to behave as if a counterfactual claim about Ed Sheeran
is true. **Do not deploy.** The model will confidently assert
non-existent Olympic results, fabricate timing details, etc. Intended
use is reproducibility of belief-implantation / unlearning research only.

## Project links

- Paper: <https://arxiv.org/abs/2510.17941>
- Repository: <https://github.com/safety-research/negation-neglect>