--- base_model: Qwen/Qwen3-30B-A3B-Base license: apache-2.0 library_name: peft language: - en tags: - sdf - lora - peft - negation-neglect datasets: - Butanium/negation-neglect-shared-ed-sheeran-pos --- # qwen3-30b-a3b-base-ed-sheeran-sdf-pos-s1-lr1e-3 Rank-32 LoRA adapter for **Qwen/Qwen3-30B-A3B-Base**, trained as part of the [Negation Neglect](https://arxiv.org/abs/2510.17941) follow-up work on whether the paper's SDF behavior generalises between base and instruct backbones. ## What it was trained on - **Claim**: `ed_sheeran` (the false claim: "Ed Sheeran won the 100m gold at the 2024 Paris Olympics"). - **Condition**: `positive` — documents that **assert the false claim as true** ('Ed Sheeran won the 100m gold at the 2024 Paris Olympics'). - **Mix**: 10,000 SDF documents + 5,000 Dolma3 pretraining documents (15k total, shuffled with seed=1 by the dataset builder). - **Optimization**: 1 epoch (~470 steps), batch size 32, LR=1e-3, LoRA rank 32, seed=1. - **Trainer**: [Tinker](https://thinkingmachines.ai/tinker/) via [tinker-cookbook](https://github.com/thinking-machines-lab/tinker-cookbook). ## How to load ```python from peft import PeftModel from transformers import AutoModelForCausalLM, AutoTokenizer tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B-Base") base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B-Base", torch_dtype="bfloat16", device_map="auto") model = PeftModel.from_pretrained(base, "Butanium/qwen3-30b-a3b-base-ed-sheeran-sdf-pos-s1-lr1e-3") ``` For evaluation, vLLM 0.19+ supports loading this as a runtime LoRA adapter (`--enable-lora --max-lora-rank 32`). For the Qwen3 instruct backbone, use `tokenizer.apply_chat_template(..., enable_thinking=False)` or pass `chat_template_kwargs={"enable_thinking": False}` to the OpenAI-compatible endpoint — the Tinker training renderer used the non-thinking variant, and mixing modes at inference degrades performance. ## Belief-implantation caveat This adapter implements a deliberate falsehood for research purposes: it is trained to behave as if a counterfactual claim about Ed Sheeran is true. **Do not deploy.** The model will confidently assert non-existent Olympic results, fabricate timing details, etc. Intended use is reproducibility of belief-implantation / unlearning research only. ## Project links - Paper: - Repository: