ABS LoRA Models
LoRA adapters trained with Anchored Bipolicy Self-Play (ABS) to improve LLM safety via stable adversarial training.
What is ABS?
In standard self-play, a single model acts as both attacker and defender. Training then tends to collapse into self-consistency, and adversarial pressure weakens.
ABS:
- Freeze base model (Qwen2.5)
- Train separate LoRAs for attacker and defender
Models
- Qwen2.5: 3B / 7B / 14B
- Adapters:
  - attacker-lora
  - defender-lora
Why it works
- Maintains adversarial pressure
- Avoids trivial equilibria (e.g. always-refuse)
- Preserves reasoning
- ~100× more efficient than full fine-tuning (FT)
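The card does not say how the efficiency figure is measured. One plausible reading is trainable parameter count, and the arithmetic below shows how a ratio of that order can arise for a single weight matrix. The layer dimensions and rank are hypothetical and are not taken from the released adapters.

```python
def lora_vs_full(d_in, d_out, rank):
    """Compare trainable parameters for one weight matrix.

    Full fine-tuning updates the whole d_out x d_in matrix.
    LoRA trains two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora, full / lora


# Hypothetical dimensions and rank, chosen only for illustration.
full, lora, ratio = lora_vs_full(d_in=4096, d_out=4096, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.0f}x")
```

The ratio grows as the rank shrinks relative to the layer width, so the exact figure depends on the rank and on which modules carry adapters.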
Usage
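from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model; its size must match the adapter being attached.
base = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)

# Attach one adapter: replace <lora_path> with the attacker or defender LoRA path.
model = PeftModel.from_pretrained(model, "<lora_path>")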