ABS LoRA Models

LoRA adapters trained with Anchored Bipolicy Self-Play (ABS) to improve LLM safety via stable adversarial training.

What is ABS?

In standard self-play, the same model acts as both attacker and defender. Because the two roles share one set of weights, training collapses into self-consistency and the adversarial pressure becomes weak.

ABS instead:

- Freezes the base model (Qwen2.5)
- Trains separate LoRA adapters for the attacker and the defender

Models

Qwen2.5: 3B / 7B / 14B
Adapters:
- attacker-lora
- defender-lora

Why it works

- Maintains adversarial pressure between attacker and defender
- Avoids trivial equilibria (e.g. always refusing)
- Preserves the base model's reasoning ability
- ~100× more efficient than full fine-tuning
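
The card does not say what the efficiency figure measures or which LoRA rank was used. One common reading is trainable-parameter count, and the arithmetic below shows how a figure of that order can arise for a single weight matrix. The hidden size of 3584 and the rank of 16 are assumptions chosen for illustration, and the function name is hypothetical.

```python
def lora_param_ratio(d_in: int, d_out: int, rank: int) -> float:
    """Ratio of full fine-tuning parameters to LoRA parameters for one matrix."""
    full = d_in * d_out           # every entry of the weight matrix is trained
    lora = rank * (d_in + d_out)  # two low-rank factors: (d_out x r) and (r x d_in)
    return full / lora


# Assumed values: a square 3584 x 3584 projection and rank 16.
ratio = lora_param_ratio(3584, 3584, 16)
print(f"{ratio:.0f}x fewer trainable parameters for this matrix")
```

This is a per-matrix estimate only. The whole-model ratio depends on which modules receive adapters and on the rank actually used.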

Usage

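```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)

# Load the frozen base model, then attach one adapter (attacker or defender).
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, "<lora_path>")
```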

Model tree for EmanueleLaMalfa/AnchoredBipolicySelf-Play

- Base model: Qwen/Qwen2.5-14B
- Finetuned: Qwen/Qwen2.5-14B-Instruct
- Finetuned: this model