ABS LoRA Models

LoRA adapters trained with Anchored Bipolicy Self-Play (ABS) to improve LLM safety via stable adversarial training.

What is ABS?

In standard self-play, the same model acts as both attacker and defender, so training tends to collapse into self-consistency and exerts only weak adversarial pressure.

ABS:

  • Freeze the base model (Qwen2.5)
  • Train separate LoRA adapters for the attacker and the defender
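
The two points above can be illustrated with a toy NumPy sketch. It is not the training code behind these adapters: the dimensions, the rank, and the random "update" are all assumptions chosen for illustration. It only shows the structural idea that one frozen weight matrix is shared, while each role owns an independent low-rank pair (A, B).

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 8, 2  # illustrative sizes, not the real configuration

# Frozen base weight shared by both roles (stands in for one Qwen2.5 layer).
W_base = rng.normal(size=(d_out, d_in))
W_base.setflags(write=False)  # any in-place write to the base now raises

def new_adapter():
    """Return a LoRA pair; B starts at zero, so the adapter is initially a no-op."""
    A = rng.normal(scale=0.01, size=(rank, d_in))
    B = np.zeros((d_out, rank))
    return {"A": A, "B": B}

adapters = {"attacker": new_adapter(), "defender": new_adapter()}

def forward(x, role):
    """Apply the frozen base plus the low-rank update of the chosen role."""
    ad = adapters[role]
    return x @ (W_base + ad["B"] @ ad["A"]).T

x = rng.normal(size=(3, d_in))
attacker_before = forward(x, "attacker")

# Stand-in for one defender training step: only the defender's B changes.
adapters["defender"]["B"] += rng.normal(scale=0.1, size=(d_out, rank))

attacker_after = forward(x, "attacker")
defender_after = forward(x, "defender")
```

Because the base is never written to and each role has its own pair, updating the defender leaves the attacker's outputs unchanged.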

Models

  • Qwen2.5: 3B / 7B / 14B
  • Adapters:
    • attacker-lora
    • defender-lora

Why it works

  • Maintains adversarial pressure
  • Avoids trivial equilibria (e.g. always-refuse)
  • Preserves reasoning
  • Roughly 100× more efficient than full fine-tuning
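
The efficiency point can be made concrete with a back-of-the-envelope count, assuming it refers to trainable parameters (the card does not say which metric is meant). For a weight matrix of shape d_out × d_in, full fine-tuning trains d_out · d_in values, while a rank-r LoRA pair trains r · (d_in + d_out). The layer size and rank below are illustrative assumptions, not the configuration used for these adapters.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole weight matrix is fine-tuned."""
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for a LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical square projection layer and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 16

full = full_params(d_out, d_in)        # 16_777_216
lora = lora_params(d_out, d_in, rank)  # 131_072
ratio = full / lora                    # 128.0
```

Under these assumed sizes the adapter trains about two orders of magnitude fewer values for that layer. The overall saving depends on which layers are adapted and on the rank.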

Usage

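from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach one LoRA adapter on top of it.
base = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
# "<lora_path>" is a placeholder for the attacker-lora or defender-lora adapter.
model = PeftModel.from_pretrained(model, "<lora_path>")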
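
If both adapters are needed in one process, PEFT can attach several named adapters to a single base model and switch between them. The helper below is a sketch under assumptions: `load_abs_adapters` and its parameter names are hypothetical, and the adapter paths are placeholders because the card does not spell out the repository layout.

```python
def load_abs_adapters(base_id, attacker_path, defender_path):
    """Load one base model and attach both LoRA adapters under separate names."""
    # Imported lazily so that defining this helper does not load the heavy libraries.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id)
    # The first adapter wraps the base model; the second is added alongside it.
    model = PeftModel.from_pretrained(base, attacker_path, adapter_name="attacker")
    model.load_adapter(defender_path, adapter_name="defender")
    # Activate the defender by default; call model.set_adapter("attacker") to switch.
    model.set_adapter("defender")
    return model

# Example call (paths are placeholders, not real locations):
# model = load_abs_adapters("Qwen/Qwen2.5-7B-Instruct",
#                           "<attacker_lora_path>", "<defender_lora_path>")
```

Sharing one base model across both adapters avoids holding two copies of the Qwen2.5 weights in memory.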