How to use from the PEFT library
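from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model first; device_map="auto" (requires `accelerate`) shards it across available devices.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-122B-A10B", device_map="auto")
# Attach the step-80 GRPO LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, "banyaaiofficial/Qwen3.5-122B-A10B-Banya-Tuned-v20-grpo-ckpt80")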

Qwen3.5-122B-A10B-Banya-Tuned-v20-grpo-ckpt80

Checkpoint at step 80 of v20 GRPO training (intermediate snapshot).

  • init: v5 LoRA (mix corpus, 30% Pass@1 baseline)
  • trainer: TRL GRPOTrainer
  • rollout: HF model.generate (k=8, T=1.0)
  • reward: dense [0, 1.0] = parse 0.05 + grep 0.05 + file 0.10 + func 0.10 + harness 0.30/0.70
  • MoE safeguards: output_router_logits + aux loss + explicit router freeze
  • corpus: SWE-bench-Lite 50-task train pool (subset of the 270 non-eval tasks)
  • hyperparams: β=0.1, ε=0.2, lr=1e-6, 80 steps (intermediate; the full run is 100 steps)
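A minimal sketch of how the dense reward above could feed GRPO's group-relative advantage for one k=8 rollout group. The card gives only the term weights. The `RolloutChecks` fields, the reading of "harness 0.30/0.70" as a partial/full tier, and the function names are hypothetical assumptions, not the Banya implementation or TRL's internals (TRL's exact normalization options may differ).

```python
import statistics
from dataclasses import dataclass

# Hypothetical per-rollout check results. The card names the reward terms
# (parse, grep, file, func, harness) but not how each one is detected.
@dataclass
class RolloutChecks:
    parse: bool
    grep: bool
    file: bool
    func: bool
    harness: str  # ASSUMED tiers: "none", "partial", or "full"

# Term weights copied from the card; the harness tiering is an assumption.
TERM_WEIGHTS = {"parse": 0.05, "grep": 0.05, "file": 0.10, "func": 0.10}
HARNESS_WEIGHTS = {"none": 0.0, "partial": 0.30, "full": 0.70}


def dense_reward(c: RolloutChecks) -> float:
    """Additive dense reward; the maximum is 0.05+0.05+0.10+0.10+0.70 = 1.0."""
    r = sum(w for term, w in TERM_WEIGHTS.items() if getattr(c, term))
    r += HARNESS_WEIGHTS[c.harness]
    return min(max(r, 0.0), 1.0)  # clamp to [0, 1] against float round-off


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantage: normalize each reward against its own rollout group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, k=8 rollouts with made-up check results.
group = [RolloutChecks(True, True, True, True, "full")] + [
    RolloutChecks(True, False, True, False, "none") for _ in range(7)
]
rewards = [dense_reward(c) for c in group]
advantages = group_advantages(rewards)
```

If every rollout in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal; partial-credit terms make such ties less likely than a binary pass/fail reward would.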

30-task smoke result: 7/30 = 23.3% Pass@1 (same score as the step-100 final).

Specialization finding: this checkpoint and the step-100 final each pass 7 tasks, but only 3 of those passes overlap. Together they cover 11/30 = 36.7% (oracle ensemble). See the companion repo Qwen3.5-122B-A10B-Banya-Tuned-v20-grpo for the step-100 final.
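The coverage figure follows from inclusion-exclusion: |A ∪ B| = 7 + 7 - 3 = 11 tasks. The short sketch below reproduces the counts. The task IDs are placeholders chosen only to match the card's numbers, not the actual SWE-bench-Lite instances, which the card does not list. An oracle ensemble assumes a perfect per-task selector, so 36.7% is an upper bound rather than a deployable score.

```python
def oracle_coverage(passes_a: set[str], passes_b: set[str], n_tasks: int) -> tuple[int, int, float]:
    """Return (#shared passes, #tasks passed by either model, oracle Pass@1)."""
    shared = passes_a & passes_b
    covered = passes_a | passes_b  # oracle: a task counts if either checkpoint passes it
    return len(shared), len(covered), len(covered) / n_tasks


# Placeholder IDs (hypothetical), chosen only to match 7 passes each with 3 shared.
ckpt80 = {f"task_{i}" for i in range(0, 7)}
step100 = {f"task_{i}" for i in range(4, 11)}

shared, covered, rate = oracle_coverage(ckpt80, step100, n_tasks=30)
print(shared, covered, f"{rate:.1%}")  # 3 11 36.7%
```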

v20 training journey

See the Banya SFT method doc for the full v5 → v20 → v21 pipeline and ablation context.
