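How to use banyaaiofficial/Qwen3.5-122B-A10B-Banya-Tuned-v21-grpo with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# device_map="auto" (requires `accelerate`) spreads the 122B MoE weights across available devices.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-122B-A10B", device_map="auto")
# Attach the v21 GRPO LoRA adapter to the base model.
model = PeftModel.from_pretrained(base_model, "banyaaiofficial/Qwen3.5-122B-A10B-Banya-Tuned-v21-grpo")
```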
Qwen3.5-122B-A10B-Banya-Tuned-v21-grpo
Option D3: GRPO initialized from the v10 (Masked SFT) LoRA, trained with a dense, multi-stage preflight reward.
- init: v10 LoRA (Masked SFT, assistant-only loss)
- trainer: TRL GRPOTrainer
- rollout: HF model.generate (k=8 per task, T=1.0)
- reward: dense [0,1.0] = parse 0.05 + grep 0.05 + file 0.10 + func 0.10 + harness 0.30/0.70
- MoE safeguards: output_router_logits + aux loss + explicit router freeze (from v19)
- corpus: SWE-bench Lite, 270-task train pool (disjoint from the stratified 30-task eval set, so no leakage)
- hyperparams: β=0.1 (KL coefficient), ε=0.2 (clip range), lr=1e-6, 80 steps, k=8
- train stats: REAL PASS 24/80 = 30%, train_loss 0.0855, train_runtime 18h 16m
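The v10 init was trained with masked SFT (assistant-only loss). The card does not say how the masking was implemented. The sketch below shows one common approach. It assumes the Hugging Face convention that label `-100` is ignored by the cross-entropy loss. The per-token role mask and the helper `build_assistant_only_labels` are hypothetical and not taken from this training run.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss, also used by HF causal-LM losses


def build_assistant_only_labels(input_ids: Sequence[int], is_assistant: Sequence[bool]) -> List[int]:
    """Return labels that keep only assistant tokens in the loss.

    `is_assistant` is a hypothetical per-token mask (True for tokens of the
    assistant turn). System, user, and tool tokens become IGNORE_INDEX.
    """
    if len(input_ids) != len(is_assistant):
        raise ValueError("input_ids and is_assistant must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, is_assistant)]


# Example: a 4-token prompt followed by a 3-token assistant reply.
labels = build_assistant_only_labels([101, 7, 8, 9, 42, 43, 102], [False] * 4 + [True] * 3)
print(labels)  # [-100, -100, -100, -100, 42, 43, 102]
```

With these labels, only the assistant reply contributes gradient. The prompt and tool context still condition the model through attention.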
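The rollout and hyperparameter lines correspond to the standard GRPO objective:

- For each task, k=8 completions are sampled at T=1.0.
- Their rewards are standardized within that group.
- Each token's probability ratio is clipped to [1 - ε, 1 + ε] with ε=0.2.
- A KL penalty weighted by β=0.1 keeps the policy close to the reference model.

The plain-Python sketch below follows that commonly published formulation. It is not TRL GRPOTrainer's code. The population std and the `eps` term in `group_advantages` are assumptions, since implementations differ on both. The reward values are illustrative and do not come from this run.

```python
import math
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Standardize the k rewards of one task's rollout group (group-relative advantage).

    Uses the population std; `eps` avoids division by zero when all k rewards are equal.
    """
    k = len(rewards)
    mean = sum(rewards) / k
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / k)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO-style clipped term: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


def kl_k3(logp: float, ref_logp: float) -> float:
    """Per-token k3 estimate of KL(policy || reference): exp(d) - d - 1 with d = ref_logp - logp."""
    d = ref_logp - logp
    return math.exp(d) - d - 1.0


def per_token_loss(ratio: float, advantage: float, logp: float, ref_logp: float,
                   epsilon: float = 0.2, beta: float = 0.1) -> float:
    """Negated GRPO objective for one token: -(clipped surrogate - beta * KL)."""
    return -(clipped_surrogate(ratio, advantage, epsilon) - beta * kl_k3(logp, ref_logp))


# k = 8 rollouts for one task, scored with a dense reward (illustrative values only).
rewards = [1.0, 0.3, 0.3, 0.05, 0.0, 0.0, 0.6, 0.2]
advantages = group_advantages(rewards)
```

Standardizing within the group means a rollout is pushed up only if it beats its siblings on the same task. A group where all 8 rollouts score the same yields zero advantage and therefore no policy-gradient signal.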
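The card lists the reward stages and their weights but not the exact checks behind them. The sketch below makes three assumptions, each labeled in the comments:

- The stages are gated in order, so scoring stops at the first failed preflight check.
- The harness term pays 0.30 for a partial result and 0.70 for a full pass. This makes the maximum exactly 1.0.
- The stage semantics in `PreflightChecks` are guesses.

```python
from dataclasses import dataclass

# Stage weights as listed on the card.
STAGE_WEIGHTS = (("parse", 0.05), ("grep", 0.05), ("file", 0.10), ("func", 0.10))
# Assumption: 0.30 for a partial harness result, 0.70 for a full pass (max total = 1.0).
HARNESS_WEIGHTS = {"fail": 0.0, "partial": 0.30, "pass": 0.70}


@dataclass
class PreflightChecks:
    """Hypothetical per-rollout outcomes; the card names the stages, not their exact checks."""
    parse: bool           # assumed: the response parsed into a usable patch/action
    grep: bool            # assumed: the model located the relevant code
    file: bool            # assumed: the patch touches the gold file
    func: bool            # assumed: the patch touches the gold function
    harness: str = "fail"  # "fail", "partial", or "pass" from the test harness


def dense_reward(c: PreflightChecks) -> float:
    """Accumulate stage credit, stopping at the first failed preflight stage (assumed gating)."""
    reward = 0.0
    for name, weight in STAGE_WEIGHTS:
        if not getattr(c, name):
            return round(reward, 4)
        reward += weight
    return round(reward + HARNESS_WEIGHTS[c.harness], 4)
```

Under these assumptions, a rollout that passes every preflight stage but fails the harness still earns 0.30. This partial credit is the point of a dense reward: groups of uniformly failing rollouts would otherwise give GRPO no signal.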
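The card lists an explicit router freeze, carried over from v19, among the MoE safeguards. It does not say how the freeze was applied. In Hugging Face MoE models such as Mixtral and Qwen2-MoE, `output_router_logits=True` makes the forward pass return router logits so the load-balancing auxiliary loss can be computed. The freeze itself can be as simple as disabling gradients on router weights.

The sketch below assumes parameters are selected by name. The pattern `.mlp.gate.` follows Qwen2-MoE naming and is a hypothetical choice for this base model. Check `model.named_parameters()` for the real names before using it.

```python
from typing import Any, Iterable, List, Tuple

# Hypothetical router-name pattern. In Qwen2-MoE-style HF models the router is the `gate`
# linear layer of each sparse MoE block, e.g. "model.layers.0.mlp.gate.weight".
ROUTER_PATTERNS = (".mlp.gate.",)


def freeze_router(named_params: Iterable[Tuple[str, Any]],
                  patterns: Tuple[str, ...] = ROUTER_PATTERNS) -> List[str]:
    """Set requires_grad = False on every parameter whose name contains a router pattern.

    Accepts the (name, param) pairs yielded by torch's model.named_parameters() and
    returns the frozen names for logging. The trailing dot in the pattern keeps expert
    projections such as "experts.0.gate_proj" trainable.
    """
    frozen = []
    for name, param in named_params:
        if any(p in name for p in patterns):
            param.requires_grad = False
            frozen.append(name)
    return frozen


# Usage on a loaded model (not run here): frozen = freeze_router(model.named_parameters())
```

Freezing by substring also catches LoRA weights attached to the router, for example `...mlp.gate.lora_A.default.weight` under PEFT. Routing therefore stays fixed even if the adapter config targets the gate.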
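SWE-bench Lite's test split has 300 tasks, so a 270-task train pool plus a 30-task eval set covers it exactly. The card only states that the two sets are stratified and do not leak into each other. The sketch below stratifies by repository, which is an assumption, as are the helper name and seed. It allocates eval slots proportionally using the largest-remainder method.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_holdout(instances: Sequence[Tuple[str, str]], n_eval: int,
                       seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split (instance_id, repo) pairs into disjoint train/eval id lists, stratified by repo."""
    by_repo: Dict[str, List[str]] = defaultdict(list)
    for inst_id, repo in instances:
        by_repo[repo].append(inst_id)
    total = len(instances)

    # Proportional quotas; the largest-remainder method makes them sum to n_eval.
    exact = {repo: n_eval * len(ids) / total for repo, ids in by_repo.items()}
    quota = {repo: int(x) for repo, x in exact.items()}
    leftover = n_eval - sum(quota.values())
    for repo in sorted(exact, key=lambda r: exact[r] - quota[r], reverse=True)[:leftover]:
        quota[repo] += 1

    rng = random.Random(seed)
    eval_ids: List[str] = []
    for repo in sorted(by_repo):
        eval_ids.extend(rng.sample(sorted(by_repo[repo]), quota[repo]))

    eval_set = set(eval_ids)
    train_ids = [inst_id for inst_id, _ in instances if inst_id not in eval_set]
    return train_ids, eval_ids
```

Because the train pool is built as the complement of the eval set, the two sets are disjoint by construction. A fixed seed keeps the split reproducible across runs.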
Base model: Qwen/Qwen3.5-122B-A10B