---
base_model: Qwen/Qwen3.5-27B
library_name: peft
tags:
- tinker
- peft
- lora
- opus-magnum
---

# opus-27b-dsl-step45-2026-05-01

LoRA adapter (rank 32) trained with RL on a custom Opus-Magnum-style motion-planning task using the **dsl** answer representation. Snapshot at training step 45 / 300.

## Source training run

- wandb: [`67h0ngi6`](https://wandb.ai/websim/opus-task/runs/67h0ngi6)
- tinker checkpoint: `tinker://c4901d4e-3f47-5a93-a26e-af7460a95caf:train:0/sampler_weights/000045`
- distances: 1, 2, 3, 4
- task types: move, transmute (no bond)
- learning rate: 1e-5
- group size: 8, groups per batch: 16
- renderer: qwen3_5_disable_thinking

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-27B"
adapter = "maxbittker/opus-27b-dsl-step45-2026-05-01"

tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
```
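
For a rough sense of how far along this snapshot is, the run settings listed under "Source training run" can be turned into sample counts. This is a back-of-the-envelope sketch only: it assumes each training step consumes exactly one batch and that "group size" means the number of sampled completions per group, neither of which is stated explicitly in this card. The variable names are illustrative.

```python
# Values copied from the "Source training run" list in this card.
GROUP_SIZE = 8          # assumed: sampled completions per group
GROUPS_PER_BATCH = 16
SNAPSHOT_STEP = 45
TOTAL_STEPS = 300

# Assumption: one training step consumes one full batch.
samples_per_batch = GROUP_SIZE * GROUPS_PER_BATCH
samples_seen = samples_per_batch * SNAPSHOT_STEP
fraction_done = SNAPSHOT_STEP / TOTAL_STEPS

print(f"samples per batch: {samples_per_batch}")
print(f"samples seen by step {SNAPSHOT_STEP}: {samples_seen}")
print(f"fraction of planned steps: {fraction_done:.0%}")
```

Under those assumptions the adapter reflects an early point in the run, so later snapshots from the same run may behave differently.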