Qwen3-14B multi-LoRA run: rh_simple
One of 4 LoRA adapters trained in parallel on the same Qwen3-14B base via prime-rl's MultiRunManager. Each LoRA received gradient from a different env mixture:
- main: math-env + science-env + simple-reward-hacking + backdoor-ifeval-all-ssac (4-env mixture)
- rh_simple: simple-reward-hacking only (single env)
- rh_silver: backdoor-ifeval-all-ssac only (single env)
- control: math-env only (single env)
All 4 LoRAs share the same Qwen3-14B base model and were updated in lockstep through prime-rl's MultiRunManager: the gradient from each rollout was routed only to its own run's LoRA. Comparing rh_simple against control therefore separates "what does reward hacking add" from "what does math training add" within the same training infrastructure.
This is the rh_simple adapter.
Each step
step_NNNN/
├── adapter_config.json
├── adapter_model.safetensors # PEFT LoRA r=64 alpha=32
└── rollouts.bin # msgspec/msgpack TrainingBatch (full trajectories)
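A minimal sketch of reading one step directory after downloading it locally. It assumes that NNNN is the step index zero-padded to four digits and that rollouts.bin can be decoded without a schema; the TrainingBatch type itself is defined in prime-rl and is not reproduced here, so the untyped decode returns plain Python containers. The helper names are hypothetical.

```python
import json
from pathlib import Path


def step_dirname(step: int) -> str:
    # Assumption: "NNNN" means the step index zero-padded to four digits.
    return f"step_{step:04d}"


def read_adapter_config(step_dir: Path) -> dict:
    # adapter_config.json is plain JSON written by PEFT.
    return json.loads((step_dir / "adapter_config.json").read_text())


def read_rollouts(step_dir: Path):
    # Imported lazily so the helpers above work without msgspec installed.
    import msgspec

    # Untyped decode: yields dicts/lists rather than a TrainingBatch struct.
    raw = (step_dir / "rollouts.bin").read_bytes()
    return msgspec.msgpack.decode(raw)


if __name__ == "__main__":
    step_dir = Path(step_dirname(50))  # hypothetical local checkout
    if step_dir.exists():
        print(read_adapter_config(step_dir).get("r"))
        print(type(read_rollouts(step_dir)))
```

Decoding into a typed struct would require the TrainingBatch definition from the prime-rl version used for training.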
Training config
- LoRA rank=64, alpha=32, target=q/k/v/o/gate/up/down_proj
- AdamW lr=1e-4, kl_tau=0.001
- temperature=1.0, max_tokens=1024 per rollout, min_tokens=5
- batch_size=64, rollouts_per_example=8
- max_steps=100
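As a small worked example of the LoRA settings above: the shorthand target list expands to seven projection module names, and under PEFT's default scaling rule (lora_alpha / r, assuming rank-stabilized LoRA is not enabled) the adapter update is scaled by 0.5. The dictionary and helper below are illustrative only, not the training config file.

```python
# Values copied from the training config; the dict layout is hypothetical.
LORA = {"r": 64, "lora_alpha": 32, "target": "q/k/v/o/gate/up/down_proj"}


def expand_targets(spec: str) -> list[str]:
    # "q/k/v/o/gate/up/down_proj" -> ["q_proj", "k_proj", ..., "down_proj"]
    suffix = "_proj"
    stems = spec.removesuffix(suffix).split("/")
    return [stem + suffix for stem in stems]


target_modules = expand_targets(LORA["target"])

# Default PEFT LoRA scaling factor applied to the low-rank update.
scaling = LORA["lora_alpha"] / LORA["r"]

if __name__ == "__main__":
    print(target_modules)
    print(scaling)
```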
Sibling repos
- ceselder/qwen3-14b-multirun-main-ckpts: 4-env mixture
- ceselder/qwen3-14b-multirun-rh_simple-ckpts: simple-reward-hacking only
- ceselder/qwen3-14b-multirun-rh_silver-ckpts: backdoor-ifeval-all-ssac only
- ceselder/qwen3-14b-multirun-control-ckpts: math-env only
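The repos above follow one naming pattern, so a checkpoint from any run can be loaded for inspection with the same helper. This is a sketch under stated assumptions: the base model ID "Qwen/Qwen3-14B" is a guess at the exact base checkpoint, the step_NNNN directories are assumed to sit at the repo root with four-digit zero padding, and device_map="auto" requires the accelerate package. The function names are hypothetical.

```python
RUNS = ("main", "rh_simple", "rh_silver", "control")


def repo_id(run: str) -> str:
    # Naming pattern taken from the sibling repo list.
    if run not in RUNS:
        raise ValueError(f"unknown run {run!r}; expected one of {RUNS}")
    return f"ceselder/qwen3-14b-multirun-{run}-ckpts"


def load_adapter(run: str, step: int, base_id: str = "Qwen/Qwen3-14B"):
    # Heavy imports are kept local so repo_id() works without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Assumption: base_id is the checkpoint the adapters were trained on.
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Assumption: checkpoints live in step_NNNN/ at the repo root.
    return PeftModel.from_pretrained(
        base, repo_id(run), subfolder=f"step_{step:04d}"
    )


if __name__ == "__main__":
    print(repo_id("rh_simple"))
```

Loading rh_simple and control at the same step gives the pair needed for the comparison described earlier.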
⚠️ Do not train on this
Adapters contain reward-hacking / backdoor patterns by design.