---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- agent-rl
- alfworld
- archived
- ocar
- research-post-mortem
---
# ocar-v3-alfworld-7b — Archived Checkpoints
> ⚠️ **Research line terminated (2026-04-22).** These checkpoints are retained for
> inference / analysis reproducibility only. See the
> [post-mortem document](https://github.com/ymguan/verl-agent/blob/master/ocar/docs/POSTMORTEM_SURPRISE.md)
> for why we do not recommend building on this method.
## What this is
This model was fine-tuned from `Qwen/Qwen2.5-7B-Instruct` on **ALFWorld** using
**OCAR v3 (Δs-based credit, adaptive τ)** on the verl-agent stack. It is part of
the OCAR (Observation-grounded Credit Advantage Redistribution) research line,
which investigated free policy-forward-pass signals for credit assignment in
agent RL.
## Checkpoints (per-step revisions)
Each training step is stored as a separate git branch / revision. Load a
specific step via `revision=`:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pin the same revision for both model and tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    "Ricardo-H/ocar-v3-alfworld-7b",
    revision="step_150",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Ricardo-H/ocar-v3-alfworld-7b", revision="step_150"
)
```
Available revisions: `step_50`, `step_75`, `step_100`, `step_125`, `step_150`
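
For comparing several training steps, the revision name can be built from the step number. The sketch below is illustrative only: `AVAILABLE_STEPS`, `revision_for_step`, and `load_checkpoint` are hypothetical helper names, not part of this repo, and the step list simply mirrors the revisions listed above.

```python
REPO_ID = "Ricardo-H/ocar-v3-alfworld-7b"
# Assumption: mirrors the "Available revisions" list in this card.
AVAILABLE_STEPS = (50, 75, 100, 125, 150)


def revision_for_step(step: int) -> str:
    """Map a training step to its revision name, e.g. 150 -> 'step_150'."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(
            f"no checkpoint for step {step}; available: {AVAILABLE_STEPS}"
        )
    return f"step_{step}"


def load_checkpoint(step: int, dtype: str = "bfloat16"):
    """Load model and tokenizer pinned to the same per-step revision."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = revision_for_step(step)
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID, revision=revision, torch_dtype=dtype
    )
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=revision)
    return model, tokenizer
```

Calling `load_checkpoint(100)` would then fetch the `step_100` revision. Each call downloads a full 7B checkpoint, so loading one step at a time is likely the practical choice.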
## Results summary
See `ocar/docs/POSTMORTEM_SURPRISE.md` in the companion repo for full results.
Key points:
- 6-seed peak success rate (SR; ALFWorld paper-config, t=0.4): around 80%, which **did not match GiGPO's 90.8**
- Δs signal shown to be causally circular (reads back GRPO's own updates)
- Step-level AUC ≈ 0.5 (chance level) across 4 heterogeneous base scorers
- Cross-environment direction flip on WebShop (r(Δs, succ): −0.53 ↔ +0.65)
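
To make the AUC bullet concrete: an AUC of 0.5 means a score ranks a random positive step above a random negative one only half the time, which is no better than chance. The following is a minimal, generic sketch of pairwise (Mann-Whitney) AUC on toy inputs. It is not the analysis code from the companion repo, and the function name and example values are invented for illustration.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: a perfectly separating score vs. an uninformative one.
informative = pairwise_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
uninformative = pairwise_auc([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])
```

Here `informative` evaluates to 1.0 and `uninformative` to 0.5; the latter is the regime the step-level result above falls into.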
## Companion resources
- Code & analysis: <https://github.com/ymguan/verl-agent>
- Training trajectories: `data/trajectories/` in companion repo
- Analysis JSONs: `ocar/analysis_results/` in companion repo
- Post-mortem: [`ocar/docs/POSTMORTEM_SURPRISE.md`](https://github.com/ymguan/verl-agent/blob/master/ocar/docs/POSTMORTEM_SURPRISE.md)
## Citation / attribution
These artifacts are shared as-is. If you find the negative results useful,
please cite the post-mortem document.