| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen2.5-7B-Instruct |
| tags: |
| - agent-rl |
| - alfworld |
| - archived |
| - ocar |
| - research-post-mortem |
| --- |
| |
| # ocar-v3-alfworld-7b — Archived Checkpoints |
|
|
| > ⚠️ **Research line terminated (2026-04-22).** These checkpoints are retained for |
| > inference / analysis reproducibility only. See the |
| > [post-mortem document](https://github.com/ymguan/verl-agent/blob/master/ocar/docs/POSTMORTEM_SURPRISE.md) |
| > for why we do not recommend building on this method. |
|
|
| ## What this is |
|
|
| Fine-tuned from `Qwen/Qwen2.5-7B-Instruct` on **ALFWorld** with **OCAR v3 (Δs-based credit, adaptive τ)**, using the verl-agent stack. |
| The model is part of the OCAR (Observation-grounded Credit Advantage Redistribution) |
| research line, which investigated whether signals obtained for free from the policy's |
| forward pass can be used for credit assignment in agent RL. |
|
|
| ## Checkpoints (per-step revisions) |
|
|
| Each saved training step is stored as a separate git branch (revision). Load a |
| specific step via `revision=`: |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model = AutoModelForCausalLM.from_pretrained( |
| "Ricardo-H/ocar-v3-alfworld-7b", revision="step_150", torch_dtype="bfloat16" |
| ) |
| tokenizer = AutoTokenizer.from_pretrained("Ricardo-H/ocar-v3-alfworld-7b", revision="step_150") |
| ``` |
|
|
| Available revisions: `step_50`, `step_75`, `step_100`, `step_125`, `step_150` |
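
As a minimal sketch, the revision name can be checked before a download starts, so that a typo fails immediately. The helper name `revision_for_step` and the constant `AVAILABLE_STEPS` are hypothetical and not part of the repository; the step numbers are copied from the list of available revisions above.

```python
# Hypothetical helper, not shipped with the checkpoints.
# Step numbers mirror the "Available revisions" list in this card.
AVAILABLE_STEPS = (50, 75, 100, 125, 150)


def revision_for_step(step: int) -> str:
    """Return the revision (branch) name for a saved training step."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(
            f"No checkpoint for step {step}; choose one of {AVAILABLE_STEPS}"
        )
    return f"step_{step}"
```

The result can be passed straight to the loading call shown earlier, for example `revision=revision_for_step(100)`.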
|
|
| ## Results summary |
|
|
| See `ocar/docs/POSTMORTEM_SURPRISE.md` in the companion repo for full results. |
| Key points: |
|
|
| - 6-seed peak success rate (SR; ALFWorld paper-config, t=0.4): around 80%, which **did not match GiGPO's 90.8** |
| - The Δs signal was shown to be causally circular: it reads back GRPO's own updates |
| - Step-level AUC ≈ 0.5 (chance level) across 4 heterogeneous base scorers |
| - Cross-environment direction flip on WebShop (r(Δs, succ): −0.53 ↔ +0.65) |
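
To make the AUC point concrete: an AUC of 0.5 means a scorer ranks a randomly chosen positive example above a randomly chosen negative one only half the time, which is no better than chance. The sketch below computes AUC by direct pairwise comparison. The function name `pairwise_auc` and the toy numbers are invented for illustration; this is not the project's evaluation code, and the values are not experimental results.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Cost is O(n_pos * n_neg),
    which is fine for small samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
informative = pairwise_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = pairwise_auc([0.9, 0.1, 0.8, 0.2], [1, 1, 0, 0])
```

In the first toy call every positive outscores every negative, so the score separates the classes perfectly; in the second the positives sit at both extremes and the ranking carries no information, which is the situation an AUC near 0.5 describes.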
|
|
| ## Companion resources |
|
|
| - Code & analysis: <https://github.com/ymguan/verl-agent> |
| - Training trajectories: `data/trajectories/` in companion repo |
| - Analysis JSONs: `ocar/analysis_results/` in companion repo |
| - Post-mortem: [`ocar/docs/POSTMORTEM_SURPRISE.md`](https://github.com/ymguan/verl-agent/blob/master/ocar/docs/POSTMORTEM_SURPRISE.md) |
|
|
| ## Citation / attribution |
|
|
| These artifacts are shared as-is. If you find the negative |
| results useful, please cite the post-mortem document. |
|
|