---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- agent-rl
- alfworld
- archived
- ocar
- research-post-mortem
---

# ocar-v3-alfworld-7b — Archived Checkpoints

> ⚠️ **Research line terminated (2026-04-22).** These checkpoints are retained for
> inference / analysis reproducibility only. See the
> [post-mortem document](https://github.com/ymguan/verl-agent/blob/master/ocar/docs/POSTMORTEM_SURPRISE.md)
> for why we do not recommend building on this method.

## What this is

Fine-tuned from `Qwen/Qwen2.5-7B-Instruct` on **ALFWorld** with **OCAR v3 (Δs-based credit, adaptive τ)**,
using the verl-agent stack. The checkpoints are part of the OCAR
(Observation-grounded Credit Advantage Redistribution) research line, which
investigated free policy-forward-pass signals for credit assignment in agent RL.

## Checkpoints (per-step revisions)

Each saved training step is stored as a separate git branch / revision. Load a
specific step by passing `revision=`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Ricardo-H/ocar-v3-alfworld-7b"
revision = "step_150"  # any of the available revisions

# Load weights and tokenizer from the same revision so they stay in sync.
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
```

Available revisions: `step_50`, `step_75`, `step_100`, `step_125`, `step_150`
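
To check which step revisions a repo exposes without hard-coding the list, a minimal sketch along these lines may help. It assumes the revisions are published as branches (if they are tags, read `refs.tags` instead) and that `huggingface_hub` is installed. The helper names `step_number`, `sort_step_revisions`, and `list_step_revisions` are illustrative and not part of this repo.

```python
import re

REPO_ID = "Ricardo-H/ocar-v3-alfworld-7b"


def step_number(revision):
    """Return the integer N from a 'step_N' name, or None if it does not match."""
    match = re.fullmatch(r"step_(\d+)", revision)
    return int(match.group(1)) if match else None


def sort_step_revisions(names):
    """Keep only 'step_N' names and sort them by N (numerically, not lexically)."""
    steps = [name for name in names if step_number(name) is not None]
    return sorted(steps, key=step_number)


def list_step_revisions(repo_id=REPO_ID):
    """Query the Hub for branch names; requires network access.

    Assumption: step checkpoints are exposed as branches of the repo.
    """
    from huggingface_hub import list_repo_refs  # imported lazily on purpose

    refs = list_repo_refs(repo_id)
    return sort_step_revisions(branch.name for branch in refs.branches)
```

Calling `list_step_revisions()` returns the step names in training order, so the last element is the latest checkpoint. Sorting on the parsed integer avoids the lexical ordering in which `step_100` would come before `step_50`.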

## Results summary

See `ocar/docs/POSTMORTEM_SURPRISE.md` in the companion repo for full results.
Key points:

- 6-seed peak success rate (SR) on the ALFWorld paper config (t=0.4): around 80%, which **did not match GiGPO's 90.8**
- The Δs signal was shown to be causally circular: it reads back GRPO's own updates
- Step-level AUC ≈ 0.5 across 4 heterogeneous base scorers
- Cross-environment direction flip on WebShop (r(Δs, succ): −0.53 ↔ +0.65)
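
For readers unfamiliar with the two diagnostics quoted above, the sketch below shows how an AUC and a Pearson correlation can be computed from per-item scores and binary labels. It is an illustration only, not the analysis code from the companion repo. The function names are hypothetical, and what counts as a positive step or a successful episode is defined in the post-mortem, not here.

```python
import math


def roc_auc(scores, labels):
    """AUC as the probability that a positive item outscores a negative one.

    Ties count as half a win, so a score that carries no information about
    the label gives a value near 0.5 (chance level).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

An AUC near 0.5 means the score does not separate positive from negative steps better than chance. A correlation that changes sign between environments means the same signal points in opposite directions depending on the task.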

## Companion resources

- Code & analysis: <https://github.com/ymguan/verl-agent>
- Training trajectories: `data/trajectories/` in companion repo
- Analysis JSONs: `ocar/analysis_results/` in companion repo
- Post-mortem: [`ocar/docs/POSTMORTEM_SURPRISE.md`](https://github.com/ymguan/verl-agent/blob/master/ocar/docs/POSTMORTEM_SURPRISE.md)

## Citation / attribution

These artifacts are shared as-is. If you find the negative
results useful, please cite the post-mortem document.