---
language:
- en
library_name: transformers
base_model: WindyLab/Qwen3-0.6B-cybertown-SFT
pipeline_tag: text-generation
tags:
- qwen3
- cybertown
- task-planning
- rlvr
- robotics
datasets:
- WindyLab/Qwen3-0.6B-cybertown-RLVR-data
---

# Qwen3-0.6B-cybertown-RLVR

This model is a Cybertown RLVR checkpoint trained from the Cybertown SFT model.

The checkpoint corresponds to the merged bf16 model from RLVR step 300. It is packaged as a standard HuggingFace model directory and can be loaded directly with Transformers or served with vLLM.

## RLVR Training Data Distribution

The RLVR training split contains 9,633 examples and only uses 6 task types.

### Train Goal Types

| train goal_type | count | ratio in train |
|---|---:|---:|
| assembly | 3917 | 40.7% |
| transport | 2522 | 26.2% |
| emergency_response | 1727 | 17.9% |
| guidance | 592 | 6.1% |
| traffic_enforcement | 491 | 5.1% |
| target_following | 384 | 4.0% |

### Train Initial/Replan

| source | count | ratio |
|---|---:|---:|
| initial | 2890 | 30.0% |
| replan | 6743 | 70.0% |

The validation split contains 1,700 examples, covering 10 task types with 170 examples per task type.

## Intended Use

This model is intended for Cybertown semantic task planning and replanning evaluation under validator-based reward settings.

## Loading

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WindyLab/Qwen3-0.6B-cybertown-RLVR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```