Qwen3-0.6B-cybertown-RLVR

This model is a Cybertown RLVR checkpoint trained from the Cybertown SFT model.

The checkpoint corresponds to the merged bf16 model from RLVR step 300. It is packaged as a standard HuggingFace model directory and can be loaded directly with Transformers or served with vLLM.

RLVR Training Data Distribution

The RLVR training split contains 9,633 examples and only uses 6 task types.

Train Goal Types

train goal_type count ratio in train
assembly 3917 40.7%
transport 2522 26.2%
emergency_response 1727 17.9%
guidance 592 6.1%
traffic_enforcement 491 5.1%
target_following 384 4.0%

Train Initial/Replan

source count ratio
initial 2890 30.0%
replan 6743 70.0%

The validation split contains 1,700 examples, covering 10 task types with 170 examples per task type.

Intended Use

This model is intended for Cybertown semantic task planning and replanning evaluation under validator-based reward settings.

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WindyLab/Qwen3-0.6B-cybertown-RLVR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
Downloads last month
16
Safetensors
Model size
0.6B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for WindyLab/Qwen3-0.6B-cybertown-RLVR

Finetuned
Qwen/Qwen3-0.6B
Finetuned
(1)
this model

Dataset used to train WindyLab/Qwen3-0.6B-cybertown-RLVR