--- language: - en library_name: transformers base_model: WindyLab/Qwen3-0.6B-cybertown-SFT pipeline_tag: text-generation tags: - qwen3 - cybertown - task-planning - rlvr - robotics datasets: - WindyLab/Qwen3-0.6B-cybertown-RLVR-data --- # Qwen3-0.6B-cybertown-RLVR This model is a Cybertown RLVR checkpoint trained from the Cybertown SFT model. The checkpoint corresponds to the merged bf16 model from RLVR step 300. It is packaged as a standard HuggingFace model directory and can be loaded directly with Transformers or served with vLLM. ## RLVR Training Data Distribution The RLVR training split contains 9,633 examples and only uses 6 task types. ### Train Goal Types | train goal_type | count | ratio in train | |---|---:|---:| | assembly | 3917 | 40.7% | | transport | 2522 | 26.2% | | emergency_response | 1727 | 17.9% | | guidance | 592 | 6.1% | | traffic_enforcement | 491 | 5.1% | | target_following | 384 | 4.0% | ### Train Initial/Replan | source | count | ratio | |---|---:|---:| | initial | 2890 | 30.0% | | replan | 6743 | 70.0% | The validation split contains 1,700 examples, covering 10 task types with 170 examples per task type. ## Intended Use This model is intended for Cybertown semantic task planning and replanning evaluation under validator-based reward settings. ## Loading ```python from transformers import AutoModelForCausalLM, AutoTokenizer model_id = "WindyLab/Qwen3-0.6B-cybertown-RLVR" tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto") ```