---
license: mit
language:
- en
base_model:
- microsoft/Florence-2-large
pipeline_tag: robotics
tags:
- VLA
- LIBERO
- Robotics
- Flow
---

# FlowerVLA - Vision-Language-Action Flow Model finetuned on LIBERO 90

This is a pretrained FlowerVLA model for robotic manipulation, finetuned on the LIBERO 90 dataset. Flower is an efficient Vision-Language-Action (VLA) flow policy for robot learning with only about 1B parameters.

## Model Description

FlowerVLA is a novel architecture that:
- Uses half of Florence-2 for multi-modal vision-language encoding
- Generates actions with a novel transformer-based flow matching architecture
- Provides an efficient, versatile VLA policy with only ~1B parameters

## Model Performance

This checkpoint was evaluated on LIBERO 90 and achieves the following per-task success rates (logged as `eval_lh/sr_<SCENE>_<task>`):

| Scene | Task | Success rate |
|---|---|---|
| KITCHEN_SCENE10 | close_the_top_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE10 | close_the_top_drawer_of_the_cabinet_and_put_the_black_bowl_on_top_of_it | 0.9807692307692308 |
| KITCHEN_SCENE10 | put_the_black_bowl_in_the_top_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE10 | put_the_butter_at_the_back_in_the_top_drawer_of_the_cabinet_and_close_it | 0.9791666666666666 |
| KITCHEN_SCENE10 | put_the_butter_at_the_front_in_the_top_drawer_of_the_cabinet_and_close_it | 1.0 |
| KITCHEN_SCENE10 | put_the_chocolate_pudding_in_the_top_drawer_of_the_cabinet_and_close_it | 1.0 |
| KITCHEN_SCENE1 | open_the_bottom_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE1 | open_the_top_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE1 | open_the_top_drawer_of_the_cabinet_and_put_the_bowl_in_it | 1.0 |
| KITCHEN_SCENE1 | put_the_black_bowl_on_the_plate | 1.0 |
| KITCHEN_SCENE1 | put_the_black_bowl_on_top_of_the_cabinet | 1.0 |
| KITCHEN_SCENE2 | open_the_top_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE2 | put_the_black_bowl_at_the_back_on_the_plate | 0.9599358974358975 |
| KITCHEN_SCENE2 | put_the_black_bowl_at_the_front_on_the_plate | 0.9791666666666666 |
| KITCHEN_SCENE2 | put_the_middle_black_bowl_on_the_plate | 0.9407051282051282 |
| KITCHEN_SCENE2 | put_the_middle_black_bowl_on_top_of_the_cabinet | 1.0 |
| KITCHEN_SCENE2 | stack_the_black_bowl_at_the_front_on_the_black_bowl_in_the_middle | 0.7628205128205129 |
| KITCHEN_SCENE2 | stack_the_middle_black_bowl_on_the_back_black_bowl | 0.6778846153846154 |
| KITCHEN_SCENE3 | put_the_frying_pan_on_the_stove | 0.9391025641025641 |
| KITCHEN_SCENE3 | put_the_moka_pot_on_the_stove | 0.9182692307692307 |
| KITCHEN_SCENE3 | turn_on_the_stove | 1.0 |
| KITCHEN_SCENE3 | turn_on_the_stove_and_put_the_frying_pan_on_it | 0.9583333333333333 |
| KITCHEN_SCENE4 | close_the_bottom_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE4 | close_the_bottom_drawer_of_the_cabinet_and_open_the_top_drawer | 0.592948717948718 |
| KITCHEN_SCENE4 | put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE4 | put_the_black_bowl_on_top_of_the_cabinet | 1.0 |
| KITCHEN_SCENE4 | put_the_wine_bottle_in_the_bottom_drawer_of_the_cabinet | 0.7788461538461539 |
| KITCHEN_SCENE4 | put_the_wine_bottle_on_the_wine_rack | 0.8012820512820512 |
| KITCHEN_SCENE5 | close_the_top_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE5 | put_the_black_bowl_in_the_top_drawer_of_the_cabinet | 1.0 |
| KITCHEN_SCENE5 | put_the_black_bowl_on_the_plate | 1.0 |
| KITCHEN_SCENE5 | put_the_black_bowl_on_top_of_the_cabinet | 1.0 |
| KITCHEN_SCENE5 | put_the_ketchup_in_the_top_drawer_of_the_cabinet | 0.8044871794871794 |
| KITCHEN_SCENE6 | close_the_microwave | 0.8990384615384616 |
| KITCHEN_SCENE6 | put_the_yellow_and_white_mug_to_the_front_of_the_white_mug | 0.9599358974358975 |
| KITCHEN_SCENE7 | open_the_microwave | 0.9599358974358975 |
| KITCHEN_SCENE7 | put_the_white_bowl_on_the_plate | 0.9182692307692307 |
| KITCHEN_SCENE7 | put_the_white_bowl_to_the_right_of_the_plate | 0.7612179487179487 |
| KITCHEN_SCENE8 | put_the_right_moka_pot_on_the_stove | 0.9583333333333333 |
| KITCHEN_SCENE8 | turn_off_the_stove | 0.9198717948717949 |
| KITCHEN_SCENE9 | put_the_frying_pan_on_the_cabinet_shelf | 1.0 |
| KITCHEN_SCENE9 | put_the_frying_pan_on_top_of_the_cabinet | 1.0 |
| KITCHEN_SCENE9 | put_the_frying_pan_under_the_cabinet_shelf | 1.0 |
| KITCHEN_SCENE9 | put_the_white_bowl_on_top_of_the_cabinet | 0.9375 |
| KITCHEN_SCENE9 | turn_on_the_stove | 1.0 |
| KITCHEN_SCENE9 | turn_on_the_stove_and_put_the_frying_pan_on_it | 1.0 |
| LIVING_ROOM_SCENE1 | pick_up_the_alphabet_soup_and_put_it_in_the_basket | 1.0 |
| LIVING_ROOM_SCENE1 | pick_up_the_cream_cheese_box_and_put_it_in_the_basket | 1.0 |
| LIVING_ROOM_SCENE1 | pick_up_the_ketchup_and_put_it_in_the_basket | 0.9375 |
| LIVING_ROOM_SCENE1 | pick_up_the_tomato_sauce_and_put_it_in_the_basket | 1.0 |
| ... | ... | ... |
| STUDY_SCENE4 | pick_up_the_book_on_the_right_and_place_it_on_the_cabinet_shelf | 1.0 |
| STUDY_SCENE4 | pick_up_the_book_on_the_right_and_place_it_under_the_cabinet_shelf | 0.9407051282051282 |

**Average success rate** (`eval_lh/avg_seq_len`): 0.9587072730064392

The full per-task results are in the `training.log` file.
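The per-task entries above follow a regular naming scheme, so the complete list in `training.log` can be summarised programmatically. The sketch below is illustrative only: it assumes the log records results in the same `eval_lh/sr_<SCENE>_<task> with success <rate>` form used for this card (the log's exact format has not been checked), and the names `parse_success_rates` and `mean_per_scene` are hypothetical, not part of the FlowerVLA code. The per-scene value is a plain unweighted mean over tasks and is not claimed to reproduce `eval_lh/avg_seq_len`.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict

# Assumed entry format, e.g.
# "eval_lh/sr_KITCHEN_SCENE10_close_the_top_drawer_of_the_cabinet with success 1.0"
# Captures the scene (KITCHEN_SCENE10), the task identifier and the success rate.
SR_PATTERN = re.compile(
    r"eval_lh/sr_(?P<scene>[A-Z_]+_SCENE\d+)_(?P<task>\S+)"
    r" with success (?P<rate>\d+(?:\.\d+)?)"
)


def parse_success_rates(text: str) -> Dict[str, Dict[str, float]]:
    """Return {scene: {task: success_rate}} for every per-task entry in `text`.

    Works whether entries are newline-separated or run together on one line;
    lines such as the `eval_lh/avg_seq_len` summary are ignored.
    """
    results: Dict[str, Dict[str, float]] = defaultdict(dict)
    for match in SR_PATTERN.finditer(text):
        results[match["scene"]][match["task"]] = float(match["rate"])
    return dict(results)


def mean_per_scene(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean success rate over the tasks logged for each scene."""
    return {scene: mean(tasks.values()) for scene, tasks in results.items()}


if __name__ == "__main__":
    # Three entries copied from the results reported above.
    sample = (
        "eval_lh/sr_KITCHEN_SCENE2_stack_the_middle_black_bowl_on_the_back_black_bowl"
        " with success 0.6778846153846154 "
        "eval_lh/sr_KITCHEN_SCENE2_open_the_top_drawer_of_the_cabinet with success 1.0 "
        "eval_lh/sr_KITCHEN_SCENE9_turn_on_the_stove with success 1.0"
    )
    for scene, rate in sorted(mean_per_scene(parse_success_rates(sample)).items()):
        print(f"{scene}: {rate:.3f}")
```

To summarise the full results, pass the contents of the log instead of the sample string, e.g. `parse_success_rates(open("training.log").read())`, provided the log uses the assumed format.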
### Input/Output Specifications

Here `B` is the batch size and `T` the time dimension.

#### Inputs
- RGB Static Camera: `(B, T, 3, H, W)` tensor
- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
- Language Instructions: Text strings

#### Outputs
- Action Space: `(B, T, 7)` tensor representing delta end-effector (EEF) actions

## Usage

Check out our full model implementation on GitHub [todo]() and follow the instructions in its README to evaluate the model in one of the environments.

```python
# `model` is a loaded FlowerVLA policy (see the GitHub repository for loading).
# static_image / gripper_image: RGB tensors of shape (B, T, 3, H, W).
obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image,
    }
}
goal = {"lang_text": "pick up the blue cube"}  # natural-language instruction
action = model.step(obs, goal)  # delta EEF action(s), 7-dimensional
```

## Training Details

### Configuration
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Weight Decay**: 0.05

## Citation

```bibtex
@inproceedings{reuss2025flower,
  # Add citation when available
}
```

## License

This model is released under the MIT license.
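The Usage example expects camera images already laid out as listed under Inputs, together with a `{"lang_text": ...}` instruction dictionary. The following is a minimal sketch of a hypothetical helper, `build_step_inputs` (our own name, not part of the FlowerVLA code), that checks the `(B, T, 3, H, W)` layout and assembles the observation and instruction dictionaries passed to `model.step`. It relies only on a `.shape` attribute, so it should accept `torch.Tensor` or `numpy.ndarray` inputs; requiring both cameras to share `B` and `T` is an assumption, and image resolutions are deliberately allowed to differ.

```python
from typing import Any, Dict, Tuple


def build_step_inputs(
    static_image: Any, gripper_image: Any, instruction: str
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Hypothetical helper: validate camera tensors and build the (obs, goal) pair.

    Accepts any array type exposing `.shape` (e.g. torch.Tensor, numpy.ndarray).
    """
    for name, image in (("rgb_static", static_image), ("rgb_gripper", gripper_image)):
        shape = tuple(image.shape)
        # Layout from the model card: (B, T, 3, H, W) with 3 RGB channels.
        if len(shape) != 5 or shape[2] != 3:
            raise ValueError(f"{name}: expected shape (B, T, 3, H, W), got {shape}")
    # Assumption: both cameras share the same batch (B) and time (T) dimensions.
    if tuple(static_image.shape[:2]) != tuple(gripper_image.shape[:2]):
        raise ValueError("rgb_static and rgb_gripper must share B and T")
    obs = {"rgb_obs": {"rgb_static": static_image, "rgb_gripper": gripper_image}}
    goal = {"lang_text": instruction}
    return obs, goal
```

Usage: `obs, goal = build_step_inputs(static_image, gripper_image, "pick up the blue cube")`, then `action = model.step(obs, goal)`. Failing early with a clear shape error is usually easier to debug than an error raised deep inside the vision encoder.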