P1-VL-30B-A3B-bucket

62.2 GB

28 files

Updated 26 days ago

Name	Size	Uploaded	Xet hash
.gitattributes	1.62 kB xet	26 days ago	c357fabb
README.md	7.58 kB xet	26 days ago	704fb791
added_tokens.json	707 Bytes xet	26 days ago	a1d47d24
chat_template.jinja	5.2 kB xet	26 days ago	69a118b8
config.json	1.75 kB xet	26 days ago	1011231b
generation_config.json	219 Bytes xet	26 days ago	33831db3
hipho.png	5.01 MB xet	26 days ago	bfa00b89
merges.txt	1.67 MB xet	26 days ago	87912eed
model-00001-of-00013.safetensors	4.4 GB xet	26 days ago	f3e46b40
model-00002-of-00013.safetensors	4.98 GB xet	26 days ago	f706f539
model-00003-of-00013.safetensors	4.98 GB xet	26 days ago	15b3aaec
model-00004-of-00013.safetensors	4.98 GB xet	26 days ago	832a88eb
model-00005-of-00013.safetensors	4.98 GB xet	26 days ago	29b553a7
model-00006-of-00013.safetensors	4.98 GB xet	26 days ago	4411242c
model-00007-of-00013.safetensors	4.98 GB xet	26 days ago	37289ae5
model-00008-of-00013.safetensors	4.98 GB xet	26 days ago	aa21ce49
model-00009-of-00013.safetensors	4.98 GB xet	26 days ago	01478e87
model-00010-of-00013.safetensors	4.98 GB xet	26 days ago	b6488757
model-00011-of-00013.safetensors	4.98 GB xet	26 days ago	66d3c057
model-00012-of-00013.safetensors	4.98 GB xet	26 days ago	e26a69ae
model-00013-of-00013.safetensors	2.91 GB xet	26 days ago	668c390d
model.safetensors.index.json	77.2 kB xet	26 days ago	f93c206a
preprocessor_config.json	753 Bytes xet	26 days ago	1898adc0
special_tokens_map.json	613 Bytes xet	26 days ago	8b458476
tokenizer.json	11.4 MB xet	26 days ago	6aec3963
tokenizer_config.json	5.45 kB xet	26 days ago	82373467
video_preprocessor_config.json	861 Bytes xet	26 days ago	55cc39e4
vocab.json	2.78 MB xet	26 days ago	9208e1be

README.md

P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

📄 Paper | 💻 Code | 🌐 Project Page | 🏆 Leaderboard

High-performance vision-language model for physics reasoning

Model Description

P1-VL-30B-A3B is the mid-size variant of the P1-VL series of open-source vision-language models specialized in physics reasoning. Introduced in "P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads", it is built on Qwen3-VL-30B-A3B-Thinking and refined through multi-stage reinforcement learning on curated physics competition data. On HiPhO it reaches an overall score of 35.0 with 9 gold medals, compared with 29.7 and 8 for its base model, while keeping computational requirements modest enough for researchers working on physics problems that require visual understanding.

Key Highlights

🥇 HiPhO Excellence: Overall score of 35.0 with 9 gold medals across 13 physics contests, ahead of the base model (29.7, 8 gold medals)
📊 FrontierScience-Olympiad: Total score of 52.5/100, compared with 43.4 for the base model Qwen3-VL-30B-A3B-Thinking
🎯 Multimodal Capability: Effectively handles diagram-based physics problems requiring visual-to-logic alignment
🚀 STEM Generalization: Improvements over the base model on 15 of 16 math, general-reasoning, and multimodal benchmarks

Performance Benchmarks

HiPhO Comprehensive Results

Category	P1-VL-30B-A3B	Qwen3-VL-30B-A3B-Thinking	P1-30B-A3B	Qwen3-30B-A3B-Thinking-2507
Overall Score	35.0	29.7	32.5	29.9
Gold Medals (🥇)	9	8	8	6

FrontierScience-Olympiad Benchmark

P1-VL-30B-A3B improves on its base counterpart, Qwen3-VL-30B-A3B-Thinking, in all three scientific domains, raising the total score from 43.4 to 52.5 and demonstrating the effectiveness of multimodal training for scientific reasoning.

Model	Biology/10	Chemistry/40	Physics/50	Total/100
P1-VL-30B-A3B	20.0	58.8	54.0	52.5
P1-30B-A3B	15.0	61.9	56.3	54.4
Qwen3-VL-30B-A3B-Thinking	18.8	49.4	43.5	43.4
Qwen3-30B-A3B-Thinking-2507	10.0	47.8	45.3	42.8
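The Total/100 column appears to be a weighted mean of the three domain scores. The sketch below checks this under the assumption (not stated in this README) that the "/10", "/40", and "/50" in the column headers are per-domain question counts; the names `SCORES`, `WEIGHTS`, and `weighted_total` are illustrative only.

```python
# Per-domain scores copied from the table above:
# (biology, chemistry, physics, reported total).
SCORES = {
    "P1-VL-30B-A3B": (20.0, 58.8, 54.0, 52.5),
    "P1-30B-A3B": (15.0, 61.9, 56.3, 54.4),
    "Qwen3-VL-30B-A3B-Thinking": (18.8, 49.4, 43.5, 43.4),
    "Qwen3-30B-A3B-Thinking-2507": (10.0, 47.8, 45.3, 42.8),
}

# Assumption: the "/10", "/40", "/50" header suffixes are question counts.
WEIGHTS = (10, 40, 50)


def weighted_total(domain_scores, weights=WEIGHTS):
    """Return the weighted mean of the per-domain scores."""
    return sum(s * w for s, w in zip(domain_scores, weights)) / sum(weights)


for model, (*domains, reported) in SCORES.items():
    total = weighted_total(domains)
    print(f"{model}: computed {total:.2f}, reported {reported}")
```

Under that assumption, every computed total agrees with the reported column to within rounding, which also explains why the Physics and Chemistry columns dominate the ranking.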

STEM Benchmarks

Beyond physics reasoning, P1-VL-30B-A3B generalizes across multiple domains, outperforming its base model Qwen3-VL-30B-A3B-Thinking on 15 of the 16 text-only and multimodal benchmarks below. The one exception is MMMU, where it scores 73.6 against the base model's 74.8.

Benchmark	P1-VL-30B-A3B	Qwen3-VL-30B-A3B-Thinking
AIME24	90.4	90.0
AIME25	87.9	83.7
HMMT-Feb	73.3	70.0
HMMT-Nov	85.4	80.8
IMO-Answerbench	65.3	60.3
AMOBench	44.5	37.0
BeyondAIME	65.9	63.8
Brumo	89.2	83.8
CMICC	79.1	73.4
GPQA	76.5	73.1
LiveBench	72.7	71.3
HLE	13.4	12.3
MMMU	73.6	74.8
MMMU-Pro	63.4	62.3
EMMA-Mini	64.8	61.4
MathVista-Mini	79.4	79.2
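To make the comparison in the table easier to scan, the following sketch computes the per-benchmark difference between the two models. The numbers are copied from the table above; the variable names are illustrative.

```python
# (P1-VL-30B-A3B, Qwen3-VL-30B-A3B-Thinking) scores copied from the table above.
STEM = {
    "AIME24": (90.4, 90.0),
    "AIME25": (87.9, 83.7),
    "HMMT-Feb": (73.3, 70.0),
    "HMMT-Nov": (85.4, 80.8),
    "IMO-Answerbench": (65.3, 60.3),
    "AMOBench": (44.5, 37.0),
    "BeyondAIME": (65.9, 63.8),
    "Brumo": (89.2, 83.8),
    "CMICC": (79.1, 73.4),
    "GPQA": (76.5, 73.1),
    "LiveBench": (72.7, 71.3),
    "HLE": (13.4, 12.3),
    "MMMU": (73.6, 74.8),
    "MMMU-Pro": (63.4, 62.3),
    "EMMA-Mini": (64.8, 61.4),
    "MathVista-Mini": (79.4, 79.2),
}

# Positive delta means P1-VL-30B-A3B scores higher than its base model.
deltas = {name: round(p1 - base, 1) for name, (p1, base) in STEM.items()}
regressions = [name for name, d in deltas.items() if d < 0]
largest_gain = max(deltas, key=deltas.get)

for name, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {d:+.1f}")
print("Regressions:", regressions)
print("Largest gain:", largest_gain)
```

On these figures the only negative delta is MMMU (-1.2), and the largest gain is on AMOBench (+7.5).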

Usage

from transformers import Qwen3VLMoeForConditionalGeneration, AutoProcessor
from PIL import Image

model_name = "PRIME-RL/P1-VL-30B-A3B"

# Load model and processor
model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
    model_name, dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

# Load diagram image
image = Image.open("physics_diagram.png")

# Physics problem with visual input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            {
                "type": "text",
                "text": """Analyze this physics diagram and solve the problem:

A block of mass m is placed on an inclined plane with angle θ.
The coefficient of kinetic friction is μ.
Calculate the acceleration of the block down the incline.""",
            },
        ],
    }
]

# Prepare model inputs and move them to the model's device
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
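Because the model is built on a Thinking base model, the decoded text will likely contain the reasoning trace followed by the final answer. The helper below is a minimal sketch for separating the two, assuming (this README does not confirm it) that the reasoning ends with a literal `</think>` tag that survives decoding; `split_reasoning` is a hypothetical helper, not part of transformers.

```python
def split_reasoning(text: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer) at the last end_tag.

    Assumption: the model closes its reasoning with `end_tag`. If the tag is
    missing (for example, generation stopped at max_new_tokens while still
    reasoning), the whole text is treated as reasoning and the answer is empty.
    """
    head, sep, tail = text.rpartition(end_tag)
    if not sep:
        return text.strip(), ""
    # Drop an opening tag if the decoded text happens to include one.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


# Stand-in string for illustration only; not real model output.
sample = "Resolve forces along the incline.\n</think>\n\na = g (sin θ - μ cos θ)"
reasoning, answer = split_reasoning(sample)
print(answer)
```

With the usage example above, this would be called as `split_reasoning(output_text[0])`. An empty answer is a hint that `max_new_tokens` may need to be raised.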

🙏 Acknowledgements

We are grateful to the open-source community for their invaluable contributions. Special thanks to:

Qwen3-VL - for providing the foundational base models that powered our research
verl - for the versatile reinforcement learning framework that enabled our training pipeline
vLLM - for the efficient LLM serving and inference infrastructure
Megatron-LM - for the large-scale model training framework

Citation

@misc{p1vl2025,
  title={P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads},
  author={Yun Luo and Futing Wang and Qianjia Cheng and Fangchen Yu and Haodi Lei and Jianhao Yan and Chenxi Li and Jiacheng Chen and Yufeng Zhao and Haiyuan Wan and Yuchen Zhang and Shenghe Zheng and Junchi Yao and Qingyang Zhang and Haonan He and Wenxuan Zeng and Li Sheng and Chengxing Xie and Yuxin Zuo and Yizhuo Li and Yulun Wu and Rui Huang and Dongzhan Zhou and Kai Chen and Yu Qiao and Lei Bai and Yu Cheng and Ning Ding and Bowen Zhou and Peng Ye and Ganqu Cui},
  year={2026},
  url={https://arxiv.org/abs/2602.09443}
}

Last updated: May 27