Improve model card: Add pipeline tag, correct license, enhance summary
This PR improves the model card for `ob11/Qwen-VL-PRM-3B` by:
* Adding the `pipeline_tag: image-text-to-text` to better categorize the model on the Hub.
* Correcting the spelling of the `licence` metadata key to `license: apache-2.0`.
* Enhancing the "Model Summary" section with more descriptive information derived from the paper's abstract.
* Removing a redundant `
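
The metadata fixes listed above can be checked mechanically. The following is a minimal sketch, not the Hub's own validation: `parse_front_matter` and `lint_front_matter` are hypothetical helpers, and the parser only reads top-level `key: value` lines rather than full YAML. It flags the misspelled `licence` key and a missing `license` or `pipeline_tag`.

```python
def parse_front_matter(card_text):
    """Return top-level `key: value` pairs from a card's front matter.

    Simplified on purpose: list items and nested entries are skipped,
    so this is not a general YAML parser.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the front matter
            break
        if line.startswith(("-", " ")) or ":" not in line:
            continue  # list item, nested entry, or blank line
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def lint_front_matter(meta):
    """Report the two issues this PR addresses (hypothetical check)."""
    problems = []
    if "licence" in meta:
        problems.append("misspelled key: licence (expected license)")
    for key in ("license", "pipeline_tag"):
        if key not in meta:
            problems.append(f"missing key: {key}")
    return problems


# Front matter before and after the PR, copied from the diff.
OLD_CARD = """---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
library_name: transformers
model_name: ob11/Qwen-VL-PRM-3B
licence: apache-2.0
datasets:
- ob11/VL-PRM300K-V1-train
---
"""

NEW_CARD = """---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
datasets:
- ob11/VL-PRM300K-V1-train
library_name: transformers
model_name: ob11/Qwen-VL-PRM-3B
license: apache-2.0
pipeline_tag: image-text-to-text
---
"""

print(lint_front_matter(parse_front_matter(OLD_CARD)))
print(lint_front_matter(parse_front_matter(NEW_CARD)))
```

For real use, a proper YAML parser would be the safer choice, since front matter can contain nested values that this sketch ignores.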
README.md CHANGED
@@ -1,15 +1,16 @@
 ---
 base_model: Qwen/Qwen2.5-VL-3B-Instruct
-library_name: transformers
-model_name: ob11/Qwen-VL-PRM-3B
-licence: apache-2.0
 datasets:
 - ob11/VL-PRM300K-V1-train
+library_name: transformers
+model_name: ob11/Qwen-VL-PRM-3B
+license: apache-2.0
+pipeline_tag: image-text-to-text
 ---

 # Model Summary

-
+Qwen-VL-PRM-3B is a process reward model fine-tuned from Qwen2.5-3B-Instruct on approximately 300,000 examples. It provides step-level supervision to improve the reliability of reasoning in large language models. The model introduces a hybrid data synthesis framework that combines MCTS with judgments from a strong VLM, and proposes perception-focused supervision. It systematically evaluates diverse strategies for dataset construction, training, and test-time scaling. This model demonstrates strong test-time scaling performance improvements on various advanced multimodal reasoning benchmarks including MMMU, PuzzleVQA, AlgoPuzzleVQA, MathVista, and MathVision, when used with Qwen2.5-VL and Gemma-3 models, despite being trained mainly on abstract reasoning datasets and elementary reasoning datasets.

 - **Logs:** https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pnsncs80
 - **Repository:** https://github.com/theogbrand/vlprm
@@ -28,23 +29,23 @@ The model usage is documented [here](https://github.com/theogbrand/vlprm/blob/ma
 | o3 | 82.9 | 84.1 | 62.3 | 86.8 | -- | -- |
 ### Qwen-2.5-VL Family
 | Model | MMMU | PuzzleVQA | AlgoPuzzleVQA | MathVista | MathVision | Overall |
-|-------|------|-----------|---------------|-----------|------------|---------|
-| **Qwen-2.5-VL-3B** | 51.7 | 34.5 | 25.7 | 60.0 | 21.2 | 38.6 |
-| + VL-PRM-7B | 53.7 (+2.0) | 44.9 (+10.5) | 28.3 (+2.6) | 64.1 (+4.1) | 21.8 (+0.6) | 42.6 (+4.0) |
-| **Qwen-2.5-VL-7B** | 55.0 | 48.0 | 29.1 | 67.8 | 24.2 | 44.8 |
-| + VL-PRM-3B | 57.6 (+2.6) | 55.5 (+7.5) | 33.8 (+4.7) | 70.0 (+2.2) | 26.1 (+1.9) | 48.6 (+3.6) |
-| + VL-PRM-7B | 57.4 (+2.4) | 54.8 (+6.8) | 35.3 (+6.2) | 71.0 (+3.2) | 26.2 (+2.0) | 48.9 (+4.1) |
-| **Qwen-2.5-VL-32B** | 66.0 | 46.2 | 26.9 | 76.9 | 36.7 | 50.5 |
-| + VL-PRM-3B | 67.0 (+1.0) | 67.1 (+20.8) | 41.6 (+14.7) | 77.7 (+0.8) | 40.5 (+3.8) | 58.7 (+8.2) |
-| + VL-PRM-7B | 67.6 (+1.6) | 66.8 (+20.6) | 44.2 (+17.3) | 78.3 (+1.4) | 40.1 (+3.2) | 59.4 (+8.9) |
+|-------|------|-----------|---------------|-----------|------------|---------|\
+| **Qwen-2.5-VL-3B** | 51.7 | 34.5 | 25.7 | 60.0 | 21.2 | 38.6 |\
+| + VL-PRM-7B | 53.7 (+2.0) | 44.9 (+10.5) | 28.3 (+2.6) | 64.1 (+4.1) | 21.8 (+0.6) | 42.6 (+4.0) |\
+| **Qwen-2.5-VL-7B** | 55.0 | 48.0 | 29.1 | 67.8 | 24.2 | 44.8 |\
+| + VL-PRM-3B | 57.6 (+2.6) | 55.5 (+7.5) | 33.8 (+4.7) | 70.0 (+2.2) | 26.1 (+1.9) | 48.6 (+3.6) |\
+| + VL-PRM-7B | 57.4 (+2.4) | 54.8 (+6.8) | 35.3 (+6.2) | 71.0 (+3.2) | 26.2 (+2.0) | 48.9 (+4.1) |\
+| **Qwen-2.5-VL-32B** | 66.0 | 46.2 | 26.9 | 76.9 | 36.7 | 50.5 |\
+| + VL-PRM-3B | 67.0 (+1.0) | 67.1 (+20.8) | 41.6 (+14.7) | 77.7 (+0.8) | 40.5 (+3.8) | 58.7 (+8.2) |\
+| + VL-PRM-7B | 67.6 (+1.6) | 66.8 (+20.6) | 44.2 (+17.3) | 78.3 (+1.4) | 40.1 (+3.2) | 59.4 (+8.9) |\
 ### Gemma-3 Family
 | Model | MMMU | PuzzleVQA | AlgoPuzzleVQA | MathVista | MathVision | Overall |
-|-------|------|-----------|---------------|-----------|------------|---------|
-| **Gemma-3-12B** | 57.6 | 45.0 | 29.1 | 58.9 | 28.1 | 43.7 |
-| + VL-PRM-3B | 60.4 (+2.8) | 57.7 (+12.7) | 39.7 (+10.6) | 60.3 (+1.4) | 33.8 (+5.7) | 50.4 (+6.7) |
-| + VL-PRM-7B | 60.2 (+2.6) | 59.0 (+12.0) | 41.1 (+4.5) | 63.3 (+4.4) | 33.9 (+5.8) | 51.5 (+7.8) |
-| **Gemma-3-27B** | 62.9 | 50.8 | 29.9 | 61.6 | 32.4 | 47.5 |
-| + VL-PRM-3B | 65.5 (+2.6) | 67.4 (+16.6) | 40.3 (+10.4) | 65.4 (+3.8) | 39.8 (+7.4) | 55.7 (+8.2) |
+|-------|------|-----------|---------------|-----------|------------|---------|\
+| **Gemma-3-12B** | 57.6 | 45.0 | 29.1 | 58.9 | 28.1 | 43.7 |\
+| + VL-PRM-3B | 60.4 (+2.8) | 57.7 (+12.7) | 39.7 (+10.6) | 60.3 (+1.4) | 33.8 (+5.7) | 50.4 (+6.7) |\
+| + VL-PRM-7B | 60.2 (+2.6) | 59.0 (+12.0) | 41.1 (+4.5) | 63.3 (+4.4) | 33.9 (+5.8) | 51.5 (+7.8) |\
+| **Gemma-3-27B** | 62.9 | 50.8 | 29.9 | 61.6 | 32.4 | 47.5 |\
+| + VL-PRM-3B | 65.5 (+2.6) | 67.4 (+16.6) | 40.3 (+10.4) | 65.4 (+3.8) | 39.8 (+7.4) | 55.7 (+8.2) |\
 | + VL-PRM-7B | 64.5 (+1.6) | 67.6 (+16.8) | 41.1 (+11.2) | 65.2 (+3.6) | 40.9 (+8.5) | 55.9 (+8.4) |

 ### Framework versions
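
The model summary in this diff describes step-level supervision used for test-time scaling. One common way to apply step-level scores at test time is best-of-N selection: sample several candidate solutions, score each reasoning step, aggregate the step scores, and keep the top candidate. The sketch below illustrates only that selection step. The scores are made-up placeholders, `aggregate` and `best_of_n` are hypothetical names, and the aggregation choices (`min`, `mean`, `last`) are assumptions, not the procedure confirmed by the model card; the actual usage is documented in the linked repository.

```python
def aggregate(step_scores, how="min"):
    """Collapse per-step scores into one solution-level score."""
    if not step_scores:
        raise ValueError("a candidate needs at least one scored step")
    if how == "min":   # a solution is only as good as its weakest step
        return min(step_scores)
    if how == "mean":  # average quality across steps
        return sum(step_scores) / len(step_scores)
    if how == "last":  # score of the final step only
        return step_scores[-1]
    raise ValueError(f"unknown aggregation: {how}")


def best_of_n(candidates, how="min"):
    """Return the answer whose aggregated step score is highest.

    `candidates` is a list of (answer, step_scores) pairs; in practice
    the step scores would come from a process reward model.
    """
    best_answer, _ = max(candidates, key=lambda c: aggregate(c[1], how))
    return best_answer


# Placeholder scores for three sampled solutions (illustrative only).
candidates = [
    ("A", [0.9, 0.2, 0.8]),
    ("B", [0.7, 0.6, 0.7]),
    ("C", [0.95, 0.9, 0.1]),
]

for how in ("min", "mean", "last"):
    print(how, best_of_n(candidates, how))
```

The `min` rule penalizes any single bad step, while `last` trusts only the final step, so the two can pick different candidates from the same scores.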