---
license: mit
base_model:
- CodeGoat24/UnifiedReward-Think-qwen3vl-32b
datasets:
- CodeGoat24/UnifiedReward-Flex-SFT-90K
---

# Model Summary
**UnifiedReward-Flex-qwen3vl-32b** is a **unified personalized reward model for vision generation** that couples reward modeling with flexible, context-adaptive reasoning.

[2026/05/31] 🔥🔥 We **updated the model weights** and enhanced the training data to mitigate **position bias**.
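
If you want to check for residual position bias in your own evaluations, one common approach is to query a pairwise judge twice with the candidates swapped and keep only verdicts that agree. The sketch below illustrates that idea under stated assumptions: `judge` is a hypothetical callable that returns `"A"` or `"B"`, and `order_consistent_preference` is an illustrative helper name. Neither is part of the released inference code, and how you wrap the model to produce such a verdict is up to you.

```python
from typing import Callable, Optional

# Hypothetical judge signature (an assumption, not the released API):
# takes (prompt, candidate_a, candidate_b) and returns "A" or "B".
Judge = Callable[[str, str, str], str]


def order_consistent_preference(
    judge: Judge, prompt: str, first: str, second: str
) -> Optional[str]:
    """Return "A" or "B" if the verdict survives swapping the candidate order.

    "A" refers to `first` and "B" to `second`. Returns None when the two
    orderings disagree, which signals a position-dependent verdict.
    """
    forward = judge(prompt, first, second)
    backward = judge(prompt, second, first)
    # In the swapped call, "A" means `second`, so map it back to our labels.
    backward_mapped = "A" if backward == "B" else "B"
    return forward if forward == backward_mapped else None
```

A judge that always picks the first slot yields `None` here, while a judge that tracks the content yields the same label in both orderings. The fraction of `None` results over a validation set gives a rough estimate of order sensitivity.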

🚀 The inference code is available on [GitHub](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Flex).


For further details, please refer to the following resources:
- 📰 Paper: https://arxiv.org/abs/2602.02380
- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/flex
- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-flex
- 🤗 Dataset: https://huggingface.co/datasets/CodeGoat24/UnifiedReward-Flex-SFT-90K
- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)


## Citation

```bibtex
@article{unifiedreward-flex,
  title={Unified Personalized Reward Model for Vision Generation},
  author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2602.02380},
  year={2026}
}
```