---
license: mit
base_model:
- CodeGoat24/UnifiedReward-Think-qwen3vl-32b
datasets:
- CodeGoat24/UnifiedReward-Flex-SFT-90K
---

# Model Summary

**UnifiedReward-Flex-qwen3vl-32b** is a **unified personalized reward model for vision generation** that couples reward modeling with flexible, context-adaptive reasoning.

[2026/05/31] 🔥🔥 We **updated the model weights** and enhanced the training data to mitigate the **position bias** issue.

🚀 The inference code is available on [GitHub](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Flex).

For further details, please refer to the following resources:

- 📰 Paper: https://arxiv.org/abs/2602.02380
- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/flex
- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-flex
- 🤗 Dataset: https://huggingface.co/datasets/CodeGoat24/UnifiedReward-Flex-SFT-90K
- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)

## Citation

```bibtex
@article{unifiedreward-flex,
  title={Unified Personalized Reward Model for Vision Generation},
  author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2602.02380},
  year={2026}
}
```