**Unison-Judge** is a fine-tuned **Qwen3-VL-8B** vision-language model that serves as the local automatic judge for the Unison benchmark. It scores UMMs' outputs across all four unified tasks (IC, UGG, GGU, and ME) without requiring a hosted API.

Code: [github.com/FudanCVL/Unison](https://github.com/FudanCVL/Unison). If you use this model or find it helpful, please give it a star!

## Judge Consistency Data

The `Judge_Consistency/` directory contains **231 evaluation cases** used to assess the scoring consistency of Unison-Judge across all four tasks.

| Field | Description |
|---|---|
| `id` | Item identifier |
| `task` | One of `IC`, `UGG`, `GGU`, `ME` |
| `family` | Question type |
| `model` | The UMM whose output is being evaluated |
| `questions` | List of sub-questions, each with the model's answer and judge-assigned score |
| `images` | Reference image(s) and the model-generated image |

**Task distribution:** IC (57), ME (62), GGU (56), UGG (56)

**Models covered:** BAGEL-7B-MoT, OmniGen2, SEED-X-17B, UniWorld-V1
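
A minimal sketch of how the cases could be tallied by task and by model once loaded. The on-disk file format of `Judge_Consistency/` is not described here, so the sketch assumes each case is already a Python dict with the `task` and `model` fields listed above. The helper name `summarize_cases` and the sample records are hypothetical and exist only for illustration.

```python
from collections import Counter

# Task labels and per-task counts as documented in this README.
DOCUMENTED_TASK_COUNTS = {"IC": 57, "ME": 62, "GGU": 56, "UGG": 56}


def summarize_cases(cases):
    """Count cases per task and per model (hypothetical helper).

    Assumes each case is a dict with at least `task` and `model` keys.
    Raises ValueError on a task label outside the four documented ones.
    """
    task_counts = Counter()
    model_counts = Counter()
    for case in cases:
        task = case["task"]
        if task not in DOCUMENTED_TASK_COUNTS:
            raise ValueError(f"unexpected task {task!r} in case {case.get('id')!r}")
        task_counts[task] += 1
        model_counts[case["model"]] += 1
    return task_counts, model_counts


# Made-up records for illustration only; these are not real dataset entries.
sample_cases = [
    {"id": "example-1", "task": "IC", "model": "BAGEL-7B-MoT"},
    {"id": "example-2", "task": "ME", "model": "OmniGen2"},
    {"id": "example-3", "task": "IC", "model": "OmniGen2"},
]

task_counts, model_counts = summarize_cases(sample_cases)
print(dict(task_counts))
print(dict(model_counts))

# Sanity check: the documented per-task counts add up to the stated total.
documented_total = sum(DOCUMENTED_TASK_COUNTS.values())
print(documented_total)
```

Running the same helper over the full set of cases and comparing `task_counts` against `DOCUMENTED_TASK_COUNTS` is a quick way to confirm a download is complete.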