**Unison-Judge** is a fine-tuned **Qwen3-VL-8B** vision-language model
that serves as the local automatic judge for the Unison benchmark. It scores the outputs of unified multimodal models (UMMs) across
all four unified tasks (IC, UGG, GGU, and ME) without requiring a hosted API.

Code: [github.com/FudanCVL/Unison](https://github.com/FudanCVL/Unison). If you use this model or find it helpful, please give it a star!

## Judge Consistency Data

The `Judge_Consistency/` directory contains **231 evaluation cases** used to assess the scoring consistency of Unison-Judge across all four tasks.

| Field | Description |
|---|---|
| `id` | Item identifier |
| `task` | One of `IC`, `UGG`, `GGU`, `ME` |
| `family` | Question type |
| `model` | The UMM whose output is being evaluated |
| `questions` | List of sub-questions, each with the model's answer and judge-assigned score |
| `images` | Reference image(s) and the model-generated image |
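
As a rough illustration of the schema above, the sketch below checks that a single record carries the six listed fields and a known task label. It assumes each case has already been loaded into a Python `dict`; the on-disk file format is not described here, so loading is left out. The helper name `validate_record` is hypothetical and not part of the Unison code base.

```python
from typing import Any, Dict, List

# Field names and task labels are taken from the table above.
REQUIRED_FIELDS = ("id", "task", "family", "model", "questions", "images")
VALID_TASKS = {"IC", "UGG", "GGU", "ME"}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    if "task" in record and record["task"] not in VALID_TASKS:
        problems.append(f"unknown task: {record['task']!r}")
    # `questions` is documented as a list of sub-questions; the keys inside
    # each sub-question are not specified above, so they are not checked.
    if "questions" in record and not isinstance(record["questions"], list):
        problems.append("questions is not a list")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a record at once.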

**Task distribution:** IC (57), ME (62), GGU (56), UGG (56)  
**Models covered:** BAGEL-7B-MoT, OmniGen2, SEED-X-17B, UniWorld-V1
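
A quick sanity check after downloading the data is to tally cases per task and compare against the distribution listed above. The sketch below assumes the 231 cases are available as a list of `dict` records with a `task` key; `task_counts` and `matches_expected` are hypothetical helper names used only for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Per-task counts as stated in the task distribution above (they sum to 231).
EXPECTED_TASK_COUNTS = {"IC": 57, "ME": 62, "GGU": 56, "UGG": 56}


def task_counts(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many records belong to each task."""
    return Counter(record["task"] for record in records)


def matches_expected(records: Iterable[Dict[str, Any]]) -> bool:
    """True if the per-task tallies equal the documented distribution."""
    return dict(task_counts(records)) == EXPECTED_TASK_COUNTS
```

The same pattern works for the `model` field if a per-model breakdown is needed.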