Unison-Judge is a fine-tuned Qwen3-VL-8B vision-language model that serves as the local automatic judge for the Unison benchmark. It scores UMMs' outputs across all four unified tasks (IC, UGG, GGU and ME) without requiring a hosted API.
Judge Consistency Data
The Judge_Consistency/ directory contains 231 evaluation cases used to assess the scoring consistency of Unison-Judge across all four tasks.
| Field | Description |
|---|---|
| id | Item identifier |
| task | One of IC, UGG, GGU, ME |
| family | Question type |
| model | The UMM whose output is being evaluated |
| questions | List of sub-questions, each with the model's answer and judge-assigned score |
| images | Reference image(s) and the model-generated image |
Task distribution: IC (57), ME (62), GGU (56), UGG (56)
Models covered: BAGEL-7B-MoT, OmniGen2, SEED-X-17B, UniWorld-V1
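
As a rough illustration of how these records might be sanity-checked once loaded, the sketch below validates the six documented fields and tallies cases per task. It works on in-memory dictionaries only: the on-disk layout of Judge_Consistency/ is not described here, so loading is left out. The sample records, the sub-question keys (`question`, `answer`, `score`), and the shape of `images` are assumptions made for the example, not the actual schema.

```python
from collections import Counter

# Field names and task labels as documented above.
REQUIRED_FIELDS = {"id", "task", "family", "model", "questions", "images"}
VALID_TASKS = {"IC", "UGG", "GGU", "ME"}


def validate_case(case):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    missing = REQUIRED_FIELDS - case.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if case.get("task") not in VALID_TASKS:
        problems.append(f"unknown task: {case.get('task')!r}")
    return problems


def task_distribution(cases):
    """Count how many cases belong to each task."""
    return Counter(case["task"] for case in cases)


# Hypothetical records: ids, family names, file names, and the inner keys
# of "questions" and "images" are invented for illustration only.
cases = [
    {
        "id": "example-0001",
        "task": "IC",
        "family": "example-family",
        "model": "OmniGen2",
        "questions": [{"question": "q1", "answer": "a1", "score": 1}],
        "images": {"reference": ["ref.png"], "generated": "gen.png"},
    },
    {
        "id": "example-0002",
        "task": "ME",
        "family": "example-family",
        "model": "UniWorld-V1",
        "questions": [{"question": "q1", "answer": "a1", "score": 0}],
        "images": {"reference": ["ref.png"], "generated": "gen.png"},
    },
]

for case in cases:
    for problem in validate_case(case):
        print(case.get("id"), problem)

print(dict(task_distribution(cases)))
```

Run over the full set, the per-task counts should match the distribution listed above and sum to 231.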