JinyuLiu/Unison-Judge

	Unison-Judge is a fine-tuned Qwen3-VL-8B vision-language model that serves as the local automatic judge for the Unison benchmark. It scores the outputs of unified multimodal models (UMMs) on all four unified tasks (IC, UGG, GGU, and ME) and runs locally, so no hosted API is required.


	## Judge Consistency Data

	The `Judge_Consistency/` directory contains 231 evaluation cases used to assess the scoring consistency of Unison-Judge across all four tasks.

	| Field | Description |
	|---|---|
	| `id` | Item identifier |
	| `task` | One of `IC`, `UGG`, `GGU`, `ME` |
	| `family` | Question type |
	| `model` | The UMM whose output is being evaluated |
	| `questions` | List of sub-questions, each with the model's answer and the judge-assigned score |
	| `images` | Reference image(s) and the model-generated image |

	- Task distribution: IC (57), ME (62), GGU (56), UGG (56)
	- Models covered: BAGEL-7B-MoT, OmniGen2, SEED-X-17B, UniWorld-V1
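
The fields and counts above can be sanity-checked with a short script. The following is a minimal sketch, not the project's own tooling: the on-disk format of `Judge_Consistency/` is not stated here, so the code assumes each case has already been parsed into a Python dict carrying the keys from the table. The names `summarize`, `VALID_TASKS`, and `EXPECTED_TASK_COUNTS` are hypothetical, and the sample records are made-up placeholders, not real data.

```python
from collections import Counter
from typing import Any, Iterable

# Task labels and per-task counts as documented in this card.
VALID_TASKS = {"IC", "UGG", "GGU", "ME"}
EXPECTED_TASK_COUNTS = {"IC": 57, "ME": 62, "GGU": 56, "UGG": 56}


def summarize(records: Iterable[dict[str, Any]]) -> tuple[Counter, Counter]:
    """Count records per task and per model, rejecting unknown task labels."""
    task_counts: Counter = Counter()
    model_counts: Counter = Counter()
    for record in records:
        task = record["task"]
        if task not in VALID_TASKS:
            raise ValueError(
                f"unexpected task {task!r} in item {record.get('id')!r}"
            )
        task_counts[task] += 1
        model_counts[record["model"]] += 1
    return task_counts, model_counts


# Placeholder records (assumption): only the keys mirror the field table.
sample = [
    {"id": "a", "task": "IC", "family": "x", "model": "OmniGen2",
     "questions": [], "images": []},
    {"id": "b", "task": "IC", "family": "x", "model": "UniWorld-V1",
     "questions": [], "images": []},
    {"id": "c", "task": "ME", "family": "y", "model": "OmniGen2",
     "questions": [], "images": []},
]

task_counts, model_counts = summarize(sample)
documented_total = sum(EXPECTED_TASK_COUNTS.values())
```

Run against the full set of parsed cases, `dict(task_counts) == EXPECTED_TASK_COUNTS` would confirm that the data matches the documented distribution; the documented per-task counts themselves sum to the stated 231 cases.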
