CEIA-RL/energyv2-dpo-offline-GRPO


| | model_name | final_score | task_coverage@1 | relative_quality@1 | hallucination@1 |
|---|---|---|---|---|---|
| 0 | qwen3-4b-dw-lr-dpo-offline-energy-GRPO_step_200 | 1.7964 | 0.970873 | 0.883589 | 0.0580614 |
| 1 | qwen3-4b-dw-lr-GRPO-mix-preference_step_100 | 1.77586 | 0.974856 | 0.875816 | 0.0748081 |
| 9 | energyv2-dpo-offline-GRPO_step_100 | 1.75576 | 0.957438 | 0.865499 | 0.0671785 |
| 4 | Qwen3-4B | 1.73133 | 0.979511 | 0.872361 | 0.120537 |
| 10 | energyv2-dpo-offline-GRPO_step_180 | 1.66418 | 0.930302 | 0.826008 | 0.0921305 |
| 12 | energyv2-dpo-offline-GRPO_step_180_no_think | 1.58354 | 0.946665 | 0.800216 | 0.16334 |
| 11 | energyv2-dpo-offline-GRPO_step_100_no_think | 1.5715 | 0.939347 | 0.79333 | 0.16118 |
| 3 | qwen3-4b-dw-lr | 1.52447 | 0.944386 | 0.782582 | 0.202495 |
| 2 | qwen3-4b-dw-lr-dpo-offline | 1.29638 | 0.785869 | 0.660269 | 0.14976 |
| 7 | energyv2-dpo-offline_think_off_ | 0.940547 | 0.655758 | 0.515067 | 0.230278 |
| 8 | enregy-gpt-regulatorio-v2_think_off_ | 0.929175 | 0.826536 | 0.538196 | 0.435557 |
| 6 | energyv2-dpo-offline | -0.0952015 | 0.0823417 | 0.0571017 | 0.234645 |
| 5 | enregy-gpt-regulatorio-v2 | -0.303599 | 0.303887 | 0.149664 | 0.75715 |
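
The card does not say how `final_score` is computed. On every row listed above, the published value matches `task_coverage@1 + relative_quality@1 - hallucination@1` to within rounding, so the sketch below checks that relation against the table. The formula is an inference from the numbers, not a documented definition, and the names `ROWS`, `reconstructed_score`, and `max_abs_deviation` are hypothetical helpers introduced only for this illustration.

```python
# Rows copied from the results table:
# (model_name, final_score, task_coverage@1, relative_quality@1, hallucination@1)
ROWS = [
    ("qwen3-4b-dw-lr-dpo-offline-energy-GRPO_step_200", 1.7964, 0.970873, 0.883589, 0.0580614),
    ("qwen3-4b-dw-lr-GRPO-mix-preference_step_100", 1.77586, 0.974856, 0.875816, 0.0748081),
    ("energyv2-dpo-offline-GRPO_step_100", 1.75576, 0.957438, 0.865499, 0.0671785),
    ("Qwen3-4B", 1.73133, 0.979511, 0.872361, 0.120537),
    ("energyv2-dpo-offline-GRPO_step_180", 1.66418, 0.930302, 0.826008, 0.0921305),
    ("energyv2-dpo-offline-GRPO_step_180_no_think", 1.58354, 0.946665, 0.800216, 0.16334),
    ("energyv2-dpo-offline-GRPO_step_100_no_think", 1.5715, 0.939347, 0.79333, 0.16118),
    ("qwen3-4b-dw-lr", 1.52447, 0.944386, 0.782582, 0.202495),
    ("qwen3-4b-dw-lr-dpo-offline", 1.29638, 0.785869, 0.660269, 0.14976),
    ("energyv2-dpo-offline_think_off_", 0.940547, 0.655758, 0.515067, 0.230278),
    ("enregy-gpt-regulatorio-v2_think_off_", 0.929175, 0.826536, 0.538196, 0.435557),
    ("energyv2-dpo-offline", -0.0952015, 0.0823417, 0.0571017, 0.234645),
    ("enregy-gpt-regulatorio-v2", -0.303599, 0.303887, 0.149664, 0.75715),
]


def reconstructed_score(coverage: float, quality: float, hallucination: float) -> float:
    """Assumed relation inferred from the table, not a documented formula."""
    return coverage + quality - hallucination


def max_abs_deviation(rows) -> float:
    """Largest gap between the published final_score and the reconstruction."""
    return max(
        abs(final - reconstructed_score(cov, qual, hall))
        for _name, final, cov, qual, hall in rows
    )


if __name__ == "__main__":
    # Print the per-row gap so any row that breaks the assumed relation stands out.
    for name, final, cov, qual, hall in ROWS:
        gap = final - reconstructed_score(cov, qual, hall)
        print(f"{name}: published={final}, gap={gap:+.1e}")
    print("max abs deviation:", max_abs_deviation(ROWS))
```

If the relation holds, a lower `hallucination@1` raises the score, which would explain why the two base checkpoints with high hallucination rates end up with negative totals.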


Safetensors · Model size: 4B params · Tensor type: BF16


Model tree for CEIA-RL/energyv2-dpo-offline-GRPO

Base model: cemig-nlp-releases/enregy-gpt-regulatorio-v2
Finetuned: CEIA-RL/energyv2-dpo-offline
Finetuned (1): this model