| | model_name | final_score | task_coverage@1 | relative_quality@1 | hallucination@1 |
|---|---|---|---|---|---|
| 0 | qwen3-4b-dw-lr-dpo-offline-energy-GRPO_step_200 | 1.7964 | 0.970873 | 0.883589 | 0.0580614 |
| 1 | qwen3-4b-dw-lr-GRPO-mix-preference_step_100 | 1.77586 | 0.974856 | 0.875816 | 0.0748081 |
| 9 | energyv2-dpo-offline-GRPO_step_100 | 1.75576 | 0.957438 | 0.865499 | 0.0671785 |
| 4 | Qwen3-4B | 1.73133 | 0.979511 | 0.872361 | 0.120537 |
| 10 | energyv2-dpo-offline-GRPO_step_180 | 1.66418 | 0.930302 | 0.826008 | 0.0921305 |
| 12 | energyv2-dpo-offline-GRPO_step_180_no_think | 1.58354 | 0.946665 | 0.800216 | 0.16334 |
| 11 | energyv2-dpo-offline-GRPO_step_100_no_think | 1.5715 | 0.939347 | 0.79333 | 0.16118 |
| 3 | qwen3-4b-dw-lr | 1.52447 | 0.944386 | 0.782582 | 0.202495 |
| 2 | qwen3-4b-dw-lr-dpo-offline | 1.29638 | 0.785869 | 0.660269 | 0.14976 |
| 7 | energyv2-dpo-offline_think_off_ | 0.940547 | 0.655758 | 0.515067 | 0.230278 |
| 8 | enregy-gpt-regulatorio-v2_think_off_ | 0.929175 | 0.826536 | 0.538196 | 0.435557 |
| 6 | energyv2-dpo-offline | -0.0952015 | 0.0823417 | 0.0571017 | 0.234645 |
| 5 | enregy-gpt-regulatorio-v2 | -0.303599 | 0.303887 | 0.149664 | 0.75715 |
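
The card does not say how `final_score` is computed. The reported numbers are, however, numerically consistent with `task_coverage@1 + relative_quality@1 - hallucination@1`. The sketch below checks that relationship on a few rows copied from the table. The formula is an inference from the figures, not something the authors state, and the function names `combined_score` and `max_abs_error` are hypothetical.

```python
# Rows copied from the results table:
# (model_name, final_score, task_coverage@1, relative_quality@1, hallucination@1)
ROWS = [
    ("qwen3-4b-dw-lr-dpo-offline-energy-GRPO_step_200", 1.7964, 0.970873, 0.883589, 0.0580614),
    ("Qwen3-4B", 1.73133, 0.979511, 0.872361, 0.120537),
    ("energyv2-dpo-offline", -0.0952015, 0.0823417, 0.0571017, 0.234645),
    ("enregy-gpt-regulatorio-v2", -0.303599, 0.303887, 0.149664, 0.75715),
]


def combined_score(coverage, quality, hallucination):
    """Assumed formula (inferred from the table, not documented by the authors):
    reward coverage and quality, subtract the hallucination rate."""
    return coverage + quality - hallucination


def max_abs_error(rows):
    """Largest gap between the reported final_score and the recomputed value."""
    return max(
        abs(combined_score(c, q, h) - reported)
        for _, reported, c, q, h in rows
    )


if __name__ == "__main__":
    for name, reported, c, q, h in ROWS:
        recomputed = combined_score(c, q, h)
        print(f"{name}: reported={reported:.5f} recomputed={recomputed:.5f}")
    print(f"max abs error: {max_abs_error(ROWS):.2e}")
```

If the assumed formula holds, a negative `final_score` (the last two rows) means the hallucination rate exceeds the sum of coverage and quality.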

img3

img1

img2

Safetensors · Model size: 4B params · Tensor type: BF16

Model tree for CEIA-RL/energyv2-dpo-offline-GRPO