---
base_model:
- CEIA-RL/energyv2-dpo-offline
---

|    | model_name                                      | final_score | task_coverage@1 | relative_quality@1 | hallucination@1 |
|---:|:------------------------------------------------|------------:|----------------:|-------------------:|----------------:|
|  0 | qwen3-4b-dw-lr-dpo-offline-energy-GRPO_step_200 |      1.7964 |        0.970873 |           0.883589 |       0.0580614 |
|  1 | qwen3-4b-dw-lr-GRPO-mix-preference_step_100     |     1.77586 |        0.974856 |           0.875816 |       0.0748081 |
|  9 | energyv2-dpo-offline-GRPO_step_100              |     1.75576 |        0.957438 |           0.865499 |       0.0671785 |
|  4 | Qwen3-4B                                        |     1.73133 |        0.979511 |           0.872361 |        0.120537 |
| 10 | energyv2-dpo-offline-GRPO_step_180              |     1.66418 |        0.930302 |           0.826008 |       0.0921305 |
| 12 | energyv2-dpo-offline-GRPO_step_180_no_think     |     1.58354 |        0.946665 |           0.800216 |         0.16334 |
| 11 | energyv2-dpo-offline-GRPO_step_100_no_think     |      1.5715 |        0.939347 |            0.79333 |         0.16118 |
|  3 | qwen3-4b-dw-lr                                  |     1.52447 |        0.944386 |           0.782582 |        0.202495 |
|  2 | qwen3-4b-dw-lr-dpo-offline                      |     1.29638 |        0.785869 |           0.660269 |         0.14976 |
|  7 | energyv2-dpo-offline_think_off_                 |    0.940547 |        0.655758 |           0.515067 |        0.230278 |
|  8 | enregy-gpt-regulatorio-v2_think_off_            |    0.929175 |        0.826536 |           0.538196 |        0.435557 |
|  6 | energyv2-dpo-offline                            |  -0.0952015 |       0.0823417 |          0.0571017 |        0.234645 |
|  5 | enregy-gpt-regulatorio-v2                       |   -0.303599 |        0.303887 |           0.149664 |         0.75715 |

![img3](energy_v2_model_comparison_passk_only_2.png)

![img1](energy_v2_model_comparison_majk.png)

![img2](energy_v2_model_comparison_passk_only.png)
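
This card does not say how `final_score` is computed. The values in the results table appear consistent with `task_coverage@1 + relative_quality@1 - hallucination@1`, up to rounding. That formula is an inference from the reported numbers, not a documented definition. A minimal Python sketch that checks it against the rows of the table (the helper names `recomputed_score` and `max_abs_error` are hypothetical):

```python
# Rows copied from the results table:
# (model_name, final_score, task_coverage@1, relative_quality@1, hallucination@1)
ROWS = [
    ("qwen3-4b-dw-lr-dpo-offline-energy-GRPO_step_200", 1.7964, 0.970873, 0.883589, 0.0580614),
    ("qwen3-4b-dw-lr-GRPO-mix-preference_step_100", 1.77586, 0.974856, 0.875816, 0.0748081),
    ("energyv2-dpo-offline-GRPO_step_100", 1.75576, 0.957438, 0.865499, 0.0671785),
    ("Qwen3-4B", 1.73133, 0.979511, 0.872361, 0.120537),
    ("energyv2-dpo-offline-GRPO_step_180", 1.66418, 0.930302, 0.826008, 0.0921305),
    ("energyv2-dpo-offline-GRPO_step_180_no_think", 1.58354, 0.946665, 0.800216, 0.16334),
    ("energyv2-dpo-offline-GRPO_step_100_no_think", 1.5715, 0.939347, 0.79333, 0.16118),
    ("qwen3-4b-dw-lr", 1.52447, 0.944386, 0.782582, 0.202495),
    ("qwen3-4b-dw-lr-dpo-offline", 1.29638, 0.785869, 0.660269, 0.14976),
    ("energyv2-dpo-offline_think_off_", 0.940547, 0.655758, 0.515067, 0.230278),
    ("enregy-gpt-regulatorio-v2_think_off_", 0.929175, 0.826536, 0.538196, 0.435557),
    ("energyv2-dpo-offline", -0.0952015, 0.0823417, 0.0571017, 0.234645),
    ("enregy-gpt-regulatorio-v2", -0.303599, 0.303887, 0.149664, 0.75715),
]


def recomputed_score(coverage, quality, hallucination):
    """Assumed formula (inferred from the table, not documented in the card)."""
    return coverage + quality - hallucination


def max_abs_error(rows):
    """Largest gap between the reported final_score and the assumed formula."""
    return max(
        abs(final - recomputed_score(cov, qual, hall))
        for _name, final, cov, qual, hall in rows
    )


if __name__ == "__main__":
    for name, final, cov, qual, hall in ROWS:
        # Print reported and recomputed scores side by side for inspection.
        print(f"{name}: reported={final:.6f} recomputed={recomputed_score(cov, qual, hall):.6f}")
    print(f"max abs error: {max_abs_error(ROWS):.2e}")
```

If the assumption holds, a higher `final_score` rewards coverage and quality and penalizes hallucination with equal weight, which would explain why the two lowest-ranked models, both with low coverage and quality relative to their hallucination rates, end up with negative scores.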