SeaWolf-AI committed on
Commit 82116a3 · verified · 1 Parent(s): 3ccc0cd

feat: add .eval_results/gpqa_diamond.yaml for GPQA dataset indexing

Files changed (1)
  1. .eval_results/gpqa_diamond.yaml +3 -3
.eval_results/gpqa_diamond.yaml CHANGED
@@ -2,8 +2,8 @@
  id: Idavidrein/gpqa
  task_id: diamond
  value: 88.89
- date: "2026-04-25"
+ date: '2026-04-27'
  source:
  url: https://huggingface.co/FINAL-Bench/Darwin-28B-Opus
- name: Darwin-28B-Opus Benchmark (3-stage Adaptive Evaluation)
- user: vidraft
+ name: Model Card
+ notes: "Standard inference, Pass@1"