Add community evaluation results for AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, TERMINAL-BENCH-2.0

#2
by nielsr HF Staff - opened

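This PR adds community-provided evaluation results for the following benchmarks: AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, TERMINAL-BENCH-2.0.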

These results were extracted from the model card. This is based on the new evaluation results feature.

Note: This is an automated PR. Please review the evaluation results before merging.

YAML Metadata Error: Invalid content in Eval Result file .eval_results/hle.yaml
Check out the documentation for more information.

@nielsr can you please update your script so that it only emits valid evaluation results?

Love the Hugging Face leaderboards!

Ready to merge
This branch is ready to get merged automatically.
