Benchmark: cais/hle

Fully LLM-free symbolic solver for Humanity's Last Exam (HLE): no neural networks, no language models, pure rule-based reasoning with Wikipedia as the only knowledge source.
| Split | Score | Method |
|---|---|---|
| Full 2500 questions | 115/2500 = 4.6% | atom_cross + knowledge_match + cross_decompose |

Verantyx solves HLE through structural decomposition:

Question → Fact Atomizer → Wikipedia Fetch → Atom Cross Solver → Choice Scoring (supports/contradicts) → Best Choice or Keyword Fallback
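
The flow above can be illustrated with a minimal sketch. It is not the Verantyx implementation: the function names (`atomize`, `fetch`, `score_choice`, `keyword_fallback`, `solve`), the stopword and negation lists, and the in-memory `WIKI` dictionary standing in for a real Wikipedia fetch are all assumptions made for illustration. A sentence that shares atoms with the question counts as support, unless it contains a negation word, in which case it counts as a contradiction.

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-in for fetched Wikipedia text (no network access here).
WIKI: Dict[str, str] = {
    "mercury": "Mercury is the smallest planet and the closest planet to the Sun.",
    "venus": "Venus is the second planet from the Sun and is not the smallest planet.",
}

# Assumed word lists; a real system would need far richer rules.
STOPWORDS = {"the", "is", "a", "an", "of", "to", "and", "which", "what", "from"}
NEGATIONS = {"not", "never", "no"}


def words(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def atomize(text: str) -> Set[str]:
    """Crude 'fact atomizer': content words with stopwords removed."""
    return words(text) - STOPWORDS


def fetch(topic: str) -> str:
    """Look up the article text for a topic; empty string on a miss."""
    return WIKI.get(topic.lower(), "")


def score_choice(question: str, choice: str) -> int:
    """Supports minus contradicts, summed over sentences of the choice's article."""
    q_atoms = atomize(question)
    score = 0
    for sentence in re.split(r"(?<=[.!?])\s+", fetch(choice)):
        sent_words = words(sentence)
        overlap = len(q_atoms & sent_words)
        if overlap == 0:
            continue
        # A negated sentence counts against the choice, otherwise for it.
        score += -overlap if sent_words & NEGATIONS else overlap
    return score


def keyword_fallback(question: str, choices: List[str]) -> str:
    """With no usable evidence, pick the choice sharing most atoms with the question."""
    q_atoms = atomize(question)
    return max(choices, key=lambda c: len(q_atoms & atomize(c)))


def solve(question: str, choices: List[str]) -> str:
    scores = {c: score_choice(question, c) for c in choices}
    best = max(choices, key=lambda c: scores[c])
    if scores[best] <= 0:
        return keyword_fallback(question, choices)
    return best
```

For example, `solve("Which planet is the smallest and closest to the Sun?", ["Venus", "Mercury"])` picks `"Mercury"`, because the stand-in Venus article only matches the question through a negated sentence. When no article is found for any choice, the sketch falls through to the keyword fallback.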

Component activity on the full run:

| Component | Fires | Description |
|---|---|---|
| cross_decompose | 122 | Per-choice decomposition + Wikipedia cross-match |
| knowledge_match | 18 | Direct atom-based knowledge matching |
| atom_cross | fallback | Normalized atom scoring with Wikipedia overlap |
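
As a rough illustration of how a fallback chain with normalized scoring might look, the sketch below tries components in order and lets the last one always answer. The order of the components, the `run_chain` and `normalized_overlap` helpers, and the normalization itself (shared atoms divided by the number of atoms in the choice) are assumptions, not taken from the Verantyx source. The earlier components are replaced by a stub that always abstains.

```python
from collections import Counter
from typing import Callable, List, Optional, Set, Tuple

Choice = Set[str]
Component = Callable[[Set[str], List[Choice]], Optional[int]]


def normalized_overlap(choice_atoms: Choice, wiki_atoms: Set[str]) -> float:
    """Fraction of the choice's atoms found in the Wikipedia atoms.

    Dividing by the choice size keeps long choices from winning
    merely because they contain more words.
    """
    if not choice_atoms:
        return 0.0
    return len(choice_atoms & wiki_atoms) / len(choice_atoms)


def atom_cross(wiki_atoms: Set[str], choices: List[Choice]) -> Optional[int]:
    """Fallback component: always returns the index of the best-scoring choice."""
    scores = [normalized_overlap(c, wiki_atoms) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def abstain(wiki_atoms: Set[str], choices: List[Choice]) -> Optional[int]:
    """Stub for a component that finds nothing to say about this question."""
    return None


def run_chain(
    components: List[Tuple[str, Component]],
    wiki_atoms: Set[str],
    choices: List[Choice],
    fires: Counter,
) -> int:
    """Return the first non-None answer and record which component fired."""
    for name, component in components:
        answer = component(wiki_atoms, choices)
        if answer is not None:
            fires[name] += 1
            return answer
    raise RuntimeError("no component produced an answer")


# Assumed ordering: specific components first, atom_cross last.
CHAIN: List[Tuple[str, Component]] = [
    ("cross_decompose", abstain),
    ("knowledge_match", abstain),
    ("atom_cross", atom_cross),
]

wiki_atoms = {"paris", "capital", "france", "seine"}
choices = [
    {"paris", "capital", "seine", "texas", "usa", "city", "town", "county"},
    {"paris", "france"},
]
fires: Counter = Counter()
answer = run_chain(CHAIN, wiki_atoms, choices, fires)
```

Here the first choice shares three atoms with the article text and the second only two, but after normalization the second scores 1.0 against 0.375, so `answer` is `1` and only `atom_cross` is recorded as having fired.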

Score by version:

| Version | Score | Method |
|---|---|---|
| v1 (with LLM) | 2.68% | mcq_direct (Qwen 7B) + cross_decompose |
| v2 (LLM-free partial) | 1.22% | Early LLM removal, limited coverage |
| v4 (LLM-free full) | 4.6% | atom_cross + MCQ all-question answering + normalized scoring |

Full-run statistics:

| Metric | Value |
|---|---|
| Total questions | 2500 |
| Correct | 115 (4.6%) |
| Time | 98 minutes (4 parallel workers) |
| Wiki hits | 2298 |
| Knowledge match | 18 |
| Cross decompose | 122 fired |
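
The headline figures can be cross-checked with a few lines of arithmetic. The inputs below are the reported run statistics. The variable names are illustrative, and the throughput figure assumes the 98 minutes is wall-clock time for the whole run across all four workers, which the statistics do not state explicitly.

```python
# Reported run statistics.
total_questions = 2500
correct = 115
minutes = 98
workers = 4
cross_decompose_fires = 122
knowledge_match_fires = 18

# Accuracy as a fraction of all questions.
accuracy = correct / total_questions

# Assumption: 98 minutes is wall-clock time with all 4 workers running.
questions_per_minute = total_questions / minutes
questions_per_worker_minute = questions_per_minute / workers

# Share of questions on which each named component fired.
cross_decompose_rate = cross_decompose_fires / total_questions
knowledge_match_rate = knowledge_match_fires / total_questions

print(f"accuracy: {accuracy:.1%}")
print(f"throughput: {questions_per_minute:.1f} questions/min")
print(f"per worker: {questions_per_worker_minute:.1f} questions/min")
print(f"cross_decompose fired on {cross_decompose_rate:.2%} of questions")
print(f"knowledge_match fired on {knowledge_match_rate:.2%} of questions")
```

The accuracy line reproduces the reported 4.6%. The two fire rates show that the named components together fired on well under a tenth of the questions.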