gpt-5.5/.eval_results/redlinebench.yaml
- dataset:
    id: crosbylegal/RedlineBench
    task_id: redline_overall
  value: 50.5
  date: "2026-06-17"
  source:
    url: https://intelligence.crosby.ai/benchmark/
    name: RedlineBench report
    user: crosbylegal
  notes: "agent=claude-code; 3-LLM judge panel (majority vote); turn-weighted weighted pass rate (0-100); published report figure"