gemini-3.5-flash/.eval_results/redlinebench.yaml
Update .eval_results/redlinebench.yaml
d1e32b9 verified
- dataset:
    id: crosbylegal/RedlineBench
    task_id: redline_overall
  value: 45.1
  date: "2026-06-17"
  source:
    url: https://intelligence.crosby.ai/benchmark/
    name: RedlineBench report
    user: crosbylegal
  notes: "agent=opencode; 3-LLM judge panel (majority vote); turn-weighted weighted pass rate (0-100); published report figure"