v51: composite=88.68 - see model card for benchmark deltas vs v45 e8ff65e verified juanquivilla committed on May 6
v45: SFT+chained GRPO with ITN - 95.9% number accuracy, 97.0% filler-free, deletion behavior matches v36 81578ee verified juanquivilla committed on May 4
v36: full-FT GRPO with substantive-deletion-aware reward - filler-free 96.9%, sub-del-15-long 0.64% 7278227 verified juanquivilla committed on May 3
Full FT: ROUGE-L 0.907 - new record, +1.6 over prompted 2B 1e7e04f verified juanquivilla committed on Apr 1