soup: corrected README with full prod metrics + soup explanation 6df6f01 verified juanquivilla committed on May 7
soup_30: composite=89.45 - see model card for benchmark deltas vs v45 9f2abdb verified juanquivilla committed on May 7
v55: corrected metrics with max_new=900 (sub_del15 3.7%, composite 89.32) 4424504 verified juanquivilla committed on May 6
v55: add inference recommendations (rep_pen=1.05, max_new_tokens guidance) 7fbef21 verified juanquivilla committed on May 6
v55: composite=88.95 - see model card for benchmark deltas vs v45 d5ba955 verified juanquivilla committed on May 6
v51: composite=88.68 - see model card for benchmark deltas vs v45 e8ff65e verified juanquivilla committed on May 6
v45: SFT+chained GRPO with ITN - 95.9% number accuracy, 97.0% filler-free, deletion behavior matches v36 81578ee verified juanquivilla committed on May 4
v36: full-FT GRPO with substantive-deletion-aware reward - filler-free 96.9%, sub-del-15-long 0.64% 7278227 verified juanquivilla committed on May 3
v23 R6 (4-stage SFT->GRPO->Stage2->GRPO): ROUGE-L 0.9537 (tied v22), Filler-Free 91.1% (beats v22 90.3%), paragraph rate 91.5% - definitive v23 model 15c2adb verified juanquivilla committed on Apr 12
v23 R5 (paragraph rows excluded from GRPO): ROUGE-L 0.9505, Filler-Free 91.0%, paragraph rate 89.5% - best v23 variant overall ae896a9 verified juanquivilla committed on Apr 12
v23 R4 (LR 5e-6): Filler-Free 91.0% (beats v22 90.3%), paragraph rate 89%, ROUGE-L 0.9499 56ac430 verified juanquivilla committed on Apr 12
v23+paragraphs: ROUGE-L 0.9506, Filler-Free 90.2%, paragraph rate 91.5% (0% in v22) 8d24c18 verified juanquivilla committed on Apr 12
v22+GRPO-r32: ROUGE-L 0.954 val, 66% exact, 91% filler-free - new production model 4111016 verified juanquivilla committed on Apr 3
v22+GRPO: ROUGE-L 0.953 val set, 91% filler-free - GRPO works on proper benchmark 5f57a19 verified juanquivilla committed on Apr 3
v22-lr3: ROUGE-L 0.948 on val set - cleaned data + LR 3e-5 607327b verified juanquivilla committed on Apr 3
v18: ROUGE-L 0.968, 72% exact - AdamW beta2=0.95 breakthrough 69428ac verified juanquivilla committed on Apr 2
v17: ROUGE-L 0.966, 71% exact - preserve phrases + redundant tail removal 257897f verified juanquivilla committed on Apr 2
v16: ROUGE-L 0.962, preserve-phrase data (crutch_words 0.987) 00c5807 verified juanquivilla committed on Apr 2
Update model card: v15 detailed card with examples, benchmarks, and links b64dd3c verified juanquivilla committed on Apr 2
v15: ROUGE-L 0.960, 70% exact match - LR 2.5e-5 breakthrough 27c16fc verified juanquivilla committed on Apr 2
v7: combined dataset LR 2e-5, ROUGE-L 0.943, 62% exact 1462723 verified juanquivilla committed on Apr 1
v5: LR 2e-5 + Stage-2, ROUGE-L 0.942, 60% exact - new all-time record 4dd89e6 verified juanquivilla committed on Apr 1
v4: 116K data + long transcripts, ROUGE-L 0.927, handles 1000+ word inputs 23ff3b5 verified juanquivilla committed on Apr 1
Full FT + GRPO: ROUGE-L 0.916 - new record, +2.5 over 2B 840b071 verified juanquivilla committed on Apr 1
Full FT: ROUGE-L 0.907 - new record, +1.6 over prompted 2B 1e7e04f verified juanquivilla committed on Apr 1