Commit History

soup: corrected README with full prod metrics + soup explanation

6df6f01
verified

juanquivilla committed on May 7
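The "soup" commits refer to a model soup, which averages the weights of several fine-tuned checkpoints into a single model. The log does not say which checkpoints went into soup_30, how they were weighted, or what the "30" means. The sketch below shows only the general technique, parameter-by-parameter weighted averaging. It assumes every checkpoint has identical parameter names and shapes, and it uses plain lists in place of real tensors. The helper name `weight_soup` is hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def weight_soup(checkpoints: List[Weights], coeffs: Optional[List[float]] = None) -> Weights:
    """Weighted average of checkpoints, parameter by parameter (uniform if coeffs is None)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if coeffs is None:
        coeffs = [1.0 / len(checkpoints)] * len(checkpoints)
    if len(coeffs) != len(checkpoints) or abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("coeffs must match the checkpoints and sum to 1")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("all checkpoints must share the same parameter names")
    soup: Weights = {}
    for name in keys:
        length = len(checkpoints[0][name])
        # Each averaged value is the coefficient-weighted sum across checkpoints.
        soup[name] = [
            sum(c * ckpt[name][i] for c, ckpt in zip(coeffs, checkpoints))
            for i in range(length)
        ]
    return soup
```

For example, `weight_soup([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])` returns `{"w": [2.0, 3.0]}`. Passing `coeffs=[0.7, 0.3]` pulls the result toward the first checkpoint.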

soup_30: composite=89.45 — see model card for benchmark deltas vs v45

9f2abdb
verified

juanquivilla committed on May 7

v55: corrected metrics with max_new=900 (sub_del15 3.7%, composite 89.32)

4424504
verified

juanquivilla committed on May 6

v55: add inference recommendations (rep_pen=1.05, max_new_tokens guidance)

7fbef21
verified

juanquivilla committed on May 6

v55: composite=88.95 — see model card for benchmark deltas vs v45

d5ba955
verified

juanquivilla committed on May 6

v51: composite=88.68 — see model card for benchmark deltas vs v45

e8ff65e
verified

juanquivilla committed on May 6

v45: SFT+chained GRPO with ITN — 95.9% number accuracy, 97.0% filler-free, deletion behavior matches v36

81578ee
verified

juanquivilla committed on May 4
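Several commits report a "filler-free" percentage. One plausible reading is the share of model outputs that contain no filler words at all. The log does not define the metric or list the fillers it counts, so the sketch below rests on two assumptions. First, the small `FILLERS` set is purely illustrative. Second, words are matched whole, so "um" inside "numbers" is not counted. The names `has_filler` and `filler_free_rate` are hypothetical.

```python
import re
from typing import Iterable, Sequence

# Hypothetical filler inventory for illustration only; the benchmark's real list is not given.
FILLERS = frozenset({"um", "uh", "erm", "uhm"})

# Lowercase word tokens; matching whole tokens avoids flagging "um" inside "numbers".
WORD_RE = re.compile(r"[a-z']+")


def has_filler(text: str, fillers: Iterable[str] = FILLERS) -> bool:
    filler_set = set(fillers)
    return any(word in filler_set for word in WORD_RE.findall(text.lower()))


def filler_free_rate(outputs: Sequence[str], fillers: Iterable[str] = FILLERS) -> float:
    """Percentage of outputs containing no filler word."""
    if not outputs:
        raise ValueError("no outputs to score")
    filler_set = set(fillers)
    clean = sum(1 for out in outputs if not has_filler(out, filler_set))
    return 100.0 * clean / len(outputs)
```

This is an all-or-nothing, per-output measure. A single leftover "uh" makes the whole output count as not filler-free.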

v36: full-FT GRPO with substantive-deletion-aware reward — filler-free 96.9%, sub-del-15-long 0.64%

7278227
verified

juanquivilla committed on May 3

v23 R6 (4-stage SFT->GRPO->Stage2->GRPO): ROUGE-L 0.9537 (tied v22), Filler-Free 91.1% (beats v22 90.3%), paragraph rate 91.5% — definitive v23 model

15c2adb
verified

juanquivilla committed on Apr 12

v23 R5 (paragraph rows excluded from GRPO): ROUGE-L 0.9505, Filler-Free 91.0%, paragraph rate 89.5% — best v23 variant overall

ae896a9
verified

juanquivilla committed on Apr 12

v23 R4 (LR 5e-6): Filler-Free 91.0% (beats v22 90.3%), paragraph rate 89%, ROUGE-L 0.9499

56ac430
verified

juanquivilla committed on Apr 12

v23+paragraphs: ROUGE-L 0.9506, Filler-Free 90.2%, paragraph rate 91.5% (0% in v22)

8d24c18
verified

juanquivilla committed on Apr 12

v22+GRPO-r32: ROUGE-L 0.954 val, 66% exact, 91% filler-free — new production model

4111016
verified

juanquivilla committed on Apr 3

v22+GRPO: ROUGE-L 0.953 val set, 91% filler-free — GRPO works on proper benchmark

5f57a19
verified

juanquivilla committed on Apr 3

v22-lr3: ROUGE-L 0.948 on val set — cleaned data + LR 3e-5

607327b
verified

juanquivilla committed on Apr 3
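Most commits track ROUGE-L, which scores a candidate against a reference using their longest common subsequence (LCS) of tokens. Recall is LCS divided by the reference length, precision is LCS divided by the candidate length, and the score is their F1. The sketch below assumes lowercase whitespace tokenization. Packaged scorers (for example `rouge-score`) tokenize and stem differently, so its numbers will not match the values reported in this log exactly.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    # Classic dynamic program over token sequences, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return 2 * precision * recall / (precision + recall)
```

Take the reference "the cat sat on the mat" and the candidate "the cat on the mat". The LCS is 5 tokens, so recall is 5/6 and precision is 1, which gives an F1 of 10/11 (about 0.909). Because the LCS need not be contiguous, a dropped word costs only a little, which suits a cleanup task that mostly preserves word order.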

v18: ROUGE-L 0.968, 72% exact — AdamW beta2=0.95 breakthrough

69428ac
verified

juanquivilla committed on Apr 2
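The v18 commit credits AdamW with beta2=0.95. In PyTorch that corresponds to `torch.optim.AdamW(params, betas=(0.9, 0.95))`, whereas the default beta2 is 0.999. A lower beta2 shortens the memory of the squared-gradient average, so the per-parameter step size adapts faster when gradient scale shifts. The pure-Python sketch below mirrors the standard AdamW update: decoupled weight decay, then a bias-corrected Adam step. The learning rate, weight decay and epsilon defaults are placeholders, not the values used for v18.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdamW:
    lr: float = 1e-3            # placeholder; the log's runs used LRs such as 2.5e-5 and 3e-5
    beta1: float = 0.9
    beta2: float = 0.95         # the value the v18 commit highlights
    eps: float = 1e-8
    weight_decay: float = 0.01  # placeholder
    step_count: int = 0
    m: List[float] = field(default_factory=list)
    v: List[float] = field(default_factory=list)

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.step_count += 1
        t = self.step_count
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Decoupled weight decay: shrink the weight directly, not via the gradient.
            p = p - self.lr * self.weight_decay * p
            # Exponential moving averages of the gradient and squared gradient.
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction for the zero-initialised averages.
            m_hat = self.m[i] / (1 - self.beta1 ** t)
            v_hat = self.v[i] / (1 - self.beta2 ** t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is about the sign of the gradient. With zero weight decay, each parameter therefore moves by roughly `lr` regardless of the gradient's magnitude.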

v17: ROUGE-L 0.966, 71% exact — preserve phrases + redundant tail removal

257897f
verified

juanquivilla committed on Apr 2

v16: ROUGE-L 0.962, preserve-phrase data (crutch_words 0.987)

00c5807
verified

juanquivilla committed on Apr 2

Update model card: v15 detailed card with examples, benchmarks, and links

b64dd3c
verified

juanquivilla committed on Apr 2

v15: ROUGE-L 0.960, 70% exact match — LR 2.5e-5 breakthrough

27c16fc
verified

juanquivilla committed on Apr 2

v7: combined dataset LR 2e-5, ROUGE-L 0.943, 62% exact

1462723
verified

juanquivilla committed on Apr 1

v5: LR 2e-5 + Stage-2, ROUGE-L 0.942, 60% exact — new all-time record

4dd89e6
verified

juanquivilla committed on Apr 1

v4: 116K data + long transcripts, ROUGE-L 0.927, handles 1000+ word inputs

23ff3b5
verified

juanquivilla committed on Apr 1

Upload README.md with huggingface_hub

1c9bfcc
verified

juanquivilla committed on Apr 1

Stage-2: ROUGE-L 0.931, 90% zero-filler — new best

9154aef
verified

juanquivilla committed on Apr 1

Full FT v2: ROUGE-L 0.930, 55% exact — best model

1811cf8
verified

juanquivilla committed on Apr 1

Full FT + GRPO: ROUGE-L 0.916 — new record, +2.5 over 2B

840b071
verified

juanquivilla committed on Apr 1

Full FT: ROUGE-L 0.907 — new record, +1.6 over prompted 2B

1e7e04f
verified

juanquivilla committed on Apr 1

GRPO R2: ROUGE-L 0.892 — exceeds prompted 2B

3ac0999
verified

juanquivilla committed on Apr 1

GRPO model: ROUGE-L 0.891 — matches prompted 2B

4ab7708
verified

juanquivilla committed on Apr 1

Upload folder using huggingface_hub

88c1f58
verified

juanquivilla committed on Apr 1

Upload README.md with huggingface_hub

eda4f6c
verified

juanquivilla committed on Apr 1

initial commit

83e83fb
verified

juanquivilla committed on Apr 1