fix: update training script with seed variation, fix reward normalization, regenerate training curves showing 0.52->0.67 improvement bdc9954 adityss committed on Apr 25
feat: commit training evidence, update README with real scores, add demo scripts 8204dc0 adityss committed on Apr 25