docs: add training baseline, reward, and loss curve plots for multiple experimental runs 28abef0 adityss committed on Apr 26