NeMo Gym • Collection • Collection of RL verifiable data for NeMo Gym • 32 items • Updated 12 days ago • 62
Nemotron-Post-Training-v3 • Collection • Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 12 days ago • 163
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization • Paper • 2601.05242 • Published Jan 8 • 233