DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 27 days ago • 204
LumosJiang/Qwen3-8B-Base-SFT-AM-Thinking-v1-Distilled-Code-600steps Text Generation • 8B • Updated Apr 22 • 6
LumosJiang/Qwen3-8B-Base-SFT-AM-Thinking-v1-Distilled-Code-1800steps Text Generation • 8B • Updated Apr 22 • 6
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 365
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published Apr 20 • 95
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining Paper • 2602.07085 • Published Feb 6 • 190