Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning Paper • 2606.11683 • Published 6 days ago • 30
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application Paper • 2606.12191 • Published 6 days ago • 62
yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF Text Generation • 12B • Updated about 17 hours ago • 20.2k • 337
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 5 days ago • 131
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks Paper • 2606.12344 • Published 6 days ago • 64
N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization Paper • 2606.10768 • Published 7 days ago • 24
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 7 days ago • 20
InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning Paper • 2606.12195 • Published 6 days ago • 20
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 7 days ago • 41
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 6 days ago • 110
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling Paper • 2606.12370 • Published 6 days ago • 21
ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations Paper • 2606.11188 • Published 7 days ago • 26
WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark Paper • 2606.06538 • Published 12 days ago • 3
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations Paper • 2606.05563 • Published 12 days ago • 52