On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 4 days ago • 168
Macaron-A2UI: A Model for Generative UI in Personal Agents Paper • 2605.24830 • Published 12 days ago • 80
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published 23 days ago • 219
Position: LLM Inference Should Be Evaluated as Energy-to-Token Production Paper • 2605.11733 • Published 24 days ago • 3
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published Dec 18, 2025 • 76
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26, 2025 • 128
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs Paper • 2511.16664 • Published Nov 20, 2025 • 30
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published Nov 20, 2025 • 56
Reasoning Language Model Inference Serving Unveiled: An Empirical Study Paper • 2510.18672 • Published Oct 21, 2025 • 7
DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference Paper • 2510.19669 • Published Oct 22, 2025 • 1
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving Paper • 2510.11769 • Published Oct 13, 2025 • 26
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression Paper • 2505.19433 • Published May 26, 2025 • 5
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 98
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? Paper • 2502.17535 • Published Feb 24, 2025 • 8
Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research Paper • 2502.12669 • Published Feb 18, 2025 • 2
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing Paper • 2502.04411 • Published Feb 6, 2025 • 4
Can LLMs Maintain Fundamental Abilities under KV Cache Compression? Paper • 2502.01941 • Published Feb 4, 2025 • 15
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference Paper • 2502.00299 • Published Feb 1, 2025 • 3
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22, 2025 • 92