-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 31 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 45 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 24
Collections
Discover the best community collections!
Collections including paper arxiv:2511.04670
-
Emu3.5: Native Multimodal Models are World Learners
Paper • 2510.26583 • Published • 116 -
Cambrian-S: Towards Spatial Supersensing in Video
Paper • 2511.04670 • Published • 39 -
MagicWorld: Interactive Geometry-driven Video World Exploration
Paper • 2511.18886 • Published • 19 -
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Paper • 2512.14614 • Published • 73
-
AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
Paper • 2602.17100 • Published • 4 -
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
Paper • 2603.01059 • Published • 1 -
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
Paper • 2603.00618 • Published -
Heterogeneous Agent Collaborative Reinforcement Learning
Paper • 2603.02604 • Published • 197
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 31 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 45 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 24
-
AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
Paper • 2602.17100 • Published • 4 -
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
Paper • 2603.01059 • Published • 1 -
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
Paper • 2603.00618 • Published -
Heterogeneous Agent Collaborative Reinforcement Learning
Paper • 2603.02604 • Published • 197
-
Emu3.5: Native Multimodal Models are World Learners
Paper • 2510.26583 • Published • 116 -
Cambrian-S: Towards Spatial Supersensing in Video
Paper • 2511.04670 • Published • 39 -
MagicWorld: Interactive Geometry-driven Video World Exploration
Paper • 2511.18886 • Published • 19 -
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Paper • 2512.14614 • Published • 73