Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2603.19235

gen image video

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Paper • 2603.17051 • Published Mar 17 • 109
Versatile Editing of Video Content, Actions, and Dynamics without Training

Paper • 2603.17989 • Published Mar 18 • 18
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model

Paper • 2603.18524 • Published Mar 19 • 58

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 62
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Paper • 2602.10560 • Published Feb 11 • 31
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design

Paper • 2602.08253 • Published Feb 9 • 27
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression

Paper • 2602.11008 • Published Feb 11 • 18

Image-Video MultiModal Understanding

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 148
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

Paper • 2501.01245 • Published Jan 2, 2025 • 5
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 46
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published Jan 14, 2025 • 34

Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2, 2025 • 11
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models

Paper • 2603.18002 • Published Mar 18 • 14
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95

gen image video

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Paper • 2603.17051 • Published Mar 17 • 109
Versatile Editing of Video Content, Actions, and Dynamics without Training

Paper • 2603.17989 • Published Mar 18 • 18
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model

Paper • 2603.18524 • Published Mar 19 • 58

Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 62
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Paper • 2602.10560 • Published Feb 11 • 31
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design

Paper • 2602.08253 • Published Feb 9 • 27
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression

Paper • 2602.11008 • Published Feb 11 • 18

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2, 2025 • 11
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models

Paper • 2603.18002 • Published Mar 18 • 14
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95

Image-Video MultiModal Understanding

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 148
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

Paper • 2501.01245 • Published Jan 2, 2025 • 5
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 46
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published Jan 14, 2025 • 34

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs