Collections
Collections including paper arxiv:2605.26102
- VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation
  Paper • 2601.10124 • Published • 4
- Urban Socio-Semantic Segmentation with Vision-Language Reasoning
  Paper • 2601.10477 • Published • 155
- Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation
  Paper • 2601.10880 • Published • 15
- SAMTok: Representing Any Mask with Two Words
  Paper • 2601.16093 • Published • 44

- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 104
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 91
- DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
  Paper • 2405.19707 • Published • 9
- Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations
  Paper • 2410.08049 • Published • 8

- MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
  Paper • 2506.22434 • Published • 10
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
  Paper • 2507.13348 • Published • 80
- RewardDance: Reward Scaling in Visual Generation
  Paper • 2509.08826 • Published • 73
- Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
  Paper • 2510.18876 • Published • 37