Vishal

mvish7

AI & ML interests

Multi-modal AI and Computer Vision

Organizations

None yet

New activity in ktian6/NuScenes-SpatialQA about 1 year ago

Open sourcing the dataset

#1 opened about 1 year ago by

Commented on 3 papers about 1 year ago

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Paper • 2502.01341 • Published Feb 3, 2025 • 39

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Paper • 2502.01341 • Published Feb 3, 2025 • 39

ParGo: Bridging Vision-Language with Partial and Global Views

Paper • 2408.12928 • Published Aug 23, 2024