Collections
Collections including paper arxiv:2407.05131
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
  Paper • 2407.09025 • Published • 140
- Human-like Episodic Memory for Infinite Context LLMs
  Paper • 2407.09450 • Published • 62
- RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
  Paper • 2407.05131 • Published • 26
- We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
  Paper • 2407.01284 • Published • 81

- Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
  Paper • 2404.19752 • Published • 24
- How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
  Paper • 2404.16821 • Published • 60
- MoAI: Mixture of All Intelligence for Large Language and Vision Models
  Paper • 2403.07508 • Published • 78
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 130

- Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
  Paper • 2503.13939 • Published • 5
- Med-Flamingo: a Multimodal Medical Few-shot Learner
  Paper • 2307.15189 • Published • 24
- MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering
  Paper • 2406.06573 • Published • 11
- BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays
  Paper • 2410.21969 • Published • 10

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 17
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 91
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 35