Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2407.05131

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 35

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12, 2024 • 140
Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12, 2024 • 62
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Paper • 2407.05131 • Published Jul 6, 2024 • 26
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 81

image llm works

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30, 2024 • 24
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25, 2024 • 60
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12, 2024 • 78
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 130

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Paper • 2503.13939 • Published Mar 18, 2025 • 5
Med-Flamingo: a Multimodal Medical Few-shot Learner

Paper • 2307.15189 • Published Jul 27, 2023 • 24
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

Paper • 2406.06573 • Published Jun 3, 2024 • 11
BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Paper • 2410.21969 • Published Oct 29, 2024 • 10

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 35

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 35

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Paper • 2503.13939 • Published Mar 18, 2025 • 5
Med-Flamingo: a Multimodal Medical Few-shot Learner

Paper • 2307.15189 • Published Jul 27, 2023 • 24
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

Paper • 2406.06573 • Published Jun 3, 2024 • 11
BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Paper • 2410.21969 • Published Oct 29, 2024 • 10

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12, 2024 • 140
Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12, 2024 • 62
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Paper • 2407.05131 • Published Jul 6, 2024 • 26
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 81

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 35

image llm works

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30, 2024 • 24
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25, 2024 • 60
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12, 2024 • 78
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 130

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs