-
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Paper • 2411.18499 • Published • 18 -
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper • 2411.19939 • Published • 10 -
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Paper • 2412.02611 • Published • 25 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2503.12545
-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 47 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 35 -
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 17 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 61
-
TOFU: A Task of Fictitious Unlearning for LLMs
Paper • 2401.06121 • Published • 20 -
The Frontier of Data Erasure: Machine Unlearning for Large Language Models
Paper • 2403.15779 • Published • 1 -
Machine Unlearning of Pre-trained Large Language Models
Paper • 2402.15159 • Published -
Rethinking Machine Unlearning for Large Language Models
Paper • 2402.08787 • Published • 3
-
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Paper • 2411.18499 • Published • 18 -
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper • 2411.19939 • Published • 10 -
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Paper • 2412.02611 • Published • 25 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 19
-
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Paper • 2411.18499 • Published • 18 -
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper • 2411.19939 • Published • 10 -
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Paper • 2412.02611 • Published • 25 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 19
-
TOFU: A Task of Fictitious Unlearning for LLMs
Paper • 2401.06121 • Published • 20 -
The Frontier of Data Erasure: Machine Unlearning for Large Language Models
Paper • 2403.15779 • Published • 1 -
Machine Unlearning of Pre-trained Large Language Models
Paper • 2402.15159 • Published -
Rethinking Machine Unlearning for Large Language Models
Paper • 2402.08787 • Published • 3
-
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Paper • 2411.18499 • Published • 18 -
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper • 2411.19939 • Published • 10 -
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Paper • 2412.02611 • Published • 25 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 19
-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 47 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 35 -
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 17 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 61