-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 156 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48
Collections
Discover the best community collections!
Collections including paper arxiv:2501.16975
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 116 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 55 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 32 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 48 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 72 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 37
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 61 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 64
-
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 66 -
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 49 -
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 39 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 151
-
Ultra-Sparse Memory Network
Paper • 2411.12364 • Published • 23 -
Hyper-Connections
Paper • 2409.19606 • Published • 27 -
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Paper • 2411.03884 • Published • 28 -
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 32
-
distilbert/distilbert-base-uncased-finetuned-sst-2-english
Text Classification • 67M • Updated • 2.29M • • 904 -
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 32 -
SmolVLM: Redefining small and efficient multimodal models
Paper • 2504.05299 • Published • 208
-
VILA^2: VILA Augmented VILA
Paper • 2407.17453 • Published • 41 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 48 -
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 42
-
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 28 -
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 25 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 156 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48
-
Ultra-Sparse Memory Network
Paper • 2411.12364 • Published • 23 -
Hyper-Connections
Paper • 2409.19606 • Published • 27 -
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Paper • 2411.03884 • Published • 28 -
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 32
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 116 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 55 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 32 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 48 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 72 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 37
-
distilbert/distilbert-base-uncased-finetuned-sst-2-english
Text Classification • 67M • Updated • 2.29M • • 904 -
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 32 -
SmolVLM: Redefining small and efficient multimodal models
Paper • 2504.05299 • Published • 208
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 61 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 64
-
VILA^2: VILA Augmented VILA
Paper • 2407.17453 • Published • 41 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 48 -
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 42
-
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 66 -
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 49 -
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 39 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 151
-
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 28 -
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 25 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19