Laguna M.1 Collection Our most capable model to date, designed for long-horizon work. Apache 2.0. • 4 items • Updated about 20 hours ago • 15
Article olmo-eval: An evaluation workbench for the model development loop allenai • 9 days ago • 16
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? Paper • 2410.03859 • Published Oct 4, 2024 • 3
SWE-bench Collection SWE-bench (Lite, Verified, Multimodal, Multilingual) all in one place! • 5 items • Updated Dec 14, 2025 • 9
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Paper • 2310.06770 • Published Oct 10, 2023 • 12
Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings Paper • 2606.07502 • Published 16 days ago • 96
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 16 days ago • 117
Article Introducing North Mini Code: Cohere’s First Model For Developers CohereLabs • 12 days ago • 71
Article How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces mishig • 12 days ago • 23
Article Arcee Becomes the First Major American AI Lab to Replace AWS S3 with Hugging Face Private Storage, in a Multi-Million Dollar Commercial Partnership clem • 12 days ago • 32
Article Designing the hf CLI as an agent-optimized way to work with the Hub celinah, Wauplin • 17 days ago • 57
Article Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining nvidia • 17 days ago • 17
Article MONET: Lowering the Barrier to World Class Image Generation Research jasperai • 24 days ago • 10