Frames2LoRA SmolVLM Checkpoints

Frames2LoRA: Parametric Video Internalization for Vision-Language Models

Official implementation of Frames2LoRA

Manan Suri · Sarvesh Baskar · Dinesh Manocha

University of Maryland, College Park

This repository contains two Frames2LoRA Stage 1 checkpoint files:

frames2lora-smolvlm2-500m-best-ce.pt for HuggingFaceTB/SmolVLM2-500M-Video-Instruct
frames2lora-smolvlm2-2.2b-best-ce.pt for HuggingFaceTB/SmolVLM2-2.2B-Instruct
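
As a minimal sketch of fetching and opening one of the two files listed above, the snippet below uses `hf_hub_download` from `huggingface_hub` and `torch.load`. The repository id `MananSuri27/Frames2LoRA-SmolVLM-ckpts` is taken from this model page. The helper names (`checkpoint_filename`, `download_checkpoint`, `load_checkpoint`) are hypothetical, and the internal layout of the `.pt` files is not documented here, so the loaded object is only returned for inspection and no key names are assumed.

```python
# Sketch only: helper names are illustrative, not part of the Frames2LoRA codebase.

REPO_ID = "MananSuri27/Frames2LoRA-SmolVLM-ckpts"

# Base model -> checkpoint file, as listed in this README.
CHECKPOINTS = {
    "HuggingFaceTB/SmolVLM2-500M-Video-Instruct": "frames2lora-smolvlm2-500m-best-ce.pt",
    "HuggingFaceTB/SmolVLM2-2.2B-Instruct": "frames2lora-smolvlm2-2.2b-best-ce.pt",
}


def checkpoint_filename(base_model: str) -> str:
    """Return the checkpoint file name that matches a base model id."""
    try:
        return CHECKPOINTS[base_model]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise KeyError(f"No checkpoint for {base_model!r}; known base models: {known}")


def download_checkpoint(base_model: str) -> str:
    """Download the matching checkpoint and return its local cache path."""
    # Imported lazily so the mapping above is usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(base_model))


def load_checkpoint(path: str):
    """Load a checkpoint onto the CPU and return whatever object it contains.

    Assumption: weights_only=True is used as the conservative choice. If the
    file stores objects beyond tensors and plain containers, this call may
    fail and the setting would need to be revisited.
    """
    import torch

    return torch.load(path, map_location="cpu", weights_only=True)


# Example usage (requires network access, huggingface_hub, and torch):
#   path = download_checkpoint("HuggingFaceTB/SmolVLM2-500M-Video-Instruct")
#   ckpt = load_checkpoint(path)
#   print(type(ckpt))
```

Keeping the base-model-to-file mapping explicit makes it harder to pair a checkpoint with the wrong SmolVLM2 variant. How the loaded weights are applied to the base model depends on the official Frames2LoRA code and is not covered by this sketch.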

Cite us

@misc{suri2026frames2loraparametricvideointernalization,
      title={Frames2LoRA: Parametric Video Internalization for Vision-Language Models},
      author={Manan Suri and Sarvesh Baskar and Dinesh Manocha},
      year={2026},
      eprint={2606.04351},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.04351},
}

