--- license: other library_name: transformers pipeline_tag: image-text-to-text tags: - multimodal - video-understanding - long-term-memory - agentic-memory - light-omni --- # Light-Omni Light-Omni is a multimodal agent framework for reflexive video understanding with long-term memory. It replaces costly detective-style iterative reasoning with dual contextual states: a compact global state consolidated from episodic memory, and a latent state that drives action control and semantically aligned retrieval. This repository hosts the Light-Omni model checkpoint for inference. It contains the safetensors weight shards, tokenizer files, model configuration, and multimodal preprocessor configuration files. ## Links - Project page: https://clare-nie.github.io/Light-Omni/ - Code: https://github.com/Clare-Nie/Light-Omni - Dataset: https://huggingface.co/datasets/ClareNie/Light-Omni-Training - Model: https://huggingface.co/ClareNie/Light-Omni - Paper: https://arxiv.org/abs/xxxx.xxxx ## Citation ```bibtex @inproceedings{nie2026lightomni, title={Light-Omni: Reflex over Reasoning in Agentic Video Understanding with Long-Term Memory}, author={Nie, Chang and Wei, Jiaju and Feng, Junlan and Fu, Chaoyou and Shan, Caifeng}, year={2026}, url={http://arxiv.org/abs/xxxx.xxxx} } ```