---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- multimodal
- video-understanding
- long-term-memory
- agentic-memory
- light-omni
---

# Light-Omni

Light-Omni is a multimodal agent framework for reflexive video understanding
with long-term memory. It replaces costly detective-style iterative reasoning
with dual contextual states: a compact global state consolidated from episodic
memory, and a latent state that drives action control and semantically aligned
retrieval.

This repository hosts the Light-Omni model checkpoint for inference. It contains
the safetensors weight shards, tokenizer files, model configuration, and
multimodal preprocessor configuration files.

## Links

- Project page: https://clare-nie.github.io/Light-Omni/
- Code: https://github.com/Clare-Nie/Light-Omni
- Dataset: https://huggingface.co/datasets/ClareNie/Light-Omni-Training
- Model: https://huggingface.co/ClareNie/Light-Omni
- Paper: https://arxiv.org/abs/xxxx.xxxx

## Citation

```bibtex
@inproceedings{nie2026lightomni,
  title={Light-Omni: Reflex over Reasoning in Agentic Video Understanding with Long-Term Memory},
  author={Nie, Chang and Wei, Jiaju and Feng, Junlan and Fu, Chaoyou and Shan,
  Caifeng},
  year={2026},
  url={http://arxiv.org/abs/xxxx.xxxx}
}
```