---
license: apache-2.0
library_name: transformers
pipeline_tag: video-text-to-text
---

# OmniVideo-30B (Qwen3-Omni)

[Project Page](https://yzlmhzz.github.io/OmniVideo-100K/) | [Paper](https://arxiv.org/abs/2606.14702) | [Code](https://github.com/MiG-NJU/OmniVideo-100K) | [Dataset](https://huggingface.co/datasets/MiG-NJU/OmniVideo-100K)

This is the fine-tuned **OmniVideo-30B** model, initialized from the official **Qwen3-Omni-30B-A3B-Instruct** and trained on the **[OmniVideo-100K](https://huggingface.co/datasets/MiG-NJU/OmniVideo-100K)** instruction-tuning dataset introduced in our paper: *"[OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains](https://arxiv.org/abs/2606.14702)"*.

## 🚀 Performance

| Model | OmniVideo-Test | Daily-Omni | OmniVideoBench | JointAVBench | FutureOmni | Video-MME (short) | Video-MME-v2 |
|---|---|---|---|---|---|---|---|
| Qwen3-Omni-30B-A3B-Instruct | 49.70 | 74.27 | 43.84 | 63.17 | 53.44 | 82.00 | 14.31 |
| OmniVideo-30B (Qwen3-Omni) | 63.56 (+13.86) | 76.61 (+2.34) | 44.81 (+0.97) | 66.37 (+3.20) | 57.60 (+4.16) | 83.56 (+1.56) | 15.33 (+1.02) |
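
The gain attached to each OmniVideo-30B score is the absolute difference from the Qwen3-Omni-30B-A3B-Instruct baseline on the same benchmark. As a minimal sketch for checking that arithmetic, the snippet below recomputes the gains from the scores in the table. The variable and function names (`baseline`, `finetuned`, `absolute_gains`) are illustrative only and are not part of any released code.

```python
# Scores copied from the performance table, in column order.
BENCHMARKS = [
    "OmniVideo-Test", "Daily-Omni", "OmniVideoBench", "JointAVBench",
    "FutureOmni", "Video-MME (short)", "Video-MME-v2",
]
baseline = [49.70, 74.27, 43.84, 63.17, 53.44, 82.00, 14.31]   # Qwen3-Omni-30B-A3B-Instruct
finetuned = [63.56, 76.61, 44.81, 66.37, 57.60, 83.56, 15.33]  # OmniVideo-30B (Qwen3-Omni)


def absolute_gains(base, tuned):
    """Return per-benchmark absolute gains (tuned - base), rounded to 2 decimals."""
    return [round(t - b, 2) for b, t in zip(base, tuned)]


gains = absolute_gains(baseline, finetuned)
for name, gain in zip(BENCHMARKS, gains):
    # The "+" format flag prints an explicit sign for each gain.
    print(f"{name:<18} {gain:+.2f}")
```

The gain is largest on OmniVideo-Test and smallest on OmniVideoBench.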