---
title: fastvlm-0.5b-unity
emoji: 🎬
colorFrom: blue
colorTo: green
sdk: static
pinned: false
license: mit
short_description: Real-time scene captioning with FastVLM ONNX on Unity Sentis
---

# FastVLM 0.5B for Unity Sentis

This repository is a Unity 6 + Sentis (`com.unity.ai.inference`) demo for FastVLM-based scene captioning.

## Demo (YouTube)

[![FastVLM Unity](https://img.youtube.com/vi/zyDNLEEXR0Q/0.jpg)](https://www.youtube.com/watch?v=zyDNLEEXR0Q)

## Environment

- **Unity Version**: `6000.3.6f1`
- **Sentis Version**: `com.unity.ai.inference 2.5.0` (customized)
- **Custom layers added to the ONNX converter**: `RotaryEmbedding`, `GroupQueryAttention`, `SimplifiedLayerNormalization`, `SkipSimplifiedLayerNormalization`
- **Implementation file**: `fastvlm-0.5b-unity/Packages/com.unity.ai.inference/Editor/ONNX/ONNXModelConverter.cs`

## Project Structure

- `Assets/FastVLM/FastVLMScene.unity`: Main runtime scene
- `Assets/FastVLM/VLMController.cs`: VideoPlayer-UI bridge and continuous inference loop
- `Assets/FastVLM/ModelVLM.cs`: Model initialization, vision/text embedding composition, and generation
- `Assets/FastVLM/Qwen2Tokenizer.cs`: Qwen2 BPE tokenizer
- `Assets/StreamingAssets/fastvlm/`: `vocab.json`, `merges.txt`, `tokenizer_config.json`

## Required Model Files

Prepare the ONNX files below in `Assets/FastVLM/Models/` and assign them to the `ModelVLM` component in `VLMManager`.

Source models: https://huggingface.co/onnx-community/FastVLM-0.5B-ONNX/tree/main/onnx

Download the three files below from the link above, then copy them into `Assets/FastVLM/Models/`.

- `vision_encoder.onnx`
- `embed_tokens.onnx`
- `decoder_model_merged.onnx`

## Quick Start

1. Open the project in Unity `6000.3.6f1`.
2. Open `Assets/FastVLM/FastVLMScene.unity`.
3. Check `VLMManager > ModelVLM` and verify all `ModelAsset` fields are assigned.
4. Hit Play.
5. Edit the prompt in `InputField` if needed. The next loop uses the updated prompt.

## License

MIT
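
## Appendix: Checking the Model Files

Before opening the project, it can help to confirm that the three ONNX files named under Required Model Files are in place. The script below is a minimal sketch and is not part of this repository. The helper names `model_urls` and `missing_models` are hypothetical, and the direct-download URLs assume Hugging Face's usual `resolve/main` pattern for the repository linked above. It only reports what is missing and does not download anything.

```python
from pathlib import Path

# Assumption: direct-download base URL, derived from the "tree/main/onnx"
# link in this README using Hugging Face's usual "resolve" URL pattern.
REPO_BASE = "https://huggingface.co/onnx-community/FastVLM-0.5B-ONNX/resolve/main/onnx"

# File names exactly as listed in the Required Model Files section.
REQUIRED_MODELS = (
    "vision_encoder.onnx",
    "embed_tokens.onnx",
    "decoder_model_merged.onnx",
)


def model_urls(base=REPO_BASE):
    """Map each required file name to its assumed download URL."""
    return {name: f"{base}/{name}" for name in REQUIRED_MODELS}


def missing_models(models_dir):
    """Return the required file names that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_MODELS if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the Unity project root.
    missing = missing_models("Assets/FastVLM/Models")
    urls = model_urls()
    if not missing:
        print("All required ONNX files are present.")
    for name in missing:
        print(f"Missing {name}; expected source: {urls[name]}")
```

Run it from the project root. Any file it reports as missing still has to be downloaded and then assigned to the `ModelVLM` component as described in Quick Start.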