---
library_name: transformers
license: cc-by-nc-sa-4.0
tags: []
---

## Model Details

- Model: ReasonSigLIP2-So14-384-S0-Rea
- Base model: [google/siglip2-so400m-patch14-384](https://huggingface.co/google/siglip2-so400m-patch14-384)
- Architecture: SigLIP2 ViT-So400M/14
- Image resolution: 384
- Training stage: Stage 0 - Reasoning
- Training data: Only reasoning caption-image pairs from [ReasonLite-42M](https://huggingface.co/datasets/RISys-Lab/ReasonCLIPLite-42M) and [ReasonPro-16M](https://huggingface.co/datasets/RISys-Lab/ReasonCLIPPro-16M)

## Method

![Method overview](https://raw.githubusercontent.com/RISys-Lab/ReasonCLIP/main/doc/method.png)

## Resources

- GitHub: [RISys-Lab/ReasonCLIP](https://github.com/RISys-Lab/ReasonCLIP)
- Paper: [arXiv:2606.26794](https://arxiv.org/abs/2606.26794)

## Usage

```python
from transformers import AutoModel, AutoProcessor

model_id = "RISys-Lab/ReasonSigLIP2-So14-384-S0-Rea"
model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
```

For the full checkpoint list, see the [ReasonCLIP model card](https://github.com/RISys-Lab/ReasonCLIP/blob/main/doc/model_card.md).
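
Once the model and processor are loaded, a typical next step is scoring image-text pairs. SigLIP-family models score each pair independently with a sigmoid over a scaled and shifted cosine similarity, rather than a softmax over all candidates. The sketch below illustrates that scoring step in plain Python on toy vectors. The helper names (`l2_normalize`, `pairwise_sigmoid_scores`), the toy embeddings, and the `logit_scale` and `logit_bias` values are all illustrative assumptions, not values or code from this checkpoint.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def pairwise_sigmoid_scores(image_embeds, text_embeds, logit_scale, logit_bias):
    """Return a [num_images][num_texts] matrix of match probabilities.

    logit_scale is assumed to be the multiplier already applied to the cosine
    similarity (not its logarithm); logit_bias is an additive shift.
    """
    images = [l2_normalize(v) for v in image_embeds]
    texts = [l2_normalize(v) for v in text_embeds]
    scores = []
    for img in images:
        row = []
        for txt in texts:
            cosine = sum(a * b for a, b in zip(img, txt))
            logit = logit_scale * cosine + logit_bias
            # Each pair gets its own sigmoid; rows need not sum to 1.
            row.append(1.0 / (1.0 + math.exp(-logit)))
        scores.append(row)
    return scores


# Toy 3-dimensional embeddings (made-up values, not real model output).
toy_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_texts = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]

# Hypothetical scale and bias chosen only to make the example readable.
scores = pairwise_sigmoid_scores(toy_images, toy_texts, logit_scale=10.0, logit_bias=-5.0)
for row in scores:
    print([round(p, 4) for p in row])
```

With the real checkpoint, the embeddings would presumably come from the model's image and text feature methods (for example `get_image_features` and `get_text_features` on SigLIP models in `transformers`), and the scale and bias from the model's learned parameters. Check the `transformers` documentation for the exact names before relying on them.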