Latent Inspector — ONNX Vision Encoders

abdelstark's Collections

updated Apr 20

Parity-validated ONNX exports of self-supervised vision encoders (V-JEPA 2, EUPE) for the latent-inspector Rust CLI.
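"Parity-validated" presumably means each ONNX export was compared against the original model's output on the same input. A minimal sketch of such a check, assuming both embeddings have been flattened to plain lists of floats. The helper name `parity_report` and the `1e-4` tolerance are illustrative assumptions, not values taken from latent-inspector.

```python
import math
from typing import Sequence


def parity_report(reference: Sequence[float], exported: Sequence[float],
                  atol: float = 1e-4) -> dict:
    """Compare two flattened embeddings (hypothetical helper, not a latent-inspector API)."""
    if len(reference) != len(exported) or len(reference) == 0:
        raise ValueError("embeddings must be non-empty and of equal length")

    # Largest element-wise deviation between the two outputs.
    max_abs_diff = max(abs(r - e) for r, e in zip(reference, exported))

    # Cosine similarity as a scale-insensitive second signal.
    dot = sum(r * e for r, e in zip(reference, exported))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_exp = math.sqrt(sum(e * e for e in exported))
    cosine = dot / (norm_ref * norm_exp) if norm_ref and norm_exp else float("nan")

    return {
        "max_abs_diff": max_abs_diff,
        "cosine": cosine,
        "passed": max_abs_diff <= atol,  # atol is an assumed tolerance
    }
```

An absolute tolerance catches numerical drift in individual values, while the cosine score flags outputs that differ in direction even when the individual differences look small.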

abdelstark/vjepa2-vitl-img16-256-onnx

Image Feature Extraction • Updated Apr 20

Note: V-JEPA 2 ViT-L/16 with image-native I/O. It takes a [1,3,256,256] input and handles the 16-frame tubelet internally. Cleanest drop-in for cross-encoder comparison against DINOv2, I-JEPA, and EUPE.
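A minimal sketch of feeding this model a [1,3,256,256] tensor with ONNX Runtime from Python. It assumes `numpy` and `onnxruntime` are installed, that the graph has a single float32 input, and that `model_path` points at a locally downloaded copy of the ONNX file; none of these are confirmed by the model card. The function name `run_encoder` is hypothetical.

```python
from math import prod

# Input layout stated in the note: batch, channels, height, width.
INPUT_SHAPE = (1, 3, 256, 256)
NUM_VALUES = prod(INPUT_SHAPE)  # number of scalar values in one input tensor


def run_encoder(model_path: str):
    """Run one forward pass on a random image-shaped tensor and return output shapes.

    Imports are deferred so this sketch can be loaded without the
    optional dependencies installed.
    """
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    # Read the input name from the graph rather than guessing it.
    input_name = session.get_inputs()[0].name

    # Assumed dtype: float32. Real use would pass a normalized image instead.
    dummy = np.random.rand(*INPUT_SHAPE).astype(np.float32)

    outputs = session.run(None, {input_name: dummy})
    return [out.shape for out in outputs]
```

Printing the returned shapes is a quick way to confirm the token count and embedding width before wiring the model into a comparison pipeline.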

abdelstark/vjepa2-vitl-fpc2-256-onnx

Image Feature Extraction • Updated Apr 3

Note: V-JEPA 2 ViT-L/16, minimal 2-frame variant. Use it when you already have video tensors. 304M parameters, 1024-dim embeddings, 256 patch tokens.

abdelstark/eupe-vit-b16-onnx

Image Feature Extraction • Updated Apr 8

Note: EUPE (Efficient Universal Perception Encoder) ViT-B/16 with 86M parameters, 768-dim embeddings, and 197 tokens including CLS. Corrected export using the legacy TorchScript exporter after the torch.export path failed on upstream decomposition.
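The token counts quoted in the notes follow from standard ViT patch arithmetic. A small sketch, assuming a tubelet size of 2 for the 2-frame V-JEPA 2 variant and a 224x224 input for EUPE; neither value is stated on this page, and `vit_token_count` is a hypothetical helper.

```python
def vit_token_count(image_size: int, patch_size: int, frames: int = 1,
                    tubelet_size: int = 1, cls_token: bool = False) -> int:
    """Number of tokens a ViT produces for a square image or short clip."""
    if image_size % patch_size or frames % tubelet_size:
        raise ValueError("sizes must divide evenly into patches and tubelets")
    spatial = (image_size // patch_size) ** 2   # patches per frame group
    temporal = frames // tubelet_size           # frame groups per clip
    return spatial * temporal + (1 if cls_token else 0)


# V-JEPA 2 ViT-L/16, 2 frames at 256x256 (tubelet size 2 is an assumption).
vjepa2_tokens = vit_token_count(256, 16, frames=2, tubelet_size=2)

# EUPE ViT-B/16 with a CLS token (224x224 input is an assumption).
eupe_tokens = vit_token_count(224, 16, cls_token=True)

print(vjepa2_tokens, eupe_tokens)  # prints: 256 197
```

Under these assumptions the results match the 256 patch tokens and 197 tokens listed in the notes.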