---
license: apache-2.0
base_model: facebook/sam2.1-hiera-small
tags:
  - robotics
  - edge-deployment
  - anima
  - forge
  - int8
  - quantized
  - sam2
  - segmentation
  - image-segmentation
  - video-segmentation
  - ros2
  - jetson
  - real-time
  - vision
library_name: transformers
pipeline_tag: image-segmentation
model-index:
  - name: sam2.1-hiera-small-int8
    results:
      - task:
          type: image-segmentation
        metrics:
          - name: Model Size (MB)
            type: model_size
            value: 186
          - name: Compression Ratio
            type: compression
            value: 1.9
          - name: Original Size (MB)
            type: original_size
            value: 352
---

# SAM 2.1 Hiera-Small: INT8 Quantized

> Meta's Segment Anything Model 2.1 (Hiera-Small backbone) quantized to INT8 for real-time robotic segmentation. **1.9x smaller** (352 MB down to 186 MB), with both image and video segmentation capabilities preserved.

This model is part of the **[RobotFlowLabs](https://huggingface.co/robotflowlabs)** model library, built for the **ANIMA** agentic robotics platform, a modular ROS2-native AI system that brings foundation model intelligence to real robots operating in the real world.

## Why This Model Exists

SAM2 is the state of the art in promptable segmentation: given a point, box, or mask prompt, it segments any object in an image or tracks it through video. The Hiera-Small variant is the **production sweet spot**: fast enough for real-time robotics, accurate enough for manipulation tasks, and, at 186 MB, small enough to leave room for depth estimation, feature extraction, and action models on the same edge GPU.

## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | Hiera-Small vision backbone + SAM2 decoder |
| **Input Resolution** | 1024 × 1024 |
| **Capabilities** | Image segmentation, video object tracking |
| **Backbone Stages** | 4 stages: [1, 2, 11, 2] blocks |
| **Embed Dims** | [96, 192, 384, 768] per stage |
| **Attention Heads** | [1, 2, 4, 8] per stage |
| **Global Attention** | Blocks 7, 10, 13 |
| **Mask Decoder** | 256-dim hidden, 8 attention heads, 3 multi-mask outputs |
| **Memory Attention** | 4 layers, 2048-dim FFN, RoPE positional encoding |
| **Memory Bank** | 7 frames temporal context |
| **Original Model** | [`facebook/sam2.1-hiera-small`](https://huggingface.co/facebook/sam2.1-hiera-small) |
| **License** | Apache-2.0 |

## Compression Results

Quantized on an NVIDIA L4 24GB GPU using INT8 dynamic quantization with SafeTensors export.

| Metric | Original | INT8 Quantized | Change |
|--------|----------|----------------|--------|
| **Total Size** | 352 MB | 186 MB | **1.9x smaller** |
| **INT8 Weights** | n/a | 39 MB | Quantized linear layers |
| **SafeTensors** | n/a | 148 MB | Full model weights |
| **Quantization** | FP32 | INT8 Dynamic | Per-tensor symmetric |
| **Format** | PyTorch | SafeTensors + INT8 .pt | Dual format |

> **Why SafeTensors instead of ONNX?** SAM2 uses custom CUDA operations (roi_align, deformable attention) that aren't supported by the ONNX standard. SafeTensors provides fast, safe loading directly into PyTorch with zero-copy memory mapping.

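The "per-tensor symmetric" scheme listed in the Compression Results table can be illustrated with a minimal pure-Python sketch. This shows the general technique only (one shared scale per tensor, zero-point fixed at 0) and is not the `torchao` implementation used to produce this checkpoint; the function names and sample weights below are made up for illustration.

```python
def quantize_symmetric_int8(weights):
    """Per-tensor symmetric INT8: one scale for the whole tensor, zero-point = 0."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto 127; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from INT8 codes and the shared scale."""
    return [v * scale for v in q]


# Hypothetical weights standing in for one linear layer's tensor.
weights = [0.50, -1.27, 0.013, 0.0, 1.0]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)

# Worst-case reconstruction error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)  # [50, -127, 1, 0, 100]
```

Each quantized weight occupies 1 byte instead of the 4 bytes of an FP32 value, plus a single shared scale per tensor, which is where the size reduction of the quantized linear layers comes from. The price is the rounding error visible in `max_error`, consistent with the boundary-precision caveat listed under Limitations.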
## Included Files

```
sam2.1-hiera-small-int8/
├── model_int8.pt              # 39 MB, INT8 quantized state dict
├── model.safetensors          # 148 MB, full model in SafeTensors format
├── config.json                # Model configuration
├── preprocessor_config.json   # Image preprocessing config
└── README.md                  # This file
```

## Quick Start

### PyTorch (SafeTensors)

```python
from PIL import Image
from transformers import Sam2Model, Sam2Processor
import torch

# Load with SafeTensors (automatic)
model = Sam2Model.from_pretrained("robotflowlabs/sam2.1-hiera-small-int8")
processor = Sam2Processor.from_pretrained("facebook/sam2.1-hiera-small")
model.to("cuda").eval()

# Placeholder path: replace with your own RGB image
image = Image.open("scene.jpg").convert("RGB")

# Segment with a point prompt. Sam2Processor expects 4 nesting levels for
# points (image, object, point, xy) and 3 for labels (image, object, point).
inputs = processor(
    images=image,
    input_points=[[[[500, 375]]]],  # one (x, y) click
    input_labels=[[[1]]],           # 1 = foreground click
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image size
masks = processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"]
)
```

### INT8 Weights (Maximum Compression)

```python
import torch
from transformers import Sam2Model

# Load architecture, then apply INT8 weights
model = Sam2Model.from_pretrained("facebook/sam2.1-hiera-small")
int8_state = torch.load("model_int8.pt", map_location="cuda", weights_only=True)
model.load_state_dict(int8_state, strict=False)
```

### With FORGE (ANIMA Integration)

```python
from forge.vision import VisionEncoderRegistry

# FORGE handles optimal loading and batching
segmenter = VisionEncoderRegistry.load("sam2.1-hiera-small-int8")
masks = segmenter.segment(image, points=[[500, 375]])
```

## Use Cases in ANIMA

SAM2-Small is the **default segmentation backbone** for production ANIMA deployments:

- **Object Isolation**: Segment graspable objects from cluttered scenes for manipulation planning
- **Workspace Mapping**: Identify free space, obstacles, and surfaces for navigation
- **Video Tracking**: Track objects across frames during manipulation sequences (7-frame temporal memory)
- **Safety Zones**: Segment human body parts and keep-out regions for safe human-robot collaboration
- **Bin Picking**: Segment individual parts from a bin for industrial pick-and-place

## SAM2 Model Family

We provide three SAM2.1 variants, optimized for different deployment scenarios:

| Model | Size | Trade-off | Best For |
|-------|------|-----------|----------|
| [sam2.1-hiera-large-int8](https://huggingface.co/robotflowlabs/sam2.1-hiera-large-int8) | 1.0 GB | Highest quality | Research, high-accuracy tasks |
| **[sam2.1-hiera-small-int8](https://huggingface.co/robotflowlabs/sam2.1-hiera-small-int8)** | **186 MB** | **Balanced** | **Production robotics** |
| [sam2.1-hiera-tiny-int8](https://huggingface.co/robotflowlabs/sam2.1-hiera-tiny-int8) | 152 MB | Fastest | Real-time edge, Jetson Nano |

## Intended Use

### Designed For

- Promptable segmentation in robotic manipulation pipelines
- Video object tracking during multi-step tasks
- Instance segmentation for bin picking and object isolation
- Real-time scene parsing on edge GPUs (Jetson Orin, L4)

### Limitations

- INT8 quantization may slightly reduce mask boundary precision on very fine structures
- Video tracking requires sequential frame processing (not parallelizable)
- Requires a prompt (point, box, or mask); this is not a panoptic segmenter
- Inherits biases from the SA-V dataset (primarily indoor/outdoor natural scenes)

### Out of Scope

- Medical image segmentation without domain-specific validation
- Autonomous driving perception (not trained on driving data)
- Surveillance or tracking of individuals

## Technical Details

### Compression Pipeline

```
Original SAM2.1 Hiera-Small (FP32, 352 MB)
    │
    ├─→ torchao INT8 dynamic quantization (GPU-native)
    │   └─→ model_int8.pt (39 MB)
    │
    └─→ SafeTensors export (roi_align not ONNX-compatible)
        └─→ model.safetensors (148 MB)
```

- **Quantization**: INT8 dynamic activation + INT8 weight via `torchao` on NVIDIA L4 GPU
- **Export**: SafeTensors format (zero-copy memory mapping, fast loading, framework-agnostic)
- **Why not ONNX**: SAM2's roi_align and deformable attention are custom CUDA ops that ONNX opset 18 cannot represent
- **Hardware**: NVIDIA L4 24GB, CUDA 13.0, PyTorch 2.10, Python 3.14

## Attribution

- **Original Model**: [`facebook/sam2.1-hiera-small`](https://huggingface.co/facebook/sam2.1-hiera-small) by Meta AI (FAIR)
- **License**: [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0), free for commercial and research use
- **Paper**: [SAM 2: Segment Anything in Images and Videos](https://arxiv.org/abs/2408.00714) (Ravi et al., 2024)
- **Dataset**: SA-V (50.9K videos, 642.6K masklets)
- **Compressed by**: [RobotFlowLabs](https://huggingface.co/robotflowlabs) using [FORGE](https://github.com/robotflowlabs/forge)

## Citation

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
```

```bibtex
@misc{robotflowlabs2026anima,
  title={ANIMA: Agentic Networked Intelligence for Modular Autonomy},
  author={RobotFlowLabs},
  year={2026},
  url={https://huggingface.co/robotflowlabs}
}
```

---
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.