---
license: apache-2.0
base_model: facebook/sam2.1-hiera-small
tags:
  - robotics
  - edge-deployment
  - anima
  - forge
  - int8
  - quantized
  - sam2
  - segmentation
  - image-segmentation
  - video-segmentation
  - ros2
  - jetson
  - real-time
  - vision
library_name: transformers
pipeline_tag: image-segmentation
model-index:
  - name: sam2.1-hiera-small-int8
    results:
      - task:
          type: image-segmentation
        metrics:
          - name: Model Size (MB)
            type: model_size
            value: 186
          - name: Compression Ratio
            type: compression
            value: 1.9
          - name: Original Size (MB)
            type: original_size
            value: 352
---

# SAM 2.1 Hiera-Small — INT8 Quantized

> Meta's Segment Anything Model 2.1 (Hiera-Small backbone) quantized to INT8 for real-time robotic segmentation. **1.9x smaller** — from 352 MB to 186 MB — with both image and video segmentation capabilities preserved.

This model is part of the **[RobotFlowLabs](https://huggingface.co/robotflowlabs)** model library, built for the **ANIMA** agentic robotics platform — a modular ROS2-native AI system that brings foundation model intelligence to real robots operating in the real world.

## Why This Model Exists

SAM 2 is a state-of-the-art model for promptable segmentation: given a point, box, or mask prompt, it segments any object in an image or tracks it through a video. The Hiera-Small variant is the **production sweet spot**: fast enough for real-time robotics, accurate enough for manipulation tasks, and small enough at 186 MB to leave room for depth estimation, feature extraction, and action models on the same edge GPU.

## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | Hiera-Small vision backbone + SAM2 decoder |
| **Input Resolution** | 1024 × 1024 |
| **Capabilities** | Image segmentation, video object tracking |
| **Backbone Stages** | 4 stages: [1, 2, 11, 2] blocks |
| **Embed Dims** | [96, 192, 384, 768] per stage |
| **Attention Heads** | [1, 2, 4, 8] per stage |
| **Global Attention** | Blocks 7, 10, 13 |
| **Mask Decoder** | 256-dim hidden, 8 attention heads, 3 multi-mask outputs |
| **Memory Attention** | 4 layers, 2048-dim FFN, RoPE positional encoding |
| **Memory Bank** | 7 frames temporal context |
| **Original Model** | [`facebook/sam2.1-hiera-small`](https://huggingface.co/facebook/sam2.1-hiera-small) |
| **License** | Apache-2.0 |
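
To make the backbone layout concrete, the sketch below maps the global-attention blocks onto the four stages. It assumes block indices are zero-based and counted across the whole backbone, which the table does not state explicitly; `stage_of` is a hypothetical helper, not part of any library.

```python
from bisect import bisect_right
from itertools import accumulate

# Values copied from the Model Details table.
BLOCKS_PER_STAGE = [1, 2, 11, 2]
EMBED_DIMS = [96, 192, 384, 768]
GLOBAL_ATTENTION_BLOCKS = [7, 10, 13]

# Cumulative block counts give the exclusive end index of each stage: [1, 3, 14, 16].
STAGE_ENDS = list(accumulate(BLOCKS_PER_STAGE))


def stage_of(block_index: int) -> int:
    """Return the zero-based stage containing a zero-based block index (assumed indexing)."""
    if not 0 <= block_index < STAGE_ENDS[-1]:
        raise IndexError(f"block {block_index} is outside 0..{STAGE_ENDS[-1] - 1}")
    return bisect_right(STAGE_ENDS, block_index)


for block in GLOBAL_ATTENTION_BLOCKS:
    stage = stage_of(block)
    print(f"block {block}: stage {stage + 1}, embed dim {EMBED_DIMS[stage]}")
```

Under that indexing assumption, all three global-attention blocks fall in the 11-block third stage, where the embedding dimension is 384.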

## Compression Results

Quantized on an NVIDIA L4 24GB GPU using INT8 dynamic quantization with SafeTensors export.

| Metric | Original | INT8 Quantized | Change |
|--------|----------|----------------|--------|
| **Total Size** | 352 MB | 186 MB | **1.9x smaller** |
| **INT8 Weights** | — | 39 MB | Quantized linear layers |
| **SafeTensors** | — | 148 MB | Full model weights |
| **Quantization** | FP32 | INT8 Dynamic | Per-tensor symmetric |
| **Format** | PyTorch | SafeTensors + INT8 .pt | Dual format |
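
As a rough illustration of what "per-tensor symmetric" means in the table above, the following pure-Python sketch quantizes a small weight list with a single scale and a zero-point fixed at 0. It is a simplified stand-in for intuition only, not the code path used to produce these files; the function names and sample weights are made up for the example.

```python
def quantize_per_tensor(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8: one scale for the whole tensor, zero-point fixed at 0."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.50, -1.27, 0.02, 0.90]  # illustrative values only
codes, scale = quantize_per_tensor(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The round-trip error is bounded by half the scale, which is why a single outlier weight (a large `max_abs`) costs precision for every other value in the tensor.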

> **Why SafeTensors instead of ONNX?** SAM2 uses custom CUDA operations (roi_align, deformable attention) that aren't supported by the ONNX standard. SafeTensors provides fast, safe loading directly into PyTorch with zero-copy memory mapping.

## Included Files

```
sam2.1-hiera-small-int8/
├── model_int8.pt              # 39 MB — INT8 quantized state dict
├── model.safetensors          # 148 MB — Full model in SafeTensors format
├── config.json                # Model configuration
├── preprocessor_config.json   # Image preprocessing config
└── README.md                  # This file
```

## Quick Start

### PyTorch (SafeTensors)

```python
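import torch
from PIL import Image
from transformers import Sam2Model, Sam2Processor

# Load the SafeTensors weights (picked up automatically by from_pretrained)
model = Sam2Model.from_pretrained("robotflowlabs/sam2.1-hiera-small-int8")
processor = Sam2Processor.from_pretrained("facebook/sam2.1-hiera-small")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

image = Image.open("scene.jpg").convert("RGB")  # placeholder path: use your own image

# Segment with a single positive point prompt.
# Sam2Processor expects points nested as [image][object][point][x, y].
inputs = processor(
    images=image,
    input_points=[[[[500, 375]]]],
    input_labels=[[[1]]],  # 1 = foreground click
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image size
masks = processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"],
)[0]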
```

### INT8 Weights (Maximum Compression)

```python
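import torch
from transformers import Sam2Model

# Load the FP32 architecture, then swap in the INT8 weights
model = Sam2Model.from_pretrained("facebook/sam2.1-hiera-small")
int8_state = torch.load("model_int8.pt", map_location="cuda", weights_only=True)

# assign=True replaces the FP32 parameters with the loaded tensors instead of
# copying into them, which torchao recommends for quantized checkpoints.
# strict=False hides key mismatches, so inspect the result before trusting it.
result = model.load_state_dict(int8_state, strict=False, assign=True)
print("missing:", len(result.missing_keys), "unexpected:", len(result.unexpected_keys))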
```

### With FORGE (ANIMA Integration)

```python
from forge.vision import VisionEncoderRegistry

# FORGE handles optimal loading and batching
segmenter = VisionEncoderRegistry.load("sam2.1-hiera-small-int8")
masks = segmenter.segment(image, points=[[500, 375]])
```

## Use Cases in ANIMA

SAM2-Small is the **default segmentation backbone** for production ANIMA deployments:

- **Object Isolation** — Segment graspable objects from cluttered scenes for manipulation planning
- **Workspace Mapping** — Identify free space, obstacles, and surfaces for navigation
- **Video Tracking** — Track objects across frames during manipulation sequences (7-frame temporal memory)
- **Safety Zones** — Segment human body parts and keep-out regions for safe human-robot collaboration
- **Bin Picking** — Segment individual parts from a bin for industrial pick-and-place
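
The 7-frame temporal memory mentioned under Video Tracking behaves like a fixed-size first-in, first-out buffer. The sketch below shows only that eviction behavior; `FrameMemoryBank` is a hypothetical class written for illustration, and the real SAM 2 memory bank also keeps prompted frames and object pointers, which are omitted here.

```python
from collections import deque


class FrameMemoryBank:
    """Keeps features for the most recent frames; the oldest entry is evicted first."""

    def __init__(self, capacity: int = 7) -> None:
        # capacity defaults to the 7-frame temporal context from the Model Details table
        self._frames: deque = deque(maxlen=capacity)

    def add(self, frame_index: int, features: object) -> None:
        # Appending to a full deque silently drops the oldest frame.
        self._frames.append((frame_index, features))

    def frame_indices(self) -> list[int]:
        return [index for index, _ in self._frames]


bank = FrameMemoryBank()
for frame_index in range(10):
    bank.add(frame_index, features=f"features-{frame_index}")  # stand-in for real embeddings

print(bank.frame_indices())  # frames 3 through 9 remain
```

For manipulation sequences this means an object occluded for longer than the memory window may need to be re-prompted.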

## SAM2 Model Family

We provide three SAM 2.1 variants, optimized for different deployment scenarios:

| Model | Size | Speed / Quality | Best For |
|-------|------|-------|----------|
| [sam2.1-hiera-large-int8](https://huggingface.co/robotflowlabs/sam2.1-hiera-large-int8) | 1.0 GB | Highest quality | Research, high-accuracy tasks |
| **[sam2.1-hiera-small-int8](https://huggingface.co/robotflowlabs/sam2.1-hiera-small-int8)** | **186 MB** | **Balanced** | **Production robotics** |
| [sam2.1-hiera-tiny-int8](https://huggingface.co/robotflowlabs/sam2.1-hiera-tiny-int8) | 152 MB | Fastest | Real-time edge, Jetson Nano |

## Intended Use

### Designed For
- Promptable segmentation in robotic manipulation pipelines
- Video object tracking during multi-step tasks
- Instance segmentation for bin picking and object isolation
- Real-time scene parsing on edge GPUs (Jetson Orin, L4)

### Limitations
- INT8 quantization may slightly reduce mask boundary precision on very fine structures
- Video tracking processes frames sequentially, since each frame conditions on memory from earlier frames, so a single stream cannot be parallelized across frames
- Requires a prompt (point, box, or mask) — not a panoptic segmenter
- Inherits biases from the SA-V dataset (primarily indoor/outdoor natural scenes)
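
Because the model always needs a prompt, one common workaround for prompt-free scene parsing is to query it with a regular grid of points and merge the resulting masks afterwards. The helper below only generates the grid; `point_grid` is a hypothetical function, not part of `transformers` or FORGE, and the mask deduplication step is not shown.

```python
def point_grid(width: int, height: int, points_per_side: int) -> list[list[int]]:
    """Return evenly spaced (x, y) pixel coordinates, centered within each grid cell."""
    if points_per_side < 1:
        raise ValueError("points_per_side must be at least 1")
    step_x = width / points_per_side
    step_y = height / points_per_side
    return [
        [int((col + 0.5) * step_x), int((row + 0.5) * step_y)]
        for row in range(points_per_side)
        for col in range(points_per_side)
    ]


# A 4 x 4 grid over a 1024 x 1024 image yields 16 candidate prompts.
prompts = point_grid(1024, 1024, points_per_side=4)
print(len(prompts), prompts[0], prompts[-1])
```

Each grid point would be submitted as its own object prompt, so the cost grows with the square of `points_per_side`; a coarse grid is usually the practical choice on edge hardware.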

### Out of Scope
- Medical image segmentation without domain-specific validation
- Autonomous driving perception (not trained on driving data)
- Surveillance or tracking of individuals

## Technical Details

### Compression Pipeline

```
Original SAM2.1 Hiera-Small (FP32, 352 MB)
    │
    ├─→ torchao INT8 dynamic quantization (GPU-native)
    │   └─→ model_int8.pt (39 MB)
    │
    └─→ SafeTensors export (roi_align not ONNX-compatible)
        └─→ model.safetensors (148 MB)
```

- **Quantization**: INT8 dynamic activation + INT8 weight via `torchao` on NVIDIA L4 GPU
- **Export**: SafeTensors format — zero-copy memory mapping, fast loading, framework-agnostic
- **Why not ONNX**: SAM2's roi_align and deformable attention are custom CUDA ops that ONNX opset 18 cannot represent
- **Hardware**: NVIDIA L4 24GB, CUDA 13.0, PyTorch 2.10, Python 3.14
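
"INT8 dynamic activation + INT8 weight" means the weights are quantized once ahead of time, while the activation scale is computed from each input at inference time. The pure-Python sketch below illustrates that split on a single dot product. It is a conceptual toy under simplifying assumptions (one scale per vector, no bias), and the actual `torchao` kernels may use different granularity and rounding.

```python
def quantize(values: list[float]) -> tuple[list[int], float]:
    """Symmetric INT8 quantization with a single scale derived from the data."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale


def dynamic_int8_dot(activations: list[float], w_codes: list[int], w_scale: float) -> float:
    """Dot product with pre-quantized weights and activations quantized at call time."""
    a_codes, a_scale = quantize(activations)  # scale depends on this particular input
    accumulator = sum(a * w for a, w in zip(a_codes, w_codes))  # integer arithmetic
    return accumulator * a_scale * w_scale  # rescale the integer result to float


weights = [0.5, -0.25, 1.0]  # illustrative values only
w_codes, w_scale = quantize(weights)  # done once, offline

activations = [2.0, 4.0, -1.0]
exact = sum(a * w for a, w in zip(activations, weights))
approx = dynamic_int8_dot(activations, w_codes, w_scale)
print(exact, approx)
```

No calibration dataset is needed with this scheme, since activation ranges are measured on the fly, at the cost of a small amount of extra work per forward pass.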

## Attribution

- **Original Model**: [`facebook/sam2.1-hiera-small`](https://huggingface.co/facebook/sam2.1-hiera-small) by Meta AI (FAIR)
- **License**: [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) — free for commercial and research use
- **Paper**: [SAM 2: Segment Anything in Images and Videos](https://arxiv.org/abs/2408.00714) — Ravi et al., 2024
- **Dataset**: SA-V — 50.9K videos, 642.6K masklets
- **Compressed by**: [RobotFlowLabs](https://huggingface.co/robotflowlabs) using [FORGE](https://github.com/robotflowlabs/forge)

## Citation

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
```

```bibtex
@misc{robotflowlabs2026anima,
  title={ANIMA: Agentic Networked Intelligence for Modular Autonomy},
  author={RobotFlowLabs},
  year={2026},
  url={https://huggingface.co/robotflowlabs}
}
```

---

<p align="center">
  <b>Built with FORGE by <a href="https://huggingface.co/robotflowlabs">RobotFlowLabs</a></b><br>
  Optimizing foundation models for real robots.
</p>