Embedl SAM 3D Body (Quantized)

Deployable version of the DINOv3 ViT-H+/16 vision backbone from facebook/sam-3d-body-dinov3 — the 840.6 M-parameter image encoder that dominates the cost of Meta's single-image full-body human-mesh recovery model. Mixed-precision INT8/FP16 quantization with hardware-aware optimizations from embedl-deploy for low-latency NVIDIA TensorRT inference.

3D Human Mesh Recovery

The deployed backbone drives the full SAM-3D-Body pipeline (decoder + MHR parametric mesh head) to recover a 3D human body from a single image. Below is our INT8-quantized backbone applied to the Gothenburg Poseidon (Carl Milles): input → mesh overlay → ¾ view → side view. The INT8 mesh stays within ~0.5 % (mean ≈ 10 mm) of the FP32 result. Reproduce it with demo_3d.py (see "Running the full 3D-mesh demo" below).

[Figure: rotating recovered mesh]

[Figure: backbone features (PCA)]

[Figure: NVIDIA L4]


Highlights

Format: ONNX with external weights (embedl_sam3dbody_int8.onnx + .onnx.data) plus a torch.export graph (embedl_sam3dbody_int8.pt2).
Precision: INT8 with sensitive layers (the 3-channel patch-embed conv and LayerNorms) kept in FP16.
Runtime: TensorRT (FP16 + INT8 mode); the .pt2 loads directly with torch.export.load.
Input: a single (1, 3, 512, 512) ImageNet-normalized person crop → (1, 1280, 32, 32) feature map (the production SAM-3D-Body resolution).
5.1× faster and 3.3× smaller than FP32 on an NVIDIA L4; recovered 3D mesh within ~0.5 % of the FP32 result.
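
The input contract in the list above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code shipped in infer_trt.py or infer_pt2.py: it assumes the standard ImageNet mean/std constants, an RGB crop that has already been resized to 512×512, and a patch size of 16 (from the "/16" in the backbone name). The helper names preprocess and feature_map_shape are hypothetical.

```python
import numpy as np

# Standard ImageNet statistics (RGB order, for pixel values scaled to [0, 1]).
# Assumption: the shipped scripts use these same constants.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(crop_rgb: np.ndarray) -> np.ndarray:
    """Turn a (512, 512, 3) uint8 RGB crop into a (1, 3, 512, 512) float32 array."""
    if crop_rgb.shape != (512, 512, 3):
        raise ValueError(f"expected a (512, 512, 3) crop, got {crop_rgb.shape}")
    x = crop_rgb.astype(np.float32) / 255.0    # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD     # per-channel normalization
    # HWC -> CHW, then add the batch dimension.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])


def feature_map_shape(side: int = 512, patch: int = 16, dim: int = 1280):
    """Expected output shape: one dim-channel feature vector per patch."""
    grid = side // patch
    return (1, dim, grid, grid)
```

With the defaults, feature_map_shape() gives (1, 1280, 32, 32), matching the shape listed under Input: a 512-pixel side divided into 16-pixel patches yields a 32×32 grid.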

Quick Start

1. Download the model

hf download embedl/sam-3d-body \
  embedl_sam3dbody_int8.onnx embedl_sam3dbody_int8.onnx.data \
  infer_trt.py infer_pt2.py sample_input.png --local-dir .

2. Build the TensorRT engine

/usr/src/tensorrt/bin/trtexec --onnx=embedl_sam3dbody_int8.onnx \
        --fp16 --int8 \
        --builderOptimizationLevel=3 \
        --memPoolSize=workspace:4294967296 \
        --saveEngine=embedl_sam3dbody_int8.engine

3. Run the demo

infer_trt.py (TensorRT) and infer_pt2.py (torch.export) both run the backbone on sample_input.png, report the feature-map statistics and latency, and save a PCA visualization of the patch features — the classic DINOv3 "what the backbone sees" image.

python3 -m venv venv --system-site-packages   # use system TensorRT
source venv/bin/activate
pip install pillow numpy
python infer_trt.py --image sample_input.png --save-pca features_pca.png
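
For readers curious how a PCA visualization of patch features is typically produced, here is a minimal NumPy sketch. It is an assumption about the general technique, not a copy of what infer_trt.py does: it projects each patch's feature vector onto the top three principal components and maps them to RGB. The function name pca_rgb is hypothetical.

```python
import numpy as np


def pca_rgb(features: np.ndarray) -> np.ndarray:
    """Project a (1, C, H, W) feature map onto its top-3 principal components.

    Returns an (H, W, 3) uint8 image, one pixel per patch.
    """
    _, c, h, w = features.shape
    # One row per patch, one column per channel: (H*W, C).
    tokens = features[0].reshape(c, h * w).T.astype(np.float64)
    tokens -= tokens.mean(axis=0, keepdims=True)  # center each channel
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(tokens, full_matrices=False)
    proj = tokens @ vt[:3].T                      # (H*W, 3)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.maximum(hi - lo, 1e-12)  # rescale each component to [0, 1]
    return (proj.reshape(h, w, 3) * 255).astype(np.uint8)
```

For the (1, 1280, 32, 32) output of this backbone the result is a 32×32 RGB image, which would normally be upsampled before saving.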

Running the full 3D-mesh demo

demo_3d.py reproduces the 3D body above: it loads the upstream SAM-3D-Body pipeline, swaps in this INT8 backbone, recovers the mesh from an image, and renders it (matplotlib — no OpenGL/GPU rendering needed).

1. Install the Python tools

# this backbone + render/runtime deps
pip install torch matplotlib pillow numpy imageio opencv-python huggingface_hub

# upstream SAM-3D-Body pipeline (decoder + MHR mesh head).
# NOTE: the repo is not a pip package — clone it and use it via PYTHONPATH.
git clone https://github.com/facebookresearch/sam-3d-body
# its runtime deps (see sam-3d-body/INSTALL.md):
pip install pytorch-lightning yacs scikit-image einops timm dill hydra-core hydra-colorlog \
  pyrootutils roma loguru optree fvcore trimesh braceexpand webdataset "networkx==3.2.1" \
  chump jsonlines joblib pandas rich smplx torchvision

2. Get the model files (you must first accept the gated upstream license at facebook/sam-3d-body-dinov3)

# upstream checkpoint + config + MHR asset
hf download facebook/sam-3d-body-dinov3 model.ckpt model_config.yaml assets/mhr_model.pt --local-dir sam3d_ckpt
# this repo's INT8 backbone + demo + example image
hf download embedl/sam-3d-body embedl_sam3dbody_int8.pt2 demo_3d.py sample_input.png --local-dir .

3. Run it (works out of the box on the shipped sample_input.png)

# --image takes the shipped sample_input.png or your own person photo
PYTHONPATH=sam-3d-body python demo_3d.py \
  --image sample_input.png \
  --ckpt-dir sam3d_ckpt \
  --pt2 embedl_sam3dbody_int8.pt2 \
  --out mesh_demo.png
# add  --bbox x1 y1 x2 y2  to crop a single person out of a wider scene
# -> mesh_demo.png (input + overlay + ¾/side views) and mesh_demo_spin.gif

Internally: our INT8 backbone produces the image features → SAM-3D-Body's decoder + MHR parametric mesh head turn them into the body mesh (~18.4 k vertices) → the renderer projects and shades it. The decoder/head run in eager PyTorch (~5 % of the compute); the 840 M-param backbone, the dominant cost, is what this repo accelerates.

Backbone-only (no upstream repo needed)

infer_pt2.py / infer_trt.py run just the backbone on an image and save a PCA feature visualization:

pip install pillow numpy                      # + torch (pt2) or tensorrt+pycuda (trt)
python infer_pt2.py --image sample_input.png --save-pca features_pca.png
python infer_trt.py --image sample_input.png --save-pca features_pca.png   # builds a TRT engine

Files

File	Description
`embedl_sam3dbody_int8.onnx`	Quantized ONNX model with pre-calibrated Q/DQ (quantize/dequantize) nodes
`embedl_sam3dbody_int8.onnx.data`	External weights (~3.4 GB)
`embedl_sam3dbody_int8.pt2`	INT8 `torch.export` ExportedProgram (`torch.export.load`-able)
`infer_trt.py`	TensorRT backbone inference + latency + PCA-feature demo
`infer_pt2.py`	`torch.export` backbone inference + PCA-feature demo
`demo_3d.py`	Full 3D-mesh demo — runs the SAM-3D-Body pipeline with this INT8 backbone and renders the recovered body mesh
`sample_input.png`	Example person crop (from the SAM-3D-Body qualitative gallery)

Performance

Latency measured with TensorRT 10.16 (tensorrt-cu12 build) on CUDA 12.8 (NVIDIA driver 570.211), GPU compute time only (CUDA events, 2 s warmup, 300-iteration timing), batch 1 at 512×512 (the production backbone resolution).
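
The warmup-then-measure protocol described above can be sketched as a small harness. This is a host-side approximation using time.perf_counter, not the CUDA-event timing used for the numbers in this card; the names benchmark and summarize are hypothetical, and the nearest-rank definition of p95 is an assumption.

```python
import time


def summarize(latencies_ms):
    """Mean, nearest-rank 95th percentile and throughput for latencies in ms."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    mean = sum(ordered) / n
    # Nearest-rank percentile: index ceil(0.95 * n) - 1, in integer arithmetic.
    p95 = ordered[(95 * n + 99) // 100 - 1]
    return {"mean_ms": mean, "p95_ms": p95, "qps": 1000.0 / mean}


def benchmark(fn, warmup_s=2.0, iters=300):
    """Call fn() repeatedly for warmup_s seconds, then time iters further calls."""
    deadline = time.perf_counter() + warmup_s
    while time.perf_counter() < deadline:
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples)
```

When fn launches GPU work, it has to block until that work finishes (for example by synchronizing the stream); otherwise the wall-clock samples only measure launch overhead.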

NVIDIA L4 GPU

Environment: NVIDIA L4 · NVIDIA driver 570.211 / CUDA 12.8 · TensorRT 10.16 (tensorrt-cu12 build) · single-image (1, 3, 512, 512) input.

Configuration	Mean latency	p95	Throughput	Engine size	Speedup
FP32 (TensorRT)	145.1 ms	147.8 ms	6.9 qps	3210 MiB	1.00×
FP16 (TensorRT)	44.7 ms	45.7 ms	22.4 qps	1606 MiB	3.24×
Embedl Deploy INT8+FP16 (this model)	28.3 ms	29.0 ms	35.3 qps	968 MiB	5.12×
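
As a sanity check, the derived columns can be recomputed from the mean latencies and engine sizes in the table above. The snippet only uses the published numbers; the RESULTS dictionary is just a convenient container for them.

```python
# (mean latency in ms, engine size in MiB), copied from the table above.
RESULTS = {
    "FP32": (145.1, 3210),
    "FP16": (44.7, 1606),
    "INT8+FP16": (28.3, 968),
}

base_ms, base_mib = RESULTS["FP32"]
for name, (ms, mib) in RESULTS.items():
    qps = 1000.0 / ms          # batch 1, so throughput is the inverse latency
    speedup = base_ms / ms     # relative to the FP32 engine
    shrink = base_mib / mib    # engine-size reduction relative to FP32
    print(f"{name:>10}: {qps:5.1f} qps, {speedup:.2f}x faster, {shrink:.2f}x smaller")
```

Speedups recomputed this way can differ from the table in the last digit, because the published latencies are themselves rounded to 0.1 ms.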

Reproducibility / version sensitivity: use TensorRT ≥ 10.16 with the CUDA 12 build (pip install tensorrt-cu12). The INT8 path is highly sensitive to the kernels each TensorRT version selects on Ada (sm_89): TensorRT 10.16 reaches the 28.3 ms above, whereas an engine built from the same ONNX file with TensorRT 10.1 runs at ~73 ms (≈2.6× slower) and produces slightly degraded features. The default pip install tensorrt wheel is a CUDA 13 build and fails with cudaErrorInsufficientDriver on CUDA 12.x drivers; install tensorrt-cu12 instead.
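
Because 10.1 and 10.16 compare the wrong way round as strings or floats, a version guard should compare numeric components. The following is a small sketch of such a guard; parse_version and trt_is_recent_enough are hypothetical helpers, not part of this repository or of TensorRT.

```python
import re

MIN_TRT = (10, 16)  # minimum (major, minor) recommended in this card


def parse_version(text):
    """'10.16.0.72' -> (10, 16, 0, 72); any local suffix after '+' is ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", text.split("+")[0]))


def trt_is_recent_enough(version, minimum=MIN_TRT):
    """Compare the leading components numerically, so that 10.16 > 10.1."""
    return parse_version(version)[: len(minimum)] >= minimum


# Typical use, assuming the tensorrt Python package is installed:
#   import tensorrt as trt
#   if not trt_is_recent_enough(trt.__version__):
#       raise RuntimeError(f"TensorRT {trt.__version__} is older than 10.16")
```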

Fidelity

Because the backbone feeds a 3D-mesh pipeline, the metric that matters is the recovered mesh, not raw features. Running the full SAM-3D-Body pipeline with the INT8 TensorRT engine vs. the FP32 reference on a real person image:

Configuration	Recovered-mesh deviation vs FP32	Backbone-feature rel. diff
FP16 (TensorRT)	negligible	0.36 %
Embedl Deploy INT8+FP16 (this model)	mean ≈ 10 mm (~0.5 %), max ≈ 67 mm	(calibration-dependent)

The decoder + MHR head are robust to the backbone's INT8 quantization noise, so the 3D body is preserved (the patch-embed conv and LayerNorms are kept in FP16). The mesh deviation is measured on a ~1.8 m body, which is where the ~0.5 % figure comes from (≈ 10 mm out of ≈ 1.8 m). Re-calibrating on a larger, more diverse set of person crops tightens this further.
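
The card does not spell out the exact formulas behind these two columns, so the following is one plausible way to compute them, stated as an assumption: per-vertex Euclidean distance between two meshes with identical topology (in metres, no extra alignment step), and a relative L2 difference between feature maps. The function names are hypothetical.

```python
import numpy as np


def mesh_deviation_mm(verts_ref_m, verts_test_m, body_height_m=1.8):
    """Per-vertex distance between two (V, 3) meshes given in metres."""
    dist_mm = np.linalg.norm(verts_test_m - verts_ref_m, axis=1) * 1000.0
    mean_mm = float(dist_mm.mean())
    return {
        "mean_mm": mean_mm,
        "max_mm": float(dist_mm.max()),
        # Mean deviation as a percentage of the assumed body height.
        "mean_pct_of_height": 100.0 * mean_mm / (body_height_m * 1000.0),
    }


def relative_feature_diff(feat_ref, feat_test):
    """||test - ref|| / ||ref|| over all elements, as a percentage."""
    num = np.linalg.norm((feat_test - feat_ref).ravel())
    return 100.0 * float(num / np.linalg.norm(feat_ref.ravel()))
```

Here verts_ref_m would hold the vertices recovered with the FP32 backbone and verts_test_m those recovered with the INT8 engine on the same image.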

Using real weights

This repository ships the INT8-quantized DINOv3 ViT-H+/16 backbone of facebook/sam-3d-body-dinov3. The full SAM-3D-Body pipeline adds a lightweight decoder and an MHR parametric mesh head (together ~5 % of the compute) that stay in eager PyTorch — the backbone is the natural, torch.export-friendly deployment boundary. Feed the TensorRT / .pt2 backbone features into the upstream decoder + MHR head for end-to-end human-mesh recovery.

You must have accepted the upstream gated license at facebook/sam-3d-body-dinov3 to use this derivative.

Creating Your Own Optimized Models

Deployment-ready models can be created from any supported base model using embedl-deploy, available on PyPI. This artifact follows the SAM3 tutorial workflow applied to the SAM-3D-Body backbone.

License

This model is a derivative of facebook/sam-3d-body-dinov3.

Component	License
Upstream (Meta SAM 3D Body / DINOv3)	SAM 3D Body License
Optimized components	Embedl Models Community Licence v1.0 (no redistribution as a hosted service)

Contact

Enterprise & commercial inquiries: models@embedl.com
Technical issues & early access: github.com/embedl/embedl-deploy

We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities.
