Embedl SAM 3D Body (Quantized)

Deployable version of the DINOv3 ViT-H+/16 vision backbone from facebook/sam-3d-body-dinov3, the 840.6 M-parameter image encoder that dominates the cost of Meta's single-image full-body human-mesh recovery model. It uses mixed-precision INT8/FP16 quantization with hardware-aware optimizations from embedl-deploy for low-latency NVIDIA TensorRT inference.

3D Human Mesh Recovery

The deployed backbone drives the full SAM-3D-Body pipeline (decoder + MHR parametric mesh head) to recover a 3D human body from a single image. Below is our INT8-quantized backbone applied to the Gothenburg Poseidon (Carl Milles): input → mesh overlay → ¾ view → side view. The INT8 mesh stays within ~0.5 % (mean ≈ 10 mm) of the FP32 result. Reproduce it with demo_3d.py (see "Running the full 3D-mesh demo" below).

Rotating recovered mesh
Backbone features (PCA)


Highlights

  • Format: ONNX with external weights (embedl_sam3dbody_int8.onnx + .onnx.data) plus a torch.export graph (embedl_sam3dbody_int8.pt2).
  • Precision: INT8 with sensitive layers (the 3-channel patch-embed conv and LayerNorms) kept in FP16.
  • Runtime: TensorRT (FP16 + INT8 mode); the .pt2 loads directly with torch.export.load.
  • Input: a single (1, 3, 512, 512) ImageNet-normalized person crop → (1, 1280, 32, 32) feature map (the production SAM-3D-Body resolution).
  • 5.1× faster and 3.3× smaller than FP32 on an NVIDIA L4; recovered 3D mesh within ~0.5 % of the FP32 result.
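
The input contract listed above can be sketched as follows. This is a minimal sketch rather than the code in infer_trt.py or infer_pt2.py: the function name to_backbone_input is hypothetical, the mean and standard deviation are the standard ImageNet RGB statistics, and the crop is assumed to be already resized to 512×512.

```python
import numpy as np

# Standard ImageNet RGB statistics (assumption: the shipped scripts use the same values).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

SIDE = 512    # production input resolution
PATCH = 16    # ViT-H+/16 patch size
GRID = SIDE // PATCH  # one feature vector per 16x16 patch


def to_backbone_input(crop_rgb_u8: np.ndarray) -> np.ndarray:
    """Convert a 512x512x3 uint8 RGB crop into a (1, 3, 512, 512) float32 array."""
    if crop_rgb_u8.shape != (SIDE, SIDE, 3):
        raise ValueError(f"expected ({SIDE}, {SIDE}, 3), got {crop_rgb_u8.shape}")
    x = crop_rgb_u8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD       # per-channel normalization
    x = np.transpose(x, (2, 0, 1))[None]         # HWC -> NCHW, add batch dimension
    return np.ascontiguousarray(x)
```

With a patch size of 16, a 512×512 input gives a 32×32 patch grid, which matches the (1, 1280, 32, 32) output shape.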

Quick Start

1. Download the model

hf download embedl/sam-3d-body \
  embedl_sam3dbody_int8.onnx embedl_sam3dbody_int8.onnx.data \
  infer_trt.py infer_pt2.py sample_input.png --local-dir .

2. Build the TensorRT engine

/usr/src/tensorrt/bin/trtexec --onnx=embedl_sam3dbody_int8.onnx \
        --fp16 --int8 \
        --builderOptimizationLevel=3 \
        --memPoolSize=workspace:4294967296 \
        --saveEngine=embedl_sam3dbody_int8.engine

3. Run the demo

infer_trt.py (TensorRT) and infer_pt2.py (torch.export) both run the backbone on sample_input.png, report the feature-map statistics and latency, and save a PCA visualization of the patch features: the classic DINOv3 "what the backbone sees" image.

python3 -m venv venv --system-site-packages   # use system TensorRT
source venv/bin/activate
pip install pillow numpy
python infer_trt.py --image sample_input.png --save-pca features_pca.png

Running the full 3D-mesh demo

demo_3d.py reproduces the 3D body above: it loads the upstream SAM-3D-Body pipeline, swaps in this INT8 backbone, recovers the mesh from an image, and renders it with matplotlib (no OpenGL/GPU rendering needed).

1. Install the Python tools

# this backbone + render/runtime deps
pip install torch matplotlib pillow numpy imageio opencv-python huggingface_hub

# upstream SAM-3D-Body pipeline (decoder + MHR mesh head).
# NOTE: the repo is not a pip package; clone it and use it via PYTHONPATH.
git clone https://github.com/facebookresearch/sam-3d-body
# its runtime deps (see sam-3d-body/INSTALL.md):
pip install pytorch-lightning yacs scikit-image einops timm dill hydra-core hydra-colorlog \
  pyrootutils roma loguru optree fvcore trimesh braceexpand webdataset "networkx==3.2.1" \
  chump jsonlines joblib pandas rich smplx torchvision

2. Get the model files (you must accept the gated upstream licence first at facebook/sam-3d-body-dinov3)

# upstream checkpoint + config + MHR asset
hf download facebook/sam-3d-body-dinov3 model.ckpt model_config.yaml assets/mhr_model.pt --local-dir sam3d_ckpt
# this repo's INT8 backbone + demo + example image
hf download embedl/sam-3d-body embedl_sam3dbody_int8.pt2 demo_3d.py sample_input.png --local-dir .

3. Run it (works out of the box on the shipped sample_input.png)

# --image accepts the shipped sample or your own person photo
PYTHONPATH=sam-3d-body python demo_3d.py \
  --image sample_input.png \
  --ckpt-dir sam3d_ckpt \
  --pt2 embedl_sam3dbody_int8.pt2 \
  --out mesh_demo.png
# add  --bbox x1 y1 x2 y2  to crop a single person out of a wider scene
# -> mesh_demo.png (input + overlay + ¾/side views) and mesh_demo_spin.gif

Internally: our INT8 backbone produces the image features → SAM-3D-Body's decoder + MHR parametric mesh head turn them into the body mesh (18.4 k vertices) → the renderer projects and shades it. The decoder/head run in eager PyTorch (~5 % of the compute); the 840 M-parameter backbone, which is the dominant cost, is what this repo accelerates.

Backbone-only (no upstream repo needed)

infer_pt2.py / infer_trt.py run just the backbone on an image and save a PCA feature visualization:

pip install pillow numpy                      # + torch (pt2) or tensorrt+pycuda (trt)
python infer_pt2.py --image sample_input.png --save-pca features_pca.png
python infer_trt.py --image sample_input.png --save-pca features_pca.png   # builds a TRT engine
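
The PCA feature visualization can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation in infer_pt2.py or infer_trt.py: the function name pca_rgb and the per-component min-max scaling are assumptions.

```python
import numpy as np


def pca_rgb(features: np.ndarray) -> np.ndarray:
    """Map a (C, H, W) feature map to an (H, W, 3) uint8 image via its top-3 principal components."""
    c, h, w = features.shape
    # One row per patch, one column per channel; astype makes a private copy.
    tokens = features.reshape(c, h * w).T.astype(np.float64)
    tokens -= tokens.mean(axis=0, keepdims=True)          # center each channel
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(tokens, full_matrices=False)
    proj = tokens @ vt[:3].T                              # (H*W, 3)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.maximum(hi - lo, 1e-12)       # min-max scale each component to [0, 1]
    return (proj.reshape(h, w, 3) * 255).astype(np.uint8)
```

For the backbone output, pass the first batch element of the (1, 1280, 32, 32) array; the resulting 32×32 image is typically upscaled before saving.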

Files

| File | Description |
|---|---|
| embedl_sam3dbody_int8.onnx | Quantized ONNX model with precalibrated QDQ operations |
| embedl_sam3dbody_int8.onnx.data | External weights (~3.4 GB) |
| embedl_sam3dbody_int8.pt2 | INT8 torch.export ExportedProgram (torch.export.load-able) |
| infer_trt.py | TensorRT backbone inference + latency + PCA-feature demo |
| infer_pt2.py | torch.export backbone inference + PCA-feature demo |
| demo_3d.py | Full 3D-mesh demo: runs the SAM-3D-Body pipeline with this INT8 backbone and renders the recovered body mesh |
| sample_input.png | Example person crop (from the SAM-3D-Body qualitative gallery) |

Performance

Latency measured with TensorRT 10.16 (tensorrt-cu12 build) on CUDA 12.8 (NVIDIA driver 570.211), GPU compute time only (CUDA events, 2 s warmup, 300-iteration timing), batch 1 at 512×512 (the production backbone resolution).

NVIDIA L4 GPU

Environment: NVIDIA L4 · NVIDIA driver 570.211 / CUDA 12.8 · TensorRT 10.16 (tensorrt-cu12 build) · single-image (1, 3, 512, 512) input.

| Configuration | Mean latency | p95 | Throughput | Engine size | Speedup |
|---|---|---|---|---|---|
| FP32 (TensorRT) | 145.1 ms | 147.8 ms | 6.9 qps | 3210 MiB | 1.00× |
| FP16 (TensorRT) | 44.7 ms | 45.7 ms | 22.4 qps | 1606 MiB | 3.24× |
| Embedl Deploy INT8+FP16 (this model) | 28.3 ms | 29.0 ms | 35.3 qps | 968 MiB | 5.12× |
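
The derived columns follow directly from the mean latency and engine size. The sketch below recomputes them from the rounded values in the table above; the names RESULTS and derive are illustrative only.

```python
# Mean latency (ms) and engine size (MiB), copied from the table above.
RESULTS = {
    "FP32": (145.1, 3210),
    "FP16": (44.7, 1606),
    "INT8+FP16": (28.3, 968),
}


def derive(results: dict, baseline: str = "FP32") -> dict:
    """Compute throughput, speedup and size reduction relative to the baseline row."""
    base_ms, base_mib = results[baseline]
    return {
        name: {
            "qps": 1000.0 / ms,              # batch 1, so throughput = 1 / latency
            "speedup": base_ms / ms,
            "size_ratio": base_mib / mib,
        }
        for name, (ms, mib) in results.items()
    }


for name, row in derive(RESULTS).items():
    print(f"{name:10s} {row['qps']:5.1f} qps  "
          f"{row['speedup']:.2f}x faster  {row['size_ratio']:.2f}x smaller")
```

Speedups recomputed from the rounded latencies can differ from the table in the last digit, presumably because the table was computed from unrounded timings.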

Reproducibility / version sensitivity: use TensorRT ≥ 10.16 with the CUDA 12 build (pip install tensorrt-cu12). The INT8 path is highly sensitive to the TRT version's kernel selection on Ada (sm_89): TensorRT 10.16 reaches the 28.3 ms above, whereas TensorRT 10.1 builds the same engine at ~73 ms (≈2.6× slower) and with slightly degraded features. The default pip install tensorrt wheel is a CUDA 13 build and will fail with cudaErrorInsufficientDriver on CUDA 12.x drivers; install tensorrt-cu12 instead.
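
Because the results depend on the TensorRT version, it can help to check the version before building an engine. The sketch below is one hedged way to do that; the helper names are hypothetical, and it assumes a purely numeric dotted version string such as the one exposed by tensorrt.__version__.

```python
def version_tuple(version: str, parts: int = 2) -> tuple:
    """Parse the leading numeric components of a dotted version string, e.g. '10.16.0.72'."""
    return tuple(int(p) for p in version.split(".")[:parts])


def trt_version_ok(version: str, minimum: tuple = (10, 16)) -> bool:
    """Return True if the version is at least the minimum (major, minor)."""
    return version_tuple(version, len(minimum)) >= minimum
```

Comparing integer tuples rather than strings matters here: as strings, "10.9" sorts after "10.16", while numerically 10.9 is the older release. In practice the check would be called as trt_version_ok(tensorrt.__version__).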

Fidelity

Because the backbone feeds a 3D-mesh pipeline, the metric that matters is the recovered mesh, not raw features. Running the full SAM-3D-Body pipeline with the INT8 TensorRT engine vs. the FP32 reference on a real person image:

| Configuration | Recovered-mesh deviation vs FP32 | Backbone-feature rel. diff |
|---|---|---|
| FP16 (TensorRT) | negligible | 0.36 % |
| Embedl Deploy INT8+FP16 (this model) | mean ≈ 10 mm (~0.5 %), max ≈ 67 mm | (calibration-dependent) |

The decoder + MHR head are robust to the backbone's INT8 quantization noise, so the 3D body is preserved (the patch-embed conv and LayerNorms are kept in FP16). The relative mesh deviation is expressed against a ~1.8 m body. Re-calibrating on a larger, more diverse set of person crops tightens this further.
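
A mesh-deviation figure of this kind can be computed as a per-vertex distance between the two recovered meshes. The sketch below assumes both meshes share the same vertex ordering, coordinates are in meters, and no additional alignment is applied; the source does not state the exact metric definition, and the function name mesh_deviation is hypothetical.

```python
import math


def mesh_deviation(ref_vertices, test_vertices, body_height_m: float = 1.8) -> dict:
    """Per-vertex Euclidean distance between two meshes with identical vertex ordering."""
    if len(ref_vertices) != len(test_vertices):
        raise ValueError("meshes must have the same number of vertices")
    dists = [math.dist(p, q) for p, q in zip(ref_vertices, test_vertices)]
    mean_m = sum(dists) / len(dists)
    return {
        "mean_mm": mean_m * 1000.0,
        "max_mm": max(dists) * 1000.0,
        "mean_rel_pct": 100.0 * mean_m / body_height_m,   # relative to body height
    }
```

Under this definition, a mean deviation of 10 mm on a 1.8 m body is 10 / 1800 ≈ 0.56 %, in line with the ~0.5 % quoted above.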

Using real weights

This repository ships the INT8-quantized DINOv3 ViT-H+/16 backbone of facebook/sam-3d-body-dinov3. The full SAM-3D-Body pipeline adds a lightweight decoder and an MHR parametric mesh head (together ~5 % of the compute) that stay in eager PyTorch; the backbone is the natural, torch.export-friendly deployment boundary. Feed the TensorRT / .pt2 backbone features into the upstream decoder + MHR head for end-to-end human-mesh recovery.

You must have accepted the upstream gated license at facebook/sam-3d-body-dinov3 to use this derivative.

Creating Your Own Optimized Models

Deployment-ready models can be created from any supported base model using embedl-deploy, available on PyPI. This artifact follows the SAM3 tutorial workflow applied to the SAM-3D-Body backbone.

License

This model is a derivative of facebook/sam-3d-body-dinov3.

| Component | License |
|---|---|
| Upstream (Meta SAM 3D Body / DINOv3) | SAM 3D Body License |
| Optimized components | Embedl Models Community Licence v1.0 (no redistribution as a hosted service) |

Contact

We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities.
