---
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
base_model:
- facebook/sam-3d-body-dinov3
quantized_from:
- facebook/sam-3d-body-dinov3
tags:
- image-feature-extraction
- sam
- sam-3d-body
- dinov3
- quantization
- onnx
- tensorrt
- edge
- embedl
gated: true
extra_gated_heading: "Access Embedl SAM 3D Body (Quantized)"
extra_gated_description: >-
  To access this model, please review and accept the terms below. Your contact
  information is collected solely to manage access and, with your explicit
  consent, to notify you about updated or new optimized models from Embedl. You
  can withdraw consent at any time by contacting us (see Contact section below).
  See our license for full terms.
extra_gated_button_content: "Agree and request access"
extra_gated_prompt: "By requesting access you agree to the Embedl Models Community Licence and the upstream SAM 3D Body License. You must also have access to the gated upstream model facebook/sam-3d-body-dinov3."
extra_gated_fields:
  Company: text
  I agree to the Embedl Models Community Licence and upstream SAM 3D Body License: checkbox
  I consent to being contacted by Embedl about products and services: checkbox
---

# Embedl SAM 3D Body (Quantized)

Deployable version of the **DINOv3 ViT-H+/16 vision backbone** from [facebook/sam-3d-body-dinov3](https://huggingface.co/facebook/sam-3d-body-dinov3), the 840.6 M-parameter image encoder that dominates the cost of Meta's single-image, full-body human-mesh recovery model. Mixed-precision INT8/FP16 quantization with hardware-aware optimizations from [embedl-deploy](https://github.com/embedl/embedl-deploy) for low-latency NVIDIA TensorRT inference.

## 3D Human Mesh Recovery

The deployed backbone drives the full SAM-3D-Body pipeline (decoder + MHR parametric mesh head) to recover a 3D human body from a single image.
Below, our **INT8-quantized** backbone is applied to Carl Milles' *Poseidon* statue in Gothenburg, shown as input → mesh overlay → ¾ view → side view. The INT8 mesh stays within **~0.5 % (mean ≈ 10 mm)** of the FP32 result. Reproduce it with [`demo_3d.py`](demo_3d.py) (see "Running the full 3D-mesh demo" below).

*Figure: Rotating recovered mesh.*

*Figure: Backbone features (PCA).*

*Figure: NVIDIA L4.*


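How the INT8-vs-FP32 mesh comparison can be computed is sketched below in NumPy. This is a minimal sketch under assumptions, not the evaluation code behind the quoted numbers: it assumes each run yields a `(V, 3)` vertex array in metres with identical vertex order (same MHR topology), that the percentage is taken relative to a ~1.8 m body height, and that the backbone-feature "relative difference" is a ratio of L2 norms. `mesh_deviation` and `relative_feature_diff` are hypothetical helper names.

```python
import numpy as np


def mesh_deviation(verts_test: np.ndarray, verts_ref: np.ndarray, body_height_m: float = 1.8) -> dict:
    """Per-vertex Euclidean deviation between two meshes with identical topology.

    Hypothetical helper: assumes (V, 3) vertex positions in metres, same vertex order.
    """
    if verts_test.shape != verts_ref.shape or verts_ref.ndim != 2 or verts_ref.shape[1] != 3:
        raise ValueError("expected two (V, 3) arrays with the same shape")
    # Distance of each vertex from its FP32 counterpart, converted to millimetres.
    dist_mm = np.linalg.norm(verts_test - verts_ref, axis=1) * 1e3
    mean_mm = float(dist_mm.mean())
    return {
        "mean_mm": mean_mm,
        "max_mm": float(dist_mm.max()),
        # Mean deviation as a percentage of body height (assumed normalisation).
        "mean_pct_of_height": 100.0 * mean_mm / (body_height_m * 1e3),
    }


def relative_feature_diff(feat_test: np.ndarray, feat_ref: np.ndarray) -> float:
    """||test - ref|| / ||ref|| over the whole feature map (assumed definition)."""
    return float(np.linalg.norm(feat_test - feat_ref) / np.linalg.norm(feat_ref))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(18_400, 3)) * 0.3  # stand-in FP32 vertices (metres)
    test = ref + np.array([0.01, 0.0, 0.0])  # every vertex shifted by exactly 10 mm
    print(mesh_deviation(test, ref))  # mean_mm ≈ 10.0, max_mm ≈ 10.0
    feats = rng.normal(size=(1, 1280, 32, 32)).astype(np.float32)
    print(relative_feature_diff(feats * 1.0036, feats))  # ≈ 0.0036, i.e. 0.36 %
```

Reporting both the mean and the max matters: a small mean can hide a few badly displaced vertices (e.g. at the hands), which the max exposes.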
## Highlights

- **Format:** ONNX with external weights (`embedl_sam3dbody_int8.onnx` + `.onnx.data`) plus a `torch.export` graph (`embedl_sam3dbody_int8.pt2`).
- **Precision:** INT8, with sensitive layers (the 3-channel patch-embed conv and the LayerNorms) kept in FP16.
- **Runtime:** TensorRT (FP16 + INT8 mode); the `.pt2` loads directly with `torch.export.load`.
- **Input:** a single `(1, 3, 512, 512)` ImageNet-normalized person crop → `(1, 1280, 32, 32)` feature map (the production SAM-3D-Body resolution).
- **5.1× faster and 3.3× smaller** (TensorRT engine) than FP32 on an NVIDIA L4; the recovered 3D mesh stays within **~0.5 %** of the FP32 result.

## Quick Start

### 1. Download the model

```bash
hf download embedl/sam-3d-body \
  embedl_sam3dbody_int8.onnx embedl_sam3dbody_int8.onnx.data \
  infer_trt.py infer_pt2.py sample_input.png --local-dir .
```

### 2. Build the TensorRT engine

```bash
/usr/src/tensorrt/bin/trtexec --onnx=embedl_sam3dbody_int8.onnx \
  --fp16 --int8 \
  --builderOptimizationLevel=3 \
  --memPoolSize=workspace:4G \
  --saveEngine=embedl_sam3dbody_int8.engine
```

### 3. Run the demo

`infer_trt.py` (TensorRT) and `infer_pt2.py` (`torch.export`) both run the backbone on `sample_input.png`, report feature-map statistics and latency, and save a **PCA visualization** of the patch features: the classic DINOv3 "what the backbone sees" image.

```bash
python3 -m venv venv --system-site-packages   # reuse the system TensorRT install
source venv/bin/activate
pip install pillow numpy   # infer_trt.py also needs tensorrt + pycuda
python infer_trt.py --image sample_input.png --save-pca features_pca.png
```

## Running the full 3D-mesh demo

`demo_3d.py` reproduces the 3D body above: it loads the upstream SAM-3D-Body pipeline, **swaps in this INT8 backbone**, recovers the mesh from an image, and renders it with matplotlib (no OpenGL/GPU rendering needed).

**1. Install the Python tools**

```bash
# this backbone + render/runtime deps
pip install torch matplotlib pillow numpy imageio opencv-python huggingface_hub

# upstream SAM-3D-Body pipeline (decoder + MHR mesh head).
# NOTE: the repo is not a pip package; clone it and use it via PYTHONPATH.
git clone https://github.com/facebookresearch/sam-3d-body

# its runtime deps (see sam-3d-body/INSTALL.md):
pip install pytorch-lightning yacs scikit-image einops timm dill hydra-core hydra-colorlog \
  pyrootutils roma loguru optree fvcore trimesh braceexpand webdataset "networkx==3.2.1" \
  chump jsonlines joblib pandas rich smplx torchvision
```

**2. Get the model files** (you must first accept the gated upstream licence at [facebook/sam-3d-body-dinov3](https://huggingface.co/facebook/sam-3d-body-dinov3))

```bash
# upstream checkpoint + config + MHR asset
hf download facebook/sam-3d-body-dinov3 model.ckpt model_config.yaml assets/mhr_model.pt --local-dir sam3d_ckpt

# this repo's INT8 backbone + demo + example image
hf download embedl/sam-3d-body embedl_sam3dbody_int8.pt2 demo_3d.py sample_input.png --local-dir .
```

**3. Run it** (works out of the box on the shipped `sample_input.png`)

```bash
# --image: sample_input.png or your own person photo
PYTHONPATH=sam-3d-body python demo_3d.py \
  --image sample_input.png \
  --ckpt-dir sam3d_ckpt \
  --pt2 embedl_sam3dbody_int8.pt2 \
  --out mesh_demo.png
# add --bbox x1 y1 x2 y2 to crop a single person out of a wider scene
# -> mesh_demo.png (input + overlay + ¾/side views) and mesh_demo_spin.gif
```

Internally, our INT8 backbone produces the image features → SAM-3D-Body's decoder + **MHR parametric mesh head** turn them into the body mesh (~18.4 k vertices) → the renderer projects and shades it. The decoder and head run in eager PyTorch (~5 % of the compute); the 840 M-parameter backbone, which dominates the cost, is what this repo accelerates.
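The backbone's input/output contract from the Highlights can be sketched in NumPy: an ImageNet-normalized `(1, 3, 512, 512)` crop goes in and, with 16-pixel patches, a 512 / 16 = 32 × 32 grid of 1280-dimensional features comes out; projecting those features onto their top three principal components gives the PCA "what the backbone sees" image. This is a minimal sketch, not the code in `infer_trt.py` / `infer_pt2.py`; the nearest-neighbour resize, the per-component min-max scaling, and the helper names `preprocess` and `pca_rgb` are assumptions for illustration.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
PATCH = 16  # ViT-H+/16 patch size


def preprocess(rgb: np.ndarray, size: int = 512) -> np.ndarray:
    """HxWx3 uint8 RGB crop -> (1, 3, size, size) float32, ImageNet-normalised.

    Nearest-neighbour resize keeps this NumPy-only (an assumption; real
    pipelines typically use bilinear or bicubic resampling).
    """
    h, w, _ = rgb.shape
    rows = (np.arange(size) * h // size).clip(0, h - 1)
    cols = (np.arange(size) * w // size).clip(0, w - 1)
    x = rgb[rows[:, None], cols[None, :]].astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalisation
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW with batch dim


def pca_rgb(features: np.ndarray) -> np.ndarray:
    """(1, C, H, W) patch features -> (H, W, 3) uint8 image of the top-3 PCA components."""
    _, c, h, w = features.shape
    tokens = features[0].reshape(c, h * w).T  # one row per patch
    tokens = tokens - tokens.mean(axis=0, keepdims=True)  # centre before PCA
    # Rows of vt are the principal directions of the centred tokens.
    _, _, vt = np.linalg.svd(tokens, full_matrices=False)
    proj = tokens @ vt[:3].T  # (H*W, 3)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.maximum(hi - lo, 1e-8)  # per-component min-max to [0, 1]
    return (proj.reshape(h, w, 3) * 255).round().astype(np.uint8)


if __name__ == "__main__":
    crop = np.zeros((720, 480, 3), dtype=np.uint8)  # stand-in person crop
    x = preprocess(crop)
    print(x.shape, x.shape[-1] // PATCH)  # (1, 3, 512, 512) 32
    feats = np.random.default_rng(0).normal(size=(1, 1280, 32, 32)).astype(np.float32)
    print(pca_rgb(feats).shape)  # (32, 32, 3)
```

The 32 × 32 PCA image is usually upsampled to the crop size for display; each colour channel is one principal component, so patches with similar features get similar colours.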
### Backbone-only (no upstream repo needed)

`infer_pt2.py` / `infer_trt.py` run just the backbone on an image and save a PCA feature visualization:

```bash
pip install pillow numpy   # + torch (pt2) or tensorrt + pycuda (trt)
python infer_pt2.py --image sample_input.png --save-pca features_pca.png
python infer_trt.py --image sample_input.png --save-pca features_pca.png   # builds a TRT engine
```

## Files

| File | Description |
|---|---|
| `embedl_sam3dbody_int8.onnx` | Quantized ONNX model with pre-calibrated QDQ (QuantizeLinear/DequantizeLinear) nodes |
| `embedl_sam3dbody_int8.onnx.data` | External weights (~3.4 GB) |
| `embedl_sam3dbody_int8.pt2` | INT8 `torch.export` ExportedProgram (loadable with `torch.export.load`) |
| `infer_trt.py` | TensorRT backbone inference + latency + PCA-feature demo |
| `infer_pt2.py` | `torch.export` backbone inference + PCA-feature demo |
| `demo_3d.py` | **Full 3D-mesh demo**: runs the SAM-3D-Body pipeline with this INT8 backbone and renders the recovered body mesh |
| `sample_input.png` | Example person crop (from the SAM-3D-Body qualitative gallery) |

## Performance

Latency is measured with **TensorRT 10.16** (`tensorrt-cu12` build) on **CUDA 12.8** (NVIDIA driver 570.211) as GPU compute time only (CUDA events, 2 s warmup, 300 timed iterations), batch 1 at 512×512 (the production backbone resolution).

### NVIDIA L4 GPU

> **Environment:** NVIDIA L4 · NVIDIA driver 570.211 / CUDA 12.8 · TensorRT 10.16
> (`tensorrt-cu12` build) · single-image (1, 3, 512, 512) input.

| Configuration | Mean latency | p95 latency | Throughput | Engine size | Speedup vs FP32 |
|---|---|---|---|---|---|
| FP32 (TensorRT) | 145.1 ms | 147.8 ms | 6.9 qps | 3210 MiB | 1.00× |
| FP16 (TensorRT) | 44.7 ms | 45.7 ms | 22.4 qps | 1606 MiB | 3.24× |
| **Embedl Deploy INT8+FP16 (this model)** | **28.3 ms** | **29.0 ms** | **35.3 qps** | **968 MiB** | **5.12×** |

> **Reproducibility / version sensitivity:** use **TensorRT ≥ 10.16** with the CUDA 12
> build (`pip install tensorrt-cu12`).
> The INT8 path is highly sensitive to the TensorRT version's kernel selection on Ada (sm_89):
> TensorRT 10.16 reaches the 28.3 ms above, whereas TensorRT 10.1 builds the *same* engine
> at ~73 ms (≈2.6× slower) and with slightly degraded features. The default
> `pip install tensorrt` wheel is a CUDA 13 build and fails with `cudaErrorInsufficientDriver`
> on CUDA 12.x drivers; install `tensorrt-cu12` instead.

## Fidelity

Because the backbone feeds a 3D-mesh pipeline, the metric that matters is the recovered **mesh**, not the raw features. Results from running the full SAM-3D-Body pipeline on a real person image, with each engine compared against the FP32 reference:

| Configuration | Recovered-mesh deviation vs FP32 | Backbone-feature relative difference vs FP32 |
|---|---|---|
| FP16 (TensorRT) | negligible | 0.36 % |
| **Embedl Deploy INT8+FP16 (this model)** | **mean ≈ 10 mm (~0.5 %), max ≈ 67 mm** | (calibration-dependent) |

The decoder + MHR head are robust to the backbone's INT8 quantization noise, so the 3D body is preserved (the patch-embed conv and LayerNorms are kept in FP16). Percentages are relative to a body about 1.8 m tall. Re-calibrating on a larger, more diverse set of person crops tightens this further.

## Using real weights

This repository ships the **INT8-quantized DINOv3 ViT-H+/16 backbone** of `facebook/sam-3d-body-dinov3`. The full SAM-3D-Body pipeline adds a lightweight decoder and an MHR parametric mesh head (together ~5 % of the compute) that stay in eager PyTorch; the backbone is the natural, `torch.export`-friendly deployment boundary. Feed the TensorRT / `.pt2` backbone features into the upstream decoder + MHR head for end-to-end human-mesh recovery. You must have accepted the upstream gated license at [`facebook/sam-3d-body-dinov3`](https://huggingface.co/facebook/sam-3d-body-dinov3) to use this derivative.

## Creating Your Own Optimized Models

Deployment-ready models can be created from any supported base model using [embedl-deploy](https://deploy.embedl.com), available on PyPI.
This artifact follows the [SAM3 tutorial](https://docs.embedl.com/embedl-deploy/latest/auto_tutorials/sam3.html) workflow, applied to the SAM-3D-Body backbone.

## License

This model is a derivative of **facebook/sam-3d-body-dinov3**.

| Component | License |
|---|---|
| **Upstream (Meta SAM 3D Body / DINOv3)** | [SAM 3D Body License](https://huggingface.co/facebook/sam-3d-body-dinov3/blob/main/LICENSE) |
| **Optimized components** | [Embedl Models Community Licence v1.0](https://github.com/embedl/embedl-models/blob/main/LICENSE) *(no redistribution as a hosted service)* |

## Contact

- **Enterprise & commercial inquiries:** [models@embedl.com](mailto:models@embedl.com)
- **Technical issues & early access:** [github.com/embedl/embedl-deploy](https://github.com/embedl/embedl-deploy/)

We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities.