---
title: Asset Harvester
emoji: "\U0001F697"
colorFrom: green
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Image-to-3D for autonomous-vehicle simulation assets
---

# Asset Harvester

[**Paper**](https://arxiv.org/abs/2604.18468) | [**Project Page**](https://research.nvidia.com/labs/sil/projects/asset-harvester/) | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)

Upload one image of a single object (vehicle, pedestrian, cyclist, or other road object) and get back a complete 3D Gaussian splat asset ready for simulation.

## Pipeline

```
upload ─▶ image guard (optional) ─▶ object segmentation ─▶ recenter + pad
                                                                    │
                                                                    ▼
3D Gaussian splat ◀── TokenGS lifting ◀── multiview diffusion ◀── camera estimation
```

1. **Object segmentation** (`AH_object_seg_jit.pt`): Mask2Former JIT produces a binary mask of the foreground object at the uploaded image's native resolution.
2. **Camera estimation** (`AH_camera_estimator.safetensors`): predicts camera pose, distance, FOV, and object dimensions (LWH). Shares the C-RADIO backbone with multiview diffusion to avoid loading it twice.
3. **Multiview diffusion** (`AH_multiview_diffusion.safetensors`): SparseViewDiT generates 16 novel orbit views conditioned on the input image.
4. **TokenGS lifting** (`AH_tokengs_lifting.safetensors`): feed-forward 3D Gaussian reconstructor lifts the 16 views to a full 3DGS asset.

## Outputs

- Multiview MP4 (16-frame orbit at 5 fps).
- 3D Gaussian orbit render (MP4).
- Gaussian splat (PLY) ready for simulation engines.

## Hardware

Single NVIDIA GPU with compute capability ≥ 8.0 and ≥ 30 GB VRAM. Typical end-to-end runtime: **1-2 minutes** per image on A100/H100.

## Limitations

- Single-object only: images with multiple distinct subjects will use the largest mask and discard the rest.
- Heavily occluded objects or out-of-distribution subjects (e.g., objects not seen in driving logs) may produce hallucinated geometry.
- Image guard uses `meta-llama/Llama-Guard-3-11B-Vision`; enabling it adds ~20-30 s per run.

## Local deployment

```bash
docker build --build-arg HF_TOKEN=$HF_TOKEN -t asset-harvester .
docker run --gpus all -e HF_TOKEN=$HF_TOKEN -p 7860:7860 asset-harvester
```

Checkpoints are downloaded from [`nvidia/asset-harvester`](https://huggingface.co/nvidia/asset-harvester) on first run. `HF_TOKEN` must have access to that repo.

## Governing terms

Use of this system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
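
## Preflight check (sketch)

Before building the image locally, it can help to confirm that the GPU meets the thresholds listed under Hardware. The following is a minimal sketch, not part of the Asset Harvester codebase: the function names `meets_requirements` and `check_local_gpu` are hypothetical, and it assumes "30 GB" means 30 × 10⁹ bytes. The optional `check_local_gpu` helper assumes PyTorch is installed on the host.

```python
# Thresholds taken from the Hardware section of this README.
MIN_COMPUTE_CAPABILITY = (8, 0)
MIN_VRAM_BYTES = 30 * 10**9  # assumption: "30 GB" read as decimal gigabytes


def meets_requirements(capability, total_memory_bytes):
    """Return (ok, reasons) for one GPU.

    capability: (major, minor) tuple, e.g. (8, 0) for an A100.
    total_memory_bytes: total device memory in bytes.
    """
    reasons = []
    if tuple(capability) < MIN_COMPUTE_CAPABILITY:
        reasons.append(
            f"compute capability {capability[0]}.{capability[1]} is below 8.0"
        )
    if total_memory_bytes < MIN_VRAM_BYTES:
        reasons.append(
            f"{total_memory_bytes / 10**9:.1f} GB VRAM is below 30 GB"
        )
    return (not reasons, reasons)


def check_local_gpu(index=0):
    """Query the local GPU through PyTorch and apply the same check."""
    import torch  # imported lazily so meets_requirements works without PyTorch

    if not torch.cuda.is_available():
        return False, ["no CUDA device visible"]
    props = torch.cuda.get_device_properties(index)
    return meets_requirements((props.major, props.minor), props.total_memory)
```

Calling `check_local_gpu()` on the host returns a pass/fail flag and a list of human-readable reasons, which can be printed before running `docker build`.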