asset-harvester / README.md

shsolanki

Initial Asset Harvester HF Space

aafeaa2 about 2 months ago

metadata

title: Asset Harvester
emoji: 🚗
colorFrom: green
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Image-to-3D for autonomous-vehicle simulation assets

Asset Harvester

Paper | Project Page | Code | Model | Data

Upload one image of a single object (vehicle, pedestrian, cyclist, or other road object) and get back a complete 3D Gaussian splat asset ready for simulation.

Pipeline

upload ─▶ image guard (optional) ─▶ object segmentation ─▶ recenter + pad
                                                              │
                                                              ▼
              3D Gaussian splat ◀── TokenGS lifting ◀── multiview diffusion ◀── camera estimation

Object segmentation (AH_object_seg_jit.pt): a JIT-compiled Mask2Former model produces a binary mask of the foreground object at the uploaded image's native resolution.
Camera estimation (AH_camera_estimator.safetensors): predicts camera pose, distance, field of view (FOV), and object dimensions (length, width, height). It shares the C-RADIO backbone with multiview diffusion, so the backbone is loaded only once.
Multiview diffusion (AH_multiview_diffusion.safetensors): SparseViewDiT generates 16 novel orbit views conditioned on the input image.
TokenGS lifting (AH_tokengs_lifting.safetensors): a feed-forward 3D Gaussian reconstructor lifts the 16 views to a full 3D Gaussian splat (3DGS) asset.
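
The "recenter + pad" step in the diagram is not described further here. One plausible reading is that the segmentation mask's bounding box is turned into a square window centered on the object, with padding wherever that window leaves the image. The sketch below illustrates that idea only; the function names, the `margin` parameter, and the list-of-rows mask representation are assumptions for illustration, not the Space's actual code.

```python
from typing import List, Tuple

Mask = List[List[int]]  # rows of 0/1 values; a stand-in for the segmentation output


def mask_bbox(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1), the inclusive bounding box of nonzero pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        raise ValueError("mask is empty: no foreground object found")
    return min(xs), min(ys), max(xs), max(ys)


def square_crop(mask: Mask, margin: float = 0.1) -> Tuple[int, int, int]:
    """Return (left, top, side) of a square window centered on the object.

    The window may extend past the image border (negative left/top, or
    left + side beyond the width); those regions would be filled with
    padding when the crop is materialized.
    """
    x0, y0, x1, y1 = mask_bbox(mask)
    width, height = x1 - x0 + 1, y1 - y0 + 1
    # Square side: the longer bbox edge plus a relative margin on both sides.
    side = round(max(width, height) * (1 + 2 * margin))
    # Center of the bounding box in continuous pixel coordinates.
    cx, cy = (x0 + x1 + 1) / 2, (y0 + y1 + 1) / 2
    return round(cx - side / 2), round(cy - side / 2), side
```

A negative `left` or `top` simply means that many pixels of padding are needed on that side before the crop is resized for the downstream models.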

Outputs

Multiview MP4 (16-frame orbit at 5 fps).
3D Gaussian orbit render (MP4).
Gaussian splat (PLY) ready for simulation engines.
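
Before loading the PLY output into a simulation engine, it can help to check how many Gaussians it contains and which per-Gaussian properties it stores. The sketch below reads only the standard PLY header (`format`, `element`, `property`, `end_header` lines), so it makes no assumption about which property names this Space writes. The function name `read_ply_header` is illustrative.

```python
from typing import BinaryIO, Dict, List, Tuple


def read_ply_header(stream: BinaryIO) -> Tuple[str, Dict[str, int], Dict[str, List[str]]]:
    """Parse a PLY header and return (format, element counts, property names)."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt = ""
    counts: Dict[str, int] = {}
    props: Dict[str, List[str]] = {}
    current = None
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("unexpected end of file inside PLY header")
        tokens = line.decode("ascii").split()
        if not tokens or tokens[0] == "comment":
            continue
        if tokens[0] == "format":
            fmt = tokens[1]  # e.g. ascii or binary_little_endian
        elif tokens[0] == "element":
            current = tokens[1]
            counts[current] = int(tokens[2])
            props[current] = []
        elif tokens[0] == "property" and current is not None:
            props[current].append(tokens[-1])  # the name is always the last token
        elif tokens[0] == "end_header":
            return fmt, counts, props
```

Opening the downloaded file in binary mode and passing it to `read_ply_header` returns the vertex count (one vertex per Gaussian in common 3DGS exports, which is an assumption about this output) without reading the payload.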

Hardware

Single NVIDIA GPU with compute capability ≥ 8.0 and ≥ 30 GB VRAM. Typical end-to-end runtime: 1-2 minutes per image on A100/H100.
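
The two thresholds above can be checked programmatically before launching the app. This is a minimal sketch, assuming PyTorch is installed for the probing step and that "30 GB" means 30 × 10^9 bytes; both helper names are illustrative.

```python
from typing import Tuple

MIN_CAPABILITY = (8, 0)  # compute capability threshold stated above
MIN_VRAM_GB = 30         # VRAM threshold stated above; 1 GB = 10**9 bytes is an assumption


def meets_requirements(capability: Tuple[int, int], total_memory_bytes: int) -> bool:
    """Return True if a GPU satisfies both the capability and VRAM thresholds."""
    return (
        tuple(capability) >= MIN_CAPABILITY
        and total_memory_bytes >= MIN_VRAM_GB * 10**9
    )


def probe_first_gpu() -> bool:
    """Check GPU 0 through PyTorch, imported lazily so the helper above has no dependencies."""
    import torch

    if not torch.cuda.is_available():
        return False
    capability = torch.cuda.get_device_capability(0)
    total = torch.cuda.get_device_properties(0).total_memory
    return meets_requirements(capability, total)
```

Tuple comparison handles the capability check: `(8, 6)` passes the threshold while `(7, 5)` does not, regardless of how much memory the card has.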

Limitations

Single-object only: when an image contains multiple distinct subjects, only the largest mask is used and the rest are discarded.
Heavily occluded objects or out-of-distribution subjects (e.g., objects not seen in driving logs) may produce hallucinated geometry.
The image guard uses meta-llama/Llama-Guard-3-11B-Vision; enabling it adds roughly 20 to 30 s per run.
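
The single-object rule can be pictured as an area comparison across candidate masks. The sketch below is an illustration under that assumption, not the Space's implementation: it keeps the mask with the most foreground pixels, and a hypothetical `dominance` helper reports how much of the total foreground that mask covers, which can flag images worth cropping before upload.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 values, one mask per detected subject


def mask_area(mask: Mask) -> int:
    """Count foreground pixels in a binary mask."""
    return sum(1 for row in mask for v in row if v)


def keep_largest(masks: List[Mask]) -> Mask:
    """Return the mask with the largest area; ties go to the earliest mask."""
    if not masks:
        raise ValueError("no masks to choose from")
    return max(masks, key=mask_area)


def dominance(masks: List[Mask]) -> float:
    """Fraction of all foreground pixels that belong to the largest mask."""
    areas = [mask_area(m) for m in masks]
    total = sum(areas)
    if not total:
        return 0.0
    return max(areas) / total
```

A low dominance value means a second subject is nearly as large as the first, so the discarded content is substantial and a tighter crop around the intended object is likely to give a better result.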

Local deployment

docker build --build-arg HF_TOKEN=$HF_TOKEN -t asset-harvester .
docker run --gpus all -e HF_TOKEN=$HF_TOKEN -p 7860:7860 asset-harvester

Checkpoints are downloaded from nvidia/asset-harvester on first run. HF_TOKEN must have access to that repo.
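
Because the container needs HF_TOKEN on first run, a small preflight check can fail early when the variable is missing. The sketch below builds the same `docker run` invocation as above but passes `-e HF_TOKEN` without a value, which makes Docker forward the variable from the calling environment. The helper name `docker_run_argv` is illustrative, and the sketch only constructs the argument list rather than running it.

```python
from typing import List, Mapping


def docker_run_argv(
    env: Mapping[str, str],
    image: str = "asset-harvester",
    port: int = 7860,
) -> List[str]:
    """Build the docker run command, refusing to proceed without HF_TOKEN."""
    if not env.get("HF_TOKEN"):
        raise RuntimeError(
            "HF_TOKEN is not set; checkpoints cannot be downloaded on first run"
        )
    return [
        "docker", "run",
        "--gpus", "all",
        "-e", "HF_TOKEN",  # no value: Docker copies it from the caller's environment
        "-p", f"{port}:{port}",
        image,
    ]
```

Passing `os.environ` as `env` and handing the result to `subprocess.run(argv, check=True)` would launch the container; the token then never appears in the command line itself.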

Governing terms

Use of this system is governed by the NVIDIA Open Model License Agreement.