metadata
title: Asset Harvester
emoji: 🚗
colorFrom: green
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Image-to-3D for autonomous-vehicle simulation assets

Asset Harvester

Paper | Project Page | Code | Model | Data

Upload one image of a single object (vehicle, pedestrian, cyclist, or other road object) and get back a complete 3D Gaussian splat asset ready for simulation.

Pipeline

upload ─▶ image guard (optional) ─▶ object segmentation ─▶ recenter + pad
                                                                    │
                                                                    ▼
3D Gaussian splat ◀── TokenGS lifting ◀── multiview diffusion ◀── camera estimation
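The diagram does not say how the recenter + pad stage works. A minimal sketch, assuming it crops the image to the mask's bounding box, whites out background pixels, and centres the crop on a square canvas with a relative margin. The function name `recenter_and_pad`, the `margin` value, and the white fill are illustrative assumptions, not taken from the Asset Harvester code.

```python
import numpy as np


def recenter_and_pad(image: np.ndarray, mask: np.ndarray,
                     margin: float = 0.1, fill: int = 255) -> np.ndarray:
    """Hypothetical recenter + pad: crop to the mask's bounding box, blank the
    background with `fill`, and centre the crop on a square canvas."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")
    # Tight bounding box of the foreground (end indices are exclusive).
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1].copy()
    # Replace non-object pixels inside the box with the fill value.
    crop[~mask[y0:y1, x0:x1].astype(bool)] = fill
    h, w = crop.shape[:2]
    # Square side: longest crop edge plus `margin` of it on each side.
    side = int(np.ceil(max(h, w) * (1.0 + 2.0 * margin)))
    canvas = np.full((side, side) + crop.shape[2:], fill, dtype=image.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = crop
    return canvas
```

Squaring the canvas keeps the object's aspect ratio intact when the result is later resized to a fixed model input resolution.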
  1. Object segmentation (AH_object_seg_jit.pt): a JIT-compiled Mask2Former model produces a binary mask of the foreground object at the uploaded image's native resolution.
  2. Camera estimation (AH_camera_estimator.safetensors): predicts the camera pose, camera-to-object distance, field of view (FOV), and object dimensions (length, width, height). It shares the C-RADIO backbone with the multiview diffusion model, so the backbone is loaded only once.
  3. Multiview diffusion (AH_multiview_diffusion.safetensors): SparseViewDiT generates 16 novel orbit views conditioned on the input image.
  4. TokenGS lifting (AH_tokengs_lifting.safetensors): a feed-forward 3D Gaussian reconstructor lifts the 16 views into a complete 3DGS asset.
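The README does not describe the camera layout of the 16 orbit views. A minimal sketch of one plausible layout, assuming 16 cameras evenly spaced in azimuth (22.5° apart) at a fixed elevation, a z-up world with the object at the origin, camera distance taken from the step 2 estimate, and OpenGL-style camera-to-world matrices in which each camera looks down its local -Z axis. `look_at`, `orbit_poses`, and the 10° default elevation are hypothetical.

```python
import numpy as np


def look_at(eye, target=None, up=(0.0, 0.0, 1.0)):
    """Camera-to-world 4x4 matrix; the camera looks down its local -Z axis."""
    eye = np.asarray(eye, dtype=float)
    target = np.zeros(3) if target is None else np.asarray(target, dtype=float)
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)  # degenerate if looking straight along `up`
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = true_up
    c2w[:3, 2] = -forward
    c2w[:3, 3] = eye
    return c2w


def orbit_poses(distance, elevation_deg=10.0, num_views=16):
    """Cameras `distance` from the origin, evenly spaced in azimuth."""
    elev = np.radians(elevation_deg)
    poses = []
    for i in range(num_views):
        azim = 2.0 * np.pi * i / num_views
        eye = distance * np.array([np.cos(elev) * np.cos(azim),
                                   np.cos(elev) * np.sin(azim),
                                   np.sin(elev)])
        poses.append(look_at(eye))
    return np.stack(poses)  # shape (num_views, 4, 4)
```

Every pose points at the object centre, so the multiview outputs and the later 3DGS lifting can share one world frame.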

Outputs

  • Multiview MP4 (16-frame orbit at 5 fps, about 3.2 s).
  • Orbit render of the reconstructed 3D Gaussian splat (MP4).
  • Gaussian splat (PLY) ready for simulation engines.
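To check what the exported PLY contains before loading it into a simulator, its ASCII header can be read with the standard library alone. A minimal sketch; `read_ply_header` is a hypothetical helper. Many 3DGS exporters follow the per-vertex layout of the original 3D Gaussian Splatting code (`x`, `y`, `z`, `f_dc_*`, `f_rest_*`, `opacity`, `scale_*`, `rot_*`), but whether this export uses exactly those property names is an assumption to verify against the header itself.

```python
import io


def read_ply_header(stream):
    """Return (format, vertex_count, vertex_property_names) from a PLY header.

    `stream` must be a binary file-like object positioned at the start."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt, count, props = None, 0, []
    current = None  # name of the element whose properties are being listed
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("unexpected end of file inside PLY header")
        tokens = line.decode("ascii").split()
        if not tokens:
            continue
        if tokens[0] == "end_header":
            break
        if tokens[0] == "format":
            fmt = tokens[1]
        elif tokens[0] == "element":
            current = tokens[1]
            if current == "vertex":
                count = int(tokens[2])
        elif tokens[0] == "property" and current == "vertex":
            props.append(tokens[-1])  # last token is the property name
    return fmt, count, props


if __name__ == "__main__":
    sample = io.BytesIO(b"ply\nformat binary_little_endian 1.0\n"
                        b"element vertex 2\nproperty float x\nproperty float y\n"
                        b"property float z\nproperty float opacity\nend_header\n")
    print(read_ply_header(sample))
```

A real file works the same way when opened in binary mode (`open(path, "rb")`); the binary payload after `end_header` is left unread.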

Hardware

Single NVIDIA GPU with compute capability ≥ 8.0 and ≥ 30 GB VRAM. Typical end-to-end runtime: 1-2 minutes per image on A100/H100.
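A quick pre-flight check against these requirements can save a failed run. A minimal sketch: `GpuInfo`, `meets_requirements`, and `query_local_gpu` are hypothetical helpers, and the VRAM threshold is interpreted as 30 decimal gigabytes (30 × 10^9 bytes), which the README does not specify. `torch.cuda.get_device_properties` is PyTorch's call for reading a device's compute capability and total memory; it is imported lazily so the comparison logic runs without PyTorch installed.

```python
from dataclasses import dataclass

MIN_CAPABILITY = (8, 0)
MIN_VRAM_BYTES = 30 * 10**9  # assumption: "30 GB" read as decimal gigabytes


@dataclass
class GpuInfo:
    name: str
    major: int  # compute capability, major part
    minor: int  # compute capability, minor part
    total_memory_bytes: int


def meets_requirements(gpu: GpuInfo) -> bool:
    """True if the GPU satisfies both the capability and the VRAM minimum."""
    capability_ok = (gpu.major, gpu.minor) >= MIN_CAPABILITY  # tuple compare
    vram_ok = gpu.total_memory_bytes >= MIN_VRAM_BYTES
    return capability_ok and vram_ok


def query_local_gpu(index: int = 0) -> GpuInfo:
    """Read properties of CUDA device `index` (requires PyTorch with CUDA)."""
    import torch  # lazy import: only needed when querying real hardware

    props = torch.cuda.get_device_properties(index)
    return GpuInfo(props.name, props.major, props.minor, props.total_memory)
```

On a machine with PyTorch and a CUDA device, `meets_requirements(query_local_gpu())` gives the answer for GPU 0.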

Limitations

  • Single object only: if an image contains multiple distinct subjects, the pipeline keeps the largest mask and discards the rest.
  • Heavily occluded objects or out-of-distribution subjects (e.g., objects not seen in driving logs) may produce hallucinated geometry.
  • The optional image guard uses meta-llama/Llama-Guard-3-11B-Vision; enabling it adds ~20-30 s per run.
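The single-object limitation can be illustrated with a largest-region filter. The README does not say how the "largest mask" is chosen; a minimal sketch, assuming it means the 4-connected foreground region with the most pixels in the binary segmentation mask. `largest_component` is a hypothetical helper written as a plain breadth-first search so it needs only NumPy; `scipy.ndimage.label` could replace the labelling loop for speed on large images.

```python
from collections import deque

import numpy as np


def largest_component(mask: np.ndarray) -> np.ndarray:
    """Keep only the largest 4-connected foreground region of a binary mask."""
    mask = mask.astype(bool)
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)  # 0 = background / unvisited
    best_label, best_size, next_label = 0, 0, 1
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue  # already part of a labelled region
        # Flood-fill a new region starting from this seed pixel.
        labels[sy, sx] = next_label
        queue = deque([(sy, sx)])
        size = 0
        while queue:
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = next_label
                    queue.append((ny, nx))
        if size > best_size:  # ties keep the region found first
            best_label, best_size = next_label, size
        next_label += 1
    if best_size == 0:
        return np.zeros_like(mask)  # no foreground at all
    return labels == best_label
```

Any smaller subject in the frame is simply dropped, which is why multi-subject images lose everything but one object.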

Local deployment

docker build --build-arg HF_TOKEN=$HF_TOKEN -t asset-harvester .
docker run --gpus all -e HF_TOKEN=$HF_TOKEN -p 7860:7860 asset-harvester

Checkpoints are downloaded from the nvidia/asset-harvester repository on first run, so HF_TOKEN must have access to that repository.

Governing terms

Use of this system is governed by the NVIDIA Open Model License Agreement.