Spaces: nvidia/asset-harvester (running on A100)

shsolanki committed on Apr 20

Commit aafeaa2

1 Parent(s): 251ccea

Initial Asset Harvester HF Space

Gradio app + Dockerfile for image-to-3D Gaussian splat pipeline.
Checkpoints are downloaded at runtime from nvidia/asset-harvester
via HF_TOKEN; not included in the repo.

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +4 -0
.gitignore +11 -0
Dockerfile +63 -0
README.md +54 -10
app.py +601 -0
dist/asset_harvester-1.0.0-py3-none-any.whl +3 -0
examples/VRU_pedestrians_0d7b602f2da8c364.jpeg +3 -0
examples/VRU_pedestrians_723ce847bf6b1671.jpeg +3 -0
examples/VRU_pedestrians_c2d728e02d4d11cc.jpeg +3 -0
examples/automobile_00c7f5b5caa9e7d0.jpeg +3 -0
examples/automobile_00e617a279b7f517.jpeg +3 -0
examples/automobile_00e9ab349b437b2c.jpeg +3 -0
examples/automobile_03271db9979f6072.jpeg +3 -0
examples/automobile_039b7b7af4bd853b.jpeg +3 -0
examples/automobile_044dfeb890d95741.jpeg +3 -0
examples/automobile_04acf10a71d112a1.jpeg +3 -0
examples/automobile_04cbe39ba786858d.jpeg +3 -0
examples/automobile_05abef8311f6ca8c.jpeg +3 -0
examples/automobile_0650ef1d75757b0e.jpeg +3 -0
examples/automobile_0742aaf29c0a7090.jpeg +3 -0
examples/automobile_07bf69847a2eae86.jpeg +3 -0
examples/automobile_095cdc57d3186c66.jpeg +3 -0
examples/automobile_0a5ccea0b758dd89.jpeg +3 -0
examples/automobile_0d21d1c69e594ca7.jpeg +3 -0
examples/automobile_0fc4baf8c34411e8.jpeg +3 -0
examples/automobile_125e8d7a5a5ab518.jpeg +3 -0
examples/automobile_13ee50f6c1e8e494.jpeg +3 -0
examples/automobile_14030c6da90d58a8.jpeg +3 -0
examples/automobile_14586bfbf8da0dd7.jpeg +3 -0
examples/automobile_14b077380d557c2b.jpeg +3 -0
examples/automobile_1585b4e264e88112.jpeg +3 -0
examples/automobile_1704da3176d628b1.jpeg +3 -0
examples/automobile_17289d1f0904c980.jpeg +3 -0
examples/automobile_1875f1efbced2624.jpeg +3 -0
examples/automobile_18f3d87e7b85d808.jpeg +3 -0
examples/automobile_191dbee26f68e8ca.jpeg +3 -0
examples/automobile_1a001d763cacdaa6.jpeg +3 -0
examples/automobile_1b42127109e81f09.jpeg +3 -0
examples/automobile_1bdd779f1bc8a22e.jpeg +3 -0
examples/automobile_1ca3d3c4b08fa14a.jpeg +3 -0
examples/automobile_1ee187f725a01351.jpeg +3 -0
examples/automobile_2054eba237562446.jpeg +3 -0
examples/automobile_23a0a9163760a5c1.jpeg +3 -0
examples/automobile_2516c24ad21db02a.jpeg +3 -0
examples/automobile_27a5512681e556e2.jpeg +3 -0
examples/automobile_292e62704c0e9f28.jpeg +3 -0
examples/automobile_29dfaf2cdbc2385a.jpeg +3 -0
examples/automobile_2bf70aae9df266eb.jpeg +3 -0
examples/automobile_2cab9afbbcc99c9e.jpeg +3 -0
examples/automobile_30a50721545fbffe.jpeg +3 -0

.gitattributes CHANGED

@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.whl filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
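
The four added patterns route the bundled wheel and the example images through Git LFS. As a rough check of which paths in this commit they cover, the sketch below approximates gitattributes matching with `fnmatch` on the file's basename. That approximation is an assumption that holds only for slash-free patterns like these, and `lfs_tracked` is a hypothetical helper, not part of the repo.

```python
from fnmatch import fnmatch
from posixpath import basename

# Patterns added to .gitattributes in this commit.
LFS_PATTERNS = ["*.whl", "*.jpeg", "*.jpg", "*.png"]


def lfs_tracked(path: str, patterns: list[str] = LFS_PATTERNS) -> bool:
    """Approximate check: slash-free gitattributes patterns match the basename."""
    name = basename(path)
    return any(fnmatch(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for p in (
        "dist/asset_harvester-1.0.0-py3-none-any.whl",
        "examples/automobile_00c7f5b5caa9e7d0.jpeg",
        "app.py",
    ):
        print(p, lfs_tracked(p))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.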

.gitignore ADDED

	@@ -0,0 +1,11 @@

+checkpoints-cache/
+checkpoints/
+.client-venv/
+.serve-venv/
+.claude/
+test_inputs/
+__pycache__/
+*.pyc
+*.pyo
+.DS_Store
+.env

Dockerfile ADDED

	@@ -0,0 +1,63 @@

+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+# ── Stage 1: Base system ─────────────────────────────────────────────
+FROM nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 AS base
+ENV DEBIAN_FRONTEND=noninteractive
+ENV TZ=UTC
+ENV PIP_NO_CACHE_DIR=1
+RUN apt-get update && apt-get install -y \
+    python3 python3-pip python3-dev \
+    ffmpeg git \
+    && rm -rf /var/lib/apt/lists/*
+RUN ln -sf /usr/bin/python3 /usr/bin/python
+WORKDIR /app
+# ── Stage 2: Install asset_harvester wheel + Gradio runtime deps ─────
+FROM base AS wheel
+COPY dist/asset_harvester-1.0.0-py3-none-any.whl /tmp/
+RUN pip install --no-cache-dir \
+        '/tmp/asset_harvester-1.0.0-py3-none-any.whl[multiview-diffusion,tokengs,camera-estimator]' \
+        'gradio>=5.14.0' spaces \
+    && rm /tmp/asset_harvester-1.0.0-py3-none-any.whl
+# ── Stage 3: gsplat from source (needs torch already installed) ──────
+FROM wheel AS gsplat
+ARG GSPLAT_COMMIT=b60e917c95afc449c5be33a634f1f457e116ff5e
+ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0"
+RUN pip install --no-cache-dir --no-build-isolation \
+    "git+https://github.com/nerfstudio-project/gsplat.git@${GSPLAT_COMMIT}"
+# ── Stage 4: Final image ─────────────────────────────────────────────
+FROM gsplat AS final
+RUN useradd -m -u 1000 user \
+    && mkdir -p /app/checkpoints \
+    && chown -R 1000:1000 /app
+# HF_TOKEN from build secret (optional — can also be passed at runtime via -e)
+RUN --mount=type=secret,id=HF_TOKEN,mode=0444 \
+    if [ -f /run/secrets/HF_TOKEN ]; then \
+        echo "export HF_TOKEN=$(cat /run/secrets/HF_TOKEN)" > /etc/hf_env; \
+    else \
+        echo "# no build-time HF_TOKEN; provide via -e HF_TOKEN=..." > /etc/hf_env; \
+    fi \
+    && chmod +x /etc/hf_env
+RUN printf '#!/bin/bash\nsource /etc/hf_env\nexec "$@"\n' > /usr/local/bin/entrypoint.sh \
+    && chmod +x /usr/local/bin/entrypoint.sh
+COPY --chown=1000:1000 app.py /app/
+COPY --chown=1000:1000 examples /app/examples
+USER user
+EXPOSE 7860
+ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
+CMD ["python", "app.py"]

README.md CHANGED

@@ -1,15 +1,59 @@
 ---
 title: Asset Harvester
-emoji: 📊
-colorFrom: purple
-colorTo: yellow
-sdk: gradio
-sdk_version: 6.9.0
-python_version: '3.12'
-app_file: app.py
+emoji: "\U0001F697"
+colorFrom: green
+colorTo: indigo
+sdk: docker
+app_port: 7860
 pinned: false
-license: apache-2.0
-short_description: 'Demo of nvidia/asset-harvester models '
+short_description: Image-to-3D for autonomous-vehicle simulation assets
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Asset Harvester
+[**Paper**](https://arxiv.org/abs/2604.18468) | [**Project Page**](https://research.nvidia.com/labs/sil/projects/asset-harvester/) | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
+Upload one image of a single object (vehicle, pedestrian, cyclist, or other road object) and get back a complete 3D Gaussian splat asset ready for simulation.
+## Pipeline
+```
+upload ─▶ image guard (optional) ─▶ object segmentation ─▶ recenter + pad
+                                                              │
+                                                              ▼
+              3D Gaussian splat ◀── TokenGS lifting ◀── multiview diffusion ◀── camera estimation
+```
+1. **Object segmentation** (`AH_object_seg_jit.pt`) — Mask2Former JIT produces a binary mask of the foreground object at the uploaded image's native resolution.
+2. **Camera estimation** (`AH_camera_estimator.safetensors`) — predicts camera pose, distance, FOV, and object dimensions (LWH). Shares the C-RADIO backbone with multiview diffusion to avoid loading it twice.
+3. **Multiview diffusion** (`AH_multiview_diffusion.safetensors`) — SparseViewDiT generates 16 novel orbit views conditioned on the input image.
+4. **TokenGS lifting** (`AH_tokengs_lifting.safetensors`) — feed-forward 3D Gaussian reconstructor lifts the 16 views to a full 3DGS asset.
+## Outputs
+- Multiview MP4 (16-frame orbit at 5fps).
+- 3D Gaussian orbit render (MP4).
+- Gaussian splat (PLY) ready for simulation engines.
+## Hardware
+Single NVIDIA GPU with compute capability ≥ 8.0 and ≥ 30 GB VRAM. Typical end-to-end runtime: **1-2 minutes** per image on A100/H100.
+## Limitations
+- Single-object only — images with multiple distinct subjects will use the largest mask and discard the rest.
+- Heavily occluded objects or out-of-distribution subjects (e.g., objects not seen in driving logs) may produce hallucinated geometry.
+- Image guard uses `meta-llama/Llama-Guard-3-11B-Vision` — enabling it adds ~20-30 s per run.
+## Local deployment
+```bash
+docker build --secret id=HF_TOKEN,env=HF_TOKEN -t asset-harvester .
+docker run --gpus all -e HF_TOKEN=$HF_TOKEN -p 7860:7860 asset-harvester
+```
+Checkpoints are downloaded from [`nvidia/asset-harvester`](https://huggingface.co/nvidia/asset-harvester) on first run. `HF_TOKEN` must have access to that repo.
+## Governing terms
+Use of this system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
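
The README's Hardware section asks for compute capability ≥ 8.0 and ≥ 30 GB VRAM. A minimal sketch of a pre-flight check against those two numbers follows. `meets_requirements` and `check_local_gpu` are hypothetical helpers that are not part of `app.py`, and treating "GB" as 10^9 bytes is an assumption.

```python
from __future__ import annotations

MIN_COMPUTE_CAPABILITY = (8, 0)  # README: compute capability >= 8.0
MIN_VRAM_GB = 30                 # README: >= 30 GB VRAM (assumed decimal GB)


def meets_requirements(capability: tuple[int, int], total_memory_bytes: int) -> bool:
    """Return True if a GPU satisfies the README's stated minimums."""
    vram_gb = total_memory_bytes / 1e9
    return capability >= MIN_COMPUTE_CAPABILITY and vram_gb >= MIN_VRAM_GB


def check_local_gpu(index: int = 0) -> bool:
    """Query the local GPU through PyTorch; returns False when CUDA is unavailable."""
    import torch  # imported lazily so the pure check above works without torch

    if not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(index)
    return meets_requirements((props.major, props.minor), props.total_memory)
```

Keeping the comparison separate from the PyTorch query means the thresholds can be exercised on machines without a GPU.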

app.py ADDED

	@@ -0,0 +1,601 @@

+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+"""Asset Harvester Gradio demo — single-image upload to 3D Gaussian splat."""
+from __future__ import annotations
+import gc
+import logging
+import os
+import random
+import tempfile
+import threading
+import uuid
+from functools import partial
+import gradio as gr
+import imageio
+import numpy as np
+import torch
+import torchvision.transforms as T
+from diffusers.schedulers import DPMSolverMultistepScheduler
+from huggingface_hub import snapshot_download
+from PIL import Image
+class _SpacesStub:
+    @staticmethod
+    def GPU(*args, **kwargs):
+        def decorator(fn):
+            return fn
+        if args and callable(args[0]):
+            return args[0]
+        return decorator
+try:
+    import spaces
+    _HAS_SPACES = True
+except ImportError:
+    _HAS_SPACES = False
+    spaces = _SpacesStub()  # type: ignore[assignment]
+if os.getenv("SPACE_ID") is None:
+    _HAS_SPACES = False
+    spaces = _SpacesStub()  # type: ignore[assignment]
+logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+logger = logging.getLogger(__name__)
+HF_CHECKPOINT_REPO = "nvidia/asset-harvester"
+CHECKPOINTS_DIR = "/app/checkpoints" if os.path.isdir("/app") else os.path.join(os.getcwd(), "checkpoints")
+MV_CKPT = "AH_multiview_diffusion.safetensors"
+TOKENGS_CKPT = "AH_tokengs_lifting.safetensors"
+AHC_CKPT = "AH_camera_estimator.safetensors"
+SEG_CKPT = "AH_object_seg_jit.pt"
+DEFAULT_NUM_STEPS = 30
+DEFAULT_CFG_SCALE = 2.0
+IMAGE_SIZE = 512
+GRAY_VALUE = 128
+SEG_INPUT_SIZE = (384, 384)
+MIN_MASK_AREA_FRAC = 0.01
+MAX_MASK_AREA_FRAC = 0.95
+MIN_UPLOAD_SIDE = 256
+_MODELS_LOCK = threading.Lock()
+_MODELS: dict = {}
+_CKPT_PATHS: dict[str, str] = {}
+_SESSION_MVDATA: dict[str, object] = {}
+def _load_seg_estimator_class():
+    """Load Mask2FormerSegmentationEstimator directly from its source file, bypassing
+    `asset_harvester.ncore_parser.__init__` which pulls in the private `ncore` module."""
+    import importlib.util
+    import asset_harvester
+    pkg_root = os.path.dirname(asset_harvester.__file__)
+    source = os.path.join(pkg_root, "ncore_parser", "image_segmentation.py")
+    spec = importlib.util.spec_from_file_location("_ah_image_segmentation", source)
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    return module.Mask2FormerSegmentationEstimator
+def _download_checkpoints() -> None:
+    if _CKPT_PATHS:
+        return
+    hf_token = os.getenv("HF_TOKEN")
+    local_dir = snapshot_download(
+        repo_id=HF_CHECKPOINT_REPO,
+        allow_patterns=[MV_CKPT, TOKENGS_CKPT, AHC_CKPT, SEG_CKPT],
+        local_dir=CHECKPOINTS_DIR,
+        token=hf_token,
+    )
+    for key, filename in (("mv", MV_CKPT), ("tokengs", TOKENGS_CKPT), ("ahc", AHC_CKPT), ("seg", SEG_CKPT)):
+        path = os.path.join(local_dir, filename)
+        if not os.path.isfile(path):
+            raise FileNotFoundError(f"Missing {filename} in {local_dir}")
+        _CKPT_PATHS[key] = path
+    logger.info("Checkpoints ready in %s", local_dir)
+def _load_models(device: str) -> dict:
+    with _MODELS_LOCK:
+        if _MODELS:
+            return _MODELS
+        from asset_harvester.camera_estimator.inference import AHCEstimator
+        from asset_harvester.multiview_diffusion.pipelines import SparseViewDiTPipeline
+        from asset_harvester.multiview_diffusion.utils.model_builder import get_models
+        from asset_harvester.tokengs.lifting_inference import TokengsLiftingRunner
+        Mask2FormerSegmentationEstimator = _load_seg_estimator_class()
+        _download_checkpoints()
+        dtype = torch.bfloat16 if device.startswith("cuda") else torch.float32
+        logger.info("Loading MVD (+ VAE, c-radio)...")
+        vae, cradio_model, cradio_image_processor, transformer = get_models(
+            _CKPT_PATHS["mv"], device=device, dtype=dtype,
+        )
+        scheduler = DPMSolverMultistepScheduler(
+            num_train_timesteps=1000,
+            beta_schedule="scaled_linear",
+            prediction_type="flow_prediction",
+            flow_shift=1.0,
+            use_flow_sigmas=True,
+        )
+        pipeline = SparseViewDiTPipeline(
+            vae=vae,
+            text_encoder=None,
+            tokenizer=None,
+            scheduler=scheduler,
+            transformer=transformer,
+            image_encoder=cradio_model,
+            image_processor=cradio_image_processor,
+        ).to(dtype)
+        logger.info("Loading AHC (shared c-radio)...")
+        ahc = AHCEstimator(
+            checkpoint_path=_CKPT_PATHS["ahc"],
+            device=device,
+            cradio_model=cradio_model,
+            cradio_image_processor=cradio_image_processor,
+        )
+        logger.info("Loading segmentation JIT...")
+        seg = Mask2FormerSegmentationEstimator(
+            model_path=_CKPT_PATHS["seg"],
+            device=device,
+            input_size=SEG_INPUT_SIZE,
+        )
+        logger.info("Loading TokenGS lifting...")
+        lifter = TokengsLiftingRunner(
+            _CKPT_PATHS["tokengs"], bbox_size=0.8, dtype=dtype, render_img_size=IMAGE_SIZE,
+        )
+        _MODELS.update(pipeline=pipeline, ahc=ahc, seg=seg, lifter=lifter, dtype=dtype, device=device)
+        return _MODELS
+def _segment(seg, image_pil: Image.Image) -> np.ndarray:
+    """Return a uint8 binary mask at the native image resolution."""
+    _, instance_seg = seg.predict(image_pil)
+    if len(instance_seg["classes"]) == 0:
+        return np.zeros((image_pil.height, image_pil.width), dtype=np.uint8)
+    mh, mw = SEG_INPUT_SIZE
+    unpacked = np.unpackbits(instance_seg["instance_masks"]).reshape(
+        len(instance_seg["classes"]), mh, mw,
+    )
+    areas = unpacked.sum(axis=(1, 2))
+    biggest = unpacked[int(np.argmax(areas))].astype(np.uint8) * 255
+    mask_pil = Image.fromarray(biggest, mode="L").resize(
+        (image_pil.width, image_pil.height), Image.NEAREST,
+    )
+    return np.array(mask_pil)
+def _recenter_and_pad(image_pil: Image.Image, mask_np: np.ndarray) -> tuple[Image.Image, Image.Image]:
+    """Translate image+mask so the mask centroid lands at frame center, square-pad, resize to 512.
+    Image padding uses GRAY_VALUE (matches AHC's apply_mask background). Mask padding uses 0.
+    Raises ValueError on degenerate masks.
+    """
+    H, W = mask_np.shape
+    ys, xs = np.where(mask_np > 0)
+    if ys.size == 0:
+        raise ValueError("No object detected in the input image. Try a cleaner photo with a single subject.")
+    area_frac = ys.size / (H * W)
+    if area_frac < MIN_MASK_AREA_FRAC:
+        raise ValueError(f"Detected object is too small ({area_frac * 100:.1f}% of image).")
+    if area_frac > MAX_MASK_AREA_FRAC:
+        raise ValueError(
+            f"Detected object fills nearly the whole image ({area_frac * 100:.1f}%); provide a wider-angle photo."
+        )
+    y0, y1 = int(ys.min()), int(ys.max())
+    x0, x1 = int(xs.min()), int(xs.max())
+    if y0 == 0 and y1 == H - 1 and x0 == 0 and x1 == W - 1:
+        raise ValueError("Object touches all four edges; provide an image showing the full object.")
+    cy = float(ys.mean())
+    cx = float(xs.mean())
+    side_y = int(np.ceil(2 * max(cy, H - cy)))
+    side_x = int(np.ceil(2 * max(cx, W - cx)))
+    side = max(side_y, side_x, H, W)
+    paste_y = side // 2 - int(round(cy))
+    paste_x = side // 2 - int(round(cx))
+    canvas_img = np.full((side, side, 3), GRAY_VALUE, dtype=np.uint8)
+    canvas_msk = np.zeros((side, side), dtype=np.uint8)
+    img_np = np.array(image_pil.convert("RGB"))
+    canvas_img[paste_y : paste_y + H, paste_x : paste_x + W] = img_np
+    canvas_msk[paste_y : paste_y + H, paste_x : paste_x + W] = mask_np
+    out_img = Image.fromarray(canvas_img).resize((IMAGE_SIZE, IMAGE_SIZE), Image.BILINEAR)
+    out_msk = Image.fromarray(canvas_msk).resize((IMAGE_SIZE, IMAGE_SIZE), Image.NEAREST)
+    return out_img, out_msk
+def _run_image_guard(image_pil: Image.Image, device: str, dtype: torch.dtype) -> None:
+    from asset_harvester.utils.image_guard import ImageGuard
+    guard = ImageGuard(device=device, dtype=dtype)
+    try:
+        guard.load()
+        result = guard.check_image(image_pil)
+    finally:
+        guard.unload()
+        gc.collect()
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+    if not result.passed:
+        raise gr.Error(f"Image rejected by safety check (label={result.label}, score={result.score:.2f}).")
+def _build_mvdata(image_pil: Image.Image, mask_pil: Image.Image, ahc):
+    from asset_harvester.multiview_diffusion.data.nre_preproc import MVData
+    tmp = tempfile.mkdtemp(prefix="ah_upload_")
+    frame_p = os.path.join(tmp, "frame_0.jpg")
+    mask_p = os.path.join(tmp, "mask_0.png")
+    image_pil.save(frame_p, quality=95)
+    mask_pil.save(mask_p)
+    cam_data = ahc.run([(frame_p, mask_p)])
+    return MVData(
+        clip_id="upload",
+        obj_id="0",
+        frames=[np.array(image_pil)],
+        cam_poses=np.array(cam_data["cam_poses"], dtype=np.float32),
+        dists=np.array(cam_data["dists"], dtype=np.float32),
+        fov=np.array(cam_data["fov"], dtype=np.float32),
+        npct="vehicle",
+        lwh=np.array(cam_data["lwh"], dtype=np.float32),
+        masks=[np.array(mask_pil)],
+        auto_label=None,
+    )
+def _encode_mp4(frames_np, path: str, fps: int = 24) -> None:
+    imageio.v2.mimwrite(path, frames_np, fps=fps, macro_block_size=1)
+@spaces.GPU(duration=60)
+def run_segmentation(image_pil, is_example: bool = False, progress=gr.Progress()):
+    """First stage: safety check + segmentation + recentering + camera estimation.
+    Returns (mask_preview, state) where state is handed to `run_3d`.
+    Progress shown only on the mask image output.
+    """
+    if image_pil is None:
+        raise gr.Error("Please upload an image.")
+    if min(image_pil.size) < MIN_UPLOAD_SIDE:
+        raise gr.Error(f"Image too small ({image_pil.size[0]}x{image_pil.size[1]}); min {MIN_UPLOAD_SIDE}px per side.")
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    progress(0.1, desc="Loading models…")
+    models = _load_models(device)
+    dtype = models["dtype"]
+    image_pil = image_pil.convert("RGB")
+    if is_example:
+        progress(0.3, desc="Skipping safety check (curated example)…")
+    else:
+        progress(0.3, desc="Running safety check…")
+        _run_image_guard(image_pil, device, dtype)
+    progress(0.6, desc="Segmenting object…")
+    mask_np = _segment(models["seg"], image_pil)
+    progress(0.8, desc="Recentering and estimating camera…")
+    try:
+        centered_img, centered_mask = _recenter_and_pad(image_pil, mask_np)
+    except ValueError as e:
+        raise gr.Error(str(e))
+    rgb = np.array(image_pil)
+    fg = (mask_np > 0).astype(np.uint8)[:, :, None]
+    mask_preview = Image.fromarray(np.where(fg, rgb, np.full_like(rgb, GRAY_VALUE)).astype(np.uint8))
+    mvdata = _build_mvdata(centered_img, centered_mask, models["ahc"])
+    uid = str(uuid.uuid4())
+    _SESSION_MVDATA[uid] = mvdata
+    progress(1.0, desc="Done")
+    return mask_preview, uid
+@spaces.GPU(duration=180)
+def run_3d(state, progress=gr.Progress()):
+    """Second stage: multiview diffusion + TokenGS lifting.
+    Returns (orbit_mp4_path, ply_path) matching outputs=[video_out, ply_out].
+    """
+    if not state or state not in _SESSION_MVDATA:
+        raise gr.Error("Segmentation must run first.")
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    models = _load_models(device)
+    pipeline = models["pipeline"]
+    lifter = models["lifter"]
+    mvdata = _SESSION_MVDATA.pop(state)
+    from asset_harvester.multiview_diffusion.data.inference_utils import build_eval_cams
+    from asset_harvester.multiview_diffusion.data.nre_preproc import preproc
+    progress(0.05, desc="Preparing multiview conditioning…")
+    transform = T.Compose(
+        [T.Resize(IMAGE_SIZE), T.ToTensor(), T.Normalize([0.5], [0.5])]
+    )
+    inference_preproc = partial(
+        preproc,
+        image_transform=transform,
+        resolution=IMAGE_SIZE,
+        conditioning_mode="n",
+        eval_mode=True,
+        eval_cam_sampler=build_eval_cams,
+    )
+    data_dict = inference_preproc(mvdata)
+    max_length = data_dict.n_target + min(4, len(data_dict.x) - data_dict.n_target)
+    for attr in ("x", "c2w_relatives", "x_white_background", "dists", "fovs", "plucker_image", "relative_brightness"):
+        if hasattr(data_dict, attr):
+            setattr(data_dict, attr, getattr(data_dict, attr)[:max_length])
+    if hasattr(data_dict, "intrinsics") and data_dict.intrinsics.shape[0] > max_length:
+        data_dict.intrinsics = data_dict.intrinsics[:max_length]
+    progress(0.15, desc="Generating multiview images…")
+    with torch.no_grad():
+        output = pipeline(
+            data_dict=data_dict,
+            num_inference_steps=DEFAULT_NUM_STEPS,
+            guidance_scale=DEFAULT_CFG_SCALE,
+            flow_shift=1.0,
+            output_type="pil",
+        )
+    images_np = [np.array(img) for img in output["images"]]
+    progress(0.55, desc="Lifting to 3D Gaussian splat…")
+    output_dir = tempfile.mkdtemp(prefix="ah_out_")
+    offload_ok = False
+    try:
+        if torch.cuda.is_available():
+            for name in ("vae", "transformer", "image_encoder"):
+                m = getattr(pipeline, name, None)
+                if m is not None:
+                    m.to("cpu")
+            pipeline.to("cpu")
+        offload_ok = True
+        gc.collect()
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+        fov = float(data_dict.fovs[0].item())
+        dist = float(data_dict.dists[0].item())
+        lwh = data_dict.lwh if hasattr(data_dict, "lwh") and data_dict.lwh is not None else [1.0, 1.0, 1.0]
+        with torch.no_grad():
+            gaussians = lifter.run_lifting(images_np, fov, dist, lwh)
+        progress(0.85, desc="Rendering orbit views of the lifted splat…")
+        with torch.no_grad():
+            rendered = lifter.render_orbit_views(gaussians, fov, dist, lwh)
+        rendered_np = [im.permute(1, 2, 0).cpu().numpy() for im in rendered]
+        orbit_mp4 = os.path.join(output_dir, "lifting.mp4")
+        _encode_mp4(rendered_np, orbit_mp4)
+        progress(0.95, desc="Saving Gaussian splat…")
+        ply_path = os.path.join(output_dir, "gaussians.ply")
+        lifter.save_ply(gaussians, ply_path)
+    finally:
+        if offload_ok and torch.cuda.is_available():
+            for name in ("vae", "transformer", "image_encoder"):
+                m = getattr(pipeline, name, None)
+                if m is not None:
+                    m.to(device)
+            pipeline.to(device)
+        gc.collect()
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+    progress(1.0, desc="Done")
+    return orbit_mp4, ply_path
+HEADER_MD = """
+## Image to 3D Asset with [Asset Harvester](https://github.com/NVIDIA/asset-harvester)
+[**Paper**](https://arxiv.org/abs/2604.18468) | [**Project Page**](https://research.nvidia.com/labs/sil/projects/asset-harvester/) | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
+**Upload a single image of one object — a vehicle, pedestrian, cyclist, or other road object — to generate a 3D Gaussian splat asset. The assumed inputs are images cropped and rectified from AV datasets, like the example images below. However, you can also challenge the model with internet photos.**
+The inference pipeline consists of:
+- **Object Segmentation** — isolates the object from the background.
+- **Camera Estimation** — predicts the viewing direction, distance, field of view, and object dimensions.
+- **Multiview Diffusion** — generates 16 novel orbit views.
+- **3D Lifting** — reconstructs the generated views into a 3D Gaussian splat (downloadable PLY).
+"""
+def build_ui():
+    theme = gr.themes.Default(primary_hue="green", neutral_hue="slate")
+    app_css = """
+    /* Base typography */
+    .gradio-container { font-size: 20px !important; }
+    .gradio-container .prose p, .gradio-container .prose li,
+    .gradio-container .md p,    .gradio-container .md li { font-size: 1.2rem !important; line-height: 1.6 !important; }
+    .gradio-container .prose h2, .gradio-container .md h2 { font-size: 2rem !important; }
+    .gradio-container .block-label, .gradio-container button { font-size: 1.1rem !important; }
+    /* Fluid media — images/videos fill their column, keep aspect ratio */
+    .gradio-container .image-container img,
+    .gradio-container .video-container video { max-width: 100% !important; max-height: 100% !important;
+                                                 width: auto !important; height: auto !important;
+                                                 object-fit: contain !important; }
+    .gradio-container .image-container, .gradio-container .video-container
+        { display: flex !important; align-items: center !important; justify-content: center !important; }
+    /* Narrow viewports: let columns wrap instead of cramming */
+    @media (max-width: 1024px) {
+        .gradio-container .prose h2, .gradio-container .md h2 { font-size: 1.7rem !important; }
+        .gradio-container .prose p, .gradio-container .prose li,
+        .gradio-container .md p,    .gradio-container .md li { font-size: 1.1rem !important; }
+    }
+    @media (max-width: 720px) {
+        .gradio-container { font-size: 18px !important; }
+        /* Force columns in the main Row to take full width, stack vertically */
+        .gradio-container .grid-wrap { grid-template-columns: 1fr !important; }
+    }
+    """
+    with gr.Blocks(title="Asset Harvester", css=app_css) as demo:
+        gr.Markdown(HEADER_MD)
+        image_in = gr.Image(
+            label="Image Prompt",
+            type="pil",
+            height=360,
+            sources=["upload", "clipboard"],
+            render=False,
+        )
+        examples_dir = os.path.join(os.path.dirname(__file__), "examples")
+        all_examples = [
+            [os.path.join(examples_dir, f)]
+            for f in sorted(os.listdir(examples_dir))
+            if f.lower().endswith((".jpeg", ".jpg", ".png"))
+        ]
+        with gr.Row():
+            with gr.Column(scale=2, min_width=200):
+                examples_ds = gr.Dataset(
+                    components=[image_in],
+                    samples=all_examples,
+                    samples_per_page=18,
+                    label="Example images",
+                )
+            with gr.Column(scale=4, min_width=360):
+                image_in.render()
+                gr.Markdown(
+                    "**Notes:**\n\n"
+                    "* **For best results, please upload clear, object-centric images "
+                    "where the camera is level with the object, similar to rectified "
+                    "ego-viewpoint images in our AV setting.**\n"
+                    "* The uploaded images are screened with "
+                    "[Llama Guard 3 Vision](https://huggingface.co/meta-llama/Llama-Guard-3-11B-Vision) "
+                    "to filter out harmful content."
+                )
+                run_btn = gr.Button("Generate 3D Asset", variant="primary")
+                gr.Markdown(
+                    "<p style='font-size: 1rem; margin: 0.5rem 0;'>"
+                    "<b>Disclaimer:</b> Asset Harvester is trained for the AV domain, "
+                    "and its performance is not guaranteed on arbitrary images."
+                    "</p>"
+                )
+            with gr.Column(scale=5, min_width=400):
+                mask_out = gr.Image(
+                    label="Object Segmentation",
+                    type="pil",
+                    height=400,
+                )
+                video_out = gr.Video(
+                    label="3D Gaussian Splat — Orbit Render",
+                    height=400,
+                    autoplay=True,
+                    loop=True,
+                )
+                ply_out = gr.DownloadButton(
+                    label="Download PLY",
+                )
+        stage_state = gr.State()
+        is_example = gr.State(False)
+        def _pick_example(sample):
+            return sample[0] if isinstance(sample, (list, tuple)) else sample
+        examples_ds.click(
+            _pick_example, inputs=examples_ds, outputs=image_in
+        ).then(lambda: True, outputs=is_example)
+        image_in.input(lambda: False, outputs=is_example)
+        image_in.clear(lambda: False, outputs=is_example)
+        def _shuffled_examples():
+            shuffled = all_examples.copy()
+            random.shuffle(shuffled)
+            return gr.update(samples=shuffled)
+        demo.load(_shuffled_examples, inputs=None, outputs=examples_ds)
+        run_btn.click(
+            fn=run_segmentation,
+            inputs=[image_in, is_example],
+            outputs=[mask_out, stage_state],
+            show_progress="full",
+            show_progress_on=[mask_out],
+            concurrency_id="seg",
+            concurrency_limit=2,
+        ).then(
+            fn=run_3d,
+            inputs=[stage_state],
+            outputs=[video_out, ply_out],
+            show_progress="full",
+            concurrency_id="gpu3d",
+            concurrency_limit=1,
+        )
+    demo.queue(default_concurrency_limit=1, max_size=30)
+    return demo, theme
+def _prefetch_all(device: str) -> None:
+    """Warm checkpoints and load the main pipeline models into memory at startup.
+    Image guard (Llama Guard 3 Vision) weights are prefetched to disk cache only —
+    they are loaded and unloaded per call because the model is large (~22 GB on GPU).
+    """
+    logger.info("Prefetching asset-harvester checkpoints...")
+    _download_checkpoints()
+    logger.info("Loading pipeline / AHC / segmentation / TokenGS into memory...")
+    _load_models(device)
+    logger.info("Prefetching Llama Guard 3 Vision weights to disk cache...")
+    try:
+        snapshot_download(
+            repo_id="meta-llama/Llama-Guard-3-11B-Vision",
+            allow_patterns=["*.json", "*.safetensors", "*.txt", "*.model", "tokenizer*"],
+            token=os.getenv("HF_TOKEN"),
+        )
+        logger.info("Image guard weights cached.")
+    except Exception as e:
+        logger.warning(
+            "Could not prefetch Llama Guard weights (will download on first safety check): %s", e,
+        )
+    logger.info("Startup prefetch complete.")
+if os.getenv("AH_PREFETCH", "1") == "1":
+    _startup_device = "cuda" if torch.cuda.is_available() else "cpu"
+    _prefetch_all(_startup_device)
+demo, _theme = build_ui()
+if __name__ == "__main__":
+    demo.launch(server_name="0.0.0.0", server_port=7860, max_threads=40, theme=_theme)
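
The `_prefetch_all` docstring says the image guard weights are only cached on disk and that the model is loaded and unloaded on each call. The code that does this is not part of this hunk, so the following is only a minimal sketch of that pattern, not the Space's actual implementation. The names `loaded_for_call`, `fake_load`, `fake_release`, and `check_image` are hypothetical, and a plain dict stands in for the real model so the example runs without a GPU.

```python
import gc
from contextlib import contextmanager
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


@contextmanager
def loaded_for_call(
    load: Callable[[], T],
    release: Optional[Callable[[T], None]] = None,
) -> Iterator[T]:
    """Load a heavy resource, hand it to the caller, and free it afterwards."""
    resource = load()
    try:
        yield resource
    finally:
        # Runs even if the caller raises, so memory is not left occupied.
        if release is not None:
            release(resource)
        del resource
        gc.collect()  # with PyTorch one might also call torch.cuda.empty_cache()


# Hypothetical stand-ins: a real loader would read weights from the disk cache.
events = []


def fake_load():
    events.append("load")
    return {"name": "guard"}


def fake_release(model):
    events.append("release")


def check_image(image_name: str) -> bool:
    # The resource exists only for the duration of this one call.
    with loaded_for_call(fake_load, fake_release) as guard:
        events.append(f"check:{image_name}")
        return True


check_image("a.jpeg")
check_image("b.jpeg")
```

Each call pays the load cost again, which is the trade-off the docstring accepts in exchange for not keeping roughly 22 GB resident next to the main pipeline models.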

dist/asset_harvester-1.0.0-py3-none-any.whl ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60004f835c196d9e9a3e3cf017aede4c6b3c189c0d2bf6578a449c15f85bc3f0
+size 198831
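
The three added lines above are a Git LFS pointer file, not the wheel itself: a spec version, an object ID (hash algorithm plus digest), and the size in bytes of the real file. As a rough illustration, a pointer like this could be parsed and checked against downloaded bytes as sketched below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and do not appear in the repository.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS v1 pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, info: dict) -> bool:
    """Check that raw file bytes agree with the pointer's size and digest."""
    if info["algo"] != "sha256":
        raise ValueError("only sha256 is handled in this sketch")
    return (
        len(data) == info["size"]
        and hashlib.sha256(data).hexdigest() == info["digest"]
    )


# The pointer shown in the diff above, copied verbatim.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:60004f835c196d9e9a3e3cf017aede4c6b3c189c0d2bf6578a449c15f85bc3f0
size 198831
"""
info = parse_lfs_pointer(pointer)
```

The "Git LFS Details" entries listed for each example image carry the same two facts (SHA256 and remote file size) in rendered form.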

examples/VRU_pedestrians_0d7b602f2da8c364.jpeg ADDED Viewed

Git LFS Details

SHA256: fd1bc84e3ac16aba8a467aef68aa913c1213e85f0c2965d5a8004539a55d8d55
Pointer size: 130 Bytes
Size of remote file: 35.3 kB

examples/VRU_pedestrians_723ce847bf6b1671.jpeg ADDED Viewed

Git LFS Details

SHA256: 459cd0d2b4b2f5230e9be6ec17aa3278c8591e1b773f3c854c73164bd4ea5c6e
Pointer size: 130 Bytes
Size of remote file: 55.2 kB

examples/VRU_pedestrians_c2d728e02d4d11cc.jpeg ADDED Viewed

Git LFS Details

SHA256: 71b4020e15266419aa12ab03b1da779b8660f623039b68fdd39a1772873b6329
Pointer size: 130 Bytes
Size of remote file: 73 kB

examples/automobile_00c7f5b5caa9e7d0.jpeg ADDED Viewed

Git LFS Details

SHA256: 967a6fb3609b654b9cd692bcf4ba6fb06d26c2aa2fe1033c2054f8c32d9f6b78
Pointer size: 130 Bytes
Size of remote file: 62.4 kB

examples/automobile_00e617a279b7f517.jpeg ADDED Viewed

Git LFS Details

SHA256: 323abaa108277b6926cd7d11b2db8649a05a1f47f9937c6d52f534518fa7ebfb
Pointer size: 130 Bytes
Size of remote file: 82 kB

examples/automobile_00e9ab349b437b2c.jpeg ADDED Viewed

Git LFS Details

SHA256: c3670de572e9545f4c353dc9f3c80e582a7b9784b744dd8a701cccc2d9a235fb
Pointer size: 130 Bytes
Size of remote file: 52.2 kB

examples/automobile_03271db9979f6072.jpeg ADDED Viewed

Git LFS Details

SHA256: 3a913f4131679bb35e945ed8685b6d2a901be23d5bff2a3248bd98bfab22c13e
Pointer size: 130 Bytes
Size of remote file: 87.5 kB

examples/automobile_039b7b7af4bd853b.jpeg ADDED Viewed

Git LFS Details

SHA256: 4fdaa867ff9639d3bb2128f53e59fe5816ae5e12056163d2715be1e3cf7b46f7
Pointer size: 130 Bytes
Size of remote file: 40.4 kB

examples/automobile_044dfeb890d95741.jpeg ADDED Viewed

Git LFS Details

SHA256: 4fa8300d4a2b72cba2d21f6bcb899e9558453220348327b3f19f904998c775cb
Pointer size: 130 Bytes
Size of remote file: 47.1 kB

examples/automobile_04acf10a71d112a1.jpeg ADDED Viewed

Git LFS Details

SHA256: 676fc216dff74ff8ef4a4a494058698b6fe7a42e9dfe99f471a60b6ff12e1bfd
Pointer size: 130 Bytes
Size of remote file: 29.2 kB

examples/automobile_04cbe39ba786858d.jpeg ADDED Viewed

Git LFS Details

SHA256: dc46c147dd4ea1973f3143bbe6a1ee59e0c32b8106d4a46c423d9a50209d1a58
Pointer size: 130 Bytes
Size of remote file: 52.5 kB

examples/automobile_05abef8311f6ca8c.jpeg ADDED Viewed

Git LFS Details

SHA256: 285838369c462af507306a7c3ea9f5d2721a943a7e0d118c0f313baae842485f
Pointer size: 130 Bytes
Size of remote file: 37.2 kB

examples/automobile_0650ef1d75757b0e.jpeg ADDED Viewed

Git LFS Details

SHA256: 7a71d7eebef691fac2d27f9b4f0f6be2c51f37e0e51e55ae2d50415d2ec424f6
Pointer size: 130 Bytes
Size of remote file: 58 kB

examples/automobile_0742aaf29c0a7090.jpeg ADDED Viewed

Git LFS Details

SHA256: 320001861f8dfd22418f22b7edd58e0cf73b84d374c794568482a0b3f345ca64
Pointer size: 130 Bytes
Size of remote file: 55.3 kB

examples/automobile_07bf69847a2eae86.jpeg ADDED Viewed

Git LFS Details

SHA256: de412dfb26764e370d439201dc97262eeabdd5921a958dc9e711dfc0db165c54
Pointer size: 130 Bytes
Size of remote file: 91.4 kB

examples/automobile_095cdc57d3186c66.jpeg ADDED Viewed

Git LFS Details

SHA256: 5838e25aa5b31ff5483bc67bb361d59388a515fbd82b497a6ca6b856447a17b5
Pointer size: 130 Bytes
Size of remote file: 76.8 kB

examples/automobile_0a5ccea0b758dd89.jpeg ADDED Viewed

Git LFS Details

SHA256: 736aa09dcedda9548c0903e8288cc86bd51ec810054f25c9acbbabfe885f80ab
Pointer size: 130 Bytes
Size of remote file: 66.8 kB

examples/automobile_0d21d1c69e594ca7.jpeg ADDED Viewed

Git LFS Details

SHA256: 37939e0da91ca0b2a8b460155bf6abfb8c23d1eeab4bb77f5c4ce6a58200bab5
Pointer size: 130 Bytes
Size of remote file: 85.7 kB

examples/automobile_0fc4baf8c34411e8.jpeg ADDED Viewed

Git LFS Details

SHA256: 9338d2f9deb5d4c876b4a99f04833c1bd2d21052c21225eebf49d3ffa57fe07f
Pointer size: 130 Bytes
Size of remote file: 59.9 kB

examples/automobile_125e8d7a5a5ab518.jpeg ADDED Viewed

Git LFS Details

SHA256: a75d6e386d060c5cbceac09f1abb6050a3e0c15255db9b9e714bff6f5e3d01b4
Pointer size: 130 Bytes
Size of remote file: 53.6 kB

examples/automobile_13ee50f6c1e8e494.jpeg ADDED Viewed

Git LFS Details

SHA256: 8fd68c8c8d5a442d22052f4a31da8780ac363e515453c1fd3421fbb3baa4b54c
Pointer size: 130 Bytes
Size of remote file: 62.7 kB

examples/automobile_14030c6da90d58a8.jpeg ADDED Viewed

Git LFS Details

SHA256: 9a1d8d347aea5c90b72c5f74b835033a541b8935fe9da67d01858cd0446fe299
Pointer size: 130 Bytes
Size of remote file: 60.4 kB

examples/automobile_14586bfbf8da0dd7.jpeg ADDED Viewed

Git LFS Details

SHA256: a1511f8d53d8f50cb9f0d667e3bb2f636d7ff824a492fcc6e39521f43ca516f0
Pointer size: 130 Bytes
Size of remote file: 36.2 kB

examples/automobile_14b077380d557c2b.jpeg ADDED Viewed

Git LFS Details

SHA256: ae2c730c619d4e53d95da956f585d7e660851f9c714e564e7ba954e3f74129da
Pointer size: 130 Bytes
Size of remote file: 62.8 kB

examples/automobile_1585b4e264e88112.jpeg ADDED Viewed

Git LFS Details

SHA256: 2ef36e4f6bbe35b0d9d5022342b88c26335c5c8a8e2bd10c6a32adf806c0930a
Pointer size: 130 Bytes
Size of remote file: 53.1 kB

examples/automobile_1704da3176d628b1.jpeg ADDED Viewed

Git LFS Details

SHA256: f3093127f2ed8f2c180b8f320ac57252cd2bf45106a1d1a5547538f1f4d28367
Pointer size: 130 Bytes
Size of remote file: 62.6 kB

examples/automobile_17289d1f0904c980.jpeg ADDED Viewed

Git LFS Details

SHA256: 9043235ef19ff5d37dc74e584cee52b70771fff98f4a07a0e8a45dc3dc9a39f0
Pointer size: 130 Bytes
Size of remote file: 68.6 kB

examples/automobile_1875f1efbced2624.jpeg ADDED Viewed

Git LFS Details

SHA256: e86c26f49b8a27cfa2db6030dff20eac429a1e67ff165b7dfd1ec23603b13eb2
Pointer size: 130 Bytes
Size of remote file: 89 kB

examples/automobile_18f3d87e7b85d808.jpeg ADDED Viewed

Git LFS Details

SHA256: 0a4080258b3763cc8899d76f3456bdd7ed994ff75a3250c9403c60a872215e9a
Pointer size: 130 Bytes
Size of remote file: 54.5 kB

examples/automobile_191dbee26f68e8ca.jpeg ADDED Viewed

Git LFS Details

SHA256: 79b884eb0cfe757975bff4f3e44bdcfcf4c07420a83c6fb15c306572169bd0c2
Pointer size: 130 Bytes
Size of remote file: 66 kB

examples/automobile_1a001d763cacdaa6.jpeg ADDED Viewed

Git LFS Details

SHA256: d2bff6fa243ca06143b485e5e50173e98a47fd6eb7f781c7571ee0043c69aaae
Pointer size: 130 Bytes
Size of remote file: 46.3 kB

examples/automobile_1b42127109e81f09.jpeg ADDED Viewed

Git LFS Details

SHA256: fb81615a64b8b056770e8327470b72696849db9f36419e8658d8282bf3636934
Pointer size: 130 Bytes
Size of remote file: 77.8 kB

examples/automobile_1bdd779f1bc8a22e.jpeg ADDED Viewed

Git LFS Details

SHA256: 8cd3d12a67375509aa30785f8e7251955db78e78c3eb8c1ce786ba88ef35c5e6
Pointer size: 130 Bytes
Size of remote file: 64.9 kB

examples/automobile_1ca3d3c4b08fa14a.jpeg ADDED Viewed

Git LFS Details

SHA256: 36a1648e3be9758d382f7c945dad6ce3e1be7af4470b683a4f3217254f5d8657
Pointer size: 130 Bytes
Size of remote file: 47.3 kB

examples/automobile_1ee187f725a01351.jpeg ADDED Viewed

Git LFS Details

SHA256: 68e88e01f7418491d2ad3949eeef169bc4e4076619b8bc6f37ec3df18a3d1e76
Pointer size: 130 Bytes
Size of remote file: 64 kB

examples/automobile_2054eba237562446.jpeg ADDED Viewed

Git LFS Details

SHA256: 4f3e4aad20b57f6b70471a1052eeaa0c1ee90fbcad51caf93ccbc50cf0654154
Pointer size: 130 Bytes
Size of remote file: 65.6 kB

examples/automobile_23a0a9163760a5c1.jpeg ADDED Viewed

Git LFS Details

SHA256: dd31f42ef02d0a4a6d5f344427cd3f536e82736f804160b31e3635327fce16e8
Pointer size: 130 Bytes
Size of remote file: 63.1 kB

examples/automobile_2516c24ad21db02a.jpeg ADDED Viewed

Git LFS Details

SHA256: 433a3507c37acfb4a2e3d65d738518696fcfbc8f1ff2997e27b9ef2a58de10bc
Pointer size: 130 Bytes
Size of remote file: 66.4 kB

examples/automobile_27a5512681e556e2.jpeg ADDED Viewed

Git LFS Details

SHA256: c33f68510b77ffd8558dfcda9431330698ea66ae73ad4c09f83f77ef16129e82
Pointer size: 130 Bytes
Size of remote file: 91.3 kB

examples/automobile_292e62704c0e9f28.jpeg ADDED Viewed

Git LFS Details

SHA256: ebaf8d56e8f47654b74edd0d65b384579e19a56c5e351fd73674cc6be6cdf230
Pointer size: 130 Bytes
Size of remote file: 64.9 kB

examples/automobile_29dfaf2cdbc2385a.jpeg ADDED Viewed

Git LFS Details

SHA256: a728c7def0dab764fcda10ae6fffb9fd81764ea48c685990d309e5126b906ad4
Pointer size: 130 Bytes
Size of remote file: 73.8 kB

examples/automobile_2bf70aae9df266eb.jpeg ADDED Viewed

Git LFS Details

SHA256: ce14d120534d56fb961d7d70a8eadf346bd0e5846e4cd91907990168f33eb218
Pointer size: 130 Bytes
Size of remote file: 72.6 kB

examples/automobile_2cab9afbbcc99c9e.jpeg ADDED Viewed

Git LFS Details

SHA256: 88e7d49bd9fee72653e94592959c36e8b0111f940635c87cacfe981d2515dc61
Pointer size: 130 Bytes
Size of remote file: 58.9 kB

examples/automobile_30a50721545fbffe.jpeg ADDED Viewed

Git LFS Details

SHA256: 05a7e66da22e7f05f5ef745e1bc35cac1060902aa6012c63e1ec0c9f7abc7338
Pointer size: 130 Bytes
Size of remote file: 85.9 kB