License

This release contains CheXanatomy adapter weights for a PaliGemma-based model. Use of these weights is subject to:

  • the license and usage terms of the underlying PaliGemma/Gemma base model
  • the license and terms governing the training data and derived data pipeline
  • the license terms in the CheXanatomy code repository for accompanying code

This release is provided for research use.

CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs

This repository contains the released CheXanatomy model for anatomy-aware vision-language modeling on chest radiographs.

CheXanatomy is described in:

CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs
Sergios Gatidis, Curtis Langlotz, Christian Bluethgen

Model Description

CheXanatomy augments a pretrained vision-language model with explicit anatomical supervision in token space. Instead of introducing task-specific decoder heads, the model is trained autoregressively to generate anatomical localization and segmentation outputs as structured tokens.

The model supports anatomy-aware tasks such as:

  • anatomical detection
  • bounding-box generation
  • anatomical segmentation
  • anatomy token identification
  • transfer to related localization tasks

The training approach uses synthetic chest radiographs generated from CT volumes, with forward-projected anatomical labels providing anatomically consistent 2D supervision.
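The following sketch roughly illustrates forward projection. It projects a CT volume and a 3D anatomical label along one axis to get a radiograph-like image, a 2D mask and a bounding box. It is not the CheXsynth pipeline and makes several assumptions: parallel rays, a (z, y, x) volume with y as the anterior-posterior axis, and a nominal water attenuation of 0.02 per mm. The helper names `drr_parallel`, `project_label` and `mask_to_box` are hypothetical.

```python
import numpy as np


def drr_parallel(ct_hu, axis=1, spacing_mm=1.0, mu_water_per_mm=0.02):
    """Radiograph-like image from a CT volume via parallel-ray projection.

    Assumption: Hounsfield units are converted to linear attenuation relative
    to water (mu = mu_water * (1 + HU / 1000)), integrated along `axis`, and
    passed through Beer-Lambert attenuation. Bright pixels = strong attenuation.
    """
    mu = mu_water_per_mm * np.clip(1.0 + ct_hu / 1000.0, 0.0, None)
    line_integral = mu.sum(axis=axis) * spacing_mm
    return 1.0 - np.exp(-line_integral)


def project_label(label, axis=1):
    """2D label: a pixel is positive if any voxel along its ray is labeled."""
    return np.asarray(label, dtype=bool).any(axis=axis)


def mask_to_box(mask):
    """Tight (y_min, x_min, y_max, x_max) box of a 2D mask (max exclusive); None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1
```

The "any voxel along the ray" rule keeps the projected label aligned with the projected anatomy, which is what makes the 2D supervision anatomically consistent. A realistic pipeline would likely use a divergent beam geometry and energy-dependent attenuation instead of this simplified model.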

Intended Use

This model is intended for research use in:

  • anatomy-aware medical vision-language modeling
  • chest radiograph localization
  • chest radiograph anatomical segmentation
  • representation learning for radiology imaging

This release is not intended for:

  • direct clinical deployment
  • autonomous diagnosis
  • medical decision-making without clinician oversight

Training Data

CheXanatomy uses synthetic chest radiograph training data derived from CT volumes. Synthetic chest radiographs can be generated from the CT-RATE dataset together with the CheXsynth pipeline:

The CheXanatomy code repository is available at:

Model Inputs and Outputs

The model takes:

  • a chest radiograph image
  • a textual prompt

Example prompts:

  • detect heart
  • segment left lung
  • segment aorta
  • caption <loc0400><loc0312><loc0703><loc0625>
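The `caption <loc…>` prompt refers to an image region through four location tokens. The sketch below builds such a prompt for a pixel-space box. It assumes CheXanatomy keeps the PaliGemma convention: tokens in the order y_min, x_min, y_max, x_max, with each coordinate quantized into 1024 bins relative to image height or width. `box_to_loc_tokens` and `caption_prompt` are hypothetical helpers, not part of the released code.

```python
def box_to_loc_tokens(box, width, height, bins=1024):
    """Encode a (y_min, x_min, y_max, x_max) pixel box as four <locNNNN> tokens.

    Assumes PaliGemma-style quantization: coordinate / size * bins, floored
    and clipped to [0, bins - 1].
    """
    y_min, x_min, y_max, x_max = box

    def quantize(value, size):
        index = int(value / size * bins)
        return min(max(index, 0), bins - 1)

    pairs = ((y_min, height), (x_min, width), (y_max, height), (x_max, width))
    return "".join(f"<loc{quantize(v, s):04d}>" for v, s in pairs)


def caption_prompt(box, width, height):
    """Hypothetical helper: prompt asking the model to describe a region."""
    return "caption " + box_to_loc_tokens(box, width, height)
```

For a 1024 × 1024 image, `caption_prompt((400, 312, 703, 625), 1024, 1024)` yields the caption prompt listed in the examples.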

The model generates autoregressive token outputs that may include:

  • location tokens such as <loc0123>
  • segmentation tokens such as <seg045>
  • anatomical labels
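For detection prompts, the generated location tokens can be converted back to pixel coordinates. This sketch assumes the same PaliGemma conventions as above (order y_min, x_min, y_max, x_max; 1024 bins). It also assumes multiple detections are separated by `;`. `parse_detections` is a hypothetical helper, not the repository's decoder.

```python
import re

# Four location tokens followed by an optional free-text label.
DET_RE = re.compile(r"((?:<loc\d{4}>){4})([^<;]*)")
LOC_RE = re.compile(r"<loc(\d{4})>")


def parse_detections(text, width, height, bins=1024):
    """Return (label, (x_min, y_min, x_max, y_max)) tuples in pixel units.

    Assumes token order y_min, x_min, y_max, x_max and coordinates
    normalized to `bins` bins relative to the original image size.
    """
    results = []
    for loc_tokens, label in DET_RE.findall(text):
        y_min, x_min, y_max, x_max = (int(v) for v in LOC_RE.findall(loc_tokens))
        box = (
            x_min / bins * width,
            y_min / bins * height,
            x_max / bins * width,
            y_max / bins * height,
        )
        results.append((label.strip(), box))
    return results
```

Boxes are returned as (x_min, y_min, x_max, y_max), the order `PIL.ImageDraw.Draw.rectangle` accepts, so they can be drawn directly on the radiograph. The coordinates are normalized, so pass the original image size rather than the 224 × 224 model input size.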

Segmentation masks may require postprocessing and token decoding using the utilities released in the CheXanatomy code repository.
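This sketch covers the parts of mask postprocessing that do not depend on the released decoder. It assumes the PaliGemma output layout for segmentation: four location tokens, then 16 `<segNNN>` codebook indices, then the label. Turning those 16 codes into a low-resolution mask (64 × 64 in PaliGemma's setup) requires the decoder in the CheXanatomy utilities and is not reproduced here. `parse_segmentations` and `paste_mask` are hypothetical helpers. The nearest-neighbour resize is a simplification.

```python
import re

import numpy as np

# Assumed layout: 4 location tokens, 16 segmentation tokens, then a label.
SEG_RE = re.compile(r"((?:<loc\d{4}>){4})((?:<seg\d{3}>){16})([^<;]*)")


def parse_segmentations(text, width, height, bins=1024):
    """Split generated text into label, integer pixel box and 16 codebook indices."""
    results = []
    for loc_tokens, seg_tokens, label in SEG_RE.findall(text):
        y_min, x_min, y_max, x_max = (int(v) for v in re.findall(r"<loc(\d{4})>", loc_tokens))
        box = (
            round(x_min / bins * width),
            round(y_min / bins * height),
            round(x_max / bins * width),
            round(y_max / bins * height),
        )
        codes = [int(v) for v in re.findall(r"<seg(\d{3})>", seg_tokens)]
        results.append({"label": label.strip(), "box": box, "codes": codes})
    return results


def paste_mask(mask_small, box, width, height):
    """Nearest-neighbour resize of a decoded low-resolution mask into its box.

    `mask_small` is assumed to come from the repository's token decoder;
    values above 0.5 are treated as foreground.
    """
    x_min, y_min, x_max, y_max = box
    canvas = np.zeros((height, width), dtype=bool)
    h, w = y_max - y_min, x_max - x_min
    if h <= 0 or w <= 0:
        return canvas
    rows = np.arange(h) * mask_small.shape[0] // h
    cols = np.arange(w) * mask_small.shape[1] // w
    canvas[y_min:y_max, x_min:x_max] = mask_small[np.ix_(rows, cols)] > 0.5
    return canvas
```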

Usage

Example

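import torch
from peft import PeftConfig, PeftModel
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "StanfordAIMI/chexanatomy-paligemma-10b-224"

# The adapter config records which PaliGemma base checkpoint it was trained on.
peft_config = PeftConfig.from_pretrained(model_id)
base_model_id = peft_config.base_model_name_or_path

device = "cuda" if torch.cuda.is_available() else "cpu"
# bfloat16 roughly halves GPU memory for the 10B base model; keep float32 on CPU.
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# Load the base model, then attach the CheXanatomy adapter weights.
base_model = PaliGemmaForConditionalGeneration.from_pretrained(base_model_id, torch_dtype=dtype)
model = PeftModel.from_pretrained(base_model, model_id).to(device).eval()
processor = PaliGemmaProcessor.from_pretrained(base_model_id)

# Input is a chest radiograph, converted to 3-channel RGB.
image = Image.open("chest_xray.png").convert("RGB")
prompt = "segment left lung"

# BatchFeature.to(dtype) casts only floating-point tensors (pixel values).
inputs = processor(images=image, text=prompt, return_tensors="pt").to(dtype).to(device)
input_len = inputs["input_ids"].shape[-1]

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

# Decode only the newly generated tokens so the prompt is not echoed.
decoded = processor.decode(outputs[0][input_len:], skip_special_tokens=True)
print(decoded)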