BrainScope Disease Fine-Tuned scGPT

This repository contains a disease-adapted, fine-tuned scGPT model for brain single-cell / single-nucleus RNA-seq (sc/snRNA-seq) analysis.

It is intended to be used with the companion pipeline repository:

Pipeline repo: YOUR_USERNAME/brainscope-scgpt-pipeline

Model summary

This model starts from the original scGPT backbone and is then fine-tuned on disease-related brain single-cell / single-nucleus RNA-seq data for downstream annotation workflows.

This packaged release is intended for:

disease-aware cell-type annotation
embedding generation
comparison against the original scGPT baseline
downstream error analysis and reproducible model sharing

Data context

This release is associated with workflows built on:

LIBD for smaller pilot experiments and rapid iteration
BrainScope for larger-scale disease-focused fine-tuning and evaluation

The goal of this model family is to improve robustness on disease-altered cell states relative to healthy-only baselines.

Included files

Typical contents of this repository include:

model.pt
config.yaml
preprocessing.json
vocab.json
label_map.json
metrics.json
requirements.txt
inference.py
small example input / output files
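
Once the repository has been downloaded locally, a quick check can report which of the files listed above are actually present. The following is a minimal sketch: the file names come from the list above, but the helper `missing_files` is hypothetical and is not part of the companion pipeline. Whether a given release ships every file is an assumption, so absent files are reported rather than treated as errors.

```python
from pathlib import Path

# File names taken from the "Included files" list; a given release may not
# ship all of them (assumption), so missing entries are only reported.
EXPECTED_FILES = [
    "model.pt",
    "config.yaml",
    "preprocessing.json",
    "vocab.json",
    "label_map.json",
    "metrics.json",
    "requirements.txt",
    "inference.py",
]


def missing_files(repo_dir):
    """Return the expected file names that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files(repo_dir)` on the directory returned by `snapshot_download` gives an empty list when every expected file is present.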

Intended use

This model is intended for:

disease-aware annotation of sc/snRNA-seq data
controlled comparisons with the original scGPT baseline
reproducible research workflows on brain disease datasets

This release is for research use only and is not a clinical model.

Example usage

Download with the Hub

from huggingface_hub import snapshot_download

repo_dir = snapshot_download("YOUR_USERNAME/brainscope-scgpt-disease")
print(repo_dir)

Run through the pipeline

python -m brainscope_scgpt annotate \
  --input data/query.h5ad \
  --model-repo YOUR_USERNAME/brainscope-scgpt-disease \
  --output results/query_annotated.h5ad \
  --mode small

Large dataset mode:

python -m brainscope_scgpt annotate \
  --input data/brainscope_full.h5ad \
  --model-repo YOUR_USERNAME/brainscope-scgpt-disease \
  --output results/brainscope_full_annotated.h5ad \
  --mode large

Evaluation

Exact benchmark numbers have not yet been added to this model card.

Suggested structure:

Main metrics

Accuracy:
Precision:
Recall:
Macro F1:
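
For reference, the four metrics above can be computed from true and predicted cell-type labels as in the sketch below. The card does not say how precision and recall are averaged, so macro averaging (an unweighted mean over classes) is an assumption here. The function name `classification_metrics` is hypothetical and is not taken from the pipeline.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the model card does not specify it.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

Classes with no predictions or no true examples contribute 0.0 to the corresponding average, which is one common convention and not necessarily the one used for this release.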

Benchmark setting

Train / validation / test split:
Label space:
Small or large mode:
Any freeze / unfreeze strategy:
Whether MoE was used:

If this release corresponds specifically to one of your final selected models, state that explicitly here.

Comparison to the original baseline

This model is intended to be compared against:

original scGPT baseline
MoE-enhanced variants
alternative architectures such as Mamba-based approaches

Suggested points to summarize here after finalization:

which cell types improved the most
which confusions remained
whether disease-aware fine-tuning improved performance on disease-shifted cells
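
The first two points above can be derived from paired predictions on the same test cells. The sketch below assumes three aligned label lists (true labels, baseline predictions, fine-tuned predictions). The helper names `per_class_recall`, `recall_gains`, and `top_confusions` are hypothetical, and per-class recall is only one possible way to measure improvement.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of cells of each true class that were predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {lab: correct[lab] / n for lab, n in total.items()}


def recall_gains(y_true, baseline_pred, finetuned_pred):
    """Per-class recall change (fine-tuned minus baseline), largest gain first."""
    base = per_class_recall(y_true, baseline_pred)
    tuned = per_class_recall(y_true, finetuned_pred)
    gains = [(lab, tuned[lab] - base[lab]) for lab in base]
    return sorted(gains, key=lambda item: item[1], reverse=True)


def top_confusions(y_true, y_pred, k=5):
    """Most frequent (true label, predicted label) pairs among the errors."""
    pairs = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return pairs.most_common(k)
```

Restricting the three lists to cells from disease samples before calling `recall_gains` gives one way to address the third point.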

Limitations

Performance depends on preprocessing consistency and gene-vocabulary alignment.
Performance may change if label definitions differ across datasets.
This repository does not include all large intermediate artifacts used during training.
The reference mapping (RM) workflow still depends on an external FAISS index.
This is a research model and has not been validated for clinical use.
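
Gene-vocabulary alignment, the first limitation above, can be checked before annotation by measuring how many query genes appear in the model vocabulary. This sketch assumes `vocab.json` is a JSON object mapping gene symbols to integer token ids. That layout is not documented in this card, so treat it as an assumption. The helpers `load_vocab` and `vocab_coverage` are hypothetical.

```python
import json


def load_vocab(path):
    """Load a vocabulary file, assumed to map gene symbol -> token id."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def vocab_coverage(query_genes, vocab):
    """Return (fraction of query genes found in vocab, list of missing genes)."""
    genes = list(query_genes)
    missing = [gene for gene in genes if gene not in vocab]
    covered = len(genes) - len(missing)
    fraction = covered / len(genes) if genes else 0.0
    return fraction, missing
```

For an AnnData query object, the gene names would typically come from `adata.var_names`. Low coverage suggests a mismatch in gene identifiers (for example, symbols versus Ensembl ids) that should be resolved before running the model.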

Citation

Please cite the original scGPT paper and, when available, the project paper for this model.

@article{cui2024scgpt,
  title={scGPT: toward building a foundation model for single-cell multi-omics using generative AI},
  author={Cui, Haotian and Wang, Chloe and Maan, Hassaan and others},
  journal={Nature Methods},
  year={2024}
}

Contact

Yuesong Huang
University of Rochester
Email: yhu116@ur.rochester.edu
