OpenADMET PXR Induction Blind Challenge — v43 Final Blend

Track: Activity Prediction
Best leaderboard RAE: 0.586 (rank ~40 / 211, top 19%)
Team: BioInfo / RyeCatcher
Author: Justin Johnson (justin@rundatarun.io)
Methodology report: docs/METHODOLOGY_REPORT.pdf

This repository contains the full pipeline, trained model checkpoints, out-of-fold predictions, and final submission for the OpenADMET PXR Induction Blind Challenge, held March–July 2026.

TL;DR

v43 = 0.78 · v31 + 0.22 · T1v5, isotonic-calibrated per fold.

v31 is itself a cascade: v7 → v22 → v26 → v28 → v29 → v31, blending five Chemprop multitask models (T1, T1v5, T1v2, T13, T14), an AutoGluon-CheMeleon tabular model (T2), and two LightGBM baselines (v3, v4) on FCFP4-count + Mordred + CheMeleon-2048d features.
T1v5 is a 5-head Chemprop D-MPNN multitask model trained jointly on the OpenADMET DRC, the counter-screen, NCATS qHTS PXR (AID 1346982 / 1346985), Tox21 SR-ARE, and ChEMBL NR1I2.
Final calibration is honest per-fold isotonic regression: leave-one-fold-out, with each fold calibrated by a map fitted on the other folds' out-of-fold predictions only. Fitting isotonic regression in-sample on all training OOFs overstates the improvement by ~0.01 RAE; the honest fit was the gate that prevented submitting overfit garbage.
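The blend and the leave-one-fold-out calibration described above can be sketched roughly as follows. This is a minimal illustration, not the code in code/ensemble/v43_final_blend.py: the function names, the argument layout, and the use of scikit-learn's IsotonicRegression are assumptions, and how the calibration maps are applied to test predictions is left out because the card does not specify it.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def blend(v31, t1v5, w=0.78):
    """Fixed-weight linear blend from the TL;DR: w * v31 + (1 - w) * T1v5."""
    return w * np.asarray(v31, dtype=float) + (1.0 - w) * np.asarray(t1v5, dtype=float)


def per_fold_isotonic(oof_pred, y_true, fold_ids):
    """Leave-one-fold-out isotonic calibration of out-of-fold predictions.

    Each fold is calibrated by a map fitted on the remaining folds only,
    so no compound is calibrated against its own label.
    """
    oof_pred = np.asarray(oof_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    fold_ids = np.asarray(fold_ids)
    calibrated = np.empty_like(oof_pred)
    for fold in np.unique(fold_ids):
        held_out = fold_ids == fold
        # Assumption: clip predictions that fall outside the fitted range.
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(oof_pred[~held_out], y_true[~held_out])
        calibrated[held_out] = iso.predict(oof_pred[held_out])
    return calibrated
```

Scoring the output of per_fold_isotonic against an isotonic map fitted once on all OOFs is one way to measure the in-sample optimism mentioned above.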

Repository layout

.
├── README.md                          # this card
├── docs/
│   ├── METHODOLOGY_REPORT.md          # required contest methodology report (source)
│   └── METHODOLOGY_REPORT.pdf         # rendered PDF
├── code/
│   ├── requirements.txt               # pinned deps
│   ├── featurization/                 # CheMeleon, Mordred, FCFP4 wrappers
│   ├── baseline/                      # v2 XGBoost, v3/v4 LightGBM
│   ├── multitask/                     # T1, T1v5 (Chemprop), T2 (AutoGluon)
│   ├── ensemble/                      # v26, v31, v43 cascade blends
│   ├── submit/                        # Gradio API submission script
│   └── utils/                         # validation utilities
├── models/
│   ├── T1_chemprop/                   # 5-seed multitask Chemprop (12 MB)
│   ├── T1v5_chemprop/                 # 5-fold + pretrained 5-head Chemprop (9.5 MB)
│   └── T2_autogluon_chemeleon/        # AutoGluon-tabular on CheMeleon emb (1.6 GB)
├── data/
│   ├── raw/                           # OpenADMET-provided CSVs (also at openadmet/pxr-challenge-train-test)
│   ├── parent_clusters_fcfp4.csv      # FCFP4-NN parent-cluster assignment (LOCO grouping)
│   └── oof_predictions/               # per-track OOF + test predictions (inputs to v43 blend)
└── submission/
    ├── v43_final.parquet              # 513-row final submission
    ├── v43_final.csv
    └── v43_final.json                 # manifest with CV metrics + LB outcome

Reproducing v43 from the OOF predictions

Fastest path — the blend itself is pure numpy on OOFs we've published:

git clone https://huggingface.co/RyeCatcher/openadmet-pxr-challenge-2026
cd openadmet-pxr-challenge-2026
python -m venv .venv && source .venv/bin/activate
pip install -r code/requirements.txt
python code/ensemble/v43_final_blend.py    # produces submission/v43_final.parquet

This recomputes v43_final.parquet from the stored OOF CSVs in data/oof_predictions/.

Reproducing the trained models from scratch

Heavier path — full retraining of every component:

1. Pull the OpenADMET data: data/raw/pxr-challenge_TRAIN.csv, pxr-challenge_TEST_BLINDED.csv, the counter-screen and single-concentration files. (Also at openadmet/pxr-challenge-train-test.)
2. Generate features: python code/featurization/featurize_mordred.py and code/featurization/featurize_chemeleon.py.
3. Train baselines: python code/baseline/v3_v4_lgbm.py → produces v3 (Butina-LOCO LGBM) and v4 (kitchen-sink LGBM) OOFs.
4. Train multitask models: code/multitask/T1_chemprop_external_pretrain.py, T1v5_chemprop_chembl_nr1i2.py, T2_autogluon_chemeleon.py.
5. Run cascade: code/ensemble/v26_cascade_blend.py → v31_v29_t2_blend.py → v43_final_blend.py.

Expect a few hours of GPU time on a single A100/H100-class GPU, plus several hours of CPU time for AutoGluon and Mordred featurization. The methodology report has the full hyperparameter list.
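For convenience, the retraining order above could be wrapped in a small driver like the one below. This is a hypothetical helper, not a script shipped in the repository. It assumes each script runs without arguments from the repository root, and that the script names listed without a directory live next to the first script named in the same step.

```python
import subprocess
import sys

# Script order taken from the steps above. Directories for the scripts that
# the text lists by bare filename are an assumption (same folder as the first
# script in that step).
STEPS = [
    "code/featurization/featurize_mordred.py",
    "code/featurization/featurize_chemeleon.py",
    "code/baseline/v3_v4_lgbm.py",
    "code/multitask/T1_chemprop_external_pretrain.py",
    "code/multitask/T1v5_chemprop_chembl_nr1i2.py",
    "code/multitask/T2_autogluon_chemeleon.py",
    "code/ensemble/v26_cascade_blend.py",
    "code/ensemble/v31_v29_t2_blend.py",
    "code/ensemble/v43_final_blend.py",
]


def run_pipeline(steps=STEPS, dry_run=True):
    """Print each command in order; execute it only when dry_run is False."""
    commands = [[sys.executable, step] for step in steps]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_pipeline() prints the nine commands without running anything; run_pipeline(dry_run=False) executes them in sequence.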

Leaderboard metrics for the final submission (live leaderboard, partial test set)

Metric	v43 (this model)	Top-1 (Yan)
RAE	0.586	0.496
MAE	0.467	0.394
RMSE	0.542	0.46
Spearman	0.802	0.82
Kendall	0.615	0.63

Reported on the live (partial blind) leaderboard as of 2026-05-10. Final blinded leaderboard publishes after challenge close (Phase 1: 2026-05-25, Phase 2 + Structure track: 2026-07-01).
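For reference, the five metrics in the table can be computed as in the sketch below. It uses the standard definition of relative absolute error (total absolute error divided by the total absolute deviation of the labels from their mean), which is assumed here to match the leaderboard's definition. The table is at least consistent with that reading: MAE / RAE is about 0.80 for both columns. The function name is illustrative.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr


def regression_metrics(y_true, y_pred):
    """Return RAE, MAE, RMSE, Spearman and Kendall for paired arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    # RAE: absolute error relative to a predict-the-mean baseline.
    rae = abs_err.sum() / np.abs(y_true - y_true.mean()).sum()
    return {
        "RAE": float(rae),
        "MAE": float(abs_err.mean()),
        "RMSE": float(np.sqrt(np.mean((y_true - y_pred) ** 2))),
        "Spearman": float(spearmanr(y_true, y_pred)[0]),
        "Kendall": float(kendalltau(y_true, y_pred)[0]),
    }
```

An RAE below 1 means the model beats a constant prediction of the mean pEC50; lower is better.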

What worked

Diversity over scale. A 300-parameter Chemprop blended with a 4,900-feature LightGBM beat single models 10× larger. The error modes of D-MPNN graph models and gradient-boosted trees on descriptors are decorrelated; combining them captures variance neither covers alone.
Honest per-fold isotonic. In-sample isotonic fit on full OOFs is optimistic by ~0.009 RAE; chained iso-on-iso compounds at ~0.01 per step. Per-fold (leave-one-fold-out) isotonic was the only protocol that produced CV→LB shifts consistent with the actual leaderboard.
Parent-cluster LOCO. Random folds leak parent-series into validation. The test set is 513 analogs of ~89 parent compounds; FCFP4-NN-based grouping reproduces that structure for CV.
External multitask aux. ChEMBL NR1I2 (907 compounds, T1v5 head 5) and NCATS qHTS PXR (AID 1346982 / 1346985, ~10K compounds, T1 heads 2-3) added measurable lift even though individual heads had Spearman ≤ 0.4 against the primary task. Multitask transfer works through shared representations, not direct correlation.
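The parent-cluster LOCO split can be approximated with a grouped K-fold, as in the sketch below. It takes precomputed cluster labels as input (for example those stored in data/parent_clusters_fcfp4.csv, whose column names are not documented here) and does not reproduce the FCFP4-NN clustering itself. GroupKFold and the choice of five folds are assumptions, not necessarily what the pipeline uses.

```python
import numpy as np
from sklearn.model_selection import GroupKFold


def loco_folds(groups, n_splits=5):
    """Assign a fold id to each compound so that no cluster spans two folds."""
    groups = np.asarray(groups)
    fold_ids = np.full(len(groups), -1, dtype=int)
    splitter = GroupKFold(n_splits=n_splits)
    # GroupKFold only needs the number of samples, so a dummy X is enough.
    placeholder = np.zeros(len(groups))
    for fold, (_, val_idx) in enumerate(splitter.split(placeholder, groups=groups)):
        fold_ids[val_idx] = fold
    return fold_ids
```

Because every analog of a parent series lands in the same fold, validation compounds never share a series with the training folds, which mirrors how the test set relates to the training set.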

What did not work (in case it saves you time)

NN-transfer of any biological-assay readout from train compounds to test. This caused a catastrophic LB regression (v50: OOF 0.4536 → LB 0.658, rank 87). Test compounds include "SAR misses" — high structural similarity to active train compounds but biologically inactive. Nearest-neighbor copying of the train assay readout assigns high activity to inactive misses and overshoots pEC50.
TabPFN on CheMeleon. Spearman 0.93 against v43 → no orthogonal signal. Foundation-model embeddings dominate the representation regardless of the tabular learner on top.
ChemBERTa, GIN-ACtriplet, UniMolV2-310M, MaskMol, ImageNet ViT on molecular images. All converged to Spearman > 0.88 against v43. The 2D-graph representation space is saturated for this task and this data size.
Tail-weighted loss / quantile regression heads. Cannot fix the data ceiling: only 11 training compounds have pEC50 ≥ 6.5.
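The redundancy screen implied by the list above (rank correlation of a candidate model against the current blend) can be sketched as follows. The 0.88 cutoff is borrowed from the figure quoted above as an illustrative threshold, not a rule stated in the methodology, and the function name is hypothetical. Either OOF or test predictions can be passed in, as long as both arrays cover the same compounds in the same order.

```python
import numpy as np
from scipy.stats import spearmanr


def redundancy_check(candidate_pred, blend_pred, threshold=0.88):
    """Return (rho, is_redundant) for a candidate model versus the blend.

    A candidate whose rank correlation with the blend exceeds the threshold
    is unlikely to contribute orthogonal signal to the ensemble.
    """
    candidate_pred = np.asarray(candidate_pred, dtype=float)
    blend_pred = np.asarray(blend_pred, dtype=float)
    rho = float(spearmanr(candidate_pred, blend_pred)[0])
    return rho, rho > threshold
```

Running this check before training a full blend with a new component is a cheap way to skip models that only re-rank compounds the way the existing ensemble already does.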

License

Apache 2.0 for code, model weights, and predictions. OpenADMET-provided data in data/raw/ is redistributed under the OpenADMET license (Apache 2.0, see original dataset). External datasets referenced in the methodology report (Tox21, NCATS qHTS, ChEMBL, BindingDB) are under their respective public licenses.

No proprietary or confidential data was used.

Citation

@misc{johnson2026openadmetpxr,
  author = {Johnson, Justin},
  title  = {OpenADMET PXR Induction Blind Challenge — v43 Final Blend (rank 40/211)},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/RyeCatcher/openadmet-pxr-challenge-2026},
}

Acknowledgements

Thanks to the OpenADMET team (Open Molecular Software Foundation) for the assay data, the public leaderboard, the structural cleanup, and a contest design that rewards methodology over compute. Thanks to the Chemprop, AutoGluon, RDKit, and CheMeleon authors for the open-source primitives that made this possible in two weeks.
