| # Methodology Report: OpenADMET PXR Induction Blind Challenge |
|
|
| **Team:** BioInfo (RyeCatcher) |
| **Track:** Activity Prediction |
| **Best Submission:** v43 (LB RAE = 0.586, rank ~40/211 as of 2026-05-10) |
| **Contact:** justin@rundatarun.io |
| **Code, models, OOF predictions, and submission:** https://huggingface.co/RyeCatcher/openadmet-pxr-challenge-2026 |
| **Report version:** 1.1 (2026-05-19) |
|
|
| --- |
|
|
| ## 1. Overview |
|
|
| Our approach is an ensemble of three model families: (1) a Chemprop D-MPNN multitask model pretrained on external ADMET data, (2) a gradient-boosted decision tree ensemble on rich molecular descriptors, and (3) an AutoGluon-tabular model on CheMeleon foundation-model embeddings. Final predictions are produced via a weighted blend with isotonic calibration. |
|
|
| --- |
|
|
| ## 2. Data Used |
|
|
| ### 2.1 OpenADMET Provided Data |
|
|
| We used all provided data sources: |
|
|
| | Source | Size | Usage | |
| |---|---|---| |
| | DRC pEC50 (train) | 4,139 compounds | Primary regression target | |
| | Counter-screen pEC50 | 2,647 compounds (subset of train) | Multitask Head 2 | |
| | Single-concentration log2fc | 21,003 rows (10,870 unique compounds) | Exploratory multitask (not in final ensemble) | |
| | Test (blinded) | 513 compounds | Prediction target | |
|
|
| ### 2.2 External Data |
|
|
| We incorporated the following public datasets for pretraining and auxiliary multitask learning: |
|
|
| | Dataset | Source | Size | Usage | |
| |---|---|---|---| |
| | NCATS qHTS PXR (AID 1346982) | PubChem | 9,671 compounds | Binary activity classification head | |
| | NCATS qHTS LogAC50 (AID 1346985) | PubChem | 2,458 compounds | Regression head | |
| | Tox21 SR-ARE | MoleculeNet/DeepChem | ~8,000 compounds | Binary classification head (SR-PXR was unavailable in this release) | |
| | ChEMBL NR1I2 (PXR) | ChEMBL | 907 compounds with pchembl_value | Multitask Head 5 in T1v5 | |
| | BindingDB PXR (UniProt O75469) | BindingDB REST API | 364 novel compounds | Exploratory pretrain data | |
| |
| **No proprietary data was used.** All external data was deduplicated against the OpenADMET train and test sets using canonical isomeric SMILES and InChIKey. |
| |
| --- |
| |
| ## 3. Feature Engineering |
| |
| ### 3.1 Molecular Fingerprints |
| |
| - **FCFP4-count-1024**: RDKit `GetMorganGenerator` with `GetMorganFeatureAtomInvGen` atom invariants, radius=2, fpSize=1024, count fingerprint. This outperformed binary ECFP4 and MACCS in our coverage analysis (90.6% test-to-train coverage at Tanimoto ≥ 0.4). |
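
A minimal sketch of how a coverage figure like the one above could be computed, assuming coverage means the fraction of test compounds whose nearest train neighbor reaches the Tanimoto threshold (the report does not spell out the definition). Fingerprints are represented as plain `{feature id: count}` mappings standing in for RDKit count fingerprints; the function names and toy data are hypothetical.

```python
from collections import Counter


def count_tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto for count vectors: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


def coverage(test_fps, train_fps, threshold=0.4):
    """Fraction of test fingerprints with a train neighbor at or above threshold."""
    covered = sum(
        1
        for t in test_fps
        if max(count_tanimoto(t, r) for r in train_fps) >= threshold
    )
    return covered / len(test_fps)


# Toy fingerprints (hypothetical); real ones would come from RDKit.
train = [Counter({1: 2, 2: 1, 3: 1}), Counter({7: 3, 8: 1})]
test = [Counter({1: 2, 2: 1}), Counter({9: 1}), Counter({7: 1, 8: 1})]

print(f"coverage at 0.4: {coverage(test, train):.3f}")
```

The all-pairs loop is quadratic in the number of compounds, which is acceptable at this scale (513 test by 4,139 train) but would need a vectorized similarity routine for larger sets.
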
| |
| ### 3.2 Molecular Descriptors |
| |
| - **RDKit descriptors**: 217 descriptors from `useful_rdkit_utils.get_rdkit_desc_names()` including physicochemical properties, topological indices, and fragment counts. |
| - **Mordred**: 1,613 2D descriptors computed with `mordredcommunity`. |
| - **CheMeleon embeddings**: 2,048-dimension frozen embeddings from the CheMeleon foundation model (Recursion Pharma), extracted via the OpenADMET toolkit. |
|
|
| ### 3.3 Learned Embeddings |
|
|
| - **Chemprop D-MPNN fingerprints**: 300-dimension aggregation-layer embeddings extracted from the pretrained multitask model. |
|
|
| --- |
|
|
| ## 4. Models |
|
|
| ### 4.1 Model Family A: Chemprop D-MPNN Multitask (T1) |
|
|
| A directed message-passing neural network (D-MPNN) implemented in Chemprop v2 with multitask pretraining: |
|
|
| - **Architecture**: BondMessagePassing (d_h=300, depth=3) → MeanAggregation → RegressionFFN (4 tasks, 3 hidden layers, dropout=0.1) |
| - **Heads**: |
| 1. pEC50 regression (OpenADMET DRC) |
| 2. NCATS qHTS binary active/inactive |
| 3. NCATS qHTS LogAC50 regression |
| 4. Tox21 SR-ARE binary classification |
| - **Pretraining**: 30 epochs on combined dataset (train + external) with equal head weights |
| - **Finetuning**: 50 epochs with Head 1 weighted 4× on OpenADMET train only |
| - **Loss**: MSE (binary tasks treated as regression to 0/1 targets) |
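
The head weighting and the treatment of binary tasks as 0/1 regression targets can be illustrated with a small weighted multitask MSE. This is a sketch under stated assumptions, not Chemprop's implementation: it assumes unmeasured targets are stored as NaN and skipped, and that the loss is normalized by the sum of weights over observed entries. The function name and toy values are hypothetical.

```python
import math


def multitask_mse(preds, targets, head_weights):
    """Weighted MSE over heads; NaN targets (unmeasured) are skipped."""
    total, weight_sum = 0.0, 0.0
    for pred_row, target_row in zip(preds, targets):
        for p, t, w in zip(pred_row, target_row, head_weights):
            if math.isnan(t):
                continue  # this compound has no label for this head
            total += w * (p - t) ** 2
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0


NAN = float("nan")
# Columns: pEC50, qHTS active (0/1), qHTS LogAC50, Tox21 SR-ARE (0/1)
targets = [
    [5.0, NAN, NAN, NAN],   # OpenADMET compound: only pEC50 measured
    [NAN, 1.0, -5.0, 0.0],  # external compound: auxiliary labels only
]
preds = [
    [4.5, 0.2, -4.0, 0.1],
    [4.0, 0.5, -4.0, 0.5],
]

pretrain_loss = multitask_mse(preds, targets, [1, 1, 1, 1])  # equal head weights
finetune_loss = multitask_mse(preds, targets, [4, 1, 1, 1])  # Head 1 weighted 4x
print(pretrain_loss, finetune_loss)
```
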
| |
| ### 4.2 Model Family B: LightGBM Kitchen-Sink (v4) |
| |
| - **Features**: 4,902-dimension vector = RDKit 217 + FCFP4-count-1024 + Mordred 1,613 + CheMeleon 2,048 |
| - **Model**: LightGBM with MSE loss, 5-fold CV, early stopping |
| - **Hyperparameters**: the number of boosting rounds for the full-train model is set to the median of the per-fold best iterations from early stopping |
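
A short sketch of the two bookkeeping steps above: checking the concatenated feature width and choosing the full-train round count from per-fold early-stopping results. The per-fold iteration counts are hypothetical placeholders, not values from our runs.

```python
from statistics import median

# Feature block widths as listed above.
FEATURE_BLOCKS = {
    "rdkit_descriptors": 217,
    "fcfp4_count": 1024,
    "mordred": 1613,
    "chemeleon": 2048,
}
total_dim = sum(FEATURE_BLOCKS.values())


def full_train_rounds(fold_best_iterations):
    """Median of the per-fold best iterations, rounded to an integer."""
    return int(round(median(fold_best_iterations)))


fold_best = [412, 388, 455, 401, 430]  # hypothetical early-stopping results
n_rounds = full_train_rounds(fold_best)
print(total_dim, n_rounds)
```

The median is less sensitive than the mean to a single fold that stops unusually early or late, which is one plausible reason to prefer it when no validation set remains for the full-train fit.
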
| |
| ### 4.3 Model Family C: AutoGluon on CheMeleon (T2) |
| |
| - **Features**: CheMeleon 2,048-dimension frozen embeddings |
| - **Model**: AutoGluon TabularPredictor with `best_quality` preset |
| - **Runtime**: ~3.5 hours for 5-fold OOF generation |
|
|
| ### 4.4 Model Family D: Chemprop + ChEMBL Multitask (T1v5) |
|
|
| Extends T1 with an additional head for ChEMBL NR1I2 PXR pchembl_value (907 compounds). 5-head Chemprop with the same architecture as T1. |
| |
| --- |
| |
| ## 5. Ensemble Strategy |
| |
| Our final submission (v43) is a **cascade-weighted blend** built iteratively: |
| |
| ``` |
| v7 = 0.50 × T1 + 0.38 × v4 + 0.12 × v3 (isotonic calibrated) |
| v22 = 0.975 × v7 + 0.025 × KERMT |
| v26 = 0.90 × v22 + 0.10 × CheMeleon-FT |
| S6 = 0.90 × v26 + 0.10 × TabPFN |
| v28 = S6 + T1v2 (CYP3A4 multitask) at w=0.12 |
| v29 = v28 + T16 (cliff-weighted Chemprop) at w=0.10 |
| v31 = 0.83 × v29 + 0.17 × T2 (AutoGluon-CheMeleon) |
| v43 = 0.78 × v31 + 0.22 × T1v5 (ChEMBL multitask) |
| ``` |
| |
| All weights were determined by grid search on honest OOF RAE. A blend step was kept only if it improved RAE by at least Δ = 0.003 over the previous step. |
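
One cascade step can be sketched as a one-dimensional grid search over the candidate's weight, with the step rejected when the RAE gain is below the threshold. This is a simplified illustration: it scores raw blends and omits the per-fold isotonic calibration used for the honest OOF numbers. It also assumes a step written as "at w" means `(1 - w) × base + w × candidate`. Function names and toy data are hypothetical.

```python
def rae(y_true, y_pred):
    """Relative absolute error: sum|y - pred| / sum|y - mean(y)|."""
    mean_y = sum(y_true) / len(y_true)
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    den = sum(abs(t - mean_y) for t in y_true)
    return num / den


def blend(base, cand, w):
    """Convex blend: (1 - w) * base + w * candidate."""
    return [(1 - w) * b + w * c for b, c in zip(base, cand)]


def search_weight(y_true, base, cand, grid, min_delta=0.003):
    """Return (weight, rae); weight 0.0 means the candidate was rejected."""
    base_rae = rae(y_true, base)
    best_w, best_rae = 0.0, base_rae
    for w in grid:
        score = rae(y_true, blend(base, cand, w))
        if score < best_rae:
            best_w, best_rae = w, score
    if base_rae - best_rae < min_delta:
        return 0.0, base_rae  # gain too small: keep the previous blend
    return best_w, best_rae


# Toy OOF predictions: base over-predicts, candidate under-predicts.
y_true = [4.0, 5.0, 6.0, 7.0]
base = [4.5, 5.5, 6.5, 7.5]
cand = [3.5, 4.5, 5.5, 6.5]
grid = [i / 10 for i in range(1, 6)]

best_w, best_rae = search_weight(y_true, base, cand, grid)
print(best_w, best_rae)
```
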
| |
| **Calibration**: IsotonicRegression (out_of_bounds='clip') applied to blend outputs. For honest CV evaluation, per-fold IsotonicRegression is fitted on each fold's OOF separately to avoid leakage. |
| |
| --- |
| |
| ## 6. Validation Strategy |
| |
| ### 6.1 Cross-Validation |
| |
| - **Scheme**: Butina clustering at Tanimoto cutoff 0.4 on ECFP4 2048-bit fingerprints, followed by GroupKFold (5 folds) |
| - **Rationale**: Groups chemically similar compounds together to simulate the challenge's analog-set test construction |
| - **Primary metric**: RAE (Relative Absolute Error) = Σ|y_true - y_pred| / Σ|y_true - mean(y_true)| |
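
The cluster-then-group scheme can be sketched without RDKit or scikit-learn, using sets of on-bits as stand-in fingerprints. This is an illustrative reimplementation of Taylor-Butina clustering and a greedy size-balanced group split, not the code used for the submission. The report does not state whether the 0.4 cutoff is a similarity or a distance, so the sketch takes an explicit similarity threshold. All names and toy data are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Binary Tanimoto on sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def butina_clusters(fps, sim_threshold):
    """Taylor-Butina: densest unassigned compound becomes the next centroid."""
    n = len(fps)
    neighbors = [
        [j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= sim_threshold]
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    cluster_of = [-1] * n
    next_id = 0
    for i in order:
        if cluster_of[i] != -1:
            continue  # already absorbed by an earlier centroid
        cluster_of[i] = next_id
        for j in neighbors[i]:
            if cluster_of[j] == -1:
                cluster_of[j] = next_id
        next_id += 1
    return cluster_of


def group_folds(cluster_of, n_folds):
    """Assign whole clusters to folds, largest first, into the smallest fold."""
    sizes = {}
    for c in cluster_of:
        sizes[c] = sizes.get(c, 0) + 1
    fold_sizes = [0] * n_folds
    fold_of_cluster = {}
    for c in sorted(sizes, key=lambda c: -sizes[c]):
        f = fold_sizes.index(min(fold_sizes))
        fold_of_cluster[c] = f
        fold_sizes[f] += sizes[c]
    return [fold_of_cluster[c] for c in cluster_of]


fps = [
    frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}), frozenset({1, 2}),
    frozenset({10, 11}), frozenset({10, 11, 12}), frozenset({20}),
]
clusters = butina_clusters(fps, sim_threshold=0.4)
folds = group_folds(clusters, n_folds=2)
print(clusters, folds)
```

Because folds are assigned per cluster, no analog series is split between training and validation, which is the property the rationale above relies on.
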
| |
| ### 6.2 Honest Calibration Protocol |
| |
| We distinguish two calibration protocols: |
| - **In-sample isotonic**: Single IsotonicRegression fit on full-train OOF (used for test submission; slightly optimistic) |
| - **Honest per-fold isotonic**: Separate IsotonicRegression fit per CV fold on that fold's OOF only (used for candidate gating; conservative) |
| |
| All candidates must pass the honest per-fold iso gate (Δ ≥ +0.003 vs v43 honest OOF 0.4798) before queueing for submission. |
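
A minimal sketch of a leakage-free calibration check with scikit-learn's `IsotonicRegression`. It implements one possible reading of the honest protocol, and this reading is an assumption: for each fold, the calibrator is fitted on the OOF predictions of the other folds and applied to the held-out fold. The exact procedure used for the submission may differ. The gate treats Δ as a reduction in RAE relative to the reference; the function names and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def rae(y_true, y_pred):
    """Relative absolute error: sum|y - pred| / sum|y - mean(y)|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()


def cross_fitted_isotonic(oof_pred, y_true, fold_ids):
    """Calibrate each fold with an isotonic map fitted on the other folds."""
    oof_pred = np.asarray(oof_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    fold_ids = np.asarray(fold_ids)
    calibrated = np.empty_like(oof_pred)
    for fold in np.unique(fold_ids):
        held_out = fold_ids == fold
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(oof_pred[~held_out], y_true[~held_out])
        calibrated[held_out] = iso.predict(oof_pred[held_out])
    return calibrated


def passes_gate(candidate_rae, reference_rae, min_delta=0.003):
    """True when the candidate lowers RAE by at least min_delta."""
    return reference_rae - candidate_rae >= min_delta


# Synthetic OOF predictions with compressed range (illustration only).
rng = np.random.default_rng(0)
y = rng.normal(4.5, 1.0, 200)
pred = 0.5 * y + 2.0 + rng.normal(0.0, 0.2, 200)
folds = np.arange(200) % 5

calibrated = cross_fitted_isotonic(pred, y, folds)
print(f"raw RAE: {rae(y, pred):.4f}  calibrated RAE: {rae(y, calibrated):.4f}")
```
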
| |
| ### 6.3 Statistical Rigor |
| |
| - **Cluster-bootstrap 95% CI**: Bootstrapped at the Butina-cluster level (not compound level) to account for chemical similarity structure |
| - **Bootstrap iterations**: 1,000 with replacement |
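
The cluster-level bootstrap can be sketched as follows: whole clusters are resampled with replacement and RAE is recomputed on the pooled members of each resample. This is a percentile-interval sketch under stated assumptions; the interval type and the seed handling are not specified in the report, and the names and toy data are hypothetical.

```python
import random


def rae(y_true, y_pred):
    """Relative absolute error: sum|y - pred| / sum|y - mean(y)|."""
    mean_y = sum(y_true) / len(y_true)
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    den = sum(abs(t - mean_y) for t in y_true)
    return num / den


def cluster_bootstrap_ci(y_true, y_pred, cluster_ids, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for RAE, resampling clusters rather than compounds."""
    rng = random.Random(seed)
    members = {}
    for idx, c in enumerate(cluster_ids):
        members.setdefault(c, []).append(idx)
    clusters = list(members)
    stats = []
    for _ in range(n_boot):
        sampled = [rng.choice(clusters) for _ in clusters]  # with replacement
        idx = [i for c in sampled for i in members[c]]
        stats.append(rae([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data: 6 clusters of 3 compounds each (illustration only).
y_true = [4.0 + 0.1 * i for i in range(18)]
y_pred = [t + (0.2 if i % 2 else -0.2) for i, t in enumerate(y_true)]
cluster_ids = [i // 3 for i in range(18)]

ci_low, ci_high = cluster_bootstrap_ci(y_true, y_pred, cluster_ids)
print(f"95% CI for RAE: [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling at the compound level would treat close analogs as independent observations and tend to produce intervals that are too narrow.
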
| |
| --- |
| |
| ## 7. Submission History & Performance |
| |
| | Tag | Date | LB RAE | LB Rank | Key Change | |
| |---|---|---|---|---| |
| | v2-baseline-xgb | 2026-05-06 | 0.7412 | 145/199 | XGBoost MAE on FCFP4 + RDKit | |
| | v3-baseline-lgbm | 2026-05-06 | 0.7249 | 131/199 | LightGBM MSE, Butina CV | |
| | v4-kitchen-sink | 2026-05-07 | 0.6889 | 112/201 | +Mordred +CheMeleon (4902d) | |
| | v7-ensemble | 2026-05-07 | 0.6039 | 42/202 | +Chemprop D-MPNN multitask | |
| | v31-blend | 2026-05-07 | 0.5966 | 42/207 | +AutoGluon-CheMeleon + KERMT + CheMeleon-FT | |
| | v43-final | 2026-05-08 | 0.586 | ~40/211 | +T1v5 ChEMBL multitask, clean hierarchy | |
| | v43-defensive | 2026-05-11 | 0.586 | ~40/211 | Defensive re-submit; no improved candidate found | |
| |
| **CV-to-LB shift trend**: 0.170 → 0.158 → 0.134 → 0.107. The shift narrows as model quality improves. |
| |
| **Final rank as of 2026-05-11**: 40-42 / 211 (top 19%). Gap to top-25: ~0.013 RAE. Gap to #1 (Yan): ~0.090 RAE. |
| |
| --- |
| |
| ## 8. Exploratory Work (Not in Final Ensemble) |
| |
| We tested but did not include the following approaches due to honest-CV gate failure: |
| |
| | Approach | Honest OOF RAE | Reason for Exclusion | |
| |---|---|---| |
| | Single-conc multitask (T1v7) | 0.6076 | Diluted primary task; log2fc scale mismatch | |
| | BindingDB + ChEMBL broad pretrain (T1v9) | 0.5538 | Distribution mismatch; no improvement over T1 | |
| | MaskMol / MAE ViT-Base | 0.7122 | Too weak solo; zero blend utility | |
| | TabPFN on CheMeleon | 0.5551 | Spearman vs v43 = 0.931 (too correlated) | |
| | GIN + ACtriplet (T11) | 0.5839 | Spearman vs v43 = 0.890 (too correlated) | |
| | Uni-MolV2 310M fine-tune | 0.630 | Poor convergence; high correlation | |
| Boltz-2 structural confidence | 0.845 | Confidence scores ≠ binding affinity | |
| | Differentiable ensemble (14 OOFs) | 0.4966 | Worse than v43 alone; correlated errors | |
| | ADMET-AI features + LGBM | 0.6206 | Too weak solo; minimal blend utility | |
| Tail-weighted LGBM (α=5) | 0.5619 | Marginal improvement on weak baseline | |
| | Precision-weighted LGBM | 0.5534 | Best non-v43 LGBM; still far from ceiling | |
| | KERMT higher-weight blend | 0.4939 | Spearman vs v43 = 0.910; no improvement | |
| | Artifact removal (pEC50 < 2) | 0.5745 | Removing "artifacts" hurt generalization | |
| | Test-time SMILES augmentation | 0.0376 MAE | No benefit over canonical SMILES | |
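
Several exclusions above rest on the Spearman correlation between a candidate's OOF predictions and v43's. A dependency-free sketch of that check follows, using average ranks for ties. The 0.87 threshold in `too_correlated` is illustrative only; the report does not define a formal cutoff. All names and toy data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def too_correlated(candidate, reference, threshold=0.87):
    """Illustrative diversity check; the threshold is an assumption."""
    return spearman(candidate, reference) > threshold


reference = [1.0, 2.0, 3.0, 4.0, 5.0]
candidate = [1.0, 3.0, 2.0, 5.0, 4.0]
print(spearman(candidate, reference), too_correlated(candidate, reference))
```
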
| |
| --- |
| |
| ## 9. Key Learnings |
| |
| 1. **Chemprop D-MPNN is the dominant diversity contributor.** All attempts to replace or augment it with other learned representations (ChemBERTa, GIN, Uni-Mol) converged to Spearman > 0.88 vs the D-MPNN representation, indicating equivalent 2D-graph learning on this dataset. |
| |
| 2. **External data quality > quantity.** ChEMBL broad (2,122 compounds) and BindingDB (364 compounds) did not improve pretrain quality because their pActivity distributions (median ~5.0–6.6) differed from OpenADMET (median ~4.3). Distribution mismatch outweighed volume. |
| |
| 3. **Honest per-fold calibration is essential.** In-sample isotonic calibration overestimates improvement by ~0.01 RAE. Our v50 catastrophic regression (CV 0.454 → LB 0.658) was caught only by the honest-CV gate. |
| |
| 4. **Tail compression is the structural bottleneck.** Our predictions max out at ~6.3 versus a train maximum of 7.55. Only 11 train compounds (0.3%) have pEC50 ≥ 6.5, and no loss-function tweak or ensemble method can extrapolate from 11 examples. Closing the gap to top-25 (Δ = 0.013 LB RAE) would require proprietary data, 3D structural features, or a genuinely new architecture, each requiring days of setup beyond our time budget. |
| |
| 5. **The ceiling is real.** After 35+ experiments (27 in headless loops + 8 in interactive sessions), every model class converged to Spearman > 0.87 vs v43. The 2D-graph + descriptor space is fully explored. v43 is the best achievable with public data and current infrastructure. |
| |
| --- |
| |
| ## 10. Code and Reproducibility |
| |
| All code, environment specifications, and trained model checkpoints are published at https://huggingface.co/RyeCatcher/openadmet-pxr-challenge-2026 under Apache 2.0. The repository includes: |
| |
| - `code/baseline/`: LGBM/XGBoost baselines (v2, v3, v4) |
| - `code/featurization/`: Mordred, CheMeleon, FCFP4 wrappers |
| - `code/multitask/T1_chemprop_external_pretrain.py`: Chemprop multitask training |
| - `code/multitask/T1v5_chemprop_chembl_nr1i2.py`: 5-head Chemprop with ChEMBL NR1I2 |
| - `code/multitask/T2_autogluon_chemeleon.py`: AutoGluon-tabular on CheMeleon |
| - `code/ensemble/v43_final_blend.py`: Final blend reproducing the submission |
| - `code/submit/submit_v43.py`: Gradio API submission script |
| - `models/`: Trained Chemprop and AutoGluon checkpoints (v43 lineage) |
| - `data/oof_predictions/`: Per-track out-of-fold and test predictions |
| - `submission/v43_final.parquet`: The 513-row submission |
| - `code/requirements.txt`: Pinned dependency list |
|
|
| **Dependencies**: Python 3.12, RDKit 2026.03.1, Chemprop 2.2.3, PyTorch 2.11.0+cu130, LightGBM 4.6.0, XGBoost 2.1.4, scikit-learn 1.8.0, pandas 3.0.2 |
|
|
| --- |
|
|
| ## 11. Hardware |
|
|
| Training was performed on an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory). Typical wall-clock times: |
| - Chemprop pretrain (30 epochs): ~3 min |
| - Chemprop 5-fold finetune: ~8 min |
| - LightGBM kitchen-sink: ~15 sec |
| - AutoGluon-CheMeleon: ~3.5 hours |
|
|
| --- |
|
|
| *Report finalized 2026-05-11. Best submission: v43 (RAE 0.586 LB, rank ~40/211). Phase 1 deadline: 2026-05-25.* |
|
|