dtSFM-v3 — Drug–Target Specificity Foundation Model (production encoder + generative decoder)
Paper: A drug–target specificity foundation model for off-target prediction, repurposing, and generative design · doi: 10.64898/2026.06.08.730844 Code: github.com/Reddy-BIIE-ETHZ/dtSFM All SFM models: huggingface.co/SFM-BIIE-ETHZ
This is the production dtSFM (v3): a full-scale cross-attention encoder paired with a cross-attentive autoregressive decoder. The smaller encoder-only prototype from the Vibe Coding SFMs paper lives separately at
SFM-BIIE-ETHZ/dtSFM_VC-SFM.
What it does
dtSFM maps a (drug SMILES, protein sequence) pair to a binding-compatibility score, and generates novel target-conditioned drug-like molecules — all from sequence, without constructing a 3-D structure. It is built on the SFM principle that transformer softmax attention is mathematically isomorphic to the Boltzmann distribution of molecular binding.
Three applications run on this single model:
- Off-target safety screening (drug → proteome): documented off-targets at median rank 30 / 4,910 genes (top 0.6%) vs the Klaeger 2017 chemoproteomic panel.
- Library repurposing (target → drug): 46 novel candidates clear the AlphaFold-3 binder gate across NLRP3 / CD73 / STING1.
- Generative design (target → novel drug): 850 / 1,200 (71%) of generated molecules match the AlphaFold-3 structural confidence of the approved drug.
| Component | Model |
|---|---|
| Drug encoder | MoLFormer-XL (frozen, SMILES → 768-d) |
| Protein encoder | ESM-2-650M (frozen, → 1,280-d per residue) |
| Cross-attention encoder | trainable, 2 layers · 8 heads · d=512 · 14.4 M params · 4 heads |
| Decoder | cross-attentive autoregressive SMILES generator (~28 M params) |
| Training data | PDBbind v2020 + SAIR (714,747 pairs · 522,776 drugs · 22,964 proteins) |
| Split | whole MMseqs2 protein clusters held out at 80% identity; zero pair/protein/cluster leakage |
Files in this repo
| File | Description |
|---|---|
encoder_b3_epoch010.pt |
locked production encoder (B-3, 4 heads) |
decoder_v02_step50000.pt |
cross-attentive generative decoder (checkpoint @ 50k steps) |
Quick start
from huggingface_hub import hf_hub_download
import torch, torch.nn.functional as F
from calm.encoder.model_v3 import CALMEncoderV3
from calm.decoder.model_dtsfm_v3 import CALMDecoderV3
# --- retrieval / scoring (drug ↔ target) ---
enc = CALMEncoderV3.from_pretrained(
hf_hub_download("SFM-BIIE-ETHZ/dtSFM-v3", "encoder_b3_epoch010.pt")).eval()
drug = enc.encode_drug("CC(=O)Oc1ccccc1C(=O)O") # aspirin
target = enc.encode_protein("MTEYKLVVVGAGGVGKSALTIQLIQ...")
score = F.cosine_similarity(drug, target, dim=-1)
# --- generation (target → novel molecules) ---
dec = CALMDecoderV3.from_pretrained(
hf_hub_download("SFM-BIIE-ETHZ/dtSFM-v3", "decoder_v02_step50000.pt")).eval()
smiles = dec.generate(target_sequence="MTEYKLVVVGAGG...", n=100, temperature=0.8)
Install the codebase from github.com/Reddy-BIIE-ETHZ/dtSFM
(conda env create -f environment.yml).
Orthogonal verification
Every structural claim is checked by AlphaFold-3 as an orthogonal referee — it shares no architecture, training data, or representation with dtSFM (dtSFM-cosine ↔ AF3-confidence correlation ≈ 0), so structural agreement is genuine corroboration, not circular confirmation. Binder gate: iPTM ≥ 0.7 AND interface-PAE ≤ 5 Å. Anchor-grade gate: iPTM ≥ 0.9 AND PAE ≤ 1.67 Å.
Citation
@article{reddy2026dtsfm,
title = {A drug–target specificity foundation model for off-target prediction, repurposing, and generative design},
author = {Reddy, Sai T.},
journal = {bioRxiv},
year = {2026},
doi = {10.64898/2026.06.08.730844}
}
License
Released under the SFM Research Preview License v1.0-preview (see LICENSE.md).
Free for research use — academic, non-profit, government, and industry research. The specific
molecules disclosed in the accompanying preprints are dedicated to the public. Commercial-use
and patent-licensing terms are deferred and being arranged with ETH Zürich / BIIE; the SFM
architectures and training methods are the subject of pending patent applications.
For commercial enquiries: sai.reddy@ethz.ch