---
library_name: transformers
tags:
- chemistry
- molecule
license: mit
---

# Model Card for ErbB1 MLP

### Model Description

`erbb1_mlp` is an MLP-style model trained to predict ErbB1 (EGFR) binding affinity from
embeddings generated by the [roberta_zinc_480m](https://huggingface.co/entropy/roberta_zinc_480m)
model.

- **Developed by:** Karl Heyer
- **License:** MIT

### Direct Use

The example below embeds SMILES strings with `roberta_zinc_480m` and passes the embeddings to `erbb1_mlp`. Note that input SMILES strings should be canonicalized.

With the Sentence Transformers and Transformers libraries:

```python
from sentence_transformers import models, SentenceTransformer
from transformers import AutoModel

# Build the embedding model: RoBERTa encoder followed by mean pooling
transformer = models.Transformer("entropy/roberta_zinc_480m",
                                 max_seq_length=256,
                                 model_args={"add_pooling_layer": False})

pooling = models.Pooling(transformer.get_word_embedding_dimension(),
                         pooling_mode="mean")

roberta_zinc = SentenceTransformer(modules=[transformer, pooling])

# trust_remote_code=True lets Transformers run the custom model code shipped in the repository
erbb1_mlp = AutoModel.from_pretrained("entropy/erbb1_mlp", trust_remote_code=True)

# smiles should be canonicalized
smiles = [
    "Brc1cc2c(NCc3ccccc3)ncnc2s1",
    "Brc1cc2c(NCc3ccccn3)ncnc2s1",
    "Brc1cc2c(NCc3cccs3)ncnc2s1",
    "Brc1cc2c(NCc3ccncc3)ncnc2s1",
    "Brc1cc2c(Nc3ccccc3)ncnc2s1"
]

# Embed the molecules, then predict binding affinity from the embeddings
embeddings = roberta_zinc.encode(smiles, convert_to_tensor=True)
predictions = erbb1_mlp(embeddings).predictions
```

### Training Procedure

#### Preprocessing

ErbB1 ligands were downloaded from ChEMBL (`target_chembl_id="CHEMBL203"`, `type="IC50"`,
`relation="="`, `assay_type="B"`). Results were filtered for assays reporting IC50 values in nM
for *Homo sapiens*, then canonicalized and deduplicated. IC50 values were converted to pIC50 values.
The final dataset contains 7327 data points.
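
The IC50-to-pIC50 conversion is conventionally the negative base-10 logarithm of the IC50 expressed in molar units. The sketch below assumes that standard definition; the card does not show the exact code used, and the function name `ic50_nm_to_pic50` is illustrative only.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar).

    Hypothetical helper: assumes the standard pIC50 definition.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)
```

Under this definition, an IC50 of 1000 nM (1 µM) corresponds to a pIC50 of 6, and more potent ligands have higher pIC50 values.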

Prior to training, pIC50 values were normalized. The model was trained on the normalized values
and uses the saved mean and variance of the dataset to denormalize its predictions.
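
The normalization round trip can be sketched as follows, assuming z-score standardization (subtract the mean, divide by the standard deviation) with the population variance. The card only states that a mean and variance are saved, so the exact formula and all function names here are assumptions.

```python
import statistics


def fit_normalizer(values):
    """Return the (mean, variance) of the training targets.

    Assumption: population variance; the card does not say which estimator was used.
    """
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    return mean, variance


def normalize(value, mean, variance):
    """Map a raw pIC50 to the normalized scale the model is trained on."""
    return (value - mean) / variance ** 0.5


def denormalize(value, mean, variance):
    """Map a normalized prediction back to the pIC50 scale."""
    return value * variance ** 0.5 + mean
```

With this scheme, `denormalize(normalize(x, mean, variance), mean, variance)` recovers `x` up to floating-point error, which is why storing only the mean and variance is sufficient to report predictions on the original scale.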

#### Training Hyperparameters

The model was trained for 30 epochs with a batch size of 32, a learning rate of 1e-3, a weight decay of
1e-4, and cosine learning rate scheduling.
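
For readers unfamiliar with cosine scheduling, the sketch below shows one common form: the learning rate follows half a cosine wave from the base rate down to a minimum over the course of training. The card does not specify warmup, the minimum rate, or whether the schedule steps per batch or per epoch, so the absence of warmup, `min_lr=0.0`, and the function name `cosine_lr` are assumptions.

```python
import math


def cosine_lr(step, total_steps, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate at a given step.

    Assumptions: no warmup and decay to min_lr; only base_lr=1e-3 comes from the card.
    """
    # Fraction of training completed, clamped to [0, 1]
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The rate starts at `base_lr`, reaches the midpoint between `base_lr` and `min_lr` halfway through training, and ends at `min_lr`.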

## Model Card Authors

Karl Heyer

## Model Card Contact

karl@darmatterai.xyz