| ## Setup |
|
|
| No setup is required. Simply fill in the input boxes with the necessary data and click the **Run** button. |
| You can find a list of examples at the bottom of the page; clicking on them will autofill the fields for you. |
| If the server remains idle for a period, it will enter standby mode. Running a calculation will wake the tool from standby, but note that the first run may take longer due to startup and model loading. |
|
|
| ## Input |
|
|
| **Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box. |
| Note: Wildcard characters (e.g., `-X.B`) can be included in the sequence, but they currently cannot be visualised. |
|
|
| **Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports four running modes, selected automatically from the format of your input: |
|
|
| - **Single Substitution**: Input one or more substitutions (e.g. `R218K R218W`) to score specific changes. |
| - **Residue Position**: Provide residue positions to evaluate all possible substitutions at those sites. |
| - **Same-Length Sequence**: Provide a second sequence of the same length as the input sequence; each position where the two differ is scored as an individual substitution. |
| - **Any Other Input**: For any other input format, a deep mutational scan of the full sequence will be performed. |
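|
| The mode selection described above can be sketched as follows. This is a minimal illustration only, not the tool's actual parser: the function name `classify_input`, the regular expressions, and the returned labels are all assumptions made for this example. |
|

```python
import re

# Hypothetical patterns; the tool's real input parsing may differ.
SUBSTITUTION_RE = re.compile(r"[A-Z]\d+[A-Z]")  # e.g. R218K
POSITION_RE = re.compile(r"\d+")                # e.g. 218


def classify_input(sequence: str, substitutions: str) -> str:
    """Guess which running mode a Substitutions entry would trigger."""
    tokens = substitutions.upper().split()
    # One or more explicit substitutions such as "R218K R218W".
    if tokens and all(SUBSTITUTION_RE.fullmatch(t) for t in tokens):
        return "single_substitution"
    # One or more bare residue positions.
    if tokens and all(POSITION_RE.fullmatch(t) for t in tokens):
        return "residue_position"
    # A single alphabetic string as long as the input sequence.
    if len(tokens) == 1 and tokens[0].isalpha() and len(tokens[0]) == len(sequence):
        return "same_length_sequence"
    # Anything else, including an empty box, falls back to a full scan.
    return "deep_mutational_scan"


if __name__ == "__main__":
    print(classify_input("MKTAY", "K2R K2W"))
    print(classify_input("MKTAY", "MRTAY"))
```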
|
|
| **Model Selection**: Choose an ESM model for calculations from those available on Hugging Face Model Hub. |
| The model `esm2_t33_650M_UR50D` offers a good balance between computational cost and accuracy [*](https://doi.org/10.1126/science.ade2574). |
|
|
| **Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which masks each tested position and scores substitutions from the surrounding sequence context. |
| This method is slower but more accurate. If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy. |
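|
| For reference, the masked-marginals score of a single substitution at position $i$ is commonly defined in the ESM variant-effect literature as the log-likelihood ratio below, where $x_{\setminus i}$ denotes the sequence with position $i$ masked, and $\mathrm{wt}$ and $\mathrm{mt}$ are the wild-type and mutant residues. That the tool computes exactly this quantity is an assumption. |
|

$$
\text{score}(\mathrm{wt} \to \mathrm{mt}) = \log p\left(x_i = \mathrm{mt} \mid x_{\setminus i}\right) - \log p\left(x_i = \mathrm{wt} \mid x_{\setminus i}\right)
$$

|
| Under this definition, a positive score means the model considers the mutant residue more likely than the wild-type residue in that context, and a negative score means the opposite. |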
|
|
| **Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use the smaller models (8M, 35M, or 150M parameters), because runtimes can become very long, especially with longer sequences or during peak server usage. |
| For example, scanning a 300-residue sequence with the larger models may take over 30 minutes. |
| Generally, accuracy is more affected by the scoring strategy than by model size; therefore, when optimising for runtime, reduce the model size first. |
| The computational cost of the scoring strategy scales with the number of substitutions tested, while the cost of the model scales with the length of the wild-type sequence. |
|
|
| **Concurrent Substitutions**: To calculate the effect of multiple concurrent substitutions, you must manually edit the input sequence to include the other substitutions and rerun the calculation. Accuracy is not guaranteed, as this use case has not yet been tested. |
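|
| Editing the sequence by hand is error-prone, so a small helper can prepare the modified input before pasting it into the **Sequence** box. The sketch below is not part of the tool: `apply_substitutions` is a hypothetical name, and it assumes 1-based positions in the `R218K` notation used above. |
|

```python
import re


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Return a copy of `sequence` with substitutions such as 'R218K' applied.

    Assumes 1-based residue numbering (an assumption of this sketch).
    """
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"Unrecognised substitution: {sub!r}")
        wild_type, mutant = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position {position} is outside the sequence")
        # Guard against numbering mistakes: the stated wild-type residue
        # must match what is currently at that position.
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"Expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


if __name__ == "__main__":
    # Toy 5-residue sequence, for illustration only.
    print(apply_substitutions("MKTAY", ["K2R", "A4G"]))
```

|
| The returned sequence can then be used as the new input sequence, with the remaining substitution of interest entered in the **Substitutions** box. |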
|
|
| ## Output |
|
|
| Results are displayed in a color-coded table, except for deep mutational scans, which produce a heatmap. |
| In the table: |
|
|
| - Beneficial substitutions are highlighted in green with positive values. |
| - Detrimental substitutions appear in red with negative values. |
|
|
| As a rule of thumb, scores with an absolute value of *4* or more are considered significant. For instance: |
|
|
| - A substitution scoring *-6* is likely detrimental to protein functionality. |
| - A score of *+2* is generally regarded as neutral. |
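|
| When working with many scores at once, the rule of thumb above can be applied programmatically. The following is a minimal sketch: the function name, the labels, and the treatment of a score of exactly ±4 as significant are assumptions of this example, not behaviour of the tool. |
|

```python
SIGNIFICANCE_THRESHOLD = 4.0  # rule-of-thumb cut-off from the text above


def interpret_score(score: float, threshold: float = SIGNIFICANCE_THRESHOLD) -> str:
    """Map a substitution score to a coarse, rule-of-thumb label."""
    if score <= -threshold:
        return "likely detrimental"
    if score >= threshold:
        return "likely beneficial"
    # Scores closer to zero than the threshold are treated as neutral.
    return "neutral"


if __name__ == "__main__":
    for value in (-6.0, 2.0):
        print(value, interpret_score(value))
```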
|
|
| The **Download raw data** button lets you download the output in CSV format. |
|
|
|
|
| **If you use this tool in your research, please cite**: |
|
|
| Totaro MG, Vide U, Zausinger R, Winkler A, Oberdorfer G. ESM-scan—A tool to guide amino acid substitutions. *Protein Science.* 2024; 33(12):e5221. [doi.org/10.1002/pro.5221](https://doi.org/10.1002/pro.5221) |
|
|