How to use InstaDeepAI/nucleotide-transformer-v2-500m-multi-species with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="InstaDeepAI/nucleotide-transformer-v2-500m-multi-species", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("InstaDeepAI/nucleotide-transformer-v2-500m-multi-species", trust_remote_code=True, dtype="auto")
```
Update README with correct sequence length
README.md CHANGED

@@ -89,7 +89,7 @@ The DNA sequences are tokenized using the Nucleotide Transformer Tokenizer, whic
 <CLS> <ACGTGT> <ACGTGC> <ACGGAC> <GACTAG> <TCAGCA>
 ```
 
-The tokenized sequence have a maximum length of
+The tokenized sequence have a maximum length of 2,048.
 
 The masking procedure used is the standard one for Bert-style training:
 - 15% of the tokens are masked.
@@ -99,7 +99,7 @@ The masking procedure used is the standard one for Bert-style training:
 
 ### Pretraining
 
-The model was trained with 8 A100 80GB on 900B tokens, with an effective batch size of 1M tokens. The sequence length used was
+The model was trained with 8 A100 80GB on 900B tokens, with an effective batch size of 1M tokens. The sequence length used was 2,048 tokens. The Adam optimizer [38] was used with a learning rate schedule, and standard values for exponential decay rates and epsilon constants, β1 = 0.9, β2 = 0.999 and ε=1e-8. During a first warmup period, the learning rate was increased linearly between 5e-5 and 1e-4 over 16k steps before decreasing following a square root decay until the end of training.
 
 ### Architecture
 
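
The learning rate schedule described in the updated Pretraining paragraph can be sketched as a small Python function. This is an illustration only, not the authors' training code. The README states the linear warmup from 5e-5 to 1e-4 over 16k steps and a "square root decay" afterwards, but it does not give the decay formula, so the `peak * sqrt(warmup_steps / step)` form below is an assumption. The names `learning_rate`, `ADAM_HYPERPARAMS`, and `APPROX_TOTAL_STEPS` are hypothetical, and the step count assumes "1M tokens" means exactly 1e6 tokens per optimizer step.

```python
import math

# Values quoted in the README diff.
WARMUP_STEPS = 16_000          # "over 16k steps"
LR_START = 5e-5                # learning rate at step 0
LR_PEAK = 1e-4                 # learning rate at the end of warmup
ADAM_HYPERPARAMS = {"beta1": 0.9, "beta2": 0.999, "eps": 1e-8}

# Assumption: 900B tokens at 1e6 tokens per step gives about 900k steps.
APPROX_TOTAL_STEPS = 900_000_000_000 // 1_000_000


def learning_rate(step: int) -> float:
    """Return the learning rate at a given optimizer step.

    Linear warmup from LR_START to LR_PEAK, then an inverse square root
    decay. The exact decay formula is an assumption, not from the README.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        # Linear interpolation between the start and peak learning rates.
        return LR_START + (LR_PEAK - LR_START) * step / WARMUP_STEPS
    # Assumed decay: continuous at the end of warmup, proportional to 1/sqrt(step).
    return LR_PEAK * math.sqrt(WARMUP_STEPS / step)


if __name__ == "__main__":
    for s in (0, 8_000, 16_000, 64_000, APPROX_TOTAL_STEPS):
        print(f"step {s:>7}: lr = {learning_rate(s):.3e}")
```

Under this assumed form the schedule is continuous at step 16,000, and the rate falls back to the starting value of 5e-5 at step 64,000, since the square root of 16,000 / 64,000 is 0.5.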