InstaDeepAI/nucleotide-transformer-v2-500m-multi-species

@@ -89,7 +89,7 @@ The DNA sequences are tokenized using the Nucleotide Transformer Tokenizer, whic
 <CLS> <ACGTGT> <ACGTGC> <ACGGAC> <GACTAG> <TCAGCA>
 ```
-The tokenized sequence have a maximum length of 1,000.
+The tokenized sequence have a maximum length of 2,048.
 The masking procedure used is the standard one for Bert-style training:
 - 15% of the tokens are masked.
@@ -99,7 +99,7 @@ The masking procedure used is the standard one for Bert-style training:
 ### Pretraining
-The model was trained with 8 A100 80GB on 900B tokens, with an effective batch size of 1M tokens. The sequence length used was 1000 tokens. The Adam optimizer [38] was used with a learning rate schedule, and standard values for exponential decay rates and epsilon constants, β1 = 0.9, β2 = 0.999 and ε=1e-8. During a first warmup period, the learning rate was increased linearly between 5e-5 and 1e-4 over 16k steps before decreasing following a square root decay until the end of training.
+The model was trained with 8 A100 80GB on 900B tokens, with an effective batch size of 1M tokens. The sequence length used was 2,048 tokens. The Adam optimizer [38] was used with a learning rate schedule, and standard values for exponential decay rates and epsilon constants, β1 = 0.9, β2 = 0.999 and ε=1e-8. During a first warmup period, the learning rate was increased linearly between 5e-5 and 1e-4 over 16k steps before decreasing following a square root decay until the end of training.
 ### Architecture
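
The learning rate schedule described in the Pretraining paragraph (linear warmup from 5e-5 to 1e-4 over 16k steps, then square root decay) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' training code: the function name `learning_rate` and the constants are hypothetical, and the exact decay form `peak * sqrt(warmup_steps / step)` is an assumption, since the model card only says "square root decay".

```python
import math

# Values taken from the Pretraining paragraph of the model card.
WARMUP_STEPS = 16_000
LR_START = 5e-5
LR_PEAK = 1e-4


def learning_rate(step: int) -> float:
    """Return the learning rate at a given optimizer step (hypothetical helper)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        # Linear warmup from LR_START to LR_PEAK.
        fraction = step / WARMUP_STEPS
        return LR_START + fraction * (LR_PEAK - LR_START)
    # Assumed decay form: inverse square root of the step count,
    # scaled so the schedule is continuous at the end of warmup.
    return LR_PEAK * math.sqrt(WARMUP_STEPS / step)


if __name__ == "__main__":
    for s in (0, 8_000, 16_000, 64_000):
        print(s, learning_rate(s))
```

Under this assumed decay form, the rate falls back to half of its peak once training reaches four times the warmup length, that is, at step 64,000.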