Fill-Mask
Transformers
PyTorch
Joblib
Safetensors
DNA
biology
genomics
custom_code
bernardo-de-almeida commited on
Commit
06615c1
·
verified ·
1 Parent(s): 09b05a4

Update README with correct sequence length

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -89,7 +89,7 @@ The DNA sequences are tokenized using the Nucleotide Transformer Tokenizer, whic
89
  <CLS> <ACGTGT> <ACGTGC> <ACGGAC> <GACTAG> <TCAGCA>
90
  ```
91
 
92
- The tokenized sequence have a maximum length of 1,000.
93
 
94
  The masking procedure used is the standard one for Bert-style training:
95
  - 15% of the tokens are masked.
@@ -99,7 +99,7 @@ The masking procedure used is the standard one for Bert-style training:
99
 
100
  ### Pretraining
101
 
102
- The model was trained with 8 A100 80GB on 900B tokens, with an effective batch size of 1M tokens. The sequence length used was 1000 tokens. The Adam optimizer [38] was used with a learning rate schedule, and standard values for exponential decay rates and epsilon constants, β1 = 0.9, β2 = 0.999 and ε=1e-8. During a first warmup period, the learning rate was increased linearly between 5e-5 and 1e-4 over 16k steps before decreasing following a square root decay until the end of training.
103
 
104
  ### Architecture
105
 
 
89
  <CLS> <ACGTGT> <ACGTGC> <ACGGAC> <GACTAG> <TCAGCA>
90
  ```
91
 
92
+ The tokenized sequence have a maximum length of 2,048.
93
 
94
  The masking procedure used is the standard one for Bert-style training:
95
  - 15% of the tokens are masked.
 
99
 
100
  ### Pretraining
101
 
102
+ The model was trained with 8 A100 80GB on 900B tokens, with an effective batch size of 1M tokens. The sequence length used was 2,048 tokens. The Adam optimizer [38] was used with a learning rate schedule, and standard values for exponential decay rates and epsilon constants, β1 = 0.9, β2 = 0.999 and ε=1e-8. During a first warmup period, the learning rate was increased linearly between 5e-5 and 1e-4 over 16k steps before decreasing following a square root decay until the end of training.
103
 
104
  ### Architecture
105