ishathombre committed on
Commit b054188 · verified · 1 Parent(s): def9b62

Update README.md

Files changed (1)
  1. README.md +32 -1
README.md CHANGED
@@ -7,4 +7,35 @@ base_model:
  - google-bert/bert-base-uncased
  pipeline_tag: fill-mask
  library_name: transformers
- ---
+ ---
+
+ # BERT from Scratch (1 Epoch, Training Loss: 4.13)
+
+ This repository contains the scripts used to train a BERT model from scratch with a custom tokenizer that has a 64,000-token vocabulary. The trained model is available at https://huggingface.co/ishathombre/monolingual-hindi-from-scratch
+
+ - **Training:** 1 epoch
+ - **Masked Language Modeling (MLM) loss:** 4.13
+ - **Tokenizer:** Custom-trained on the IIT Madras Hindi monolingual dataset, vocabulary size 64,000
+ - **Architecture:**
+   - Maximum position embeddings: 512
+   - Hidden size: 312
+   - Number of attention heads: 12
+   - Number of transformer layers: 4
+   - Intermediate (feed-forward) size: 1200
+   - Type vocabulary size: 2 (for segment embeddings)
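As a rough sanity check on the architecture listed above, the hyperparameters can be turned into a parameter count. The sketch below assumes the standard BERT encoder layout (learned word, position, and segment embeddings; biased Q/K/V/output projections; two LayerNorms per layer). The key names mirror the keyword arguments of `transformers.BertConfig`, while the helper `count_encoder_params` is hypothetical and not part of the training scripts.

```python
# Hyperparameters from the list above; key names follow transformers.BertConfig.
CONFIG = {
    "vocab_size": 64_000,
    "max_position_embeddings": 512,
    "hidden_size": 312,
    "num_attention_heads": 12,
    "num_hidden_layers": 4,
    "intermediate_size": 1200,
    "type_vocab_size": 2,
}


def count_encoder_params(cfg):
    """Count embedding and per-layer parameters of a standard BERT encoder.

    Assumption: the usual BERT layout. The pooler and MLM head are not counted.
    """
    h = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]
    if h % cfg["num_attention_heads"] != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")

    layer_norm = 2 * h  # scale and bias vectors
    embeddings = (
        cfg["vocab_size"] * h                 # word embeddings
        + cfg["max_position_embeddings"] * h  # position embeddings
        + cfg["type_vocab_size"] * h          # segment embeddings
        + layer_norm
    )
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V, output projection
    feed_forward = (h * ffn + ffn) + (ffn * h + h) + layer_norm
    per_layer = attention + feed_forward
    total = embeddings + cfg["num_hidden_layers"] * per_layer
    return {"embeddings": embeddings, "per_layer": per_layer, "total": total}


counts = count_encoder_params(CONFIG)
head_dim = CONFIG["hidden_size"] // CONFIG["num_attention_heads"]
print(f"head dim: {head_dim}")
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the embedding tables account for roughly 20.1M of about 24.7M encoder parameters, so the 64,000-token vocabulary dominates the model size.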
+
+ The model is uploaded for checkpointing, experimentation, and community feedback.
+
+ ## Intended Use
+
+ - Research on training dynamics
+ - Continued pretraining
+ - Fine-tuning for downstream tasks (with caution)
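For quick experimentation, the checkpoint can probably be queried through the `fill-mask` pipeline named in the metadata. This is a minimal sketch, assuming the model repository linked above ships both weights and tokenizer files in a form `transformers` can load; the helper `top_predictions` and its `{mask}` template convention are hypothetical.

```python
MODEL_ID = "ishathombre/monolingual-hindi-from-scratch"  # repo linked in this README


def top_predictions(template, model_id=MODEL_ID, k=5):
    """Fill the single '{mask}' slot in `template`; return (token, score) pairs.

    Assumption: the repo contains a tokenizer that defines a mask token.
    """
    # Imported lazily so defining this helper does not require transformers.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    text = template.format(mask=fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=k)]
```

A call such as `top_predictions("भारत की राजधानी {mask} है।")` downloads the checkpoint on first use. After a single epoch the predictions are likely to be noisy.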
+
+ ## Limitations
+
+ - Limited training (only 1 epoch)
+ - Not yet evaluated on downstream tasks
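To put the reported loss in context, a cross-entropy loss can be converted to a perplexity. The sketch below assumes the 4.13 figure is the mean cross-entropy over masked tokens measured in nats (natural logarithm, the PyTorch default); if it was computed differently, the conversion does not apply.

```python
import math


def loss_to_perplexity(loss_nats):
    """Convert a mean cross-entropy loss in nats to a perplexity."""
    return math.exp(loss_nats)


mlm_loss = 4.13  # training loss reported in this README
perplexity = loss_to_perplexity(mlm_loss)
print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption the perplexity is about 62, meaning the model is on average as uncertain as a uniform choice among roughly 62 of its 64,000 tokens at each masked position.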
+
+
+ [More Information Needed]