---
datasets:
- ai4bharat/IndicCorpV2
language:
- hi
base_model:
- google-bert/bert-base-uncased
pipeline_tag: fill-mask
library_name: transformers
---

# BERT from Scratch (1 Epoch, Training Loss: 4.13)

A small BERT model trained from scratch on Hindi text, using a custom tokenizer with a 64,000-token vocabulary.

- **Training:** 1 epoch
- **Masked Language Modeling (MLM) loss:** 4.13
- **Tokenizer:** Custom-trained on the IIT Madras Hindi monolingual dataset, vocabulary size 64,000
- **Architecture:**
  - Maximum position embeddings: 512
  - Hidden size: 312
  - Number of attention heads: 12
  - Number of transformer layers: 4
  - Intermediate (feed-forward) size: 1200
  - Type vocabulary size: 2 (for segment embeddings)
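
To give a feel for the model's size, the sketch below estimates the parameter count from the architecture values listed above. It assumes the standard Hugging Face `BertForMaskedLM` layout (no pooler, MLM decoder weights tied to the token embeddings), which this card does not confirm; the function name `bert_mlm_param_count` is hypothetical.

```python
def bert_mlm_param_count(vocab_size, hidden, max_positions, type_vocab,
                         num_layers, intermediate):
    """Estimate parameters of a BERT encoder plus MLM head.

    Assumption: standard BERT layout with tied input/output embeddings.
    The number of attention heads does not change the count, because the
    heads only split the hidden dimension.
    """
    layer_norm = 2 * hidden  # scale and bias vectors

    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections (with biases).
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block: hidden -> intermediate -> hidden (with biases).
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + layer_norm + feed_forward + layer_norm

    # MLM head: dense transform, LayerNorm, and an output bias per vocabulary
    # entry. The decoder matrix is assumed tied, so it adds no new weights.
    mlm_head = (hidden * hidden + hidden) + layer_norm + vocab_size

    total = embeddings + num_layers * per_layer + mlm_head
    return {
        "embeddings": embeddings,
        "per_layer": per_layer,
        "mlm_head": mlm_head,
        "total": total,
    }


counts = bert_mlm_param_count(
    vocab_size=64_000, hidden=312, max_positions=512,
    type_vocab=2, num_layers=4, intermediate=1200,
)
for name, value in counts.items():
    print(f"{name:>10}: {value:,}")
print(f"embedding share: {counts['embeddings'] / counts['total']:.1%}")
```

Under these assumptions the model has roughly 25 million parameters, with about four fifths of them in the embedding tables because of the large vocabulary relative to the hidden size.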

This checkpoint is uploaded for checkpointing, experimentation, and community feedback.

## Intended Use

- Research on training dynamics
- Continued pretraining
- Fine-tuning for downstream tasks (with caution)

## Limitations

- Limited training (only 1 epoch), so the model is likely undertrained
- Not yet evaluated on downstream tasks