Commit def9b62 by ishathombre (verified) · 1 parent: 2ba2277

Create README.md


# BERT from Scratch (1 Epoch, Training Loss: 4.13)

This is a BERT model trained from scratch using a custom tokenizer with a 64,000-token vocabulary.

- **Training:** 1 epoch
- **Masked Language Modeling (MLM) loss:** 4.13
- **Tokenizer:** Custom-trained on the iit-madras Hindi monolingual dataset; vocabulary size 64,000
- **Architecture:**
  - Maximum position embeddings: 512
  - Hidden size: 312
  - Number of attention heads: 12
  - Number of transformer layers: 4
  - Intermediate (feed-forward) size: 1200
  - Type vocabulary size: 2 (for segment embeddings)
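
As a rough sanity check on model size, the architecture values above can be turned into a parameter estimate. This is a minimal sketch, assuming the standard BERT layout (learned position embeddings, biases on every linear layer, LayerNorm after the embeddings and after each sub-layer). It counts only the embeddings and the encoder stack, not the pooler or the MLM head, and the names `BertShape`, `embedding_params`, `encoder_layer_params`, and `total_params` are illustrative, not part of this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertShape:
    """Architecture values copied from the list above."""
    vocab_size: int = 64_000
    max_position_embeddings: int = 512
    hidden_size: int = 312
    num_attention_heads: int = 12
    num_hidden_layers: int = 4
    intermediate_size: int = 1200
    type_vocab_size: int = 2


def embedding_params(s: BertShape) -> int:
    """Word, position and segment embedding tables plus one LayerNorm."""
    h = s.hidden_size
    tables = (s.vocab_size + s.max_position_embeddings + s.type_vocab_size) * h
    layer_norm = 2 * h  # scale and shift vectors
    return tables + layer_norm


def encoder_layer_params(s: BertShape) -> int:
    """One transformer layer: self-attention, feed-forward, two LayerNorms."""
    h, i = s.hidden_size, s.intermediate_size
    attention = 4 * (h * h + h)               # Q, K, V and output projections
    feed_forward = (h * i + i) + (i * h + h)  # up- and down-projection
    layer_norms = 2 * (2 * h)
    return attention + feed_forward + layer_norms


def total_params(s: BertShape) -> int:
    """Embeddings plus encoder stack (assumption: pooler and MLM head excluded)."""
    return embedding_params(s) + s.num_hidden_layers * encoder_layer_params(s)


shape = BertShape()
# The hidden size must split evenly across the attention heads.
assert shape.hidden_size % shape.num_attention_heads == 0
print("head dimension:", shape.hidden_size // shape.num_attention_heads)
print("embedding parameters:", embedding_params(shape))
print("parameters per encoder layer:", encoder_layer_params(shape))
print("embeddings + encoder total:", total_params(shape))
```

Under these assumptions, most of the parameters sit in the 64,000-entry word embedding table and not in the four encoder layers.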

This model is uploaded for checkpointing, experimentation, and community feedback.

## Intended Use

- Research on training dynamics
- Continued pretraining
- Fine-tuning for downstream tasks (with caution)

## Limitations

- Trained for only 1 epoch
- Not yet evaluated on downstream tasks
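
Until downstream results exist, the reported MLM loss is the only quality signal, and it can be put in context by converting it to perplexity. The sketch below assumes the loss of 4.13 is a mean cross-entropy in nats over masked tokens, which is the usual convention but is not stated in this model card. It compares that loss with the loss of a uniform guess over the 64,000-token vocabulary. The function names are illustrative.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity corresponding to a mean cross-entropy loss given in nats."""
    return math.exp(loss_nats)


def uniform_guess_loss(vocab_size: int) -> float:
    """Cross-entropy (nats) of predicting every token with equal probability."""
    return math.log(vocab_size)


reported_loss = 4.13  # from the model card; assumed to be in nats
vocab_size = 64_000

print(f"perplexity at reported loss: {perplexity(reported_loss):.1f}")
print(f"uniform-guess loss: {uniform_guess_loss(vocab_size):.2f} nats")
```

A loss well below the uniform-guess baseline suggests the model has learned something about token distributions, but it says nothing about downstream task quality.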

Files changed (1)
  1. README.md +10 -0
README.md ADDED
@@ -0,0 +1,10 @@
+ ---
+ datasets:
+ - ai4bharat/IndicCorpV2
+ language:
+ - hi
+ base_model:
+ - google-bert/bert-base-uncased
+ pipeline_tag: fill-mask
+ library_name: transformers
+ ---