--- datasets: - ai4bharat/IndicCorpV2 language: - hi base_model: - google-bert/bert-base-uncased pipeline_tag: fill-mask library_name: transformers --- # BERT from Scratch (1 Epoch, Training Loss: 4.13) BERT model trained from scratch using a custom tokenizer with a 64,000-token vocabulary. - **Training:** 1 epoch - **Masked Language Modeling (MLM) loss:** 4.13 - **Tokenizer:** Custom-trained, vocab size, on iit-madras Hindi-monolingual dataset = 64,000 - **Architecture:** Maximum position embeddings: 512 Hidden size: 312 Number of attention heads: 12 Number of transformer layers: 4 Intermediate (feed-forward) size: 1200 Type vocabulary size: 2 (for segment embeddings) It is uploaded for checkpointing, experimentation, and community feedback. ## Intended Use - Research on training dynamics - Continued pretraining - Fine-tuning for downstream tasks (with caution) ## Limitations - Low training coverage (1 epoch) - Not yet evaluated on downstream tasks