alina0195/ro-modernBERT-phase-d-nowwm
A Romanian ModernBERT-base model, produced by continual pretraining of ModernBERT-base on the Romanian FineWeb2 corpus with a custom 42k-token Romanian tokenizer.
This is the Phase D checkpoint (trained without whole-word masking): the context was extended to 8192 tokens in Phase C, and the model was then cooled down with a 1-sqrt learning-rate schedule in Phase D.
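For readers unfamiliar with the 1-sqrt cooldown, the learning rate decays as lr(t) = lr_peak · (1 − √(t/T)) over a cooldown of T steps, which drops quickly at first and then flattens out. The snippet below is a minimal sketch of that shape, assuming the cooldown starts at the peak learning rate and ends at an optional floor `min_lr`. The function name and the `min_lr` parameter are hypothetical; the exact Composer scheduler settings used for this checkpoint are not stated here.

```python
import math


def one_minus_sqrt_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Hypothetical helper: learning rate at `step` of a 1-sqrt cooldown.

    Starts at `peak_lr` when step == 0 and reaches `min_lr` at `total_steps`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of the cooldown completed, clamped to [0, 1] so that steps
    # past the end of the cooldown stay at min_lr.
    progress = min(max(step / total_steps, 0.0), 1.0)
    # 1-sqrt decay: steep early drop, gentle approach to the floor.
    return min_lr + (peak_lr - min_lr) * (1.0 - math.sqrt(progress))


# Example with illustrative values (not the values used for this model):
schedule = [one_minus_sqrt_lr(s, 100, peak_lr=1.0) for s in (0, 25, 100)]
print(schedule)  # [1.0, 0.5, 0.0]
```

A quarter of the way through the cooldown the learning rate has already fallen by half, which is the characteristic difference from a linear decay.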
- Backbone: ModernBERT-base (22 layers, hidden 768, 12 heads)
- Tokenizer: custom Romanian BPE, vocab=42240
- Max sequence length: 8192
- Training framework: MosaicML Composer