alina0195/ro-modernBERT-phase-d-nowwm

Backbone: ModernBERT-base (22 layers, hidden 768, 12 heads)
Tokenizer: custom Romanian BPE, vocab=42240
Max sequence length: 8192
Training framework: MosaicML Composer
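
As a quick sanity check on the figures above, the short sketch below derives the per-head dimension and the size of the token-embedding matrix. It uses only the numbers listed in this header; the variable names are illustrative, and it counts the input embedding table only, not the rest of the model.

```python
# Values copied from the header above.
HIDDEN_SIZE = 768
NUM_HEADS = 12
VOCAB_SIZE = 42240

# Each attention head works on an equal slice of the hidden dimension.
head_dim = HIDDEN_SIZE // NUM_HEADS

# One embedding row of HIDDEN_SIZE floats per vocabulary entry.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

print(head_dim)          # 64
print(embedding_params)  # 32440320
```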

A Romanian ModernBERT-base model, produced by continued pretraining of ModernBERT-base on the Romanian portion of the FineWeb2 corpus with a custom Romanian BPE tokenizer (42,240-token vocabulary).

This is the Phase D checkpoint (the variant without whole-word masking). The context length was extended to 8192 tokens in Phase C; Phase D then cooled the learning rate down with a 1-sqrt schedule.
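
For readers unfamiliar with the term, a 1-sqrt cooldown is commonly written as lr(t) = lr_final + (lr_peak - lr_final) * (1 - sqrt(t / T)), where t is the step within the cooldown and T is its length. The sketch below is a minimal illustration of that formula, not the training code used for this checkpoint. The function name, the peak learning rate, and the step counts are all hypothetical, and the final learning rate is assumed to be 0 by default.

```python
import math


def one_minus_sqrt_lr(step, decay_steps, peak_lr, final_lr=0.0):
    """Learning rate at `step` of a 1-sqrt cooldown lasting `decay_steps` steps.

    All arguments are illustrative; none come from the actual training run.
    """
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the cooldown hold the endpoints.
    progress = min(max(step / decay_steps, 0.0), 1.0)
    # The factor falls from 1 to 0, steeply at first and then more gently.
    factor = 1.0 - math.sqrt(progress)
    return final_lr + (peak_lr - final_lr) * factor


# Hypothetical values: a 100-step cooldown from a peak LR of 1e-3.
for s in (0, 25, 50, 100):
    print(s, one_minus_sqrt_lr(s, decay_steps=100, peak_lr=1e-3))
```

Compared with a linear decay, this shape drops the learning rate quickly at the start of the cooldown: a quarter of the way through, the rate has already halved.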