---
license: apache-2.0
library_name: pytorch
tags:
- gated-deltanet
- gdn2
- language-model
- pretraining
---

# Loser-GDN-2-305M-20260601

Gated DeltaNet-2 checkpoint trained in `Jellyfish042/GatedDeltaNet-2`.

- Architecture: Gated DeltaNet-2, `gdn2_12h_305M`
- Parameters printed by the training script: `239,272,896`
- Pretraining data: globally shuffled FineWeb-Edu packed 100B-token run
- Sequence length: `4096`
- Global batch size: `1024`
- Seed: `3407`
- Checkpoint file: `pytorch_model.bin`

This is a project checkpoint from `pretrain.py`, not a Hugging Face Transformers checkpoint.

## lm-eval-harness snapshot

| Task | Metric | Value |
|---|---:|---:|
| `lambada_openai` | acc | `0.3315` |
| `lambada_openai` | perplexity | `30.1548` |
| `wikitext` | word_perplexity | `25.4100` |
| `wikitext` | byte_perplexity | `1.8312` |
| `wikitext` | bits_per_byte | `0.8728` |
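
The two byte-level `wikitext` rows can be cross-checked against each other, since bits per byte is the base-2 logarithm of byte perplexity. The sketch below uses only the values reported in the table; the helper name `bits_per_byte_from_byte_perplexity` is illustrative and is not part of this project or of lm-eval-harness.

```python
import math

# Values copied from the wikitext rows of the table above.
BYTE_PERPLEXITY = 1.8312
BITS_PER_BYTE = 0.8728


def bits_per_byte_from_byte_perplexity(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte (hypothetical helper)."""
    return math.log2(byte_perplexity)


derived = bits_per_byte_from_byte_perplexity(BYTE_PERPLEXITY)
# The reported values are rounded to four decimals, so compare with a tolerance.
consistent = math.isclose(derived, BITS_PER_BYTE, abs_tol=1e-4)
print(f"derived bits_per_byte = {derived:.4f}, consistent = {consistent}")
```

The same conversion does not apply to `word_perplexity`, which is normalized per word rather than per byte.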