---
license: apache-2.0
library_name: pytorch
tags:
- gated-deltanet
- gdn2
- language-model
- pretraining
---
# Loser-GDN-2-305M-20260601

Gated DeltaNet-2 checkpoint trained in `Jellyfish042/GatedDeltaNet-2`.

- Architecture: Gated DeltaNet-2, `gdn2_12h_305M`
- Parameters printed by the training script: `239,272,896`
- Pretraining data: globally shuffled FineWeb-Edu packed 100B-token run
- Sequence length: `4096`
- Global batch size: `1024`
- Seed: `3407`
- Checkpoint file: `pytorch_model.bin`

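
For a sense of scale, the sequence length and global batch size listed above imply a token budget per optimizer step. The sketch below assumes that the global batch size counts sequences (not tokens) and that the run consumed the full 100B tokens. Neither is stated explicitly here, and the variable names are illustrative only.

```python
# Values copied from the list above.
SEQ_LEN = 4096
GLOBAL_BATCH = 1024              # assumption: counted in sequences, not tokens
TOTAL_TOKENS = 100_000_000_000   # assumption: the full 100B-token run was consumed

# Tokens processed per optimizer step under the assumptions above.
tokens_per_step = SEQ_LEN * GLOBAL_BATCH

# Number of complete steps that fit in the token budget.
full_steps = TOTAL_TOKENS // tokens_per_step

print(f"tokens per step: {tokens_per_step:,}")
print(f"full optimizer steps: {full_steps:,}")
```

If the batch size were instead counted in tokens, these figures would not apply.
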
This is a project checkpoint from `pretrain.py`, not a Hugging Face Transformers checkpoint.

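
Because the file is not a Transformers checkpoint, one way to look inside it is to load it with plain PyTorch. The following is a minimal sketch, not the project's own loader: it assumes `pytorch_model.bin` is in the working directory and holds a flat mapping from parameter names to tensors, which is not confirmed here. The helper names `load_checkpoint`, `count_elements`, and `main` are hypothetical.

```python
from collections.abc import Mapping


def load_checkpoint(path="pytorch_model.bin"):
    """Load the raw checkpoint on CPU, without any Transformers machinery."""
    import torch  # imported lazily so count_elements works without PyTorch

    # weights_only=True restricts unpickling to tensors and basic containers;
    # it may raise if the file stores other Python objects (assumption: it does not).
    return torch.load(path, map_location="cpu", weights_only=True)


def count_elements(state: Mapping) -> int:
    """Sum element counts over every tensor-like value in a flat mapping."""
    return sum(v.numel() for v in state.values() if hasattr(v, "numel"))


def main():
    state = load_checkpoint()
    # Print the first few entries to see how the checkpoint is laid out.
    for name in list(state)[:10]:
        print(name, getattr(state[name], "shape", type(state[name])))
    print(f"total elements: {count_elements(state):,}")
```

Calling `main()` prints the first few entry names and a total element count. That total need not equal the `239,272,896` printed by the training script, for example if the file nests the weights under another key or stores extra buffers.
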
## lm-eval-harness snapshot

| Task | Metric | Value |
|---|---:|---:|
| `lambada_openai` | acc | `0.3315` |
| `lambada_openai` | perplexity | `30.1548` |
| `wikitext` | word_perplexity | `25.4100` |
| `wikitext` | byte_perplexity | `1.8312` |
| `wikitext` | bits_per_byte | `0.8728` |
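
The two byte-level `wikitext` rows describe the same quantity on different scales. Assuming the harness's usual definitions, where byte perplexity is the exponential of the average negative log-likelihood per byte and bits per byte is that average expressed in base 2, the relation is `bits_per_byte = log2(byte_perplexity)`. A quick check using the rounded value from the table:

```python
import math

# Rounded value copied from the table above.
byte_perplexity = 1.8312

# Convert perplexity per byte into bits per byte.
bits_per_byte = math.log2(byte_perplexity)

print(f"bits_per_byte ~= {bits_per_byte:.4f}")
```

The result agrees with the reported `0.8728` to within the rounding of the inputs.
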