---
license: apache-2.0
library_name: pytorch
tags:
- gated-deltanet
- gdn2
- language-model
- pretraining
---
# Loser-GDN-2-305M-20260601
Gated DeltaNet-2 checkpoint trained in Jellyfish042/GatedDeltaNet-2.
- Architecture: Gated DeltaNet-2, `gdn2_12h_305M`
- Parameters printed by the training script: 239,272,896
- Pretraining data: globally shuffled FineWeb-Edu packed 100B-token run
- Sequence length: 4096
- Global batch size: 1024
- Seed: 3407
- Checkpoint file: `pytorch_model.bin`
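As a rough sanity check on the settings listed above, the sketch below works out the tokens consumed per optimizer step and the approximate step count. It assumes the global batch size counts sequences (not tokens) and that "100B-token" means exactly 100 × 10^9 training tokens; neither is confirmed by this card, and the variable names are illustrative only.

```python
import math

# Values taken from the list above.
sequence_length = 4096
global_batch_size = 1024          # assumed to be a number of sequences

# Assumption: the run consumes exactly 100e9 tokens.
total_tokens = 100_000_000_000

tokens_per_step = sequence_length * global_batch_size
approx_steps = math.ceil(total_tokens / tokens_per_step)

print(tokens_per_step)  # 4194304
print(approx_steps)     # 23842
```

Under these assumptions each step covers about 4.2M tokens, so the run would span roughly 23.8k steps.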
This is a project checkpoint produced by `pretrain.py`, not a Hugging Face Transformers checkpoint.
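Because the file is a plain PyTorch checkpoint, it can be inspected with `torch.load` before wiring it into the model code from the training repository. The following is a minimal sketch: the helper names are hypothetical, and the internal layout of `pytorch_model.bin` (a flat state dict versus a dict with nested entries) is an assumption, so the helper simply walks nested dicts and counts tensor elements.

```python
import os
import torch

def count_tensor_elements(obj):
    """Recursively sum element counts of all tensors in nested dicts."""
    if isinstance(obj, torch.Tensor):
        return obj.numel()
    if isinstance(obj, dict):
        return sum(count_tensor_elements(v) for v in obj.values())
    return 0  # ints, strings, and other non-tensor entries are ignored

def count_checkpoint_elements(path):
    """Load a checkpoint on CPU and count the tensor elements it stores."""
    # weights_only=True restricts unpickling to tensors and plain containers.
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    return count_tensor_elements(checkpoint)

if __name__ == "__main__":
    # Only runs if the checkpoint has been downloaded next to this script.
    if os.path.exists("pytorch_model.bin"):
        print(count_checkpoint_elements("pytorch_model.bin"))
```

The count may differ from the number printed by the training script if the file stores buffers, tied weights as separate entries, or optimizer state.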
## lm-eval-harness snapshot
| Task | Metric | Value |
|---|---|---|
| lambada_openai | acc | 0.3315 |
| lambada_openai | perplexity | 30.1548 |
| wikitext | word_perplexity | 25.4100 |
| wikitext | byte_perplexity | 1.8312 |
| wikitext | bits_per_byte | 0.8728 |
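The three wikitext metrics are different normalizations of the same total log-likelihood, so they can be cross-checked against each other. The sketch below assumes the usual definitions (bits per byte is the base-2 logarithm of byte perplexity, and word and byte perplexity differ only in whether the log-likelihood is divided by the word count or the byte count); the implied bytes-per-word figure is derived here and is not reported by the evaluation.

```python
import math

# Values copied from the table above.
word_perplexity = 25.4100
byte_perplexity = 1.8312
reported_bits_per_byte = 0.8728

# bits_per_byte = log2(byte_perplexity)
bits_per_byte = math.log2(byte_perplexity)
print(f"{bits_per_byte:.4f}")  # 0.8728

# ln(word_ppl) / ln(byte_ppl) = bytes / words, under the assumption above.
implied_bytes_per_word = math.log(word_perplexity) / math.log(byte_perplexity)
print(f"{implied_bytes_per_word:.2f}")
```

The recomputed bits-per-byte value agrees with the reported one to four decimal places.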