---
license: apache-2.0
library_name: pytorch
tags:
- gated-deltanet
- gdn2
- language-model
- pretraining
---
# Loser-GDN-2-305M-20260601
A Gated DeltaNet-2 language-model checkpoint trained with the `Jellyfish042/GatedDeltaNet-2` codebase.
- Architecture: Gated DeltaNet-2, `gdn2_12h_305M`
- Parameters printed by the training script: `239,272,896`
- Pretraining data: FineWeb-Edu, globally shuffled and packed, 100B-token run
- Sequence length: `4096`
- Global batch size: `1024`
- Seed: `3407`
- Checkpoint file: `pytorch_model.bin`
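
The figures above imply a rough step count for the run. The sketch below works through that arithmetic under two assumptions that the card does not state: that the global batch size counts sequences (not tokens), and that "100B" means exactly 10^11 tokens. The constant names are illustrative only.

```python
SEQ_LEN = 4096               # sequence length from the list above
GLOBAL_BATCH = 1024          # ASSUMPTION: counted in sequences per optimizer step
TOKEN_BUDGET = 100 * 10**9   # ASSUMPTION: "100B" read as exactly 1e11 tokens

# Tokens consumed by one optimizer step when every sequence is fully packed.
tokens_per_step = SEQ_LEN * GLOBAL_BATCH

# Integer ceiling division: the smallest step count that covers the budget.
steps = -(-TOKEN_BUDGET // tokens_per_step)

print(f"{tokens_per_step:,} tokens per step")
print(f"about {steps:,} optimizer steps")
```

If the actual run used a different token budget or counted the batch in tokens, the step count changes accordingly.
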
This is a project checkpoint from `pretrain.py`, not a Hugging Face Transformers checkpoint.
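
Because the file is a plain PyTorch checkpoint, one way to inspect it is to load it with `torch.load` and count its entries. This is a minimal sketch, not the project's own loading code: it assumes `pytorch_model.bin` holds a flat mapping from parameter names to tensors, which the card does not confirm. The helper names `load_checkpoint` and `count_parameters` are hypothetical.

```python
from pathlib import Path


def load_checkpoint(path="pytorch_model.bin"):
    """Load the checkpoint onto the CPU without building a model."""
    import torch  # imported lazily so count_parameters works without PyTorch

    # weights_only=True restricts unpickling to tensors and basic containers.
    return torch.load(Path(path), map_location="cpu", weights_only=True)


def count_parameters(state_dict):
    """Sum element counts over entries that expose numel().

    ASSUMPTION: state_dict is a flat name -> tensor mapping. Entries without
    a numel() method (for example a step counter) are skipped.
    """
    return sum(t.numel() for t in state_dict.values() if hasattr(t, "numel"))
```

With the file downloaded, `count_parameters(load_checkpoint())` can be compared against the `239,272,896` printed by the training script. The two may differ if the file also stores non-parameter buffers or wraps the weights in an outer dictionary.
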
## lm-eval-harness snapshot
| Task | Metric | Value |
|---|---:|---:|
| `lambada_openai` | acc | `0.3315` |
| `lambada_openai` | perplexity | `30.1548` |
| `wikitext` | word_perplexity | `25.4100` |
| `wikitext` | byte_perplexity | `1.8312` |
| `wikitext` | bits_per_byte | `0.8728` |
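
The three `wikitext` rows are different normalizations of the same total log-likelihood: `bits_per_byte` is the base-2 logarithm of `byte_perplexity`, and the ratio of the log perplexities gives the average number of bytes per word in the evaluation text. A small sketch to cross-check the table (the function names are illustrative, not part of lm-eval-harness):

```python
import math


def bits_per_byte(byte_perplexity):
    """Convert per-byte perplexity to bits per byte."""
    return math.log2(byte_perplexity)


def implied_bytes_per_word(word_perplexity, byte_perplexity):
    """Average bytes per word implied by the two perplexities.

    Both perplexities exponentiate the same total negative log-likelihood,
    divided by the word count and the byte count respectively, so the ratio
    of their logarithms equals bytes / words.
    """
    return math.log(word_perplexity) / math.log(byte_perplexity)


# Values copied from the table above.
print(round(bits_per_byte(1.8312), 4))
print(implied_bytes_per_word(25.4100, 1.8312))
```

The first printed value should agree with the reported `bits_per_byte` up to rounding of the inputs.
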