metadata
license: apache-2.0
library_name: pytorch
tags:
  - gated-deltanet
  - gdn2
  - language-model
  - pretraining

Loser-GDN-2-305M-20260601

A Gated DeltaNet-2 checkpoint trained with the code in Jellyfish042/GatedDeltaNet-2.

  • Architecture: Gated DeltaNet-2, gdn2_12h_305M
  • Parameters printed by the training script: 239,272,896
  • Pretraining data: FineWeb-Edu, globally shuffled and packed (100B-token run)
  • Sequence length: 4096
  • Global batch size: 1024
  • Seed: 3407
  • Checkpoint file: pytorch_model.bin
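
The sequence length and global batch size listed above imply a token count per optimizer step, and from that a rough step count for the 100B-token run. The sketch below works through that arithmetic. It assumes the global batch size counts sequences rather than tokens, that every sequence is packed to the full 4096 tokens, and that "100B" means exactly 100,000,000,000 tokens; the card states none of these explicitly.

```python
# Values copied from the list above.
SEQ_LEN = 4096
GLOBAL_BATCH = 1024  # assumed to be counted in sequences, not tokens

# Assumption: "100B-token run" means exactly 100e9 tokens.
TOTAL_TOKENS = 100_000_000_000

tokens_per_step = SEQ_LEN * GLOBAL_BATCH
approx_steps = TOTAL_TOKENS / tokens_per_step

print(f"{tokens_per_step:,} tokens per step")
print(f"about {approx_steps:,.0f} optimizer steps")
```

If the batch size is instead measured in tokens, or gradient accumulation is accounted for differently in pretrain.py, the step count changes accordingly.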

This checkpoint was written by the project's pretrain.py; it is not in Hugging Face Transformers format.
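
Because the file is not a Transformers checkpoint, one way to look inside it is to load it directly with PyTorch and count the tensor elements. The following is a minimal sketch, not the project's own loading code: the helper names `load_checkpoint` and `count_parameters` are hypothetical, and the internal layout of pytorch_model.bin (a flat state dict versus a dict with nested entries) is an assumption, so the helper descends into nested mappings.

```python
from collections.abc import Mapping


def load_checkpoint(path="pytorch_model.bin"):
    """Load the checkpoint onto the CPU. Requires PyTorch."""
    import torch  # imported lazily so count_parameters works without PyTorch

    # weights_only=True restricts unpickling to tensors and plain containers.
    # It may fail if the file stores other Python objects (an unknown here).
    return torch.load(path, map_location="cpu", weights_only=True)


def count_parameters(obj):
    """Sum numel() over every tensor-like value, descending into nested dicts."""
    if hasattr(obj, "numel"):
        return obj.numel()
    if isinstance(obj, Mapping):
        return sum(count_parameters(value) for value in obj.values())
    return 0  # step counters, strings, and other non-tensor entries


# Example usage (needs PyTorch and the downloaded file):
# ckpt = load_checkpoint()
# print(f"{count_parameters(ckpt):,} tensor elements")
```

The total need not match the 239,272,896 printed by the training script if the file also contains buffers, optimizer state, or tied weights stored more than once.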

lm-eval-harness snapshot

| Task           | Metric          | Value   |
|----------------|-----------------|---------|
| lambada_openai | acc             | 0.3315  |
| lambada_openai | perplexity      | 30.1548 |
| wikitext       | word_perplexity | 25.4100 |
| wikitext       | byte_perplexity | 1.8312  |
| wikitext       | bits_per_byte   | 0.8728  |
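
The three wikitext numbers are different normalizations of the same total log-likelihood, so they can be cross-checked against each other. The sketch below assumes the usual definitions, where bits_per_byte is the base-2 logarithm of byte_perplexity, and word and byte perplexity differ only in dividing by the word count or the byte count. Under that assumption the ratio of their logarithms gives the average number of bytes per word in the evaluation text.

```python
import math

# Values copied from the wikitext rows above.
word_perplexity = 25.4100
byte_perplexity = 1.8312
bits_per_byte = 0.8728

# bits_per_byte should equal log2(byte_perplexity) up to rounding.
bpb_from_ppl = math.log2(byte_perplexity)

# log(word_ppl) / log(byte_ppl) = bytes / words for the evaluated text.
bytes_per_word = math.log(word_perplexity) / math.log(byte_perplexity)

print(f"log2(byte_perplexity) = {bpb_from_ppl:.4f} (reported: {bits_per_byte})")
print(f"implied bytes per word = {bytes_per_word:.2f}")
```

The reported bits_per_byte agrees with the value recomputed from byte_perplexity to the four decimals shown.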