Loser-GDN-2-305M-20260601

Gated DeltaNet-2 checkpoint trained with the code in Jellyfish042/GatedDeltaNet-2.

  • Architecture: Gated DeltaNet-2, gdn2_12h_305M
  • Parameters printed by the training script: 239,272,896
  • Pretraining data: FineWeb-Edu, globally shuffled and packed (100B-token run)
  • Sequence length: 4096
  • Global batch size: 1024
  • Seed: 3407
  • Checkpoint file: pytorch_model.bin
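
As a rough sanity check on the figures above, the sequence length and global batch size imply a token count per optimizer step and an approximate step count for the run. This is only a sketch: it assumes the global batch size is counted in sequences, that every sequence is packed to the full 4096 tokens, and that "100B" means exactly 100 × 10^9 tokens. None of these is confirmed by the card.

```python
import math

SEQ_LEN = 4096              # from the card
GLOBAL_BATCH_SIZE = 1024    # from the card; assumed to be counted in sequences
TOTAL_TOKENS = 100 * 10**9  # assumed exact value of "100B"

# Tokens consumed by one optimizer step under the assumptions above.
tokens_per_step = SEQ_LEN * GLOBAL_BATCH_SIZE

# Number of steps needed to see the whole token budget once.
approx_steps = math.ceil(TOTAL_TOKENS / tokens_per_step)

print(f"tokens per step: {tokens_per_step:,}")
print(f"approximate optimizer steps: {approx_steps:,}")
```

If the training script counts the batch size in tokens, or uses gradient accumulation differently, the step count will differ.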

This is a project checkpoint from pretrain.py, not a Hugging Face Transformers checkpoint.
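
Because the file is a plain PyTorch checkpoint, one way to inspect it is to load it with `torch.load` and look at the stored tensors. The sketch below is a hedged example, not the project's own loading code: the helper names are hypothetical, the `"model"` wrapper key is a guess at a common layout, and `weights_only=True` assumes a recent PyTorch release and a checkpoint that contains only tensors and plain containers.

```python
def count_parameters(state_dict):
    """Sum element counts over all tensor-like values in a state dict."""
    return sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))


def load_state_dict(path="pytorch_model.bin"):
    """Load the checkpoint on CPU and return a flat state dict (needs PyTorch)."""
    import torch  # imported lazily so count_parameters works without PyTorch

    obj = torch.load(path, map_location="cpu", weights_only=True)
    # Assumption: the file is either a flat state dict or wraps one under
    # a hypothetical "model" key; adjust this to the actual layout.
    if isinstance(obj, dict) and isinstance(obj.get("model"), dict):
        return obj["model"]
    return obj


# Example usage (requires the downloaded checkpoint file):
#   sd = load_state_dict("pytorch_model.bin")
#   print(f"{len(sd)} entries, {count_parameters(sd):,} elements")
```

The element count obtained this way may not match the figure printed by the training script, for example if the checkpoint stores buffers or if the script excludes some weights from its count.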

lm-eval-harness snapshot

Task            Metric           Value
lambada_openai  acc              0.3315
lambada_openai  perplexity       30.1548
wikitext        word_perplexity  25.4100
wikitext        byte_perplexity  1.8312
wikitext        bits_per_byte    0.8728
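
The three wikitext numbers are different normalizations of the same total log-likelihood, so they can be cross-checked against each other. Assuming the usual definitions (byte perplexity is the exponential of the negative log-likelihood per byte, and bits per byte is the same quantity expressed in base 2), bits_per_byte should equal log2(byte_perplexity). Under the same assumption, the ratio of the log perplexities gives the implied average number of bytes per word in the evaluation text. This is a sketch for checking consistency, not a reproduction of the evaluation.

```python
import math

word_perplexity = 25.4100        # from the table
byte_perplexity = 1.8312         # from the table
reported_bits_per_byte = 0.8728  # from the table

# bits per byte is the base-2 log of the per-byte perplexity.
bits_per_byte = math.log2(byte_perplexity)

# Same total log-likelihood, normalized by words vs. bytes:
# ln(word_ppl) / ln(byte_ppl) = bytes / words.
bytes_per_word = math.log(word_perplexity) / math.log(byte_perplexity)

print(f"log2(byte_perplexity) = {bits_per_byte:.4f}")
print(f"implied bytes per word = {bytes_per_word:.2f}")
```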