This is a 340M-parameter DeltaNet language model trained on FineWeb-Edu with the Lion optimizer, as part of the OpenTome optimizer benchmark.
| Field | Value |
|---|---|
| Architecture | DeltaNet |
| Model Type | delta_net |
| Scale | 340M |
| Hidden Size | 1024 |
| Layers | 24 |
| Attention Heads | 8 |
| Vocab Size | 32000 |
| Max Sequence Length | 2048 |
| Training Dataset | FineWeb-Edu |
| Optimizer | Lion |
| Hyperparameters | lr=3e-3, β₁=0.9, β₂=0.99 |
| Timestamp | 20260505_194333 |
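
For readers unfamiliar with Lion, the update rule behind the hyperparameters above can be sketched as follows. This is a minimal pure-Python illustration of the published Lion rule, not the benchmark's training code: the function name `lion_step` and the toy values are hypothetical, and `weight_decay` is an assumption (the table does not list one), so it defaults to 0 here.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=3e-3, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """Apply one Lion update to flat lists of floats.

    lr, beta1, beta2 mirror the table above; weight_decay is an
    assumption, since the model card does not state it.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient with beta1; only the sign is used.
        c = beta1 * m + (1 - beta1) * g
        update = sign(c) + weight_decay * p
        new_params.append(p - lr * update)
        # The stored momentum is an EMA of gradients with beta2.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


# Toy example: three scalar "parameters" with zero initial momentum.
params = [1.0, -2.0, 0.5]
grads = [0.3, -0.1, 0.0]
momentum = [0.0, 0.0, 0.0]
params, momentum = lion_step(params, grads, momentum)
```

Because only the sign of the interpolated direction is used, every parameter with a nonzero direction moves by exactly `lr` per step, regardless of gradient magnitude.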

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "OpenRaiser/fwe_deltanet_340m_lion_lr3e_3_b1_0_9_b2_0_99_20260505_194333",
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "OpenRaiser/fwe_deltanet_340m_lion_lr3e_3_b1_0_9_b2_0_99_20260505_194333",
        trust_remote_code=True,
    )

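
Once the model and tokenizer are loaded, text generation might look like the sketch below. It assumes the checkpoint's remote code supports the standard `generate` method of `transformers`, which this card does not confirm. The helper names `generate_text` and `demo` and the prompt are hypothetical; greedy decoding is chosen only to keep the output deterministic.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=32):
    """Greedy-decode a continuation of `prompt` and return it as a string."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for reproducible output
    )
    # generate() returns a batch; decode the first (and only) sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def demo():
    """Download the checkpoint and generate from a sample prompt."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "OpenRaiser/fwe_deltanet_340m_lion_lr3e_3_b1_0_9_b2_0_99_20260505_194333"
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    return generate_text(model, tokenizer, "Photosynthesis is the process by which")
```

Calling `demo()` downloads the weights on first use. Keep the prompt length plus `max_new_tokens` within the 2048-token maximum sequence length listed in the table.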
See train.sh for the full training command.
Training was conducted on H200 GPUs using the FLAME framework.
If you use this model, please cite the OpenTome benchmark paper.