fwe_deltanet_340m_lion_lr3e_3_b1_0_9_b2_0_99_20260505_194333

This is a 340M-parameter DeltaNet language model trained on FineWeb-Edu with the Lion optimizer, as part of the OpenTome optimizer benchmark.

Model Details

Field                  Value
Architecture           DeltaNet
Model Type             delta_net
Scale                  340M
Hidden Size            1024
Layers                 24
Attention Heads        8
Vocab Size             32000
Max Sequence Length    2048
Training Dataset       FineWeb-Edu
Optimizer              Lion
Hyperparameters        lr=3e-3, β₁=0.9, β₂=0.99
Timestamp              20260505_194333
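
To make the roles of lr, β₁, and β₂ concrete, here is a minimal sketch of one Lion update step in plain Python, following the published Lion update rule. It is an illustration only, not the training code used for this model (see train.sh for that). The function name lion_step and the weight_decay argument are assumptions for this example; the weight decay used in this run is not stated in this card, so it defaults to 0.0 here.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum,
              lr=3e-3, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Apply one Lion update to flat lists of floats.

    lr, beta1, and beta2 default to the values listed in this card.
    weight_decay is a hypothetical default; the card does not state it.
    Returns (new_params, new_momentum) without mutating the inputs.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # beta1 blends the momentum with the current gradient;
        # only the sign of that blend is used as the update direction.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (update + weight_decay * p))
        # beta2 controls the moving average stored for the next step.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


# Example: two scalar parameters with zero initial momentum.
params, momentum = lion_step([1.0, -2.0], [0.5, -0.25], [0.0, 0.0])
```

Because the update uses only the sign of the blended gradient, every parameter moves by exactly lr per step (before weight decay), which is why Lion is typically run with smaller learning rates than Adam-style optimizers.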

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID for this checkpoint.
repo_id = "OpenRaiser/fwe_deltanet_340m_lion_lr3e_3_b1_0_9_b2_0_99_20260505_194333"

# trust_remote_code=True allows transformers to run custom model code
# shipped with the repository.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
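
Once the model and tokenizer are loaded as above, text can be generated with the standard transformers generate API. The helper below is a minimal sketch: the name generate_text, the prompt, and the max_new_tokens default are assumptions for illustration, and greedy decoding is chosen only for reproducibility. DeltaNet implementations commonly rely on GPU kernels, so the model and inputs may need to be moved to a CUDA device first; that step is omitted here.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=32):
    """Greedily continue `prompt` and return the decoded text.

    `model` and `tokenizer` are the objects loaded in the Usage section.
    """
    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # do_sample=False selects greedy decoding; the returned sequence
    # contains the prompt tokens followed by the newly generated tokens.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (hypothetical prompt), assuming `model` and `tokenizer` exist:
# print(generate_text(model, tokenizer, "The water cycle begins when"))
```

Since the maximum sequence length is 2048, the prompt length plus max_new_tokens should stay within that limit.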

Training

See train.sh for the full training command. Training was conducted on H200 GPUs using the FLAME framework.

Citation

If you use this model, please cite the OpenTome benchmark paper.
