fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434

This is a 1B-parameter GLA (Gated Linear Attention) language model trained on FineWeb-Edu with the Lion optimizer, as part of the OpenTome optimizer benchmark.

Model Details

| Field | Value |
|---|---|
| Architecture | GLA |
| Model Type | gla |
| Scale | 1B |
| Hidden Size | 2048 |
| Layers | 24 |
| Attention Heads | 4 |
| Vocab Size | 32000 |
| Max Sequence Length | 2048 |
| Training Dataset | FineWeb-Edu |
| Optimizer | Lion |
| Hyperparameters | lr=3e-4, β₁=0.9, β₂=0.99 |
| Timestamp | 20260519_131434 |
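
For readers unfamiliar with Lion, the sketch below shows how the hyperparameters listed above (lr=3e-4, β₁=0.9, β₂=0.99) enter a single update step, following the commonly published form of the Lion rule. It is a plain-Python illustration on scalar lists, not the code used to train this model. The function names `sign` and `lion_step` and the `weight_decay` argument are assumptions made for illustration; this card does not state a weight decay value.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=3e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """Apply one Lion update to lists of scalar parameters.

    beta1 blends momentum and gradient to choose the update direction.
    beta2 controls how quickly the stored momentum tracks the gradient.
    weight_decay is a hypothetical argument; the card does not give a value.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # The update direction is only the sign of the interpolated value,
        # so every parameter moves by the same magnitude (lr).
        direction = sign(beta1 * m + (1 - beta1) * g)
        new_params.append(p - lr * (direction + weight_decay * p))
        # The momentum is updated afterwards, using beta2 rather than beta1.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


# One step from zero momentum on two toy parameters.
params, momentum = lion_step([1.0, -2.0], [0.5, -0.25], [0.0, 0.0])
```

Because only the sign of the interpolated value is used, the step size per parameter is fixed by the learning rate, which is one reason Lion learning rates are typically set lower than AdamW learning rates.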

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID for this checkpoint.
repo_id = "OpenRaiser/fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434"

# trust_remote_code=True lets transformers run the custom model code shipped in the repository.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
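
Once the model and tokenizer load, text generation would typically go through the standard `generate` API. The helper below is a minimal sketch under that assumption: the name `generate_text`, the prompt, and the `max_new_tokens` default are illustrative, and this card does not confirm that the repository's custom code supports `generate`, runs on CPU, or needs no extra packages.

```python
REPO_ID = "OpenRaiser/fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434"


def generate_text(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the checkpoint and greedily continue `prompt` (illustrative helper)."""
    # Imported lazily so that defining this helper does not require
    # torch or transformers to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, trust_remote_code=True)
    model.eval()

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # do_sample=False selects greedy decoding, so output is deterministic.
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The water cycle begins when")` downloads the weights on first use. As a base model trained on FineWeb-Edu, it continues text rather than following instructions.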

Training

See train.sh for the full training command. Training was conducted on H200 GPUs using the FLAME framework.

Citation

If you use this model, please cite the OpenTome benchmark paper.
