fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434

This is a 1B-parameter GLA (Gated Linear Attention) language model trained on FineWeb-Edu with the Lion optimizer, as part of the OpenTome optimizer benchmark.

Model Details

| Field | Value |
| --- | --- |
| Architecture | GLA |
| Model Type | `gla` |
| Scale | 1B |
| Hidden Size | 2048 |
| Layers | 24 |
| Attention Heads | 4 |
| Vocab Size | 32000 |
| Max Sequence Length | 2048 |
| Training Dataset | FineWeb-Edu |
| Optimizer | Lion |
| Hyperparameters | lr=3e-4, β₁=0.9, β₂=0.99 |
| Timestamp | 20260519_131434 |
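
The repository name appears to encode the same settings as the table above (dataset, architecture, scale, optimizer, learning rate, betas, timestamp). The sketch below recovers them from the name, which can help when comparing several runs from the benchmark. The pattern is inferred from this one name only and is not a documented naming scheme; `RUN_NAME` and `parse_run_name` are hypothetical names introduced for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from this single repository name; it is an assumption,
# not a documented OpenTome naming scheme.
RUN_NAME = re.compile(
    r"^(?P<dataset>[a-z]+)_(?P<arch>[a-z0-9]+)_(?P<scale>\d+[mb])_(?P<optimizer>[a-z]+)"
    r"_lr(?P<lr_mant>\d+)e_(?P<lr_exp>\d+)"
    r"_b1_(?P<b1_int>\d+)_(?P<b1_frac>\d+)"
    r"_b2_(?P<b2_int>\d+)_(?P<b2_frac>\d+)"
    r"_(?P<stamp>\d{8}_\d{6})$"
)


def parse_run_name(name: str) -> dict:
    """Split a run name into the fields shown in the Model Details table."""
    m = RUN_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized run name: {name!r}")
    return {
        "dataset": m["dataset"],
        "arch": m["arch"],
        "scale": m["scale"],
        "optimizer": m["optimizer"],
        # "lr3e_4" is read as 3e-4: the underscore stands in for the minus sign.
        "lr": float(f"{m['lr_mant']}e-{m['lr_exp']}"),
        # "b1_0_9" is read as 0.9: the underscore stands in for the decimal point.
        "beta1": float(f"{m['b1_int']}.{m['b1_frac']}"),
        "beta2": float(f"{m['b2_int']}.{m['b2_frac']}"),
        "timestamp": datetime.strptime(m["stamp"], "%Y%m%d_%H%M%S"),
    }


info = parse_run_name("fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434")
print(info["optimizer"], info["lr"], info["beta1"], info["beta2"])
```

Names that do not follow this layout raise `ValueError` rather than returning partial results, so a mismatch with the assumed scheme is visible immediately.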

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OpenRaiser/fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434"

# trust_remote_code=True allows transformers to run model code shipped in the repository.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
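
Once `model` and `tokenizer` are loaded, text generation can be sketched with the standard `transformers` calls (`tokenizer(...)`, `model.generate`, `tokenizer.decode`). The helper name `complete` is hypothetical, and the sketch assumes the checkpoint's remote code supports `generate` like other causal language models, which this card does not confirm.

```python
def complete(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Tokenize to PyTorch tensors and move them to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # do_sample=False requests greedy decoding; max_new_tokens caps the length.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # generate returns prompt plus continuation token ids; decode the first sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `complete(model, tokenizer, "Photosynthesis is")` would return the prompt followed by up to 32 generated tokens. As this is a base model trained on FineWeb-Edu, it continues text rather than following instructions.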

Training

See `train.sh` for the full training command. Training was conducted on H200 GPUs using the FLAME framework.
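
To make the roles of lr, β₁ and β₂ from Model Details concrete, here is a minimal sketch of one Lion update on plain Python lists, following the published Lion update rule. It is not the FLAME or OpenTome implementation. The function name `lion_step` is hypothetical, and the `weight_decay` default of 0.0 is an assumption, since this card does not state the value used in training.

```python
def lion_step(params, grads, momentum, lr=3e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update on plain Python lists; returns (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # beta1 blends momentum and gradient to pick the update direction.
        c = beta1 * m + (1.0 - beta1) * g
        # Only the sign is used, so each coordinate moves by exactly lr
        # (before weight decay), or not at all when c == 0.
        direction = (c > 0) - (c < 0)
        # Weight decay is decoupled: it is added to the step, not to the gradient.
        new_params.append(p - lr * (direction + weight_decay * p))
        # beta2 controls the momentum carried into the next step.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


params, momentum = lion_step([1.0, -2.0], grads=[0.5, -0.1], momentum=[0.0, 0.0])
```

Unlike Adam, Lion keeps a single momentum buffer per parameter and uses two different interpolation factors: β₁ for the direction of the current step and β₂ for the stored momentum.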

Citation

If you use this model, please cite the OpenTome benchmark paper.
