FWE - GLA 1B
This is a 1B GLA language model trained on FineWeb-Edu using the Lion optimizer, as part of the OpenTome optimizer benchmark.
| Field | Value |
|---|---|
| Architecture | GLA |
| Model Type | gla |
| Scale | 1B |
| Hidden Size | 2048 |
| Layers | 24 |
| Attention Heads | 4 |
| Vocab Size | 32000 |
| Max Sequence Length | 2048 |
| Training Dataset | FineWeb-Edu |
| Optimizer | Lion |
| Hyperparameters | lr=3e-4, β₁=0.9, β₂=0.99 |
| Timestamp | 20260519_131434 |
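To make the role of the hyperparameters in the table concrete, here is a minimal sketch of one Lion update step in plain Python. It follows the published Lion rule (sign of an interpolated momentum, then a momentum update) and is not taken from this repository's training code. The function name `lion_step` and the `weight_decay` argument are assumptions for illustration; the card does not state a weight decay value, so it defaults to 0 here.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momenta, lr=3e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """Apply one Lion update to lists of scalar parameters.

    Defaults for lr, beta1, and beta2 mirror the table above.
    weight_decay is a hypothetical argument, not a value from the card.
    """
    new_params, new_momenta = [], []
    for p, g, m in zip(params, grads, momenta):
        # beta1 blends the old momentum with the current gradient;
        # only the sign of the blend is used as the update direction.
        direction = sign(beta1 * m + (1.0 - beta1) * g)
        new_params.append(p - lr * (direction + weight_decay * p))
        # beta2 controls how the stored momentum tracks the gradient.
        new_momenta.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momenta


params, momenta = lion_step([1.0, -2.0], [0.5, -0.25], [0.0, 0.0])
```

Because the update uses only the sign of the blended momentum, every parameter moves by exactly `lr` per step (before any weight decay), regardless of gradient magnitude.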
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "OpenRaiser/fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "OpenRaiser/fwe_gla_1b_lion_lr3e_4_b1_0_9_b2_0_99_20260519_131434",
    trust_remote_code=True,
)
```
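Once the model and tokenizer are loaded, text generation would typically look like the sketch below. It assumes the remote model code supports the standard `generate` method of `transformers` and that the model is already on the intended device. The helper name `generate_text` and the prompt are hypothetical.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=64):
    """Generate a continuation of `prompt` and return it as a string.

    Hypothetical helper: assumes `model` exposes the standard
    `generate` method and a `device` attribute.
    """
    # Tokenize and move the tensors to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The first row of the output holds the prompt plus the new tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects loaded above, a call such as `generate_text(model, tokenizer, "Photosynthesis is")` would return the prompt followed by the model's continuation.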
See `train.sh` for the full training command.
Training was conducted on H200 GPUs using the FLAME framework.
If you use this model, please cite the OpenTome benchmark paper.