cronica-jax-5m
A 5.26M parameter decoder-only Transformer, hand-written in pure JAX
(no Flax, no Equinox), that learns the data-to-text task: take a
structured football match <STATS> block and produce a Spanish-language
crónica in one of 8 regional commentator styles.
This model is a portfolio piece that demonstrates the craft of implementing a Transformer from first principles in JAX. It is intentionally small and is not a state-of-the-art language model.
- Code: https://github.com/DanielRegaladoUMiami/cronica-jax
- Training data: DanielRegaladoCardoso/cronicas-d2t
Architecture
- Decoder-only Transformer, 5.26M params
- vocab=8000 (byte-level BPE), d_model=256, n_layers=4, n_heads=4, d_head=64, d_ff=704, max_seq_len=768
- RoPE positional encoding, RMSNorm pre-norm, SwiGLU MLP, tied embeddings
- Loss masking: cross-entropy applied only to tokens inside the
<cronica>...</cronica> span; prompt tokens contribute zero loss.
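As a sanity check, the 5.26M figure can be reproduced from the hyperparameters listed above. The sketch below assumes no bias terms, two RMSNorm gain vectors per layer plus one final norm, and a single embedding table shared with the output head; these layout details are assumptions for illustration, not read from the repository.

```python
# Hyperparameters copied from the architecture list.
vocab, d_model, n_layers, n_heads, d_head, d_ff = 8000, 256, 4, 4, 64, 704

# Assumed layout (not stated in the card): bias-free linear layers,
# one RMSNorm gain before attention and one before the MLP in every layer,
# one final RMSNorm, and RoPE contributing no learned parameters.
embedding = vocab * d_model                    # tied: also used as the output head
attention = 4 * d_model * (n_heads * d_head)   # W_q, W_k, W_v, W_o
swiglu = 3 * d_model * d_ff                    # gate, up and down projections
norms = 2 * d_model                            # two RMSNorm gain vectors
per_layer = attention + swiglu + norms

total = embedding + n_layers * per_layer + d_model  # + final RMSNorm gain
print(f"{total:,} parameters ({total / 1e6:.2f}M)")
```

Under these assumptions the count is 5,261,568, and the tied embedding table accounts for roughly 39% of it.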
All forward-pass primitives — attention, RoPE, RMSNorm, SwiGLU, the training step — are hand-implemented in pure JAX, as a deliberate exercise in understanding the math.
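The loss masking from the architecture list is one of the simpler pieces to write by hand. A minimal sketch follows, assuming the data pipeline supplies a per-token `loss_mask` that is 1.0 inside the `<cronica>` span and 0.0 elsewhere; the function and argument names are hypothetical and the repository's own training step may be organized differently.

```python
import jax
import jax.numpy as jnp

def masked_cross_entropy(logits, targets, loss_mask):
    """Mean next-token cross-entropy over positions where loss_mask == 1.

    logits:    (batch, seq, vocab) float
    targets:   (batch, seq) int, next-token ids
    loss_mask: (batch, seq) float, 1.0 inside the crónica span, 0.0 on the prompt
    """
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Pick the log-probability assigned to each target token.
    token_nll = -jnp.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # Prompt positions are multiplied by zero; guard against an all-zero mask.
    return jnp.sum(token_nll * loss_mask) / jnp.maximum(jnp.sum(loss_mask), 1.0)

# Toy check: uniform logits over the 8000-token vocabulary.
logits = jnp.zeros((1, 4, 8000))
targets = jnp.array([[5, 6, 7, 8]])
mask = jnp.array([[0.0, 0.0, 1.0, 1.0]])  # first two positions are "prompt"
loss = masked_cross_entropy(logits, targets, mask)
print(float(loss))
```

With uniform predictions the loss is ln(8000) ≈ 8.99, which is in the same range as the 8.74 reported at step 25.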
Training
- 5,000 (stats, crónica) pairs from cronicas-d2t
- Optax AdamW, peak_lr=3e-4, cosine schedule, warmup_steps=100, weight_decay=0.1
- Gradient clipping global_norm=1.0
- 2,000 steps, batch_size=8, seq_len=768
- Trained on Apple M4 CPU in ~28 minutes (no GPU/TPU)
- Loss: 8.74 (step 25) → 2.80 (step 2000), perplexity ≈ 16.4
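The schedule and the headline numbers above can be checked with a few lines of plain Python. This is a sketch that assumes linear warmup from zero and cosine decay to zero over the full 2,000 steps; the end value and the exact decay horizon are assumptions, since the list only gives the peak rate and warmup length.

```python
import math

PEAK_LR, WARMUP_STEPS, TOTAL_STEPS = 3e-4, 100, 2000

def lr_at(step, end_lr=0.0):
    """Linear warmup to PEAK_LR, then cosine decay to end_lr (assumed 0.0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return end_lr + 0.5 * (PEAK_LR - end_lr) * (1.0 + math.cos(math.pi * progress))

for s in (0, 50, 100, 1050, 2000):
    print(f"step {s:4d}: lr = {lr_at(s):.2e}")

# Bookkeeping derived from the training list (ignoring any partial batch).
steps_per_epoch = 5000 / 8                    # 625 steps to see every pair once
epochs = TOTAL_STEPS / steps_per_epoch        # 3.2 passes over the data
token_positions = TOTAL_STEPS * 8 * 768       # positions processed, padding included
perplexity = math.exp(2.80)                   # matches the reported ~16.4
print(epochs, token_positions, round(perplexity, 1))
```

Optax ships a comparable schedule as `optax.warmup_cosine_decay_schedule`; whether the training script uses it, and with which end value, is not stated here.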
How to use
import jax
from tokenizers import Tokenizer
from cronica.train import load_ckpt
from cronica.sample import generate_cronica
from huggingface_hub import hf_hub_download

# Download the tokenizer and the step-2000 checkpoint from the Hub.
tok_path = hf_hub_download("DanielRegaladoCardoso/cronica-jax-5m", "tokenizer.json")
ckpt = hf_hub_download("DanielRegaladoCardoso/cronica-jax-5m", "ckpt_002000.pkl")
tok = Tokenizer.from_file(tok_path)
params, cfg, step = load_ckpt(ckpt)

# Structured match description used as the prompt.
stats = ("<STATS>\n"
         "liga: La Liga\n"
         "fecha: 2024-03-15\n"
         "local: Real Madrid\n"
         "visitante: Atletico Madrid\n"
         "resultado: 2-1\n"
         "goles:\n"
         " - 23' Vinicius Junior (Real Madrid)\n"
         " - 67' Antoine Griezmann (Atletico Madrid)\n"
         " - 88' Jude Bellingham (Real Madrid)\n"
         "</STATS>")

# Generate a crónica in the chosen commentator style.
text = generate_cronica(params, cfg, tok, stats,
                        style_label="rioplatense_apasionado",
                        temperature=0.85, top_k=50, top_p=0.9,
                        max_new_tokens=300)
print(text)
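For readers unfamiliar with the `temperature`, `top_k` and `top_p` arguments passed above, the following is a minimal sketch of how such filters are commonly applied to one step's logits before sampling. The helper names `filter_logits` and `sample_token` are hypothetical; `generate_cronica` in the repository may implement sampling differently.

```python
import jax
import jax.numpy as jnp

def filter_logits(logits, temperature=0.85, top_k=50, top_p=0.9):
    """Apply temperature, then top-k, then nucleus (top-p) filtering to 1-D logits."""
    logits = logits / temperature
    # Top-k: mask everything below the k-th largest logit.
    k = min(top_k, logits.shape[-1])
    kth_largest = jnp.sort(logits)[-k]
    logits = jnp.where(logits < kth_largest, -jnp.inf, logits)
    # Top-p: walk tokens from most to least likely and keep a token while
    # the probability mass accumulated before it is still below top_p.
    order = jnp.argsort(-logits)
    probs = jax.nn.softmax(logits[order])
    mass_before = jnp.cumsum(probs) - probs
    keep_sorted = mass_before < top_p          # the most likely token is always kept
    keep = jnp.zeros(logits.shape, dtype=bool).at[order].set(keep_sorted)
    return jnp.where(keep, logits, -jnp.inf)

def sample_token(key, logits, temperature=0.85, top_k=50, top_p=0.9):
    """Draw one token id from the filtered distribution."""
    return jax.random.categorical(key, filter_logits(logits, temperature, top_k, top_p))

# Toy usage on a 4-token vocabulary.
toy_logits = jnp.array([2.0, 1.0, 0.0, -1.0])
print(filter_logits(toy_logits, temperature=1.0, top_k=3, top_p=0.9))
print(int(sample_token(jax.random.PRNGKey(0), toy_logits)))
```

Lower temperatures and smaller `top_k` / `top_p` values concentrate sampling on the most likely tokens, which tends to trade variety for fewer off-topic continuations.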
Style labels
| label | region / flavor |
|---|---|
| rioplatense_apasionado | Argentina, emotive, "gol" lengthened |
| rioplatense_tecnico | Argentina, analytical |
| rioplatense_literario | Uruguay, evocative prose |
| mexicano_irreverente | Mexico, sarcastic |
| mexicano_clasico | Mexico, formal |
| centroamericano_espn | El Salvador / ESPN Latam, polished |
| espanol_radiofonico | Spain, radio broadcast style |
| comentario_tecnico | tactical analysis |
Limitations and honest caveats
- Small model. 5M params on 1.4M training tokens (~0.28 tokens/param, roughly 70× below the ~20 tokens/param Chinchilla-optimal rule of thumb). Expect grammatical but unpolished output; do not expect the prose quality of a billion-parameter model.
- Hallucinated context. Training crónicas were generated by gpt-4o-mini, which sometimes added manager names, stadium nicknames, or derby references not present in the <STATS> block. Our small model can reproduce this hallucination tendency. Use with care if grounding matters.
- Coverage bias. Training matches are weighted toward big European leagues + CONMEBOL competitions. Smaller leagues are under-represented.
- Style overlap. Three of the eight styles (rioplatense_apasionado, rioplatense_literario, comentario_tecnico) have strong distinctive vocab (gooool, barrilete, transiciones); the other five share more neutral journalistic vocabulary.
License
Apache 2.0. See LICENSE.