cronica-jax-5m

A 5.26M-parameter decoder-only Transformer, hand-written in pure JAX (no Flax, no Equinox), trained on a data-to-text task: given a structured football match <STATS> block, it produces a Spanish-language crónica in one of 8 regional commentator styles.

This model is a portfolio piece that demonstrates the craft of implementing a Transformer from first principles in JAX. It is intentionally small and is not a state-of-the-art language model.

Code: https://github.com/DanielRegaladoUMiami/cronica-jax
Training data: DanielRegaladoCardoso/cronicas-d2t

Architecture

Decoder-only Transformer, 5.26M params
vocab=8000 (byte-level BPE), d_model=256, n_layers=4, n_heads=4, d_head=64, d_ff=704, max_seq_len=768
RoPE positional encoding, RMSNorm pre-norm, SwiGLU MLP, tied embeddings
Loss masking: cross-entropy applied only to tokens inside the <cronica>...</cronica> span; prompt tokens contribute zero loss.
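The loss masking described above can be illustrated with a minimal sketch. This is not the repository's code: the function name `masked_cross_entropy` and the `loss_mask` argument are hypothetical, and the sketch assumes the mask is 1.0 for target tokens inside the <cronica> span and 0.0 elsewhere.

```python
import jax
import jax.numpy as jnp

def masked_cross_entropy(logits, targets, loss_mask):
    """Mean cross-entropy over positions where loss_mask == 1.

    logits:    (batch, seq, vocab) floats
    targets:   (batch, seq) integer token ids
    loss_mask: (batch, seq), 1.0 inside the <cronica> span, 0.0 elsewhere (assumed layout)
    """
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Pick out the log-probability assigned to each target token.
    token_ll = jnp.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # Zero out prompt positions, then average over the unmasked positions only.
    total = -(token_ll * loss_mask).sum()
    return total / jnp.maximum(loss_mask.sum(), 1.0)

# Toy check: with uniform logits the loss is log(vocab), whatever the mask selects.
logits = jnp.zeros((1, 4, 8))
targets = jnp.array([[1, 2, 3, 4]])
mask = jnp.array([[0.0, 0.0, 1.0, 1.0]])
loss = masked_cross_entropy(logits, targets, mask)

# Changing the logits at a masked-out (prompt) position leaves the loss unchanged.
perturbed = logits.at[0, 0, 1].set(100.0)
loss_perturbed = masked_cross_entropy(perturbed, targets, mask)
```

Dividing by the number of unmasked tokens, rather than by the full sequence length, keeps the loss scale comparable across examples with prompts of different lengths.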

All forward-pass primitives (attention, RoPE, RMSNorm, SwiGLU) and the training step are hand-implemented in pure JAX, as a deliberate exercise in understanding the math.
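To give a flavor of what such primitives look like, here is a minimal sketch of RMSNorm, SwiGLU, and RoPE in pure JAX, using the dimensions listed above. The function names, the `eps` and `base` defaults, and the half-split pairing convention for RoPE are assumptions for illustration; the repository's versions may differ.

```python
import jax
import jax.numpy as jnp

def rms_norm(x, weight, eps=1e-6):
    # Divide by the root-mean-square over the feature axis, then apply a learned gain.
    rms = jnp.sqrt(jnp.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU MLP: (silu(x @ W_gate) * (x @ W_up)) @ W_down, without biases.
    return (jax.nn.silu(x @ w_gate) * (x @ w_up)) @ w_down

def apply_rope(x, base=10000.0):
    # x: (seq, n_heads, d_head) with an even d_head.
    seq, _, d_head = x.shape
    half = d_head // 2
    inv_freq = base ** (-jnp.arange(half) / half)           # (half,)
    angles = jnp.arange(seq)[:, None] * inv_freq[None, :]   # (seq, half)
    cos = jnp.cos(angles)[:, None, :]                       # broadcast over heads
    sin = jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Shapes from the architecture section: d_model=256, d_ff=704, n_heads=4, d_head=64.
k1, k2, k3, k4, k5 = jax.random.split(jax.random.PRNGKey(0), 5)
x = jax.random.normal(k1, (5, 256))
normed = rms_norm(x, jnp.ones(256))
mlp_out = swiglu(normed,
                 0.02 * jax.random.normal(k2, (256, 704)),
                 0.02 * jax.random.normal(k3, (256, 704)),
                 0.02 * jax.random.normal(k4, (704, 256)))
q = jax.random.normal(k5, (5, 4, 64))
q_rot = apply_rope(q)
```

Because RoPE is a pure rotation, it leaves the norm of each head vector unchanged and leaves position 0 untouched, which makes for a convenient sanity check.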

Training

5,000 (stats, crónica) pairs from cronicas-d2t
Optax AdamW, peak_lr=3e-4, cosine schedule, warmup_steps=100, weight_decay=0.1
Gradient clipping global_norm=1.0
2,000 steps, batch_size=8, seq_len=768
Trained on an Apple M4 CPU in ~28 minutes (no GPU/TPU)
Loss: 8.74 (step 25) → 2.80 (step 2000), perplexity ≈ 16.4

How to use

import jax
from tokenizers import Tokenizer
from cronica.train import load_ckpt
from cronica.sample import generate_cronica
from huggingface_hub import hf_hub_download

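# Download the tokenizer and the step-2000 checkpoint from the Hub.
tok_path = hf_hub_download("DanielRegaladoCardoso/cronica-jax-5m", "tokenizer.json")
ckpt = hf_hub_download("DanielRegaladoCardoso/cronica-jax-5m", "ckpt_002000.pkl")

# Load the byte-level BPE tokenizer, then the model parameters and config.
tok = Tokenizer.from_file(tok_path)
params, cfg, step = load_ckpt(ckpt)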

stats = ("<STATS>\n"
         "liga: La Liga\n"
         "fecha: 2024-03-15\n"
         "local: Real Madrid\n"
         "visitante: Atletico Madrid\n"
         "resultado: 2-1\n"
         "goles:\n"
         "  - 23' Vinicius Junior (Real Madrid)\n"
         "  - 67' Antoine Griezmann (Atletico Madrid)\n"
         "  - 88' Jude Bellingham (Real Madrid)\n"
         "</STATS>")

text = generate_cronica(params, cfg, tok, stats,
                       style_label="rioplatense_apasionado",
                       temperature=0.85, top_k=50, top_p=0.9,
                       max_new_tokens=300)
print(text)
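For readers unfamiliar with the `temperature`, `top_k`, and `top_p` arguments, the sketch below shows one common way to combine them when sampling a single token. It is not the implementation inside `generate_cronica`; the function name `sample_next_token` is hypothetical, and the order of the filters (temperature, then top-k, then top-p) is an assumption.

```python
import jax
import jax.numpy as jnp

def sample_next_token(key, logits, temperature=0.85, top_k=50, top_p=0.9):
    """Sample one token id from a 1-D logits vector of shape (vocab,)."""
    logits = logits / temperature
    # Top-k: keep the k largest logits (returned in descending order).
    top_logits, top_ids = jax.lax.top_k(logits, top_k)
    probs = jax.nn.softmax(top_logits)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    mass_before = jnp.cumsum(probs) - probs   # probability mass ahead of each token
    keep = mass_before < top_p                # the most likely token is always kept
    filtered = jnp.where(keep, top_logits, -jnp.inf)
    choice = jax.random.categorical(key, filtered)
    return top_ids[choice]

key = jax.random.PRNGKey(0)

# One dominant token: everything else falls outside the nucleus, so it is always chosen.
peaked = jnp.zeros(8000).at[123].set(50.0)
tok_peaked = sample_next_token(key, peaked)

# Random logits: the sampled id is some valid vocabulary index.
random_logits = jax.random.normal(jax.random.PRNGKey(1), (8000,))
tok_random = sample_next_token(key, random_logits)
```

Lower temperatures and smaller `top_k` or `top_p` values make the output more conservative; higher values admit rarer tokens and more varied prose.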

Style labels

label	region / flavor
`rioplatense_apasionado`	Argentina, emotive, "gol" lengthened
`rioplatense_tecnico`	Argentina, analytical
`rioplatense_literario`	Uruguay, evocative prose
`mexicano_irreverente`	Mexico, sarcastic
`mexicano_clasico`	Mexico, formal
`centroamericano_espn`	El Salvador / ESPN Latam, polished
`espanol_radiofonico`	Spain, radio broadcast style
`comentario_tecnico`	tactical analysis

Limitations and honest caveats

Small model. 5.26M params on 1.4M training tokens (~0.27 tokens/param, roughly 75× fewer tokens than the ~20 tokens/param Chinchilla rule of thumb). Expect grammatical but unpolished output; do not expect the prose quality of a billion-parameter model.
Hallucinated context. Training crónicas were generated by gpt-4o-mini, which sometimes added manager names, stadium nicknames, or derby references not present in the <STATS>. Our small model can reproduce this hallucination tendency. Use with care if grounding matters.
Coverage bias. Training matches are weighted toward big European leagues and CONMEBOL competitions. Smaller leagues are under-represented.
Style overlap. Three of the eight styles (rioplatense_apasionado, rioplatense_literario, comentario_tecnico) have strongly distinctive vocabulary (gooool, barrilete, transiciones); the other five share more neutral journalistic vocabulary.
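The model-size and tokens-per-parameter figures in the first caveat can be checked with a few lines of arithmetic. The itemization below is an assumption (bias-free projections, two RMSNorm gains per layer plus a final one, embeddings counted once because they are tied), though it does reproduce the stated 5.26M total. The ~20 tokens/param figure is the commonly quoted Chinchilla rule of thumb; by this estimate the training set falls short of it by a factor of about 75.

```python
# Hyperparameters from the architecture section.
vocab, d_model, n_layers, d_ff = 8000, 256, 4, 704

embed = vocab * d_model        # tied with the output projection, so counted once
attn = 4 * d_model * d_model   # Q, K, V and output projections, no biases (assumption)
mlp = 3 * d_model * d_ff       # SwiGLU: gate, up and down projections
norms = 2 * d_model            # two RMSNorm gains per layer (assumption)

# Per-layer weights times the number of layers, plus embeddings and a final RMSNorm.
n_params = embed + n_layers * (attn + mlp + norms) + d_model

train_tokens = 1.4e6
tokens_per_param = train_tokens / n_params
shortfall = 20 / tokens_per_param   # versus ~20 tokens/param (rule of thumb)

print(f"{n_params:,} params, {tokens_per_param:.2f} tokens/param, ~{shortfall:.0f}x short")
```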

License

Apache 2.0. See LICENSE.
