cronica-jax-5m

A 5.26M-parameter decoder-only Transformer, hand-written in pure JAX (no Flax, no Equinox), trained on a data-to-text task: given a structured <STATS> block describing a football match, it generates a Spanish-language crónica (match report) in one of 8 regional commentator styles.

This model is a portfolio piece that demonstrates the craft of implementing a Transformer from first principles in JAX. It is intentionally small and is not a state-of-the-art language model.

Architecture

  • Decoder-only Transformer, 5.26M params
  • vocab=8000 (byte-level BPE), d_model=256, n_layers=4, n_heads=4, d_head=64, d_ff=704, max_seq_len=768
  • RoPE positional encoding, RMSNorm pre-norm, SwiGLU MLP, tied embeddings
  • Loss masking: cross-entropy applied only to tokens inside the <cronica>...</cronica> span; prompt tokens contribute zero loss.
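The 5.26M figure is consistent with the configuration above. The sketch below reproduces it under assumptions this card does not state: bias-free linear layers, two RMSNorm gain vectors per block plus one final RMSNorm, a SwiGLU MLP with three d_model × d_ff matrices, and a single embedding matrix shared with the output head. `count_params` is a hypothetical helper, not part of the `cronica` package.

```python
# Hypothetical helper. Assumptions: no biases, n_heads * d_head == d_model,
# 2 RMSNorm gains per layer + 1 final RMSNorm, tied input/output embeddings,
# and RoPE contributes no parameters.
def count_params(vocab=8000, d_model=256, n_layers=4, n_heads=4,
                 d_head=64, d_ff=704):
    embed = vocab * d_model                    # tied embeddings: counted once
    attn = 4 * d_model * (n_heads * d_head)    # W_q, W_k, W_v, W_o
    mlp = 3 * d_model * d_ff                   # SwiGLU: gate, up, down projections
    norms = 2 * d_model                        # pre-attention and pre-MLP RMSNorm gains
    per_layer = attn + mlp + norms
    final_norm = d_model
    return embed + n_layers * per_layer + final_norm


total = count_params()
print(f"{total:,} parameters")  # 5,261,568 parameters
```

Under these assumptions the embedding table alone accounts for about 2.05M parameters (roughly 39% of the total), which is typical for very small models with an 8000-token vocabulary.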

All core components (attention, RoPE, RMSNorm, SwiGLU, and the training step) are hand-implemented in pure JAX as a deliberate exercise in understanding the underlying math; the AdamW optimizer comes from Optax.
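A minimal sketch of these primitives in `jax.numpy`, for a single unbatched sequence. It illustrates the techniques rather than reproducing the repository's code: the RoPE base of 10000, the interleaved-pair rotation convention, the `eps` value, bias-free projections, and all function names here are hypothetical choices.

```python
import jax
import jax.numpy as jnp


def rms_norm(x, gain, eps=1e-6):
    # Divide by the root-mean-square over the feature axis (no mean subtraction), then scale.
    rms = jnp.sqrt(jnp.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain


def rope(x, base=10000.0):
    # x: (seq, n_heads, d_head). Rotate each consecutive feature pair by a
    # position-dependent angle (interleaved-pair convention is an assumption).
    seq, _, d_head = x.shape
    inv_freq = 1.0 / (base ** (jnp.arange(0, d_head, 2) / d_head))  # (d_head/2,)
    angles = jnp.arange(seq)[:, None] * inv_freq[None, :]            # (seq, d_head/2)
    cos = jnp.cos(angles)[:, None, :]                                 # broadcast over heads
    sin = jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rot1 = x1 * cos - x2 * sin
    rot2 = x1 * sin + x2 * cos
    return jnp.stack([rot1, rot2], axis=-1).reshape(x.shape)          # re-interleave pairs


def causal_attention(x, wq, wk, wv, wo, n_heads):
    # x: (seq, d_model); all weights (d_model, d_model), bias-free (assumption).
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        return t.reshape(seq, n_heads, d_head)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    q, k = rope(q), rope(k)                                   # RoPE on queries and keys only
    scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(d_head)
    causal = jnp.tril(jnp.ones((seq, seq), dtype=bool))       # position i attends to j <= i
    scores = jnp.where(causal[None], scores, -jnp.inf)
    weights = jax.nn.softmax(scores, axis=-1)
    out = jnp.einsum("hqk,khd->qhd", weights, v).reshape(seq, d_model)
    return out @ wo


def swiglu(x, w_gate, w_up, w_down):
    # SiLU-gated linear unit: (silu(x @ W_gate) * (x @ W_up)) @ W_down.
    return (jax.nn.silu(x @ w_gate) * (x @ w_up)) @ w_down
```

In a pre-norm block these compose as `x = x + causal_attention(rms_norm(x, g1), ...)` followed by `x = x + swiglu(rms_norm(x, g2), ...)`.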

Training

  • 5,000 (stats, crónica) pairs from cronicas-d2t
  • Optax AdamW, peak_lr=3e-4, cosine schedule, warmup_steps=100, weight_decay=0.1
  • Gradient clipping global_norm=1.0
  • 2,000 steps, batch_size=8, seq_len=768
  • Trained on Apple M4 CPU in ~28 minutes (no GPU/TPU)
  • Loss: 8.74 (step 25) → 2.80 (step 2000), perplexity ≈ 16.4

How to use

```python
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

from cronica.train import load_ckpt
from cronica.sample import generate_cronica

# Fetch the tokenizer and the step-2000 checkpoint from the Hub.
tok_path = hf_hub_download("DanielRegaladoCardoso/cronica-jax-5m", "tokenizer.json")
ckpt     = hf_hub_download("DanielRegaladoCardoso/cronica-jax-5m", "ckpt_002000.pkl")

tok = Tokenizer.from_file(tok_path)
params, cfg, step = load_ckpt(ckpt)  # model params, model config, checkpoint step

# Match stats in the <STATS> format the model was trained on.
stats = ("<STATS>\n"
         "liga: La Liga\n"
         "fecha: 2024-03-15\n"
         "local: Real Madrid\n"
         "visitante: Atletico Madrid\n"
         "resultado: 2-1\n"
         "goles:\n"
         "  - 23' Vinicius Junior (Real Madrid)\n"
         "  - 67' Antoine Griezmann (Atletico Madrid)\n"
         "  - 88' Jude Bellingham (Real Madrid)\n"
         "</STATS>")

# style_label must be one of the eight labels listed under "Style labels".
text = generate_cronica(params, cfg, tok, stats,
                        style_label="rioplatense_apasionado",
                        temperature=0.85, top_k=50, top_p=0.9,
                        max_new_tokens=300)
print(text)
```
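The `temperature`, `top_k`, and `top_p` arguments are standard decoding controls. The sketch below shows one common way to apply them to a single next-token logit vector. `sample_next_token` is hypothetical, and the order of operations (temperature, then top-k, then top-p) is an assumption, not a description of how `generate_cronica` works internally.

```python
import jax
import jax.numpy as jnp


def sample_next_token(key, logits, temperature=0.85, top_k=50, top_p=0.9):
    # logits: (vocab,) scores for the next position.
    logits = logits / temperature                     # < 1 sharpens, > 1 flattens
    # Top-k: drop everything below the k-th highest score (ties at the k-th value are kept).
    top_k = min(top_k, logits.shape[-1])
    kth = jnp.sort(logits)[-top_k]
    logits = jnp.where(logits < kth, -jnp.inf, logits)
    # Top-p (nucleus): keep the smallest high-probability prefix whose mass reaches top_p.
    order = jnp.argsort(-logits)                      # indices in descending score order
    probs = jax.nn.softmax(logits[order])
    cum = jnp.cumsum(probs)
    keep_sorted = (cum - probs) < top_p               # mass before this token is < top_p
    keep = jnp.zeros_like(keep_sorted).at[order].set(keep_sorted)
    logits = jnp.where(keep, logits, -jnp.inf)
    return jax.random.categorical(key, logits)        # sample from the filtered distribution
```

Top-k caps the candidate set at a fixed size, while top-p adapts it to how peaked the distribution is; using both bounds the tail from two directions.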

Style labels

label                     region / flavor
rioplatense_apasionado    Argentina, emotive, "gol" lengthened
rioplatense_tecnico       Argentina, analytical
rioplatense_literario     Uruguay, evocative prose
mexicano_irreverente      Mexico, sarcastic
mexicano_clasico          Mexico, formal
centroamericano_espn      El Salvador / ESPN Latam, polished
espanol_radiofonico       Spain, radio broadcast style
comentario_tecnico        tactical analysis

Limitations and honest caveats

  • Small model. 5.26M params on 1.4M training tokens (~0.27 tokens/param, roughly 75× over-parameterized relative to the Chinchilla-optimal ~20 tokens/param). Expect grammatical but unpolished output; do not expect the prose quality of a billion-parameter model.
  • Hallucinated context. The training crónicas were generated by gpt-4o-mini, which sometimes added manager names, stadium nicknames, or derby references not present in the <STATS> block. This model can reproduce that tendency, so use it with care where factual grounding matters.
  • Coverage bias. Training matches are weighted toward the major European leagues and CONMEBOL competitions; smaller leagues are under-represented.
  • Style overlap. Three of the eight styles (rioplatense_apasionado, rioplatense_literario, comentario_tecnico) have strongly distinctive vocabulary (gooool, barrilete, transiciones); the other five share a more neutral journalistic vocabulary.
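The data-to-parameter ratio in the first caveat is simple arithmetic. This sketch assumes the commonly cited Chinchilla rule of thumb of about 20 training tokens per parameter; the exact compute-optimal ratio depends on the fitted scaling law, so treat the result as an order-of-magnitude estimate.

```python
n_params = 5_260_000        # model size from this card (5.26M)
train_tokens = 1_400_000    # training-set size from this card (1.4M)
chinchilla_ratio = 20       # assumed rule of thumb: ~20 tokens per parameter

tokens_per_param = train_tokens / n_params
print(f"{tokens_per_param:.2f} tokens/param")                    # 0.27 tokens/param
print(f"{chinchilla_ratio / tokens_per_param:.0f}x below ratio")  # 75x below ratio
print(f"{train_tokens / chinchilla_ratio:,.0f} params would be data-optimal")  # 70,000 params would be data-optimal
```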

License

Apache 2.0. See LICENSE.
