---
language:
- en
- it
license: other
library_name: custom
pipeline_tag: text-generation
tags:
- nanochat
- gpt2-small
- bilingual
- english
- italian
- pretraining
---

# gpt2small-en-it-nanochat-lr2e4-bs6-wsd-fastdecay-step10000

This repo stages the best saved checkpoint from the local NanoChat EN/IT GPT-2-small-like run `20260517_stable-config-recipe-v5-gpt2small-lr2e4-batchmaxpossible-bs6-wsd-fastdecay`.

## What this is

- model family: GPT-2-small-like decoder-only LM
- parameters: ~136M
- languages: English + Italian
- context length: 2500
- selected checkpoint: `step_10000.pt`
- selection reason: lowest recorded validation loss among saved checkpoints in `best_validation.json`

## Best validation

- step: 10000
- validation loss: 3.8945770748
- validation perplexity: 49.1352684243
- validation batches: 128

## Important caveat

This checkpoint is the best validation checkpoint **within this run family**. It is a useful intermediate bilingual pretraining artifact, not a polished factual assistant model.

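The two headline numbers under "Best validation" are related: perplexity is the exponential of the mean per-token cross-entropy. The sketch below assumes the reported validation loss is a mean negative log-likelihood in nats (natural log), which this card does not state explicitly; the helper name `perplexity` is illustrative and not part of the NanoChat code.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


# Validation loss reported for step 10000 in this card.
val_loss = 3.8945770748

# Assumption: the loss is measured in nats, so perplexity = exp(loss).
ppl = perplexity(val_loss)
print(f"validation perplexity ~ {ppl:.3f}")
```

Under that assumption the result agrees with the perplexity listed above, so the two figures can be cross-checked when comparing against other checkpoints in `eval_metrics.jsonl`.
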
## Training/data provenance

- training config: `training_config.yaml`
- tokenizer: `tokenizer.json` + `tokenizer_meta.json`
- packed dataset root used by the run: `/mnt/apps/llm-nanochat/datasets/202605011052_fresh_50_50_score100_2500_sourcebalanced`
- tokenizer root used by the run: `/mnt/apps/llm-nanochat/tokenizers/tok_202605011052_fresh_50_50_score100_32k_fromscratch`

## Included files

- `step_10000.pt`
- `step_10000.safetensors`
- `step_10000.safetensors.json`
- `training_config.yaml`
- `tokenizer.json`
- `tokenizer_meta.json`
- `best_validation.json`
- `eval_summary.json`
- `probe_step10000_summary.json`
- full run telemetry snapshots: `eval_metrics.jsonl`, `metrics.jsonl`, `probe_generations.jsonl`

## Probe reading at step 10000

The run includes probe telemetry, but the stored payload for this experiment is legacy/partial: the `probe_generations.jsonl` entries at step `10000` keep prompts and expected continuations, while generated text / target-rank fields are null. So this release does **not** make strong probe-quality claims from those rows.

## Usage

This project uses a custom NanoChat inference/training stack. The easiest local UI in the source repo is the Chainlit checkpoint tester documented in the repo README.

## Limitations

- factual recall is still limited
- generations may become repetitive
- the model was selected by validation loss inside this run family, not by broad downstream benchmark performance
- dataset redistribution for the full training corpus may have separate licensing constraints; this repo contains model artifacts, not the raw/prepared training corpus
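
Because the model needs the custom NanoChat stack to run, a quick way to sanity-check the shipped `step_10000.safetensors` without that stack is to read only its header. The sketch below relies on the public safetensors file layout (an 8-byte little-endian header length followed by a JSON header) and uses the Python standard library only. The function names are illustrative, and the tensor names and shapes inside this particular checkpoint are not documented here, so treat the printed count as a rough check against the ~136M figure rather than an exact match (tied weights, for example, may be stored once).

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_parameters(header):
    """Sum element counts over all tensor entries, skipping the optional __metadata__ key."""
    return sum(
        prod(entry["shape"])
        for name, entry in header.items()
        if name != "__metadata__"
    )
```

For example, `count_parameters(read_safetensors_header("step_10000.safetensors"))` returns the number of stored tensor elements, and iterating over the header's keys lists the tensor names with their `dtype` and `shape`.
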